
Weighting PDF Documents

Posted: Thu Sep 29, 2022 9:23 am
by beaudoind

I was wondering if anyone has suggestions on how to make PDF documents appear lower in the search results. Sometimes we have PDF documents with too many keywords, and they overshadow the web pages in the search results.

Thanks for any help.

Re: Weighting PDF Documents

Posted: Fri Sep 30, 2022 9:47 am
by John
There are a few options:
  • If the problem is that the PDF files are larger and contain more occurrences of the search terms, you could reduce or disable "Document Frequency" in the relevance calculation on the search settings page. You might also want to increase "Position In Text", which gives more weight to terms that appear near the beginning of a document rather than deep inside a long one.
  • With the parametric appliance you can create an integer field to use for rank bias. That field can be populated with "Data From Field", so you could either bias PDF documents down or bias HTML documents up. It would also let you add further biases determined from the content.
  • Depending on how the crawl is set up, it might be possible to partition the PDF documents, or more "reference"-type materials in general, into a separate profile, and then use a meta search profile to combine the profiles with different weights.
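
To illustrate the rank bias idea from the second option, here is a rough Python sketch of how an integer bias can push PDFs below web pages. It is not the appliance's actual formula or API: the `Hit` class, the `rank_bias` and `rerank` functions, the scores, and the -20 penalty are all made up for the example, and it simply assumes the bias is added to a base relevance score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevance: float  # hypothetical base relevance score, higher is better


def rank_bias(url: str, pdf_penalty: int = -20) -> int:
    """Hypothetical integer bias: penalize PDFs, leave other documents alone."""
    return pdf_penalty if url.lower().endswith(".pdf") else 0


def rerank(hits: list[Hit]) -> list[Hit]:
    """Sort by base relevance plus bias (assumed additive for illustration)."""
    return sorted(hits, key=lambda h: h.relevance + rank_bias(h.url), reverse=True)


hits = [
    Hit("https://example.com/manual.pdf", 90.0),     # keyword-heavy PDF
    Hit("https://example.com/overview.html", 80.0),
    Hit("https://example.com/faq.html", 65.0),
]

ordered = [h.url for h in rerank(hits)]
for url in ordered:
    print(url)
```

With these made-up numbers the PDF drops from first to second place but still ranks above the weakest web page, which is usually what you want from a bias as opposed to filtering PDFs out entirely.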