Weighting PDF Documents

beaudoind
Posts: 1
Joined: Mon Jan 31, 2022 10:26 am

Post by beaudoind »

Hello!

I was wondering if anyone has suggestions on how to make PDF documents appear lower in the search results. Sometimes our PDF documents contain so many keywords that they overshadow the web pages in the search results.

Thanks for any help.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Re: Weighting PDF Documents

Post by John »

There are a few options:
  • If the problem is that the PDF files are larger and contain more occurrences of the search terms, you could reduce or disable "Document Frequency" in the relevance calculation on the search settings page. You might also want to increase "Position In Text", which gives more weight to terms that appear near the beginning of a document rather than deep inside a long one.
  • With the parametric appliance you can create an integer field to use for rank bias. That can be populated with "Data From Field", so you could either bias down PDF documents, or bias up HTML documents. That would also let you add additional biases that can be determined from the content.
  • Depending on how the crawl is set up, it might be possible to partition the PDF documents, or other "reference"-type materials, into a separate profile, and then use a meta search profile to combine the profiles with different weights.
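
To make the rank bias idea concrete, here is a rough Python sketch of the general effect. It is only an illustration, not the appliance's actual relevance formula: the Hit class, the TYPE_BIAS values, and the score scale are all made up for the example, and on the appliance the bias would come from an integer field rather than from code like this.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevance: float  # hypothetical base relevance score from the engine


# Hypothetical bias per document type. A negative value pushes that type
# down the results; a positive value would push it up.
TYPE_BIAS = {"pdf": -200, "html": 0}


def doc_type(url: str) -> str:
    """Classify a hit by file extension (an assumption for this sketch)."""
    return "pdf" if url.lower().endswith(".pdf") else "html"


def biased_score(hit: Hit) -> float:
    """Base relevance plus the bias for the hit's document type."""
    return hit.relevance + TYPE_BIAS[doc_type(hit.url)]


def rerank(hits):
    """Return hits ordered by biased score, highest first."""
    return sorted(hits, key=biased_score, reverse=True)


hits = [
    Hit("https://example.com/manual.pdf", 900),     # keyword-heavy PDF
    Hit("https://example.com/products.html", 800),
    Hit("https://example.com/about.html", 650),
]
ranked = rerank(hits)
for hit in ranked:
    print(hit.url, biased_score(hit))
```

With these made-up numbers the PDF drops from first to second place, below the products page, but it still outranks the weaker HTML page. A bias of this kind demotes PDFs without removing them from the results, so a strongly matching PDF can still surface.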
John Turnbull
Thunderstone Software