PDF and DOC have higher ranking than html pages

thomas.amsler
Posts: 18
Joined: Mon Nov 19, 2007 6:17 am

Post by thomas.amsler »

We have problems with the search results: the first few result pages contain only PDFs and DOCs rather than HTML pages. We have set the index fields to URL, Title, Keywords, Body, Meta, Description. The PDF has the search term only in its body, whereas the HTML page that we expect at the top, or at least on the first result page, has the term in all fields.
Example:
Go to http://www.sgs.com, enter "fumigation" (without quotes) in the search field in the upper left corner, and click GO.
The first result (ignore "SGS premium results"; this is a special feature for promoting content and is not connected to the Webinator search) is a financial report from 2001, which is not very useful.
Now select "Agriculture" in the "Scope" drop-down on the search result form and click GO again.
Then you get "Corporate Site - Fumigation by SGS", the page we would like to see at the top without the scope filter as well (the PDFs do not have this scope indexed, so only HTML pages are returned):
http://www.sgs.com/fumigation?catId=824 ... pe=segment

Result No. 3 has the word "fumigation" only in its meta description, so I don't see why the PDFs rank so high when they have no meta tags at all. Any ideas on this matter?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The pages are ranked by the specified fields only. Their file type doesn't matter.

Using the stock search script, go to the advanced search form and adjust the rank knobs until the search works the way you want. You're probably mostly interested in the table and document frequency knobs. Once you settle on a group of settings, go to search settings and make them the default for that profile.
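
To illustrate why the knob settings, and not the file type, decide the order, here is a rough Python sketch. It is not Webinator's actual ranking formula: the function toy_score, the two weights doc_freq_weight and lead_weight, and both sample documents are made up for this example. It only shows how a long document with many late occurrences of a word can outrank a short page that mentions it early, and how changing the weights flips that.

```python
def toy_score(words, term, doc_freq_weight, lead_weight):
    """Toy relevance score for one document (hypothetical, not Webinator's formula)."""
    positions = [i for i, w in enumerate(words) if w == term]
    if not positions:
        return 0.0
    # More occurrences raise the score in proportion to doc_freq_weight.
    frequency_part = doc_freq_weight * len(positions)
    # An early first occurrence raises the score in proportion to lead_weight.
    lead_part = lead_weight * (1.0 - positions[0] / len(words))
    return frequency_part + lead_part


# Made-up documents: a short page that mentions the term early,
# and a long report that mentions it often but only near the end.
html_page = ["fumigation", "services", "overview", "fumigation", "contact"]
pdf_report = ["annual", "report"] + ["filler"] * 50 + ["fumigation"] * 6


def ranking(doc_freq_weight, lead_weight):
    """Return the document names ordered from best to worst score."""
    docs = {"html": html_page, "pdf": pdf_report}
    return sorted(
        docs,
        key=lambda name: toy_score(docs[name], "fumigation", doc_freq_weight, lead_weight),
        reverse=True,
    )


print(ranking(1.0, 1.0))  # frequency dominates
print(ranking(0.1, 1.0))  # early position dominates
```

With both weights at 1.0 the report comes first because its six occurrences outweigh everything else; lowering the frequency weight to 0.1 puts the short page first. The real knobs work on the indexed fields of your profile, so the actual values have to be found by trying them on the advanced search form.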
thomas.amsler
Posts: 18
Joined: Mon Nov 19, 2007 6:17 am

Post by thomas.amsler »

OK, I think the problem is our search script, which was carried over from version 4.X.
I'll have to take the custom parts from that file and put them into the Version 5 script. I mainly need to remove all the HTML output and return the results as XML instead. If I still get unwanted results, I will get back to you about how to change the ranking.
thomas.amsler
Posts: 18
Joined: Mon Nov 19, 2007 6:17 am

Post by thomas.amsler »

After spending half a day figuring out how the new script works and queries the database, I managed to make the necessary changes, and the results are now as expected.