Limiting subject of indexed pages

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone



We have been searching for a software package that would let us
implement a ceramics-specific search engine. After getting a basic
installation up and running, I have not found any way to limit indexing
to pages that contain 'word1', 'word2', 'word3' or 'word4'. We would like
gw to index only pages that appear relevant to our target audience. By
using a group of 8-10 industry-specific words, we hope the index would have
a better signal-to-noise ratio.
Is there a way, through SQL statements, to delete pages that don't contain
at least one of a group of words? Is there a way to have the gw crawler skip
indexing a page if it doesn't find at least one word of a group of words in
the page?

Paul
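A minimal sketch of the relevance test described above ("keep a page only if it contains at least one of a group of words"), written in Python and assuming each page is already available as plain text keyed by URL. It does not use gw or Texis; the keyword set and the names `is_relevant` and `filter_pages` are hypothetical, for illustration only.

```python
import re

# Hypothetical industry-specific vocabulary; substitute your own 8-10 terms.
KEYWORDS = frozenset({"ceramic", "kiln", "glaze", "porcelain", "refractory"})

# Plain alphabetic tokens; no stemming is done, so "glazes" != "glaze".
WORD_RE = re.compile(r"[a-z]+")


def is_relevant(text: str, keywords: frozenset = KEYWORDS) -> bool:
    """Return True if the text contains at least one keyword.

    Matching is case-insensitive and on whole words only.
    """
    words = set(WORD_RE.findall(text.lower()))
    return not words.isdisjoint(keywords)


def filter_pages(pages: dict) -> dict:
    """Keep only the pages (url -> body text) that pass is_relevant()."""
    return {url: body for url, body in pages.items() if is_relevant(body)}


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "Firing schedules for a gas kiln",
        "http://example.com/b": "Quarterly sales report",
    }
    print(sorted(filter_pages(pages)))  # only the kiln page survives
```

Using a set intersection keeps the check linear in the page length, so a list of 8-10 words costs almost nothing per page.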




Post by Thunderstone




You'll have to do this after you have finished walking the site;
the SQL goes something like this:

gw -s "delete from html where Body not like '(worda,wordb,wordc...)'"
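With 8-10 words it is easy to mistype the list by hand, so here is a small Python sketch that builds the same statement from a word list, following the `'(worda,wordb,wordc...)'` form above. The function names `build_delete_sql` and `gw_argv` and the sample words are hypothetical; the sketch only builds the command and does not run gw. It accepts plain alphanumeric words only, as a simple way to avoid breaking the list syntax or the quoted SQL string.

```python
def build_delete_sql(words: list) -> str:
    """Build a DELETE that removes rows of the html table whose Body
    matches none of the given words, using the '(worda,wordb,...)' form."""
    if not words:
        raise ValueError("need at least one word")
    for w in words:
        # Reject anything that is not a plain word, e.g. commas, quotes,
        # parentheses or spaces, which would break the list or the SQL.
        if not w.isalnum():
            raise ValueError(f"unsupported word: {w!r}")
    word_list = ",".join(words)
    return f"delete from html where Body not like '({word_list})'"


def gw_argv(words: list) -> list:
    """Argument vector for running the statement with gw -s.

    Passing a list (e.g. to subprocess.run) avoids shell quoting issues.
    """
    return ["gw", "-s", build_delete_sql(words)]


if __name__ == "__main__":
    print(gw_argv(["ceramic", "kiln", "glaze"]))
```

As in the reply, run the resulting command only after the walk has finished.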

