Excluding PORTIONS of a document

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone »



Is it possible to exclude portions of an HTML document from indexing?
Example: I would like Webinator to ignore all the text in my text-based
menus, navigation bars, and footers (all ASP server-side includes
[*.inc], BTW). They appear on virtually every page and contain
words that users would typically search for, so a search on one of
those words would return almost every page in the site. (We are a
hospital; the search word might be "heart".)



Post by Thunderstone »




There are a few ways to do this, but all involve some putzing.

One is to run a post-processing operation in SQL to delete the offending text:

gw -s "update html set Body = Body -'delete me please'"

Another way is to use a custom walker and edit the content out
before it's placed in the database. The Texis Webscript source code
for this is available at http://www.Thunderstone.com/demos/dowalk

Run it by typing "texis top=http://www.somesite.com dowalk"

You may edit the source code to do anything you want.
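
To show the kind of edit a custom walker would make, here is a rough sketch in Python rather than Texis Webscript. It is not taken from dowalk. It assumes you wrap each include (menu, navigation bar, footer) in a pair of marker comments; the marker strings and the function name strip_excluded are made up for this example.

```python
import re

# Hypothetical marker comments: wrap each menu, nav bar, and footer
# include in these so the walker knows what to throw away.
START = "<!-- noindex -->"
END = "<!-- /noindex -->"

# Non-greedy match so each marked region is removed separately;
# DOTALL lets a region span several lines.
_BLOCK = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def strip_excluded(html: str) -> str:
    """Return the page with every marked region removed."""
    return _BLOCK.sub("", html)


page = (
    "<html><body>"
    "<!-- noindex --><a href='/cardiology'>Heart Center</a><!-- /noindex -->"
    "<p>Visiting hours are 9 to 5.</p>"
    "</body></html>"
)
cleaned = strip_excluded(page)
```

The same idea carries over to the walker script: apply the substitution to the fetched page before the text is inserted into the database.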

The third way is to serve the walker an alternative view of your
site. This is web server specific, so we can't say exactly how to
do it. Basically, look at the user agent of the request and, if it's
Webinator, serve a version of the page without the menus, navigation
bars, and footers.
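
As a server-neutral illustration of the user agent check, here is a small Python sketch. It assumes the walker's User-Agent string contains the word "Webinator"; check your access logs for the exact string. The function names and the header/footer arguments are hypothetical.

```python
def is_indexer(user_agent, token="webinator"):
    """True if the request appears to come from the walker.

    Assumption: the walker's User-Agent contains `token`
    (compared case-insensitively).
    """
    return token in (user_agent or "").lower()


def render_page(body, user_agent, header="", footer=""):
    """Build the page, leaving out the shared includes for the walker."""
    if is_indexer(user_agent):
        # The walker sees only the page-specific content.
        return body
    # Regular browsers get the full page.
    return header + body + footer


walker_view = render_page(
    "<p>Visiting hours</p>", "Webinator",
    header="<nav>Heart Center</nav>", footer="<footer>Contact</footer>",
)
browser_view = render_page(
    "<p>Visiting hours</p>", "Mozilla/5.0",
    header="<nav>Heart Center</nav>", footer="<footer>Contact</footer>",
)
```

In an ASP site the same test would go around each include, so the .inc files are simply skipped when the walker asks for the page.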




