webinator question

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »




When indexing a site, how does the index engine determine what a sentence, line, paragraph and so forth are? In other words, does the document need particular tags to delimit these areas?


Your response would be appreciated.



Post by Thunderstone »




`within' processing is handled by REX expressions (see
http://www.thunderstone.com/doc/rex.html for details on REX) that
match the start and end of sentences, lines, etc. These expressions match
the formatted text, not HTML tags, so no particular tags are needed and
they work with plain text documents as well. The `within' expressions
are set with the Vortex <apicp sdexp/edexp> function in the Webinator
search script, based on the appropriate form variable (see
http://www.thunderstone.com/vortexman/node85.html).
If permitted via <apicp alwithin>, they can also be set directly in
the query with the "w/" operator, which understands some shorthands
such as "w/para" for within paragraph.
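
To illustrate the idea of delimiter-based `within' matching, here is a
minimal sketch in Python. It is not REX and not how Webinator is
implemented: the regular expressions, the function names split_blocks
and blocks_with_all_terms, and the sample text are all assumptions made
up for this example. It only shows that blocks can be found from the
text itself, with no HTML tags involved.

```python
import re

# Assumed delimiters (Python regex syntax, not REX):
# a sentence ends at ., ! or ? followed by whitespace;
# a paragraph ends at a blank line.
SENTENCE_END = re.compile(r"[.!?]\s+")
PARAGRAPH_END = re.compile(r"\n\s*\n")


def split_blocks(text, delimiter=SENTENCE_END):
    """Cut text into blocks, ending a block at each delimiter match."""
    blocks = []
    start = 0
    for m in delimiter.finditer(text):
        blocks.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        blocks.append(tail)
    return blocks


def blocks_with_all_terms(text, terms, delimiter=SENTENCE_END):
    """Return only the blocks that contain every search term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for block in split_blocks(text, delimiter):
        words = set(re.findall(r"\w+", block.lower()))
        if all(t in words for t in wanted):
            hits.append(block)
    return hits


if __name__ == "__main__":
    sample = ("The railway is old. The bridge is ten feet from the railway. "
              "My feet hurt.")
    print(blocks_with_all_terms(sample, ["feet", "railway"]))
```

Passing PARAGRAPH_END instead of SENTENCE_END widens the block to a
paragraph, which is roughly the effect of choosing a different
proximity setting on the search form.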

-Kai


Post by Thunderstone »



I did a search using proximity and entered two words in the search text box:
"feet and railway". I wanted the search to retrieve those two words only when
they occur in the same sentence, so I set the proximity to sentence.
When viewing the query hit list by -Match Info-, I find that it retrieved
and highlighted those two words everywhere in the document, including
the sentence that I wanted. How can I get only the sentence that I wanted
to appear highlighted, and not every matching word?



-----Original Message-----
From: Kai Getrost <kai@thunderstone.com>
To: ddesumma@sprint.ca <ddesumma@sprint.ca>
Date: Tuesday, December 15, 1998 11:47 AM
Subject: Re: webinator question





Post by Thunderstone »




Hit markup is controlled with the <fmtcp> statement in the Webinator
2.5 search script.
By default, the query given to <fmtcp> is automatically processed to help
ensure that all words are marked up. Otherwise, the second and later
hits in the document might be left unmarked, e.g. for within-document,
because they weren't part of the within-block.
To change this behavior so that terms are marked up only as they matched
in the query (including within-processing), add an `e' flag to the <fmtcp>
query. In the <A NAME=context> function in the search script, change:

<fmtcp query "%mhbpH">

to
<fmtcp query "%mhbpeH">

See http://www.thunderstone.com/vortexman/node78.html for details
on Metamorph hit markup, and node80.html for the <fmtcp> statement.
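
The difference between the two markup behaviors can be pictured with a
small Python sketch. This is only an illustration under assumptions: it
does not reproduce what <fmtcp> does internally, and the function names
mark_terms and mark_within_sentence, the <b> tags, and the sentence
pattern are invented for the example.

```python
import re


def mark_terms(text, terms, open_tag="<b>", close_tag="</b>"):
    """Mark every occurrence of any term, wherever it appears."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


def mark_within_sentence(text, terms, open_tag="<b>", close_tag="</b>"):
    """Mark terms only inside sentences that contain all of the terms."""
    wanted = [t.lower() for t in terms]
    # Split after ., ! or ? and keep the whitespace pieces, so that
    # joining the pieces gives back the original text.
    pieces = re.split(r"(?<=[.!?])(\s+)", text)
    out = []
    for piece in pieces:
        words = set(re.findall(r"\w+", piece.lower()))
        if wanted and all(t in words for t in wanted):
            out.append(mark_terms(piece, terms, open_tag, close_tag))
        else:
            out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    sample = "My feet hurt. The bridge is ten feet from the railway."
    print(mark_terms(sample, ["feet", "railway"]))
    print(mark_within_sentence(sample, ["feet", "railway"]))
```

The first call corresponds to the default behavior described above
(every matching word is marked), the second to marking only the words
that fall inside the matching within-block.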

-Kai

