Proximate Causes

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »




Thanks for such a great product.

I have a few questions about proximity:

I used the Webinator 2.5 interface to search Thunderstone for the phrases
'Meta The selected meta data from the page', and
'Depth The number of URLs traversed to reach the page'.

Each of these is on a separate line, separated by a carriage return/new line, at
http://www.thunderstone.com/gw25man/node15.html

Nevertheless, when I search for both phrases at the same time and append
'w/line' to the query, this page is returned even though the words
are not all on the same line. What gives? Is this because a ranked search
will not limit itself strictly to what is requested? How might the query
be successfully changed?

A few more Q's:
1) What exactly does Webinator use to demarcate lines (carriage return/new
lines?), sentences (number of characters?) & paragraphs (number of
characters?)?
2) Can these be manipulated by <apicp sdexp/edexp>?
3) Can the '-k' indexing option be used to allow the indexing of tag
elements within the body?

Thanks for any help & keep up the good work.






Post by Thunderstone »




Several issues:

1) Use the Options button to do within-line. By default, the
"w/" operator is disabled for speed and sanity, since it isn't needed
for within-page. Thus, your query with "w/line" instead of
Options-Proximity-line was ignoring the "w/line". (Look in
the HTML source of the search results and you'll see a warning
HTML comment to this effect.) This can be changed in a Vortex
script.

2) Those phrases may be on the same line in a browser, but
Webinator indexes HTML according to its own formatter, which
is geared towards text indexing, not display. Check the Match Info
link on a search hit to see how the two can differ.

3) Are you using the correct phrase syntax? Phrases are delimited
with double-quotes, not single-quotes, in Metamorph.


Q1: REX expressions. "line", "sent", "para", and "page", when used with
the within operator (w/), are shorthand for some pre-defined expressions
to match the appropriate structure. See the Webinator search script and
http://www.thunderstone.com/texisman/node169.html for these, and
http://www.thunderstone.com/vortexman/node92.html for more on REX expressions.
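
As a rough illustration of delimiter-based proximity, here is a minimal Python sketch. It is not Webinator's implementation: the `within` helper and the `LINE` and `PARA` patterns are hypothetical, and they use Python regex syntax rather than REX. The idea is only that a "within" check splits the indexed text on a delimiter expression and asks whether every term falls inside one resulting unit.

```python
import re

def within(text, phrases, delimiter):
    """Return True if every phrase occurs inside a single delimited unit."""
    units = re.split(delimiter, text)
    return any(
        all(p.lower() in unit.lower() for p in phrases)
        for unit in units
    )

# Hypothetical delimiter patterns (Python regex, NOT REX, and NOT the
# pre-defined expressions Webinator actually uses).
LINE = r"\r?\n"           # a unit ends at each newline
PARA = r"(?:\r?\n){2,}"   # a unit ends at one or more blank lines

# The two lines from the question, as they might appear in formatted text.
text = (
    "Meta The selected meta data from the page\n"
    "Depth The number of URLs traversed to reach the page\n"
)
phrases = ["selected meta data", "URLs traversed"]

print(within(text, phrases, LINE))  # False: the phrases sit on different lines
print(within(text, phrases, PARA))  # True: both fall inside one paragraph
```

Whether a real query matches depends on the formatted text Webinator stored and on the delimiter expressions in effect, so the Match Info page remains the place to check.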


Q2: Yes. See http://www.thunderstone.com/vortexman/node85.html.


Q3: No. Since Webinator stores and indexes the formatted text (what you
see in the Match Info page), not the raw HTML, HTML tags are already stripped
in the database.
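
To see why tags cannot be matched once only formatted text is stored, consider this small Python sketch. It uses Python's standard `html.parser` as a stand-in; Webinator's own formatter is different, and the `TextOnly` class and `formatted_text` helper are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect character data only; tags are parsed and then discarded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def formatted_text(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

raw = "<p>Depth <b>The number of URLs</b> traversed</p>"
stored = formatted_text(raw)

print(stored)           # Depth The number of URLs traversed
print("<b>" in stored)  # False: the tag never reaches the stored text
```

Anything searched later runs against the equivalent of `stored`, so a query for a tag has nothing to match.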

-Kai


