Meaning of period in queries
Posted: Mon Jul 17, 2000 11:42 am
by Thunderstone
One of our users asked about the meaning of the "." character. He is
not using a regular expression.
Here's an example of his confusion:
This searches for "expression" in the Webinator documentation:
http://www.thunderstone.com/texis/webin ... expression
There are 122 results.
This searches for "expression pattern":
http://www.thunderstone.com/texis/webin ... on+pattern
There are 43 results.
But this searches for "expression .pattern":
http://www.thunderstone.com/texis/webin ... mit=Submit
All 122 results are returned. What's the meaning of the "."?
Where is this documented? I couldn't explain it to our users.
Thanks
Meaning of period in queries
Posted: Wed Jul 19, 2000 11:07 am
by Thunderstone
A period has no special meaning in a query (except at the very end
of a query, where it may be stripped). However, the default index
expression indexes only alphanumeric characters, so ".pattern" is
not a word in the index. Texis would therefore have to do a
post-process search for it. That is potentially slow, so Vortex does
not allow it by default (check the HTML source for commented errors
like "Query would require post-processing"). The unindexed ".pattern"
is dropped from the query, hence the same number of results as for
"expression" alone. Pattern matchers like REX or NPM have similar
constraints.
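To make the dropped term easier to see, here is a rough Python sketch of
the behavior described above. It is not Texis or Vortex code. The regex
only approximates an alphanumeric-only index expression, and the names
split_query and effective_query are made up for illustration.

```python
import re

# Assumption: stands in for an alphanumeric-only index expression.
# This is a plain Python regex, not REX syntax.
INDEX_EXPR = re.compile(r"[A-Za-z0-9]{2,99}")


def split_query(query):
    """Split query terms into indexed terms and terms needing post-processing."""
    indexed, unindexed = [], []
    for term in query.split():
        # A term can be resolved from the index only if the whole term
        # is something the index expression would have indexed.
        if INDEX_EXPR.fullmatch(term):
            indexed.append(term)
        else:
            unindexed.append(term)
    return indexed, unindexed


def effective_query(query, allow_postproc=False):
    """Return the terms actually searched (hypothetical helper)."""
    indexed, unindexed = split_query(query)
    # With post-processing off, unindexed terms are silently dropped.
    return indexed + unindexed if allow_postproc else indexed


print(effective_query("expression .pattern"))
print(effective_query("expression .pattern", allow_postproc=True))
```

With post-processing off, the sketch searches for "expression" only,
which is why the result count matches the one-word query.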
There are two ways to fix this. Either turn on post-processing
with <apicp alpostproc on>, which will slow queries and still cannot
report the total number of results accurately, or modify the
index expression to include words like ".pattern" with the -k option
to gw and then re-index:
gw -unindex
gw -k"\alnum{2,99}" -k"[\alnum\punct]+>>\alnum{1,99}" -index
-Kai