Potential Search Malfunction

Posted: Thu Sep 11, 2003 6:36 pm
by mmcfadden
I have a web site, for now at http://demo.mtc-inc.com. If I perform this search:

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced

I get 198 results.

If I then change to a search with any word forms: http://demo.mtc-inc.com/cgi-bin/texis.e ... &rlead=500

I get a "No documents match the query".

Is there any logical explanation for why I could ever get results in an Exact match search and none in an Any words search? I would normally expect to get more results in an Any words search than in an Exact match search.

I also have another question: why is it that sometimes I will do a search with two words and get fewer results than when I put those words in parentheses? I would normally expect to get fewer results with the same search in parentheses. I am just a little confused by what I am experiencing. Thanks

Posted: Fri Sep 12, 2003 1:07 pm
by Kai
The "Any word forms" turns on suffix processing. Currently, with a phrase query, the suffix processing can be too aggressive (due to how minwordlen is interpreted), and in this case causes the word `assistance' to be (incorrectly) removed from the search, causing it to fail.

We're working on a fix for this for a future release. In the meantime, the only workaround is to increase minwordlen in the script if the query is a phrase; in particular, increase it by the length of the phrase before the last term (eg. by the length of `financial ' in this case).

The following untested code might help. In the search script, in the <a name=fpar> function, around line 386, change this:

<case 2><apicp suffixproc 1>
<apicp minwordlen 5>
</switch>

to this:

<case 2><apicp suffixproc 1>
<apicp minwordlen 5>
<rex "\x22" $query>
<if "" neq $ret> <!-- it's a phrase -->
<rex "[^\x22]*\space+\F[^\space]+\x22\space*>>=" $query>
<strlen $ret> <!-- length of query before last term -->
<sum "%d" $ret 5><!-- add to original len 5 -->
<apicp minwordlen $ret>
</if>
</switch>

It's not foolproof, but should work for phrase queries with suffix processing on.
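
To make the arithmetic concrete, here is a rough Python sketch of the same calculation. It is not Vortex and not part of the search script; the function name `adjusted_minwordlen` is made up for illustration, and the example query assumes the phrase was "financial assistance", which the terms quoted above suggest.

```python
import re

def adjusted_minwordlen(query, base=5):
    """Return base plus the length of the phrase text before its last term."""
    # Match a quoted phrase at the end of the query. Group 1 is everything
    # between the opening quote and the last term, trailing space included.
    m = re.search(r'"([^"]*\s)[^\s"]+"\s*$', query)
    if m is None:
        return base  # not a multi-word phrase: leave minwordlen alone
    return base + len(m.group(1))

print(adjusted_minwordlen('"financial assistance"'))  # 15
print(adjusted_minwordlen('financial assistance'))    # 5
```

For the quoted phrase, the text before the last term is `financial ' (10 characters), so minwordlen goes from 5 to 15.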

As for your other question, all sets are required when searching, but when you use parens, you're creating a single set with two terms (eg. `(word1,word2)') instead of two sets with one term each (eg. `word1 word2'). All words in a set are ORed, so the paren search yields more results.
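
A toy Python model of that set logic may help. It is only a sketch of the rule as described (every set required, terms within a set ORed), not how Texis actually evaluates queries; the documents and the helper names `matches` and `hits` are invented for the example.

```python
def matches(doc_words, sets):
    """True if every set has at least one of its terms in the document."""
    return all(any(term in doc_words for term in s) for s in sets)

# Three made-up documents, each reduced to the set of words it contains.
docs = {
    "a": {"word1", "word2"},
    "b": {"word1"},
    "c": {"word2"},
}

two_sets = [{"word1"}, {"word2"}]   # query: word1 word2
one_set = [{"word1", "word2"}]      # query: (word1,word2)

def hits(sets):
    """Names of the documents that satisfy all sets, in sorted order."""
    return sorted(name for name, words in docs.items() if matches(words, sets))

print(hits(two_sets))  # ['a']
print(hits(one_set))   # ['a', 'b', 'c']
```

Only document "a" has both words, so the two-set query finds one document, while the single ORed set finds all three.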

Posted: Fri Sep 12, 2003 2:50 pm
by mmcfadden
Let me give an example from my web site for my second question. The first search is on "reasonable accommodation" in quotes, with 199 results, and the second is the same term without quotes, with 275 results. The following are the URLs for these two searches. My opinion is that if the system is working properly, you will always get fewer results for a phrase in quotes than for the same phrase without quotes. Right?

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced

Posted: Fri Sep 12, 2003 5:56 pm
by Kai
(I presume you mean the other way around, ie. the phrase search unexpectedly returns more results than the non-phrase search.)

What might be happening is that a paragraph delimiter is occurring between the two words in some documents. A paragraph delimiter in Webinator is essentially two newlines with some optional horizontal whitespace between them. For the phrase search, this is ignored, as any amount of unindexed chars can occur between two phrase words when there is no post-processing needed (as is the case for the phrase search, since it's single-set). But the non-phrase search has two sets and requires post-processing for the delimiters, so the paragraph delimiters are (post-process) searched for, and they might be excluding some documents where one occurs between the words.
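
As a rough illustration in Python, assuming the delimiter really is just "newline, optional spaces or tabs, newline" (the actual Webinator expression and post-processing are not shown in this thread, and `same_paragraph` is a made-up helper):

```python
import re

# Assumed model of a paragraph delimiter: newline, optional spaces/tabs, newline.
PARA_DELIM = re.compile(r"\n[ \t]*\n")

def same_paragraph(text, word1, word2):
    """True if some paragraph of text contains both words."""
    return any(word1 in p and word2 in p for p in PARA_DELIM.split(text))

# The two words are separated by a paragraph delimiter in the first text,
# and by a single newline in the second.
split_doc = "a reasonable\n \naccommodation was made"
joined_doc = "a reasonable\naccommodation was made"

print(same_paragraph(split_doc, "reasonable", "accommodation"))   # False
print(same_paragraph(joined_doc, "reasonable", "accommodation"))  # True
```

In this model the first text would be dropped by a two-set search restricted to paragraphs, even though the two words are adjacent apart from whitespace.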

Posted: Thu Jul 08, 2004 6:43 pm
by gazim
Kai, does the June 2004 release include the fix that you mentioned above related to suffix processing with a phrase query? Thanks!

Posted: Fri Jul 09, 2004 10:11 am
by Kai
Texis/Webinator version 5 builds after June 17 2004 include a change in behavior for minwordlen with phrases. The new <apicp> setting phrasewordproc controls this; details are in the Vortex manual. The default is now to process (and count towards minwordlen) only the last word in a phrase, so the disappearing-word issue above will not happen with a reasonable minwordlen (and it does not need to be changed for a phrase).