Potential Search Malfunction

mmcfadden
Posts: 158
Joined: Tue May 20, 2003 2:17 pm

Post by mmcfadden »

My web site is currently at http://demo.mtc-inc.com. If I perform this search:

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced

I get 198 results.

If I then change to a search with any word forms: http://demo.mtc-inc.com/cgi-bin/texis.e ... &rlead=500

I get "No documents match the query".

Is there any logical explanation for why I would get results from an exact-match search and none from an any-word-forms search? I would normally expect more results from an any-word-forms search than from an exact-match search.

Also, another question: why is it that I sometimes do a search with two words and get fewer results than when I put those words in parentheses? I would normally expect fewer results from the same search in parentheses. I am just a little confused by what I am experiencing. Thanks.
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

The "Any word forms" option turns on suffix processing. Currently, with a phrase query, the suffix processing can be too aggressive (due to how minwordlen is interpreted), and in this case it causes the word `assistance' to be (incorrectly) removed from the search, so the search fails.

We're working on a fix for a future release. In the meantime, the only workaround is to increase minwordlen in the script if the query is a phrase; in particular, increase it by the length of the phrase before the last term (e.g. by the length of `financial ' in this case).

The following untested code might help. In the search script, in the <a name=fpar> function, around line 386, change this:

<case 2><apicp suffixproc 1>
<apicp minwordlen 5>
</switch>

to this:

<case 2><apicp suffixproc 1>
<apicp minwordlen 5>
<rex "\x22" $query>
<if "" neq $ret> <!-- it's a phrase -->
<rex "[^\x22]*\space+\F[^\space]+\x22\space*>>=" $query>
<strlen $ret> <!-- length of query before last term -->
<sum "%d" $ret 5><!-- add to original len 5 -->
<apicp minwordlen $ret>
</if>
</switch>

It's not foolproof, but should work for phrase queries with suffix processing on.
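
To make the arithmetic of the workaround concrete, here is a minimal Python sketch of the same calculation. It is only a model of the idea, not Vortex code and not how Texis computes anything internally; the function name `adjusted_minwordlen` and the regular expression are assumptions written for this illustration.

```python
import re

def adjusted_minwordlen(query: str, base: int = 5) -> int:
    """Return base minwordlen, raised by the length of the phrase text
    that precedes the last term when the query contains a quoted phrase."""
    if '"' not in query:
        return base  # not a phrase: leave minwordlen alone
    # Capture the non-quote text ending in whitespace that comes just
    # before the last term and the closing quote.
    m = re.search(r'([^"]*\s+)[^\s"]+"\s*$', query)
    if m is None:
        return base  # single-word phrase: nothing precedes the last term
    return base + len(m.group(1))

# `financial ' is 10 characters, so the phrase query needs 5 + 10 = 15.
print(adjusted_minwordlen('"financial assistance"'))
```

With the query from this thread, the text before the last term is `financial ' (10 characters including the space), so minwordlen goes from 5 to 15.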

As for your other question: all sets are required when searching, but when you use parens, you're creating a single set with two terms (e.g. `(word1,word2)') instead of two sets with one term each (e.g. `word1 word2'). All words in a set are ORed, so the paren search yields more results.
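
The set logic can be modeled in a few lines of Python. This is a toy illustration only, not Texis code: the `matches` helper and the four sample documents are invented for the example, and each document is reduced to a set of words.

```python
def matches(doc_words, sets):
    """A document matches when every set is satisfied (sets are required),
    and a set is satisfied by any one of its terms (terms are ORed)."""
    return all(any(term in doc_words for term in s) for s in sets)

# Hypothetical documents, each reduced to the words it contains.
docs = [
    {"word1", "word2"},
    {"word1"},
    {"word2"},
    {"other"},
]

two_sets = [{"word1"}, {"word2"}]   # query: word1 word2
one_set = [{"word1", "word2"}]      # query: (word1,word2)

two_sets_hits = sum(matches(d, two_sets) for d in docs)
one_set_hits = sum(matches(d, one_set) for d in docs)
print(two_sets_hits, one_set_hits)
```

Only the first document satisfies both sets, while three documents satisfy the single ORed set, which is why the paren form can never return fewer hits in this model.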
mmcfadden
Posts: 158
Joined: Tue May 20, 2003 2:17 pm

Post by mmcfadden »

Let me give an example of my second question from my web site. The first search is on "reasonable accommodation" in quotes, with 199 results, and the second is the same term without quotes, with 275 results. The following are the URLs for these two searches. My opinion is that if the system is working properly, you will always get fewer results for a phrase in quotes than for the same phrase without quotes. Right?

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced

http://demo.mtc-inc.com/cgi-bin/texis.e ... d=Advanced
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

(I presume you mean the other way around, i.e. the phrase search unexpectedly returns more results than the non-phrase search.)

What might be happening is that a paragraph delimiter occurs between the two words in some documents. A paragraph delimiter in Webinator is essentially two newlines with optional horizontal whitespace between them. The phrase search ignores it: any amount of unindexed characters can occur between two phrase words when no post-processing is needed, which is the case here since the phrase is a single set. The non-phrase search has two sets and requires post-processing for the delimiters, so the paragraph delimiters are searched for in post-processing, and a document may be excluded if a delimiter falls between the two words.
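
A rough Python model of that difference, for illustration only. The delimiter pattern below is an approximation of "two newlines with optional horizontal whitespace between them", not the actual expression Webinator uses, and both helper functions are hypothetical names made up for this sketch.

```python
import re

# Approximation of the paragraph delimiter described above (assumption).
PARA_DELIM = re.compile(r"\n[ \t]*\n")

def phrase_hit(text, word1, word2):
    """Phrase-style match: the words are adjacent, with any whitespace
    (including a paragraph break) allowed between them."""
    pattern = re.escape(word1) + r"\s+" + re.escape(word2)
    return re.search(pattern, text) is not None

def same_paragraph_hit(text, word1, word2):
    """Two-set-style match: both words must fall inside one paragraph."""
    for para in PARA_DELIM.split(text):
        words = para.split()
        if word1 in words and word2 in words:
            return True
    return False

doc = "a reasonable\n \naccommodation b"
print(phrase_hit(doc, "reasonable", "accommodation"))
print(same_paragraph_hit(doc, "reasonable", "accommodation"))
```

In this model the sample document counts as a hit for the phrase search but not for the two-word search, because the paragraph break separates the words.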
gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

Kai, does the June 2004 release include the fix that you mentioned above related to suffix processing with a phrase query? Thanks!
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Texis/Webinator version 5 builds after June 17 2004 include a change in behavior for minwordlen with phrases. The new <apicp> setting phrasewordproc controls this; details are in the Vortex manual. The default is now to process (and count towards minwordlen) only the last word in a phrase, so the disappearing-word issue above will not happen with a reasonable minwordlen (and it does not need to be changed for a phrase).