Suffix processing and hyphen searches

Posted: Mon Nov 22, 2004 4:14 pm
by Zeus
Hi,
The following search with suffixproc='Off' returns a hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='Off';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

But the following, with suffixproc='On', does not return the hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='On';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

Also, taking the hyphen out of langc brings the hit back.

Any idea why?

Thanks!!

Posted: Mon Nov 22, 2004 4:15 pm
by Zeus
Oh, BTW,
we are looking for jerry-smith as one word; scooby is just a non-matching term.

Posted: Mon Nov 22, 2004 5:26 pm
by mark
I can't reproduce that. What's in the text: "jerry smith" or some variation? What version of Texis?

I do get more hits when removing - from langc, though, because that allows substring matching of search terms containing -.

Posted: Mon Nov 22, 2004 5:31 pm
by Zeus
The actual text is exactly jerry-smith, not a variation of it.

The tsql version is:
Texis Version 04.04.1067366033(20031028) Copyright (c) 1988-2003 Thunderstone EPI

Yes, taking the hyphen out of langc returns the hit, but with the hyphen in langc, no hit is returned.

Also, our index expressions are:
<apicp keepnoise on>
<sql "set delexp=0"></sql>
<sql "set addexp='\punct{1,5}'"></sql>
<sql "set addexp='\alnum{1,99}'"></sql>
<sql "set addexp='>>\alpha{1,50},=\alpha{1,50}'"></sql>

Posted: Tue Nov 23, 2004 2:45 pm
by John
The problem is that you have punctuation indexed as a separate word, so jerry-smith looks like three words in the index (jerry, -, smith) instead of the single term you are looking for. Instead of \punct{1,5} you probably want [\alnum\punct]{1,99} or [\alnum\-]{1,99}.
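To make this concrete, here is a small Python model (Python only because REX can't be run outside Texis). It treats each index expression as a regular expression and collects its matches as index terms. The ASCII translations of \alnum and \punct and the `index_terms` helper are assumptions for illustration, not Texis's actual indexer, and only the two expressions relevant to jerry-smith are modeled.

```python
import re
import string

# Rough ASCII stand-ins for REX's \alnum and \punct classes (an assumption;
# Texis's own classes may differ, e.g. by locale).
ALNUM = "A-Za-z0-9"
PUNCT = re.escape(string.punctuation)


def index_terms(text, patterns):
    """Collect every left-to-right, non-overlapping match of each pattern.

    A simplified, hypothetical model of how index expressions carve text
    into terms, not Texis's real indexer.
    """
    terms = []
    for pattern in patterns:
        terms.extend(m.group(0) for m in re.finditer(pattern, text))
    return terms


# Zeus's current expressions: \punct{1,5} and \alnum{1,99}
current = [rf"[{PUNCT}]{{1,5}}", rf"[{ALNUM}]{{1,99}}"]
# One of the suggested alternatives: [\alnum\-]{1,99}
suggested = [rf"[{ALNUM}\-]{{1,99}}"]

print(index_terms("jerry-smith", current))    # ['-', 'jerry', 'smith']
print(index_terms("jerry-smith", suggested))  # ['jerry-smith']
```

Under this model, the current expressions never produce jerry-smith as a single term, while the suggested expression indexes it whole.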

Posted: Tue Nov 23, 2004 6:39 pm
by Zeus
What if we take the hyphen out of langc? That seems to fix it too.
Or, to summarize: what is the sure way to search for literal hyphens, both between terms and at the end of a term? For example,
searching for

jerry-smith

AN-

thanks!!

Posted: Wed Nov 24, 2004 9:45 am
by John
If you take the hyphen out of langc, then a hyphen in a query term disables any language processing, and the term is treated as a literal substring search.

Changing the index expression to include the hyphen would improve the performance of the search with either langc setting, since the term could then be found directly in the index.
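The langc rule described above can be sketched as a tiny classifier. This is a simplified model under stated assumptions: a query term made only of langc characters is treated as a language term (and so, presumably, subject to settings like suffixproc), and anything else is searched as a literal substring. The regex translation of langc and the `query_mode` helper are hypothetical, not Texis code.

```python
import re

# langc='[\alnum\X27 \-\.]' from the original query: alphanumerics, apostrophe
# (hex 27), space, hyphen and period. ASCII-only translation (assumption).
LANGC_WITH_HYPHEN = re.compile(r"[A-Za-z0-9' \-.]+")
# The same character set with the hyphen taken out.
LANGC_NO_HYPHEN = re.compile(r"[A-Za-z0-9' .]+")


def query_mode(term, langc):
    """Hypothetical model: a term consisting only of langc characters is a
    language term; anything else is searched as a literal substring."""
    return "language" if langc.fullmatch(term) else "literal substring"


for name, langc in [("with -", LANGC_WITH_HYPHEN), ("without -", LANGC_NO_HYPHEN)]:
    for term in ["jerry-smith", "AN-", "scooby"]:
        print(f"{name:10} {term:12} -> {query_mode(term, langc)}")
```

In this model, dropping the hyphen from langc flips jerry-smith and AN- from language terms to literal substring searches, which matches the behavior Zeus observed.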

Posted: Mon Nov 29, 2004 4:34 pm
by Zeus
We did go down the path of having [\alnum\punct]{1,99} in our index expression, but then we ran into a different problem.

If we had data like
zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com

the above index expression prevents searching for

FIELD1 like 'achilles@mycompany.com'

That is why we used just \punct{1,5}, which fixed this problem.

It looks like if you have one, you cannot have the other:

Punctuation alone (\punct{1,5}) solves the above problem, but breaks the jerry-smith search.
Alphanumerics plus punctuation ([\alnum\punct]{1,99}) reintroduces the above problem, but solves the jerry-smith search.
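The trade-off shows up in a simplified Python model that treats each index expression as a regular expression and collects its matches as terms. With [\alnum\punct]{1,99}, the `;` separators are punctuation too, so the whole field collapses into one run and no index term equals achilles@mycompany.com. The ASCII class translations and the `index_terms` helper are illustrative assumptions, not Texis code.

```python
import re
import string

# Rough ASCII stand-ins for REX's \alnum and \punct (assumption).
ALNUM = "A-Za-z0-9"
PUNCT = re.escape(string.punctuation)


def index_terms(text, patterns):
    """Left-to-right, non-overlapping matches of each pattern (hypothetical model)."""
    terms = []
    for pattern in patterns:
        terms.extend(m.group(0) for m in re.finditer(pattern, text))
    return terms


data = "zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com"

# [\alnum\punct]{1,99}: ';' is punctuation too, so the whole field is one run
wide = [rf"[{ALNUM}{PUNCT}]{{1,99}}"]
# \punct{1,5} plus \alnum{1,99}: the field is chopped at every punctuation mark
narrow = [rf"[{PUNCT}]{{1,5}}", rf"[{ALNUM}]{{1,99}}"]

print(index_terms(data, wide))
# ['zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com']
print(index_terms(data, narrow))
```

The narrow expressions break the field into small pieces (achilles, @, mycompany, ., com and so on), which is the setup Zeus reports made the like search work; the same splitting is what breaks jerry-smith apart.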

Posted: Mon Nov 29, 2004 5:08 pm
by John
You may need to be more selective about which punctuation you include in the index expressions, and which you consider part of a term you want to search for. You can have several index expressions, e.g.

\alnum{1,99}
\alnum=[\-\alnum]{1,99}
\alnum=[\-\.\alnum]{1,99}
\alnum=[\-\.@\alnum]{1,99}

That should allow you to search for "Jerry", "Jerry-Smith", "mycompany.com", or "zeus@mycompany.com".
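A rough check of these four expressions, again as a Python model rather than Texis itself. It assumes REX's `=` operator means "exactly one occurrence", translates \alnum as ASCII letters and digits, and uses a hypothetical `index_terms` helper that collects left-to-right matches of each expression. It also tries Zeus's AN- example.

```python
import re

ALNUM = "A-Za-z0-9"  # rough ASCII stand-in for REX \alnum (assumption)


def index_terms(text, patterns):
    """Left-to-right, non-overlapping matches of each pattern (hypothetical model)."""
    terms = []
    for pattern in patterns:
        terms.extend(m.group(0) for m in re.finditer(pattern, text))
    return terms


# Python translations of the four expressions, reading REX "x=" as "exactly one x".
expressions = [
    rf"[{ALNUM}]{{1,99}}",               # \alnum{1,99}
    rf"[{ALNUM}][\-{ALNUM}]{{1,99}}",    # \alnum=[\-\alnum]{1,99}
    rf"[{ALNUM}][\-.{ALNUM}]{{1,99}}",   # \alnum=[\-\.\alnum]{1,99}
    rf"[{ALNUM}][\-.@{ALNUM}]{{1,99}}",  # \alnum=[\-\.@\alnum]{1,99}
]

text = "Jerry-Smith AN- zeus@mycompany.com;achilles@mycompany.com"
terms = set(index_terms(text, expressions))

for wanted in ["Jerry", "Jerry-Smith", "AN-", "mycompany.com", "zeus@mycompany.com"]:
    print(wanted, wanted in terms)  # each line ends in True
```

Each expression contributes terms of a different granularity, so the index ends up holding both the pieces and the longer hyphenated or dotted forms, while ';' still separates the addresses.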

Posted: Mon Nov 29, 2004 5:12 pm
by mark
Zeus, out of curiosity, why index punctuation by itself at all?