Suffix processing and hyphen searches

Zeus
Posts: 31
Joined: Thu Jul 29, 2004 5:12 pm

Post by Zeus »

Hi,
The following search with suffixproc='Off' returns a hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='Off';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

But the same search with suffixproc='On' does not return the hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='On';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

Also, taking the hyphen out of langc brings the hit back.

Any idea why?

Thanks!!

Post by Zeus »

Oh, BTW,
we are looking for jerry-smith as one word. scooby is just a non-matching term.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I can't replicate that. What's in the text: "jerry smith" or some variation? What version of Texis are you running?

I do get more hits when removing - from langc, though, because that allows substring matching of search terms containing -.

Post by Zeus »

The actual text is exactly
jerry-smith
not a variation of it.

The version of tsql is
Texis Version 04.04.1067366033(20031028) Copyright (c) 1988-2003 Thunderstone EPI

Yes, taking the hyphen out of langc returns the hit. But with the hyphen in langc, no hit is returned.

Also, our index expressions are:
<apicp keepnoise on>
<sql "set delexp=0"></sql>
<sql "set addexp='\punct{1,5}'"></sql>
<sql "set addexp='\alnum{1,99}'"></sql>
<sql "set addexp='>>\alpha{1,50},=\alpha{1,50}'"></sql>

Post by Zeus »

What if we take the hyphen out of langc? That seems to fix it as well.
Or, to summarize: what is the sure way to search for literal hyphens between terms and at the end of terms? For example,
searching for

jerry-smith

AN-

Thanks!!

Post by Zeus »

We did go down the path of having [\alnum\punct]{1,99} in our index expressions, but that caused a different problem.

If we had data like
zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com

that index expression prevents searching for

FIELD1 like 'achilles@mycompany.com'

That is why we went with just \punct{1,5}, which fixed this problem.

It looks like if you have one, you cannot have the other:

Punctuation alone (\punct{1,5}) fixes the email search above, but breaks the jerry-smith search.
Alnum and punctuation together ([\alnum\punct]{1,99}) breaks the email search above, but fixes the jerry-smith search.
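
To make the trade-off concrete, here is a rough sketch in Python (not Texis code) of which terms each set of expressions would pull out of the two sample strings. It assumes that Python character classes are a fair ASCII-only stand-in for \alnum and \punct, and the helper name tokens is made up for illustration. It only models which substrings get indexed; how a query is then resolved against them depends on langc, wordc and suffix processing, which are not modeled here.

```python
import re
import string

# Assumption: ASCII-only stand-ins for the \alnum and \punct classes.
ALNUM = "A-Za-z0-9"
PUNCT = re.escape(string.punctuation)

def tokens(text, patterns):
    """Hypothetical helper: every substring matched by any pattern."""
    found = set()
    for pattern in patterns:
        found.update(re.findall(pattern, text))
    return found

# \punct{1,5} and \alnum{1,99} as two separate expressions.
separate = [rf"[{PUNCT}]{{1,5}}", rf"[{ALNUM}]{{1,99}}"]
# [\alnum\punct]{1,99} as one combined expression.
combined = [rf"[{ALNUM}{PUNCT}]{{1,99}}"]

emails = "zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com"

print(sorted(tokens("jerry-smith", separate)))  # ['-', 'jerry', 'smith']
print(sorted(tokens("jerry-smith", combined)))  # ['jerry-smith']
print(sorted(tokens(emails, separate)))         # names, domain parts and punctuation as separate terms
print(tokens(emails, combined) == {emails})     # True: the whole list is one term
```

In this model the separate expressions never produce jerry-smith as a single term, while the combined expression swallows the whole address list as one term, so achilles@mycompany.com is not a term on its own.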

Post by mark »

Zeus, out of curiosity, why index punctuation by itself at all?

Post by Zeus »


Post by mark »

If there's special embedded punctuation you want to search for, include just that in your index expressions, as John suggested above. Otherwise, don't index punctuation at all.
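
As an illustration of "include just that", here is a small Python sketch (not Texis syntax) of a single term pattern that admits alphanumerics plus the hyphen and nothing else. The pattern and the names ALNUM_HYPHEN, samples and indexed are hypothetical, and Python's re is only an approximation of how an index expression would behave.

```python
import re

# Hypothetical term pattern: alphanumerics plus the hyphen only.
ALNUM_HYPHEN = r"[A-Za-z0-9\-]{1,99}"

samples = [
    "jerry-smith",
    "AN-",
    "zeus@mycompany.com;achilles@mycompany.com",
]

# Terms this pattern would extract from each sample string.
indexed = {s: re.findall(ALNUM_HYPHEN, s) for s in samples}

for text, terms in indexed.items():
    print(text, "->", terms)
# jerry-smith -> ['jerry-smith']
# AN- -> ['AN-']
# zeus@mycompany.com;achilles@mycompany.com -> ['zeus', 'mycompany', 'com', 'achilles', 'mycompany', 'com']
```

Under this assumption the hyphenated terms stay whole, and the address list still breaks into per-word terms instead of one long term.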