Suffix processing and hyphen searches

Zeus
Posts: 31
Joined: Thu Jul 29, 2004 5:12 pm

Post by Zeus »

Hi,
The following search with suffixproc='Off' returns a hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='Off';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

But the following, with suffixproc='On', does not return the hit.

"set alpostproc='on';set hyphenphrase='off';set allinear='on';set wordc='[\alnum\X27]';set langc='[\alnum\X27 \-\.]';set defsuffrm=1;set minwordlen=5;set suffixproc='On';select DOCID,ADDRESSEE from tblnewx where ADDRESSEE like '(jerry-smith,scooby)'"

Also, taking the hyphen out of langc brings the hit back.

Any idea why?

Thanks!!

Post by Zeus »

Oh, BTW, we are looking for jerry-smith as one word; scooby is just a non-matching term.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I can't replicate that. What's in the text: "jerry smith", or some variation? What version of Texis?

I do get more hits when removing - from langc, though, because that allows substring matching of search terms containing -.

Post by Zeus »

The actual text is exactly jerry-smith, not a variation of it.

The version of tsql is:
Texis Version 04.04.1067366033(20031028) Copyright (c) 1988-2003 Thunderstone EPI

Yes, taking the hyphen out of langc returns the hit, but with the hyphen in langc no hit is returned.

Also, our index expressions are:
<apicp keepnoise on>
<sql "set delexp=0"></sql>
<sql "set addexp='\punct{1,5}'"></sql>
<sql "set addexp='\alnum{1,99}'"></sql>
<sql "set addexp='>>\alpha{1,50},=\alpha{1,50}'"></sql>
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The problem is that you have punctuation indexed as a separate word, so jerry-smith looks like three words in the index (jerry, -, smith) instead of the single term you are looking for. Instead of \punct{1,5} you probably want [\alnum\punct]{1,99} or [\alnum\-]{1,99}.
John Turnbull
Thunderstone Software
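To make John's point concrete, here is a rough Python sketch that approximates the two index-expression setups with Python regular expressions. It is a model under stated assumptions, not Texis itself: `\alnum` is approximated as ASCII letters and digits, `\punct` as Python's `string.punctuation`, and `index_terms` is a hypothetical helper that simply collects every match of each expression as an indexed word.

```python
import re
import string

# ASSUMPTION: rough Python stand-ins for the REX classes \alnum and \punct.
ALNUM = "A-Za-z0-9"
PUNCT = re.escape(string.punctuation)


def index_terms(text, patterns):
    """Hypothetical model of indexing: collect every non-overlapping match
    of each pattern, pattern by pattern, as an indexed word."""
    terms = []
    for pattern in patterns:
        terms.extend(m.group(0) for m in re.finditer(pattern, text))
    return terms


# Approximates \punct{1,5} plus \alnum{1,99}: the hyphen becomes its own word.
separate = [f"[{PUNCT}]{{1,5}}", f"[{ALNUM}]{{1,99}}"]

# Approximates [\alnum\-]{1,99}: the hyphenated name stays a single word.
joined = [f"[{ALNUM}\\-]{{1,99}}"]

print(index_terms("jerry-smith", separate))  # ['-', 'jerry', 'smith']
print(index_terms("jerry-smith", joined))    # ['jerry-smith']
```

In this model a search for the single word jerry-smith can only be answered directly from the index in the second setup.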

Post by Zeus »

What if we take the hyphen out of langc? That seems to fix it too.
Or, to summarize: what is the reliable way to search for literal hyphens between terms and at the end of terms? For example, searching for

jerry-smith

AN-

Thanks!!

Post by John »

If you take the hyphen out of langc then the presence of the hyphen disables any language processing, and it is treated as a literal substring search.

Changing the index expression to include the hyphen would improve the performance of the search with either langc setting as it could find the term directly in the index.
John Turnbull
Thunderstone Software
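John's rule about langc can be modelled as a per-term check. The sketch below is a hypothetical Python model, not Texis code: `query_mode` is an invented helper, and langc is reduced to a plain set of characters built from the thread's `[\alnum\X27 \-\.]` (letters, digits, apostrophe, space, hyphen, period).

```python
import string

# ASSUMPTION: langc modelled as a set of characters; \alnum ~ ASCII letters/digits.
ALNUM_CHARS = set(string.ascii_letters + string.digits)
LANGC_WITH_HYPHEN = ALNUM_CHARS | set("' -.")    # like [\alnum\X27 \-\.]
LANGC_WITHOUT_HYPHEN = ALNUM_CHARS | set("' .")  # same set, hyphen removed


def query_mode(term, langc):
    """Hypothetical model: a term made only of langc characters gets language
    processing (e.g. suffix handling); any other character turns it into a
    literal substring search."""
    if all(ch in langc for ch in term):
        return "language"
    return "literal"


print(query_mode("jerry-smith", LANGC_WITH_HYPHEN))     # language
print(query_mode("jerry-smith", LANGC_WITHOUT_HYPHEN))  # literal
```

In the thread's terms, the "literal" branch is the substring search that brings the jerry-smith hit back when the hyphen is removed from langc.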

Post by Zeus »

We did go down the path of having [\alnum\punct]{1,99} in our index expression, but then we had another problem.

If we had the data
zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com

the index expression above prevented searching for

FIELD1 like 'achilles@mycompany.com'

That is why we used just \punct{1,5}, which fixed this problem.

It looks like if you have one, you cannot have the other:

Punctuation alone solves the problem above but breaks the jerry-smith search.
Alnum plus punctuation reintroduces the problem above but solves the jerry-smith search.
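The trade-off Zeus describes can be reproduced with a rough Python model (an approximation of REX using Python's `re`, not Texis itself; `index_terms` is a hypothetical helper, and `\alnum`/`\punct` are approximated by ASCII letters/digits and `string.punctuation`). With a single `[\alnum\punct]{1,99}` expression the semicolon is also punctuation, so the whole address list becomes one word and no individual address is an indexed term.

```python
import re
import string

# ASSUMPTION: rough Python stand-ins for the REX classes \alnum and \punct.
ALNUM = "A-Za-z0-9"
PUNCT = re.escape(string.punctuation)


def index_terms(text, patterns):
    """Hypothetical model: every match of every pattern is an indexed word."""
    terms = []
    for pattern in patterns:
        terms.extend(m.group(0) for m in re.finditer(pattern, text))
    return terms


data = "zeus@mycompany.com;achilles@mycompany.com;atlas@mycompany.com"

# Approximates [\alnum\punct]{1,99}: the 61-character list is one single word.
wide = index_terms(data, [f"[{ALNUM}{PUNCT}]{{1,99}}"])
print(len(wide), "achilles@mycompany.com" in wide)  # 1 False

# Approximates \punct{1,5} plus \alnum{1,99}: the address is split into pieces.
narrow = index_terms(data, [f"[{PUNCT}]{{1,5}}", f"[{ALNUM}]{{1,99}}"])
print("achilles" in narrow, "@" in narrow)  # True True
```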

Post by John »

You may need to be more selective about which punctuation you include in the index expression and which you treat as part of a searchable term. You can have several index expressions, e.g.

\alnum{1,99}
\alnum=[\-\alnum]{1,99}
\alnum=[\-\.\alnum]{1,99}
\alnum=[\-\.@\alnum]{1,99}

That should allow you to search for "Jerry", "Jerry-Smith", "mycompany.com" or "zeus@mycompany.com".
John Turnbull
Thunderstone Software
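John's multiple-expression suggestion can be checked with the same kind of rough model. The translation to Python is an assumption: REX's `=` is read as "exactly one occurrence", so `\alnum=[\-\alnum]{1,99}` becomes one letter or digit followed by 1 to 99 letters, digits or hyphens. `index_terms` is a hypothetical helper, and the resulting set stands in for the index.

```python
import re

# ASSUMPTION: \alnum ~ ASCII letters/digits; REX "=" read as "exactly once".
A = "A-Za-z0-9"
EXPRESSIONS = [
    f"[{A}]{{1,99}}",            # \alnum{1,99}
    f"[{A}][{A}\\-]{{1,99}}",    # \alnum=[\-\alnum]{1,99}
    f"[{A}][{A}\\-.]{{1,99}}",   # \alnum=[\-\.\alnum]{1,99}
    f"[{A}][{A}\\-.@]{{1,99}}",  # \alnum=[\-\.@\alnum]{1,99}
]


def index_terms(text, patterns):
    """Hypothetical model: the union of all matches of all expressions."""
    terms = set()
    for pattern in patterns:
        terms.update(m.group(0) for m in re.finditer(pattern, text))
    return terms


text = "jerry-smith zeus@mycompany.com;achilles@mycompany.com AN-"
terms = index_terms(text, EXPRESSIONS)

for wanted in ["jerry", "jerry-smith", "mycompany.com", "zeus@mycompany.com", "AN-"]:
    print(wanted, wanted in terms)  # each line ends in True
```

Under this model the trailing-hyphen case from earlier, AN-, is also kept whole by the second expression, while the semicolon still separates the addresses.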

Post by mark »

Zeus, out of curiosity why index punctuation by itself at all?