Eliminating Punctuation

Post by **kzinda** » Fri Nov 30, 2001 7:26 am

I had an application developed for me where I pass search terms to TEXIS for retrieval. I would like to eliminate hits separated by punctuation. Is the following syntax correct and what effect does including the punctuation removal have on the performance versus only specifying character proximity?

Submitted search string with 30 character proximity: word1 word2 /[^\,] {30}

Post by **bart** » Fri Nov 30, 2001 8:07 am

You've got the right idea. It's actually word1 word2 w/[^\,]{30} or word1 word2 w/[^\punct]{30}.

The performance cost of this is that each record containing the two words must be examined for the proximity. If this is a small number (i.e. <1000) it won't be noticeable, but if it's large (i.e. >10000) it could slow you down a little.

Post by **kzinda** » Wed Dec 05, 2001 8:42 am

Will it eliminate hits where the punctuation touches word 1 on the left or word 2 on the right, or only punctuation between word 1 and word 2? I only want it to eliminate hits where the two words are separated by the punctuation.

Post by **mark** » Wed Dec 05, 2001 9:53 am

The above will ensure that there is no punctuation between word1 and word2.

Post by **joe103** » Thu Dec 06, 2001 12:41 pm

I created a test table with the following:

id body
------------+------------+
3c0e38512 house blend pjs dsjljas dlfkjasd fljsadfj
3c0e384d2 house blend
3c0e38582 skjdf lksjflkj asldkfjasldjf alskjdf house blend pjs dsjljas dlfkjasd fljsadfj
3c0e385e2 skjdf lksjflkj asldkfjasldjf alskjdf, house blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38622 skjdf lksjflkj asldkfjasldjf alskjdf, house blend pjs dsjljas dlfkjasd fljsadfj
3c0e38672 skjdf lksjflkj asldkfjasldjf alskjdf house blend, pjs dsjljas dlfkjasd fljsadfj
3c0e386c2 skjdf lksjflkj asldkfjasldjf alskjdf house, blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38702 skjdf lksjflkj asldkfjasldjf alskjdf, house, blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38732 skjdf lksjflkj asldkfjasldjf alskjdf, house, blend pjs dsjljas dlfkjasd fljsadfj
3c0e387e2 skjdf lksjflkj, asldkfjasldjf alskjdf house blend pjs dsjljas dlfkjasd fljsadfj
3c0e38852 skjdf lksjflkj, asldkfjasldjf alskjdf house blend pjs dsjljas, dlfkjasd fljsadfj
3c0e38892 skjdf lksjflkj, asldkfjasldjf alskjdf house, blend pjs dsjljas, dlfkjasd fljsadfj

My query and results:
$ tsql "select id, mminfo ('house blend w/[^,]{10}',body,0,1,0) from test"
Texis Version 03.01.996262822(20010727) Copyright (c) 1988-2001 Thunderstone EPI

id #TEMP1
------------+------------+
3c0e38512
3c0e384d2 house blend
3c0e38582
3c0e385e2 , house blend,
3c0e38622
3c0e38672 house blend,
3c0e386c2 house, blend,
3c0e38702 , house, blend,
3c0e38732
3c0e387e2
3c0e38852
3c0e38892

It looks like it is getting hits that it shouldn't and not getting some it should? What am I doing incorrectly here?

Post by **John** » Thu Dec 06, 2001 1:20 pm

What you actually need is:

tsql "select id, mminfo ('house blend W/[^,]{,10}',body,0,1,0) from test"

The capitalized 'W' indicates that you want to include the delimiter in what is searched, and you want 0 to 10 non-comma characters, not exactly 10. What it was doing was looking for 10 characters that were not ',' in a row as the delimiter, and where you had ", house," that doesn't match, so it kept looking further.
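The difference between "exactly 10" and "0 to 10" non-comma characters can be sketched outside Texis. The following is only an analogy written with Python's `re` module, not REX and not a model of how Texis evaluates `w/` or `W/` delimiters; the helper name `gap_pattern` and its parameters are made up for this illustration.

```python
import re

def gap_pattern(word1, word2, forbidden=",", min_gap=0, max_gap=10):
    """Hypothetical helper: word1, then min_gap..max_gap characters
    that are not in `forbidden`, then word2."""
    cls = "".join(re.escape(ch) for ch in forbidden)
    return re.compile(
        re.escape(word1) + f"[^{cls}]{{{min_gap},{max_gap}}}" + re.escape(word2)
    )

# 0 to 10 non-comma characters between the words (like {,10})
ranged = gap_pattern("house", "blend", max_gap=10)
# exactly 10 non-comma characters between the words (like {10})
exact = gap_pattern("house", "blend", min_gap=10, max_gap=10)

for text in ["house blend", "house dark blend", "house, blend"]:
    print(repr(text), bool(ranged.search(text)), bool(exact.search(text)))
```

In this sketch the ranged form accepts "house blend" and "house dark blend" and rejects "house, blend", while the exact form rejects all three because none of them has a run of precisely 10 characters between the two words.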

Post by **kzinda** » Fri Mar 15, 2002 8:43 am

If I correctly use the construct word1 word2 W/[^,]{,8} to eliminate hits where word1 and word2 are within 8 characters and separated by a comma, to also eliminate hits with both a comma and a period would I use

word1 word2 W/[^,.]{,8}? Does the punctuation order matter in [^,.] vs [^.,]?

Post by **mark** » Fri Mar 15, 2002 10:20 am

Yes. And the order of items in a rex character list [] does not matter.
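The same order-independence holds for character classes in most regex dialects. As a quick check by analogy only, here is a sketch using Python's `re` module rather than REX; the sample strings are invented for the illustration.

```python
import re

# Same excluded characters, listed in a different order.
comma_first = re.compile(r"word1[^,.]{0,8}word2")
period_first = re.compile(r"word1[^.,]{0,8}word2")

samples = ["word1 word2", "word1, word2", "word1. word2", "word1 and word2"]

# Pair up the outcome of each pattern for every sample.
results = [
    (bool(comma_first.search(s)), bool(period_first.search(s)))
    for s in samples
]

for s, (a, b) in zip(samples, results):
    print(repr(s), a, b)
```

Both patterns agree on every sample: the strings with a comma or a period between the words are rejected, and the others are accepted.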