Eliminating Punctuation

kzinda
Posts: 62
Joined: Fri Nov 30, 2001 6:18 am

Post by kzinda »

I had an application developed for me where I pass search terms to TEXIS for retrieval. I would like to eliminate hits where the search terms are separated by punctuation. Is the following syntax correct, and how does including the punctuation removal affect performance compared with specifying only character proximity?

Submitted search string with 30 character proximity: word1 word2 /[^\,] {30}
bart
Posts: 251
Joined: Wed Apr 26, 2000 12:42 am

Post by bart »

You've got the right idea. The syntax is actually word1 word2 w/[^\,]{30} or word1 word2 w/[^\punct]{30}.

The performance cost is that each record containing both words must be examined for proximity. If that number is small (e.g. under 1,000) it won't be noticeable, but if it's large (e.g. over 10,000) it could slow you down a little.
kzinda
Posts: 62
Joined: Fri Nov 30, 2001 6:18 am

Post by kzinda »

Will it eliminate hits where the punctuation touches word 1 on the left or word 2 on the right, or only punctuation between word 1 and word 2? I only want it to eliminate hits where the two words are separated by the punctuation.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The above will ensure that there is no punctuation between word1 and word2.
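
As a rough illustration of the intended outcome (not of how Texis evaluates the query), the following Python sketch accepts text only when the two words occur within a limited distance and nothing between them is a banned punctuation character. It uses Python's re module rather than REX, and the function name has_clean_hit and its parameters are made up for this example.

```python
import re


def has_clean_hit(text, word1, word2, max_gap=30, banned=","):
    """Return True if word1 and word2 occur, in either order, with at most
    max_gap characters between them and none of those characters in `banned`.

    Hypothetical helper: it mimics the intended filter with Python's `re`,
    not Texis/REX itself, and matches substrings rather than whole words.
    """
    # Negated class: any single character that is NOT banned punctuation.
    gap = "[^" + re.escape(banned) + "]{0," + str(max_gap) + "}"
    for first, second in ((word1, word2), (word2, word1)):
        if re.search(re.escape(first) + gap + re.escape(second), text):
            return True
    return False


print(has_clean_hit("alskjdf house blend pjs", "house", "blend"))    # True
print(has_clean_hit("alskjdf house, blend pjs", "house", "blend"))   # False
print(has_clean_hit("alskjdf, house blend, pjs", "house", "blend"))  # True
```

The last call shows the case asked about above: punctuation to the left of the first word or to the right of the second does not affect the result, only punctuation between them does.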
joe103
Posts: 1
Joined: Thu Dec 06, 2001 9:05 am

Post by joe103 »

I created a test table with the following:

id           body
------------+------------+
3c0e38512    house blend pjs dsjljas dlfkjasd fljsadfj
3c0e384d2    house blend
3c0e38582    skjdf lksjflkj asldkfjasldjf alskjdf house blend pjs dsjljas dlfkjasd fljsadfj
3c0e385e2    skjdf lksjflkj asldkfjasldjf alskjdf, house blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38622    skjdf lksjflkj asldkfjasldjf alskjdf, house blend pjs dsjljas dlfkjasd fljsadfj
3c0e38672    skjdf lksjflkj asldkfjasldjf alskjdf house blend, pjs dsjljas dlfkjasd fljsadfj
3c0e386c2    skjdf lksjflkj asldkfjasldjf alskjdf house, blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38702    skjdf lksjflkj asldkfjasldjf alskjdf, house, blend, pjs dsjljas dlfkjasd fljsadfj
3c0e38732    skjdf lksjflkj asldkfjasldjf alskjdf, house, blend pjs dsjljas dlfkjasd fljsadfj
3c0e387e2    skjdf lksjflkj, asldkfjasldjf alskjdf house blend pjs dsjljas dlfkjasd fljsadfj
3c0e38852    skjdf lksjflkj, asldkfjasldjf alskjdf house blend pjs dsjljas, dlfkjasd fljsadfj
3c0e38892    skjdf lksjflkj, asldkfjasldjf alskjdf house, blend pjs dsjljas, dlfkjasd fljsadfj

My query and results:
$ tsql "select id, mminfo ('house blend w/[^,]{10}',body,0,1,0) from test"
Texis Version 03.01.996262822(20010727) Copyright (c) 1988-2001 Thunderstone EPI

id           #TEMP1
------------+------------+
3c0e38512
3c0e384d2    house blend
3c0e38582
3c0e385e2    , house blend,
3c0e38622
3c0e38672    house blend,
3c0e386c2    house, blend,
3c0e38702    , house, blend,
3c0e38732
3c0e387e2
3c0e38852
3c0e38892

It looks like it is returning hits that it shouldn't and missing some that it should. What am I doing wrong here?
John
Site Admin
Posts: 2623
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

What you actually need is:

tsql "select id, mminfo ('house blend W/[^,]{,10}',body,0,1,0) from test"

The capitalized 'W' indicates that you want to include the delimiter in what is searched, and {,10} asks for 0 to 10 non-comma characters rather than exactly 10. Your original expression was looking for exactly 10 consecutive characters that were not ',' as the delimiter; where you had ", house," that doesn't match, so it kept looking further.
John Turnbull
Thunderstone Software
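
The difference between an exact count and a range can be illustrated with Python's re module. This is only a sketch of the quantifier behaviour, not of Texis delimiter handling: Python regular expressions are assumed here as a stand-in for REX, and the range is written {0,10} where the query above uses {,10}. The names EXACT, RANGED, and classify are invented for this example.

```python
import re

EXACT = re.compile(r"[^,]{10}")     # exactly 10 non-comma characters
RANGED = re.compile(r"[^,]{0,10}")  # anywhere from 0 to 10 non-comma characters


def classify(gap):
    """Report whether the whole string `gap` fits each pattern."""
    return {
        "exactly_10": EXACT.fullmatch(gap) is not None,
        "zero_to_10": RANGED.fullmatch(gap) is not None,
    }


print(classify(" "))           # {'exactly_10': False, 'zero_to_10': True}
print(classify("0123456789"))  # {'exactly_10': True, 'zero_to_10': True}
print(classify(", "))          # {'exactly_10': False, 'zero_to_10': False}
```

A single space, like the one between "house" and "blend", fits the ranged form but not the exact one, and any string containing a comma fits neither.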
kzinda
Posts: 62
Joined: Fri Nov 30, 2001 6:18 am

Post by kzinda »

If I am correctly using the construct word1 word2 W/[^,]{,8} to eliminate hits where word1 and word2 are within 8 characters and separated by a comma, then to also eliminate hits for both a comma and a period, would I use

word1 word2 W/[^,.]{,8}? Does the punctuation order matter in [^,.] vs [^.,]?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Yes. And the order of items in a rex character list [] does not matter.
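
The same property holds for ordinary punctuation in other regular-expression dialects. As a small check, assuming Python's re module as a stand-in for REX, the two negated lists below split a sample string identically; the variable names are made up for this example.

```python
import re

sample = "alskjdf, house. blend, pjs"

# Runs of characters that are neither a comma nor a period.
comma_first = re.findall(r"[^,.]+", sample)
period_first = re.findall(r"[^.,]+", sample)

print(comma_first)                  # ['alskjdf', ' house', ' blend', ' pjs']
print(comma_first == period_first)  # True
```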