Question on "not" searches

jkj2001
Posts: 142
Joined: Fri Mar 29, 2002 1:39 pm


Post by jkj2001

Hi,

We've got a question regarding NOT searching.

Our first search went like so:

select count(*) from mytable where field01 like '"yankee group"' and field01 not like 'News'

We got 535 hits.


We then broke things up into two searches:

(search 1)
select count(*) from mytable where field01 like '"yankee group"' (approx 1000 hits)

(search 2)
select count(*) from mytable where field01 like 'News'
(approx 75000 hits)

We then used an xtree to get all hits from search 1 not in search 2. We got 541 hits, or six more than the original search.
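The xtree step is a set difference: keep every row from search 1 that does not also appear in search 2. A minimal Python sketch of that idea, using a few hypothetical row IDs in place of the real result sets (the IDs are made up; the xtree itself is not modelled):

```python
# Hypothetical row IDs standing in for the two result sets; the real
# searches returned about 1000 and 75000 rows, which are not reproduced here.
search1 = {101, 102, 103, 104, 105}   # field01 like '"yankee group"'
search2 = {102, 104, 200, 201}        # field01 like 'News'

# Rows from search 1 that do not appear in search 2 (the xtree step).
not_in_search2 = search1 - search2
print(sorted(not_in_search2))  # [101, 103, 105]
```

This result should equal the single `like ... and not like` query only if both searches agree on what counts as the word "News".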

This was a little puzzling, so we looked more closely at those six hits. It turns out that none of them contains the word "news" by itself; instead they contain strings such as "news4@somewhere.com". We were wondering whether this discrepancy was due to our indexing, to the way NOT searches work, or to something else.

At a glance it seems (in our case at least) that NOT treats "news4" as a match for "News", while a plain search for "News" does not return it. Is this correct?

Our Texis version is 4.04.1067329099 on Solaris. Here's what we used to index the field in question:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index idxfield01 on mytable(field01);"
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH


Post by John

The NOT search does not use the index, so it relies on the langc and wordc settings to determine word boundaries. By default a word is a string of letters, so the digit 4 is not part of the word, and therefore news matches news4.

The index expression you have treats strings of letters and digits as words, so news4 is indexed as a single word and won't be found by a search for news.
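To illustrate the two word definitions described above, here is a small Python sketch. The regular expressions are assumptions standing in for the letters-only default and for the `\alnum{1,99}` index expression; they are not Texis syntax and ignore the second `addexp` expression entirely.

```python
import re

# Assumed stand-ins for the two word definitions (not Texis syntax):
LETTERS_ONLY = r"[A-Za-z]+"         # default unindexed NOT search: letters only
INDEX_ALNUM = r"[A-Za-z0-9]{1,99}"  # roughly the \alnum{1,99} index expression

def words(text, pattern):
    """Return the lowercased words that `pattern` finds in `text`."""
    return [w.lower() for w in re.findall(pattern, text)]

field = "news4@somewhere.com"
print(words(field, LETTERS_ONLY))  # ['news', 'somewhere', 'com']
print(words(field, INDEX_ALNUM))   # ['news4', 'somewhere', 'com']

# "news" is a word under the letters-only rule, so NOT LIKE 'News' drops the row,
# but it is not a word under the alnum rule, so an indexed LIKE 'News' misses it.
print("news" in words(field, LETTERS_ONLY))  # True
print("news" in words(field, INDEX_ALNUM))   # False
```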

You could change either the index expression or the langc and wordc settings so that the two are consistent.
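A rough Python simulation of why consistency fixes the count mismatch, under the same assumptions as before: the regexes are hypothetical stand-ins for the letters-only default and the `\alnum{1,99}` index expression, and the three rows are invented and assumed to already match "yankee group". The set-difference path uses the index word definition; the unindexed NOT path uses the scan definition.

```python
import re

# Assumed stand-ins for the two word definitions (not Texis syntax).
LETTERS_ONLY = r"[A-Za-z]+"
INDEX_ALNUM = r"[A-Za-z0-9]{1,99}"

# Hypothetical rows, all assumed to already match "yankee group".
rows = {
    1: "quarterly report",
    2: "News digest",
    3: "mail news4@somewhere.com",
}

def has_word(text, term, pattern):
    """True if `term` is one of the words `pattern` finds in `text`."""
    return term.lower() in {w.lower() for w in re.findall(pattern, text)}

def rows_without(term, pattern):
    """IDs of rows whose words, as defined by `pattern`, exclude `term`."""
    return {rid for rid, text in rows.items() if not has_word(text, term, pattern)}

def compare(index_pattern, scan_pattern):
    """Return (xtree set-difference result, unindexed NOT search result)."""
    return rows_without("news", index_pattern), rows_without("news", scan_pattern)

# Mismatched definitions, as in the thread: row 3 appears only in the first set.
print(compare(INDEX_ALNUM, LETTERS_ONLY))
# Consistent definitions: the two approaches agree.
print(compare(INDEX_ALNUM, INDEX_ALNUM))
print(compare(LETTERS_ONLY, LETTERS_ONLY))
```

With mismatched definitions the set-difference result has one extra row, mirroring the six extra hits in the thread. Which consistent choice is right depends on whether "news4" should count as "news" for your application.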
John Turnbull
Thunderstone Software