Noise words and multiple like searches

Post Reply
josmani
Posts: 53
Joined: Tue Jun 03, 2003 3:38 am

Noise words and multiple like searches

Post by josmani »

Assume I have a table with columns id, title, countrycode, and body.

When I run the following query:

select title from mytable where countrycode='ARE' and body likep $query

I want to keep noise on countrycode but not on the body. The index on countrycode was built with noise word on. In Vortex when I set keepnoise to on, it enables it on the whole search. Is there a way to only keep it on countrycode?

I am using the following index expressions on the body index.

set addexp='[\alnum\x39\x80-\xff]{2,99}';


Appreciate any help.
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Noise words and multiple like searches

Post by Kai »

Noise words and the `keepnoise' setting only apply to LIKE and its variants, not `='. So just set keepnoise off, and it'll apply to the LIKEP but not the "countrycode='ARE'" clause. (And make sure the index you built on countrycode was a regular index -- create index IndexName on mytable(countrycode) -- not Metamorph.)
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Noise words and multiple like searches

Post by Kai »

Also, if countrycode values are always, say, 3 characters or less, the search would be faster if you built a Metamorph compound index and did the LIKEP first:

set keepnoise='off';
create metamorph inverted index myIndex on mytable(body,countrycode);

select title from mytable where body likep $query and countrycode='ARE';

(This assumes countrycode is declared with a short length in the table, e.g. countrycode varchar(4). If it is declared with a large length, e.g. varchar(1024), the compound index declared above may grow too large to be useful.)

With this compound index, both the LIKEP and the `=' can be resolved with one index.
User avatar
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Noise words and multiple like searches

Post by John »

A index for an '=' statement should not be a metamorph index, but a regular index, in which case metamorph setting such as keepnoise don't apply.
John Turnbull
Thunderstone Software
josmani
Posts: 53
Joined: Tue Jun 03, 2003 3:38 am

Noise words and multiple like searches

Post by josmani »

Sorry for the confusion. The where clause on the select statement should be "countrycode like 'ARE'" as there is metamorph index on it since the content of the field could be more than one value like "ARE,AFG,CAN,USA".
josmani
Posts: 53
Joined: Tue Jun 03, 2003 3:38 am

Noise words and multiple like searches

Post by josmani »

In the above case we have two metamorphe indexes to search. Can we use the keepnoise on one of them?
User avatar
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Noise words and multiple like searches

Post by mark »

Currently the keepnoise setting applies to all likes in a given sql query. Not sure of a workaround at the moment.
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Noise words and multiple like searches

Post by Kai »

The effective noiselist is the union of the noiselist defined at index creation time and the noiselist defined (via <apicp>) at query time: the former are stripped from the index, the latter are stripped from the query.

So you could define keepnoise=1 and make the first Metamorph index (on countrycode), then set keepnoise=0 and make the second Metamorph index (on body), then set keepnoise=1 at query time for "... where countrycode like 'ARE' and body like 'whatever'".

That way, for the first clause, keepnoise=1 for both index and query, so noise words like 'ARE' are searched for; but for the second clause, there is a noise list in the index (even if not in the query), so the noiselist (defined at index create time) still applies to the second clause (but not the first).

Note that you might need to also set maxlinearrows=0 at query time, to force index usage for the second clause as well: otherwise, if the first clause result set is small, Texis may go linear for speed on the second clause, and since query keepnoise=1 and there is no index being used on that second clause, noise words would then *not* be ignored in the second clause.
josmani
Posts: 53
Joined: Tue Jun 03, 2003 3:38 am

Noise words and multiple like searches

Post by josmani »

Many thanks, I'll give it a try and let you know.
Post Reply