Page 1 of 1

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 9:48 am
by josmani
Assume I have a table with columns id, title, countrycode, and body.

When I run the following query:

select title from mytable where countrycode='ARE' and body likep $query

I want to keep noise on countrycode but not on the body. The index on countrycode was built with noise word on. In Vortex when I set keepnoise to on, it enables it on the whole search. Is there a way to only keep it on countrycode?

I am using the following index expressions on the body index.

set addexp='[\alnum\x39\x80-\xff]{2,99}';


Appreciate any help.

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 10:51 am
by Kai
Noise words and the `keepnoise' setting only apply to LIKE and its variants, not `='. So just set keepnoise off, and it'll apply to the LIKEP but not the "countrycode='ARE'" clause. (And make sure the index you built on countrycode was a regular index -- create index IndexName on mytable(countrycode) -- not Metamorph.)

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 10:55 am
by Kai
Also, if countrycode values are always, say, 3 characters or less, the search would be faster if you built a Metamorph compound index and did the LIKEP first:

set keepnoise='off';
create metamorph inverted index myIndex on mytable(body,countrycode);

select title from mytable where body likep $query and countrycode='ARE';

(This assumes countrycode is declared with a short length in the table, e.g. countrycode varchar(4). If it is declared with a large length, e.g. varchar(1024), the compound index declared above may grow too large to be useful.)

With this compound index, both the LIKEP and the `=' can be resolved with one index.

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 10:57 am
by John
A index for an '=' statement should not be a metamorph index, but a regular index, in which case metamorph setting such as keepnoise don't apply.

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 11:09 am
by josmani
Sorry for the confusion. The where clause on the select statement should be "countrycode like 'ARE'" as there is metamorph index on it since the content of the field could be more than one value like "ARE,AFG,CAN,USA".

Noise words and multiple like searches

Posted: Mon Mar 18, 2013 1:13 pm
by josmani
In the above case we have two metamorphe indexes to search. Can we use the keepnoise on one of them?

Noise words and multiple like searches

Posted: Tue Mar 19, 2013 11:49 am
by mark
Currently the keepnoise setting applies to all likes in a given sql query. Not sure of a workaround at the moment.

Noise words and multiple like searches

Posted: Tue Mar 19, 2013 12:56 pm
by Kai
The effective noiselist is the union of the noiselist defined at index creation time and the noiselist defined (via <apicp>) at query time: the former are stripped from the index, the latter are stripped from the query.

So you could define keepnoise=1 and make the first Metamorph index (on countrycode), then set keepnoise=0 and make the second Metamorph index (on body), then set keepnoise=1 at query time for "... where countrycode like 'ARE' and body like 'whatever'".

That way, for the first clause, keepnoise=1 for both index and query, so noise words like 'ARE' are searched for; but for the second clause, there is a noise list in the index (even if not in the query), so the noiselist (defined at index create time) still applies to the second clause (but not the first).

Note that you might need to also set maxlinearrows=0 at query time, to force index usage for the second clause as well: otherwise, if the first clause result set is small, Texis may go linear for speed on the second clause, and since query keepnoise=1 and there is no index being used on that second clause, noise words would then *not* be ignored in the second clause.

Noise words and multiple like searches

Posted: Tue Mar 19, 2013 1:11 pm
by josmani
Many thanks, I'll give it a try and let you know.