Page 1 of 1

Indexing and search speed question

Posted: Tue Jan 07, 2003 5:11 pm
by jkj2001
Hi there,

There's a search we've been running which is much slower than any others we've run, even though they all *appear* similar.

This search runs pretty quickly for us at the command prompt (5-10 seconds):

tsql "select count(*) from mytable where TEXT1 like '\"Vinson & Ellis\"'"

It returns a count of 3.


This search comes back in under a minute:

tsql "select count(*) from mytable where TEXT1 like 'Vinson Elkins'"

We get a count of 28,000.


However, this next search takes 20-30 minutes to run. Notice that the search term is slightly different from the one at top ("Elkins" and not "Ellis"):

tsql "select count(*) from mytable where TEXT1 like '\"Vinson & Elkins\"'"

It returns a count of 26,000.

I can understand a count of 26,000 taking longer to return than a count of 3, but that much difference in time, when they search over the same field and use the same index?

The TEXT1 field I refer to above is a blob, probably 3 GB in size. Good sized.

Here are the params we use to create the inverted metamorph index on the field:

set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>>\a
lpha{1,50},=\alpha{1,50}'

Our version of texis is 4.00.1019027849, for Solaris.


If you can help me get a clearer idea on why the third search listed above is so much slower than the first two, I'd be grateful. Thanks!

Indexing and search speed question

Posted: Tue Jan 07, 2003 5:52 pm
by John
Since the "&" is not indexed, and exactphrase is on by default in tsql, it needs to look at each of the potential (26,000 - 28,000) results to see if the "&" is there. The other counts can be determined just by looking at the index without actually looking at the records.

The options would be search for

tsql "select count(*) from mytable where TEXT1 like '\"Vinson Elkins\"'"

or add "&" to the index expressions, or turn off exactphrase, which will allow any punctuation or non-index expression terms between "Vinson" and "Elkins".

Indexing and search speed question

Posted: Tue Jan 07, 2003 6:31 pm
by jkj2001
I suspected the "&" had a part to play here, but I'm curious... the first search I listed also has an "&" in it, yet it comes back very quickly.

Since we're not indexing ampersands, wouldn't the first search take just about as long as the third, even though it only finds 3 hits in the end? Or is it the case that texis runs two searches here; a first one involving the index and the words, followed closely by a second, linear search in which it tries to find the ampersand in the results?

Indexing and search speed question

Posted: Wed Jan 08, 2003 9:25 am
by John
Yes, it will use the index as much as possible, so if there are only 5 records with the words "Vinson" and "Ellis" next to each other, it will only look at the 5 records to see if the "&" is there.