likepindexthresh

sourceuno · Post by **sourceuno** » Fri Dec 14, 2001 10:57 am

I'm not sure why setting likepindexthresh would cause a record count to go down if the count is less than the threshold that it's set to. Is it limiting the number of records that it counts?

For example, the following query returns 7008 records:
tsql "select count(*) from threads where Title\Body likep 'oldsmobile' and topics\categories like 't02 c007'

While this query returns only 83 records:
tsql "set likepindexthresh=10000;select count(*) from threads where Title\Body likep 'oldsmobile' and topics\categories like 't02 c007'

Post by **Kai** » Fri Dec 14, 2001 12:03 pm

likepindexthresh reduces the record count from the *index*, not the final result set. Post-processing may be needed on the index result set, reducing the final count, as in the case of this AND. The index may only be used for the first LIKEP, and that result set ANDed to produce the final count. That first index result set could be 200,000 records, and is only 7,008 after the AND. So setting likepindexthresh to 10,000 reduces that initial 200,000 records from the index; consequently fewer matches are produced once ANDed.

As an aside, if you're using count(*) you should generally using LIKE instead of LIKEP: it avoids the overhead of ranking which is just discarded by count(*), and doesn't return just the top likeprows (although that limit doesn't apply in most ANDs such as this).

source1Tamer · Post by **source1Tamer** » Wed Feb 20, 2002 2:59 pm

so,
I'm now setting the "likepindexthresh" parameter to 10000 and I'm getting the correct number of results in "0.2" second(VGood performance), this for only a single expression in the where clause, Ex :


select id from table1 where field1\field2 likep $q


but as discussed before,,, when I add another expression to the where clause I get a wrong number of results...an ex for this:


select id from table1 where field1\field2 likep $q and field3\field4 like 'mummy'


I tried to remove the "likepindexthresh" parameter, but I got the results in "4.27" seconds which is slow.

How could I return the correct number of result set with the current performance?

Is there a way to reduce the record count from the final result set(other than <sql max=number query></sql>)?

Post by **Kai** » Thu Feb 21, 2002 7:00 pm

The AND slows things down, because that second LIKE must be handled, either with an index (if one exists on field3\field4) and an AND merge with the first LIKEP's results (takes time), or a linear post-process of all the first LIKEP's results (also takes time). Either way it's slower than either single query alone would be. (Also, *all* the hits from the first LIKEP must be ANDed, not just the top 100, because hits for the second LIKE might occur in any match, not just the top 100. This slows things further.)

When you set likepindexthresh, you're limiting that first set of LIKEP results, *before* they are ANDed with the second LIKE. So you might be tossing some valid result rows. Let's say the LIKEP alone -- no AND, no likepindexthresh -- returns 30,000 rows. You then AND it -- again, no likepindexthresh -- and get 30 final rows. If you then AND it *and* set likepindexthresh to 10,000, you're tossing the other 20,000 initial LIKEP matches. So chances are you're also tossing 2/3 of the final AND matches, leaving an inaccurate final result of about 20.

likepindexthresh is an accuracy vs. time tradeoff. It saves time on noisy queries at the cost of accuracy. There is no way to return *all* the results of an AND without ANDing *all* of the rows, which may take some time. likepindexthresh only saves time by discarding some of the rows to AND; therefore the result may be inaccurate.