'Paul' vs 'John' require linear search.

Posted: Fri Nov 02, 2007 9:57 am
by sroth
When I do a LIKEP search on 'paul' I get the following message: <!-- 115 /search:671: Query `paul' would require post-processing: Index expression(s) do not match term `p' -->

When I do the same search on 'John', the results are good.

What could the difference be between these two searches?

select COUNT(*) cnt FROM Crawdaddy WHERE Content LIKEP ('paul');

select COUNT(*) cnt FROM Crawdaddy WHERE Content LIKEP ('john');

Posted: Fri Nov 02, 2007 10:27 am
by mark
What were your index expressions when creating the index on Content?

Posted: Fri Nov 02, 2007 10:32 am
by sroth
<SQL "set keepnoise=1;"></SQL>
<SQL "set minwordlen=2;"></SQL>

Posted: Fri Nov 02, 2007 10:33 am
by sroth
The index was created using this statement after keepnoise and minwordlen were set.

<SQL "create metamorph inverted index Crawdaddy_Content on Crawdaddy(Content)"></SQL>

Posted: Fri Nov 02, 2007 10:42 am
by mark
What about addexp?

Posted: Fri Nov 02, 2007 10:44 am
by mark
Or have you set up a custom locale such that "a" would be considered whitespace?

Posted: Fri Nov 02, 2007 10:45 am
by sroth
I haven't explicitly set addexp and I don't have a custom locale set up (at least not to my knowledge).

Posted: Fri Nov 02, 2007 10:48 am
by John
You have minwordlen set too low. It is stripping off the "a" and "ul" from "Paul" as suffixes, leaving only "P". You could <SQL "set defsuffrm = 0"></SQL> to prevent the "a" being stripped, but generally you would want a more limited set of suffixes if you set minwordlen that low.

Posted: Fri Nov 02, 2007 10:54 am
by sroth
Well, setting minwordlen=4 solved the issue with 'Paul', and 'john' and 'bob' still work. I guess this is a trial-and-error situation to find the best setting. Any advice? Thanks.

Posted: Fri Nov 02, 2007 11:06 am
by mark
minwordlen shouldn't be less than 4, or maybe 3 at the lowest. 4 or 5 is typical.
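
For anyone landing here later, the fix the thread arrives at can be sketched as below. It is a minimal sketch that reuses only the table, column, and settings already quoted above. It assumes the setting is issued in the same session as the queries. The thread does not say whether the index also has to be rebuilt after changing minwordlen, so that step is left out here.

```sql
-- Raise minwordlen so suffix stripping cannot reduce 'paul' to 'p'
-- (4 is the value reported to work in this thread).
set minwordlen=4;

-- Re-run both searches; neither should now report that the query
-- would require post-processing.
select COUNT(*) cnt FROM Crawdaddy WHERE Content LIKEP ('paul');
select COUNT(*) cnt FROM Crawdaddy WHERE Content LIKEP ('john');
```

If short terms really must be searchable with a low minwordlen, the alternative suggested above is to turn off default suffix removal with set defsuffrm = 0, or to use a more limited suffix list.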