Page 1 of 1

slow search term

Posted: Mon Mar 07, 2005 1:46 pm
by jkj2001
Hi,

When I run the following search I get a count of 400,000 documents:

tsql "select count(ID) from mytable where mytext like 'policy'"

This search returns about 100,000 docs. Like the one above, it takes a few seconds to run:

tsql "select count(ID) from mytable where mytext like 'wonk'"

Together, this search takes maybe 10 seconds and returns a count of 25,000:

tsql "select count(ID) from mytable where mytext like 'policy wonk'"

However, when I put an underscore in there my search positively crawls:

tsql "select count(ID) from mytable where mytext like 'policy_wonk'"

five, ten minutes....no count returns. I can't understand why.

Here's the index statement we use on the field:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>
>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index indexmytext on mytable(mytext)"

We're using your linux version, with a version number of 4.04.1067366033. What's going on here?

Thanks much...

slow search term

Posted: Mon Mar 07, 2005 2:58 pm
by John
You don't have the underscore indexed, so it will look at all 400,000 documents containing policy to see if it is followed by "_wonk". Depending on the size of those documents it could take some time to read and linearly scan them all.

slow search term

Posted: Mon Mar 07, 2005 3:27 pm
by jkj2001
Thanks, John.

Would you suggest some sort of w/1 character proximity term? Could that speed things up possibly?

Maybe--

"select count(ID) from mytable where mytext like '+policy wonk w/1'"?

slow search term

Posted: Mon Mar 07, 2005 3:52 pm
by John
That depends on exactly what you want to find. If you search for the phrase "policy wonk" that would be the most efficient.

slow search term

Posted: Mon Mar 07, 2005 3:54 pm
by jkj2001
Would it also work if I were looking for "policy_wonk", with the underscore included? I realize I'd get false positives with things like "policy wonk", but I can live with that.

slow search term

Posted: Mon Mar 07, 2005 4:11 pm
by mark
Search for policy_wonk is the same as "policy_wonk".