slow search term

jkj2001
Posts: 142
Joined: Fri Mar 29, 2002 1:39 pm

Post by jkj2001 »

Hi,

When I run the following search I get a count of 400,000 documents:

tsql "select count(ID) from mytable where mytext like 'policy'"

This search returns about 100,000 docs. Like the one above, it takes a few seconds to run:

tsql "select count(ID) from mytable where mytext like 'wonk'"

With both terms combined, this search takes maybe 10 seconds and returns a count of 25,000:

tsql "select count(ID) from mytable where mytext like 'policy wonk'"

However, when I put an underscore in there my search positively crawls:

tsql "select count(ID) from mytable where mytext like 'policy_wonk'"

After five or ten minutes, no count has come back. I can't understand why.

Here's the index statement we use on the field:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index indexmytext on mytable(mytext)"

We're using your Linux version, with a version number of 4.04.1067366033. What's going on here?

Thanks much...
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You don't have the underscore indexed, so the search will look at all 400,000 documents containing "policy" to see if it is followed by "_wonk". Depending on the size of those documents, it could take some time to read and linearly scan them all.
John Turnbull
Thunderstone Software
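
For readers following along, here is a rough Python model of the behavior John describes. It is only a sketch: the regular expressions are approximations of the index expressions, not real REX, and `build_index`, `search_literal`, and the sample documents are made up for illustration. It does not reproduce how Texis works internally. It assumes, per John's reply, that the first indexed word selects the candidate documents, which are then read and scanned for the literal string.

```python
import re

# Approximation of the index expression \alnum{1,99} (assumption: not real REX).
ALNUM_EXPR = re.compile(r"[A-Za-z0-9]{1,99}")
# Hypothetical alternative expression that also treats "_" as a word character.
UNDERSCORE_EXPR = re.compile(r"[A-Za-z0-9_]{1,99}")


def build_index(docs, expr):
    """Map each lowercased indexed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in expr.findall(text.lower()):
            index.setdefault(token, set()).add(doc_id)
    return index


def search_literal(docs, index, query, expr):
    """Return (matching ids, number of documents that had to be read)."""
    query = query.lower()
    terms = expr.findall(query)
    if len(terms) == 1 and terms[0] == query:
        # The whole query is one indexed token: the index alone answers it.
        return set(index.get(query, set())), 0
    # Otherwise the first indexed piece selects the candidates ...
    candidates = index.get(terms[0], set()) if terms else set(docs)
    # ... and every candidate is read and scanned for the literal string.
    hits = {d for d in candidates if query in docs[d].lower()}
    return hits, len(candidates)


docs = {
    1: "The policy was revised.",
    2: "A policy_wonk wrote it.",
    3: "Every policy wonk agrees.",
    4: "Nothing relevant here.",
}

plain = build_index(docs, ALNUM_EXPR)
with_underscore = build_index(docs, UNDERSCORE_EXPR)

print(search_literal(docs, plain, "policy_wonk", ALNUM_EXPR))
print(search_literal(docs, with_underscore, "policy_wonk", UNDERSCORE_EXPR))
```

In this toy model the first call has to read every document containing "policy" even though only one matches, while the second call, where the underscore is part of the indexed token, reads none. Scaled up to 400,000 candidate documents, that read-and-scan step is where the time goes.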
jkj2001
Posts: 142
Joined: Fri Mar 29, 2002 1:39 pm

Post by jkj2001 »

Thanks, John.

Would you suggest some sort of w/1 character proximity term? Could that speed things up possibly?

Maybe something like:

"select count(ID) from mytable where mytext like '+policy wonk w/1'"?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

That depends on exactly what you want to find. If you search for the phrase "policy wonk" that would be the most efficient.
John Turnbull
Thunderstone Software
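
One plausible reason a phrase search is cheap is that an inverted index can record word positions, so adjacency can be checked from the index without reading any documents. The Python sketch below illustrates that general technique only. It assumes a position-keeping index, and `build_positional_index`, `phrase_search`, the tokenizer, and the sample documents are all hypothetical names and data, not Texis internals.

```python
import re
from collections import defaultdict

# Simplified tokenizer: runs of ASCII letters and digits (an assumption).
TOKEN = re.compile(r"[A-Za-z0-9]+")


def build_positional_index(docs):
    """Map token -> {doc_id: [word positions of that token in the document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(TOKEN.findall(text.lower())):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of documents where the phrase's words sit at consecutive positions."""
    words = TOKEN.findall(phrase.lower())
    if not words:
        return set()
    hits = set()
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Each following word must appear exactly i positions after the first.
            if all(
                start + i in index.get(word, {}).get(doc_id, ())
                for i, word in enumerate(words[1:], 1)
            ):
                hits.add(doc_id)
                break
    return hits


docs = {
    1: "The policy wonk spoke.",
    2: "A wonk on policy.",
    3: "Policy_wonk is written with an underscore here.",
}

index = build_positional_index(docs)
print(phrase_search(index, "policy wonk"))
```

Note that document 3 also matches in this model, because the underscore is not a token character and "policy" and "wonk" end up at adjacent positions. Document 2 contains both words but not next to each other, so it is rejected using the index alone.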
jkj2001
Posts: 142
Joined: Fri Mar 29, 2002 1:39 pm

Post by jkj2001 »

Would it also work if I were looking for "policy_wonk", with the underscore included? I realize I'd get false positives with things like "policy wonk", but I can live with that.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

A search for policy_wonk is the same as a search for "policy_wonk".