qmaxsetwords question

Mr. Bigglesworth
Posts: 56
Joined: Fri Feb 16, 2001 6:54 pm

Post by Mr. Bigglesworth »

We've just upgraded from the Nov 1999 version of the software, and we've come across something with the qmaxsetwords flag.

Previously, whenever we ran a search on a single field in the database, we'd get the proper number of hits, but with this sort of message in the log:

Partially dropping term `VERC*' in query `VERC*': Max words per set exceeded


With the new version, the message is still there, but the hits are limited to 500, as per the default.

We don't recall setting the default in the old version, and were wondering if that had changed. Also, is there a way to disable the qmaxsetwords flag so that as many hits as are in the index are returned? Or should we just bump the value up to 100,000 or so?
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

The limit hasn't changed, but the behavior on reaching the limit has. In the Nov. 99 release, hitting the limit caused the entire set to be considered noise and not looked up in the index. If you had post-processing enabled, it might then have gone linear to pick it up, which can take considerable time (which is why allinear=1 or alpostproc=1 should only be set with caution).

In versions after Jan. 2000, the term might be only partially dropped: as many terms from the wildcard match as possible are kept, up to the qmaxsetwords limit. This depends on the dropwordmode setting; it applies when dropwordmode is 0, which is the default.

Setting qmaxsetwords to a high number will enable the wildcard to match all terms, at the potential cost of speed for such large matches.
Mr. Bigglesworth
Posts: 56
Joined: Fri Feb 16, 2001 6:54 pm

Post by Mr. Bigglesworth »

Thanks for the info.

Just to follow up: we'd like to disable the 500 limit, keeping the performance hit in mind. Would the best way to do that be to bump qmaxsetwords up to infinity (or set it to zero or -1, which might disable it?), or to play with the dropwordmode flag?

Also, we've got some compound indexes (like FIELD1\FIELD2\FIELD3) that we search against, and have noticed similar, though not identical, behaviour. In tsql, if we run a select statement against the compound index we get 1300 hits, but if we do it via a Vortex script, only 1067 are returned.

Why isn't that search also being limited to 500 hits? Are some different index limitations involved when using a compound one?
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Set qmaxsetwords to a high number, like 10000. If you hit that limit the search is going to suffer anyway. Keep in mind that a low qmaxsetwords limit, like most <apicp> query protection limits, doesn't make most queries faster; it's there to protect the server from the (hopefully rare) ill-composed query that takes a lot of CPU time, which can slow other concurrent queries as well. So just because your test query looks OK on an unloaded box without the limits, that doesn't guarantee performance under load. You're unfastening the seat belts, and your users are driving... :)
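
As a rough sketch of raising the limit in a Vortex script: this assumes the <apicp> setting is placed before the <sql> statement it should affect. The table name mytable is hypothetical, and $query stands for whatever variable holds the user's search, e.g. VERC*.

```
<!-- Sketch only: mytable is a made-up table name for illustration -->
<!-- Raise the per-set word limit from its default of 500 -->
<apicp qmaxsetwords 10000>
<sql "select FIELD1 from mytable where FIELD1 like $query">
  $FIELD1
</sql>
```

If the "Max words per set exceeded" message still shows up in the log after this, the wildcard is matching more distinct words than even the raised limit allows.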

tsql and texis have different apicp limits: tsql allows linear searches and post-processing by default, whereas texis (Vortex) does not. So tsql will take the time to post-process the "noise" set. (The defaults are different because tsql is intended as an infrequent-use, command-line, intelligent-admin-user tool, whereas Vortex handles many live user queries.)
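
If you want to compare counts between the two, something like the following in the Vortex script should bring it closer to the tsql behavior. This is a sketch for testing only, using the two settings mentioned earlier in this thread; whether one or both are needed for a given query is an assumption to check against your own data.

```
<!-- Testing only: permit what tsql permits by default. -->
<!-- Both can make a bad query very slow on a loaded server. -->
<apicp alpostproc 1>
<apicp allinear 1>
```

Leaving these on for live user queries gives up the protection described above, so it is probably best confined to a test script.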

(A Metamorph index on FIELD1\FIELD2\FIELD3 is not a compound index; it is an index on a virtual field. A compound index would be an index on FIELD1,FIELD2,FIELD3 where FIELD1 is LIKE-searchable, and FIELD2 and FIELD3 are usually small fixed-size fields (integers, counters, dates, floats) that can be ANDed after the LIKE. E.g.:

where FIELD1 like $query and FIELD2 = 5 and FIELD3 > '-1 week'
)
Mr. Bigglesworth
Posts: 56
Joined: Fri Feb 16, 2001 6:54 pm

Post by Mr. Bigglesworth »

Virtual field, that's what I meant to say.

Anyway, do you happen to know why doing the wildcard search in the virtual field returned 1067 hits instead of 500? The *true* number of hits should have been around 1300, so something's going on there.

Hey, thanks again for all the help, too.
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

The 500 limit on qmaxsetwords is not a hit limit; it is a distinct *word* limit. Some of those words might occur in more than one row, maybe hundreds, which is how a wildcard cut off at 500 words can still return more than 500 hits. The idea behind qmaxsetwords is to limit the number of *words* that the index is looking up, which may or may not reduce the number of hits.

The 1300 vs. 1067 difference comes from the differing <apicp> defaults in tsql vs. Vortex.