chkind performance

foosh101
Posts: 61
Joined: Tue Oct 22, 2002 2:13 pm

chkind performance

Post by foosh101 »

When chkind runs and decides that it is time to reindex, does it just create an index on the new data? Also, while it is running, how much of the computer's resources does it take, and how much will it degrade the performance of the box (assuming the box already has a substantial search load on it)?
foosh101
Posts: 61
Joined: Tue Oct 22, 2002 2:13 pm

chkind performance

Post by foosh101 »

OK, I just read that it does just merge the new records into the index, and that it is somewhat resource intensive. What guidelines should I go by to decide how to set up chkind (how often to check, and at how much new data to reindex)? If I reindex every hour and on average index 100 new records each time, is the performance degradation proportional to reindexing 50 records every 30 minutes, or is there an advantage to one of them? Is it better to do it every hour in the interest of not forcing the disk cache to update? At what point (number of new records, percentage of new to total records, or size of new data) should I reindex? Is that really what it is all about: finding the point at which the new records have slowed searches down so much that it would be faster to accept the negative performance effects of reindexing and emptying the cache? Or is there another major factor I am not considering? Also, if that is what it is all about, is there anything that can help me find that point?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

chkind performance

Post by John »

You are basically correct: it comes down to finding the balance between searches getting slower and the cost of updating the index. Because larger data sets require more work to merge, and also tend to have larger caches for the new records, the number of new records at which it pays to update will vary with the size of the records and the size of the index.

Assuming you have significantly more than 100 records in your table, the cost to update the index with 50 or 100 new records will be almost the same, so I'd go with once an hour.
John Turnbull
Thunderstone Software
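
To make the reasoning in the reply above concrete, here is a rough sketch of why batch size matters so little once the table is large. It assumes a purely hypothetical cost model in which each index update pays a cost proportional to the existing index size plus a smaller cost per new record. The constants and function names are invented for illustration and are not measurements of Texis or chkind.

```python
# Hypothetical cost model (NOT measured from Texis): each index update
# touches the existing index once, plus some extra work per new record.
PER_EXISTING_RECORD = 1.0   # assumed relative cost per already-indexed record
PER_NEW_RECORD = 5.0        # assumed relative cost per newly merged record


def merge_cost(index_records, new_records):
    """Relative cost of one index update under the assumed model."""
    return index_records * PER_EXISTING_RECORD + new_records * PER_NEW_RECORD


def cost_per_hour(index_records, new_per_hour, runs_per_hour):
    """Total relative update cost per hour for a given update frequency."""
    batch = new_per_hour / runs_per_hour
    return runs_per_hour * merge_cost(index_records, batch)


if __name__ == "__main__":
    index_records = 1_000_000  # table size taken from the index name, as a guess
    hourly = cost_per_hour(index_records, new_per_hour=100, runs_per_hour=1)
    half_hourly = cost_per_hour(index_records, new_per_hour=100, runs_per_hour=2)
    print("one update of 100 records per hour:", hourly)
    print("two updates of 50 records per hour:", half_hourly)
```

Under these assumptions a single update of 50 records costs almost exactly as much as one of 100, so updating twice as often roughly doubles the hourly cost. The real break-even point still depends on how much the unindexed records slow down searches, which this sketch does not model.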
foosh101
Posts: 61
Joined: Tue Oct 22, 2002 2:13 pm

chkind performance

Post by foosh101 »

The following is an example of a metamorph index I create:

d:\morph3\tsql -d d:\morph3\texis\mktDB "SET ignorecase=1;set addexp='\alnum+[\/\.]{1}\alnum+';CREATE METAMORPH INVERTED INDEX mkt1000000Inverted ON mkt1000000 (Headline\ListingDescription\CustomListingDescription,DateStart,ID1)"

Would I call the same thing to update the index (that is, to add only the unindexed records to the metamorph index)?
foosh101
Posts: 61
Joined: Tue Oct 22, 2002 2:13 pm

chkind performance

Post by foosh101 »

Also, is chkind more efficient at updating the indexes, or does it basically just issue the same "create index" call once the unindexed data reaches a certain size?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

chkind performance

Post by John »

Yes, you would call the same thing. chkind has the same efficiency as calling "create index"; the difference is that it first does some initial work to determine whether an update is needed.

The ignorecase setting is currently ignored for a metamorph index; the index always ignores case. You do not need to specify the addexp setting when updating the index, but it is good practice to do so, as it allows the same script to either update or recreate the index.
John Turnbull
Thunderstone Software
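
Since the reply above says the same statement both updates and recreates the index, one way to follow the "same script" advice is to keep the command in a single place and run it from a scheduler. The sketch below wraps the exact tsql command line from the earlier post. The Python wrapper, its function names, and the idea of scheduling it hourly are assumptions for illustration, not something Thunderstone ships.

```python
import subprocess

# Paths and SQL copied from the earlier post; adjust for your installation.
TSQL = r"d:\morph3\tsql"
DATABASE = r"d:\morph3\texis\mktDB"
INDEX_SQL = (
    "SET ignorecase=1;"
    r"set addexp='\alnum+[\/\.]{1}\alnum+';"
    "CREATE METAMORPH INVERTED INDEX mkt1000000Inverted ON mkt1000000 "
    r"(Headline\ListingDescription\CustomListingDescription,DateStart,ID1)"
)


def build_command(tsql=TSQL, database=DATABASE, sql=INDEX_SQL):
    """Return the argument list: tsql -d <database> "<sql>"."""
    return [tsql, "-d", database, sql]


def update_or_create_index(run=subprocess.run):
    """Run the statement; per the thread it updates the index if it exists.

    `run` is injectable (a hypothetical convenience) so the call can be
    checked without a Texis installation.
    """
    return run(build_command(), check=True)


if __name__ == "__main__":
    # Print instead of executing, so this is safe to try anywhere.
    print(build_command())
```

Keeping the addexp setting in the statement costs nothing on an update and means the script still produces the right index if it ever has to be rebuilt from scratch.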