Creating the right indexes

Post Reply
tony.malandain
Posts: 57
Joined: Sat Mar 27, 2004 6:39 pm

Creating the right indexes

Post by tony.malandain »

Hi,
I have the following table:
DocId => char(50)
PageIndex => Integer
Binary => Blob
PlainText => varchar

The table may contain up to 10M lines.
That table is frequently queried to return the most relevant page of a document. I use this query for that:
Select max(rank), PlainText from table where PlainText like 'xxx' and DocId='YYY' group by DocId

The table are also frequently modified: typical changes include:
Delete all records with Id = xxx
Set Id= xxx where Id=yyy
Add new files.

Which indexes would you recommend for that table?

How can I monitor the performance of sql statements to detect which one are costly?

Thanks,
/Tony.
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Creating the right indexes

Post by mark »

You'll want a metamorph inverted index on PlainText,DocId and a regular index on DocId.

You can use <sysinfo proctime> to measure how long things take.
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Creating the right indexes

Post by John »

A better query to do with the metamorph inverted index suggested above would be:

select PageIndex from table where PlainText likep 'xxx' and DocId = 'YYY';

The first result will be the highest rank with LIKEP. If using Vortex you can use <SQL MAX=1 ... to get just the first row.
John Turnbull
Thunderstone Software
tony.malandain
Posts: 57
Joined: Sat Mar 27, 2004 6:39 pm

Creating the right indexes

Post by tony.malandain »

Thanks,

Yes, we do use likep, but we run the query on multiple DocIds at onces that's why we use the group by...

One question though: If I create the metamorph inverted index on PlainText,DocId will it do a full-text indexing of DocId? We only need a regular index on the DocId.
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Creating the right indexes

Post by mark »

No. The text index is on the first field. The remainder are aux data put into the index to reduce table accesses when resolving the rest of the query.
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Creating the right indexes

Post by John »

The other problem is that the group by doesn't tell you which page id had the highest rank. Depending on how many docids you are searching for it may be more efficient to either do it document by document getting 1 at a time, or get all the results in rank order and pick the first page you see for each document.
John Turnbull
Thunderstone Software
Post Reply