Data retrieval from disk

skalyanaraman · Post by **skalyanaraman** » Thu Sep 07, 2006 3:37 pm

Hi,
I was informed few years back, in respect to a question, that doing "select DOCID from tbldoc" actually, pulls the whole row into memory before dishing out the DOCID field. To make this faster, I was told to use one of the big fields usually OCR or DOCTEXT as a blob field. Here are my questions:
1. is this still true? i.e. For "select DOCID from tbldoc", does texis have to pull the whole record into memory to dish out the DOCID field?
2. is the same true for "select count(DOCID) from tbldoc"? I guess not, but, just in case.
3. if answer to 1 is "not necessarily", is there a sql setting I can use to just get DOCID from disk?

Sorry, if any of these questions are obvious or silly/dumb? It is just, that we are seeing lots of disk I/O off late.

Thanks!!

Post by **John** » Thu Sep 07, 2006 3:49 pm

Do you usually have a WHERE clause?

1. Yes, the entire record is read to get the fields you want. Generally the record is smaller than the size of the data the drive will read anyway.

If you have no where clause, and an index on DOCID you can select directly from the index. If you do have a where clause and an index is used that includes DOCID it may pull the DOCID from there.

2. A select count() will use an index if possible rather than read through the table.

skalyanaraman · Post by **skalyanaraman** » Thu Sep 07, 2006 4:21 pm

Hi John,
Thanks.
I found the big disk I/O.
We are doing the following,
<sql MAX=10 SKIP=0 "select DOCID from tbldoc where DOCTEXT like 'test' order by DOCID">

Looks like when I do order by, I get hit on the disk big time, the strace of the process id shows seek,read of all kinds of data from disk.

But, without order by DOCID, the above code finishes very quickly and no big disk I/O.

Any thoughs why?? we have metamorph inverted index on DOCTEXT and regular metamorph unique index on DOCID.
Any help is greatly appreciated!!
thanks!!

Post by **John** » Thu Sep 07, 2006 4:47 pm

What datatype is DOCID? The index on DOCTEXT can not help with the order by, so all the matching records will be read to get the DOCID, then they are sorted, and the top 10 shown. Without the order by only 10 records need to be read.

Post by **mark** » Thu Sep 07, 2006 5:44 pm

If DOCID is a fixed size type (int, counter, etc) or a short varchar it can be used in a compound metamorph index for added speed.
create metamorph inverted index xname on tname(DOCTEXT,DOCID);

skalyanaraman · Post by **skalyanaraman** » Thu Sep 07, 2006 5:46 pm

Thanks John,
The DOCID is of type varchar(50).
This is a unique alphanumeric key.

Post by **mark** » Thu Sep 07, 2006 5:51 pm

See also
http://www.thunderstone.com/texis/site/ ... pound.html
http://thunderstone.master.com/texis/ma ... morph&s=SS

skalyanaraman · Post by **skalyanaraman** » Thu Sep 07, 2006 5:53 pm

I created one such compound index,

1. If I try to run "select DOCID from tbldoc where DOCTEXT like 'test'" I get the following error "011 Cache size exceeded in the function btappend"
Pls note, the above query has no order by

2. If I try to run "select DOCID from tbldoc where DOCTEXT like 'test' order by DOCID", I get the same DOCID back hundreds of times.

thanks!!

Post by **mark** » Fri Sep 08, 2006 1:05 am

What's your texis -version?

skalyanaraman · Post by **skalyanaraman** » Fri Sep 08, 2006 12:44 pm

We are using texis version
Commercial Version 5.01.1154642055 20060803 (i686-unknown-linux2.4.9-64-32)