Data retrieval from disk

skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Data retrieval from disk

Post by skalyanaraman »

Hi,
I was informed few years back, in respect to a question, that doing "select DOCID from tbldoc" actually, pulls the whole row into memory before dishing out the DOCID field. To make this faster, I was told to use one of the big fields usually OCR or DOCTEXT as a blob field. Here are my questions:
1. is this still true? i.e. For "select DOCID from tbldoc", does texis have to pull the whole record into memory to dish out the DOCID field?
2. is the same true for "select count(DOCID) from tbldoc"? I guess not, but, just in case.
3. if answer to 1 is "not necessarily", is there a sql setting I can use to just get DOCID from disk?

Sorry, if any of these questions are obvious or silly/dumb? It is just, that we are seeing lots of disk I/O off late.

Thanks!!
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Data retrieval from disk

Post by John »

Do you usually have a WHERE clause?

1. Yes, the entire record is read to get the fields you want. Generally the record is smaller than the size of the data the drive will read anyway.

If you have no where clause, and an index on DOCID you can select directly from the index. If you do have a where clause and an index is used that includes DOCID it may pull the DOCID from there.

2. A select count() will use an index if possible rather than read through the table.
John Turnbull
Thunderstone Software
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Data retrieval from disk

Post by skalyanaraman »

Hi John,
Thanks.
I found the big disk I/O.
We are doing the following,
<sql MAX=10 SKIP=0 "select DOCID from tbldoc where DOCTEXT like 'test' order by DOCID">

Looks like when I do order by, I get hit on the disk big time, the strace of the process id shows seek,read of all kinds of data from disk.

But, without order by DOCID, the above code finishes very quickly and no big disk I/O.

Any thoughs why?? we have metamorph inverted index on DOCTEXT and regular metamorph unique index on DOCID.
Any help is greatly appreciated!!
thanks!!
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Data retrieval from disk

Post by John »

What datatype is DOCID? The index on DOCTEXT can not help with the order by, so all the matching records will be read to get the DOCID, then they are sorted, and the top 10 shown. Without the order by only 10 records need to be read.
John Turnbull
Thunderstone Software
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Data retrieval from disk

Post by mark »

If DOCID is a fixed size type (int, counter, etc) or a short varchar it can be used in a compound metamorph index for added speed.
create metamorph inverted index xname on tname(DOCTEXT,DOCID);
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Data retrieval from disk

Post by skalyanaraman »

Thanks John,
The DOCID is of type varchar(50).
This is a unique alphanumeric key.
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Data retrieval from disk

Post by skalyanaraman »

I created one such compound index,

1. If I try to run "select DOCID from tbldoc where DOCTEXT like 'test'" I get the following error "011 Cache size exceeded in the function btappend"
Pls note, the above query has no order by

2. If I try to run "select DOCID from tbldoc where DOCTEXT like 'test' order by DOCID", I get the same DOCID back hundreds of times.

thanks!!
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Data retrieval from disk

Post by mark »

What's your texis -version?
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Data retrieval from disk

Post by skalyanaraman »

We are using texis version
Commercial Version 5.01.1154642055 20060803 (i686-unknown-linux2.4.9-64-32)
Post Reply