I'm trying to build a select query that makes a join between 2 tables.
the query runs but the performance is poor.
Can somebody tell me what I'm doing wrong?
table1: DocUrl Hash DocId + other fields
there is an index on DocUrl,Hash
table2: DocUrl Hash DocId +other fields
There is also an index on DocUrl,Hash.
Table1 has about 450 records while table2 have thousands.
my query is:
select t1.DocumentUrl as docUrl, t1.DocId as oldId, t2.DocId as newId from table1 as t1, table2 as t2 where t1.DocumentUrl = t2.DocumentUrl and t1.Hash = t2.Hash
If you run with tsql and add -V you should see whether it is iterating over t1 or t2. If it is not doing t1 first then you can add a sql statement "set nooptimize='join'" to prevent it reordering the tables.
what do you mean by iterating over t1?
Does that mean transforming the query into:
select ..... where t1.DocumentUrl = 'some Url' and t1.Hash = 'some Hash';?
If so, that's what it did (and setting nooptimize='join' behaved the same way). But that didn't change much.
The query still takes about 2.7s (for 450 records to identify) while inserting the records and building the index takes only 0.26s
For each of the 450 records in t1 it will do a query against t2. Does it run faster if reran immediately? The 170 or so queries a second you are seeing is around the transaction rates for many disks if it is needing to hit the disk for the t2 queries.
I have 2 SQL vortex commands one after each other.
They both take the same time.
The thing I don't understand is why it need to make 170 accesses to the disk.
I have indexes on both table, it should be able to load both indexes in memory and use them instead.
Any way I can force that?
That's why I was asking if it was faster if the same query is ran again immediately after, as unless the system is low on memory compared to the memory needed for the query it should then have everything in memory and not need to hit the disk. If you are on unix and "time" the tsql command you can see the amount of time spent by Texis, by the OS, and how much is waiting for IO.