Page 1 of 1
Major performance differences in queries
Posted: Fri May 03, 2002 6:03 pm
by chand012
I'm having trouble understanding why different queries can perform so differently, like <1 sec. versus 20+ sec. In this case, I have a date field with a regular index on it. When I select from the table "where Date >= '-1 month'" the query performs very fast. When I select "where Date between $x and $y", it performs very poorly. I'd like the user to be able to select the desired date range, but the wait is killing me. Do I need some other kind of index for the between clause to work effectively? The data is already sorted by date in ascending order.
Thanks,
David
Major performance differences in queries
Posted: Sat May 04, 2002 12:14 am
by John
Is there anything else in your WHERE clause? What values are you supplying for $x and $y, and what percentage of the field is chosen?
Major performance differences in queries
Posted: Mon May 06, 2002 10:33 am
by chand012
Nothing else in the where clause at this point. Here's the test code:
<fmt "%H:%M:%S" "now">
<flush>
<sql "select count(*) x from dbusage where AccessDate >= '-1 month'"></sql>
$x
<flush>
<fmt "%H:%M:%S" "now">
<flush>
<sql "select count(*) x from dbusage where AccessDate between '2002-04-06' and '2002-05-06'"></sql>
$x
<fmt "%H:%M:%S" "now">
Here's the output:
10:27:09
36012
10:27:09
35958
10:27:37
One month is about 20% of the data, but I get similar performance difference for one week (approx 5%).
--David
Major performance differences in queries
Posted: Mon May 06, 2002 3:59 pm
by John
We'll investigate that and see where the difference might come from.
Major performance differences in queries
Posted: Mon May 06, 2002 5:05 pm
by chand012
OK, thanks. If it's any help, our current version is:
Texis Web Script (Vortex) Copyright (c) 1996-2000 Thunderstone - EPI, Inc.
Commercial Version 3.01.959738425 of May 30, 2000 (sparc-sun-solaris2.5.1)
We plan to upgrade soon to a recent release.
--David
Major performance differences in queries
Posted: Tue May 07, 2002 11:31 am
by John
Texis has an optimization in it that will allow it to use the equivalent of $indexcount for count(*) if that number is accurate. It is conservative with that determination, in general preferring accuracy over speed. In the case of the second query you had it was actually reading the records in to count them. Newer versions extend the optimization to cover the second case you have. As a workaround for those cases you could use $indexcount, e.g.
<SQL "set bubble=0"></SQL>
<SQL MAX=1 "select AccessDate x from dbusage where AccessDate between '2002-04-06' and '2002-05-06'">
</SQL>
<$x=$indexcount>
<SQL "set bubble=1"></SQL>
Setting bubble=0 causes Texis to read all the index rows matching the where clause, so it generates an accurate indexcount, otherwise it you just pull the first record from the index to resolve the WHERE, and not give you a count.
Major performance differences in queries
Posted: Tue May 07, 2002 5:06 pm
by chand012
OK, great, that works well. Now I have another twist on the problem that gets back to the other part of your first question -- "are there any other parts of my where clause?". The full query involves two other criteria besides the date range. This database is essentially an access log that records three things: access time (AccessDate date), IP address of remote host (RemoteAddr varchar), and a resource ID (dbcn varchar). There is a metmorph index on dbcn, a normal index on RemoteAddr, and a compound index on (AccessDate, RemoteAddr). I want to produce reports on a selected date range, an optional (and possibly multi-valued) selected resource, and an IP range. So my full query looks like:
select count(*) from dbusage where dbcn like $dbcn and AccessDate between $startdate and $enddate and RemoteAddr matches $IPrange
I tried creating a metamorph compound index on (dbcn, AccessDate, RemoteAddr) but that doesn't seem to help performance.