Major performance differences in queries

Post Reply
chand012
Posts: 70
Joined: Mon Dec 18, 2000 4:34 pm

Major performance differences in queries

Post by chand012 »

I'm having trouble understanding why different queries can perform so differently, like <1 sec. versus 20+ sec. In this case, I have a date field with a regular index on it. When I select from the table "where Date >= '-1 month'" the query performs very fast. When I select "where Date between $x and $y", it performs very poorly. I'd like the user to be able to select the desired date range, but the wait is killing me. Do I need some other kind of index for the between clause to work effectively? The data is already sorted by date in ascending order.

Thanks,
David
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Major performance differences in queries

Post by John »

Is there anything else in your WHERE clause? What values are you supplying for $x and $y, and what percentage of the field is chosen?
John Turnbull
Thunderstone Software
chand012
Posts: 70
Joined: Mon Dec 18, 2000 4:34 pm

Major performance differences in queries

Post by chand012 »

Nothing else in the where clause at this point. Here's the test code:

<fmt "%H:%M:%S" "now">

<flush>
<sql "select count(*) x from dbusage where AccessDate >= '-1 month'"></sql>
$x

<flush>
<fmt "%H:%M:%S" "now">

<flush>
<sql "select count(*) x from dbusage where AccessDate between '2002-04-06' and '2002-05-06'"></sql>
$x

<fmt "%H:%M:%S" "now">

Here's the output:

10:27:09
36012

10:27:09
35958

10:27:37

One month is about 20% of the data, but I get similar performance difference for one week (approx 5%).

--David
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Major performance differences in queries

Post by John »

We'll investigate that and see where the difference might come from.
John Turnbull
Thunderstone Software
chand012
Posts: 70
Joined: Mon Dec 18, 2000 4:34 pm

Major performance differences in queries

Post by chand012 »

OK, thanks. If it's any help, our current version is:

Texis Web Script (Vortex) Copyright (c) 1996-2000 Thunderstone - EPI, Inc.
Commercial Version 3.01.959738425 of May 30, 2000 (sparc-sun-solaris2.5.1)

We plan to upgrade soon to a recent release.

--David
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Major performance differences in queries

Post by John »

Texis has an optimization in it that will allow it to use the equivalent of $indexcount for count(*) if that number is accurate. It is conservative with that determination, in general preferring accuracy over speed. In the case of the second query you had it was actually reading the records in to count them. Newer versions extend the optimization to cover the second case you have. As a workaround for those cases you could use $indexcount, e.g.

<SQL "set bubble=0"></SQL>
<SQL MAX=1 "select AccessDate x from dbusage where AccessDate between '2002-04-06' and '2002-05-06'">
</SQL>
<$x=$indexcount>
<SQL "set bubble=1"></SQL>

Setting bubble=0 causes Texis to read all the index rows matching the where clause, so it generates an accurate indexcount, otherwise it you just pull the first record from the index to resolve the WHERE, and not give you a count.
John Turnbull
Thunderstone Software
chand012
Posts: 70
Joined: Mon Dec 18, 2000 4:34 pm

Major performance differences in queries

Post by chand012 »

OK, great, that works well. Now I have another twist on the problem that gets back to the other part of your first question -- "are there any other parts of my where clause?". The full query involves two other criteria besides the date range. This database is essentially an access log that records three things: access time (AccessDate date), IP address of remote host (RemoteAddr varchar), and a resource ID (dbcn varchar). There is a metmorph index on dbcn, a normal index on RemoteAddr, and a compound index on (AccessDate, RemoteAddr). I want to produce reports on a selected date range, an optional (and possibly multi-valued) selected resource, and an IP range. So my full query looks like:

select count(*) from dbusage where dbcn like $dbcn and AccessDate between $startdate and $enddate and RemoteAddr matches $IPrange

I tried creating a metamorph compound index on (dbcn, AccessDate, RemoteAddr) but that doesn't seem to help performance.
Post Reply