join - varchar

paulman · Post by **paulman** » Mon Jul 09, 2001 9:58 am

I'm trying to join two tables ( 100,000 + records each ) on varchar fields. I've created an index on both tables and the query is still taking a long time. Several minutes. I realize that using varchar as key field isn't the best idea but this data is being imported from another system. Before I set up something to convert the varchar to an int field, is there something I can do to optimize this type of join?

Post by **Kai** » Mon Jul 09, 2001 3:38 pm

What are the table schemas, indexes, and SQL queries you're currently using? What is the typical size of the varchar data per row?

paulman · Post by **paulman** » Mon Jul 09, 2001 3:56 pm

The two tables are linked by data.winid to data_category.dataid (see below)
The typical field size for the 'winid' field is 14 characters. I added the metamorph index on these two fields in the off chance that it might help. It didn't really do anything. Thanks for your help.

create table data
(
id counter,
winid varchar(25),
title varchar(1024),
description varchar(1024),
duedate date,
state varchar(25),
zip varchar(10),
state_name varchar(1024)
);

create table data_category
(
dataid varchar(25),
subcategoryid long
);

create metamorph inverted index data_title_MI on data(title);
create metamorph inverted index data_description_MI on data(description);
create metamorph inverted index data_state_name_MI on data(state_name);
create metamorph inverted index data_winid_MI on data(winid);
create index data_winid_I on data(winid);
create index data_date_I on data(duedate);
create index data_state_I on data(state);

create metamorph inverted index data_category_dataid_MI on data_category(dataid);
create index data_category_dataid_I on data_category(dataid);
create index data_category_subcategoryid_I on data_category(subcategoryid);

Post by **Kai** » Mon Jul 09, 2001 5:14 pm

What's the exact SQL query or queries you're using?

paulman · Post by **paulman** » Tue Jul 10, 2001 8:27 am

Here is a sample query:

select d.title, d.description
from data d, data_category dc
where d.winid = dc.dataid
and d.winid='O76140000801162700403'

Post by **Kai** » Tue Jul 10, 2001 10:28 am

I'm not sure why you're using a join here, when the data_category table only has one field which could be added to the data table (if the tables are one-to-one), and you're not selecting it. Are you verifying that the id exists in both tables?

Since you're looking for a particular id, skipping the join and just looking for that id in both tables would be faster:

select d.title, d.description
from data d, data_category dc
where dc.dataid = 'O76140000801162700403' and
d.winid='O76140000801162700403';

paulman · Post by **paulman** » Tue Jul 10, 2001 1:30 pm

Sorry I should have included a field from the other field.
select d.title, d.description, dc.subcategoryid
The rest would be the same.

Yes I'm sure the value exists in both tables. It's not that the result doesn't come back. It just takes a lot longer than I would think it should take.

I rebuilt the index and there was a significant performance improvement. Do the Regular Indexes in Texis quickly get fragmented when records are added? If so how often should I be rebuilding the index for the these tables? I'll probably be importing around 2000 or more records a week.

Post by **Kai** » Tue Jul 10, 2001 2:07 pm

A regular index never gets fragmented; it's only Metamorph indexes that get out-of-date and occasionally need to be re-created (not dropped) to speed up searches. What probably happened -- assuming nothing else changed -- was that the index and table data were already in RAM cache the second time around, speeding table access for all those matching rows for the join (the first part of the WHERE clause).

The most restrictive part of a query should be first, ie. the part that (were each part to be run alone) would generate the fewest result rows. That's probably the "d.winid='.....'" part in this case, so it should be first.