slow performance with queries

phoebe
Posts: 25
Joined: Fri Aug 01, 2003 9:29 am

Post by phoebe »

There is a metamorph index on Catno.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Did you set the index expression so single digits will be indexed? Are there any messages in the html while running the query?
John Turnbull
Thunderstone Software

Post by phoebe »

Actually, the categories are 2 characters each, \alpha\digit, so that they are not confused with noise words.
I did not change any of the other Webinator settings.

All the HTML pages have at least one category, and in one version I use both '+' and '-' in the search. Adding '+' doesn't seem to make a difference in performance.

The index expression is only the simple:
create metamorph index xhtmlcat on html(Catno)

When it doesn't time out, it just returns the query with no message.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I'm confused by that last statement. Are you saying that you did a query that should have had answers but didn't?

View the source of the results page to see if there are any error or warning messages within html comments.

Post by phoebe »

I get a blank page.
The page source is also blank.

Post by mark »

That's very odd. Even for a timeout you should get back a page that says timeout (unless you've set your script timeout to -1 or some huge number and it's the web client or server that's timing out). Check your vortex.log and webserver error log for corresponding events.

You can eliminate the webserver from the equation by running the SQL on the command line with texis:
texis -d /path/to/your/database -s "select ..."
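To compare how long different statements take outside the webserver, the command-line form above can be wrapped in a small timing helper. This is only a sketch: it assumes a `texis` executable is on the PATH and uses just the `-d` and `-s` options quoted above. The helper names `build_texis_command` and `time_query` are made up for illustration.

```python
import subprocess
import time


def build_texis_command(database, sql):
    """Build the argument list for: texis -d <database> -s "<sql>"."""
    return ["texis", "-d", database, "-s", sql]


def time_query(database, sql, runner=subprocess.run):
    """Run one statement and return (elapsed_seconds, completed_process).

    `runner` defaults to subprocess.run; it is a parameter only so the
    helper can be exercised without a real texis installation.
    """
    cmd = build_texis_command(database, sql)
    start = time.perf_counter()
    result = runner(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Example call (needs a real database path and statement):
#   elapsed, result = time_query("/path/to/your/database", "select ...")
#   print(f"{elapsed:.2f}s", result.stderr)
```

Timing the same statement with and without one clause shows which part of the query is slow, and anything texis writes to stderr is captured in `result.stderr`.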

Please also summarize the actual SQL statement and queries you're using and what kind(s) of indices are on the fields being queried.

Post by phoebe »

It may have been a webserver timeout. It doesn't happen very often, and it is not my main concern, which is general slowness.
We run many crawls and feed the resulting databases into a main search database, using a unique hash id to keep them distinct. The database now has 3+ million pages and is growing. The searches seem to get slower as it grows.
If this doesn't resolve, we may have to split up the db and piece the results together with relevance ranking. Will that run faster?
We are using the Linux version with 2 GB of RAM.
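For reference, "piecing the results together" across several smaller databases could look roughly like the Python sketch below. It assumes each database returns its hits already sorted by rank, highest first, and that rank values from different databases are comparable, which nothing in this thread establishes. The function name and the sample rows are hypothetical, and the sketch says nothing about whether the split would actually run faster.

```python
import heapq
from itertools import islice


def merge_ranked(result_lists, limit=10):
    """Merge per-database result lists into one list ordered by rank.

    Each input list holds (rank, url) tuples sorted by rank, highest first.
    """
    merged = heapq.merge(*result_lists, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


# Made-up rows standing in for the output of two separate databases.
db_a = [(183, "http://a.example/1"), (120, "http://a.example/2")]
db_b = [(150, "http://b.example/1"), (90, "http://b.example/2")]

top = merge_ranked([db_a, db_b], limit=3)
```

Because `heapq.merge` is lazy, only as many rows as `limit` asks for are pulled from each list.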

For this query, performance is fine without the Catno clause, and using likep (not liker or like3) makes the speed tolerable.
Is there a reason why likep should not be used in the query instead of like for Catno?
The results are very different: the rank is completely off, but the pages don't seem to be less relevant.

select Url,Catno,count(*),$rank r from html where Title\Description\Keywords\Meta\Body likep 'breast cancer' and Catno likep 'a1 b5' group by Depth;

Title\Description\Keywords\Meta\Body has a metamorph inverted index, and Catno has a regular metamorph index.

Post by phoebe »

Another oddity:
SQL 1> select Url,Catno,count(*),$rank r from html where Title\Description\Keywords\Meta\Body likep 'breast cancer' and Catno likep '+b3 +j9';
Url                          Catno         count(*)  r
----------------------------+-------------+---------+----+
http://www.accc-cancer.org/  a0,b3,d2,g1,  100       183

j9 is not in the Catno field, yet the row is returned.

Post by John »

If the first LIKEP sufficiently reduces the result set, the second LIKEP may not use the index and will only affect the rank value. It will not exclude non-matching records, just lower their rank.
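The difference between a clause that excludes rows and one that only adjusts rank can be pictured with a toy model in Python. This is not how Texis computes `$rank`; the penalty value, the function name, and the sample rows are all invented for illustration.

```python
def apply_category_clause(rows, required, exclude_non_matching, penalty=50):
    """Apply a second clause to rows already matched by the first clause.

    rows: (url, categories, rank) tuples; required: category terms.
    With exclude_non_matching=True, rows missing a term are dropped.
    With False, they are kept and only lose rank (hypothetical penalty).
    """
    out = []
    for url, categories, rank in rows:
        missing = [term for term in required if term not in categories]
        if missing and exclude_non_matching:
            continue  # filtering behaviour: the row disappears
        out.append((url, rank - penalty * len(missing)))
    return sorted(out, key=lambda row: row[1], reverse=True)


# Made-up rows: the first lacks j9, the second has both terms.
rows = [
    ("http://example.org/one", {"a0", "b3", "d2", "g1"}, 200),
    ("http://example.org/two", {"b3", "j9"}, 180),
]

rank_only = apply_category_clause(rows, ["b3", "j9"], exclude_non_matching=False)
filtered = apply_category_clause(rows, ["b3", "j9"], exclude_non_matching=True)
```

In the rank-only case the row without j9 still comes back, just lower in the list, which matches the behaviour described above.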