Help with index

mjacobson
Posts: 204
Joined: Fri Feb 08, 2002 3:35 pm

Post by mjacobson »

I have a table with over 30 million rows of URLs. I added an index to the table with:

<sql "create metamorph index xulrs on urltable(url)"></sql>

I had the following settings in place before I created the index:
<sql "set hyphnephrase=0"></sql>
<sql "set delexp=0"></sql>
<sql "set addexp='\alnum{2,99}'"></sql>
<sql "set addexp='>>\alnum\+\_\x24\x27\x2E\xa0-\xff]{2,99}'"></sql>
<sql "set ignorecase=0"></sql>
<apicp keepnoise 1>

The urltable table has an id field and a url field. I need to select the rows whose URL matches a given host value. My table contains URLs like:

http://opensource.xxx.com/file1.html
http://somesite.xxx.com/openurl=opensou ... file1.html
http://someothersite.com/Opensource.html

When I run the following query, all of these types of URLs are returned, but I only want the first type, http://opensource.xxx.com/file1.html, of which there are over 6 million in the database.

<sql "select url AS Url from urltable WHERE url likep 'opensource.xxx.com'"></sql>
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

A couple of other things:

Were there any messages generated by the query?

You are missing an open square bracket "[" in your index expression.
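For reference, the second addexp setting from the original post with the bracket restored might look like the line below. This is only a sketch: it assumes the "[" belongs immediately after ">>", so that the whole character list forms a single set closed by the existing "]".

```
<!-- Assumption: the missing "[" opens the set right after ">>" -->
<sql "set addexp='>>[\alnum\+\_\x24\x27\x2E\xa0-\xff]{2,99}'"></sql>
```

The index would presumably need to be dropped and re-created after changing the expression, since the settings quoted in the original post were applied before the index was built.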

You might want to use LIKE instead of LIKEP, as it doesn't look as if the ranker would make much difference.

You could also use an index expression such as:

>>=http://\P=[^/]+

to index the hostname only.
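
A minimal sketch of how that suggestion could be applied, reusing the Vortex `<sql>` syntax and the table from the original post. The index name xhosts is made up for illustration, and the sketch assumes this expression replaces the earlier addexp settings rather than being added alongside them.

```
<!-- Clear the existing index expressions, as in the original post -->
<sql "set delexp=0"></sql>
<!-- Index only the text after "http://" up to the first "/" (the hostname) -->
<sql "set addexp='>>=http://\P=[^/]+'"></sql>
<!-- xhosts is a hypothetical index name -->
<sql "create metamorph index xhosts on urltable(url)"></sql>
<!-- Query with LIKE instead of LIKEP, as suggested above -->
<sql "select url AS Url from urltable WHERE url like 'opensource.xxx.com'"></sql>
```

With only hostnames indexed, a URL that merely mentions the host in its path or query string should no longer contribute a matching term, though this has not been verified against the 30-million-row table.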
John Turnbull
Thunderstone Software