indexing .com .edu .au etc...

resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

What's the best way to index the extensions, such as .com, .edu, .au, .uk etc. in the Url field using Version 2.52/2.56?

Currently we are using addexp to index C++, with the Vortex script shown below. This script does not include Url.

Can I add Url to the script (Url\Title\Meta\Body) and add more lines to the same script to index the dot, such as:

<sql "set addexp = '.com'"></sql>
<sql "set addexp = '.edu'"></sql>
<sql "set addexp = '.au'"></sql>
<sql "set addexp = '.uk'"></sql>

Or a single line

<sql "set addexp = '.com .edu .au .uk'"></sql>

Or does it need a separate script?

Or is there a better way to do it?

Can addexp be used on a populated database, or should it be a new one?

Here is the script currently being used at the time a database is created:

<SCRIPT LANGUAGE=vortex>

<TIMEOUT = 18000>
<H4>Time Exceeded</H4>
Your query exceeded the time limit.
</TIMEOUT>

<DB = /db>

<A name=main>
<user=_SYSTEM>
<sql "drop index xhtmlbod"></sql>
<sql "set addexp = 'C\+\+'"></sql>
<sql "create metamorph inverted index xhtmlbod on html(Title\Meta\Body)"></sql>
</A>

</SCRIPT>

Thanks

Mike Clark
mark
Site Admin
Posts: 5515
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Yes, you can add the Url field to the metamorph index. Make sure you also add it in your search script. You will need either an individual addexp for each extension, with the dot escaped, or one expression that matches anything like them:

<sql "set addexp = '\.com'"></sql>
<sql "set addexp = '\.au'"></sql>

or

<sql "set addexp = '>>\.=\alpha{2,3}\F[^\alpha]'"></sql>

If you want to search by Url only, you should create a new metamorph index on just Url and just search Url. In that case you would not need any special expressions and could just search for "com" or "au" instead of ".com" or ".au".
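
As a rough sketch of the Url-only approach, assuming the same html table and database as the script in the first post. The index name xhtmlurl is made up for illustration and is not from the original script:

```vortex
<A name=main>
<user=_SYSTEM>
<!-- xhtmlurl is a hypothetical index name; no addexp is set for this index -->
<sql "create metamorph inverted index xhtmlurl on html(Url)"></sql>
</A>
```

This leaves the existing xhtmlbod index untouched, so the Title\Meta\Body searches keep working as before.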
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

Thanks, Mark

So then, is this the way to do it?

<A name=main>
<user=_SYSTEM>
<sql "drop index xhtmlbod"></sql>
<sql "set addexp = 'C\+\+'"></sql>
<sql "set addexp = '\.com'"></sql>
<sql "set addexp = '\.au'"></sql>
<sql "set addexp = '\.uk'"></sql>
<sql "create metamorph inverted index xhtmlbod on html(Url\Title\Meta\Body)"></sql>
</A>


Mike
mark
Site Admin
Posts: 5515
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Yes. That's one way to do it. Searches will then find occurrences of ".com" within the text as well as in the URLs.
resume.robot
Posts: 68
Joined: Sat Jan 13, 2001 1:23 am

Post by resume.robot »

Thanks again

We would then modify the search statement to something like:

<SQL SKIP=$skip MAX=100
"select 'http://' + Url Url, Title, Body, length(Body) Size, id, Visited from html where Title\Meta\Body likep $query and (Title\Meta\Body like 'good,stuff') and not (Title\Meta\Body like '(bad,stuff)') and Url like '.au,.nz'">
mark
Site Admin
Posts: 5515
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

No. If your metamorph index is on Url\Title\Meta\Body your query should be on Url\Title\Meta\Body. That's why I suggested creating a separate index on only Url if you want to search Url by itself.

Use the advice in the last paragraph of message #2.
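
Put together, the two kinds of search might look roughly like this. It is only a sketch: it assumes the combined index was rebuilt on Url\Title\Meta\Body and that a second metamorph index exists on Url alone. The select list is copied from the question, and the loop bodies are left empty.

```vortex
<!-- Combined search: the field list matches the index on Url\Title\Meta\Body -->
<SQL SKIP=$skip MAX=100
"select 'http://' + Url Url, Title, Body, length(Body) Size, id, Visited from html where Url\Title\Meta\Body likep $query">
</SQL>

<!-- Url-only search: assumes a separate index on Url, so the bare term 'au' is used -->
<SQL SKIP=$skip MAX=100
"select 'http://' + Url Url, Title, Body, length(Body) Size, id, Visited from html where Url like 'au'">
</SQL>
```

In both statements the field list in the where clause matches the field list of the index it is meant to use, which is the point of the reply above.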