What kind of index to use?

Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

I want to be able to search for Word, PDF, or HTML/text documents by matching their file extensions (.doc, .pdf, etc.). For better performance, do I have to include the Url field in a compound index, or should I just create a separate index on Url?
Url is a varchar field. Creating a metamorph index on it gives a warning. Is that okay?

Thanks,
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

What sort of query are you doing? A compound index probably isn't appropriate. If you are searching on both the document type and the contents, you probably want a virtual field index on Url\Content. You might want to add an index expression such as:

set addexp='\.=\alnum{1,4}>>=';

to just index the extensions, and you could then do a query such as:

WHERE Url\Content LIKEP 'query +.pdf';

to find .pdf files containing the word query.
John Turnbull
Thunderstone Software
Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

Thanks. I also have a couple of other addexp statements. Are the following expressions correct?
<sql "set delexp=0"></sql>
<sql "set addexp='\alnum{1,30}'"></sql>
<sql "set addexp='\.=\alnum{1,4}>>='"></sql>
<sql "set addexp='>>\alnum=[\alnum\+\_\x24\x27\x2E\xa0-\xff]{1,30}'"></sql>
<sql "create metamorph inverted index xdocbod on doc_category(Url\Title\Description\Keywords\Body,Prod,Catid,QPFlag,Docid,DocFlag)"></sql>

I could then run the query,
WHERE Url\Title\Description\Keywords\Body likep '$query + .pdf'
to search for PDF docs only AND
WHERE Url\Title\Description\Keywords\Body likep $query
to search all documents.

Please let me know whether the syntax is correct.

Regards,
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

There should not be a space between + and .pdf: +.pdf
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Also, your usage of likep '$query + .pdf' is incorrect. It should be something like:
<sum "%s" $query " +.pdf"><$xquery=$ret>
<sql ... likep $xquery
Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

No records are returned when I try to find only PDF documents. Any idea where I might be going wrong? When I do a search on all documents, I get plenty of results though.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

View the source of the results page and check for error/warning messages within html comments.

Also, did you drop and rebuild the metamorph index when you changed the index expressions? That's required.
Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

Viewing the source doesn't show any errors or warnings, and I do drop the index before creating it again. I am using the index expressions from my previous post to create the index. The SQL query I use is:
<sum "%s" $query " +.pdf"><$xquery=$ret>
<sql row "select id,Url,Title,Prod from doc_category
where Url\Title\Description\Keywords\Body likep $xquery
;">
and it returns no records. Is my index expression correct?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

'\.=\alnum{1,4}>>=' would be fine if you were searching just the Url field. For the virtual field you're using, you would probably want '\.=\alnum{1,4}\F\n' instead.
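
Putting the thread together, the full sequence might look roughly like the sketch below. It assumes the suggested expression replaces the earlier '\.=\alnum{1,4}>>=' one rather than being added alongside it, and that the index is dropped and rebuilt after the expressions change, as mentioned earlier. The table, index, and field names are the ones from the posts above; 'query' in the last statement is only a placeholder search word. The -- lines are annotations for the reader, so remove them if your SQL client does not accept comments.

```sql
-- Drop the existing index; it has to be rebuilt after the expressions change.
drop index xdocbod;

-- Index expression settings from the earlier post, with the extension
-- expression swapped for the one suggested for the virtual field (assumption).
set delexp=0;
set addexp='\alnum{1,30}';
set addexp='\.=\alnum{1,4}\F\n';
set addexp='>>\alnum=[\alnum\+\_\x24\x27\x2E\xa0-\xff]{1,30}';

-- Rebuild the index on the same virtual field and compound fields as before.
create metamorph inverted index xdocbod on doc_category(Url\Title\Description\Keywords\Body,Prod,Catid,QPFlag,Docid,DocFlag);

-- PDF-only search: no space between + and .pdf.
select id,Url,Title,Prod from doc_category
where Url\Title\Description\Keywords\Body likep 'query +.pdf';
```

From Vortex, the search string would still be built first and passed in as a single variable, as in the <sum> example above.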
Faiz
Posts: 109
Joined: Wed Jan 10, 2001 1:29 pm

Post by Faiz »

It works now. Thanks. Another quick question: if I want to search for documents posted within a given time period, do I just add the date field to the (compound) index when creating it, and then put a condition on it in the WHERE clause of the SQL query?

Regards,