for one of our applications we need the the query to be fully resolved down to the last hit. However, setting likeprows to 0 incurs a substantial performance hit (10 or more seconds for results as opposed to <1s for likeprows = 200), almost the same performance if it is set to some larger nonzero value like 2000. (maxlinearrows is set to 1 as well). Once the result set is returned, re-executing is super fast (I assume the query is being cached). any way of speeding up returns of large result sets?
Most disks peak at about 100-200 operations a second, so the way to improve the results is to reduce the number of disk operations needed, which you can see caching accomplishes.
Are you returning all the results, or just a page of results sorted? If you are returning all the results then it needs to read that many records, and if they are not cached it will require disk operations.
If you are sorting the results and returning a page of say 20, then you can improve the results by using indexes for the sort, of which the most common would be a Metamorph compound index, which would include the sort key with the text field, e.g.
create metamorph inverted index MM_idx on Table(Text, Date);
End user has option of returing 'n' number of results from the search form, up to a max of 500 per page. Sort options are title and date. Title is the default sort and I used a regular index on title for the sorting function. Not sure if I can use a compound index in this particular situation, as I think you mentioned at Deerfield conference it is meant really for fixed length data. I attach the sql for you, it is easier than trying to explain it!
<SQL "set likeprows=3000"></SQL>
<SQL "set maxlinearrows=1"></SQL>
<$null = "">
<SQL SKIP=$skip MAX=$max ROW "select
id,
TITLE,
DAY,
MONTH,
YEAR,
STATUS,
CITATION,
HTMLSIZE,
RAWTEXT,
$$rank Rank from " $Table " where
SOURCE likep $Source and
TITLE likep $Title and
PUBLISHER likep $Publisher and
AUTHOR likep $Author and
SUBJECT likep $Subject and
RAWTEXT likep $Body and
KEYWORDS likep $Keywords and
REGION likep $Region and
COI likep $Coi and
COA likep $Coa and
CITATION likep $Citation and
YEAR = $Year and
LANGUAGES likep $Languages order by " $sortby>
The title sort is most likely the problem. It needs to read the records to get the titles to sort by them, which is what takes the time.
If you defined the Title as varchar(20) for example, and included that in the metamorph compound index then you would get the results sorted by the first 20 characters of the title, and it would be faster.
Would the compound index need to include all the search fields (I would think so)?? I also have my TITLE field (and others) set at varchar(255). We have much longer fields but that's about average. Will this reduce the efficiency of the compound search? Should I rebuild the table schema using lower varchar values to help the compound index?
You have one index for each field you are searching? In that case you would want to add TITLE to each index.
Yes, the longer you have the TITLE field defined then the less efficient the search will be, but the more accurate the sort. Depending on your titles you might find something in the 20-40 character range sufficient.
With your suggested solution, would a query against that table that didn't use the Date field, like
WHERE text likep ('soccer')
still use the index? tsql -V seems to indicate it does, but that's not what I'd expect so I wanted to ask.
Yes. When considering which index to use for a LIKEP or other text search operator it only looks at the first field. The first field maybe a virtual field, but it is effectively looking up to the first comma. If there are additional fields in the index after that comma that can help the query they will be used, but they do not affect whether or not the index will be used.
If I wanted to be able to sort results with multiple fields, would I create just 1 metamorph index that includes all sort keys or a separate metamorph index for each sort key?
You would create a single metamorph index with all the possible sort keys as additional fields. The order of the additional fields in the metamorph index does not impact its usage. The more data you put in there however the less the performance gain due to the amount of data being read and processed.