Texis ranking engine

kevin31
Posts: 74
Joined: Fri Nov 01, 2002 12:45 pm

Post by kevin31 »

We are finding that no row in our Texis DB is ever ranked higher than approximately 800 in a query that uses a Metamorph index. For example, the results of a query on 'blogger poodle' might look something like this:
$rank  TextBlob
------+--------------------------------------+
791    blogger poodle blogger poodle poodle
787    blogger poodle
395    blogger blogger
395    poodle poodle poodle

No matter how precisely the query matches the contents of the text field, we cannot get the ranks to go any higher. We have also tried exact phrases enclosed in quotes.

The above query was done directly from the TSQL command prompt with no changes to apicp settings.

Our vortex script showed very similar results with these values set:
<sql db="$Datadb" "set likepleadbias=0"></sql>
<sql db="$Datadb" "set likepdocfreq=500"></sql>
<sql db="$Datadb" "set likeporder=750"></sql>
<sql db="$Datadb" "set likeptblfreq=750"></sql>
<sql db="$Datadb" "set likepproximity=750"></sql>
<apicp alequivs 1>
<apicp suffixproc 1>
<apicp alpostproc 0> <!-- alpostproc = allow post-processing; 0 turns it off -->
<apicp alintersects 1>
<apicp minwordlen 5>
<apicp qminwordlen 2>
<sql db="$Datadb" "set allinear=0"></sql>

Is there a way we can adjust the ranking engine to improve this?

Our version of Texis is:
Texis Web Script (Vortex) Copyright (c) 1996-2001 Thunderstone - EPI, Inc.
Commercial Version 3.01.992447526 of Jun 13, 2001 (i686-intel-winnt-32)
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Part of the rank calculation involves the table frequency and document frequency of each term. These factors are never "perfect"; they are asymptotic, because the range of possible values is so wide (nearly unbounded). Table frequency approaches "perfect" when the term occurs exactly once in the table; doc frequency approaches "perfect" when the document consists entirely of that term. Even when both happen, the rank may not be a perfect 1000, both because of averaging across multiple search terms and because the table/doc frequency computation is geared towards typical values, not towards obtaining exactly 1000 in a given circumstance.

You should be able to obtain a rank of 1000 by setting the table and doc frequency weights to 0 (and leadbias as well, if you're searching for more than one term). Keep in mind that these factors will then not affect rank for non-1000 results either (i.e. ranks may be coarser).
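
As a rough sketch of what that might look like in the original poster's Vortex script, reusing the same <sql> "set" form and the $Datadb variable from the first post. It assumes that likeptblfreq and likepdocfreq are the table and doc frequency weights meant here; it has not been verified against version 3.01, and the order and proximity values are simply carried over from the first post:

```
<!-- Sketch only: zero the frequency-based weights so they no longer pull the rank below 1000 -->
<sql db="$Datadb" "set likeptblfreq=0"></sql>   <!-- table frequency weight off -->
<sql db="$Datadb" "set likepdocfreq=0"></sql>   <!-- doc frequency weight off -->
<sql db="$Datadb" "set likepleadbias=0"></sql>  <!-- lead bias off, for multi-term queries -->
<!-- Order and proximity are left as in the first post, so they still affect ranking -->
<sql db="$Datadb" "set likeporder=750"></sql>
<sql db="$Datadb" "set likepproximity=750"></sql>
```

With the frequency weights at 0, only the remaining factors (here order and proximity) separate the results, so more rows may end up with the same rank.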