tweaking relevance ranking

mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm


Post by mark »

Keywords can mean many things and vary by context. Please explain what functionality you're looking for when you say "IMMEDIATE".
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm


Post by KMandalia »

I think I may be able to accomplish what I need without XTREE (or maybe not :) ). When you see what I have written below, you will understand what I meant by IMMEDIATE results from the same URL.

The good thing about the algorithm below is that it breaks the streak of results from a single URL until a different URL is shown. The bad thing is that it can only hold off one URL at a time, and that's where I am looking to plug in XTREE.

Here is the deal: I am crawling 100 websites, and one website overwhelms the results all the time, either because it usually has good results or because its titles are badly formatted. For example, one site has the word 'credit' as the first thing in its titles, so someone searching for 'credit' gets bombarded with just that one website. I want to avoid this. So unless there is a big change in relevance (more than 2 percentage points), I don't want to display a site again that I have already displayed.

I want to use xtree (if I can) in the following.

FIRST RESULT

    STORE THE REX'ED URL IN A VARIABLE
    COUNT=1

FROM NEXT RECORD ONWARDS

    REX THE URL
    IF THE VARIABLE HAS THE SAME URL AND COUNT=2
        SKIP THE RECORD
    ELSE IF THE VARIABLE HAS THE SAME URL AND COUNT=1
        SHOW THE RESULT
        COUNT++
    ELSE
        SHOW THE RECORD
        UPDATE THE VARIABLE WITH THE NEW URL
        COUNT=1
    END IF
NEXT
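
For readers who want to try the logic outside of Texis, the pseudocode above can be sketched in Python roughly as follows. This is only an illustration, not Vortex code: the function names `site_of` and `limit_streaks` are made up here, and `urllib.parse` stands in for the REX step that reduces a URL to its site.

```python
from urllib.parse import urlparse


def site_of(url):
    """Return the host part of a URL (a stand-in for the REX step)."""
    return urlparse(url).netloc.lower()


def limit_streaks(urls, max_run=2):
    """Return the URLs to show, allowing at most max_run consecutive hits per site."""
    current_site = None  # the "variable" holding the last site shown
    count = 0            # how many results in a row came from that site
    shown = []
    for url in urls:
        site = site_of(url)
        if site == current_site:
            if count >= max_run:
                continue      # SKIP THE RECORD
            count += 1        # SHOW THE RESULT, COUNT++
        else:
            current_site = site   # UPDATE THE VARIABLE WITH THE NEW URL
            count = 1
        shown.append(url)
    return shown


results = [
    "http://a.com/1", "http://a.com/2", "http://a.com/3",
    "http://b.com/1", "http://a.com/4",
]
print(limit_streaks(results))
# ['http://a.com/1', 'http://a.com/2', 'http://b.com/1', 'http://a.com/4']
```

Note that the third a.com hit is dropped, but a.com is allowed again as soon as another site has appeared, which is the "can only hold off one URL" limitation described above.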
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH


Post by John »

The algorithm you have should work if you change the ORDER BY clause to `ORDER BY $rank, Url', so that results from the same site are grouped together within the same rank value. You could even use `ORDER BY $rank/50, Url' to collapse similar ranks together.
John Turnbull
Thunderstone Software
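
As an illustration of the ordering suggested above, here is a minimal Python sketch of sorting by a collapsed rank and then by URL. It is not Texis SQL; the function name `order_by_rank_bucket` and the row layout are hypothetical, and it assumes that higher ranks should come first and that the division is an integer division.

```python
def order_by_rank_bucket(rows, bucket=50):
    """Sort rows by rank collapsed into buckets (best first), then by URL."""
    return sorted(rows, key=lambda r: (-(r["rank"] // bucket), r["url"]))


rows = [
    {"rank": 980, "url": "http://b.com/1"},
    {"rank": 960, "url": "http://a.com/1"},
    {"rank": 955, "url": "http://b.com/2"},
    {"rank": 700, "url": "http://a.com/2"},
]
for row in order_by_rank_bucket(rows):
    print(row["rank"], row["url"])
# 960 http://a.com/1
# 980 http://b.com/1
# 955 http://b.com/2
# 700 http://a.com/2
```

The three rows with ranks between 950 and 999 fall into the same bucket, so within that bucket the results are grouped by URL and a streak-limiting pass sees each site's hits next to each other.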
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm


Post by KMandalia »

I tried to implement it and it works, but it skips those results entirely, and the numbers shown next to the results are skipped too.

Now I think what I need can't be done. What I need is to skip a record as needed and come back to the skipped record later. (Maybe I can put the skipped record in the xtree? In that case I somehow need to figure out how to maintain the result numbers so they don't jump around.)
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm


Post by mark »

It can be done. It's just a matter of precise bookkeeping and exporting the needed values to get back where you left off.
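
One way to picture that bookkeeping is the following Python sketch. It is only an assumption about what the bookkeeping could look like, not Vortex code and not necessarily what was meant above: skipped records are set aside in a deferred list instead of being dropped, and result numbers are assigned at display time so they never jump. The function name `reorder_with_deferred` is hypothetical.

```python
from urllib.parse import urlparse


def reorder_with_deferred(urls, max_run=2):
    """Number results sequentially, moving over-represented hits to the end."""
    shown, deferred = [], []
    current_site, count = None, 0
    for url in urls:
        site = urlparse(url).netloc.lower()
        if site == current_site and count >= max_run:
            deferred.append(url)  # set aside instead of dropping
            continue
        if site == current_site:
            count += 1
        else:
            current_site, count = site, 1
        shown.append(url)
    # Numbers come from the display order, so there are no gaps.
    return list(enumerate(shown + deferred, start=1))


results = [
    "http://a.com/1", "http://a.com/2", "http://a.com/3",
    "http://b.com/1", "http://a.com/4",
]
for number, url in reorder_with_deferred(results):
    print(number, url)
# 1 http://a.com/1
# 2 http://a.com/2
# 3 http://b.com/1
# 4 http://a.com/4
# 5 http://a.com/3
```

For paged results, the values to carry from one page to the next would be the position reached in this reordered list plus whatever is still waiting in the deferred list.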
josmani
Posts: 53
Joined: Tue Jun 03, 2003 3:38 am


Post by josmani »

I have a set of HTML pages and PDF documents in the same table. Depending on what the user searches for, I want the HTML pages to have more weight in ranking than the PDFs in search results. I don't want to sort by document type and then relevance, because then all HTML pages would appear first regardless of how they rank.

Is this possible?

PS: This is for full Texis version 6, not Webinator.
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm


Post by Kai »

It would require some edits to the script, and probably to the html table schema as well (the latter is possible since you have a full Texis license). The basic idea is to create a new column -- or usurp an existing unused one -- in the html table to store a rank bias value; let's call it `rankBias'. This can then be set during crawls to a custom value per row, depending on how much you wish to bias that row's rank up (positive) or down (negative) in results, e.g. based on the URL file extension or other factors.
A typical starting point for this value would be 10 or 20 for up-ranked rows (or -10 or -20 for down-ranked rows), and 0 for no-change rows. Then the search SQL is modified to `SELECT $rank + rankBias rawrank' instead of `SELECT $rank rawrank', and an `ORDER BY $rank + rankBias' is added, so that results are sorted by rank plus the new bias, not just the original text-query-relevance-only rank. This search SQL modification would only be done for queries where you want the special biasing.
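
To show the effect of such a bias on ordering, here is a small Python sketch. It is not Texis SQL or Vortex; the function names `rank_bias_for` and `search_order`, the extension rule, and the bias values of 10 and -10 are all assumptions chosen for illustration.

```python
def rank_bias_for(url):
    """Hypothetical crawl-time rule: up-rank HTML, down-rank PDF, else no change."""
    path = url.lower().split("?", 1)[0]
    if path.endswith((".html", ".htm")):
        return 10
    if path.endswith(".pdf"):
        return -10
    return 0


def search_order(rows):
    """rows are (url, rank) pairs; sort by rank + bias, highest first."""
    scored = [(rank + rank_bias_for(url), url) for url, rank in rows]
    return sorted(scored, key=lambda s: -s[0])


rows = [
    ("http://x.com/a.pdf", 800),
    ("http://x.com/b.html", 790),
    ("http://x.com/c.pdf", 900),
]
for score, url in search_order(rows):
    print(score, url)
# 890 http://x.com/c.pdf
# 800 http://x.com/b.html
# 790 http://x.com/a.pdf
```

Here b.html overtakes a.pdf even though its raw rank is slightly lower, while c.pdf still comes first because its raw rank is much higher. That is the difference from sorting by document type first.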

Note that there are some bugs in current versions of Texis that may preclude this from fully working (e.g. some low-rank rows may be truncated before `rankBias' can up-rank them in the ORDER BY).

We are working on integrating a general version of this feature -- customizable rank biasing -- into a near-term release of Webinator/Texis. So you could also consider contacting sales in a few months to see if/when that feature is added, and upgrading to get it.