tweaking relevance ranking

Posted: Thu Jan 27, 2005 11:58 am
by KMandalia
When people use Webinator on our site, some generic search terms (like the name of a company in a title) cause 10-20 results to be returned from the same website.

What I want to do is something like Google, where I limit results from a particular URL to a maximum of 3 (I'm not worried about showing them nested like Google does). I want to stop the discouragement our users feel when Webinator returns 10-15 top results from the same site, and the misunderstanding that the website in question is paying us to make sure the top 10 results are from their site only.

How do I accomplish this?

Posted: Thu Jan 27, 2005 2:24 pm
by mark
You'd have to modify the search script to stop showing results from the site(s) you've already seen. Search this board for "group results" for some discussion.

Posted: Thu Jan 27, 2005 4:22 pm
by KMandalia
That was my first thought. Ideally I would show a maximum of 3 results from a site and then add that site to a rex expression. Since some of the queries return thousands of results and I am ranking 500 rows, I would want to recycle through the list again.

But I have noticed that some 'not matches' clauses in the SQL statements that retrieve results slow down page loading significantly, so what effect will this trivial approach have on the speed of result delivery?

I am thinking that Webinator is not really meant to be comparable to the various search engines, but it is so powerful that you can achieve almost everything the major search engines are doing when you put a good amount of effort into it. How many of your customers use Webinator in a similar fashion to Google or Yahoo (to crawl a targeted portion of the WWW, very end-user oriented), or am I the only one?

Posted: Thu Jan 27, 2005 4:52 pm
by mark
You wouldn't use not matches, since you want to change the display in mid-query. While displaying the hits you would keep an xtree of all hosts seen so far. Before displaying a hit, look up the host in your xtree and check the count. If the count is greater than 3, don't display this hit; just <continue>.
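
The per-host cap can be sketched outside Vortex. The Python below only illustrates the logic and is not Webinator's search script: a `Counter` stands in for the xtree, `cap_per_host` and the sample URLs are made up, and it assumes the count is checked after the current hit has been counted, so "greater than 3" keeps exactly three hits per host.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_PER_HOST = 3  # assumed cap, taken from the "max 3" in the question


def cap_per_host(urls, max_per_host=MAX_PER_HOST):
    """Yield hits in rank order, skipping a host once it is over the cap."""
    seen = Counter()  # stands in for the xtree of hosts seen so far
    for url in urls:
        host = urlsplit(url).hostname
        seen[host] += 1  # count this hit for its host
        if seen[host] > max_per_host:
            continue  # the <continue> step: do not display this hit
        yield url


# Hypothetical result list, already in rank order.
hits = [
    "http://a.example/1", "http://a.example/2", "http://a.example/3",
    "http://a.example/4", "http://b.example/1", "http://a.example/5",
]
shown = list(cap_per_host(hits))
```

Because the check happens while the hits are being displayed, the query itself is unchanged; only the display loop does extra work, one lookup per hit.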

Webinator has been around longer than most other search engines, and it is one example of what can be done with our larger Texis product. Its design does what most users want or need in a site search. Since version 2 the search was opened up so that people who wanted to do specialized things or mimic other engines could do that themselves. Version 4 opened the indexer to customization as well.

Posted: Fri Jan 28, 2005 8:30 am
by KMandalia
I completely agree with the second paragraph. Webinator provides powerful functionality for core searching if you are only searching your own site. But nowadays we have to deliver what users ask for.

I am not using 'not matches' for grouping results. I was just giving an example of modifying the standard SQL calls. I will give the xtree approach a try to see how beneficial it is, considering all other factors.

Thanks for the help.

Posted: Wed Apr 20, 2005 3:42 pm
by KMandalia
I am revisiting this discussion because I never got time to implement this functionality. Now I wish to implement this grouping.

If I have 100 websites and I decide to show a maximum of 2 CONSECUTIVE results from a website, how do I accomplish that using xtree?

What I want is not to give everybody an equal chance but to limit the number of equally ranked results from a website to 2 and then switch to the next site, if there is one. The cut-off point could be a >2% change in the result rank, at which point the xtree gets cleared.

In short, I want to do something like Google. Google is essentially implementing xtree, but we are much smaller in scale, and I just don't want to display a maximum of 200 records without any concern for result relevance.
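
One way to read this rule, sketched in Python rather than Vortex: keep a per-site count and clear it whenever the rank has moved more than 2% away from the rank at which the current band started. The function name, the `(host, rank)` pairs, and the choice to measure the 2% against the first hit of the band are all assumptions made for illustration.

```python
from collections import Counter


def cap_within_rank_bands(hits, max_per_site=2, cutoff=0.02):
    """hits: (host, rank) pairs, already sorted by rank, best first.

    Returns the hits to display. Per-site counts are cleared whenever the
    rank differs from the start of the current band by more than `cutoff`
    (relative to the band's starting rank).
    """
    shown = []
    counts = Counter()  # stands in for the xtree
    band_start = None   # rank of the first hit in the current band
    for host, rank in hits:
        if band_start is None or abs(band_start - rank) > cutoff * band_start:
            counts.clear()  # new band: forget the earlier counts
            band_start = rank
        counts[host] += 1
        if counts[host] <= max_per_site:
            shown.append((host, rank))
    return shown


# Hypothetical ranks: the drop from 985 to 900 starts a new band.
sample = [
    ("a.example", 1000), ("a.example", 995), ("a.example", 990),
    ("b.example", 985), ("a.example", 900), ("a.example", 890),
]
result = cap_within_rank_bands(sample)
```

In the sample, the third `a.example` hit is dropped because it falls in the same band as the first two, while the hits at 900 and 890 are shown again because the counts were cleared at the band boundary.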

Posted: Wed Apr 20, 2005 4:38 pm
by mark
See <xtree>'s COUNT command or $ret.count.

Posted: Wed Apr 20, 2005 5:39 pm
by KMandalia
I guess if the count is only incremented for an immediately repeated insertion, then that could be the starting point. Is that actually the case?

Posted: Wed Apr 20, 2005 5:49 pm
by mark
Each count is incremented for each insertion. Clear or Flush can be used to clear the counts.

Posted: Wed Apr 20, 2005 6:02 pm
by KMandalia
The keyword I am looking for is IMMEDIATE. How can xtree help me with that?
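
For the IMMEDIATE case, a count over all insertions is more than is needed: remembering only the previous host and the length of the current run would be enough. A minimal Python sketch of that idea follows. It does not describe xtree behaviour, and `cap_consecutive` and the sample host names are hypothetical.

```python
def cap_consecutive(hosts, max_run=2):
    """Return the indexes of hits to display, allowing at most `max_run`
    hits in a row from the same host.

    Runs are counted over the original ranked list, so a skipped hit
    still extends the run it belongs to.
    """
    keep = []
    prev_host = None
    run = 0
    for i, host in enumerate(hosts):
        if host == prev_host:
            run += 1  # same host as the previous hit: the run grows
        else:
            prev_host, run = host, 1  # host changed: start a new run
        if run <= max_run:
            keep.append(i)
    return keep


# Hypothetical ranked list of hosts, best hit first.
order = ["a", "a", "a", "b", "a", "a", "a", "a"]
kept = cap_consecutive(order)
```

The same effect could presumably be had with a count per host that is cleared whenever the host changes, which is closer to the Clear suggestion above.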