tweaking relevance ranking

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

When people use Webinator on our site, some generic search terms (such as a company name that appears in a title) cause 10-20 results to be returned from the same website.

What I want to do is something like Google, where I limit results from a particular URL to a maximum of 3 (I am not worried about showing them nested like Google does). I want to stop the discouragement our users feel when Webinator returns 10-15 top results from the same site, and the misunderstanding that the website in question is paying us to make sure the top 10 results come only from their site.

How do I accomplish this?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You'd have to modify the search script to stop showing results from the site(s) you've already seen. Search this board for "group results" for some discussion.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

That was my first thought. Ideally I would want to show a maximum of 3 results from a site and then add that site to a rex expression. Since some of the queries return thousands of results and I am ranking 500 rows, I would want to cycle through the list again.

But I have noticed that some 'not matches' clauses in the SQL statements that retrieve results slow down page loading significantly, so what effect will this simple approach have on the speed of result delivery?

I am thinking that Webinator is not really meant to be comparable to the various search engines, but it is so powerful that you can achieve almost everything the major search engines do if you put a good amount of effort into it. How many of your customers use Webinator in a similar fashion to Google or Yahoo (to crawl a targeted portion of the web, very end-user oriented), or am I the only one?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You wouldn't use not matches, since you want to change the display in mid-query. While displaying the hits, you would keep an xtree of all hosts seen so far. Before displaying a hit, look up the host in your xtree and check the count. If the count is greater than 3, don't display this hit; just <continue>.
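
The per-host counting idea can be sketched outside Vortex. The Python below is only an illustration of the logic, not Webinator script code: `limit_per_host`, the `hits` list of dicts with a `url` key, and the use of `Counter` as a stand-in for the xtree are all assumptions made for the example. Because the count is checked before the current hit is recorded, the test here is "already shown 3", which should have the same effect as inserting first and then skipping when the count is greater than 3.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_PER_HOST = 3  # cap taken from the original question


def limit_per_host(hits, max_per_host=MAX_PER_HOST):
    """Yield hits in their existing rank order, skipping any hit whose
    host has already been displayed max_per_host times."""
    seen = Counter()  # stand-in for the xtree: host -> times displayed
    for hit in hits:
        host = urlsplit(hit["url"]).hostname or ""
        if seen[host] >= max_per_host:
            continue  # plays the role of <continue> in the display loop
        seen[host] += 1
        yield hit
```

With five hits from one host followed by one hit from another, only the first three from the first host and the single hit from the second would be yielded. No extra query condition is involved; the filtering happens while the rows are being displayed.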

Webinator has been around longer than most other search engines, and it is one example of what can be done with our larger Texis product. Its design does what most users want/need in a site search. Since version 2 the search has been opened up so that people who want to do specialized things or mimic other engines can do that themselves. Version 4 opened the indexer to customization as well.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I completely agree with the second paragraph. Webinator provides powerful functionality for core searching if you are only searching your own site. But nowadays we have to deliver what users ask for.

I am not using 'not matches' for grouping results. I was just giving an example of modifying the standard SQL calls. I will give the xtree approach a try to see how beneficial it is, considering all other factors.

Thanks for the help.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I was just revisiting this discussion because I never got time to implement this functionality. Now I wish to implement this grouping.

If I have 100 websites and decide to show a maximum of 2 CONSECUTIVE results from a website, how do I accomplish that using xtree?

What I want is not to give every site an equal chance but to limit the number of equally ranked results from a website to 2 and then switch to the next site, if there is one. The cut-off point could be a change of more than 2% in the result rank, at which point the xtree gets cleared.

In short, I want to do something like Google. Google is essentially implementing xtree, but we are much smaller in scale, and I just don't want to display the maximum of 200 records without any regard for result relevance.
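
To make the rule I am describing concrete, here is a rough sketch in Python rather than Vortex. Everything in it is hypothetical: the function name `group_by_rank_band`, the `(host, rank)` tuples, and the choice to measure the 2% change against the rank of the first hit in the current band are assumptions for illustration. It also simply drops the extra hits instead of recycling them later.

```python
from collections import Counter


def group_by_rank_band(hits, max_per_host=2, cutoff=0.02):
    """hits: (host, rank) pairs already sorted by descending rank.
    Keep at most max_per_host hits per host within a band of similar
    rank; start a new band (and clear the counts) when the rank moves
    more than `cutoff` away from the rank that opened the band."""
    shown = []
    seen = Counter()   # stand-in for the xtree
    band_rank = None   # rank of the first hit in the current band
    for host, rank in hits:
        if band_rank is None or abs(band_rank - rank) > cutoff * band_rank:
            seen.clear()       # same idea as clearing the xtree
            band_rank = rank
        if seen[host] >= max_per_host:
            continue           # third or later hit from this host in the band
        seen[host] += 1
        shown.append((host, rank))
    return shown
```

For example, with ranks 1000, 995, 990 from site a, 985 from site b, then 900, 895, 890 from site a, the third and seventh hits would be skipped and the counts would reset at rank 900.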
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

See <xtree>'s COUNT command or $ret.count.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I guess if the count is only incremented for immediately repeated insertions, then that could be the starting point. Is that actually the case?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Each count is incremented for each insertion. Clear or Flush can be used to clear the counts.
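
One way those two facts could be combined, sketched in Python instead of Vortex: since every insertion increments the count, clearing whenever the host differs from the previous hit leaves a count that only reflects the current run of repeats. The name `consecutive_counts` and the plain dict standing in for the xtree are assumptions for illustration only.

```python
def consecutive_counts(hosts):
    """Return (host, run_length_so_far) for each host in order.
    The counts are cleared whenever the host changes, so a count only
    reflects insertions that immediately follow one another."""
    counts = {}   # stand-in for the xtree
    prev = None
    out = []
    for host in hosts:
        if host != prev:
            counts.clear()  # same idea as Clear or Flush
        counts[host] = counts.get(host, 0) + 1
        out.append((host, counts[host]))
        prev = host
    return out
```

A display loop could then skip a hit whenever its run length exceeds the allowed number of consecutive results.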
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

The keyword I am looking for is IMMEDIATE. How can xtree help me with that?