getting top sites

Posted: Wed Nov 03, 2004 10:05 pm
by KMandalia
We are crawling some 50-60 websites. I am generating a report on the query log table that gives me the top search terms, top search hits (web pages), etc.

What I want to see is the top 5 or 10 websites whose pages were clicked by users.

I am not sure whether the querylog or html tables can help with this. How do I implement this feature?

Posted: Wed Nov 03, 2004 10:37 pm
by mark
I think you're looking for something like this:
"select count(*) Count,sandr('http://=[^/]+.*','\1\2',Query) Site from querylog where Info matches 'what=u%' group by sandr('http://=[^/]+.*','\1\2',Query) order by 1 desc"

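For readers who don't have the REX syntax in their head, here is a rough Python sketch of what the query above computes: trim each clicked URL down to its scheme and host, count the rows per site, and sort by count. It uses Python's re module, not Texis REX, and it assumes that rows whose Info matches 'what=u%' are click records with the clicked URL in Query, as the query implies. The name top_sites and the sample URLs are invented for illustration.

```python
import re
from collections import Counter

# Capture the scheme and the host; the rest of the URL is ignored.
# This mirrors keeping \1\2 (scheme plus host) in the sandr call.
SITE_RE = re.compile(r'^(http://)([^/]+).*$')

def top_sites(clicked_urls, n=10):
    """Return the n most-clicked sites as (site, count) pairs."""
    counts = Counter()
    for url in clicked_urls:
        m = SITE_RE.match(url)
        if m:  # values that do not start with http:// are skipped in this sketch
            counts[m.group(1) + m.group(2)] += 1
    # most_common sorts by count, highest first, like "order by 1 desc"
    return counts.most_common(n)

# Hypothetical click data standing in for the Query column
sample = [
    "http://www.somesite.com/a.html",
    "http://www.somesite.com/b.html",
    "http://other.org/index.html",
]
print(top_sites(sample, 5))
```

With the sample data above, http://www.somesite.com comes first with a count of 2.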

Posted: Mon Nov 08, 2004 10:40 am
by KMandalia
It did the job, but I don't understand how you formed the expression.

Since we are walking domains, I want to drop the www and match only the somesite.com part of http://www.somesite.com. How do I do that?

Posted: Mon Nov 08, 2004 12:38 pm
by mark
See the docs for sandr and rex. Let us know if you have specific syntax questions.

Change both sandr calls to:

sandr('http://=www\.?[^/]+.*','\3',Query)
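
On how the expression is formed: if I read the REX documentation correctly, each repetition operator (=, ?, +, *) closes a subexpression, and \1, \2, and so on in the sandr replacement refer to those subexpressions in order. Read that way, the pattern above splits into http:// (1), an optional www. (2), the rest of the host (3), and everything after it (4), so '\3' keeps only the somesite.com part. The Python sketch below shows the same idea with Python's re module, which uses parentheses for groups instead. It is an illustration, not Texis code, and bare_host is a made-up name.

```python
import re

# The optional "www." prefix is matched outside the capture group,
# so group 1 holds only the remainder of the host name.
HOST_RE = re.compile(r'^http://(?:www\.)?([^/]+).*$')

def bare_host(url):
    """Return the host without a leading "www.", or the input unchanged if it does not match."""
    m = HOST_RE.match(url)
    return m.group(1) if m else url

print(bare_host("http://www.somesite.com/page.html"))
print(bare_host("http://somesite.com/"))
```

Both calls print somesite.com, so clicks on the www and non-www forms of a site would be counted together.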