Page 1 of 1

sandr in query help

Posted: Sat Mar 13, 2004 11:29 pm
by phoebe
Hi
I am trying to extract the actual number of host names in the database from the Url field.
So I want to do something like:
select distinct sandr('>>http://=!/+/=.*=>>=','',Url) from html;
If I remove the last part (=.*=>>=), the above line will give me the domain name, but with the trailing pathname as well. Eg:
http://www.thelancet.com/era
becomes www.thelancet.comera
when all I want is
www.thelancet.com

Is there an easy way to do this?

Thanks,
P.

sandr in query help

Posted: Sun Mar 14, 2004 10:32 am
by phoebe
nevermind. I figured it out.

sandr in query help

Posted: Sun Mar 14, 2004 10:44 am
by John
sandr('>>http://=[^/]+.*', '\2', Url)

should do it. The \2 is the second subexpression, [^/]+, which should be the hostname. For single characters saying [^/]+ is more efficient than !/+.