sandr in query help

phoebe
Posts: 25
Joined: Fri Aug 01, 2003 9:29 am

Post by phoebe »

Hi
I am trying to count the distinct host names in the database, taken from the Url field.
So I want to do something like:
select distinct sandr('>>http://=!/+/=.*=>>=','',Url) from html;
If I remove the last part (=.*=>>=), the above line gives me the domain name, but with the trailing pathname attached as well. E.g.:
http://www.thelancet.com/era
becomes www.thelancet.comera
when all I want is
www.thelancet.com

Is there an easy way to do this?

Thanks,
P.
phoebe
Posts: 25
Joined: Fri Aug 01, 2003 9:29 am

Post by phoebe »

Never mind, I figured it out.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

sandr('>>http://=[^/]+.*', '\2', Url)

should do it. The \2 in the replacement refers to the second subexpression, [^/]+, which should be the hostname; the trailing .* consumes the rest of the URL so the path is dropped. For a single character, [^/]+ is more efficient than !/+.
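
For anyone who wants to check the logic outside Texis, here is a rough Python analogue of the same idea. It is only a sketch: Python's re syntax is not REX, the grouping with parentheses is my approximation of the three REX subexpressions, and host_of and the sample URLs are made-up names for illustration.

```python
import re

# Approximate Python counterpart of the REX expression '>>http://=[^/]+.*':
# group 1 is the literal prefix, group 2 the run of non-slash characters
# (the hostname), group 3 whatever follows it (the path).
HOST_RE = re.compile(r"(http://)([^/]+)(.*)")

def host_of(url: str) -> str:
    """Replace the matched URL with group 2; return the input unchanged if nothing matches."""
    return HOST_RE.sub(r"\2", url, count=1)

# Hypothetical sample data standing in for the Url column.
urls = [
    "http://www.thelancet.com/era",
    "http://www.thelancet.com/",
    "http://www.example.com/a/b",
]

# Equivalent in spirit to "select distinct ..." over the rewritten values.
distinct_hosts = sorted({host_of(u) for u in urls})
print(distinct_hosts)
```

Because group 3 swallows the path and only group 2 is kept, http://www.thelancet.com/era comes out as www.thelancet.com rather than www.thelancet.comera.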
John Turnbull
Thunderstone Software