latest dowalk doesn't crawl pages without extensions

hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

Mark,
Please let me know how to request a tech support ticket. One more thing: I was crawling the site http://linux.slashdot.org and added http://slashdot.org/ as an extra domain, but had no success.
Can you help me with that?
Thanks,
Hiti
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Click on Tech Support at the top of this page and fill in the form.

Enter the Extra Domain as just slashdot.org, without the http:// prefix or the trailing slash.
John Turnbull
Thunderstone Software
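
For anyone building a list of extra domains from full URLs, here is a minimal Python sketch of the cleanup described above. It is not part of the Thunderstone software, and the helper name `to_extra_domain` is hypothetical; it simply reduces a URL to the bare host name.

```python
from urllib.parse import urlparse

def to_extra_domain(entry: str) -> str:
    """Reduce a URL or host string to a bare host name (hypothetical helper)."""
    entry = entry.strip()
    # urlparse only fills in the host when a "//" separator is present,
    # so add one for inputs that are already bare host names.
    if "://" not in entry:
        entry = "//" + entry
    return urlparse(entry).hostname or ""

print(to_extra_domain("http://slashdot.org/"))  # slashdot.org
print(to_extra_domain("slashdot.org"))          # slashdot.org
```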

Post by hiti »

I am writing customized code for all my sites. Is there a way to match a single quote inside the regular expression? I have come across many sites that use single quotes in their HTML, so I want to know how to use single quotes in the expression.

Would a statement like the one below work?

<rex '>><h1 class\='newsheadlinearticle'>\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>

Post by John »

No. The best way is to use the hex escape for a single quote, \x27.
John Turnbull
Thunderstone Software

Post by hiti »

OK, is the statement below correct?

<rex '>><h1 class\=\x27'newsheadlinearticle\x27'>\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>

I wrote the expression to match this markup: <h1 class='newsheadlinearticle'>Test Title</h1>

Post by John »

No, it would be:

<rex '>><h1 class\=\x27newsheadlinearticle\x27>\P=!</h1>+' $rawdoc><$StoryTitle=$ret>
John Turnbull
Thunderstone Software
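
As a rough illustration of the same idea outside Vortex, the sketch below uses Python's `re` module, which also accepts `\x27` as a single quote. This is an analogy only and not REX syntax: a lookbehind stands in for `\P`, `[^<]+` is a simplification of `!</h1>+`, and the function name `extract_title` is hypothetical.

```python
import re

# \x27 is the hex escape for a single quote, so the pattern itself
# never needs a literal ' character.
TITLE_RE = re.compile(
    r"(?<=<h1 class=\x27newsheadlinearticle\x27>)"  # opening tag, excluded from the match
    r"[^<]+"                                        # title text up to the next tag
)

def extract_title(rawdoc: str):
    """Return the headline text, or None if the tag is not found."""
    m = TITLE_RE.search(rawdoc)
    return m.group(0) if m else None

print(extract_title("<h1 class='newsheadlinearticle'>Test Title</h1>"))  # Test Title
```

Writing the quote as `\x27` keeps the pattern free of quote characters, which matters in Vortex because the whole expression is itself wrapped in single quotes.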