latest dowalk doesn't crawl pages without extensions

Posted: Fri Aug 31, 2007 9:16 am
by hiti
Mark,
Please let me know how to request a tech support ticket. One more thing: I was crawling the site http://linux.slashdot.org and added http://slashdot.org/ as an extra domain, but had no success.
Can you help me with that?
Thanks
Hiti

Posted: Fri Aug 31, 2007 9:45 am
by John
Click on Tech Support at the top of this page and fill in the form.

The Extra Domain should be just slashdot.org; you should not include the http:// prefix or the trailing slash.
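
To illustrate the point above, here is a minimal Python sketch that reduces a pasted URL to the bare host name before it is entered as an Extra Domain. The helper name `to_extra_domain` is hypothetical and is not part of dowalk; it only shows the kind of value the setting expects.

```python
from urllib.parse import urlsplit


def to_extra_domain(value: str) -> str:
    """Reduce a URL or bare host name to just the host name (hypothetical helper)."""
    value = value.strip()
    # urlsplit only fills in the host when a scheme separator is present,
    # so prefix bare host names with "//" to have them parsed as a host.
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname or ""
    return host.lower()


print(to_extra_domain("http://slashdot.org/"))  # slashdot.org
```

A value that is already a bare host name, such as slashdot.org, passes through unchanged.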

Posted: Tue Sep 04, 2007 9:41 am
by hiti
I am writing customised code for all my sites. Is there any way to match a single quote in a regular expression? I have come across many sites that use single quotes in their HTML, so I want to know how to use single quotes in the regular expression.

Would a statement like the one below work?

<rex '>><h1 class\='newsheadlinearticle'>\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>

Posted: Tue Sep 04, 2007 9:58 am
by John
No, the best way is to use the hex escape for single quote, \x27.

Posted: Wed Sep 05, 2007 9:25 am
by hiti
OK. Is the statement below correct?

<rex '>><h1 class\=\x27'newsheadlinearticle\x27'>\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>

I wrote the expression to match this HTML: <h1 class='newsheadlinearticle'>Test Title</h1>

Posted: Wed Sep 05, 2007 9:52 am
by John
No, it would be:

<rex '>><h1 class\=\x27newsheadlinearticle\x27>\P=!</h1>+' $rawdoc><$StoryTitle=$ret>
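
For readers more familiar with other regex engines, the same hex-escape idea can be sketched in Python. This is only an analogue: Python's `re` syntax is not REX, so the `>>`, `\P`, `=!` and `+` operators above are not reproduced, and the names `TITLE_RE` and `story_title` are made up for the example. What carries over is that `\x27` stands in for the single quote, so the pattern can sit inside a single-quoted string.

```python
import re

# \x27 is the hex escape for a single quote, so the pattern text itself
# contains no literal quote characters and can be single-quoted safely.
TITLE_RE = re.compile(r'<h1 class=\x27newsheadlinearticle\x27>(.*?)</h1>', re.DOTALL)


def story_title(rawdoc: str):
    """Return the text of the first matching headline, or None if absent."""
    match = TITLE_RE.search(rawdoc)
    return match.group(1) if match else None


sample = "<h1 class='newsheadlinearticle'>Test Title</h1>"
print(story_title(sample))  # Test Title
```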