Crawling Issues in Few Sites

Posted: Fri Aug 17, 2007 3:11 am
by hiti
There are a few sites where the crawl shows no successes at all; we get only failures for them. We have crawled these sites both ways: with the customised code in the dowalk script, and without it. Neither option works for these sites.
Can you suggest a way out, please?

Posted: Fri Aug 17, 2007 9:51 am
by John
Increase the verbosity on the crawl and check for error messages.

Posted: Fri Aug 17, 2007 10:43 am
by mark
You'd want verbosity 4.

Posted: Sat Aug 18, 2007 12:14 am
by hiti
I have already set the verbosity to 4, but had no success. I also made a new profile to get this site crawled, but that proved futile too.
Please help.

Posted: Mon Aug 20, 2007 8:02 am
by hiti
I am still waiting to hear from you guys. Please guide me if I have to change some specific setting in Webinator. What could be the reason that some sites do not get crawled at all? Is the code we write for them the reason, or do the Webinator settings need to be reviewed again?
Please let me know.

Posted: Mon Aug 20, 2007 9:28 am
by John
We would need more specifics as to how it is not working. Can you point us at the sites and the settings you are using that aren't working with a non-customized crawler?

Posted: Mon Aug 20, 2007 9:31 am
by hiti
The site is shopping.com.
The code for it is:
<if $baseUrl eq "http://shopping.com"><!-- Not Working -->
<rex '>><div class\="contentIndent">\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>
<rex '>><div class\="prodImage">\P=!</div>+\F</div>' $rawdoc><$ImgRowData=$ret>
<rex '>><div id\="long" style\="display: block;">\P=!</div>+\F</div>' $rawdoc><$StoryRowDescription=$ret>
<sandr '>><div class\="boxMid">=!<div class\="boxBtmRt">+<div class\="boxBtmRt">' '' $rawdoc><$rawdoc=$ret>
<sandr '>><div id\="saiArea">=!</iframe>+</iframe>' '' $rawdoc><$rawdoc=$ret>
<$ImgRowData=$StoryRowDescription>
<$SiteName ="Shopping">
<filterStory>
</if>

Posted: Mon Aug 20, 2007 9:33 am
by hiti
John, this site has pages with URLs like this:

http://shopping.com/xPF-Canon-EOS-400D- ... -55mm-Lens

There are no file extensions at all, and the links on those pages have no extensions either. When I crawled the site I got only failures, no successes. Is there a way to crawl pages like this? Please tell me what changes I need to make in my settings to get this site crawled.
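
To make "no extension" concrete, here is a minimal Python sketch (not Webinator or Vortex code) that reports the extension of a URL's last path segment. The function name `url_extension` and the example.com URLs are made up for illustration; running a list of failed URLs through something like this would show whether they really all lack an extension.

```python
from urllib.parse import urlparse
import posixpath


def url_extension(url):
    """Return the extension of the last path segment (e.g. '.html'), or '' if none."""
    path = urlparse(url).path
    # Only the final path segment can carry a file extension.
    last_segment = posixpath.basename(path)
    return posixpath.splitext(last_segment)[1]


# Hypothetical URLs, for illustration only.
for url in ("http://example.com/products/camera-kit",
            "http://example.com/index.html"):
    ext = url_extension(url)
    print(url, "->", ext if ext else "(no extension)")
```

If the failures do line up with extensionless URLs, that points at an extension-based filter in the walk settings rather than at the customised code, though that is only a guess until the verbosity 4 output gives a reason.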

Posted: Mon Aug 20, 2007 10:46 am
by mark
Verbosity 4 won't change the walk but it will help you diagnose problems. Make sure you do a new walk instead of refresh when trying to debug such problems.

Go to List/Edit urls. Look up the page that links to the missing page to get its details, then click the "children" link on the detail page. Look for the expected URL. If it's clickable, it's in the database. If not, it's not in the database and, with verbosity at 4, there should be a reason to the right indicating why.

Posted: Tue Aug 21, 2007 9:31 am
by hiti
Mark

All the pages that I crawled for this site failed; there were no successes at all. When none of the pages are stored in the database, how can I search in List/Edit urls?
Please tell me what specific settings are needed to get this site crawled.
I even tried making a new profile, but had no success.
Please help.