Crawling Issues in Few Sites

hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

There are a few sites where the crawl shows no successes at all; we get only failures. We have crawled these sites both ways: with the customised code in the dowalk script and without it. Neither option works for them.
Can you suggest a way out, please?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Increase the verbosity on the crawl and check for error messages.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You'd want verbosity 4.
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

I have already set the verbosity to 4, but had no success. I also created a new profile to get it crawled, but that proved futile as well.
Please help.
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

I am still waiting to hear from you. Please tell me whether I need to change some specific setting in Webinator. What could cause some sites not to get crawled at all? Is the code we write for them the reason, or do the Webinator settings need to be reviewed again?
Please let me know.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

We would need more specifics as to how it is not working. Can you point us at the sites and the settings you are using that aren't working with a non-customized crawler?
John Turnbull
Thunderstone Software
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

The site is shopping.com.
The code for it is:
<if $baseUrl eq "http://shopping.com"><!-- Not Working -->
<rex '>><div class\="contentIndent">\P=!</h1>+\F</h1>' $rawdoc><$StoryTitle=$ret>
<rex '>><div class\="prodImage">\P=!</div>+\F</div>' $rawdoc><$ImgRowData=$ret>
<rex '>><div id\="long" style\="display: block;">\P=!</div>+\F</div>' $rawdoc><$StoryRowDescription=$ret>
<sandr '>><div class\="boxMid">=!<div class\="boxBtmRt">+<div class\="boxBtmRt">' '' $rawdoc><$rawdoc=$ret>
<sandr '>><div id\="saiArea">=!</iframe>+</iframe>' '' $rawdoc><$rawdoc=$ret>
<$ImgRowData=$StoryRowDescription>
<$SiteName ="Shopping">
<filterStory>
</if>
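
For readers unfamiliar with rex: assuming each `<rex>` line above is meant to capture the text between an opening marker and the next closing tag (a reading of the expressions that the thread does not confirm), the rough Python equivalent below can be used outside Webinator to check whether those markers occur in a fetched page at all. The helper name `extract_between` and the sample HTML are hypothetical and only illustrate the idea.

```python
import re


def extract_between(rawdoc, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # Escape both markers so they are matched literally, not as regex syntax.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # DOTALL lets the captured text span several lines of HTML.
    match = re.search(pattern, rawdoc, re.DOTALL)
    return match.group(1) if match else None


# Made-up sample page, not taken from the real site.
sample = '<div class="contentIndent"><h1>Canon EOS 400D</h1></div>'
title = extract_between(sample, '<div class="contentIndent">', '</h1>')
print(title)
```

If a helper like this returns `None` for a saved copy of the page, the markers in the expressions may not match the site's current HTML. That would explain empty fields, though not necessarily a failed fetch.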
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

John, this site has pages like this:

http://shopping.com/xPF-Canon-EOS-400D- ... -55mm-Lens

They have no extensions at all. The pages contain links that do not have any extension either. When I crawled the site I got no successes, only failures. Is there a way to crawl this kind of page? Please tell me what changes I need to make in my settings to get this site crawled.
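
To check how many of a site's links really lack an extension, a small helper like the one below can be run over a list of URLs. This is a sketch in Python, not part of Webinator. The name `url_extension` and the sample URLs are made up for illustration, and nothing in this thread establishes that missing extensions are what makes the walk fail.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url):
    """Return the lowercase extension of the last path segment, or '' if none."""
    # Only the path matters here; query string and fragment are ignored.
    path = urlsplit(url).path
    last_segment = posixpath.basename(path)
    return posixpath.splitext(last_segment)[1].lower()


# Hypothetical URLs for illustration only.
urls = [
    "http://shopping.com/xPF-Some-Product",
    "http://example.com/page.HTML?x=1",
]
for url in urls:
    print(url, repr(url_extension(url)))
```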
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Verbosity 4 won't change the walk but it will help you diagnose problems. Make sure you do a new walk instead of refresh when trying to debug such problems.

Go to List/Edit urls. Look up the page that links to that page to get the details about it. Then click the "children" link on the detail page. Look for the expected URL. If it's clickable, it's in the database. If not, it's not in the database and, with verbosity at 4, there should be a reason to the right indicating why.
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

Mark,

All the pages I crawled for this site failed; there were no successes at all. When none of the pages gets stored in the database, how can I search in List/Edit urls?
Please tell me what specific settings are needed to get this site crawled.
I even tried making a new profile, but had no success.
Please help.