Crawling Issues in Few Sites

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You need to escape the = characters, which are special to rex:
...class\=\x27...
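
As a rough illustration of the escaping Mark describes, here is a minimal Python sketch. The helper name `escape_for_rex` is hypothetical and is not part of Webinator or Texis. It covers only the two cases visible in his example (a backslash before `=`, and a single quote written as `\x27`). Other characters may also be special to rex, so treat this as a sketch and not a complete escaper.

```python
def escape_for_rex(literal: str) -> str:
    """Hypothetical helper: escape a literal string for use in a rex pattern.

    Assumption: only the two substitutions shown in the thread are applied.
    Other rex-special characters are NOT handled here.
    """
    # '=' is special to rex, so prefix it with a backslash.
    escaped = literal.replace("=", "\\=")
    # Write a single quote as the hex escape \x27, as in Mark's example.
    escaped = escaped.replace("'", "\\x27")
    return escaped


if __name__ == "__main__":
    print(escape_for_rex("class='"))
```

Running it on `class='` produces the same form as the fragment in Mark's reply.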
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

Thanks a bunch, Mark. You caught my error. I left out that slash by mistake. The site now gets crawled perfectly. Thanks again.
hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

Mark, I want to clear up one more thing. The page contains this tag:
<h3 style="font-size: 14px; display: inline; font-weight: normal; margin: 0; padding: 0; clear: none; ">

For this I wrote the following rex expression:

<rex '>><h3 style\="font-size\: 14px\; display\: inline\; font-weight\: normal\; margin\: 0\; padding\: 0\; clear\: none\; ">\P=!</h3>+\F</h3>' $rawdoc><$StoryTitle=$ret>
I have escaped the colon, semicolon, and equals sign. Is that correct?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

hiti
Posts: 26
Joined: Tue Aug 07, 2007 3:37 am

Post by hiti »

Thanks, Mark. It worked without the escaping.
neetu
Posts: 9
Joined: Wed Aug 22, 2007 1:07 am

Post by neetu »

I have a strange problem while crawling sites. When I started the crawl, the success and failure page counts increased. The sites were crawled and everything worked fine. But then I noticed in the Webinator walk status that the success count was decreasing: it dropped from 1079 to 609, and the same thing happened with the failed pages.
The walk status looked something like this:
47 pages in todo
0 pages scheduled to be refreshed in the next hour
9,209 pages visited in the last hour (519 success/8,690 failed)
3,695 pages in index

Really weird. Can you tell me a possible reason for this strange behavior?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Are you talking about the line of pages visited in the last hour? If so, it is possible that you are crawling a slower site, or a slower portion of the site (e.g. dynamically generated pages), so the walk rate has decreased, and with it the number of successful and failed pages in the last hour.
John Turnbull
Thunderstone Software
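
To make the "last hour" explanation concrete, here is a minimal Python sketch of a trailing-window counter. It is not how Webinator is implemented. The class name `LastHourCounter` and the fetch rates (one page every 4 seconds, then one every 8 seconds) are assumptions chosen only for illustration. It shows that a count over the trailing hour can go down even though the crawl keeps fetching pages, because older fetches fall out of the window.

```python
from collections import deque


class LastHourCounter:
    """Hypothetical counter of events within a trailing time window."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = deque()  # timestamps in seconds, oldest first

    def record(self, timestamp):
        self.events.append(timestamp)

    def count(self, now):
        # Drop events that are no longer inside the trailing window.
        cutoff = now - self.window
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events)


def simulate():
    counter = LastHourCounter()
    # First hour (assumed rate): a fast site, one page every 4 seconds.
    for t in range(0, 3600, 4):
        counter.record(t)
    fast = counter.count(now=3599)
    # Second hour (assumed rate): a slower site, one page every 8 seconds.
    for t in range(3600, 7200, 8):
        counter.record(t)
    slow = counter.count(now=7199)
    return fast, slow


if __name__ == "__main__":
    fast, slow = simulate()
    print(fast, slow)
```

In this sketch the total number of pages fetched only grows, but the trailing-hour figure falls once the faster hour slides out of the window. That matches a "visited in the last hour" line dropping while "pages in index" does not.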