problems with crawl

jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

The crawl started with a resume, logged 40 errors of the type shown below, and cranked along slowly (~500 pages per hour) until it hit the process memory limit. Here is what it says after that limit. What is the issue? The crawl seems to keep going, just slowly.

2008-05-29 06:11:57 Process memory limit exceeded (current: 700,018,688; limit: 700,000,000) (2387)
1797 pages fetched (2,147,483,647 bytes) from http://literature.rockwellautomation.co ... egory.hcst took 20 hours 5 minutes 59 seconds
2008-05-29 06:26:02 started 1 (23862) Resume 483c45671b
Using primer: http://literature.rockwellautomation.co ... egory.hcst

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English
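
Since the same message repeats many times, a tally by script function can show where the failures cluster. This is a minimal sketch in Python, not part of the crawler: `ERROR_RE` and `summarize_errors` are hypothetical helpers written against the line format shown above, and the sample lines are shortened copies of those log entries.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
#   005 <script path>(<function>) <line>: Bad size <n> for column <name> ...
ERROR_RE = re.compile(
    r"^(\d{3}) (\S+)\((\w+)\) (\d+): (Bad size \d+ for column \w+)"
)


def summarize_errors(lines):
    """Count matching error lines, keyed by (function, script line, message)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            _code, _script, func, lineno, message = match.groups()
            counts[(func, int(lineno), message)] += 1
    return counts


# Shortened copies of the five log lines above (URL part omitted).
sample = [
    "005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4",
    "005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4",
    "005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4",
    "005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4",
    "005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4",
]

counts = summarize_errors(sample)
for (func, lineno, message), n in counts.most_common():
    print(f"{n:3d}  {func} line {lineno}: {message}")
```

Run against the full crawl log, a tally like this would show whether every failure reports the same column and size, which is the kind of detail worth attaching to a support ticket.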

Post by jgdoke »

Here is the walk status from today:

10:36 am Friday 5/30/08
153,779 pages in todo
0 pages scheduled to be refreshed in the next hour
777 pages visited in the last hour (0 success/777 failed)
9,088 pages in index

4:41 pm Friday 5/30/08
196,586 pages in todo
0 pages scheduled to be refreshed in the next hour
537 pages visited in the last hour (0 success/537 failed)
9,089 pages in index

Almost 200,000 pages in todo? How can I find out which pages it still needs to do?
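
The two status snapshots above can be compared directly to see how fast the todo list is growing relative to how fast pages are visited. This is a rough sketch in Python using only the figures quoted above; the variable names are made up for illustration, and the year is taken from the 5/30/08 date in the status lines.

```python
from datetime import datetime

# (time of snapshot, pages in todo, pages visited in the last hour)
snapshots = [
    (datetime(2008, 5, 30, 10, 36), 153_779, 777),
    (datetime(2008, 5, 30, 16, 41), 196_586, 537),
]

(t0, todo0, _visited0), (t1, todo1, visited1) = snapshots

hours = (t1 - t0).total_seconds() / 3600   # elapsed time between snapshots
todo_growth = todo1 - todo0                # pages added to todo in that time
growth_per_hour = todo_growth / hours

print(f"elapsed: {hours:.2f} h")
print(f"todo grew by {todo_growth:,} pages ({growth_per_hour:,.0f} per hour)")
print(f"visited in the last hour: {visited1:,}")
```

If the queue is growing by thousands of pages per hour while only a few hundred are visited, and every visit fails, the walk is discovering URLs much faster than it can process them.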

Post by jgdoke »

Monday morning:
203,975 pages in todo
0 pages scheduled to be refreshed in the next hour
196 pages visited in the last hour (0 success/196 failed)
9,186 pages in index

At this rate it won't be done for 42 days.
I need to find out what pages are in todo so I can change the walk to avoid them.
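
The time estimate can be checked from the Monday figures. This is a back-of-the-envelope sketch in Python, assuming the todo count stops growing and the visit rate stays at the last reported hourly value; neither assumption is guaranteed, and the names are illustrative.

```python
pages_in_todo = 203_975     # "pages in todo" from the Monday status
pages_per_hour = 196        # "pages visited in the last hour"

hours_remaining = pages_in_todo / pages_per_hour
days_remaining = hours_remaining / 24

print(f"{hours_remaining:,.0f} hours, or about {days_remaining:.0f} days")
```

The result lands in the same range as the figure above. Since the todo list has been growing between status checks, the real time to finish could be longer.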
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

It appears from the first message that there may be an issue with an index on one of the database tables. If you open a tech support ticket we can figure out the best way to resolve it.
John Turnbull
Thunderstone Software