problems with crawl

Posted: Thu May 29, 2008 6:10 pm
by jgdoke
The crawl started with resume, got 40 of these types of errors, and cranked along slowly (~500 pages per hour) until it hit the process memory limit. Here is what it says after that limit. What is the issue? The crawl seems to keep going, just slowly.

2008-05-29 06:11:57 Process memory limit exceeded (current: 700,018,688; limit: 700,000,000) (2387)
1797 pages fetched (2,147,483,647 bytes) from http://literature.rockwellautomation.co ... egory.hcst took 20 hours 5 minutes 59 seconds
2008-05-29 06:26:02 started 1 (23862) Resume 483c45671b
Using primer: http://literature.rockwellautomation.co ... egory.hcst
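The two summary lines above can be checked with a little arithmetic. The Python sketch below parses them and computes how far the process went over its memory limit and the effective fetch rate. The helper names are hypothetical, and the regular expressions assume only the line layout quoted in this post. Note that 2,147,483,647 is exactly 2**31 - 1, the largest signed 32-bit integer, so the byte total was probably capped rather than measured. That reading is an assumption; the log itself does not say so.

```python
import re
from typing import Tuple

# Log lines copied from the post above (the URL stays truncated as in the original).
MEM_LINE = ("2008-05-29 06:11:57 Process memory limit exceeded "
            "(current: 700,018,688; limit: 700,000,000) (2387)")
FETCH_LINE = ("1797 pages fetched (2,147,483,647 bytes) from "
              "http://literature.rockwellautomation.co ... egory.hcst "
              "took 20 hours 5 minutes 59 seconds")

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_int(text: str) -> int:
    """Convert a comma-grouped number such as '700,018,688' to an int."""
    return int(text.replace(",", ""))


def memory_overshoot(line: str) -> int:
    """Return how many bytes the process was over its memory limit."""
    m = re.search(r"current: ([\d,]+); limit: ([\d,]+)", line)
    if m is None:
        raise ValueError("not a memory-limit line")
    return to_int(m.group(1)) - to_int(m.group(2))


def fetch_stats(line: str) -> Tuple[int, int, float]:
    """Return (pages, bytes, pages_per_hour) from a 'pages fetched' summary."""
    m = re.search(
        r"(\d+) pages fetched \(([\d,]+) bytes\).*took "
        r"(\d+) hours (\d+) minutes (\d+) seconds", line)
    if m is None:
        raise ValueError("not a fetch summary line")
    pages, nbytes = int(m.group(1)), to_int(m.group(2))
    hours = int(m.group(3)) + int(m.group(4)) / 60 + int(m.group(5)) / 3600
    return pages, nbytes, pages / hours


if __name__ == "__main__":
    print("over limit by", memory_overshoot(MEM_LINE), "bytes")
    pages, nbytes, rate = fetch_stats(FETCH_LINE)
    print(f"{pages} pages at {rate:.1f} pages/hour")
    if nbytes == INT32_MAX:
        # Suspicious: the total sits exactly at the 32-bit signed maximum.
        print("byte count equals 2**31 - 1: possibly a capped 32-bit counter")
```

On these two lines the sketch reports an overshoot of 18,688 bytes and about 89 pages per hour, well below the ~500 pages per hour mentioned above, assuming the summary line covers the whole 20-hour run.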

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(procpage) 4056: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English

005 /usr/local/morph3/texis/scripts/dowalk(proclinks) 3688: Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 of (temp RAM DBF) in the function pbuftofld while processing url http://literature.rockwellautomation.co ... %20English
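With 40 of these errors in the log, tallying them by function and message shows whether they are all the same fault. The Python sketch below does that for lines in the layout quoted above; the pattern and helper name are hypothetical and assume only that layout ("code script(function) line: message while processing url URL"). The sample reproduces the five lines shown in this post.

```python
import re
from collections import Counter

# Matches error lines in the layout quoted above (an assumption, not a spec).
ERROR_RE = re.compile(
    r"^(?P<code>\d{3}) \S+\((?P<func>\w+)\) (?P<line>\d+): "
    r"(?P<msg>.*?) while processing url (?P<url>.+)$")

MSG = ("Bad size 279123982 for column Ref before offset 0x4 in recid 0x0 "
       "of (temp RAM DBF) in the function pbuftofld")
URL = "http://literature.rockwellautomation.co ... %20English"

# The five error lines from the post: three from procpage, two from proclinks.
SAMPLE = [
    f"005 /usr/local/morph3/texis/scripts/dowalk({func}) {line}: {MSG} "
    f"while processing url {URL}"
    for func, line in [("procpage", 4056)] * 3 + [("proclinks", 3688)] * 2
]


def tally_errors(lines):
    """Count matching error lines per (function, message) pair."""
    counts = Counter()
    for raw in lines:
        m = ERROR_RE.match(raw.strip())
        if m:  # lines that do not fit the pattern are ignored
            counts[(m["func"], m["msg"])] += 1
    return counts


if __name__ == "__main__":
    for (func, msg), n in tally_errors(SAMPLE).most_common():
        print(f"{n:3d}  {func}: {msg}")
```

On a full log one would pass an open file object instead of `SAMPLE`; if every entry shares the same "Bad size 279123982 for column Ref" message, that points at one damaged record rather than 40 separate problems.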

Posted: Fri May 30, 2008 5:44 pm
by jgdoke
Here is the walk status from today:

10:36 am Friday 5/30/08
153,779 pages in todo
0 pages scheduled to be refreshed in the next hour
777 pages visited in the last hour (0 success/777 failed)
9,088 pages in index

4:41 pm Friday 5/30/08
196,586 pages in todo
0 pages scheduled to be refreshed in the next hour
537 pages visited in the last hour (0 success/537 failed)
9,089 pages in index

Almost 200,000 pages in todo? How can I find out which pages it still needs to visit?
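The two Friday reports already show why the queue keeps growing. The Python sketch below (the helper name is hypothetical; the figures are the ones quoted above) computes how fast the todo count rose between them: about 7,037 pages per hour over 6 hours 5 minutes, against 537 to 777 visits per hour, none of which succeeded. The queue was growing roughly 9 to 13 times faster than it was being worked.

```python
from datetime import datetime

# (timestamp, pages in todo, pages visited in the last hour) from the two
# Friday 5/30/08 status reports quoted above.
SNAPSHOTS = [
    (datetime(2008, 5, 30, 10, 36), 153_779, 777),
    (datetime(2008, 5, 30, 16, 41), 196_586, 537),
]


def todo_growth_per_hour(earlier, later) -> float:
    """Average change in the todo count per hour between two snapshots."""
    (t0, todo0, _), (t1, todo1, _) = earlier, later
    hours = (t1 - t0).total_seconds() / 3600
    return (todo1 - todo0) / hours


if __name__ == "__main__":
    earlier, later = SNAPSHOTS
    growth = todo_growth_per_hour(earlier, later)
    print(f"todo grew by about {growth:,.0f} pages/hour")
    for t, todo, visited in SNAPSHOTS:
        print(f"{t:%H:%M}: {todo:,} in todo, {visited} visited in the last hour")
```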

Posted: Mon Jun 02, 2008 9:20 am
by jgdoke
Monday morning:
203,975 pages in todo
0 pages scheduled to be refreshed in the next hour
196 pages visited in the last hour (0 success/196 failed)
9,186 pages in index

At this rate (196 pages per hour) it won't be done for about 43 days.
I need to find out which pages are in todo so I can change the walk to avoid them.
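To decide what to exclude, it helps to see which parts of a site dominate the queue. Assuming the queued URLs can be exported as a plain list, one per line (this thread does not say how to pull them from the walk database, so that step is left open), a Python sketch like the following groups them by host and leading path segments. The function names and the example.com URLs are hypothetical placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit


def prefix_of(url: str, depth: int = 1) -> str:
    """Return scheme://host/ plus the first `depth` path segments of `url`."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s][:depth]
    return f"{parts.scheme}://{parts.netloc}/" + "/".join(segments)


def top_prefixes(urls, depth: int = 1, n: int = 10):
    """Count queued URLs per prefix and return the n most common."""
    cleaned = (u.strip() for u in urls)
    return Counter(prefix_of(u, depth) for u in cleaned if u).most_common(n)


# Hypothetical stand-in for an exported todo list.
SAMPLE = [
    "http://www.example.com/docs/a.pdf",
    "http://www.example.com/docs/b.pdf",
    "http://www.example.com/search?q=1",
    "http://www.example.com/search?q=2",
    "http://www.example.com/search?q=3",
]

if __name__ == "__main__":
    for prefix, count in top_prefixes(SAMPLE):
        print(f"{count:6d}  {prefix}")
```

Raising `depth` splits a large prefix into finer groups; the prefixes at the top of the list are the first candidates to exclude from the walk.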

Posted: Mon Jun 02, 2008 1:04 pm
by John
It appears from the first message that there may be an issue with an index on one of the database tables. If you open a tech support ticket we can figure out the best way to resolve it.