Walk of folder not collecting anything.

agcorwheeler
Posts: 28
Joined: Wed Dec 15, 2004 1:31 pm

Post by agcorwheeler »

I'm having problems again. I am trying to walk a folder on our server that contains over 100,000 documents. The walk starts without errors but collects no data from the folder: all the counters remain at 0. The auto-refresh counter runs, but nothing happens. I checked the URL and it is correct. Any idea why this would happen?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

So you're saying that after a while the walk status still shows the walk as running, but there's no progress, not even the first page?

Is it a file url or http?
What other non-default settings did you use?
Does it stop if you hit stop?
Under maintenance->view texis logs, open the vortex.log file and see whether anything has been logged since you started the walk.
agcorwheeler
Posts: 28
Joined: Wed Dec 15, 2004 1:31 pm

Post by agcorwheeler »

This is a file URL, the walk does stop when I hit stop, and there was nothing in the log after I started the walk. After nearly 2 hours it finished with 2 pages. One was a directory entry with an "Unwanted prefix" error, whose path was not the complete path of the directory; the other was the directory itself (the actual complete path). So it is acknowledging the directory but not any of the 100,382 documents in it. Do you have any idea why? Here is the report from the walk:
--------------------------------------------------------
Walk started at 2004-12-22 04:31:52 (by resume)
Verbosity set to 4
JavaScript walking enabled
HTTPS walking disabled
Start fetching at file://aghesforgr/foragerd/repository/documents/Ecn/
file://aghesforgr/foragerd/repository/documents/Ecn/
Ignore urls containing any of the following:
/cgi-bin/
~
?

started 1 new (10144) on file://aghesforgr/foragerd/repository/documents/Ecn/
Process memory limit exceeded (current:166461440, limit:35000000)
1 pages fetched (5,408,454 bytes) from file://aghesforgr/foragerd/repository/documents/Ecn/
1 errors

Updating search index ...Done.
Creating spell-checker dictionaries...Done.
Verifying usability of new walk.

Walk finished at 2004-12-22 06:19:23 (took 1 hours 47 minutes 23 seconds)
Keeping database live: /usr/local/morph3/texis/ECN.41c93ccb3/db1

--------------------------------------------------------------------------------
Checking for broken hyperlinks...

The link : file://aghesforgr/foragerd/repository/documents/
Had this error: Unwanted prefix
Referenced by : file://aghesforgr/foragerd/repository/documents/Ecn
file://aghesforgr/foragerd/repository/documents/Ecn/
--------------------------------------------------------------------------------
End of report.

--------------------------------------------------------
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Process memory limit exceeded (current:166461440, limit:35000000)

indicates that the process quit because it was using too much memory. A refresh should now start much more quickly, since all the links from that directory are already stored for processing. The time was spent extracting the links from the page and checking each one against the rules. It looks as if there was also a link to the parent directory, which was not followed.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Now that it has the initial directory, change "rewalk type" to "refresh" and click Go to walk the remainder. It may also help to set your "Maximum Process Size" to unlimited.
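
To put the numbers from the "Process memory limit exceeded" line in perspective, here is a small Python sketch that parses that message and compares usage to the limit. The helper name `parse_memory_limit_line` is made up for this example, and treating the values as bytes is an assumption; the log line itself is copied from the report.

```python
import re

# The message exactly as it appears in the walk report.
LOG_LINE = "Process memory limit exceeded (current:166461440, limit:35000000)"

PATTERN = re.compile(
    r"Process memory limit exceeded \(current:(\d+), limit:(\d+)\)"
)


def parse_memory_limit_line(line):
    """Return (current, limit) as integers, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


current, limit = parse_memory_limit_line(LOG_LINE)
ratio = current / limit

if __name__ == "__main__":
    mib = 1024 * 1024  # assumes the logged values are byte counts
    print("current: %.2f MiB" % (current / mib))
    print("limit:   %.2f MiB" % (limit / mib))
    print("usage is %.2fx the limit" % ratio)
```

The process was using several times the configured limit on this one directory page, which is why raising or removing the limit is worth trying.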