Lots of errors with domain walk

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Hi

I am trying to investigate why our web sites generate tens of thousands of errors.

Looking at the walk status, I find loads of lines like these:

0 pages fetched (0 bytes) from http://press.coe.int/cp/2005/636a(2005).htm
started 3 (22257) Resume 4491545e4
Temporary Error limit exceeded (current: 6, limit: 5)
0 pages fetched (0 bytes) from http://press.coe.int/cp/2005/636a(2005).htm
started 3 (22258) Resume 4491545e4
Temporary Error limit exceeded (current: 6, limit: 5)

They all seem to be related to the same document and the same error (the site http://press.coe.int does not exist anymore).

As we have lots of small sites (about 100), I am doing a domain walk.

Now here are my questions:
- How are those errors counted? Do they count as one?
- How do I best handle the deletion of a site? Should I do a full rewalk, or can I just wait for all of its pages to drop out of the index?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The best thing to do would be to go into List/Edit URLs and delete all URLs with the pattern http://press.coe.int/*.
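
To preview which URLs a pattern like that would catch, here is a minimal Python sketch. It assumes the pattern behaves like a shell-style wildcard; the exact matching rules of List/Edit URLs are not stated in this thread. The helper name `urls_to_delete` and the second sample URL are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern from the post above; "*" is assumed to match any run of characters.
PATTERN = "http://press.coe.int/*"


def urls_to_delete(urls, pattern=PATTERN):
    """Return the URLs that match the wildcard pattern (case-sensitive)."""
    return [u for u in urls if fnmatchcase(u, pattern)]


if __name__ == "__main__":
    sample = [
        "http://press.coe.int/cp/2005/636a(2005).htm",  # from the walk log
        "http://www.coe.int/index.htm",                 # hypothetical URL on another host
    ]
    for url in urls_to_delete(sample):
        print(url)
```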

It looks as if the name server is not returning a definitive "host not found" for press.coe.int, so the crawl treats the failure as a temporary error and will not remove existing URLs until it gets a definitive error. That prevents a refresh from removing results if a server is temporarily offline.

It is trying to resume a walk from the base URL that it found. However, it is trying 5 or 6 URLs and they are all failing, which would count as 5 errors.
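
If you want to check from your side what the resolver says about a host, something like the Python sketch below separates a definitive "not found" from other lookup failures. It is only an illustration of the distinction, not how the crawler itself decides; `classify_lookup` is a made-up name, and the error code reported for a missing host can vary by platform.

```python
import socket


def classify_lookup(host, resolver=socket.getaddrinfo):
    """Return 'resolves', 'not found', or 'temporary' for a host name.

    The resolver argument exists only so the function can be exercised
    without network access; by default it performs a real lookup.
    """
    try:
        resolver(host, 80)
    except socket.gaierror as exc:
        # EAI_NONAME is the resolver's definitive "no such host" answer.
        if exc.errno == socket.EAI_NONAME:
            return "not found"
        # Anything else (for example EAI_AGAIN) is treated as temporary here.
        return "temporary"
    return "resolves"


if __name__ == "__main__":
    print(classify_lookup("press.coe.int"))
```

A host that is still listed in DNS comes back as "resolves" even when no web server answers at that address, in which case the fetch itself is what fails.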
John Turnbull
Thunderstone Software
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

You are right, it is still in the DNS.

Thanks, I'll delete the URLs right away.