Showing links that were disallowed

Posted: Thu Nov 15, 2001 4:23 am
by neil.munro
Is it possible to find out which links have been rejected by the walk (or rewalk) due to a disallowed protocol, MIME type, or anything like that?
When I walk a database with any degree of verbosity, I can see that links are disallowed, but I want to find out what they are.
(Besides going to that page and looking at the HTML code...)
...regards.

Posted: Thu Nov 15, 2001 9:40 am
by mark
http://www.thunderstone.com/texis/site/ ... =Verbosity

Verbosity level 4 should do what you want. The "error" report will contain all rejected URLs and the reasons.

Posted: Fri Nov 16, 2001 4:34 am
by neil.munro
Thanks for that... however, this report (as far as I can tell) doesn't show what I want. E.g., this is an extract from walking the database:
http://www.mywebsite.com/some-page.htm
. ........
32: TotLinks: 858, Links: 34/ 18, Good: 13, New: 1 Disallowed path(x/)
32: TotLinks: 858, Links: 34/ 6, Good: 23, New: 1 Disallowed protocol
32: TotLinks: 858, Links: 34/ 5, Good: 23, New: 1 Disallowed protocol
32: TotLinks: 858, Links: 34/ 4, Good: 23, New: 1 Disallowed protocol
32: TotLinks: 858, Links: 34/ 3, Good: 23, New: 1 Disallowed protocol
32: TotLinks: 858, Links: 34/ 2, Good: 23, New: 1 Disallowed protocol
32: TotLinks: 858, Links: 34/ 1, Good: 23, New: 1 Disallowed protocol

Sure, I can figure out the disallowed path; that is my doing. But the disallowed protocol? How can I find out what protocol it was trying to retrieve?
Is this what you meant by the error report, or is there a file somewhere?

The above was run with a verbosity of 4 or greater.
In the error table, there is only one error about the robots.txt file...

regards,

Posted: Fri Nov 16, 2001 9:50 am
by mark
Sorry, I was referring to version 4. In version 2 you need to turn verbosity up to 7 (I usually just crank it all the way to 9 when in doubt). It will then print the link it's working on first, so you can correlate each message with its link. Version 2 will not record any of those in the error table.
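
If the high-verbosity output is saved to a file, the correlation could be done with a small script rather than by eye. The following is only a rough Python sketch. The exact layout of the verbosity 7 output is not shown in this thread, so the format assumed here (each link printed on a line of its own, followed by its status lines) is a guess, and the name correlate_disallowed is hypothetical.

```python
import re

# ASSUMPTION: a line consisting only of "scheme:rest" names the link being
# processed, and any later status line containing "Disallowed ..." refers to it.
LINK_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9+.-]*:\S+)\s*$")
DISALLOWED = re.compile(r"Disallowed\s+(.+?)\s*$")


def correlate_disallowed(lines):
    """Return (link, reason) pairs, one per 'Disallowed' status line."""
    current_link = None  # most recent link line seen so far
    rejected = []
    for line in lines:
        link_match = LINK_LINE.match(line)
        if link_match:
            current_link = link_match.group(1)
            continue
        reason_match = DISALLOWED.search(line)
        if reason_match:
            rejected.append((current_link, reason_match.group(1)))
    return rejected


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real walker output.
    sample = [
        "ftp://example.com/file.zip",
        "32: TotLinks: 858, Links: 34/ 6, Good: 23, New: 1 Disallowed protocol",
        "http://example.com/x/page.htm",
        "32: TotLinks: 858, Links: 34/ 18, Good: 13, New: 1 Disallowed path(x/)",
    ]
    for link, reason in correlate_disallowed(sample):
        print(f"{link}\t{reason}")
```

If the real output interleaves lines differently, only the two regular expressions should need adjusting.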