Good news and bad news:
1) Good news: when I added the Webinator user agent to robots.txt, the indexer started recognizing the file (although it listed the exclusions twice, presumably because they were all listed under Webinator and also under *; see the sketch after this list)
2) Bad news: even with the exclusions listed, the walk is still indexing directories that shouldn't be indexed. I do notice that there's a case difference: both the file name and the link name (on our Web pages) are in mixed case (I realize that was poor planning on our part), but the error message shows them in lower case.
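For reference, here's a minimal sketch of what I mean by the double listing, using Python's urllib.robotparser just to illustrate. The robots.txt content below is an abbreviated stand-in for our actual file, and I'm not claiming Webinator parses it exactly this way; the point is that the same Disallow lines appear under both User-agent: Webinator and User-agent: *, which would account for each prefix showing up twice in the walk report:

    # Stand-in for our robots.txt: the same Disallow lines under
    # both the Webinator agent and the wildcard agent.
    import urllib.robotparser

    ROBOTS_TXT = """\
    User-agent: Webinator
    Disallow: /apps/bps2001/
    Disallow: /misc/Grants/KernDataDemog/

    User-agent: *
    Disallow: /apps/bps2001/
    Disallow: /misc/Grants/KernDataDemog/
    """

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())

    # Both agents get the same answer because the rules are duplicated.
    print(rp.can_fetch("Webinator", "http://countynet/apps/bps2001/page.htm"))  # False
    print(rp.can_fetch("OtherBot", "http://countynet/apps/bps2001/page.htm"))   # False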
Here's what the walk shows:
started 1 (932) on
http://countynet/
robots.txt excludes the following prefixes:
http://countynet/apps/bps2001/
<others, not relevant>
http://countynet/misc/Grants/AerialImages/
http://countynet/misc/Grants/KernDataDemog/
<others, not relevant>
http://countynet/NTAdmin/
http://countynet/Webinator/
156 pages fetched (12,233,748 bytes) from
http://countynet/
11 errors
5 duplicate pages
<SNIP>
The link:
http://countynet/misc/grants/kerndatademog/kern data p5.htm
had this error: Invalid url: spaces translated to %20 for retry
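For what it's worth, here's what I assume the walker means by that message: the raw link contains literal spaces, so it percent-encodes them and retries. A minimal sketch with Python's urllib.parse; the retry behavior itself is my assumption, not something out of the Webinator docs:

    from urllib.parse import quote

    # The raw link as it appears in our pages, spaces and all.
    raw = "http://countynet/misc/grants/kerndatademog/kern data p5.htm"

    # Percent-encode the spaces; presumably this is the "%20 for retry" step.
    fixed = quote(raw, safe=":/")
    print(fixed)
    # http://countynet/misc/grants/kerndatademog/kern%20data%20p5.htm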
I tried adding
/misc/grants/kerndatademog/
in the exclusions, and it showed up the same on the walk report, but the above error still came up. The weird thing is that the kerndatademog directory has about 30 files in it (all with the embedded spaces in the filenames), and only 3-4 are erroring.
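My working theory (just a guess on my part) is that the exclusion match is a case-sensitive prefix comparison, so the mixed-case entry from robots.txt never matched the lower-case URLs the walker actually reports. A toy sketch of that kind of matching; the function and list names here are mine, not Webinator's:

    # Hypothetical illustration of case-sensitive prefix exclusion.
    EXCLUDED_PREFIXES = [
        "http://countynet/misc/Grants/KernDataDemog/",  # mixed case, as in robots.txt
    ]

    def is_excluded(url: str) -> bool:
        """Return True if url starts with any excluded prefix (case-sensitive)."""
        return any(url.startswith(p) for p in EXCLUDED_PREFIXES)

    # The lower-case URL from the error message slips past the mixed-case rule:
    print(is_excluded("http://countynet/misc/grants/kerndatademog/kern data p5.htm"))  # False

    # Adding the lower-case variant, as I tried, should catch it:
    EXCLUDED_PREFIXES.append("http://countynet/misc/grants/kerndatademog/")
    print(is_excluded("http://countynet/misc/grants/kerndatademog/kern data p5.htm"))  # True

That would explain why adding the lower-case variant should have caught these URLs; since a few of them still error anyway, case alone may not be the whole story.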
Also, when I listed URLs for the walk, this was included:
http://countynet/misc/grants/KernDataDe ... ocator.asp
P.S. On the plus side, the problem seems to affect only this directory; other directories that were listed in the exclusions are now being excluded.