Good news and bad news:
1) Good news: when I added the Webinator user agent to robots.txt, the indexer started recognizing the file (although it listed the exclusions twice, presumably because they were all listed under Webinator and also under *; see the sketch after this list)
2) Bad news: even with the exclusions listed, the walk is still indexing directories that shouldn't be indexed. I do notice that there's a case difference: both the file name and the link name (on our Web pages) are in mixed case (I realize that was poor planning on our part), but the error message shows them in lower case.
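For reference, here's a minimal sketch of what I mean by the double listing, using Python's urllib.robotparser just to illustrate. The robots.txt content below is an abbreviated stand-in for our actual file, and I'm not claiming Webinator parses it exactly this way; the point is that the same Disallow lines appear under both User-agent: Webinator and User-agent: *, which would account for each prefix showing up twice in the walk report:

    # Stand-in for our robots.txt: the same Disallow lines under
    # both the Webinator agent and the wildcard agent.
    import urllib.robotparser

    ROBOTS_TXT = """\
    User-agent: Webinator
    Disallow: /apps/bps2001/
    Disallow: /misc/Grants/KernDataDemog/

    User-agent: *
    Disallow: /apps/bps2001/
    Disallow: /misc/Grants/KernDataDemog/
    """

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())

    # Both agents get the same answer because the rules are duplicated.
    print(rp.can_fetch("Webinator", "http://countynet/apps/bps2001/page.htm"))  # False
    print(rp.can_fetch("OtherBot", "http://countynet/apps/bps2001/page.htm"))   # False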
Here's what the walk shows:
started 1 (932) on
http://countynet/
robots.txt excludes the following prefixes:
http://countynet/apps/bps2001/
<others, not relevant>
http://countynet/misc/Grants/AerialImages/
http://countynet/misc/Grants/KernDataDemog/
<others, not relevant>
http://countynet/NTAdmin/
http://countynet/Webinator/
156 pages fetched (12,233,748 bytes) from
http://countynet/
11 errors
5 duplicate pages
<SNIP>
The link:
http://countynet/misc/grants/kerndatademog/kern data p5.htm
had this error: Invalid url: spaces translated to %20 for retry
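For what it's worth, here's what I assume the walker means by that message: the raw link contains literal spaces, so it percent-encodes them and retries. A minimal sketch with Python's urllib.parse; the retry behavior itself is my assumption, not something out of the Webinator docs:

    from urllib.parse import quote

    # The raw link as it appears in our pages, spaces and all.
    raw = "http://countynet/misc/grants/kerndatademog/kern data p5.htm"

    # Percent-encode the spaces; presumably this is the "%20 for retry" step.
    fixed = quote(raw, safe=":/")
    print(fixed)
    # http://countynet/misc/grants/kerndatademog/kern%20data%20p5.htm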
I tried adding
/misc/grants/kerndatademog/
in the exclusions, and it showed up the same on the walk report, but the above error still came up. The weird thing is that the kerndatademog directory has about 30 files in it (all with the embedded spaces in the filenames), and only 3-4 are erroring.
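My working theory (just a guess on my part) is that the exclusion match is a case-sensitive prefix comparison, so the mixed-case entry from robots.txt never matched the lower-case URLs the walker actually reports. A toy sketch of that kind of matching; the function and list names here are mine, not Webinator's:

    # Hypothetical illustration of case-sensitive prefix exclusion.
    EXCLUDED_PREFIXES = [
        "http://countynet/misc/Grants/KernDataDemog/",  # mixed case, as in robots.txt
    ]

    def is_excluded(url: str) -> bool:
        """Return True if url starts with any excluded prefix (case-sensitive)."""
        return any(url.startswith(p) for p in EXCLUDED_PREFIXES)

    # The lower-case URL from the error message slips past the mixed-case rule:
    print(is_excluded("http://countynet/misc/grants/kerndatademog/kern data p5.htm"))  # False

    # Adding the lower-case variant, as I tried, should catch it:
    EXCLUDED_PREFIXES.append("http://countynet/misc/grants/kerndatademog/")
    print(is_excluded("http://countynet/misc/grants/kerndatademog/kern data p5.htm"))  # True

That would explain why adding the lower-case variant should have caught these URLs; since a few of them still error anyway, case alone may not be the whole story.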
Also, when I listed URLs for the walk, this was included:
http://countynet/misc/grants/KernDataDe ... ocator.asp
P.S. On the plus side, the problem seems to affect only this directory; other directories that were listed in the exclusions are now being excluded.