On one of our servers, we have a robots.txt file that has the following lines:
User-agent: *
Disallow: /
When Webinator 4.1 parses the robots.txt file, I get the following output:
The following exclusion(s) specified by robots.txt will be ignored as they would exclude the base url (http://ntsrvr1.mysite.com/).
/
The problem is that we want everything on this site to be excluded; that is why we created a robots.txt file with Disallow: /. What can I do to stop Webinator from indexing sites whose robots.txt file looks like the one I have included? I don't want phone calls and emails from site administrators who are upset because their site is being walked.
Oh, you must be using the enterprise or accept domain options; those options assume you really meant to walk the base URL. A small change to dowalk can get what you want. I think this will do it:
At the top of checkrejects, add:
<$rejected=0>
A little farther down, after <if $ret ne ""><!-- yes -->, add:
<$rejected=1>
Then in setuprobots, after the call to checkrejects, add:
<if $rejected ne 0><$top=></if>
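Putting the three changes together, the affected parts of dowalk should end up looking roughly like the sketch below. Only the <$rejected=0>, <$rejected=1>, and <if $rejected ne 0><$top=></if> lines come from the steps above; the function headers, the checkrejects parameter, and the elided existing logic are assumptions for illustration, so line it up against your actual dowalk script.

<A NAME=checkrejects url>
   <$rejected=0>                      <!-- new: nothing has been rejected yet -->
   <!-- ... existing checks that set $ret when an exclusion matches ... -->
   <if $ret ne ""><!-- yes -->
      <$rejected=1>                   <!-- new: remember that an exclusion matched -->
      <!-- ... existing handling of the match ... -->
   </if>
</A>

<A NAME=setuprobots>
   <!-- ... existing robots.txt fetching and parsing ... -->
   <checkrejects $url>
   <if $rejected ne 0><$top=></if>    <!-- new: clear $top so the base URL is dropped -->
</A>

With that in place, a site whose robots.txt disallows / should end up with an empty $top, so the base URL is no longer walked rather than the exclusion being ignored.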
Thanks for the code changes; they work. I had assumed that Webinator would respect the robots.txt file no matter what other options were filled out or checked, except for the "Robots" option: if it is set to N, Webinator does not follow the robots.txt directives, and if it is set to Y, it does.