Webinator 4.1 ignoring robots.txt file

mjacobson
Posts: 204
Joined: Fri Feb 08, 2002 3:35 pm

Post by mjacobson »

On one of our servers, we have a robots.txt file that has the following lines:

User-agent: *
Disallow: /

When Webinator 4.1 parses the robots.txt file, I get the following output:

The following exclusion(s) specified by robots.txt will be ignored as they would exclude the base url (http://ntsrvr1.mysite.com/).
/

The problem is that we want everything on this site to be excluded, which is why we created a robots.txt file containing Disallow: /. What can I do to stop Webinator from indexing sites whose robots.txt file looks like the one above? I don't want phone calls and emails from site administrators who are upset because their site is being walked.
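For comparison, here is a minimal sketch of how a standards-following parser reads the two-line robots.txt quoted above. It uses Python's standard-library urllib.robotparser, not Webinator; the user-agent name "ExampleCrawler" and the helper is_allowed are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it only wraps the
    standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base_url = "http://ntsrvr1.mysite.com/"
    # "ExampleCrawler" is an assumed user-agent name.
    print(is_allowed(ROBOTS_TXT, "ExampleCrawler", base_url))
    print(is_allowed(ROBOTS_TXT, "ExampleCrawler", base_url + "docs/page.html"))
```

With Disallow: / both checks come back False, including the one for the base URL itself, which is the behavior being asked for here.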
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Webinator 4.1 is not the old version 2.5 that this group covers. Please make future posts in the "webinator" group.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Oh, you must be using the enterprise or accept domain options. Those options assume you meant to walk the site, so an exclusion that would block the base URL is ignored. A small change to dowalk can get what you want. I think this will do it:
At the top of checkrejects, add:
<$rejected=0>
A little farther down, after <if $ret ne ""><!-- yes -->, add:
<$rejected=1>
Then, in setuprobots, after the call to checkrejects, add:
<if $rejected ne 0><$top=></if>
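The intent of that patch can be modeled roughly as follows. This is a Python sketch of the logic only, not the actual dowalk script: the functions check_rejects and setup_robots are hypothetical stand-ins for checkrejects and setuprobots, and it assumes that $top holds the base URL to walk and that clearing it means the site is skipped.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def check_rejects(base_url: str, exclusions: List[str]) -> Tuple[List[str], bool]:
    """Split exclusions into those kept and a flag for any that hit the base URL.

    Hypothetical model: 'rejected' plays the role of $rejected in the patch.
    """
    base_path = urlsplit(base_url).path or "/"
    kept: List[str] = []
    rejected = False  # mirrors <$rejected=0> at the top of checkrejects
    for prefix in exclusions:
        if prefix and base_path.startswith(prefix):
            rejected = True  # mirrors <$rejected=1>: this rule excludes the base URL
        else:
            kept.append(prefix)
    return kept, rejected


def setup_robots(base_url: str, exclusions: List[str]) -> Optional[str]:
    """Return the URL to walk, or None when robots.txt excludes the base URL."""
    _kept, rejected = check_rejects(base_url, exclusions)
    if rejected:
        return None  # mirrors <if $rejected ne 0><$top=></if> (assumed meaning)
    return base_url


if __name__ == "__main__":
    print(setup_robots("http://ntsrvr1.mysite.com/", ["/"]))
    print(setup_robots("http://ntsrvr1.mysite.com/", ["/private/"]))
```

In this model a Disallow: / rule leaves nothing to walk, while a narrower rule such as /private/ leaves the base URL in place.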
mjacobson
Posts: 204
Joined: Fri Feb 08, 2002 3:35 pm

Post by mjacobson »

Thanks for the code changes; they work. I had assumed that Webinator would respect the robots.txt file no matter which options were filled out or checked, with the sole exception of the "Robots" option: if it is set to N, Webinator would not follow the robots.txt statements, and if it is set to Y, it would.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Reasonable assumption. A future version may address that more fully.