Webinator 4.1 ignoring robots.txt file

Posted: Wed Aug 21, 2002 11:35 am
by mjacobson
On one of our servers, we have a robots.txt file that has the following lines:

User-agent: *
Disallow: /

When Webinator 4.1 parses the robots.txt file, I get the following output:

The following exclusion(s) specified by robots.txt will be ignored as they would exclude the base url (http://ntsrvr1.mysite.com/).
/

The problem is that we want everything on this site to be excluded, which is why we made a Disallow: / robots.txt file. What can I do to stop Webinator from indexing sites whose robots.txt file looks like the one I have included? I don't want phone calls and emails from site administrators who are upset because their site is being walked.
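
For reference, here is a minimal sketch of how a crawler that honors robots.txt would read those two lines. It uses Python's standard urllib.robotparser module, not Webinator itself, so it only illustrates the expected behavior. The agent name "ExampleBot" and the /docs/index.html path are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the robots.txt file on ntsrvr1.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if robots_txt lets `agent` fetch `url`.

    "ExampleBot" is a hypothetical agent name; the rules above apply
    to every agent because of "User-agent: *".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Both the base url and any page under it are excluded by "Disallow: /".
base_allowed = is_allowed("http://ntsrvr1.mysite.com/")
page_allowed = is_allowed("http://ntsrvr1.mysite.com/docs/index.html")
print(base_allowed, page_allowed)
```

With this file, a compliant crawler should fetch nothing from the host, including the base url.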

Posted: Wed Aug 21, 2002 11:46 am
by mark
Don't give a base url of http://ntsrvr1.mysite.com/

Posted: Wed Aug 21, 2002 11:47 am
by mark
Webinator 4.1 is not the old version 2.5 that this group covers. Please make future posts in the "webinator" group.

Posted: Wed Aug 21, 2002 11:51 am
by mjacobson
Sorry. I didn't see where I was when I posted this message. I had done a search looking for similar messages.

I didn't give it the base url of http://ntsrvr1.mysite.com/. I made a test page that had a link to http://ntsrvr1.mysite.com/ and placed that page on a different server. I then used http://someotherserver.mysite.com as the base url.

Posted: Wed Aug 21, 2002 1:01 pm
by mark
Oh, you must be using the enterprise or accept domain options. Those options assume you meant it. A small change to dowalk can get what you want. I think this will do it:
At the top of checkrejects, add:
<$rejected=0>
A little farther down, after <if $ret ne ""><!-- yes -->, add:
<$rejected=1>
Then in setuprobots, after the call to checkrejects, add:
<if $rejected ne 0><$top=></if>
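
The effect of those three changes can be sketched outside Vortex. The Python below only illustrates the logic as described in this thread and is not the dowalk source. The names check_rejects and setup_robots mirror the function names above, but their signatures, the honor_conflicts switch, and the simple prefix match are all assumptions.

```python
def check_rejects(base_path, exclusions):
    """Separate robots.txt exclusions that would exclude the base url.

    Returns (kept, rejected). `rejected` plays the role of $rejected:
    it is True when at least one exclusion covers the base url.
    Prefix matching is an assumed simplification.
    """
    kept = []
    rejected = False                      # <$rejected=0>
    for prefix in exclusions:
        if prefix and base_path.startswith(prefix):
            rejected = True               # <$rejected=1>
        else:
            kept.append(prefix)
    return kept, rejected


def setup_robots(top_url, base_path, exclusions, honor_conflicts=True):
    """Return (start_url, exclusions_to_apply) for one site.

    honor_conflicts is a hypothetical switch: True models the patched
    behavior, False models the behavior reported in the first post.
    """
    kept, rejected = check_rejects(base_path, exclusions)
    if rejected and honor_conflicts:
        # Equivalent in spirit to <if $rejected ne 0><$top=></if>:
        # with no start url, the site is not walked.
        return None, kept
    return top_url, kept


site = "http://ntsrvr1.mysite.com/"
patched = setup_robots(site, "/", ["/"])
unpatched = setup_robots(site, "/", ["/"], honor_conflicts=False)
print(patched, unpatched)
```

In the patched case the start url is cleared, so a site with Disallow: / is skipped instead of having its exclusion ignored.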

Posted: Wed Aug 21, 2002 2:30 pm
by mjacobson
Thanks for the code changes. It works. I was assuming that Webinator would respect the robots.txt file no matter what options you had filled out or checked, except for the "Robots" option: if that was set to N, Webinator would not follow the robots.txt statements, and if it was set to Y, it would follow them.

Posted: Wed Aug 21, 2002 2:48 pm
by mark
Reasonable assumption. A future version may address that more fully.