Ignoring robots.txt file

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The scripts should have no extension. Remove the .txt part.
sunnedaze
Posts: 22
Joined: Mon Jul 28, 2003 2:07 pm

Post by sunnedaze »

Awesome!!! It's seeing robots.txt now... :D
Thanks mucho!!!
sunnedaze
Posts: 22
Joined: Mon Jul 28, 2003 2:07 pm

Post by sunnedaze »

Is there a limit on the number of sites that can be excluded? Only 21 show up on the walk status screen, and we have 30 in the robots.txt file.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

No limit. You probably have a syntax error in your robots.txt. And robots.txt does not control sites. It controls access to areas of the single site where that robots.txt file resides.
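To make the two points above concrete (malformed lines get skipped, and Disallow paths only apply to the host serving that robots.txt), here is a minimal sketch. It is not Webinator's parser; it assumes simple `Field: value` syntax, honors only the `User-agent: *` group, ignores wildcards, and the sample file contents (including the `/archive` line) are hypothetical.

```python
# Minimal sketch, NOT Webinator's code. Assumptions: plain "Field: value"
# syntax, only the "User-agent: *" group counts, no wildcard patterns.
# Disallow paths become URL prefixes on the ONE host the file came from,
# and lines that don't parse are collected so syntax errors are easy to spot.

KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def robots_prefixes(host_url, text):
    """Return (prefixes, errors) for a robots.txt served at host_url."""
    prefixes = []
    errors = []        # (line number, original line) pairs that look malformed
    applies = False    # does the current group apply to all agents ("*")?
    in_agents = False  # are we inside a run of consecutive User-agent lines?
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            errors.append((lineno, raw))  # e.g. "Disallow /foo" (missing colon)
            in_agents = False
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            errors.append((lineno, raw))  # e.g. a misspelled "Disalow:"
            in_agents = False
            continue
        if field == "user-agent":
            if not in_agents:
                applies = False  # a new group of User-agent lines starts
            applies = applies or value == "*"
            in_agents = True
            continue
        in_agents = False
        if field == "disallow" and applies and value:
            # Paths are relative to the host that serves this robots.txt
            prefixes.append(host_url.rstrip("/") + value)
    return prefixes, errors


# Hypothetical robots.txt; the last line is missing its colon
sample = """User-agent: *
Disallow: /webinator
Disallow: /wlc050403
Disallow /archive
"""
prefixes, errors = robots_prefixes("http://cleohsenet01.napa.ad.etn.com", sample)
print(prefixes)
print(errors)
```

With a count mismatch like 21 vs. 30, comparing the `errors` list against the file is a quick way to find the lines that were silently dropped.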
sunnedaze
Posts: 22
Joined: Mon Jul 28, 2003 2:07 pm

Post by sunnedaze »

Now I'm confused. Here is the exclusion list from the walk status, followed by a search using that profile. The second result falls under one of the excluded prefixes (the 3rd one).

robots.txt excludes the following prefixes:
http://cleohsenet01.napa.ad.etn.com/webinator
http://cleohsenet01.napa.ad.etn.com/worldwideops
http://cleohsenet01.napa.ad.etn.com/wlc050403
http://cleohsenet01.napa.ad.etn.com/uwsurvey

http://www.etn.com/cgiexe/texis.exe/web ... estination

1: PowerPoint Presentation
of the above individuals. Please make sure that you provide the following information for the Support Analyst: •Origin •Destination •Class •Weight •Ship Date and/or ...
http://cleohsenet01.napa.ad.etn.com/... ... de0439.htm 70%
Size: 9K
Depth: 10
2: 2003 Worldwide Leadership Conference
high quality video formats, use the order form to order a DVD or VHS copy. Welcome to the Information Destination for the Worldwide Leadership Conference Thinking ...
http://cleohsenet01.napa.ad.etn.com/wlc050403/
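A quick way to confirm the mismatch: robots.txt exclusions are URL prefixes, so a page is excluded when its URL starts with any listed prefix. This sketch assumes case-sensitive plain string matching with no wildcards (`excluded_by` is a made-up helper); it checks result 2 against the four prefixes from the status screen above.

```python
# Sketch: a URL is excluded when it starts with any exclusion prefix.
# Assumption: case-sensitive plain string comparison, no wildcard support.

def excluded_by(url, prefixes):
    """Return the first prefix that url starts with, or None if none match."""
    for prefix in prefixes:
        if url.startswith(prefix):
            return prefix
    return None


# Prefixes as listed on the walk status screen
status_prefixes = [
    "http://cleohsenet01.napa.ad.etn.com/webinator",
    "http://cleohsenet01.napa.ad.etn.com/worldwideops",
    "http://cleohsenet01.napa.ad.etn.com/wlc050403",
    "http://cleohsenet01.napa.ad.etn.com/uwsurvey",
]

# Result 2 from the search
print(excluded_by("http://cleohsenet01.napa.ad.etn.com/wlc050403/", status_prefixes))
# prints http://cleohsenet01.napa.ad.etn.com/wlc050403
```

Because matching is by prefix, `/wlc050403` would also cover something like `/wlc0504031`; adding a trailing slash to the Disallow path narrows it to that directory.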
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Hmm, are you sure you're looking at the status from a successful walk of the same profile and not an abandoned walk or a different profile than you're searching?

Are you getting any other pages from underneath wlc050403?

I assume these are internal or fake URLs since none of them work from here.
sunnedaze
Posts: 22
Joined: Mon Jul 28, 2003 2:07 pm

Post by sunnedaze »

Yes, I am sure... all of this was done after the new version of the software was installed. Before that, we did not see the robots.txt file in the log at all.

Yes, the URLs are internal.

I got many pages of results... I don't know about the other question.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You're not getting a message like this, are you?
The following exclusion(s) specified by $n will be ignored as they would exclude the base url

Items in robots.txt or other exclusions that would exclude any of your base urls will be removed.
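The rule in the previous line can be pictured with a small sketch. This is hypothetical, not Webinator's source: `drop_base_conflicts` and the base URL in the example are made up, and it assumes the same plain prefix matching as robots.txt. Any exclusion prefix that would match a base URL is set aside so the walk can still start there.

```python
# Hypothetical sketch of the rule described above, NOT Webinator's source.
# Any exclusion prefix that would match one of the base URLs is dropped,
# so the walk can still start from that base URL.

def drop_base_conflicts(prefixes, base_urls):
    """Split prefixes into (kept, dropped); dropped ones would exclude a base URL."""
    kept, dropped = [], []
    for prefix in prefixes:
        if any(base.startswith(prefix) for base in base_urls):
            dropped.append(prefix)
        else:
            kept.append(prefix)
    return kept, dropped


prefixes = [
    "http://cleohsenet01.napa.ad.etn.com/webinator",
    "http://cleohsenet01.napa.ad.etn.com/worldwideops",
    "http://cleohsenet01.napa.ad.etn.com/wlc050403",
    "http://cleohsenet01.napa.ad.etn.com/uwsurvey",
]
# Hypothetical base URL that sits inside one of the excluded areas
kept, dropped = drop_base_conflicts(prefixes, ["http://cleohsenet01.napa.ad.etn.com/wlc050403/"])
print("ignored:", dropped)
```

Under that assumption, a base URL anywhere under `/wlc050403` would cause that exclusion to be ignored, which would be one way to end up with pages like result 2 in the index.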
sunnedaze
Posts: 22
Joined: Mon Jul 28, 2003 2:07 pm

Post by sunnedaze »
