Ignoring robots.txt file

Posted: Fri Aug 01, 2003 11:56 am
by mark
The scripts should have no extension. Remove the .txt part.

Posted: Fri Aug 01, 2003 12:12 pm
by sunnedaze
Awesome!!! It is seeing robots now...:D
Thanks mucho!!!

Posted: Fri Aug 01, 2003 12:21 pm
by sunnedaze
Is there a limit on the number of sites that can be excluded? Only 21 show up on the walk status screen, and we have 30 in the robots.txt file.

Posted: Fri Aug 01, 2003 12:45 pm
by mark
No limit. You probably have a syntax error in your robots.txt. And robots.txt does not control sites. It controls access to areas of the single site where that robots.txt file resides.
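To show how one malformed line can make an entry vanish from the walk status, here is a minimal Python sketch of a robots.txt reader. It is an assumption, not Webinator's actual parser: `parse_robots` is a hypothetical helper that honours only `User-agent` and `Disallow` fields, returns the path prefixes for the `*` agent, and reports any line it cannot read. The prefixes are paths, so they only ever apply to the host that served the file.

```python
def parse_robots(text, agent="*"):
    """Return (prefixes, bad_lines) for the given user-agent.

    Hypothetical helper for illustration, NOT Webinator's parser.
    Only User-agent and Disallow fields are interpreted; any other
    line is reported in bad_lines with its 1-based line number.
    """
    prefixes = []
    bad_lines = []
    applies = False  # True while inside a record for `agent`
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            # e.g. "Disallow /foo" with the colon missing
            bad_lines.append((lineno, raw))
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            applies = value == agent
        elif field == "disallow":
            if applies and value:  # an empty Disallow excludes nothing
                prefixes.append(value)
        else:
            bad_lines.append((lineno, raw))
    return prefixes, bad_lines


sample = """User-agent: *
Disallow: /webinator
Disallow /worldwideops
Disallow: /wlc050403
"""
prefixes, bad = parse_robots(sample)
print(prefixes)  # ['/webinator', '/wlc050403']
print(bad)       # [(3, 'Disallow /worldwideops')]
```

Comparing `len(prefixes)` with the number of `Disallow` lines actually written is a quick way to spot a gap like 21 shown versus 30 written.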

Posted: Fri Aug 01, 2003 12:55 pm
by sunnedaze
Now I'm confused. Here is the exclusion list from the walk status, followed by a search using that profile. The second result falls under one of the excluded prefixes in the robots listing (the 3rd one).

robots.txt excludes the following prefixes:
http://cleohsenet01.napa.ad.etn.com/webinator
http://cleohsenet01.napa.ad.etn.com/worldwideops
http://cleohsenet01.napa.ad.etn.com/wlc050403
http://cleohsenet01.napa.ad.etn.com/uwsurvey

http://www.etn.com/cgiexe/texis.exe/web ... estination

1: PowerPoint Presentation
of the above individuals. Please make sure that you provide the following information for the Support Analyst: •Origin •Destination •Class •Weight •Ship Date and/or ...
http://cleohsenet01.napa.ad.etn.com/... ... de0439.htm 70%
Size: 9K
Depth: 10
2: 2003 Worldwide Leadership Conference
high quality video formats, use the order form to order a DVD or VHS copy. Welcome to the Information Destination for the Worldwide Leadership Conference Thinking ...
http://cleohsenet01.napa.ad.etn.com/wlc050403/
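The status screen calls these exclusions prefixes. Assuming they are applied as plain string-prefix tests on the full URL (an assumption about the walker's internals), the second search result does fall under the third prefix, so a walk using this list should not have fetched it. `excluded_by` is a hypothetical helper for illustration.

```python
# Prefixes copied from the walk status above.
EXCLUDED_PREFIXES = [
    "http://cleohsenet01.napa.ad.etn.com/webinator",
    "http://cleohsenet01.napa.ad.etn.com/worldwideops",
    "http://cleohsenet01.napa.ad.etn.com/wlc050403",
    "http://cleohsenet01.napa.ad.etn.com/uwsurvey",
]


def excluded_by(url, prefixes=EXCLUDED_PREFIXES):
    """Return the first prefix that url starts with, or None.

    Hypothetical helper: assumes exclusions are plain string prefixes.
    """
    for prefix in prefixes:
        if url.startswith(prefix):
            return prefix
    return None


print(excluded_by("http://cleohsenet01.napa.ad.etn.com/wlc050403/"))
# http://cleohsenet01.napa.ad.etn.com/wlc050403
```

Under the same assumption, `/wlc050403` would also match a sibling path such as `/wlc050403old`; ending the `Disallow` value with `/` narrows it to the directory.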

Posted: Fri Aug 01, 2003 2:41 pm
by mark
Hmm, are you sure you're looking at the status from a successful walk of the same profile and not an abandoned walk or a different profile than you're searching?

Are you getting any other pages from underneath wlc050403?

I assume these are internal or fake urls since none of them work from here.

Posted: Fri Aug 01, 2003 2:59 pm
by sunnedaze
Yes, I am sure...all this was done after the new version of the software was installed. Before that, we did not see the robots.txt file in the log at all.

Yes, the URLs are internal.

I had many pages of results...don't know about the other question.

Posted: Fri Aug 01, 2003 3:10 pm
by mark
Go to list/edit urls and enter
http://cleohsenet01.napa.ad.etn.com/wlc050403*
to see if anything under there was walked.
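The trailing `*` in that pattern stands for "anything after this point". A rough Python equivalent, assuming shell-style wildcard matching, uses `fnmatch.fnmatchcase`; `matching_urls` and the walked URL list below are hypothetical, for illustration only.

```python
from fnmatch import fnmatchcase


def matching_urls(pattern, walked):
    """Return the walked URLs that match a shell-style pattern.

    Hypothetical helper: assumes '*' matches any run of characters,
    including '/', as fnmatch does.
    """
    return [url for url in walked if fnmatchcase(url, pattern)]


# Hypothetical walk contents, for illustration only.
walked = [
    "http://cleohsenet01.napa.ad.etn.com/wlc050403/",
    "http://cleohsenet01.napa.ad.etn.com/wlc050403/example.htm",
    "http://cleohsenet01.napa.ad.etn.com/uwsurvey/example.htm",
]
print(matching_urls("http://cleohsenet01.napa.ad.etn.com/wlc050403*", walked))
# prints the two wlc050403 URLs
```

`fnmatch` also treats `?` and `[...]` as wildcards, so a pattern containing a query string would need those characters escaped first (for example with `glob.escape`).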

Posted: Fri Aug 01, 2003 3:15 pm
by mark
You're not getting a message like this, are you?
The following exclusion(s) specified by $n will be ignored as they would exclude the base url

Items in robots.txt or other exclusions that would exclude any of your base urls will be removed.
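The rule in that message can be sketched as a filter: any exclusion that a base URL itself starts with is set aside, because applying it would block the walk at its own starting point. This Python sketch assumes prefix matching on full URLs; `drop_base_conflicts` and the sample base URL are hypothetical.

```python
def drop_base_conflicts(exclusions, base_urls):
    """Split exclusions into (kept, ignored).

    Hypothetical helper: an exclusion is ignored when any base URL
    starts with it, since it would exclude the base URL itself.
    """
    kept, ignored = [], []
    for prefix in exclusions:
        if any(base.startswith(prefix) for base in base_urls):
            ignored.append(prefix)
        else:
            kept.append(prefix)
    return kept, ignored


# Hypothetical profile: suppose wlc050403/ were one of the base URLs.
bases = ["http://cleohsenet01.napa.ad.etn.com/wlc050403/"]
exclusions = [
    "http://cleohsenet01.napa.ad.etn.com/webinator",
    "http://cleohsenet01.napa.ad.etn.com/wlc050403",
]
kept, ignored = drop_base_conflicts(exclusions, bases)
print(kept)     # ['http://cleohsenet01.napa.ad.etn.com/webinator']
print(ignored)  # ['http://cleohsenet01.napa.ad.etn.com/wlc050403']
```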

Posted: Fri Aug 01, 2003 3:16 pm
by sunnedaze
1 matching pages.

Depth: 0 Home
http://cleohsenet01.napa.ad.etn.com/wlc050403/