
robots meta tag being ignored

Posted: Mon Jun 03, 2002 8:35 pm
by waynes
Hi folks,

We have a number of pages with the following tag: <meta name="robots" content="noindex,follow">

For most of the pages (they are all navigation bars) Webinator correctly follows the links without indexing, but for a few pages with exactly the same meta tag (in exactly the same spot) it ignores the tag and indexes the navigation bar, so these nav bars appear in the search results.

I have looked at the source code for these nav bars and, apart from the content itself, the HTML is identical.

Does anyone have any suggestions or advice as to what to try to solve this problem?

robots meta tag being ignored

Posted: Mon Jun 03, 2002 9:16 pm
by mark
If you supply the URLs of a page that works and one that doesn't, someone may be able to spot something.

robots meta tag being ignored

Posted: Mon Jun 03, 2002 11:26 pm
by waynes
Unfortunately it's on an intranet site. The opening lines of both the good and bad pages are:

<html>
<head>
<meta name="robots" content="noindex,follow">
<title>Navigation Bar</title>
</head>

The rest is identical other than the actual content of the menu items.
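[Editor's note: Webinator's own meta-tag parsing isn't visible in this thread, but as a sanity check on what a generic HTML parser extracts from a head like the one above, here is a short Python sketch using the standard library. The class name `RobotsMetaParser` is purely illustrative and has nothing to do with Webinator's internals.]

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.robots.append(d.get("content", ""))


html = """<html>
<head>
<meta name="robots" content="noindex,follow">
<title>Navigation Bar</title>
</head>"""

p = RobotsMetaParser()
p.feed(html)

# Split the collected content strings into individual directives.
directives = [t.strip() for t in ",".join(p.robots).split(",") if t.strip()]
print(directives)  # ['noindex', 'follow']
```

If a page that "should" work parses differently here (for example because of a stray byte or encoding difference before the meta tag), that would be a clue worth chasing.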

robots meta tag being ignored

Posted: Tue Jun 04, 2002 11:06 am
by mark
Are you sure the text content of those frames is being stored? Use "List/Edit" urls to examine what's in the database.

By default, urls of "noindex" pages will be kept in the database, but not their content. This makes the "parents" links work consistently. You can disable this behavior, and prevent even the urls from being stored, by editing dowalk and changing
<$SSc_metarobotsplaceholder=Y>
to
<$SSc_metarobotsplaceholder=N>

robots meta tag being ignored

Posted: Wed Jun 05, 2002 6:39 pm
by waynes
Thanks for the tip. I tried changing this option, but got the same result.

I looked under List/Edit URLs and both the URL and the Body are there.

Any other ideas??

robots meta tag being ignored

Posted: Thu Jun 06, 2002 12:33 pm
by mark
Not really. Is there any pattern to the ones that work vs. those that don't, such as being specified differently in the walk settings or reached through some different method when surfing the site?

You can add something like this to the dowalk script to debug exactly what robots settings the walker finds for each page. In the collectmeta function, right after the line containing <urlinfo metaname "robots">, add a line reading
<$zz=$ret>
Then, after the </loop> a few lines below, add the following lines:
<write append /tmp/debug.log>
$u robots="$zz" metarobots_index="$metarobots_index" metarobots_follow="$metarobots_follow"
</write>

After a walk, look at /tmp/debug.log to compare the good and bad pages.
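[Editor's note: the debug line above records metarobots_index and metarobots_follow flags. A hypothetical Python analogue of deriving those two flags from a raw robots content string, assuming the conventional default of index,follow when no directives are given, might look like this. This is not Webinator's actual logic, just the usual interpretation of the directives.]

```python
def robots_flags(content: str):
    """Derive (index, follow) booleans from a robots meta content string.

    Defaults to (True, True) when the string is empty; 'all' and 'none'
    are the usual shorthands for 'index,follow' and 'noindex,nofollow'.
    """
    index, follow = True, True
    for token in (t.strip().lower() for t in content.split(",")):
        if token == "noindex":
            index = False
        elif token == "index":
            index = True
        elif token == "nofollow":
            follow = False
        elif token == "follow":
            follow = True
        elif token == "none":
            index = follow = False
        elif token == "all":
            index = follow = True
    return index, follow


# Emit one line per page, in the same spirit as the /tmp/debug.log above.
for page, content in [("good.html", "noindex,follow"), ("bad.html", "")]:
    idx, fol = robots_flags(content)
    print(f'{page} robots="{content}" index={idx} follow={fol}')
```

Comparing such lines for a good page and a bad page should show whether the walker is seeing an empty or malformed robots string on the pages that get indexed.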