Search Appliance does not respect robot META tags

Post by **dietric** » Tue May 23, 2006 8:55 am

I have a profile set up to respect robot META tags.
However, it still indexes pages that are specified as noindex,nofollow:

http://sandsports.off-road.com/dunes/ma ... ?id=227437

Post by **John** » Tue May 23, 2006 10:03 am

If you check the link under List/Edit URLs, does it have any content? An empty placeholder is kept for NOINDEX pages. It may be more useful to add emailContent.jsp to the Exclusions, so the page is never fetched in the first place, rather than fetching it and only then finding the NOINDEX.
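
To illustrate the idea of excluding a page before it is fetched, here is a rough Python sketch. It assumes an exclusion is a plain substring match against the URL, which may not be how the appliance actually matches Exclusions; `is_excluded`, `plan_fetches`, and the example.com URLs are made up for illustration.

```python
def is_excluded(url, exclusions):
    """Return True if any exclusion pattern occurs anywhere in the URL.

    Assumption: substring matching. The appliance's real rules may differ.
    """
    return any(pattern in url for pattern in exclusions)


def plan_fetches(urls, exclusions):
    """Keep only the URLs that would be downloaded at all."""
    return [url for url in urls if not is_excluded(url, exclusions)]


# Hypothetical crawl queue; only the emailContent.jsp name comes from the thread.
exclusions = ["emailContent.jsp"]
urls = [
    "http://example.com/dunes/article.jsp?id=1",
    "http://example.com/dunes/emailContent.jsp?id=1",
]

to_fetch = plan_fetches(urls, exclusions)
print(to_fetch)
```

The excluded URL never reaches the download step, so it costs no fetch and leaves no placeholder behind.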

Post by **dietric** » Tue May 23, 2006 11:40 am

They don't have any content.
I'm building the tags programmatically, and the values are data-driven, so moving this to a robots.txt file would be pretty complex. I'm mostly concerned about bogging down the appliance with indexing pages that are not eligible for search, and eating up my available indexes... Any thoughts on how this affects the walk durations?

Post by **John** » Tue May 23, 2006 11:59 am

The trouble with META robots tags is that they are not seen until the page has been downloaded and processed, whereas Exclusions or robots.txt allow the determination to be made beforehand.
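
The difference can be sketched in Python using only the standard library. This is an illustration of the general crawler behaviour, not the appliance's own code; `allowed_by_robots_txt`, `noindex_in_page`, the robots.txt rules, and the example.com URLs are hypothetical. The robots.txt check needs only the URL, while the META check needs the page body in hand.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )


def allowed_by_robots_txt(robots_lines, url, agent="*"):
    """Decide from the URL alone; the page itself is never downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def noindex_in_page(html):
    """Decide from the page body, so the download has already happened."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical rules and page content for illustration.
robots_lines = ["User-agent: *", "Disallow: /dunes/emailContent.jsp"]
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'

skip_before_fetch = not allowed_by_robots_txt(
    robots_lines, "http://example.com/dunes/emailContent.jsp?id=1"
)
skip_after_fetch = noindex_in_page(page)
print(skip_before_fetch, skip_after_fetch)
```

Both checks end up skipping the page, but only the first one saves the download and processing time during a walk.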

An upcoming update to the search appliance will have an option to not store the placeholders.