Hi, here is my issue: I have a frontend for newsgroups that I am crawling. I really only want THE RESULTS (the posts), not the indices of posts (the thread view), in my results. The URLs are pretty standard, like thread.php?= or ?article.php? I just want the articles.
Ideally the index pages would have
<meta name="robots" content="noindex,follow">
on them. See http://www.robotstxt.org/wc/meta-user.html
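In case it helps to see what that tag asks a crawler to do, here is a minimal Python sketch of how a robot might read it. This is only an illustration using the standard library, not the code of any particular crawler; the names RobotsMetaParser, robots_directives, should_index and should_follow are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # content is a comma-separated list such as "noindex,follow"
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def should_index(html):
    """False when the page asks not to be indexed (noindex or none)."""
    return not ({"noindex", "none"} & robots_directives(html))


def should_follow(html):
    """False when the page asks that its links not be followed."""
    return not ({"nofollow", "none"} & robots_directives(html))


# Hypothetical index page carrying the tag suggested above.
index_page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(should_index(index_page), should_follow(index_page))
```

With noindex,follow the thread view itself stays out of the results, but the links on it (your articles) still get crawled.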
If you can't control that, you can use "Data from Field" to simulate it. Set it to recognize your index page URLs or content, and set "Exclude" to "Pages only".
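To work out which URLs the rule should recognize, something like the rough Python sketch below may help. It is not the crawler's own configuration syntax, just a way to test the idea. It assumes your index pages are served by thread.php and your posts by article.php, as in your description; the example.com URLs and query parameters are made up.

```python
from urllib.parse import urlsplit

# Assumed script names, taken from the URL shapes described in the question.
INDEX_SCRIPTS = {"thread.php"}
ARTICLE_SCRIPTS = {"article.php"}


def page_kind(url):
    """Classify a URL as 'index', 'article' or 'other' by its script name."""
    # Look only at the last path segment, so the query string is ignored.
    script = urlsplit(url).path.rsplit("/", 1)[-1].lower()
    if script in INDEX_SCRIPTS:
        return "index"
    if script in ARTICLE_SCRIPTS:
        return "article"
    return "other"


# Hypothetical URLs for illustration only.
for url in (
    "http://example.com/news/thread.php?group=comp.lang.misc",
    "http://example.com/news/article.php?id=123",
):
    print(url, page_kind(url))
```

Anything that comes back as "index" is a page you would want excluded from the results while still having its links followed.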