wish to crawl pages for links but not include them in the index

Posted: Tue May 13, 2008 12:22 pm
by jgdoke
I need to crawl a site where some pages only hold the links to the real content. I want those pages crawled for their links but not included in the index.

For instance, this page has the links I want in the index:
http://literature.rockwellautomation.co ... %20English

But I don't want that page itself in the index.
Hope that makes sense.
John

Posted: Tue May 13, 2008 12:32 pm
by mark
Use "Exclude by field". Create a metamorph query to match only those pages you want to affect. Use Exclude "Pages only" to exclude the page content but keep the links.
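
As a rough illustration of what "exclude the page content but keep the links" means, here is a minimal Python sketch. It is not the search appliance's own code or configuration: the `SITE` map, the example.com URLs, and the `crawl` function are all hypothetical, and a plain substring test stands in for the metamorph query.

```python
from collections import deque

# Hypothetical in-memory "site": URL -> list of URLs it links to.
SITE = {
    "http://example.com/browse_results": [
        "http://example.com/doc1",
        "http://example.com/doc2",
    ],
    "http://example.com/doc1": [],
    "http://example.com/doc2": ["http://example.com/doc1"],
}

def crawl(start, site, exclude_substring):
    """Follow links from every page, but index only pages whose URL
    does not contain exclude_substring."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Excluded pages are skipped for indexing only.
        if exclude_substring not in url:
            indexed.append(url)
        # Links are followed whether or not the page was indexed.
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

INDEXED = crawl("http://example.com/browse_results", SITE, "browse")
```

In this sketch the browse page never reaches `INDEXED`, yet both documents it links to do.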

Posted: Fri May 16, 2008 5:27 pm
by jgdoke
I searched the documentation on metamorph queries but could not find out how to exclude pages with *browse* in the URL. Can you help with that syntax?

Posted: Fri May 16, 2008 5:45 pm
by mark
Do you want to match the whole word, or a substring (as in joebrowser)?
If the word, enter:
browse
If the substring, enter:
/browse
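
The difference between the two kinds of match can be sketched in plain Python. This only illustrates the word-versus-substring distinction; it is not Metamorph syntax, and Metamorph's own idea of a word boundary may differ from Python's `\b`. The function names are made up for the example.

```python
import re

def matches_word(term, text):
    # Whole-word match: the term must not be embedded in a longer word.
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

def matches_substring(term, text):
    # Substring match: the term may appear anywhere, even inside a word.
    return term in text

WORD_HIT = matches_word("browse", "joebrowser")
SUBSTRING_HIT = matches_substring("browse", "joebrowser")
```

Here the word match rejects "joebrowser" while the substring match accepts it.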

Posted: Fri May 16, 2008 5:51 pm
by jgdoke
There are two different URLs that I would like to match, and only those.
http://literature.rockwellautomation.co ... /webassets /browse_results.hcst
http://literature.rockwellautomation.co ... egory.hcst

I need the links from them but not the page.

So if I enter browse_, that should catch both.

Correct?

Posted: Mon May 19, 2008 10:42 am
by mark
Yes, and anything similar, of course.