Duplicates and followed links

dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Post by dietric »

In my walk settings, I am enabling "Prevent Duplicates" and using the "Title" field to check for duplicates.
If I have two pages that would be seen as duplicates, will the links of each individual page still be indexed?
If not, how can I ensure that the links are always processed, even if the pages they reside on are seen as duplicates?
jason112
Site Admin
Posts: 347
Joined: Tue Oct 26, 2004 5:35 pm

Post by jason112 »

If you are using only "Title" for duplicate detection, then two pages with the same title will not both be indexed, as that is exactly what Prevent Duplicates does.

I'm not sure what behavior you're looking for. You say you're preventing duplicates based on title, yet you want two pages with the same title to both be indexed. Is there some other nuance to it?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Pages that are determined to be duplicates of one already in the database will not have their links followed. There's currently no way around that, though allowing link following from duplicate pages is somewhere on the development to-do list.
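
To make the effect concrete, here is a minimal sketch of a crawl loop with that behavior. It is not the walker's actual code: the in-memory `pages` dict, the `crawl` function, and the use of a SHA-1 hash over the selected fields are all assumptions made purely for illustration.

```python
import hashlib
from collections import deque


def crawl(pages, start, dup_fields=("title",)):
    """Walk a hypothetical in-memory site.

    pages maps url -> {"title": str, "links": [url, ...]}.
    A page whose duplicate-check fields match an already indexed page
    is skipped entirely: it is not indexed and its links are not queued.
    """
    seen_urls = set()
    seen_keys = set()
    indexed = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen_urls or url not in pages:
            continue
        seen_urls.add(url)
        page = pages[url]
        # Build the duplicate key from the chosen fields only (assumed scheme).
        raw = "\x00".join(page.get(field, "") for field in dup_fields)
        key = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        if key in seen_keys:
            continue  # duplicate: links on this page are never followed
        seen_keys.add(key)
        indexed.append(url)
        queue.extend(page.get("links", []))
    return indexed


# Two list pages share a title; /b is linked only from the second one.
site = {
    "/list?page=1": {"title": "List", "links": ["/a", "/list?page=2"]},
    "/list?page=2": {"title": "List", "links": ["/b"]},
    "/a": {"title": "A", "links": []},
    "/b": {"title": "B", "links": []},
}
result = crawl(site, "/list?page=1")
```

In this toy site, `/b` is never reached, because the only page linking to it is treated as a duplicate of the first list page.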
dietric
Posts: 100
Joined: Fri May 20, 2005 10:57 am

Post by dietric »

Any suggestions on how I can keep them from being seen as duplicates? My problem is that when I use the Body as a "Duplicate Check Field", slight variations in whitespace that I can't control create a lot of unrecognized duplicates, and if I use only the title/description/keywords, lists that span multiple pages are seen as duplicates and the links on those pages are not followed.
How about using META tags to check for duplicates? Will any META tags be recognized? What if I created a custom META tag that contains a unique value for each page that I don't want to be seen as a duplicate, would that work?

Thanks
-ds
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

If you include that meta field in your crawl using the "Meta Tags" setting and check "Meta" in the duplicate check fields list, that should work.
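
As a rough illustration of why that helps, the sketch below compares two pages that share a title but carry different values in a custom meta tag, for example `<meta name="pageid" content="list-1">`. The tag name `pageid`, the `dup_key` helper, and the hashing scheme are hypothetical and only stand in for whatever comparison the walker really performs.

```python
import hashlib


def dup_key(page, fields):
    """Combine the selected fields into one comparison key (assumed scheme)."""
    raw = "\x00".join(page.get(field, "") for field in fields)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


# Two pages of the same list: identical title, different custom meta value.
page1 = {"title": "Product list", "pageid": "list-1"}
page2 = {"title": "Product list", "pageid": "list-2"}

# Checking the title alone makes them look like duplicates.
same_by_title = dup_key(page1, ["title"]) == dup_key(page2, ["title"])

# Adding the meta field to the check keeps them distinct.
same_with_meta = (
    dup_key(page1, ["title", "pageid"]) == dup_key(page2, ["title", "pageid"])
)
```

Pages that genuinely are duplicates would need to carry the same meta value, otherwise they would no longer be caught by the check.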