Duplicates and followed links
Posted: Tue Oct 30, 2007 12:12 pm
by dietric
In my walk settings, I am enabling "Prevent Duplicates" and using the "Title" field to check for duplicates.
If I have two pages that would be seen as duplicates, will the links of each individual page still be indexed?
If not, how can I ensure that the links are always processed, even if the pages they reside on are seen as duplicates?
Posted: Tue Oct 30, 2007 12:47 pm
by jason112
If you are using only "Title" for duplicate detection, then two pages with the same title will not both be indexed, as that is exactly what Prevent Duplicates does.
I'm not sure what behavior you're looking for. You say you're preventing duplicates based on title, yet you want two pages with the same title to both be indexed. Is there some other nuance to it?
Posted: Tue Oct 30, 2007 1:49 pm
by mark
Pages that are determined to be duplicates of one already in the database will not have their links followed. There's currently no way around that, though allowing link following from duplicate pages is somewhere on the development to-do list.
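To make the consequence concrete, here is a minimal sketch of a crawler that behaves the way the post above describes. It is not the product's actual implementation: the in-memory `SITE` table, the `crawl` function, and the `follow_links_on_duplicates` flag are all hypothetical and exist only for illustration (the flag stands in for the to-do item, which is not available today).

```python
from collections import deque

# Hypothetical site: url -> (title, outgoing links).
# Both list pages share a title, so a title-only check treats page 2 as a duplicate.
SITE = {
    "/list?page=1": ("Product list", ["/list?page=2", "/item/a"]),
    "/list?page=2": ("Product list", ["/item/b"]),
    "/item/a": ("Item A", []),
    "/item/b": ("Item B", []),
}

def crawl(start, follow_links_on_duplicates=False):
    """Breadth-first crawl that rejects pages whose title was already seen."""
    seen_titles = set()
    visited = set()
    indexed = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        title, links = SITE[url]
        is_duplicate = title in seen_titles
        if not is_duplicate:
            seen_titles.add(title)
            indexed.append(url)
        # Described behavior: links on a duplicate page are not followed.
        if is_duplicate and not follow_links_on_duplicates:
            continue
        queue.extend(links)
    return indexed

print(crawl("/list?page=1"))
print(crawl("/list?page=1", follow_links_on_duplicates=True))
```

In this sketch, `/item/b` is linked only from the second list page, so it is never reached unless links on duplicates are followed.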
Posted: Wed Oct 31, 2007 2:26 pm
by dietric
Any suggestions for how I can keep them from being seen as duplicates? My problem is that when I use the Body as a "Duplicate Check Field", slight variations in whitespace that I can't control leave a lot of duplicates unrecognized, and if I use only the title/description/keywords, lists that span multiple pages are seen as duplicates and the links on those pages are not followed.
How about using META tags to check for duplicates? Will any META tags be recognized? What if I created a custom META tag that contains a unique value for each page that I don't want to be seen as a duplicate, would that work?
Thanks
-ds
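The whitespace problem described above can be illustrated with a short sketch. It assumes the duplicate check amounts to an exact comparison of the chosen field, which is a guess about the product and not something confirmed in this thread; `body_key` and its `normalize` option are hypothetical names, and collapsing whitespace is shown only to explain the mismatch, not as an available setting.

```python
import hashlib

def body_key(text, normalize=False):
    """Return a fingerprint of the body text, optionally collapsing whitespace first."""
    if normalize:
        # Collapse every run of whitespace to a single space and trim the ends.
        text = " ".join(text.split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

page_a = "Hello   world\n"
page_b = "Hello world"

print(body_key(page_a) == body_key(page_b))              # exact comparison
print(body_key(page_a, True) == body_key(page_b, True))  # whitespace-insensitive
```

Under an exact comparison the two bodies get different fingerprints, so the second copy goes unrecognized as a duplicate; with whitespace collapsed they match.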
Posted: Wed Oct 31, 2007 3:11 pm
by mark
If you include that meta field in your crawl using the "Meta Tags" setting and check "Meta" in the duplicate check fields list, that should work.
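As a rough sketch of why a per-page meta value helps, the following builds a duplicate key from the title plus a custom meta tag. The tag name `page-id`, the sample pages, and the `dup_key` helper are assumptions made up for this example; the actual matching is done by the crawler according to the settings named above.

```python
from html.parser import HTMLParser

class DupFields(HTMLParser):
    """Collects the <title> text and all named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def dup_key(html, meta_name="page-id"):
    """Duplicate key made of the title and one (hypothetical) meta field."""
    parser = DupFields()
    parser.feed(html)
    return (parser.title.strip(), parser.meta.get(meta_name, ""))

PAGE_1 = ('<html><head><title>Product list</title>'
          '<meta name="page-id" content="list-1"></head><body></body></html>')
PAGE_2 = ('<html><head><title>Product list</title>'
          '<meta name="page-id" content="list-2"></head><body></body></html>')

print(dup_key(PAGE_1))
print(dup_key(PAGE_2))
```

The two pages share a title but differ in the meta value, so their keys differ and neither would be rejected as a duplicate of the other.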