Not Duplicate but appliance says they are.
Posted: Fri Jan 21, 2005 10:17 am
by jgdoke
Posted: Fri Jan 21, 2005 10:22 am
by mark
Go to list/edit urls and see what text was extracted from each. It's probably the same.
Posted: Fri Jan 21, 2005 12:21 pm
by jgdoke
http://www.ab.com/networks/ethernet.html
is NOT in the list.
I'm guessing that because it is flagged as a duplicate, the appliance deletes the page. Looking at the source of each page, I see no similarities.
Posted: Fri Jan 21, 2005 1:44 pm
by mark
I tried a crawl of just those two pages with duplicate detection off. Both seem to come up with no body text. Not sure why. It will require more study of the HTML on those pages.
Posted: Fri Jan 21, 2005 2:05 pm
by jgdoke
You are correct. The list URLs page shows zero bytes of text from the page.
ABjournal is one of our high-traffic areas, so please let me know an answer ASAP.
Thank you
John
Posted: Fri Jan 21, 2005 2:24 pm
by mark
Those pages are returning different content based on user-agent. Adjust your user-agent to something the webserver likes. Maybe something like this will make it behave:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
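To check whether a server really varies its response by user-agent, one option is to fetch the same URL with two different User-Agent headers and compare the bodies. Here is a rough sketch in Python using only the standard library. The function names and the crawler user-agent string are made up for illustration and are not the appliance's actual values.

```python
import urllib.request

# Browser-style user-agent string quoted in the post above.
BROWSER_UA = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
              ".NET CLR 1.0.3705; .NET CLR 1.1.4322)")
# Hypothetical crawler user-agent, used only for comparison.
CRAWLER_UA = "ExampleCrawler/1.0"


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return the raw response body for url, fetched as user_agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def differs_by_user_agent(url, fetcher=fetch):
    """True if the body served to the browser UA differs from the crawler UA."""
    return fetcher(url, BROWSER_UA) != fetcher(url, CRAWLER_UA)
```

Calling differs_by_user_agent("http://www.ab.com/networks/ethernet.html") would do the live comparison. Pages with changing content (timestamps, rotating banners) can differ between any two fetches, so a True result is only a hint.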
Posted: Mon Jan 24, 2005 12:14 pm
by mark
I'm not sure what I was looking at before, but looking at this again, the problem does not appear to be client related. Both of those pages have no text content, only . The appliance can find the links to the desired pages,
http://www.ab.com/abjournal/nov2004/index.html and
http://www.ab.com/networks/ethernet/index.html, but won't try to follow any links on the duplicate (empty) page.
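To illustrate why two pages with different markup can still be treated as duplicates: if duplicate detection compares only the extracted body text, every page with no text produces the same empty string. The sketch below assumes that model. How the appliance actually detects duplicates is not stated in this thread, and the class names, function names, and sample markup are all hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html):
    """Return the page's text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def text_fingerprint(html):
    """Hash of the extracted text only; the markup itself is ignored."""
    return hashlib.sha256(extract_text(html).encode("utf-8")).hexdigest()


# Hypothetical markup: different links, but no text in either page.
page_a = '<html><body><a href="/abjournal/nov2004/index.html"><img src="a.gif"></a></body></html>'
page_b = '<html><body><a href="/networks/ethernet/index.html"><img src="b.gif"></a></body></html>'

same = text_fingerprint(page_a) == text_fingerprint(page_b)
```

Here same comes out True even though the two pages link to different places, which would match the zero-bytes-of-text result seen in the URL list.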