We have one huge index covering the corporate and country sites and use filters in the search script when we want to retrieve results from only one site. This works fine apart from some minor issues. The Base Url points to an HTML page (depth=0) containing links to a sitemap for each of the websites (depth=1). On these sitemaps every page underneath is linked again (depth=2). Some of these pages have one more level, or iframes with PDFs we want to index as well (depth=3). So I set the Max Depth to 3.

But as some pages have anchors within the document, the walk follows them as well. This is not perfect, but acceptable. The problem is that it keeps following these links, multiplying the path until the Max URL Size Limit is reached. This multiplying was an error in our CMS, but still, with the Max Depth parameter set it should not follow any deeper. It should follow these anchors once and then stop. Am I missing something here?
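To illustrate what I mean by multiplying the path, here is a rough sketch in plain Python (this is not how the walker itself resolves links; the domain and the faulty relative href are made up): a broken relative link emitted by the CMS grows the URL by one segment on every hop when it is resolved against the page it was found on, which is presumably why it only stops at the Max URL Size Limit.

    from urllib.parse import urljoin

    # Hypothetical reconstruction of the CMS bug: the page links to itself with a
    # broken relative href, so each resolution appends another path segment.
    url = "https://www.example.com/products/page.html"
    bad_href = "page.html/details#section1"  # made-up example of the faulty link

    for hop in range(3):
        url = urljoin(url, bad_href)
        print(url)

    # https://www.example.com/products/page.html/details#section1
    # https://www.example.com/products/page.html/page.html/details#section1
    # https://www.example.com/products/page.html/page.html/page.html/details#section1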
Another Max Depth Problem
Not sure how that could happen. Do you see anything unusual in the walk status? Does it say "max process size exceeded"? Do these sites link to each other?
Another Max Depth Problem
As I said, it seems the walk recursively followed the anchor links on that page and kept appending to the path (which was an error in our CMS). Some other pages had something like a subnavigation created manually by the authors, so within a group of 5 pages every page links to the 4 other related pages, and the walk recursively followed these links as well. So it is really strange, as it should stop after following each link once (we are at depth 3 there already).
Regarding the status, I restarted the walk after fixing that error, so I am no longer able to see what went wrong. We'll see what happens now, but it will certainly take up to 20 hours for a full index. I have now activated prevent duplicates again, which should prevent this endless following as well.
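For reference, this is roughly the behaviour I expect from prevent duplicates combined with Max Depth, sketched in plain Python (the crawl and get_links helpers are made up for illustration and are not part of the product): each URL is normalised by stripping the #anchor and fetched at most once, so cross-linked subnavigation pages and anchor links cannot be followed endlessly.

    from urllib.parse import urldefrag

    def crawl(start_url, get_links, max_depth=3):
        """Breadth-first, depth-limited sketch. get_links(url) stands in for
        fetching a page and extracting the URLs it links to (hypothetical helper)."""
        seen = set()                    # "prevent duplicates": each page is fetched once
        frontier = [(start_url, 0)]
        while frontier:
            url, depth = frontier.pop(0)
            page, _fragment = urldefrag(url)  # drop '#anchor' so anchors are not new pages
            if page in seen or depth > max_depth:
                continue
            seen.add(page)
            for link in get_links(page):
                frontier.append((link, depth + 1))
        return seen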