Max Depth problem

twu
Posts: 22
Joined: Fri Oct 19, 2007 2:57 pm

Post by twu »

Hi, what does Max Depth default to when it is set to -1? I am having a problem that seems to have something to do with Max Depth. I have this basic URL:
http://www.mysite.com/nr/rdonlyres/, and the directory structure is as follows:
/nr/rdonlyres/
/nr/rdonlyres/[firstSub]/
/nr/rdonlyres/[firstSub]/0/
/nr/rdonlyres/[firstSub]/0/[fileName].pdf

There are no pages under the rdonlyres and [firstSub] directories. The problem is that the indexer doesn't crawl all the PDFs: none of the http://www.mysite.com/nr/rdonlyres//[fi ... eName].pdf URLs are selectable.

Almost all of the walk settings are at their defaults. Could you please let me know what I did wrong?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

-1 for Max Depth means no limit.
Without knowing more about your actual URLs and your non-default walk settings, it's hard to say.
Make sure .pdf is listed in the extensions list.
Make sure there is no meta-robots or robots.txt or excludes setting that prevents indexing of those pages.
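
As a rough way to double-check the extension and robots.txt points outside of Webinator, here is a minimal Python sketch. It is not Webinator's own logic: the robots.txt text, the sample URL, and the helper names are all made up for illustration, and it only uses Python's standard robots.txt parser.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser
import posixpath


def allowed_by_robots(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_listed_extension(url, extensions):
    """Return True if the URL path ends in one of the listed extensions."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in extensions


# Hypothetical robots.txt content and URL, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /nr/rdonlyres/\n"
SAMPLE_URL = "http://www.mysite.com/nr/rdonlyres/sub/0/report.pdf"

if __name__ == "__main__":
    print(allowed_by_robots(SAMPLE_ROBOTS, SAMPLE_URL))
    print(has_listed_extension(SAMPLE_URL, {".html", ".pdf"}))
```

Paste in the real robots.txt text and one of the missing PDF URLs to see whether either check would explain the skip.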

To help find the problem, set verbosity to 4 and rewalk type to new, then do a new walk. When it's complete, go to List/Edit URLs and look up the page that lists one of the missing pages. Click on it to get walk details, then click on "Children" on the detail page. That will show you all the links found on the page, and which ones are indexed and which aren't. For those that aren't, there should be a reason to the right.
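
To illustrate what "-1 means no limit" implies, and the kind of per-link reason the Children view reports, here is a small Python sketch of a depth-limited walk over an in-memory link graph. It is only an illustration under assumptions: it counts depth in link hops from the base URL (Webinator may count differently), and the function name, reason strings, and sample URLs are invented for the example.

```python
from collections import deque


def walk(links, base_url, max_depth=-1, extensions=(".html", ".pdf")):
    """Breadth-first walk over an in-memory link graph.

    links maps a URL to the list of URLs found on that page.
    max_depth counts link hops from base_url; -1 means no limit.
    Returns (indexed, skipped), where skipped maps URL -> reason.
    """
    indexed = []
    skipped = {}
    seen = {base_url}
    queue = deque([(base_url, 0)])
    while queue:
        url, depth = queue.popleft()
        indexed.append(url)
        for child in links.get(url, []):
            if child in seen:
                continue
            seen.add(child)
            if max_depth != -1 and depth + 1 > max_depth:
                skipped[child] = "beyond max depth"
            elif not (child.endswith("/") or child.endswith(tuple(extensions))):
                skipped[child] = "extension not listed"
            else:
                queue.append((child, depth + 1))
    return indexed, skipped


# Hypothetical site layout mirroring the directory structure described above.
BASE = "http://www.mysite.com/nr/rdonlyres/"
SUB = BASE + "sub/"
ZERO = SUB + "0/"
PDF = ZERO + "file.pdf"
LINKS = {BASE: [SUB], SUB: [ZERO], ZERO: [SUB, PDF]}

if __name__ == "__main__":
    for depth in (-1, 2):
        indexed, skipped = walk(LINKS, BASE, max_depth=depth)
        print(depth, indexed, skipped)
```

With no limit the PDF three hops down is reached; with a limit of 2 it is skipped and a reason is recorded, which is the sort of entry to look for next to a link that was not indexed.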
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Do you have directory browsing enabled? If not, what links are found if you go to one of the /nr/rdonlyres/[firstSub]/ directories with your browser?
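
If it is easier than reading the page by eye, the links on a directory page can also be listed with a few lines of Python. This is just a sketch using the standard library; the class and function names, the sample HTML, and the URL are made up for illustration, and a real directory listing may be marked up differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative hrefs against the page's own URL.
                    self.links.append(urljoin(self.page_url, value))


def extract_links(page_url, html):
    """Return the absolute URLs linked from the given HTML text."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links


# Hypothetical directory listing, for illustration only.
SAMPLE_PAGE = "http://www.mysite.com/nr/rdonlyres/sub/0/"
SAMPLE_HTML = '<a href="../">Parent</a> <a href="file.pdf">file.pdf</a>'

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE, SAMPLE_HTML):
        print(link)
```

Feeding it the saved HTML of one of the real directory pages shows exactly which URLs a crawler would see there.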
John Turnbull
Thunderstone Software
twu
Posts: 22
Joined: Fri Oct 19, 2007 2:57 pm

Post by twu »

Thanks Mark. The following are the Basic URLs:
http://www.meritageportfolios.com/nr/rdonlyres/
http://www.meritageportfolios.com/meritage_en/inv/

.pdf is in the extensions list.
There is no meta-robots tag or robots.txt rule that prevents indexing of those pages.
The following is a typical page from List/Edit URLs:

List/Edit URLs
Current User: webinator
Current Profile: mtg_inv_en Webinator 5.1.61-Windows-w/plugin
Pages linked by http://meritageportfolios.com/nr/rdonly ... d88ea57f/0
Select a link to see information about that page.
(links that are not selectable are not in the database)

http://meritageportfolios.com/nr/rdonly ... cd88ea57f/
http://meritageportfolios.com/nr/rdonly ... 772_fr.pdf


The PDF file is not selectable, and there is no reason to the right (verbosity is set to 4).
twu
Posts: 22
Joined: Fri Oct 19, 2007 2:57 pm

Post by twu »

Thanks John. Directory browsing is enabled, and all the PDF files open fine from a browser.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Are other PDFs indexed? Did you purchase the PDF plugin?
Did you pause the walk (or any walk) while it was running?
Did the walk stop early due to the page count license limit (check the walk status)?
twu
Posts: 22
Joined: Fri Oct 19, 2007 2:57 pm

Post by twu »

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I was able to crawl that site pair and get that PDF page.
Check your vortex.log file to see if there are any related errors logged there.

Try the latest scripts from the website.
twu
Posts: 22
Joined: Fri Oct 19, 2007 2:57 pm

Post by twu »

Thanks Mark.
How many PDFs are you getting? Ideally it should be getting a few hundred PDFs.
What version of Webinator are you using? I am using Webinator 5 without the Execute JavaScript plugin.
I downloaded the latest dowalk script, and still didn't get the latest one.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »
