Page index error

nick107
Posts: 19
Joined: Wed May 05, 2004 11:57 am

Post by nick107 »

I am having a new problem with the following page and its child links not being indexed:

http://www.icsc.org/srch/rsrch/research ... /index.php

I see the following relevant error in the error.log:

http://www.icsc.org/srch/cgi/memberprin ... 060310.pdf Error translating via anytotx: <EXEC> command /usr/local/morph3/bin/anytotx returned exit code 1 0http://www.icsc.org/srch/rsrch/researchquarterly/index.php02 2006-03-22 00:31:31 Cannot pdf open /tmp/cvti03871a in the function do_epipdf_file (/usr/local/morph3/bin/anytotx -fpdf --timeout=120 --error-log=/usr/local/morph3/texis/IcscLive.4419a7c74/anytotx-errors.1656)
2006-03-22 00:29:4

Any ideas?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Is the file larger than your max page size? Partial PDFs can't be processed.
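
As a rough way to tell whether a fetched PDF was cut off (for example by a page size limit), the sketch below checks for the `%PDF-` header near the start and the `%%EOF` marker near the end of the data. The function name and the 1024-byte windows are assumptions for illustration; this is a heuristic, not how anytotx itself validates files.

```python
def looks_complete_pdf(data: bytes) -> bool:
    """Heuristic check that PDF data was not truncated.

    Assumption: a complete PDF has a '%PDF-' header within the first
    1024 bytes and a '%%EOF' marker within the last 1024 bytes.
    """
    has_header = b"%PDF-" in data[:1024]
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof


if __name__ == "__main__":
    # Tiny hypothetical sample, not a real document.
    sample = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
    print(looks_complete_pdf(sample))       # complete sample
    print(looks_complete_pdf(sample[:20]))  # truncated sample
```

A file that fails this check after download is a likely candidate for the "Cannot pdf open" error, though other causes (corruption, encryption) are possible too.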
nick107
Posts: 19
Joined: Wed May 05, 2004 11:57 am

Post by nick107 »

OK, I've set the max page size to -1 and that error isn't showing up in the log anymore. For some reason, I still can't get it to index most of the items in the dropdown on that page:

http://www.icsc.org/srch/rsrch/research ... /index.php

I tried to bypass it by setting up a "Page URL" on this page: http://www.icsc.org/srch/rsrch/research ... issues.txt

I did a new walk and the walk status shows:
Reading urls from URL http://www.icsc.org/srch/rsrch/research ... issues.txt

but I still can't see any of the URLs when I go to "List/Edit URLs" and try this string or any other:

*/cgi/memberprint?datafile=rsrchquarterly/back/*

I can't find any problems in the error log either. Any ideas? Am I just missing something obvious?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

issues.txt doesn't contain valid URLs. They're missing the http:// prefix.
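
A quick way to catch this before a walk is to scan the list file for entries without a scheme and host. The following is a minimal sketch using Python's standard `urllib.parse`; the function name and the example.com entries are hypothetical, and it assumes one URL per line.

```python
from urllib.parse import urlsplit


def find_invalid_urls(lines):
    """Return (line_number, text) pairs for entries that are not absolute http(s) URLs."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            continue  # ignore blank lines
        parts = urlsplit(url)
        # An absolute URL needs both a scheme (http/https) and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append((number, url))
    return bad


if __name__ == "__main__":
    # Hypothetical entries standing in for the contents of a URL list file.
    entries = [
        "http://www.example.com/a.pdf\n",
        "www.example.com/b.pdf\n",
        "/cgi/print?x=1\n",
    ]
    for number, url in find_invalid_urls(entries):
        print(f"line {number}: missing http:// prefix or host: {url}")
```

To check a real file, pass an open file object instead of the list, for example `find_invalid_urls(open("issues.txt"))`.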
nick107
Posts: 19
Joined: Wed May 05, 2004 11:57 am

Post by nick107 »

Ah, that makes sense. I've fixed the text file now, but it looks like it's trying to fetch these pages before it issues the custom primer URL, so it's returning the first link on that page as a login page and the rest as duplicates.

I worked around this by putting the custom primer URL at the top of the list and updating the walk, but I had to remove the primer since it had the username/login.
How can I easily make this work?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Add the issues.txt file as an additional URL in the Base URL field. You could also add it to Exclude by Field to keep issues.txt itself out of the index while still indexing its links.
John Turnbull
Thunderstone Software