Page index error

Posted: Wed Mar 22, 2006 4:06 pm
by nick107
I am having a new problem with the following page and its child links not being indexed:

http://www.icsc.org/srch/rsrch/research ... /index.php

I see the following relevant error in the error.log:

http://www.icsc.org/srch/cgi/memberprin ... 060310.pdf Error translating via anytotx: <EXEC> command /usr/local/morph3/bin/anytotx returned exit code 1 0http://www.icsc.org/srch/rsrch/researchquarterly/index.php02 2006-03-22 00:31:31 Cannot pdf open /tmp/cvti03871a in the function do_epipdf_file (/usr/local/morph3/bin/anytotx -fpdf --timeout=120 --error-log=/usr/local/morph3/texis/IcscLive.4419a7c74/anytotx-errors.1656)
2006-03-22 00:29:4

Any ideas?

Posted: Wed Mar 22, 2006 9:39 pm
by mark
Is the file larger than your max page size? Partial PDFs can't be processed.

Posted: Thu Mar 23, 2006 3:47 pm
by nick107
OK, I've set the max page size to -1 and that error isn't showing up in the log anymore. For some reason, I still can't get it to index most of the items in the dropdown on that page:

http://www.icsc.org/srch/rsrch/research ... /index.php

I tried to bypass it by setting up a "Page URL" on this page: http://www.icsc.org/srch/rsrch/research ... issues.txt

I did a new walk and the walk status shows:
Reading urls from URL http://www.icsc.org/srch/rsrch/research ... issues.txt

but I still can't see any of the URLs when I go to "List/Edit URLs" and try this string or any other:

*/cgi/memberprint?datafile=rsrchquarterly/back/*

I can't find any problems in the error log either. Any ideas? Am I just missing something obvious?

Posted: Thu Mar 23, 2006 5:02 pm
by mark
issues.txt doesn't contain valid URLs. They're missing http://
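
For anyone hitting the same thing, here is a minimal sketch of cleaning up such a list file before pointing the walker at it. It assumes the file holds one URL per line and that plain http is the right scheme for every entry. The function name and the sample entries are made up for illustration and are not part of the search appliance.

```python
def add_missing_scheme(lines, default_scheme="http"):
    """Return the non-blank lines, prefixing default_scheme:// where no http(s) scheme is present."""
    fixed = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.lower().startswith(("http://", "https://")):
            fixed.append(line)  # already a full URL, keep as-is
        else:
            fixed.append(f"{default_scheme}://{line}")
    return fixed


# Hypothetical sample entries, not the real contents of issues.txt
sample = ["www.example.com/a.pdf\n", "\n", "http://www.example.com/b.pdf\n"]
for url in add_missing_scheme(sample):
    print(url)
```

Run it over the lines of the list file and write the result back out, then re-walk.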

Posted: Fri Mar 24, 2006 1:09 am
by nick107
Ah, that makes sense. I fixed the text file now, but it looks like it's trying to fetch these pages before it issues the custom primer URL. So it's returning the first link on that page as a login page and the rest as duplicates.

I bypassed this by putting the custom primer URL at the top of the list and updating the walk, but I had to remove the primer since it had the username/login.
How can I easily make this work?

Posted: Fri Mar 24, 2006 6:22 am
by John
Add the issues.txt file as an additional URL in the Base URL field. You could also add it to Exclude by Field to keep issues.txt itself out of the index while still indexing the links it contains.