Todo list

watterson
Posts: 71
Joined: Mon Feb 14, 2005 4:15 pm

Post by watterson »

The walk status on our appliance shows over 3000 pages in the todo list, but when the crawl starts, that number does not change and the process seems to end normally. Is there a way to see what is in the todo list, or is there some reason these pages are not being crawled?

Mike
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Check your walk status. The walk is probably quitting prematurely due to size limits. If you don't have other walks running at the same time, you might want to raise the max process size limit before resuming the walk.
watterson
Posts: 71
Joined: Mon Feb 14, 2005 4:15 pm

Post by watterson »

I have the verbosity set to 4. Looking at the walk status, there is no indication that there was a problem with the process size. Any subsequent refreshes also just end right away without crawling anything. There are no other crawls running at the time.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Which version of the scripts are you using? If it is not 5.4.6, please check for updates and try again.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

P.S. If you pause a walk, the todo list will be non-empty.
watterson
Posts: 71
Joined: Mon Feb 14, 2005 4:15 pm

Post by watterson »

The script version is 5.4.6-1. We do sometimes pause a walk, but the cases I am describing here are ones where the walk seems to terminate normally without having completed. If I start a refresh right after this happens, it begins indexing again and then stops, again seemingly normally, but the todo number is still high.

Mike
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

What does your walk status show between the first "started 1" line and the "Walk finished at" line? Please post everything in that range.
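
If the status page is long, a small script can pull out just that span. This is a rough sketch and not part of the appliance: it assumes the walk status has been saved as plain text, and the function name extract_walk_span is made up for illustration.

```python
def extract_walk_span(status_text):
    """Return the lines from the first 'started 1' line through the
    'Walk finished at' line, inclusive (hypothetical helper)."""
    lines = status_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if start is None and stripped.startswith("started 1 "):
            # Remember where the first dispatcher entry begins
            start = i
        elif start is not None and stripped.startswith("Walk finished at"):
            return lines[start:i + 1]
    # No closing line found: return what follows the start, or nothing at all
    return lines[start:] if start is not None else []
```

Calling it on the saved status text returns a list of lines that can be pasted into a reply as-is.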
watterson
Posts: 71
Joined: Mon Feb 14, 2005 4:15 pm

Post by watterson »

started 1 new (24471) on http://www.stsci.edu/
started 2 new (24475) on http://oposite.stsci.edu/
0 pages fetched (19,754 bytes) from http://www.stsci.edu/
started 2 new (24478) on http://www-int.stsci.edu/~fruchter/ERO
0 pages fetched (103,599 bytes) from http://oposite.stsci.edu/
started 2 new (24483) on http://asds.stsci.edu/asds/
0 pages fetched (4,214 bytes) from http://asds.stsci.edu/asds/
started 2 new (24485) on http://archive.stsci.edu/galex

100 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Too many redirections (3) while fetching original http://archive.stsci.edu/galex
0 pages fetched (58,795 bytes) from http://archive.stsci.edu/galex
started 2 new (24489) on http://stsdas.stsci.edu/numarray/numarr ... array.html
112 pages fetched (1,289,518 bytes) from http://stsdas.stsci.edu/numarray/numarr ... array.html
started 2 new (24546) on http://hubblesite.org/gallery/showcase/
35 pages fetched (777,539 bytes) from http://hubblesite.org/gallery/showcase/
started 2 new (24667) on http://ess.stsci.edu/gsd/dst/
0 pages fetched (370 bytes) from http://ess.stsci.edu/gsd/dst/
started 2 new (24669) on http://www.ess.stsci.edu/psdb/
0 pages fetched (3,985 bytes) from http://www.ess.stsci.edu/psdb/
started 2 new (24671) on http://presto.stsci.edu/public/propinfo.html
0 pages fetched (25,858 bytes) from http://presto.stsci.edu/public/propinfo.html
started 2 new (24675) on http://www.stsci.edu/resources/software ... /numarray/
5 pages fetched (6,703,659 bytes) from http://www-int.stsci.edu/~fruchter/ERO
started 2 refresh (24676) on http://hubblesite.org/newscenter/newsde ... %20body/+1
0 pages fetched (17,701 bytes) from http://www.stsci.edu/resources/software ... /numarray/
started 2 refresh (24678) on http://www-int.stsci.edu/~welty/welty_cmd.html
2 pages fetched (142,176 bytes) from http://hubblesite.org/newscenter/newsde ... %20body/+1
started 2 refresh (24751) on http://www.stsci.edu/instruments/wfc3/C ... ummary.pdf
20 pages fetched (13,715,236 bytes) from http://www-int.stsci.edu/~welty/welty_cmd.html
started 2 new (25592) on http://archive.stsci.edu/hst/tall.html
2 pages fetched (1,752,478 bytes) from http://archive.stsci.edu/hst/tall.html
started 2 new (25646) on http://hst.stsci.edu/HST_overview/instruments
0 pages fetched (28,077 bytes) from http://hst.stsci.edu/HST_overview/instruments
started 2 new (25649) on http://ra.stsci.edu/STSDAS.html

006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Cannot connect to ra.stsci.edu:80: Connection refused
0 pages fetched (0 bytes) from http://ra.stsci.edu/STSDAS.html
started 2 new (25650) on http://sco.stsci.edu/newsletter/PDF/2002/spring_02.pdf

006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Timeout reading from sco.stsci.edu:80 in the function htbuf_readnblk
0 pages fetched (509,356 bytes) from http://sco.stsci.edu/newsletter/PDF/2002/spring_02.pdf
started 2 new (25754) on http://jwstsite.stsci.edu/
0 pages fetched (9,011 bytes) from http://jwstsite.stsci.edu/
started 2 new (25756) on http://hubblesource.stsci.edu/sources/
0 pages fetched (38,918 bytes) from http://hubblesource.stsci.edu/sources/
started 2 new (25808) on http://elite.stsci.edu/

100 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Document access unauthorized: http://elite.stsci.edu/ returned code 401 (Access Denied)
0 pages fetched (4,431 bytes) from http://elite.stsci.edu/
started 2 new (25809) on http://mytime.stsci.edu/

015 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Disallowed protocol `https' for URL https://mytime.stsci.edu/
0 pages fetched (0 bytes) from http://mytime.stsci.edu/
started 2 new (25810) on http://stsdas.stsci.edu/multidrizzle/
0 pages fetched (9,355 bytes) from http://stsdas.stsci.edu/multidrizzle/
started 2 new (25812) on http://oposite.stsci.edu/pubinfo/
0 pages fetched (17,338 bytes) from http://oposite.stsci.edu/pubinfo/
started 2 new (25814) on http://amazing-space.stsci.edu/resource ... s/groundup
0 pages fetched (5,795 bytes) from http://amazing-space.stsci.edu/resource ... s/groundup
started 2 new (25818) on http://sol.stsci.edu/~koekemoe/A2597_AAS191_poster

100 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Document not found: http://sol.stsci.edu/~koekemoe/A2597_AAS191_poster returned code 404 (Not Found)
0 pages fetched (223 bytes) from http://sol.stsci.edu/~koekemoe/A2597_AAS191_poster
started 2 new (25819) on http://sso.stsci.edu/second_decade/recommendations/

000 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Cannot resolve host `sso.stsci.edu': Host not found
0 pages fetched (0 bytes) from http://sso.stsci.edu/second_decade/recommendations/
started 2 refresh (25820) on http://stlibrary1.stsci.edu/uhtbin/cgis ... 17/0/57/49
1 pages fetched (44,112 bytes) from http://stlibrary1.stsci.edu/uhtbin/cgis ... 17/0/57/49
started 2 refresh (25822) on http://sd.stsci.edu/internal/
1 pages fetched (22,438 bytes) from http://sd.stsci.edu/internal/
started 2 new (25824) on http://apt.stsci.edu/phase2
0 pages fetched (2,181 bytes) from http://apt.stsci.edu/phase2
started 2 refresh (25826) on http://ess.stsci.edu/teams/dsb/dads/jav ... eUser.html
0 pages fetched (0 bytes) from http://ess.stsci.edu/teams/dsb/dads/jav ... eUser.html
started 2 refresh (25827) on http://apt.stsci.edu/help/tooltours/opt ... anner.html
0 pages fetched (0 bytes) from http://apt.stsci.edu/help/tooltours/opt ... anner.html
started 2 refresh (25875) on http://www.at.stsci.edu/dads/
0 pages fetched (0 bytes) from http://www.at.stsci.edu/dads/
started 2 refresh (25876) on http://stsdas.stsci.edu/multidrizzle/cu ... apers.html
0 pages fetched (0 bytes) from http://stsdas.stsci.edu/multidrizzle/cu ... apers.html
started 2 refresh (25877) on http://www.ess.stsci.edu/fset/projects/ ... ase-b.html
0 pages fetched (0 bytes) from http://www.ess.stsci.edu/fset/projects/ ... ase-b.html
started 2 refresh (25878) on http://archive.stsci.edu/hst/daily/arch ... rt_03.html
0 pages fetched (0 bytes) from http://archive.stsci.edu/hst/daily/arch ... rt_03.html
started 2 refresh (25879) on http://starview.stsci.edu/html/Frame_releases.html
0 pages fetched (0 bytes) from http://starview.stsci.edu/html/Frame_releases.html
started 2 refresh (25880) on http://asds.stsci.edu/manindex.html
0 pages fetched (0 bytes) from http://asds.stsci.edu/manindex.html
started 2 refresh (25881) on http://stdatu.stsci.edu/hst/archive_status.html
0 pages fetched (0 bytes) from http://stdatu.stsci.edu/hst/archive_status.html
started 2 refresh (25882) on http://informal-sci.stsci.edu/exhibits/
0 pages fetched (0 bytes) from http://informal-sci.stsci.edu/exhibits/
started 2 refresh (25883) on http://nemesis.stsci.edu/~hamilton/
0 pages fetched (0 bytes) from http://nemesis.stsci.edu/~hamilton/
started 2 refresh (25884) on http://sundog.stsci.edu/
0 pages fetched (0 bytes) from http://sundog.stsci.edu/
started 2 refresh (25885) on http://netmon.stsci.edu/docs/mrtg/172.16.1.252_12.html
0 pages fetched (0 bytes) from http://netmon.stsci.edu/docs/mrtg/172.16.1.252_12.html
started 2 refresh (25886) on http://oposite.stsci.edu/pubinfo/educat ... level.html
0 pages fetched (0 bytes) from http://oposite.stsci.edu/pubinfo/educat ... level.html
started 2 refresh (25887) on http://amazing-space.stsci.edu/tele-video.php
0 pages fetched (0 bytes) from http://amazing-space.stsci.edu/tele-video.php
started 2 refresh (25888) on http://rabbit.stsci.edu/products/Manual ... e/download
0 pages fetched (0 bytes) from http://rabbit.stsci.edu/products/Manual ... e/download
started 2 refresh (25889) on http://www.pst.stsci.edu/~pstwww/class_list.html
0 pages fetched (0 bytes) from http://www.pst.stsci.edu/~pstwww/class_list.html
started 2 refresh (25890) on http://nvo.stsci.edu/VORegistry
0 pages fetched (0 bytes) from http://nvo.stsci.edu/VORegistry
started 2 refresh (25893) on http://presto.stsci.edu/reports/spss-status.html
0 pages fetched (0 bytes) from http://presto.stsci.edu/reports/spss-status.html
started 2 refresh (25941) on http://stats.stsci.edu/
0 pages fetched (0 bytes) from http://stats.stsci.edu/
started 2 new (26092) on http://hst.stsci.edu/acs/performance/ct ... kshop.html
0 pages fetched (3,075 bytes) from http://hst.stsci.edu/acs/performance/ct ... kshop.html
started 2 new (26097) on http://ra.stsci.edu/DocRequest.html

006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Cannot connect to ra.stsci.edu:80: Connection refused
0 pages fetched (0 bytes) from http://ra.stsci.edu/DocRequest.html
started 2 new (27742) on http://www.sogs.stsci.edu/spst/lrpg/LRP.html

006 /usr/local/morph3/texis/scripts/dowalk(doprimer) 276: Cannot connect to www.sogs.stsci.edu:80: Connection refused (Broken pipe)
0 pages fetched (0 bytes) from http://www.sogs.stsci.edu/spst/lrpg/LRP.html
Dispatcher stopping by request. May take up to 65 seconds to stop.
Forcibly killing slow or stuck child 24751 (http://www.stsci.edu/instruments/wfc3/C ... ummary.pdf)
83806 pages fetched (-488,719,742 bytes) Total
193359 errors Total
17738 duplicate pages Total

Updating search index ...Done.
Creating spell-checker dictionaries...Done.
Done.
Verifying usability of new walk.

Walk finished at 2005-05-17 17:10:00 (took 2 hours 14 minutes 37 seconds)
Keeping database live: /usr/local/morph3/texis/www.421cdd7a3/db1
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Here's the smoking gun:

Dispatcher stopping by request.

Someone did a "pause and live", which stops the dispatcher while URLs are still left in the todo list. Make sure your rewalk type is set to refresh and hit "go" to finish the walk.
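
For anyone who wants to check a saved walk status for this without reading the whole thing, here is a minimal sketch. It only looks for the marker strings that appear in the status output quoted above. The function name and the dictionary keys are hypothetical, and other script versions may word these lines differently.

```python
import re

STOP_MARKER = "Dispatcher stopping by request"
# Matches error lines of the form seen in the status above, e.g.
# 006 /path/to/dowalk(doprimer) 276: Cannot connect to host:80: Connection refused
ERROR_LINE = re.compile(r"^(\d{3}) \S+ \d+: (.*)$")

def summarize_walk_status(status_text):
    """Report whether the dispatcher was stopped by request, which children
    were forcibly killed, and how many error lines appeared per 3-digit code."""
    stopped = False
    killed = []
    error_counts = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith(STOP_MARKER):
            stopped = True
        elif line.startswith("Forcibly killing"):
            killed.append(line)
        else:
            match = ERROR_LINE.match(line)
            if match:
                code = match.group(1)
                error_counts[code] = error_counts.get(code, 0) + 1
    return {
        "stopped_by_request": stopped,
        "killed": killed,
        "error_counts": error_counts,
    }
```

If stopped_by_request comes back True, the walk was paused rather than finished, which would explain a todo list that is still non-empty.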