Hi,
As a lurker on this list for six months or more, I want
to say how impressed I am by the support Mark, Bart, and
others provide by means of this list.
Now a few questions from my first serious use of
Webinator. I'm indexing the server where we allow
students and others to have personal Web pages, so I
expect there will be some pages with horrible syntax and
grossly incorrect content. There are at least 15K pages
on the server.
1. I left message reporting at the default level 2, and
as it chugs through the URLs it reports two numbers,
e.g.
2113/20831
I think the left-hand number is the number of URLs or
pages processed, but I'm not sure what the right-hand
number is.
2. My run went for about two hours and then seemed to
hang at the two numbers above. I gave it another
half hour, but the process seemed to be doing nothing
(on a DEC Unix system), so I killed it with
CTRL-C. The wg.log file says it
got signal 2 and attempted to quit nicely. The last
URL entered before that is
http://pubpages.unh.edu/~dmks/?S=A
When I look at that page I can see one .htm
file in the directory, but it is mostly binary
data, even though it is named as HTML.
Have I just hit an occasional hazard that Webinator
won't be able to deal with? Is the solution to
exclude that page and resume walking? In fact, at
this point do I want to specify a rewalk to continue
walking (and indexing) the remaining pages? I could turn
on a higher level of message reporting, but I wonder if
that would really gain me anything.
3. I went on to index after stopping the walk and
the index seems OK. I can search and find the
sorts of things I would expect, EXCEPT that I can't
get any regular expressions to work. The index is at:
http://unhinfo.unh.edu/cgi-bin/texis/webinator/search/
Since I can find the name "sand" with a normal search,
it seems I should be able to do something like either of these:
/s.nd
/[A-Z]and
Do I have to activate REX searching in some way I did not
notice?
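
To make clear what I am expecting in question 3, here is a small
sketch using Python's re module as a stand-in. This is ordinary
regular-expression syntax, not REX, whose syntax may well differ,
and the word list and the helper matching_words are made up purely
for illustration. It also shows that, matched case-sensitively,
[A-Z]and would not find a lowercase "sand" at all.

```python
import re

# Hypothetical sample words; not taken from the real index.
WORDS = ["sand", "Sand", "send", "band"]


def matching_words(pattern, candidates, ignore_case=False):
    """Return the candidates that the pattern matches in full.

    Uses Python's re module (an assumption for illustration),
    not Thunderstone's REX matcher.
    """
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [word for word in candidates if compiled.fullmatch(word)]


if __name__ == "__main__":
    for pattern in ["s.nd", "[A-Z]and"]:
        print(pattern, "case-sensitive:  ", matching_words(pattern, WORDS))
        print(pattern, "case-insensitive:", matching_words(pattern, WORDS, True))
```

If REX behaves at all like this, the first pattern should find
"sand" either way, while the second would need case-insensitive
matching (or a capitalized "Sand" in the text) to hit.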
- Jim Cerny, Computing & Information Services, Univ.NH
jim.cerny@unh.edu