Extra download required for Webinator with Texis?
Posted: Fri Jan 11, 2002 12:44 pm
by mark
Oops, sorry. Apparently IE's "Save As" adds a carriage return to the end of every line of the script. The internal config is not expected to have that. Make the following change to webinatoradmin to accommodate it.
Search for
<timport
you'll find a long line ending with
*\x01\x0a=
change that part to
*\x01=\x0d?\x0a=
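To illustrate what the change does, here is a rough Python sketch. It is not Vortex or REX, and the function name `split_records` and the sample data are made up for illustration. It assumes the config records end with a \x01 byte followed by a newline, which is what the expression fragment above suggests. The point is only that making the carriage return optional lets one pattern handle both the normal file and the IE-saved one.

```python
import re

# Hypothetical record terminator: \x01, then an optional carriage return
# (\x0d), then a line feed (\x0a). Python regex syntax, not REX.
TERMINATOR = re.compile(r"\x01\r?\n")

def split_records(text):
    """Split text into records, dropping the empty piece after the last terminator."""
    return [part for part in TERMINATOR.split(text) if part]

# Made-up sample data: the same two records with LF and with CRLF line endings.
unix_saved = "name=value\x01\nother=thing\x01\n"
ie_saved = "name=value\x01\r\nother=thing\x01\r\n"

print(split_records(unix_saved))
print(split_records(ie_saved))
```

Both calls should give the same two records, which is the behavior the modified expression is after.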
Posted: Fri Jan 11, 2002 12:53 pm
by b.sims
Bingo! It looks pretty normal now.
One last thing: Is it possible to feed in a list of URLs from a text file, as it was in 2.5? This is what I need to do before I go home for the weekend.
Posted: Fri Jan 11, 2002 12:59 pm
by b.sims
Also, can I specify that the crawl stop at a certain time (6am Monday morning)? I need it to stop before we open Monday but don't want to get out of bed at 4am.
Posted: Fri Jan 11, 2002 1:21 pm
by mark
You can feed URLs from a local file (like gw's "&filelist") with the "URL File" or "Page File" option, depending on whether you want full site walks or just single page fetches, respectively.
Webinator 4 will walk until completion or manual intervention.
Assuming that you wanted to make an incomplete walk live, you could modify the "fetchset" function in dowalk. Insert this right before the userstats call:
<if convert( 'now' , 'date' ) gt convert( '2002-01-14 06:00:00' , 'date' )>
Walker stopping by time. ($top)
<$stopwalk=2>
<bye>
</if>
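For anyone who wants to see the logic outside of Vortex, here is a minimal Python sketch of the same time-based stop. It is only an illustration: `should_stop`, `walk`, and the `clock` parameter are hypothetical names, and the "fetch" is a placeholder, not how Webinator actually fetches pages.

```python
from datetime import datetime

# Deadline taken from the snippet above: 6am on Monday, January 14, 2002.
DEADLINE = datetime(2002, 1, 14, 6, 0, 0)

def should_stop(now, deadline=DEADLINE):
    """Return True once the current time is past the deadline."""
    return now > deadline

def walk(urls, clock=datetime.now):
    """Process URLs in order, stopping early once the deadline passes.

    The append stands in for a real fetch; `clock` is injectable so the
    deadline check can be exercised without waiting for Monday.
    """
    fetched = []
    for url in urls:
        if should_stop(clock()):
            print("Walker stopping by time.")
            break
        fetched.append(url)
    return fetched
```

The check runs before each fetch, so a walk that passes the deadline quits at the next URL rather than in the middle of one.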
Posted: Mon Jan 14, 2002 5:20 am
by b.sims
When I make this modification, webinator throws up the error 'Missing start single quote in value'. I used the code exactly as above, where it looks as though all the quotes are correctly paired up; perhaps I am missing something in the syntax?
I ran the crawler over the weekend and manually stopped it this morning; once this function is in place, will that index automatically be made once dowalk is run again?
Thanks a lot,
Posted: Mon Jan 14, 2002 6:02 am
by bart
Make sure there are spaces before and after each single quote in a convert statement. This is silly, but the parser is kind of brain dead in this area.
I'm not sure about your second question, but you can check it yourself.
Posted: Mon Jan 14, 2002 6:09 am
by b.sims
Also, does Webinator 4 contain an equivalent of the 2.5 todo table? I would like to be able to customize the crawler so that I can stop and restart a walk, in order to use system slowtime and bandwidth.
Posted: Mon Jan 14, 2002 7:36 am
by b.sims
OK, I've been going through the code, please tell me if I am right about all this:
The system regularly checks the value of $stopwalk, which can be 0, 1 or 2. 1 means the walk was stopped by the system; 2 means it was stopped manually by the admin. As far as I can tell, stopping manually like this causes an abandon and the page index is not created: is this correct?
Can the indexing process be triggered manually?
Posted: Mon Jan 14, 2002 10:17 am
by mark
Webinator 4 operates under a somewhat different paradigm than Webinator 2. It's not currently very amenable to stopping and restarting, but that's a feature we want to add. You might find the comments at the top of the dowalk script interesting.
Right. Stopping the walk in the dispatcher abandons it and does not make the index. Where I suggested you place the time based stop will stop the children spawned by the dispatcher. The dispatcher is oblivious to why they quit and will simply assume all is well and index and make the database live.
You could call the "remakeindex" function documented at
http://www.thunderstone.com/texis/site/ ... ing+dowalk
Posted: Wed Jan 16, 2002 1:36 pm
by b.sims
Based on what I said in 18, should your code in 14 be corrected to <$stopwalk=1>? My walks are not going live as expected; is this due to 2 being the abandon code?