Can I timeout a walk?

Posted: Mon Oct 27, 2003 5:51 pm
by lightsource
I am evaluating the downloadable version of Webinator (Webinator 4.4.3-Windows-wo/plugin) and have my basic walk settings configured to wait 1 second between requests, allow 4 threads, and walk the domain on our public website.

What I'm seeing is that indexing completes in about 40 minutes, but Webinator then keeps spawning threads for about 18 hours (retrieving nothing in those threads) before deciding that it's done.

Is there a way to configure the dowalk script to just stop if it's been running for, say, two hours?

Thanks -
Mark
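
For readers with the same question: the thread does not name a built-in time limit, so one generic workaround is an external watchdog that starts the walk and gives up after a fixed wall-clock time. The sketch below is only that, a sketch. The function name `run_with_time_limit` is made up for illustration, and the command it launches is a placeholder, because the actual command line for running dowalk is not shown anywhere in this thread.

```python
import subprocess
import sys


def run_with_time_limit(command, limit_seconds):
    """Run command; return its exit code, or None if the limit was exceeded."""
    try:
        completed = subprocess.run(command, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child before re-raising.
        # Processes spawned by that child are NOT killed by this.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Placeholder command standing in for the real walk (an assumption):
    # a Python child that sleeps longer than the limit.
    placeholder = [sys.executable, "-c", "import time; time.sleep(5)"]
    result = run_with_time_limit(placeholder, limit_seconds=1)
    print("timed out" if result is None else f"exit code {result}")
```

For a two-hour cap the limit would be `2 * 60 * 60`. Killing a walk this way is abrupt, so the same caveat given later in the thread about an unknown database state would presumably apply.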

Posted: Mon Oct 27, 2003 7:36 pm
by lightsource
I don't see anything in the logs that indicates that it's doing _anything_ in particular, and the output to the screen (while the walk is active) basically just tells me repetitively that a new thread is created (for the same URL, but it looks like a new thread ID?? in parentheses each time) and that 0 pages were fetched.

The entire site is dynamically generated, but there's nothing like an active clock or date object which displays on the page that the monitor would have freaked out over, and there's no monitor URL configured for this particular walk.

I'm trying again with more filtering enabled (I'm stripping out all forms which the spider couldn't complete) and have selected "refresh" as the walk type - I'll let this one go overnight and see what it looks like in the morning. Perhaps the spider was getting hung up on a form or something (although there are no forms in the directory that it was looping on).

Thanks - I'll update when I see how it has behaved in the morning - this product rocks!

Mark

Posted: Tue Oct 28, 2003 10:16 am
by Kai
If it was spawning a thread repeatedly for the same URL, there may be an issue with that URL. Are there any messages in texis/vortex.log in the install dir?

Posted: Tue Oct 28, 2003 10:52 am
by mark
While experimenting with settings it's best to always do "new" walks rather than "refresh".

Posted: Fri Jul 30, 2004 12:02 pm
by KMandalia
I am having the same issue as above (although it is stuck on the same folder rather than the same webpage), and I can't stop the dowalk.

The following are the latest entries from the monitor and vortex logs.

Monitor

200 2004-07-30 11:53:00 (9812) Database Monitor on e:\Webinator\Data\site\db2 exiting

Vortex

115 2004-07-30 11:48:00 e:\Program Files\Thunderstone Software\Webinator\texis\scripts\Webinator\dowalk:69: Field NextCheck non-existent
000 2004-07-30 11:48:00 e:\Program Files\Thunderstone Software\Webinator\texis\scripts\Webinator\dowalk:69: SQLExecute() failed with -1 in the function execntexis

Webinator has already walked thousands of sites and I don't want to lose them; however, I want to stop the walker and also ensure this problem doesn't happen in the future.

Bottom line: I want to stop this walk and change the settings so it doesn't happen again. I have the paid Webinator 5.0.5 (both scripts are up to date). Any help?

Posted: Fri Jul 30, 2004 12:09 pm
by KMandalia
I did that, but that doesn't stop the walk. I want to stop the walk completely. The reason is this:

I want to put in categories. I paused the walk and made the database live, and now I am going into all walk settings, putting categories and URL patterns in place, and hitting the update button. However, I am not seeing the category box on the search form. Maybe that is because the walk is still going on.

Posted: Fri Jul 30, 2004 12:18 pm
by mark
Check the walk status page for the status of what's happening. You may have to scroll down to see it all.

Posted: Fri Jul 30, 2004 2:31 pm
by KMandalia
Dispatcher stopping by request. May take up to 185 seconds to stop.
47561 pages fetched (-1,846,877,930 bytes) Total
224475 errors Total
24048 duplicate pages Total

Updating search index ...Recategorization started: 2004-07-30 11:32:39

I am putting in two categories: one with 20 websites and the other with 1 website.


started 3 (9576) on http://promo.cuna.org/promo/banner_js.p ... _kids.html
started 3 (8692) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (8984) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (9696) on http://promo.cuna.org/promo/banner_js.p ... _spam.html
started 3 (9680) on http://promo.cuna.org/promo/banner_js.p ... _call.html
started 3 (9804) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (9656) on http://promo.cuna.org/promo/banner_js.p ... _fct.html/

This goes a long way up (different webpages and different numbers in brackets). I know these are bad webpages. So I paused and made the database live successfully.

1) I want to stop the walk. Period.
2) I am wondering whether recategorization really takes 3 hours for 46000 websites (it's only two categories).
3) When I click the stop button in all walk settings, it should stop sooner or later, shouldn't it?

First, I was not able to crawl enough pages, for some reason that is still unresolved. Then I removed the categories and all other restrictions that could potentially limit the number of pages retrieved. Now I am not able to stop the walk.

I have the latest scripts (both the same version).

Let me know what you think.

Posted: Fri Jul 30, 2004 3:32 pm
by mark
When you hit "Pause and live" the index is updated, the database is made live, and the walk stops. Recategorization will run until it's complete. It looks like you tried to recategorize while the index was still being built, which may have confused things.

On NT, use Task Manager to see whether any texis processes are still running, and kill them. That should eliminate anything that's stuck, but it will leave the database in an unknown state. You should probably do a new walk after that.
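
The Task Manager step above can also be scripted. This is a rough sketch, not something from Thunderstone: it assumes a Windows version that ships the `tasklist` and `taskkill` commands (not every release of that era does), and it assumes the stuck processes have an image name starting with `texis`. The helper names `find_pids` and `kill_texis_processes` are made up for illustration.

```python
import csv
import io
import subprocess


def find_pids(tasklist_csv, name_prefix="texis"):
    """Return PIDs of processes whose image name starts with name_prefix."""
    prefix = name_prefix.lower()
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Expected row shape from `tasklist /FO CSV /NH`:
        # image name, PID, session name, session number, memory usage.
        if len(row) >= 2 and row[0].lower().startswith(prefix) and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def kill_texis_processes():
    """List running processes and force-kill the matching ones (Windows only)."""
    listing = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_pids(listing):
        # /F forces termination; the database is left in an unknown state.
        subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=False)


if __name__ == "__main__":
    # Made-up sample listing, so the parser can be tried on any platform.
    sample = (
        '"texis.exe","1234","Console","0","5,432 K"\n'
        '"notepad.exe","100","Console","0","1,000 K"\n'
    )
    print(find_pids(sample))
```

Only the parsing runs in the demo; `kill_texis_processes()` has to be called deliberately, since a forced kill is the same last resort described above and should be followed by a new walk.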