Can I timeout a walk?

lightsource
Posts: 2
Joined: Mon Oct 27, 2003 5:44 pm

Post by lightsource »

I am evaluating the downloadable version of Webinator (Webinator 4.4.3-Windows-wo/plugin) and have my basic walk settings configured to wait 1 second between requests, allow 4 threads, and walk the domain on our public website.

Indexing completes in about 40 minutes, but Webinator then keeps spawning threads for about 18 hours (retrieving nothing on those threads) before deciding that it's done.

Is there a way to configure the dowalk script to just stop if it's been running for, say, two hours?

Thanks -
Mark
lightsource
Posts: 2
Joined: Mon Oct 27, 2003 5:44 pm

Post by lightsource »

I don't see anything in the logs that indicates that it's doing _anything_ in particular, and the output to the screen (while the walk is active) just tells me repeatedly that a new thread is created (for the same URL, but with what looks like a new thread ID in parentheses each time) and that 0 pages were fetched.

The entire site is dynamically generated, but there's nothing like an active clock or date object which displays on the page that the monitor would have freaked out over, and there's no monitor URL configured for this particular walk.

I'm trying again with more filtering enabled (I'm stripping out all forms which the spider couldn't complete) and have selected "refresh" as the walk type - I'll let this one go overnight and see what it looks like in the morning. Perhaps the spider was getting hung up on a form or something (although there are no forms in the directory that it was looping on).

Thanks - I'll update when I see how it has behaved in the morning - this product rocks!

Mark
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

If it was spawning a thread repeatedly for the same URL, there may be an issue with that URL. Are there any messages in texis/vortex.log in the install dir?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

While experimenting with settings it's best to always do "new" walks rather than "refresh".
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I am having the same issue as above, except that it is stuck on the same folder rather than on the same web page. I also can't stop the dowalk.

The following are the latest entries from the monitor and vortex logs.

Monitor

200 2004-07-30 11:53:00 (9812) Database Monitor on e:\Webinator\Data\site\db2 exiting

Vortex

115 2004-07-30 11:48:00 e:\Program Files\Thunderstone Software\Webinator\texis\scripts\Webinator\dowalk:69: Field NextCheck non-existent
000 2004-07-30 11:48:00 e:\Program Files\Thunderstone Software\Webinator\texis\scripts\Webinator\dowalk:69: SQLExecute() failed with -1 in the function execntexis

Webinator has already walked thousands of sites and I don't want to lose them. However, I want to stop the walker and also ensure this problem doesn't happen in the future.

Bottom line: I want to stop this walk and change the settings so it doesn't happen again. I have the paid version of Webinator 5.0.5 (both scripts are up to date). Any help?
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I did that, but it doesn't stop the walk. I want to stop the walk completely. The reason is this:

I want to put in categories. I paused the walk and made the database live, and now I am going into All Walk Settings, putting categories and URL patterns in place, and hitting the update button. However, I am not seeing the category box on the search form. Maybe that is because the walk is still going on.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Check the walk status page for the status of what's happening. You may have to scroll down to see it all.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

Dispatcher stopping by request. May take up to 185 seconds to stop.
47561 pages fetched (-1,846,877,930 bytes) Total
224475 errors Total
24048 duplicate pages Total

Updating search index ...Recategorization started: 2004-07-30 11:32:39

I am putting in two categories: one with 20 websites and the other with 1 website.


started 3 (9576) on http://promo.cuna.org/promo/banner_js.p ... _kids.html
started 3 (8692) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (8984) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (9696) on http://promo.cuna.org/promo/banner_js.p ... _spam.html
started 3 (9680) on http://promo.cuna.org/promo/banner_js.p ... _call.html
started 3 (9804) on http://promo.cuna.org/promo/banner_js.p ... 4_fct.html
started 3 (9656) on http://promo.cuna.org/promo/banner_js.p ... _fct.html/

This goes a long way up (different web pages and different numbers in parentheses). I know these are bad web pages, so I paused and made the database live successfully.

1) I want to stop the walk. Period.
2) I am wondering whether recategorization really takes 3 hours for 46000 websites (it's only two categories).
3) When I click the stop button in All Walk Settings, it should stop sooner or later, shouldn't it?

At first, I was not able to crawl enough pages, for a reason that is still unresolved. I then removed the categories and every other restriction that could potentially limit the number of pages retrieved. Now I am not able to stop the walk.

I have the latest scripts (both are the same version).

Let me know what you think.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

When you hit "Pause and live" the index is updated, the database is made live, and the walk stops. Recategorization will run until it's complete. It looks like you tried to recategorize while the index was still being built which may have confused things.

On NT, use Task Manager to see if there are any texis processes still running, and kill them. That should eliminate anything that's stuck, but it will leave the database in an unknown state. You should probably do a new walk after that.
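
For anyone who would rather script the Task Manager step, here is a rough Python sketch. It is not part of Webinator. It assumes the stuck processes show up under the image name texis.exe (an assumption; check the name Task Manager actually shows) and that the Windows version in use ships the standard tasklist and taskkill commands. The helper names are made up for illustration.

```python
import csv
import io
import subprocess

# Assumption: the stuck processes appear under this image name.
# Confirm the real name in Task Manager before relying on it.
IMAGE_NAME = "texis.exe"


def parse_tasklist_csv(output, image_name=IMAGE_NAME):
    """Return the PIDs found in `tasklist /FO CSV /NH` output for image_name."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # The "no tasks" INFO message has a single field and is skipped here.
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def find_pids(image_name=IMAGE_NAME):
    """Ask Windows for running processes with the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout, image_name)


def kill_pids(pids):
    """Force-terminate each PID, like ending the process in Task Manager."""
    for pid in pids:
        subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=False)


if __name__ == "__main__":
    pids = find_pids()
    print("texis PIDs found:", pids)
    kill_pids(pids)
```

As with killing the processes by hand, this leaves the database in an unknown state, so a new walk afterwards is still the safe choice.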