walk doesn't stop

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

dowalk 5.1.6
timeout of 60 seconds
refreshing about 120K pages.

Even after clicking 'pause walk and live' 3 times, the walker doesn't stop. The walk status says it is removing commonality, but after that it keeps fetching more pages. More than 1 hour has passed and only a single texis process is running.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Does the running texis pid correspond to any of the started children listed on the walk status page? If it does, try pause again. If not, it's probably the dispatcher still running the remove common, which can take a fair amount of time and memory with a large number of files.

The "stop" and "pause" signals last 1 minute then go away. Occasionally a child doesn't check for the signal before it expires. Doing 2 or 3 pauses 45 seconds apart will usually get those.
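
To see why the 45-second spacing matters, here is a small Python sketch of the timing only. It assumes each pause is visible for exactly 60 seconds from the moment it is sent; `covered_windows` is a made-up helper for illustration and has nothing to do with how Webinator itself tracks the signal.

```python
def covered_windows(send_times, lifetime=60):
    """Merge the [t, t + lifetime) windows during which a pause signal is visible.

    send_times: seconds at which pause was clicked (hypothetical values).
    lifetime:   how long one signal stays around, 60 s per the post above.
    """
    merged = []
    for start in sorted(send_times):
        end = start + lifetime
        if merged and start <= merged[-1][1]:
            # This window touches or overlaps the previous one: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # There is a gap in which no signal exists: start a new window.
            merged.append([start, end])
    return [tuple(w) for w in merged]


print(covered_windows([0]))          # one pause
print(covered_windows([0, 45, 90]))  # three pauses, 45 s apart
print(covered_windows([0, 90]))      # two pauses, 90 s apart
```

Pauses sent 45 seconds apart overlap into one unbroken window, so a child that checks late still finds a signal. Pauses sent further apart than the lifetime leave a gap in which a child can check and see nothing.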
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I will try what you said. Usually, if remove common is the problem, it is the last statement in the walk status, but this one keeps going.

Also, I face this type of issue every time I start a refresh on a newly completed walk that has about 100K pages.

We have had Enterprise Webinator for 6 months now, and there hasn't been a single time when I was able to successfully refresh a huge database. Ideally, I would do a complete walk (however many days it takes), then introduce a big delay of about 20-30 seconds and start a refresh. Since the index doesn't get updated on the fly, I would pause and restart every 3-4 days, if it works successfully.

Maybe we need to look at the combination of Enterprise Webinator and Windows 2003 Server that we have.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Where does your refresh get stuck? Try turning off remove common. It won't be very effective in a refresh anyhow.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

Yeah, that makes sense. I will do that, but only if I can successfully stop the walk without killing everything and remaking the index by hand.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

The pid showing up is 9160, and it is not listed anywhere on the walk status page. This has been going on for the last 6 hours, and I tried doing pauses every 45 seconds. The last several lines are:

started 3 new (11760) on http://www.cunalendingcouncil.org/download/
Dispatcher stopping by request. May take up to 65 seconds to stop.
123246 pages fetched (-1,987,327,543 bytes) Total
114858 errors Total
34930 duplicate pages Total

Removing commonality from fetched pages...Reading urls from URL http://www.creditunions.com/getarchive.aspx
Reading urls from URL http://www.creditunions.com/resources/a ... leases.asp
Reading urls from URL http://www.creditunions.com/resources/a ... report.asp
Walker holding by request. (http://www.bai.org)
482 pages fetched (67,608,105 bytes) from http://www.bai.org
Reading urls from URL http://www.creditunions.com/getarchive.aspx
Reading urls from URL http://www.creditunions.com/resources/a ... leases.asp
Reading urls from URL http://www.creditunions.com/resources/a ... report.asp
Walker holding by request. (http://www.cunatechnologycouncil.org/)
31 pages fetched (1,097,407 bytes) from http://www.cunatechnologycouncil.org/
Reading urls from URL http://www.creditunions.com/getarchive.aspx
Reading urls from URL http://www.creditunions.com/resources/a ... leases.asp
Reading urls from URL http://www.creditunions.com/resources/a ... report.asp
166 pages fetched (25,039,688 bytes) from http://www.cunalendingcouncil.org/download/
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

If the texis that's still running is generating the "pages fetched" messages, it has to be listed on the walk status. The unidentified texis would be the dispatcher, which would be doing the remove common right now. Perhaps the texis you're looking at is newer than the walk status you have up. Sounds like you may need to kill that texis.
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

Regarding post #4 above (that remove common won't be very effective in a refresh anyhow):

I tried turning it off, and most of the critical results now show common text, which is very discouraging. Why?

I need to run remove common immediately. How do I do that from the command line?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

There's no provision for running remove common by itself. You could do a SQL update to remove the boilerplate text, then updateindex. Something like this should do it:

"update html set Body=Body-'the common text' where Body matches 'the common text*'"

Obtain the common text from the full record display under list/edit urls.

For future walks you'll need to configure your walk to use keep and/or ignore tags to strip the boilerplate or do the sql update at the end.
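
To make the intent of that update concrete, here is a rough Python sketch of the same idea outside Texis. It assumes the boilerplate sits at the very start of Body, as the trailing `*` in the matches pattern suggests. `strip_boilerplate` and the sample strings are hypothetical, and this is not how Webinator's remove common works internally.

```python
def strip_boilerplate(body, common_texts):
    """Remove the first matching leading boilerplate block from a page body.

    body:         the page text (stands in for the Body field).
    common_texts: boilerplate strings copied from the full record display,
                  one per site if several sites are affected.
    """
    for common in common_texts:
        if body.startswith(common):
            # Drop the boilerplate and any whitespace left in front of the text.
            return body[len(common):].lstrip()
    # No known boilerplate at the start: leave the body untouched.
    return body


# Hypothetical sample data, not taken from the walk above.
commons = ["Home | About | Contact", "Member Login | Site Map"]
print(strip_boilerplate("Home | About | Contact Rates rose today.", commons))
print(strip_boilerplate("An article with no shared header.", commons))
```

Passing a list keeps it to one pass when several sites each have their own boilerplate. On the SQL side that would mean one update per common text.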
KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

I can do that, but it will not solve the problem: I noticed it for just one query term, and more than one site could be affected.

In your benchmarking tests, what is the worst-case time remove common would take in 5.1.6 Enterprise Webinator for a total of 124,000 pages, on a Windows machine with a 2.8 GHz clock and 2.5 GB of RAM?

Mine is getting out of control now. I need to resolve this issue, since running new profiles every now and then and merging tables is becoming a huge effort.