Walk never stops or finishes

david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

Until today, the walk was only seeing 33% of my documents (976 of 2898 items). This morning I realized that the Exclusions list included ~ and that the missing URLs have a ~ in them. I removed ~ from the Exclusions list, changed the walk from Refresh to New, and told it to GO. In less than 5 minutes it had walked 1074 pages, and then nothing: the walk neither stopped nor continued, the error counts didn't increase, and the Texis process stopped using CPU.

I waited ten more minutes and then tried to Stop the walk. This failed, and after several tries and several more minutes I rebooted the Webinator server and tried again. This time the walk saw 2500 pages and then hung just the same. After 15 minutes of inactivity I tried to Stop the walk, and after failing several times, killed the Texis process. Tries 3 and 4 each walked about 1000 pages and then hung the same way. The errors encountered don't seem unusual (offsite, max page size exceeded, ...).

What do I look at next?
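To illustrate why removing ~ from the Exclusions list changed what the walk could see, here is a minimal Python sketch. It assumes each exclusion is a plain substring match against the whole URL, which is what the walk status line "Ignore urls containing any of the following" suggests. The function name and the example.com URLs are hypothetical; this is not Webinator's actual code.

```python
from typing import List


def is_excluded(url: str, exclusions: List[str]) -> bool:
    """Return True if any exclusion string appears anywhere in the URL.

    Hypothetical sketch: assumes plain substring matching, not Webinator's code.
    """
    return any(pattern in url for pattern in exclusions)


# Hypothetical URLs: one containing a ~, one without.
urls = [
    "http://example.com/docs/~user/report.doc",
    "http://example.com/docs/summary.txt",
]

with_tilde = ["/cgi-bin/", "~"]   # Exclusions list before the fix
without_tilde = ["/cgi-bin/"]     # Exclusions list after removing ~

kept_before = [u for u in urls if not is_excluded(u, with_tilde)]
kept_after = [u for u in urls if not is_excluded(u, without_tilde)]
print(len(kept_before), len(kept_after))  # 1 2
```

Under this assumption, every URL containing ~ is silently dropped, which fits the walk only ever seeing about a third of the documents.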
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

What does the bottom of the walk status indicate when it's stuck?
What's your Texis version (full output of "texis -version")?
What's your dowalk script version (top right of the settings page)?
What OS and version are you running on?
Are the documents all html and text or a mix including pdf, word, etc.?
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

Here is the status from the current walk, which started about 16 minutes ago and has been doing nothing for the last 11 minutes:

Creating database C:\Program Files\Thunderstone Software\Webinator/texis/default/db2...Done.
Walk started at 2005-07-01 10:48:32 (by user)
Verbosity set to 4
JavaScript walking disabled
HTTPS walking disabled
Start fetching at http://zek.netdocuments.com/dcWeb/dcWeb.aspx
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Ignore urls containing any of the following:
/cgi-bin/
started 1 new (4028) on http://zek.netdocuments.com/dcWeb/dcWeb.aspx
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

And for the rest of the data:

version
Texis Web Script (Vortex) Copyright (c) 1996-2005 Thunderstone - EPI, Inc.
Enterprise Webinator Version 5.01.1109610969 20050228 (i686-intel-winnt-32-32)

dowalk
Webinator 5.1.10
$Id: dowalk.src,v 2.310 2005/02/23 18:19:41 kai Exp $

OS
Windows Server 2003 Standard Edition SP1

documents
A mix of all sorts of stuff: 771 .txt, 652 .doc, then dozens of .pdf, .xls, .wpd, .ppt, .htm, and some other miscellaneous types.
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

I updated to the newest dowalk script with no change in results:

Webinator 5.1.18
$Id: dowalk.src,v 2.352 2005/06/22 17:15:29 kai Exp $

I let it sit doing nothing for more than an hour and then requested that it Stop walking. It won't stop. I requested the stop 4 minutes ago:

Creating database C:\Program Files\Thunderstone Software\Webinator/texis/default/db2...Done.
Walk started at 2005-07-01 11:44:47 (by user)
Verbosity set to 4
JavaScript walking disabled
HTTPS walking disabled
Start fetching at http://zek.netdocuments.com/dcWeb/dcWeb.aspx
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Ignore urls containing any of the following:
/cgi-bin/
started 1 new (1748) on http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Dispatcher stopping by request. May take up to 65 seconds to stop.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

What's running in Task Manager in terms of texis, monitor, and anytotx processes? Are any of those using CPU?

What non-default settings are you using?

Just for testing, does it work better if you remove all non-HTML and non-text extensions like .pdf .xls .doc .swf .wpd .ppt etc.?
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

anytotx is running, using no CPU.
4 monitor processes are running, using no CPU.
2 texis processes are running, using no CPU.

I am not sure what all the defaults are, but here is what I know I've set:
Exclusions - removed ~
Verbosity - 4
Strip Queries - N
Login Info - supplied
Off-site - N
Stay Under - Y
All Extensions - Y (trying N now)
I don't think I have changed anything else.

Trying fewer extensions now.
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

This extension set makes no difference in behavior (see below): .html .htm .txt .doc .pdf .xls .aspx .ppt .wpd

I confirmed my theory (explained below) by trying just .html .htm .txt and getting 0 pages.

Changing Extensions to be just .aspx yields the problem again.

Extensions:
The changes I am making to the extension handling aren't going to make any difference. From the extension check's point of view, everything is .aspx, even the documents. The MIME types do change correctly to what they should be for .doc, .xls, .pdf, etc., but the URL before the query string always ends in .aspx.

Typical document url:
http://zek.netdocuments.com/dcWeb/dcWeb ... 0&ext=.doc
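A minimal Python sketch of the extension problem, assuming the Extensions filter only looks at the URL path (everything before the ?), as described above. The example.com URL and its `id` parameter are hypothetical stand-ins for the truncated URL in the post; only the `ext=.doc` query parameter comes from it, and the helper names are illustrative, not Webinator's.

```python
import posixpath
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def path_extension(url: str) -> str:
    # Extension as seen from the path only; the query string is ignored.
    return posixpath.splitext(urlsplit(url).path)[1]


def query_extension(url: str) -> Optional[str]:
    # The "ext" query parameter, if present (hypothetical for this site).
    values = parse_qs(urlsplit(url).query).get("ext")
    return values[0] if values else None


allowed = {".html", ".htm", ".txt"}
url = "http://example.com/dcWeb/dcWeb.aspx?id=123&ext=.doc"

print(path_extension(url))             # .aspx
print(query_extension(url))            # .doc
print(path_extension(url) in allowed)  # False: the document is skipped
```

With the filter limited to .html .htm .txt, every URL's path ends in .aspx, so nothing passes, which matches the 0-page result. The real document type only shows up in the query string (and later in the MIME type), which a path-based check never sees.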
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

If you kill the anytotx process, the walk will probably continue, and it should log an error indicating which URL it was working on at the time.
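Mark's suggestion works because the walk appears to wait on the converter: if anytotx hangs, the walk hangs with it, and killing the converter unblocks it. The same pattern can be sketched generically in Python with a timeout on the child process. This is a hypothetical illustration, not how Webinator or anytotx are implemented; the "converters" are Python one-liners standing in for a real conversion command, and the function name is made up.

```python
import subprocess
import sys
from typing import List, Optional


def convert_with_timeout(cmd: List[str], url: str, timeout_s: float) -> Optional[bytes]:
    """Run a converter command; kill it and log the URL if it hangs.

    Hypothetical sketch of the pattern, not Webinator's actual implementation.
    """
    try:
        # subprocess.run kills the child if the timeout expires,
        # then re-raises TimeoutExpired.
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        return result.stdout
    except subprocess.TimeoutExpired:
        print(f"converter timed out after {timeout_s}s on {url}", file=sys.stderr)
        return None


# Stand-in "converters": one that finishes, one that hangs.
fast = [sys.executable, "-c", "print('converted text')"]
hung = [sys.executable, "-c", "import time; time.sleep(60)"]

print(convert_with_timeout(fast, "http://example.com/a.txt", 5))
print(convert_with_timeout(hung, "http://example.com/b.doc", 1))
```

The timeout plays the role of killing anytotx by hand: the caller gets control back and knows which URL caused the hang.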
david180
Posts: 38
Joined: Wed May 11, 2005 3:44 pm

Post by david180

Yes, that worked. I am now working on excluding the items that are causing problems. I will be on vacation all next week, so for now I am just getting this working for the Testing and Demo teams; once I am back, I will look more closely at what was causing anytotx to hang.

Thanks.