Walk never stops or finishes
Posted: Fri Jul 01, 2005 12:28 pm
by david180
Until today, the walk was seeing only 33% of my documents (976 of 2898 items). This morning I realized that the Exclusions list included ~ and that the missing URLs have a ~ in them. I removed the ~ from the Exclusions list, changed the walk from Refresh to New, and told it to GO. After less than 5 minutes it had walked 1074 pages, and then nothing: the walk neither stopped nor continued, the error counts didn't increase, and the Texis process stopped using CPU. I waited ten more minutes and then tried to Stop the walk. This failed, and after several tries over several minutes I rebooted the Webinator server and tried again. This time the walk saw 2500 pages and then hung in the same way. After 15 minutes of inactivity I tried to Stop the walk and, after failing several times, killed the Texis process. Tries 3 and 4 followed, each walking about 1000 pages and then hanging as above. The errors encountered don't seem unusual (offsite, max page size exceeded, ...).
What do I look at next?
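For readers wondering how a single ~ in the Exclusions list could hide so many pages, here is a minimal Python sketch of substring-based URL exclusion. It is not Webinator's code: the function name is_excluded and the example.com URLs are hypothetical, and it only assumes that an exclusion applies whenever the string appears anywhere in the URL.

```python
def is_excluded(url, exclusions):
    """Return True if any exclusion string occurs anywhere in the URL."""
    return any(pattern in url for pattern in exclusions)

# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/docs/index.html",
    "http://example.com/~david/report.doc",
    "http://example.com/cgi-bin/search",
]

# With "~" in the list, every URL containing a tilde is skipped.
with_tilde = [u for u in urls if not is_excluded(u, ["/cgi-bin/", "~"])]

# Without it, only the /cgi-bin/ URL is skipped.
without_tilde = [u for u in urls if not is_excluded(u, ["/cgi-bin/"])]

print(len(with_tilde), len(without_tilde))
```

Under that assumption, a one-character exclusion is enough to drop every URL that happens to contain it, wherever in the URL it occurs.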
Posted: Fri Jul 01, 2005 1:05 pm
by mark
What does the bottom of the walk status indicate when it's stuck?
What's your texis version (full output of "texis -version")?
What's your dowalk script version (top right of the settings page)?
What version of what OS are you running on?
Are the documents all html and text or a mix including pdf, word, etc.?
Posted: Fri Jul 01, 2005 1:07 pm
by david180
This is from the current walk, which started about 16 minutes ago and has been doing nothing for the last 11 minutes:
Creating database C:\Program Files\Thunderstone Software\Webinator/texis/default/db2...Done.
Walk started at 2005-07-01 10:48:32 (by user)
Verbosity set to 4
JavaScript walking disabled
HTTPS walking disabled
Start fetching at
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Ignore urls containing any of the following:
/cgi-bin/
started 1 new (4028) on
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Posted: Fri Jul 01, 2005 1:11 pm
by david180
And for the rest of the data:
version
Texis Web Script (Vortex) Copyright (c) 1996-2005 Thunderstone - EPI, Inc.
Enterprise Webinator Version 5.01.1109610969 20050228 (i686-intel-winnt-32-32)
dowalk
Webinator 5.1.10
$Id: dowalk.src,v 2.310 2005/02/23 18:19:41 kai Exp $
OS
Windows Server 2003 Standard Edition SP1
documents
A mix of all sorts of stuff. 771 .txt, 652 .doc, then dozens of pdf, xls, wpd, ppt, htm and some other misc types.
Posted: Fri Jul 01, 2005 3:15 pm
by david180
I updated to the newest dowalk script with no change in results:
Webinator 5.1.18
$Id: dowalk.src,v 2.352 2005/06/22 17:15:29 kai Exp $
I let it do nothing for more than an hour and then requested that it Stop walking. It won't stop. I requested the stop 4 minutes ago:
Creating database C:\Program Files\Thunderstone Software\Webinator/texis/default/db2...Done.
Walk started at 2005-07-01 11:44:47 (by user)
Verbosity set to 4
JavaScript walking disabled
HTTPS walking disabled
Start fetching at
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Ignore urls containing any of the following:
/cgi-bin/
started 1 new (1748) on
http://zek.netdocuments.com/dcWeb/dcWeb.aspx
Dispatcher stopping by request. May take up to 65 seconds to stop.
Posted: Fri Jul 01, 2005 3:19 pm
by mark
What's running in Task Manager in terms of texis, monitor, and anytotx? Are any of those using cpu?
What all non-default settings are you using?
Just for testing, does it work better if you take out all non-html and non-text extensions like .pdf .xls .doc .swf .wpd .ppt etc.?
Posted: Fri Jul 01, 2005 3:34 pm
by david180
anytotx is running, using no cpu.
monitor is running 4 times using no cpu.
texis is running 2 times using no cpu.
I am not sure that I know what the defaults are...
Exclusions - remove ~
Verbosity - 4
Strip Queries - N
Login Info - supplied
Off-site - N
Stay Under - Y
All Extensions - Y (trying N now)
Don't think I have changed anything else.
Trying fewer extensions now.
Posted: Fri Jul 01, 2005 3:51 pm
by david180
This extension set makes no difference in behavior (explanation below): .html .htm .txt .doc .pdf .xls .aspx .ppt .wpd
I confirmed the theory below by trying just .html .htm .txt and getting 0 pages...
Changing Extensions to just .aspx reproduces the problem.
Extensions:
The changes I am making to the extension handling aren't going to make any difference. From the extension check's point of view, everything is .aspx, even the documents. The MIME types do change properly to what they should be for .doc, .xls, .pdf, etc., but the part of the URL before the query string always ends in .aspx.
Typical document url:
http://zek.netdocuments.com/dcWeb/dcWeb ... 0&ext=.doc
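To make the point about extensions concrete, here is a small Python sketch that separates the extension of the URL path from an ext value carried in the query string. It is only an illustration, not how Webinator performs its extension check: the example.com URL, the id parameter, and the two function names are hypothetical, and the ext parameter is modelled on the truncated URL above.

```python
import os
from urllib.parse import urlsplit, parse_qs

def path_extension(url):
    """Extension of the URL path only; the query string is ignored."""
    path = urlsplit(url).path
    return os.path.splitext(path)[1].lower()

def query_extension(url, param="ext"):
    """Value of an 'ext' query parameter, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0].lower() if values else None

# Hypothetical URL shaped like the document links described above.
url = "http://example.com/app/page.aspx?id=10&ext=.doc"

print(path_extension(url))   # .aspx
print(query_extension(url))  # .doc
```

A check based only on the path sees .aspx for every page and every document, which is consistent with the extension list having no effect unless .aspx itself is included.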
Posted: Fri Jul 01, 2005 4:44 pm
by mark
If you kill the anytotx process, the walk will probably continue, and it should log an error indicating which URL it was working on at the time.
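As a general illustration of the same idea (kill a stuck converter, record which URL it was handling, and move on), here is a hedged Python sketch using a timeout around a child process. It is not part of Webinator and does not call anytotx: the function name run_with_timeout, the example.com URL, and the sleeping child that stands in for a hung converter are all assumptions made for the example.

```python
import subprocess
import sys

def run_with_timeout(cmd, url, timeout_seconds):
    """Run a converter command; if it hangs, kill it and report the URL."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        print(f"converter timed out after {timeout_seconds}s on {url}",
              file=sys.stderr)
        return None
    return result.stdout

# Stand-in for a hung converter: a child process that just sleeps.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, "http://example.com/doc?id=1", 1))
```

The caller gets None for the document that hung and can carry on with the next one, with the offending URL left in the error output for later investigation.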
Posted: Fri Jul 01, 2005 5:34 pm
by david180
Yes. That worked. I am now working on excluding the items that are causing problems. I will be on vacation all next week, so I am just getting this working for the Testing and Demo teams, but once I am back, I will look more closely at what was causing anytotx to hang.
Thanks.