Large indexes and live search

scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Okay, I've killed the walk. To see whether the problem is caused by the index being stored on a SAN disk, I made a new walk as a duplicate of this one and told it to store the database on a local disk. This is what I get:

Walk started at 2006-03-20 12:42:41 (by resume)
JavaScript walking enabled
HTTPS walking disabled
Start fetching at file://evergreen/corp/
Ignore urls containing any of the following:
/cgi-bin/
~
?
/private
started 1 new (3804) on file://evergreen/corp/

010 E:\MORPH3\texis\scripts/Webinator/dowalk(doprimer) 282: Document not found: file:// document from file \\evergreen\corp\: The system cannot find the path specified in the function htconn_openskt
1 pages fetched (0 bytes) from file://evergreen/corp/
1 errors

Updating search index ...Done.
Creating spell-checker dictionaries...Done.
Done.
Verifying usability of new walk.

Walk finished at 2006-03-20 12:42:44 (took 2 seconds)
Keeping database live: E:\test\evergreen_corp_local/db1

--------------------------------------------------------------------------------
Checking for broken hyperlinks...

The link : file://evergreen/corp/
Had this error: Document not found: file:// document from file \\evergreen\corp\: The system cannot find the path specified
--------------------------------------------------------------------------------
End of report.



Which makes no sense: I can go to the server and type file://evergreen/corp into a browser and it works perfectly. What the heck am I doing wrong?
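
The log above shows the walker turning file://evergreen/corp/ into the UNC path \\evergreen\corp\ before it fails. A minimal Python sketch of that mapping follows, so the same path can be tested by hand outside the browser. The function name file_url_to_unc is hypothetical and this is not Webinator's own code, only an illustration of the usual file URL to UNC convention.

```python
from urllib.parse import urlsplit, unquote


def file_url_to_unc(url):
    # Illustrative helper (not part of Webinator): map a file:// URL that
    # names a host into the equivalent Windows UNC path.
    parts = urlsplit(url)
    if parts.scheme != "file" or not parts.netloc:
        raise ValueError("expected a file:// URL with a host: %r" % (url,))
    # Decode %20 and friends, then switch to backslash separators.
    path = unquote(parts.path).replace("/", "\\")
    return "\\\\" + parts.netloc + path


if __name__ == "__main__":
    print(file_url_to_unc("file://evergreen/corp/"))  # prints \\evergreen\corp\
```

Percent-escapes are decoded as well, so a URL containing Events%202004 maps to a folder named "Events 2004".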
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

I think I may have found part of the slowdown. There seems to be a lot of this going on (the file is a Word document):

Error translating via anytotx:
100 2006-03-21 09:59:08 Anytotx translator E:\MORPH3\etc\antiword\antiword -cUTF-8.txt -M -l -fMSW --timeout=60 --error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448 --max-depth=9 --content-type=application/msword E:\MORPH3\tmp\cvti02488a stderr output for <stdin> follows
100 2006-03-21 09:59:08 stderr: 100 E:\MORPH3\tmp\cvti02488a is not a Word Document.
100 2006-03-21 09:59:08 Anytotx translator E:\MORPH3\etc\antiword\antiword -cUTF-8.txt -M -l -fMSW --timeout=60 --error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448 --max-depth=9 --content-type=application/msword E:\MORPH3\tmp\cvti02488a returned exit code 1 for <stdin> in the function txdatatran_translatefile (E:\MORPH3\anytotx.exe -fmsw --timeout=60 "--error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448")

file://evergreen/corp/Marketing/public/event/Events%202004/SNW%20Spring/SNWInteropLabCheckRequest.doc
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

For the word-nonword problem please open a tech support ticket so we can arrange to get a copy of the file for inspection.

I'm not sure I entirely followed what was different about the profile that couldn't even get started vs. the one that was running ok (but slowly).
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Okay, I'll open a ticket. Just to clarify, the above errors are occurring on the index that is running slowly, not on the local-disk one that I can't get started.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The .doc files were larger than the max page size and got truncated. You need to increase the max page size under All Walk Settings.
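
To get a feel for how many files would hit a given size limit before changing the setting, something like the following Python sketch could be run against the share. It only compares on-disk file sizes with a limit you supply. The function name files_over_limit and the 10 MB example value are assumptions for illustration, and how Webinator itself measures page size is not covered here.

```python
import os


def files_over_limit(root, limit_bytes):
    # Illustrative helper: yield (path, size) for each file under root
    # whose on-disk size exceeds limit_bytes.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if size > limit_bytes:
                yield full, size


if __name__ == "__main__":
    # Example limit only (10 MB); substitute whatever limit you plan to set.
    for path, size in files_over_limit(r"\\evergreen\corp", 10 * 1024 * 1024):
        print(size, path)
```

Any file it lists would still be cut off at that limit, so the output gives a rough idea of how high the setting needs to go.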
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You may want to set the Primer URL to None on the walk that is having the problem. Apart from the database location, is the profile identical?
John Turnbull
Thunderstone Software
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

They were identical. I had to kill the texis process for the slow walk because it wouldn't stop. I deleted the index that wasn't working and created another index as a duplicate of the slow one. I changed the max size to 10 MB, set the Primer URL to None, tweaked a couple of the JavaScript settings, and upped the verbosity to 4. When I tried to run the new one, I got the walk status page shown below. What is going on? Why won't it walk the file server at all now?




Walk Status
Current User: webinator
Current Profile: evergreen_corp2 Webinator 5.1.29-Windows-w/plugin

Latest run:
0 pages in todo
0 pages scheduled to be refreshed in the next hour
2 pages visited in the last hour (1 success/1 failed)
1 pages in index


Pages recently walked
1 pages (0 bytes).
1 errors.
0 duplicate pages.

Page    Visited             Modified            Url
-------+-------------------+-------------------+-------------------------------------------------------
1       2 mins ago          2 mins ago          file://evergreen/corp/ (0 bytes)

Recent errors
Visited              Reason               Url
--------------------+--------------------+-------------------------------------------------------
2 mins ago           Document not found:  file://evergreen/corp/

Next Pages to be walked
Next Check           Modified           Url
--------------------+------------------+-------------------------------------------------------
In 6 d, 23 hr+       2 mins ago         file://evergreen/corp/ (0 bytes)


Walk started at 2006-03-21 12:43:35 (by resume)
Verbosity set to 4
JavaScript walking disabled
HTTPS walking disabled
Start fetching at file://evergreen/corp/
file://evergreen/corp/
Ignore urls containing any of the following:
/cgi-bin/
~
?
/private
started 1 refresh (1880) on file://evergreen/corp/
0 pages fetched (0 bytes) from file://evergreen/corp/
1 errors

Updating search index ...Done.
Creating spell-checker dictionaries...Done.
Done.
Verifying usability of new walk.

Walk finished at 2006-03-21 12:43:42 (took 3 seconds)
Keeping database live: s:\texisindexes\evergreen_corp2/db2

--------------------------------------------------------------------------------
Checking for broken hyperlinks...

The link : file://evergreen/corp/
Had this error: Document not found: file:// document from file \\evergreen\corp\: The system cannot find the path specified
--------------------------------------------------------------------------------
End of report.
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

Well, this is interesting: I just set up an identical index on the test box and it works over there. I don't know what is going on.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The machine where it's not working has permission to access \\evergreen\corp, right?
Does the webserver run as the same user with the same perms on both systems? Did someone go around locking down or cleaning up things on the non-functioning server? We've had reports of things working on dev machines but not production machines or vice-versa because they weren't true clones of each other.
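
One way to compare the two machines is to try listing the share from each and see what the OS reports. The sketch below assumes Python is available on both boxes; probe_directory is a hypothetical helper name. It only tells you something if it runs under the same account as the web server or Texis process, because a logged-in desktop user may have access that the service account lacks.

```python
import os


def probe_directory(path):
    # Illustrative helper: try to list `path` and return (ok, detail)
    # instead of raising, so results from two machines are easy to compare.
    try:
        entries = os.listdir(path)
    except OSError as exc:
        return False, "%s: %s" % (type(exc).__name__, exc.strerror or exc)
    return True, "%d entries visible" % len(entries)


if __name__ == "__main__":
    print(probe_directory(r"\\evergreen\corp"))
```

If one machine reports a permission error while the other lists entries, the accounts or their rights differ. A "path not found" style error on only one machine points at name resolution or share configuration instead.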
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Post by scott.shaver »

I'm getting there. The issue seems to be a problem with how Win2003, Win2k, Texis and DFS interact. I'll have more info tomorrow.