Large indexes and live search

scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

"Are there actually more files it should be indexing?"

Well, I can't tell. It seems to be going through the directories in alphabetical order. If it really is, then there are a lot of files to go.

"In what way do you see it "moving through"?"

By watching the file URL list on the walk status page; it changes every now and again.

""few" meaning roughly how many?"

About 300.

"You don't have a crawl delay of anything other than 0 do you?"

Nope, it is zero.

"Is the connection between the appliance and fileserver a fast one?"

It isn't an appliance; it's a Texis installation on a very nice server. The network is quite fast. When the index first started it was flying through the files.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Large indexes and live search

Post by mark »

How large are the texis processes?
Does it speed up if you "pause and live" then "go" (in mode refresh) again?
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

I have one texis process that is currently 90 MB and 2 monitor processes that are about 4.3 MB each.

I'll try pausing the walk and see what happens.
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

How long does it normally take to pause a walk? I know I'm not being patient, but it would sure be nice to know the thing isn't hung.

Walk started at 2006-03-17 12:05:53 (by resume)
JavaScript walking enabled
HTTPS walking disabled
Start fetching at file://evergreen/corp/
Ignore urls containing any of the following:
/cgi-bin/
~
?
/private
started 1 (3160) Resume 44187392f2
Walker holding by request. (file://evergreen/corp/)
8058 pages fetched (-1,511,690,516 bytes) from file://evergreen/corp/
started 1 (3192) Resume 44187392f2
Walker holding by request. (file://evergreen/corp/)
3 pages fetched (5,292,935 bytes) from file://evergreen/corp/
started 1 (3032) Resume 44187392f2
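
A note on the status output above: the negative total (-1,511,690,516 bytes) looks like what a signed 32-bit byte counter shows after it passes 2^31 - 1. That is an assumption about how the counter is stored, not something confirmed in this thread. Under that assumption, a minimal Python sketch recovers the totals that would display as that value (the function names are made up for illustration):

```python
def wrap_int32(n):
    """Reduce n to what a signed 32-bit integer would hold after overflow."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def candidate_totals(reported, max_wraps=3):
    """True totals that would be displayed as `reported` after 0..max_wraps-1 full wraps."""
    base = reported & 0xFFFFFFFF  # smallest non-negative value with the same low 32 bits
    return [base + k * 2**32 for k in range(max_wraps)]


reported = -1_511_690_516  # value shown on the walk status page
for total in candidate_totals(reported):
    # Every candidate wraps back to the reported value.
    assert wrap_int32(total) == reported
    print(f"{total:,} bytes")
```

The smallest candidate is 2,783,276,780 bytes, roughly 345 KB per page for the 8058 pages listed. If the assumption holds, the negative number is a display problem and not a sign of a hung walk in its own right.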
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Large indexes and live search

Post by mark »

The walk status should say "stopping by request", which is usually quick (well under a minute) unless there are a zillion URLs in memory to write out to disk for resumption later. Then it'll go into a "creating search index" phase, which could take a fair number of minutes for a large dataset.

It looks like your walk started up again after being paused. You don't have it on a rapid schedule, do you? If it doesn't go into the indexing phase within 30 seconds or so, click the pause button again.

If you have remove common on, that will happen before the indexing and could take a while on a large dataset.
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

That restart was from Friday. I don't have remove common on. The behavior you are describing is not what I'm seeing.

...

Gak, it just fried on me again :(
-----------------------------------------

Texis ISAPI
Texis ISAPI has been installed. However, it has encountered an error and cannot continue. Please check the event log for details on the problem.
-----------------------------------------

Texis ISAPI encountered a socket error:

Socket connection to remote Texis failed!

Please ensure that the host ((null)) and port (10700) are configured properly, and check "Texis/monitor.log" to see if the Monitor Web Server is running.

WSAGetLastError: 10061

-----------------------------------------
I had to restart the monitor service.
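
For reference, WSAGetLastError 10061 is WSAECONNREFUSED: nothing accepted the connection on that port. A small Python check, run on the server itself, can tell whether anything is listening on port 10700 before and after restarting the monitor service. This is only a sketch; the host 127.0.0.1 is an assumption, since the error message printed the host as (null).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection (errno 10061 on Windows), timeouts, and unreachable hosts.
        return False


# Example use; host and port here are assumptions taken from the error text above:
# print(port_is_open("127.0.0.1", 10700))
```

A False result while the ISAPI error is showing would match the message: the web server side is up, but the Texis monitor it forwards to is not accepting connections.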

The walk is dead now:

-----------------------------------------
Walk started at 2006-03-17 12:05:53 (by resume)
JavaScript walking enabled
HTTPS walking disabled
Start fetching at file://evergreen/corp/
Ignore urls containing any of the following:
/cgi-bin/
~
?
/private
started 1 (3160) Resume 44187392f2
Walker holding by request. (file://evergreen/corp/)
8058 pages fetched (-1,511,690,516 bytes) from file://evergreen/corp/
started 1 (3192) Resume 44187392f2
Walker holding by request. (file://evergreen/corp/)
3 pages fetched (5,292,935 bytes) from file://evergreen/corp/
started 1 (3032) Resume 44187392f2

-----------------------------------------

I can't tell if it finished or not. I suspect it didn't.
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

Okay, I've killed the walk. Just to see if the problem is caused by the index being stored on a SAN disk, I made a new walk as a duplicate of this one and told it to store the database on a local disk. This is what I get:

Walk started at 2006-03-20 12:42:41 (by resume)
JavaScript walking enabled
HTTPS walking disabled
Start fetching at file://evergreen/corp/
Ignore urls containing any of the following:
/cgi-bin/
~
?
/private
started 1 new (3804) on file://evergreen/corp/

010 E:\MORPH3\texis\scripts/Webinator/dowalk(doprimer) 282: Document not found: file:// document from file \\evergreen\corp\: The system cannot find the path specified in the function htconn_openskt
1 pages fetched (0 bytes) from file://evergreen/corp/
1 errors

Updating search index ...Done.
Creating spell-checker dictionaries...Done.
Done.
Verifying usability of new walk.

Walk finished at 2006-03-20 12:42:44 (took 2 seconds)
Keeping database live: E:\test\evergreen_corp_local/db1

--------------------------------------------------------------------------------
Checking for broken hyperlinks...

The link : file://evergreen/corp/
Had this error: Document not found: file:// document from file \\evergreen\corp\: The system cannot find the path specified
--------------------------------------------------------------------------------
End of report.



Which makes no sense: I can go to the server and type file://evergreen/corp into a browser and it works perfectly. What the heck am I doing wrong?
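
One thing that may be worth ruling out, offered as a guess and not a diagnosis: the browser test runs under the logged-in user, while the walk runs under whatever account the Texis service uses, and that account may not see the share. The sketch below maps the file:// URL to the UNC path shown in the error and checks whether the current account can open it. It would need to be run as the service account to be meaningful. The function names are made up for illustration.

```python
import os
from urllib.parse import urlsplit, unquote


def file_url_to_unc(url):
    """Convert file://host/share/path to \\\\host\\share\\path (Windows UNC form)."""
    parts = urlsplit(url)
    if parts.scheme != "file" or not parts.netloc:
        raise ValueError("expected a file:// URL with a host part: %r" % url)
    # Decode %20 and similar escapes, then switch to backslashes.
    path = unquote(parts.path).replace("/", "\\")
    return "\\\\" + parts.netloc + path


def share_is_reachable(url):
    """True if the account running this script can see the directory behind the URL."""
    return os.path.isdir(file_url_to_unc(url))


print(file_url_to_unc("file://evergreen/corp/"))
```

If share_is_reachable returns True for an interactive login but False under the service account, the difference between the two walks would be about permissions and not about where the database is stored.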
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

I think I may have found part of the slowdown. There seems to be a lot of this going on (that file is a Word document):

Error translating via anytotx:
100 2006-03-21 09:59:08 Anytotx translator E:\MORPH3\etc\antiword\antiword -cUTF-8.txt -M -l -fMSW --timeout=60 --error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448 --max-depth=9 --content-type=application/msword E:\MORPH3\tmp\cvti02488a stderr output for <stdin> follows
100 2006-03-21 09:59:08 stderr: 100 E:\MORPH3\tmp\cvti02488a is not a Word Document.
100 2006-03-21 09:59:08 Anytotx translator E:\MORPH3\etc\antiword\antiword -cUTF-8.txt -M -l -fMSW --timeout=60 --error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448 --max-depth=9 --content-type=application/msword E:\MORPH3\tmp\cvti02488a returned exit code 1 for <stdin> in the function txdatatran_translatefile (E:\MORPH3\anytotx.exe -fmsw --timeout=60 "--error-log=s:\texisindexes\evergreen_corp/anytotx-errors.3448")

file://evergreen/corp/Marketing/public/event/Events%202004/SNW%20Spring/SNWInteropLabCheckRequest.doc
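
Before sending the file in, it may help to check what the .doc actually contains. Legacy Word documents are OLE2 compound files that start with the bytes D0 CF 11 E0 A1 B1 1A E1, and files saved as RTF or HTML under a .doc name are a common reason for a converter to report "not a Word Document". Whether that is what happened here is unknown. This sketch only looks at the first bytes; the function name and category labels are made up for illustration.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # signature of an OLE2 compound file


def sniff_doc(head):
    """Classify a file from its first bytes (pass at least the first 8 bytes)."""
    if not head:
        return "empty"
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # container used by legacy binary .doc files
    if head.startswith(b"{\\rtf"):
        return "rtf"    # Rich Text Format saved with a .doc extension
    if head.lstrip()[:1] == b"<":
        return "markup" # HTML or XML saved with a .doc extension
    return "unknown"


def sniff_file(path):
    """Read the first 16 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_doc(f.read(16))
```

Running sniff_file over the paths named in the anytotx error log would show whether the failing files share a type, which is also useful detail to put in the support ticket.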
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Large indexes and live search

Post by mark »

For the word-nonword problem please open a tech support ticket so we can arrange to get a copy of the file for inspection.

I'm not sure I entirely followed what was different about the profile that couldn't even get started vs. the one that was running ok (but slowly).
scott.shaver
Posts: 45
Joined: Tue May 31, 2005 12:13 pm

Large indexes and live search

Post by scott.shaver »

Okay, I'll open a ticket. Just to clarify, the above errors are occurring on the index that is running slowly, not the local-disk one, which I can't get started.