A 2nd RTF Indexing problem

legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

Hi there,

Hope someone may be able to shed some light on this peculiar problem.

We currently run three Webinator deployments: Portal, Intranet and Extranet. All are running 4.4.3 on Windows with the plugin.

There is an RTF file on the Extranet which refuses to index.

The extranet server is on a network which requires a login via a proxy.

After the default timeout period of 60 seconds, the indexing process gives up and returns the following error:

The link: http://www.leadersdesktop.sa.edu.au/res ... eement.rtf

Had this error: Error translating via anytotx: command e:\webinator\anytotx.exe returned exit code 1 000 Dec 16 08:35:20 anytotx (1412): timeout processing (e:\webinator\anytotx.exe --content-type=application/rtf --timeout=60 "--error-log=f:\webinator/texis/leaderstest/anytotx-errors.644")

What makes the situation more confusing is that when we index the file via the Public Portal and Intranet deployments it indexes fine.

The Extranet deployment also manages to correctly index other RTF files fetched from the Portal.

I tried putting the strange RTF file on the Portal and then had the Extranet fetch and index it, but still got the same result.

Regards
Henry
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

See if you get the same behavior if you run anytotx by hand. Get the file from the server in question and place it somewhere on the machine where Webinator is installed. Run the command above using the downloaded file as input:
e:\webinator\anytotx.exe --content-type=application/rtf --timeout=60 <yourfile.rtf
Also supply your exact version of anytotx from the following command:
e:\webinator\anytotx.exe --identify
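For anyone who wants to script this manual check, here is a minimal Python sketch. It feeds a file to a command on stdin and enforces its own wall-clock limit, which is roughly what the hand-run test does. The helper name `run_with_stdin` and the 90-second limit are assumptions for illustration only; the anytotx path and flags in the comment are the ones quoted in this thread.

```python
import subprocess

def run_with_stdin(cmd, input_path, limit_secs=90):
    """Run cmd with input_path redirected to stdin and report how it ended.

    Hypothetical helper, not part of Webinator.
    """
    with open(input_path, "rb") as src:
        try:
            done = subprocess.run(cmd, stdin=src, stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE, timeout=limit_secs)
        except subprocess.TimeoutExpired:
            # The child was killed after exceeding our own limit.
            return {"timed_out": True, "exit_code": None,
                    "stdout": b"", "stderr": b""}
    return {"timed_out": False, "exit_code": done.returncode,
            "stdout": done.stdout, "stderr": done.stderr}

# Example call, using the paths quoted in this thread:
# result = run_with_stdin(
#     [r"e:\webinator\anytotx.exe", "--content-type=application/rtf", "--timeout=60"],
#     r"c:\temp\services_agreement.rtf")
# print(result["timed_out"], result["exit_code"])
```

Setting the outer limit a bit higher than anytotx's own `--timeout` lets anytotx report its timeout first, so you can tell the two apart.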

Post by legedza.henry »

Thanks for the tips.

Here is the result of the identify command:

C:\>e:\webinator\anytotx.exe --identify
release: 20031025 1067009964
thunderstone: 1
formats: pdf html msw xls mso swf auto other
pdf: 2.02
metaok: 1
features: meta links images rules timeout

Our other deployments give the same result.

As for the manual test, it simply sits there and then returns:

C:\>e:\webinator\anytotx.exe --content-type=application/rtf --timeout=60 <c:\temp\services_agreement.rtf
000 anytotx (1452): timeout processing <stdin>

The same happened with another RTF file that had indexed successfully on another deployment.

Post by mark »

I should have had you run it with -G as well to get more insight into where it's getting stuck. I'm not able to replicate your problem with your version and that file.
e:\webinator\anytotx.exe -G --content-type=application/rtf --timeout=60 <c:\temp\services_agreement.rtf

None of your local drives are nearly full or anything, are they?

Post by legedza.henry »

Disk capacity is not an issue.

The result of the -G addition is as follows:

C:\>e:\webinator\anytotx.exe -G --content-type=application/rtf --timeout=60 <c:\temp\services_agreement.rtf
X-Anytotx-Content-Type: application/rtf
X-Anytotx-Identified-By: Content-Type
X-Anytotx-Status: translate
X-Anytotx-Translator-Args: e:\webinator\etc\rtf2html

The above command produced the result instantaneously.
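If you need to compare this diagnostic output across several deployments, a small sketch like the following could collect the header lines into a dictionary. It assumes the `-G` output always consists of `Name: value` lines like the four pasted above; that is all this thread shows, so treat the format as an assumption. The function name `parse_anytotx_headers` is made up for illustration.

```python
def parse_anytotx_headers(text):
    """Collect 'X-Anytotx-*: value' lines into a dict, ignoring anything else.

    Assumption: the -G output is a list of 'Name: value' lines, as in the
    sample pasted in this thread.
    """
    headers = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.startswith("X-Anytotx-"):
            headers[name] = value.strip()
    return headers

# The output pasted above, verbatim.
sample = r"""X-Anytotx-Content-Type: application/rtf
X-Anytotx-Identified-By: Content-Type
X-Anytotx-Status: translate
X-Anytotx-Translator-Args: e:\webinator\etc\rtf2html
"""

info = parse_anytotx_headers(sample)
print(info["X-Anytotx-Translator-Args"])  # prints e:\webinator\etc\rtf2html
```

The `X-Anytotx-Translator-Args` value is the useful part here: it names the helper program that anytotx hands the document to.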

Post by mark »

Maybe the subprocessor is getting stuck. Now try:
e:\webinator\etc\rtf2html <c:\temp\services_agreement.rtf

Post by legedza.henry »

That seemed to work.

It dumped the content of the file to stdout (the command line).

Post by mark »

OK, try this. Open e:\webinator\conf\formats.rule in Notepad. Find the line with rtf2html on it. Change
%INSTALLDIR%\etc\rtf2html
to
%INSTALLDIR%\etc\rtf2html %IN%
Then try the anytotx command again.

Post by legedza.henry »

That fixed the problem. The file indexes without a hitch. Now, what is it we did?

Regards
Henry

Post by mark »

We told it to use a temp file instead of a pipe to communicate with rtf2html. Sometimes NT pipes act funny...
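To make the difference concrete, here is a rough Python sketch of the two ways a parent process can hand a document to a helper program: over a stdin pipe, or by writing a temp file and passing its path on the command line. This only illustrates the general technique. How anytotx actually expands `%IN%` internally is not described in this thread, and the function names and placeholder handling below are assumptions.

```python
import os
import subprocess
import tempfile

def run_via_pipe(cmd, data):
    """Hand the document to the child over a stdin pipe."""
    return subprocess.run(cmd, input=data, stdout=subprocess.PIPE,
                          check=True).stdout

def run_via_temp_file(cmd_template, data, placeholder="%IN%"):
    """Write the document to a temp file and pass its path wherever the
    placeholder appears in the command, so no stdin pipe is involved.

    Assumption: the placeholder is a standalone argument. Output is still
    captured through a pipe in this sketch.
    """
    fd, path = tempfile.mkstemp(suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)  # file is closed before the child opens it
        cmd = [path if arg == placeholder else arg for arg in cmd_template]
        return subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE, check=True).stdout
    finally:
        os.remove(path)
```

With the rule changed to `%INSTALLDIR%\etc\rtf2html %IN%`, the helper reads a named file rather than its stdin, which corresponds to the second function.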