Formatta form files jamming anytotx

abosch
Posts: 11
Joined: Tue Dec 05, 2000 4:37 pm

Post by abosch »

In our walks, we want to include the content of Formatta form files (.pff) in the index. Anytotx seems to handle most, but not all, Formatta forms. The ones it has trouble parsing make it hang, and we have to reboot the server to kill the process. What is causing anytotx to choke, and is there a workaround?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Can you supply the URL of an example that is causing problems?
abosch
Posts: 11
Joined: Tue Dec 05, 2000 4:37 pm

Post by abosch »

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I don't see any reason it would "hang" on that file or anything similar. What are your versions?
Texis: texis -version
Scripts: top right of the dowalk profile admin page
Anytotx: anytotx --identify
abosch
Posts: 11
Joined: Tue Dec 05, 2000 4:37 pm

Post by abosch »

anytotx --identify
release: 20040108 1073606452
thunderstone: 1
formats: pdf html msw xls mso swf auto other
pdf: 2.02
metaok: 1
features: meta links images rules timeout

texis -version
Texis Web Script (Vortex) Copyright (c) 1996-2004 Thunderstone - EPI, Inc.
Enterprise Webinator Version 4.04.1073606452 of Jan 8, 2004 (i686-intel-winnt-64-32)

Scripts: 4.4.11
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I tried that file with your version on Windows XP and it ran fine. Are you sure the .pff files are the problem? Even if anytotx did get confused by a file, it wouldn't run longer than the page timeout.
What are the precise symptoms you are seeing?
abosch
Posts: 11
Joined: Tue Dec 05, 2000 4:37 pm

Post by abosch »

It's running on Windows 2000 Server SP3.

These are the symptoms:
1) A walk stalls indefinitely.

2) The walk can be stopped from the webinatoradmin, but the database remains locked by anytotx; attempts to restart the walk fail because the database files are locked and cannot be deleted. The server must be rebooted to restore access.

3) The process (pid) anytotx used to access the database cannot be killed with the usual Windows tools or with those from the Resource Kit.

4) On examination of the walker todo list (cururls.*), the only files that are different from HTML and other parseable files are the .pff files; when I exclude them from walks with the Exclusion REX, the walks succeed every time.

andrew
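
To double-check symptom 4, the URLs in a todo list can be split by extension to confirm that the .pff files are the only unusual entries. The following is only a rough Python sketch for inspecting an exported list of URLs; it is not Webinator's Exclusion REX syntax, and the function name and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def split_by_extension(urls, extension=".pff"):
    """Split URLs into those whose path ends with `extension` and the rest.

    The comparison ignores case and looks only at the path, so a query
    string such as "?id=1" does not hide the extension.
    """
    matching, others = [], []
    for url in urls:
        path = urlsplit(url).path.lower()
        if path.endswith(extension.lower()):
            matching.append(url)
        else:
            others.append(url)
    return matching, others


# Hypothetical sample; in practice the list would come from the walk's todo data.
sample = [
    "http://example.com/forms/application.PFF?id=1",
    "http://example.com/index.html",
]
pff_urls, other_urls = split_by_extension(sample)
```

If `other_urls` contains only formats that anytotx is known to handle, that supports the suspicion that the .pff files are the trigger.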
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Try processing the file by hand. Download the file to the machine where Webinator is installed, then run the following from a DOS prompt:
INSTALLDIR\anytotx <04grantapp2004.pff

One thing to note: anytotx has no specific support for Formatta files, so it only attempts to extract generic text. There doesn't seem to be any extractable text in the example you provided. If no .pff file contains extractable text, indexing them would be fairly pointless.
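
When many files need to be checked by hand, the same test can be scripted with a time limit so that a stuck run is reported instead of blocking the prompt. This is a minimal Python sketch, assuming only what the thread shows: that anytotx reads the file on standard input and writes text to standard output. The function name, the 60-second default, and the path in the usage comment are hypothetical.

```python
import subprocess


def extract_text(command, input_path, timeout_seconds=60):
    """Feed input_path to `command` on stdin and return (status, text).

    status is "ok" (exit code 0), "failed" (non-zero exit code) or
    "timeout" (the process was still running after timeout_seconds).
    """
    with open(input_path, "rb") as source:
        try:
            result = subprocess.run(
                command,
                stdin=source,
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run tries to kill the child before raising this.
            return "timeout", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout.decode("utf-8", errors="replace")


# Hypothetical usage; replace INSTALLDIR with the real install directory:
# status, text = extract_text([r"INSTALLDIR\anytotx"], "04grantapp2004.pff")
# print(status, len(text.strip()))
```

A "timeout" status would point at the file as the cause of the stall, and an "ok" status with empty text would match the observation that the example has no extractable text. If the process really cannot be killed, as described in symptom 3, the script itself may also block while waiting for it to exit.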