In our walks, we want to include the content of Formatta form files (.pff) in the index. Anytotx seems to handle most, but not all, Formatta forms. The ones it has trouble parsing make it hang, and we have to reboot the server to kill the process. What is causing anytotx to choke, and is there a work-around for it?
Formatta form files jamming anytotx
Can you supply the url of an example that is causing problems?
Formatta form files jamming anytotx
I don't see any reason it would "hang" on that file or anything similar. What are your versions?
Texis: texis -version
Scripts: top right of the dowalk profile admin page
Anytotx: anytotx --identify
Formatta form files jamming anytotx
anytotx --identify
release: 20040108 1073606452
thunderstone: 1
formats: pdf html msw xls mso swf auto other
pdf: 2.02
metaok: 1
features: meta links images rules timeout
texis -version
Texis Web Script (Vortex) Copyright (c) 1996-2004 Thunderstone - EPI, Inc.
Enterprise Webinator Version 4.04.1073606452 of Jan 8, 2004 (i686-intel-winnt-64-32)
Scripts: 4.4.11
Formatta form files jamming anytotx
I tried that file with your version on Windows XP. It ran fine. Are you sure the .pff files are the problem? Even if anytotx did get confused by a file, it wouldn't run longer than the page timeout.
What are the precise symptoms you are seeing?
Formatta form files jamming anytotx
It's running on Windows 2000 Server SP3.
These are the symptoms:
1) A walk stalls indefinitely.
2) The walk can be stopped from the webinatoradmin, but the database remains locked by anytotx; attempts to restart the walk fail because the database files are locked and cannot be deleted. The server must be rebooted to restore access.
3) The process (pid) anytotx used to access the database cannot be killed with the usual Windows tools or with the ones from the Resource Kit.
4) On examination of the walker todo list (cururls.*), the only files that are different from HTML and other parseable files are the .pff files; when I exclude them from walks with the Exclusion REX, the walks succeed every time.
andrew
Formatta form files jamming anytotx
Try processing the file by hand. Download the file to the machine where Webinator is installed. Then, from a DOS prompt, run:
INSTALLDIR\anytotx <04grantapp2004.pff
One thing to note: anytotx doesn't know Formatta files specifically, so it attempts to extract generic text. There doesn't seem to be any extractable text in the example you provided. It may be that no .pff file has extractable text, in which case it would be fairly pointless to index them.
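If you would rather not risk another stuck process while testing by hand, the same stdin-fed run can be wrapped in a small script that gives up after a fixed time. This is only a rough sketch, not anything shipped with Webinator: the function name convert_with_timeout and the 30-second default are made up for illustration, and it assumes Python 3 is available on the server. It may also be unable to kill a process that Windows itself refuses to terminate, as in symptom 3.

```python
import subprocess


def convert_with_timeout(command, input_path, timeout_seconds=30):
    """Run `command` with the file fed on stdin, as in `anytotx <file.pff`.

    Returns (status, output_bytes), where status is "ok", "exit N",
    or "timeout". The helper name and defaults are illustrative only.
    """
    with open(input_path, "rb") as source:
        try:
            result = subprocess.run(
                command,
                stdin=source,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run tries to kill the child before raising this.
            return "timeout", b""
    if result.returncode == 0:
        return "ok", result.stdout
    return "exit %d" % result.returncode, result.stdout


# Hypothetical usage (substitute your real install directory):
# status, text = convert_with_timeout(
#     [r"INSTALLDIR\anytotx"], "04grantapp2004.pff")
# print(status, len(text), "bytes of extracted text")
```

A "timeout" status on a particular .pff would point at that file as the one anytotx stalls on; an "ok" status with zero bytes of output would suggest the file simply has no extractable text.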