
plugin difficulties

Posted: Thu Jun 29, 2000 5:08 pm
by Thunderstone


I'm testing Webinator under RedHat Linux 6.1. I've installed a PDF text
extractor and told the gw program about it like this:

-n"application/pdf,pdf,/usr/bin/pstotext"

but I'm getting errors in the gw.log file whenever it encounters a PDF file:

6/29 12:17:27 Retrieving http://test.mnu.edu/mac/transcript_request.pdf
6/29 12:17:27 Plugin Failed: /usr/bin/pstotext

BUT the pstotext program works from the command line. What could be my
problem?

-dougl

______________________________________________________________________

Doug Ledbetter -- Webmaster for MidAmerica Nazarene University
2030 East College Way, Olathe, KS 66062-1899
dougl@mnu.edu (913)782-3750 x205 http://www.mnu.edu/

"But what about you? Who do you say I am?"
--Jesus, circa 30AD
______________________________________________________________________





plugin difficulties

Posted: Thu Jun 29, 2000 5:19 pm
by Thunderstone


That's indicative of a broken pipe: gw was not able to write to, or read
from, the pipe connecting it to the plugin filter.

The filter should accept the PDF file on stdin and send the text back
on stdout. Your filter should work when invoked like this:
/usr/bin/pstotext <transcript_request.pdf >transcript_request.txt
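A minimal way to reproduce this outside gw, sketched as a POSIX sh function: because the error mentions the pipe, it pipes the sample file into the filter instead of using a shell redirect. A redirect hands the filter a regular file, which can behave differently. The name check_filter and its messages are hypothetical, not part of Webinator, and the sketch assumes the filter takes no input-file argument.

```sh
# Hypothetical helper (not part of Webinator): pipe a sample file into a
# filter the way gw does, then check the exit status and the output.
# Usage: check_filter SAMPLE_FILE FILTER_COMMAND [ARGS...]
check_filter() {
    sample=$1
    shift
    # The pipeline's status is the filter's status (the last command).
    out=$(cat "$sample" | "$@") || {
        echo "FAIL: filter exited with non-zero status"
        return 1
    }
    if [ -z "$out" ]; then
        echo "FAIL: filter wrote nothing to stdout"
        return 1
    fi
    echo "OK: filter read stdin and wrote text to stdout"
}
```

For example: check_filter transcript_request.pdf /usr/bin/pstotext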




plugin difficulties

Posted: Thu Jun 29, 2000 6:07 pm
by Thunderstone


At 05:22 PM 6/29/2000 -0400, you wrote:



Both of my tools for extracting text from PDF files require a file name
and will NOT accept PDFs on stdin, because "PDF documents require
random access, hence cannot be read from standard input." I'll have to
find another tool, I guess.
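The random-access point can be seen directly. A PDF reader normally starts at the end of the file, where the trailer points to the cross-reference table, and then seeks back to the objects it needs. A pipe can only be read forward. The sketch below uses a hypothetical stdin_kind function and assumes Linux or bash, where /dev/stdin refers to file descriptor 0. It reports whether stdin is a pipe or a regular file.

```sh
# Hypothetical helper: report what kind of object is attached to stdin.
# Assumes /dev/stdin refers to file descriptor 0 (Linux, or bash's test).
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe     # FIFO: forward-only, a PDF reader cannot seek in it
    elif [ -f /dev/stdin ]; then
        echo file     # regular file: seekable, random access works
    else
        echo other    # terminal, socket, device, ...
    fi
}
```

Running printf x | stdin_kind prints pipe, while stdin_kind <transcript_request.pdf prints file. That difference is why a program can accept a redirect from the command line yet fail when fed through a pipe.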

Suggestions for PDF text extractors for Linux users? Anybody, anybody?

thanks,
-dougl






plugin difficulties

Posted: Thu Jun 29, 2000 6:13 pm
by Thunderstone


Make a wrapper that captures the input to a file, then gives that file to the
PDF processor. Something like this (assuming pstotext writes its text to stdout):

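#!/bin/sh
# Spool the PDF arriving on stdin to a temp file, then give pstotext the
# file name, since it needs a real file it can seek in.
tf=$(mktemp /tmp/pdftmp.XXXXXX) || exit 1
trap 'rm -f "$tf"' 0        # remove the temp file however the script exits
trap 'exit 1' 1 2 15        # on HUP/INT/TERM, exit (which runs the trap above)
cat >"$tf" || exit 1
/usr/bin/pstotext "$tf"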
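The same spooling idea can be written as a reusable shell function, so that one wrapper serves any extractor that wants a file name. This is a sketch under assumptions: stdin_to_file is a hypothetical name. It assumes the extractor accepts the input file as its last argument and writes text to stdout, as assumed for pstotext above.

```sh
# Hypothetical helper (not part of Webinator): spool stdin to a temporary
# file, run COMMAND [ARGS...] with that file name appended as the last
# argument, remove the file, and return the command's exit status.
stdin_to_file() {
    tf=$(mktemp "${TMPDIR:-/tmp}/pdftmp.XXXXXX") || return 1
    if cat >"$tf"; then
        "$@" "$tf"
        status=$?
    else
        status=1          # could not spool stdin; skip the extractor
    fi
    rm -f "$tf"
    return "$status"
}
```

A wrapper script would then define the function and end with: stdin_to_file /usr/bin/pstotext. The function returns the extractor's status rather than that of a trailing rm -f. This lets whatever invokes the wrapper tell a failed extraction from a successful one, if it checks.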