plugin difficulties
Posted: Thu Jun 29, 2000 5:08 pm
by Thunderstone
I'm testing Webinator under RedHat Linux 6.1. I've installed a PDF text
extractor and told the gw program about it like this:
-n"application/pdf,pdf,/usr/bin/pstotext"
but I'm getting errors in the gw.log file whenever it encounters a pdf file:
6/29 12:17:27 Retrieving http://test.mnu.edu/mac/transcript_request.pdf
6/29 12:17:27 Plugin Failed: /usr/bin/pstotext
BUT the pstotext program works from the command line. What could be my
problem?
-dougl
______________________________________________________________________
Doug Ledbetter -- Webmaster for MidAmerica Nazarene University
2030 East College Way, Olathe, KS 66062-1899
dougl@mnu.edu (913)782-3750 x205
http://www.mnu.edu/
"But what about you? Who do you say I am?"
--Jesus, circa 30AD
______________________________________________________________________
plugin difficulties
Posted: Thu Jun 29, 2000 5:19 pm
by Thunderstone
That's indicative of a broken pipeline. gw was not able to read
or write the pipe to the plugin filter.
The filter should accept the pdf file on stdin and send text back
on stdout. Your filter should work when invoked like this:
/usr/bin/pstotext <transcript_request.pdf >transcript_request.txt
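One caveat with the redirect test: "<file" hands the filter a regular file on stdin, while gw talks to it through a pipe, so a filter can pass that test and still fail under gw. A rough sketch of a check that goes through a real pipe (check_filter is a made-up helper name, not part of Webinator):

```shell
# check_filter FILTER SAMPLE: feed SAMPLE to FILTER through a pipe and
# report whether the filter exited cleanly and sent text to stdout.
# (Hypothetical helper for illustration only.)
check_filter() {
    filter=$1
    sample=$2
    # cat forces a real pipe; "filter <sample" would give a seekable stdin.
    if out=$(cat "$sample" | "$filter") && [ -n "$out" ]; then
        echo "ok"
    else
        echo "failed"
        return 1
    fi
}
```

For example, "check_filter /usr/bin/pstotext transcript_request.pdf" should print "ok" only if the filter really copes with piped input.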
plugin difficulties
Posted: Thu Jun 29, 2000 6:07 pm
by Thunderstone
Both of my tools for extracting text from PDF files require a file name
and will NOT accept PDFs from stdin (because "PDF documents require
random access, hence cannot be read from standard input"). I'll have to
find another tool, I guess.
Suggestions for PDF text extractors for Linux users? Anybody, anybody?
thanks,
-dougl
plugin difficulties
Posted: Thu Jun 29, 2000 6:13 pm
by Thunderstone
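Make a wrapper that captures the input to a file, then gives that file to
the PDF processor. Something like this (assuming pstotext takes the file
name as an argument and puts text on stdout):
#!/bin/sh
# Save the PDF that gw sends on stdin to a temporary file.
tf=$(mktemp /tmp/pdftmp.XXXXXX) || exit 1
trap 'rm -f "$tf"' EXIT
cat >"$tf"
# Pass the file by name, since the extractor cannot read PDFs from stdin.
/usr/bin/pstotext "$tf"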
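To put the wrapper into service, it has to be executable, and gw has to be pointed at it instead of at pstotext. A minimal sketch, assuming the same -n syntax as in the first post; install_plugin and the destination path are made up for illustration, and any location gw can execute should do:

```shell
# install_plugin SRC DEST: copy the wrapper script into place, make it
# executable, and print the matching -n option for gw.
# (Hypothetical helper; DEST is your choice, nothing about it is special.)
install_plugin() {
    cp "$1" "$2" || return 1
    chmod 755 "$2" || return 1
    echo "-n\"application/pdf,pdf,$2\""
}
```

For example, "install_plugin ./pdfwrap /usr/local/bin/pdfwrap" copies the script and prints the option to hand to gw. It is worth piping a sample PDF through the installed wrapper once before re-running the crawl.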