plugin difficulties

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



I'm testing Webinator under RedHat Linux 6.1. I've installed a PDF text
extractor and told the gw program about it like this:

-n"application/pdf,pdf,/usr/bin/pstotext"

but I'm getting errors in the gw.log file whenever it encounters a pdf file:

6/29 12:17:27 Retrieving http://test.mnu.edu/mac/transcript_request.pdf
6/29 12:17:27 Plugin Failed: /usr/bin/pstotext

BUT the pstotext program works from the command line. What could be my
problem?

-dougl

______________________________________________________________________

Doug Ledbetter -- Webmaster for MidAmerica Nazarene University
2030 East College Way, Olathe, KS 66062-1899
dougl@mnu.edu (913)782-3750 x205 http://www.mnu.edu/

"But what about you? Who do you say I am?"
--Jesus, circa 30AD
______________________________________________________________________




Post by Thunderstone »



That's indicative of a broken pipeline: gw was not able to read from
or write to the pipe connecting it to the plugin filter.

The filter should accept the PDF file on stdin and send text back
on stdout. Your filter should work when invoked like this:

    /usr/bin/pstotext <transcript_request.pdf >transcript_request.txt
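One way to run that check repeatably is a small shell helper. This is only a sketch: check_filter is a made-up name, not part of Webinator, and it assumes a working filter exits with status 0 and prints some text on stdout when given a sample file on stdin.

```shell
#!/bin/sh
# check_filter CMD FILE (hypothetical helper, not part of gw):
# run CMD with FILE on stdin, the way a stdin/stdout filter is driven,
# and report whether it succeeded and produced any text on stdout.
check_filter() {
    cmd=$1
    sample=$2
    if out=$("$cmd" <"$sample" 2>/dev/null); then
        if [ -n "$out" ]; then
            echo "ok"
        else
            echo "no output"
        fi
    else
        # $? still holds the exit status of the filter command here.
        echo "failed (exit status $?)"
    fi
}
```

For example, `check_filter /usr/bin/pstotext transcript_request.pdf` should print "ok" only if the extractor can really be used as a stdin filter.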



Post by Thunderstone »



At 05:22 PM 6/29/2000 -0400, you wrote:



Both of my tools for extracting text from PDF files require a file name
and will NOT accept PDFs from stdin (because "PDF documents require
random access, hence cannot be read from standard input"). I'll have to
find another tool, I guess.

Suggestions for PDF text extractors for Linux users? Anybody, anybody?

thanks,
-dougl





Post by Thunderstone »



Make a wrapper that captures the input to a file then gives that to the
pdf processor. Something like this (assuming pstotext puts text on stdout):

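#!/bin/sh
# Save the PDF that gw sends on stdin to a temporary file.
tf=/tmp/pdftmp.$$
cat >"$tf"
# pstotext needs random access to a PDF, so pass the file name
# as an argument instead of redirecting it to stdin.
/usr/bin/pstotext "$tf"
rm -f "$tf"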
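If you want the wrapper to be a bit more defensive, here is a sketch of the same idea as a shell function. The names pdf_filter and EXTRACTOR are made up for illustration. It assumes mktemp is installed and that the extractor takes a file name as its argument and prints text on stdout.

```shell
#!/bin/sh
# EXTRACTOR is an assumed, overridable variable: a program that takes
# a file name as its only argument and writes plain text to stdout.
EXTRACTOR=${EXTRACTOR:-/usr/bin/pstotext}

# pdf_filter (hypothetical name): copy stdin to a private temp file,
# run the extractor on that file, and always remove the file afterwards.
pdf_filter() {
    # mktemp picks a unique name, avoiding clashes on a shared /tmp.
    tf=$(mktemp "${TMPDIR:-/tmp}/pdftmp.XXXXXX") || return 1
    if cat >"$tf" && "$EXTRACTOR" "$tf"; then
        status=0
    else
        status=$?
    fi
    rm -f "$tf"
    # Pass the extractor's exit status back to the caller.
    return "$status"
}
```

Saved as a script with a final line that calls pdf_filter, it could take the place of /usr/bin/pstotext in the -n setting.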



