Webinator 2.5 with PDF dropping characters

Posted: Mon Apr 08, 2002 11:03 am
by sferda0
Webinator 2.5.908387340 of Oct 14, 1998 on Irix with the PDF plugin.

When we execute searches of PDF documents, Webinator is returning descriptions with missing characters:

"fo ms a e c eated fo many special needs. A unifo m look ac oss a b oad ange of uses can be achieved th ough a common"

though no such loss of characters is in the actual PDF.

Suggestions?

Steve Ferda

Posted: Mon Apr 08, 2002 1:43 pm
by mark
The text Webinator outputs is whatever the Adobe PDF library gives it. My guess is that you're dealing with a document that has been scanned and OCR'd, and the OCR was less than great. The PDF reader shows you the picture of the page, not the OCR'd text, which is why the document itself looks fine.

Confirm what the PDF filter is getting from the Adobe library with:
pdftotx < yourfile.pdf | more
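
If the output is long, it may be easier to check it mechanically than by eye. The excerpt quoted in the first post appears to be missing every "r", so one rough test is to list which letters never occur in the extracted text. The following is only a sketch in Python: the function name missing_letters is made up for illustration and is not part of Webinator or the PDF plugin, and it assumes the text is mostly plain English.

```python
import string
from collections import Counter
from typing import List


def missing_letters(text: str) -> List[str]:
    """Return the lowercase ASCII letters that never appear in text.

    Hypothetical helper for illustration; not part of Webinator.
    """
    # Count only ASCII letters, ignoring case, digits, and punctuation.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Report every letter with a zero count, in alphabetical order.
    return [ch for ch in string.ascii_lowercase if counts[ch] == 0]


# The description quoted in the original post.
sample = (
    "fo ms a e c eated fo many special needs. A unifo m look ac oss "
    "a b oad ange of uses can be achieved th ough a common"
)
print(missing_letters(sample))
```

On a short sample, rare letters such as "q" or "z" will be reported as well, so the result only means something when a common letter is absent from a reasonably long text. To run it on real filter output, save the output of the command above to a file and pass the file's contents to the function.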

If it's not an OCR issue and you can supply a link to the document in question, we could try it out with our newer non-Adobe-based PDF filter to see if it does a better job.