Webinator 2.5 with PDF dropping characters

sferda0
Posts: 15
Joined: Mon May 14, 2001 5:02 pm

Post by sferda0 »

Webinator 2.5.908387340 of Oct 14, 1998 on Irix with the PDF plugin.

When we execute searches of PDF documents, Webinator is returning descriptions with missing characters:

"fo ms a e c eated fo many special needs. A unifo m look ac oss a b oad ange of uses can be achieved th ough a common"

though no such loss of characters is in the actual PDF.

Suggestions?

Steve Ferda
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The text Webinator outputs is whatever the Adobe PDF library gives it. My guess is that you're dealing with a document that has been scanned and OCR'd, and the OCR was less than great. The PDF reader shows you the picture of the page, not the OCR'd text, so the page can look fine while the text underneath is damaged.

Confirm what the PDF filter is getting from Adobe with:
pdftotx <yourfile.pdf | more
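
If the output is too long to check by eye, a rough helper along these lines can list which letters never show up in the extracted text. This is only a sketch, not part of Webinator or the PDF plugin: the function name `missing_letters` and the script name `missing_letters.py` are made up for illustration, and `SAMPLE` is just the description quoted in the first post.

```python
import string
import sys


def missing_letters(text: str) -> list[str]:
    """Return the lowercase ASCII letters that never occur in `text`."""
    seen = set(text.lower())
    return [ch for ch in string.ascii_lowercase if ch not in seen]


# Description text exactly as quoted in the original post (not the real PDF).
SAMPLE = (
    "fo ms a e c eated fo many special needs. A unifo m look ac oss "
    "a b oad ange of uses can be achieved th ough a common"
)

if __name__ == "__main__":
    # With piped input, analyse that; otherwise fall back to the quoted sample.
    text = SAMPLE if sys.stdin.isatty() else sys.stdin.read()
    print("letters never seen:", " ".join(missing_letters(text)))
```

Assuming the script is saved as `missing_letters.py`, it could be fed the filter output with `pdftotx <yourfile.pdf | python missing_letters.py`. A short text will naturally lack rare letters such as j, q, x or z; a common letter that is absent from a long sample is the thing to look for. In the quoted description the only common letter that never appears is r.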

If it's not an OCR issue and you can supply a link to the document in question, we could try it with our newer non-Adobe-based PDF filter to see if it does a better job.