Thunderstone Support Forums

Posted: **Wed Jun 16, 2010 6:16 am**

We have a user who uses ABBYY PDF Transformer v3.0, an OCR (Optical Character Recognition) product, to convert scanned PDF documents to doc files.
The search parser is unable to fetch the text content of these documents. For example, for one document that contained both text and images, the search parser fetched only the images.
The outline of the file, with headers, before applying the anytotx function was:

X-Input-Content-Type: application/rtf
X-Translator-Status: identified
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%"

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/// data

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0001.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0002.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0003.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0004.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%--
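
One way to check what a downstream parser sees in a dump like the one above is to walk the MIME parts and report which ones carry any body text. The following is a minimal sketch using Python's standard `email` package. The sample message, the shortened boundary string, and the `summarize_parts` helper are illustrative assumptions and are not part of anytotx or any Thunderstone tool.

```python
from email import message_from_string

# Hypothetical, shortened stand-in for the dump above (boundary simplified).
SAMPLE = "\n".join([
    "X-Input-Content-Type: application/rtf",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="example-boundary"',
    "",
    "--example-boundary",
    "Content-Type: text/plain; charset=UTF-8",
    "Content-Transfer-Encoding: 8bit",
    "",
    "Recognized text would appear here.",
    "",
    "--example-boundary",
    'Content-Disposition: attachment; filename="0001.png"',
    "Content-Type: text/plain",
    "Content-Transfer-Encoding: 8bit",
    "",
    "--example-boundary",
    'Content-Disposition: attachment; filename="0002.png"',
    "Content-Type: text/plain",
    "Content-Transfer-Encoding: 8bit",
    "",
    "--example-boundary--",
    "",
])


def summarize_parts(raw):
    """Return (filename, content_type, has_text) for each leaf MIME part."""
    msg = message_from_string(raw)
    summary = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        body = part.get_payload()  # undecoded body as a string
        summary.append(
            (part.get_filename(), part.get_content_type(), bool(body.strip()))
        )
    return summary


if __name__ == "__main__":
    for filename, ctype, has_text in summarize_parts(SAMPLE):
        print(filename, ctype, "has text" if has_text else "empty body")
```

If the attachment parts are really as empty as they appear in the dump, a check like this would report them as `text/plain` parts with no body, which leaves only their file names available for indexing.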

The only content that got stored in the search database was:

0001.png

0002.png

0003.png

0004.png

Do you have any idea what could be causing this anomaly?


MIME issue
