MIME issue

Posted: Wed Jun 16, 2010 6:16 am
by gaurav.shetti
We have a user who uses ABBYY PDF Transformer v3.0, an OCR (Optical Character Recognition) tool, to convert scanned PDF documents to doc files.
The search parser is not able to fetch the contents of these documents. For example, for one document that contained both text and images, the search parser fetched only the image entries and none of the text.
The outline of the file, with headers, before applying the anytotx function was:

X-Input-Content-Type: application/rtf
X-Translator-Status: identified
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%"


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/// data


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0001.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0002.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0003.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0004.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit


--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%--


The file that got stored in the search db contained only the attachment file names:



0001.png
0002.png
0003.png
0004.png


Do you have any idea what causes this anomaly?
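
In case it helps to reproduce this outside the search parser, here is a minimal sketch that lists what each part of such a multipart/mixed file actually carries. It does not use anytotx at all; it relies only on Python's standard email package. The SAMPLE string is a shortened stand-in for the outline above, and the boundary, the body text, and the summarize_parts name are placeholders I made up for illustration.

```python
from email import message_from_string

# Shortened stand-in for the outline above. The boundary and the body
# text are placeholders, not the real data from the converted document.
SAMPLE = (
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/mixed; boundary="%b-1%"\n'
    '\n'
    '--%b-1%\n'
    'Content-Type: text/plain; charset=UTF-8\n'
    'Content-Transfer-Encoding: 8bit\n'
    '\n'
    'recognized text\n'
    '\n'
    '--%b-1%\n'
    'Content-Disposition: attachment; filename="0001.png"\n'
    'Content-Type: text/plain\n'
    'Content-Transfer-Encoding: 8bit\n'
    '\n'
    '\n'
    '--%b-1%--\n'
)


def summarize_parts(raw):
    """Return (content type, filename, non-blank body length) per leaf part."""
    msg = message_from_string(raw)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        body = part.get_payload() or ""
        rows.append(
            (part.get_content_type(), part.get_filename(), len(body.strip()))
        )
    return rows


if __name__ == "__main__":
    for content_type, filename, size in summarize_parts(SAMPLE):
        print(content_type, filename, size)
```

If the first part reports a non-zero length here while the search db still holds only the file names, that would suggest the text is lost after this stage rather than during the conversion, though this is only a diagnostic aid and not a statement about how the parser works internally.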