We have a user who uses ABBYY PDF Transformer v3.0, an OCR (Optical Character Recognition) tool, to convert scanned PDF documents to DOC files.
The contents of these documents cannot be fetched by the search parser. For example, one document that contained both text and images resulted in only the images being fetched by the search parser.
Before applying the anytotx function, the outline of the file (with headers) was:
X-Input-Content-Type: application/rtf
X-Translator-Status: identified
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%"
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
/// data
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0001.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0002.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0003.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%
Content-Disposition: attachment; filename="0004.png"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
--%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%--
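To check which parts of the anytotx output actually carry text, a small script can walk the MIME structure. The sketch below uses Python's standard `email` and `mimetypes` modules; `build_sample()` and `audit_parts()` are hypothetical helpers, not part of anytotx or the search parser. The sample assumes the blank lines between headers and bodies were lost when the output was pasted, and that the .png parts have empty bodies. The script flags parts whose filename suggests an image while the declared Content-Type is text/plain, as happens with 0001.png through 0004.png above.

```python
import mimetypes
from email import message_from_string, policy

BOUNDARY = "%multipart-mixed-boundary-19787.1.1276608591.114.1804289383%"


def build_sample() -> str:
    """Hypothetical helper: rebuild the posted anytotx output as well-formed MIME.

    Assumption: blank lines separating headers from bodies were lost in the
    paste, and the .png parts had empty bodies.
    """
    lines = [
        "X-Input-Content-Type: application/rtf",
        "X-Translator-Status: identified",
        "MIME-Version: 1.0",
        f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"',
        "",
        f"--{BOUNDARY}",
        "Content-Type: text/plain; charset=UTF-8",
        "Content-Transfer-Encoding: 8bit",
        "",
        "/// data",  # placeholder body, kept as in the original post
    ]
    for i in range(1, 5):
        lines += [
            f"--{BOUNDARY}",
            f'Content-Disposition: attachment; filename="{i:04d}.png"',
            "Content-Type: text/plain",
            "Content-Transfer-Encoding: 8bit",
            "",  # end of headers; body is empty
        ]
    lines.append(f"--{BOUNDARY}--")
    return "\n".join(lines) + "\n"


def audit_parts(raw: str) -> list[dict]:
    """Hypothetical helper: list each top-level MIME part with its declared
    type and a type guessed from its filename, flagging disagreements."""
    msg = message_from_string(raw, policy=policy.default)
    report = []
    for part in msg.iter_parts():
        filename = part.get_filename()
        declared = part.get_content_type()
        guessed = mimetypes.guess_type(filename)[0] if filename else None
        body = part.get_payload(decode=True) or b""
        report.append({
            "filename": filename,
            "declared": declared,
            "guessed": guessed,
            "body_bytes": len(body),
            "mismatch": guessed is not None and guessed != declared,
        })
    return report


if __name__ == "__main__":
    # To audit real output, pass the saved anytotx result instead, e.g.
    # audit_parts(open("output.txt", encoding="utf-8").read())
    for row in audit_parts(build_sample()):
        print(row)
```

If the real output shows the same pattern (image filenames declared as text/plain with little or no body), that would be consistent with the search DB receiving only the attachment filenames rather than the document text.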
The only content stored in the search DB was:
0001.png
0002.png
0003.png
0004.png
Does anyone have an idea what causes this anomaly?