I am working on integrating Vignette with Texis search.
I am supposed to search the text inside MS Word and PDF files. I have no idea how it works. If possible, could somebody e-mail me the manuals or some sample code for doing it? That would be great.
Thanks
See http://www.thunderstone.com/site/texisman/node64.html
Or you can run anytotx on the Word and PDF files manually with exec.
Or, if you're using gw, you need to use the -n option.
There's also a file called anytotx.txt in the same directory as anytotx that you can read.
Hi!
I have some basic questions:
1) How do I index PDF or Word files?
2) How do I create a Metamorph index for a PDF or Word file?
3) Is there any way I can upload a file to the Texis server using a script?
4) Should I execute anytotx from a script?
thanks
-vaibhav
1,4) You index the text of those types of files. anytotx extracts the text from them, which can then be inserted into a Texis table.
<exec anytotx $pdffile></exec><$text=$ret>
<exec anytotx -fmsw $wordfile></exec><$text=$ret>
2) Same as on any other varchar field.
3) Yes, see http://www.thunderstone.com/site/vortexman/node16.html
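A minimal sketch of the load step described in 1) and 4), written as a Vortex fragment. The table docs and its columns Path and Body are hypothetical names chosen for illustration, and the varchar sizes are arbitrary; the anytotx call is the same one shown above.

```
<!-- Run once. Hypothetical table: one row per document, path plus extracted text -->
<SQL "create table docs(Path varchar(255), Body varchar(10000))"></SQL>

<!-- Extract the text of the PDF; the output of exec ends up in $ret -->
<exec anytotx $pdffile></exec><$text=$ret>

<!-- Store the extracted text, not the raw file, so that it can be searched -->
<SQL "insert into docs values($pdffile, $text)"></SQL>
```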
Hi there!
I got searching working for Word, XLS and PPT files, but it doesn't search through Vignette files.
I have a field of type INDIRECT in which I store the file path, and when I search for a word that exists in the PDF files, it doesn't return any results.
Is there anything else I need to do? Or do I need to upgrade the Texis software?
Does the indirect point to the original PDF or the text extracted using anytotx? You should be doing queries against the extracted text. If you still have problems you need to provide a small outline of your table and load and search procedures for us to be able to help further.
Hi Mark!
Yes, I have a Metamorph index on the INDIRECT field "FILE_PATH", and it points directly to the PDF file. The same field points directly to Word, PowerPoint and Excel files and searches through them, but it doesn't search through PDF files. I am not using "anytotx", as it's not necessary because the field is of type INDIRECT. Can you give me some feedback on this? The table structure is something like this:
<SQL "create table cmp_Search (ID INTEGER NOT NULL, PARENT_ID INTEGER NOT NULL, PARENT_TYPE_ID INTEGER NOT NULL, TITLE CHAR(50), DESCRIPTION_TEXT VARCHAR(3000), NAME_ADDRESS CHAR(1000), FILE_PATH INDIRECT, CREATED_DATE DATE, PRIMARY KEY (ID))">
</SQL>
There is no relationship between using indirect and the need to use anytotx. An indirect simply means that the data is in an external file instead of directly in the table. The data is treated the same for searching either way.
Word files generally have the text visible within the file. When you search one, you are searching the text and all of the encoding around it. PDF files do not have the text visible. The only way to get the text is with anytotx. Anytotx will also get rid of the encoding around the text in Word files.
You need to create another field for the "text" of the document and populate that with the output of anytotx. That's the field that should be searched, not the raw file.
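To sketch that last step: assuming cmp_Search has gained an extra varchar field, here called DOC_TEXT (a hypothetical name), that has already been filled with anytotx output, the index and the query would target that field rather than FILE_PATH. The index name ix_cmp_doctext and the $query variable are placeholders as well.

```
<!-- Run once: Metamorph index on the extracted text, not on the raw file -->
<SQL "create metamorph index ix_cmp_doctext on cmp_Search(DOC_TEXT)"></SQL>

<!-- Search the extracted text; $query holds the user's search terms -->
<SQL "select ID, TITLE from cmp_Search where DOC_TEXT like $query">
$ID: $TITLE
</SQL>
```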