First some background:
I just inherited my company's search engine, which has been on the back burner for quite a few years. I have no information about its history. We purchased Webinator about 4 years ago, including the pdftotx plugin.
gw -version:
Webinator WWW Site Indexer Version 2.51 (Commercial)
Copyright(c) 1995,1996,1997,1998 Thunderstone EPI Inc.
Release: 1999011
The Unix date stamp on our pdftotx file is Oct. 22, 1998. I don't know how to get the actual version number. We're running HP-UX 11.0.
------
The problem I'm having is with indexing PDF files. I created a PDF document in Acrobat 5.0 format and tried indexing it. It indexed the text fine but didn't do anything with the metadata (title, keywords, etc.). I ran a gw -st against the html table in the db and the metatag field came up empty. The title field came back with something like "PDF Document (90k)". I've verified that there is metadata (title, keywords, etc.) in the PDF file. I've also verified that my gw command indexes meta information correctly by indexing an HTML file with meta tags (it worked).
I'm guessing that the problem is that Adobe changed their metadata API/format and the ancient pdftotx can't read that information anymore. Is this accurate? Is there an updated version of pdftotx that can read Acrobat 5 (and earlier) PDFs?
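One way to test that guess: Acrobat 5 writes PDF 1.4, the version that added XMP metadata streams alongside the older document Info dictionary. The Python sketch below is a rough byte scan, not a real PDF parser and not anything from Webinator or pdftotx. It reports the PDF version, whether a /Title literal string shows up in an uncompressed Info dictionary, and whether an XMP packet is present. The function name `inspect_pdf_metadata` and the regex patterns are my own assumptions for illustration.

```python
import re
import sys

# Assumed byte patterns: a literal-string /Title entry as it usually appears
# in an uncompressed Info dictionary, and the start of an XMP packet.
INFO_TITLE = re.compile(rb"/Title\s*\((.*?)\)", re.DOTALL)
XMP_PACKET = re.compile(rb"<\?xpacket begin|<x:xmpmeta")


def inspect_pdf_metadata(data: bytes) -> dict:
    """Report where title metadata appears in raw PDF bytes (hypothetical helper).

    Rough heuristic only: it misses titles stored as hex strings, titles
    containing parentheses, and anything inside compressed object streams.
    """
    # The %PDF-x.y header is normally within the first 1024 bytes.
    version = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    title = INFO_TITLE.search(data)
    return {
        "version": version.group(1).decode() if version else None,
        "info_title": title.group(1).decode("latin-1") if title else None,
        "has_xmp": XMP_PACKET.search(data) is not None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_pdf.py document.pdf
    with open(sys.argv[1], "rb") as f:
        print(inspect_pdf_metadata(f.read()))
```

If the scan finds an XMP packet but no Info /Title, that would point toward the format-change theory; if the Info /Title is there too, the old plugin may be failing for some other reason.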
Thanks,
Bob