Thunderstone Support Forums

Posted: **Mon Mar 08, 2004 12:15 pm**

I have 2 questions.

1. Many of the link titles on the results page only indicate the document type, e.g., "PDF Document", "Word Document", or, even worse, "Untitled document", which makes it difficult to get a sense of what these documents' content might be. More than anything, this reflects poor choices of link text or document titles by the authors, but I wonder whether it's possible on our end to override the titles of these documents with something more meaningful. Google seems to do this somehow; I'm not sure whether it parses the PDF content or reads the PDF's metadata.

2. Another cool thing Google does is convert non-HTML documents to HTML. I am wondering whether this is possible with the Webinator tool.

Posted: **Mon Mar 08, 2004 1:05 pm**

The plugin will get title info from the PDF metadata if it's available. Can you give an example where Google gets a useful title for a document but Webinator doesn't for the same document?

Please also provide your anytotx version (anytotx --identify).

Posted: **Mon Mar 08, 2004 1:49 pm**

The ability Google offers is to convert documents into HTML on the fly. If you search at Google.com and a Word or PDF document comes up in the results, you also get a "View as HTML" link. I used the filetype: operator (e.g., filetype:pdf) in my search to get results that are just PDF or Word documents. I am still checking into the document title.

Posted: **Mon Mar 08, 2004 2:48 pm**

I'm aware of the HTML display for PDF documents. The plugin doesn't currently offer such a feature. People generally prefer the native format; if they can't read that, the plain text is usually sufficient.

Posted: **Tue Mar 09, 2004 7:58 am**

Webinator's "Match Info" link DOES provide a view-as-HTML version, for PDF as well as other document types. It isn't fancy, but all the text is there (cached), and your search terms are highlighted too.

HTML conversion