HTML conversion

mmcfadden
Posts: 158
Joined: Tue May 20, 2003 2:17 pm

Post by mmcfadden »

I have 2 questions.

1. Many of the link titles on the results page just indicate the type of document, e.g., "PDF Document", "Word Document", or, even worse, "Untitled document", which makes it difficult to get a sense of what these documents' content might be. More than anything, this demonstrates a bad choice of link text or document titles on the part of the authors, but I wonder if it's possible on our end to somehow override the titles of these documents with something more meaningful? Google seems to do this somehow. I'm not sure whether it does so by parsing the PDF content or by accessing the PDF's metadata.

2. Another cool thing Google does is convert non-HTML documents to HTML. I am wondering if this is possible with Webinator.
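
To make question 1 concrete, here is a minimal sketch of the kind of title override being asked about. It is plain Python, not Webinator code, and every name in it (`GENERIC_TITLES`, `is_generic`, `display_title`) is hypothetical. It assumes the metadata title and the extracted plain text of a document are already available, and it falls back to the first non-empty line of text when the title is one of the placeholders mentioned above.

```python
import re

# Placeholder titles to treat as useless. The first three are the examples
# from the post; the set itself is an assumption and would need tuning.
GENERIC_TITLES = {"pdf document", "word document", "untitled document", "untitled"}


def is_generic(title):
    """Return True when a title is empty or only names the file type."""
    normalized = re.sub(r"\s+", " ", title or "").strip().lower()
    return normalized == "" or normalized in GENERIC_TITLES


def display_title(meta_title, body_text, max_len=80):
    """Prefer the metadata title; otherwise use the first non-empty text line."""
    if not is_generic(meta_title):
        return meta_title.strip()
    for line in body_text.splitlines():
        line = line.strip()
        if line:
            # Truncate so a long first paragraph does not become the link text.
            return line[:max_len]
    # Nothing better was found: keep whatever title there was, or a default.
    return (meta_title or "").strip() or "Untitled document"


if __name__ == "__main__":
    print(display_title("PDF Document", "\n  Library Hours and Policies\nMore text"))
    print(display_title("Budget Summary", "Some body text"))
```

The first non-empty line is only a heuristic. For scanned or heavily formatted documents it may be a page header rather than a real title.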
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The plugin will get title info from PDF metadata if it's available. Can you give an example where Google gets a useful title for a document but Webinator doesn't for the same document?

Please also provide your anytotx version (anytotx --identify).
mmcfadden
Posts: 158
Joined: Tue May 20, 2003 2:17 pm

Post by mmcfadden »

The ability Google offers is to convert documents into HTML on the fly. If you do a search at Google.com and a Word or PDF document comes up as a result, you also get a link that says "View as HTML". I used the filetype:pdf operator in my search to get results that are just PDF or Word. I am still checking into the document title.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I'm aware of the HTML display for PDF documents. The plugin doesn't currently offer such a feature. People generally prefer the native format. If they can't read that, the plain text is usually sufficient.
doran
Posts: 50
Joined: Tue Jun 06, 2000 1:37 pm

Post by doran »

Webinator's "Match Info" link DOES provide a view-as-html version, for PDF as well as other document types. Although not fancy, all the text is there (cached), and it highlights your search terms too.
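
As a rough illustration of what a view like that involves (this is not Webinator's implementation, just a generic Python sketch with a hypothetical function name, `text_to_html`), the cached plain text can be HTML-escaped and the search terms wrapped in bold tags:

```python
import html
import re


def text_to_html(text, terms):
    """Escape plain text and wrap whole-word matches of the terms in <b> tags."""
    words = [re.escape(t) for t in terms if t]
    if not words:
        # No terms to highlight: just escape the text.
        return "<pre>" + html.escape(text) + "</pre>"
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the text between matches, then the match itself inside <b>.
        parts.append(html.escape(text[pos:match.start()]))
        parts.append("<b>" + html.escape(match.group(0)) + "</b>")
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "<pre>" + "".join(parts) + "</pre>"


if __name__ == "__main__":
    print(text_to_html("Cats & dogs", ["dogs"]))
```

Escaping each piece separately keeps characters such as `&` and `<` in the document from being interpreted as markup, while the highlight tags themselves stay intact.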