How does Webinator determine the title for PDF files it crawls? The titles in the search results appear (to me) to be inconsistent: some show the file name, while others have a meaningful title.
Titles in PDF files
It uses the Title property stored in the PDF document. Depending on how the PDF was created, that title may be more or less meaningful.
John Turnbull
Thunderstone Software
Titles in PDF files
Webinator uses the "Title" specified in the PDF document. If there is none, it falls back to "PDF Document". Many PDF generators put the source filename in the title field, so you end up with PDFs that have useless titles like "myfile.doc".
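A minimal sketch of the selection rule just described, in Python for illustration only (it is not Webinator's code). It assumes the PDF's document-information metadata has already been read into a dict keyed like the PDF Info dictionary's `Title` entry. `pdf_result_title`, `looks_like_filename`, and the extension list are hypothetical names chosen for this example.

```python
import os

# Title shown when the PDF carries no Title metadata (as described above).
DEFAULT_TITLE = "PDF Document"

# Extensions suggesting a generator copied the source filename into Title.
# This list is an illustrative assumption, not taken from Webinator.
SOURCE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".rtf", ".txt", ".pdf"}


def pdf_result_title(info: dict) -> str:
    """Return the PDF's Title entry, or the default when it is missing or blank."""
    title = (info.get("Title") or "").strip()
    return title or DEFAULT_TITLE


def looks_like_filename(title: str) -> bool:
    """Heuristic: True when the title ends in a common source-document extension."""
    _, ext = os.path.splitext(title.strip())
    return ext.lower() in SOURCE_EXTENSIONS
```

For example, `pdf_result_title({})` gives "PDF Document", and `looks_like_filename("myfile.doc")` is True, which is the kind of title that ends up looking useless in the results.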
Titles in PDF files
I am sending this privately so it does not show up on the website.
That makes me wonder how Google does it. When one of our PDF, DOC, or XLS files has a useless title, Google somehow finds a meaningful one.
Titles in PDF files
My guess would be that they look for large or bold text at the top of the document as an alternative, although I'm not sure how they decide when to use that instead of the actual title property in the PDF.
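The guess above can be illustrated with a small layout heuristic. This is only a sketch of that idea, not a description of how Google or Webinator actually work. It assumes the page text has already been extracted into runs carrying font size, boldness, page number, and vertical position; `TextSpan` and `guess_title_from_layout` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """One run of laid-out text (hypothetical structure for this sketch)."""
    text: str
    font_size: float
    bold: bool
    page: int  # 0-based page index
    y: float   # distance from the top of the page; smaller means higher up


def guess_title_from_layout(spans: List[TextSpan]) -> Optional[str]:
    """Pick the most prominent text on the first page, or None if there is none."""
    first_page = [s for s in spans if s.page == 0 and s.text.strip()]
    if not first_page:
        return None
    # Prefer the largest font; break ties by boldness, then by nearness to the top.
    best = max(first_page, key=lambda s: (s.font_size, s.bold, -s.y))
    return best.text.strip()
```

Restricting the search to the first page keeps a large heading deep inside the document (an appendix title, say) from being chosen over the real cover title.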
John Turnbull
Thunderstone Software
Titles in PDF files
Ok, one last question, I think.
Is there a way to change the title using regular expressions? I am not yet comfortable with how these are used in Webinator, so it may not be possible.
Mike
Titles in PDF files
You can use "Data from Field" to override the title and pull it from somewhere else; however, that is not currently conditional, so either all titles or none come from the field. For example, if you wanted the title of every result to be the first 50 characters of the body, you could enter this as the search:
>>=.{,50}
in the Text field.
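For comparison, here is the effect of that search sketched with Python's `re` module. REX syntax is different, so this mirrors the result, not the expression itself. `title_from_body` is the unconditional behavior described above; `conditional_title` is a hypothetical version of a conditional override that replaces only empty, placeholder, or filename-like titles. Collapsing whitespace first is an assumption of the sketch.

```python
import re

# Python analogue of "up to 50 characters from the start of the text".
FIRST_50 = re.compile(r".{0,50}")

# Hypothetical test for titles that are really source filenames, e.g. "myfile.doc".
FILENAME_TITLE = re.compile(r"\.(docx?|xlsx?|pptx?|rtf|txt)$", re.IGNORECASE)


def title_from_body(body: str) -> str:
    """Unconditional override: every title becomes the first 50 characters of the body."""
    collapsed = " ".join(body.split())  # assumption: collapse runs of whitespace first
    return FIRST_50.match(collapsed).group(0).strip()


def conditional_title(current: str, body: str, placeholder: str = "PDF Document") -> str:
    """Hypothetical conditional override: keep a meaningful title, replace a weak one."""
    title = current.strip()
    weak = not title or title == placeholder or FILENAME_TITLE.search(title) is not None
    return title_from_body(body) if weak else current
```

The unconditional form is what the thread describes today; the conditional form shows why a per-result check would avoid replacing titles that are already good.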
John Turnbull
Thunderstone Software
Titles in PDF files
Ok, thanks, but as you probably guessed, changing all the titles is not optimal for us.
Does Thunderstone have a "requested feature" list (or similar)? I would like to request something like this.
Titles in PDF files
Yes, we do have a requested-feature list, and making Data from Field conditional is already on it; I'll make sure to add this case.
John Turnbull
Thunderstone Software