Something that hasn't been mentioned before: the "Plugin Split" setting under All Walk Settings may be useful for splitting PDF text into individual pages that then refer back to the full document.
However, I have decided to replace the 'PDF Document (84k)' that Webinator inserts when it can't find a title in the PDF metadata with the file name of my PDF document.
So I will first need to identify whether the URL of the walked document is ours. If it is, I will check for a missing title and replace the 'PDF Document...' with the file name.
Now, would it be better to implement the above logic in dowalk or in search? Ideally it could all be done in dowalk, so that my search would not slow down.
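
To make the logic concrete, here is a rough sketch of what I have in mind. It is written in Python purely for illustration, not in Vortex, and it does not use any real Webinator function. The host name in OUR_HOSTS and the names is_ours and fix_title are made up, and I am assuming the placeholder title always starts with 'PDF Document'.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Hypothetical: the host names we consider "ours".
OUR_HOSTS = {"www.example.com"}

# Assumption: the placeholder always starts with "PDF Document",
# followed by a size such as "(84k)".
PLACEHOLDER = re.compile(r"^PDF Document\b", re.IGNORECASE)


def is_ours(url):
    """Return True if the URL's host is one of our own hosts."""
    host = urlsplit(url).hostname or ""
    return host.lower() in OUR_HOSTS


def fix_title(url, title):
    """Replace a missing or placeholder title with the PDF file name.

    Titles of documents on other hosts, and real titles on our own
    host, are returned unchanged.
    """
    if not is_ours(url):
        return title
    if title and not PLACEHOLDER.match(title.strip()):
        return title
    # Take the last path component of the URL as the file name.
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or title
```

For example, fix_title("http://www.example.com/docs/report.pdf", "PDF Document (84k)") would give "report.pdf", while a document on any other host keeps whatever title it already has. Doing this once per document at walk time is what I mean by keeping it out of search.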