PDF portrait vs. landscape
Posted: Mon Nov 22, 2004 2:26 pm
by KMandalia
If a PDF file is in landscape mode, Webinator will index it but won't search it correctly.
Is this a known limitation of Webinator, or is there no way to search landscape PDF files at all?
Posted: Mon Nov 22, 2004 2:49 pm
by mark
Can you provide an example PDF URL that doesn't index correctly?
Posted: Mon Nov 22, 2004 3:11 pm
by KMandalia
Actually, I said it didn't 'search' them (I guess it doesn't build an index on them, since they are walked all right).
The following is private (later I will make a public one available for the sake of message board users, and you can make this message public then):
http://www.creditunions.com/cud/On_Page_134.pdf
Posted: Mon Nov 22, 2004 4:02 pm
by mark
Besides a few bad word breaks (not unheard of in PDF files), all the data seems to be there. What kind of problem are you having?
Posted: Mon Nov 22, 2004 4:09 pm
by KMandalia
What I did was index the whole book and try different searches.
I noticed that, in everything I searched for, "almost all" the landscape portions of the book were missing from the results. That's when I posted the message.
Now, after adding some word forms (for phone numbers, zip codes, etc.), I re-ran a "new" walk and searched for "SAFCU" in the whole book.
This time Webinator found the page but highlighted a completely different thing.
Click on:
http://search.creditunions.com/scripts/ ... query=pima
This is weird (click on a couple of the results). I am 100% sure that the first time the search didn't work on landscape pages; now it seems to be working but highlighting completely different things (I would be OK with it as long as it highlighted the query terms).
BTW, I guess we can highlight in a different color (yellow, maybe)?
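For readers following along, here is a rough sketch of what adding word forms for phone numbers and zip codes does to the set of searchable terms. It is written in Python with ordinary regular expressions purely as a stand-in: Webinator's word expressions use their own syntax, and the patterns, the `index_terms` helper, and the sample line are all hypothetical illustrations, not the actual profile settings.

```python
import re

# Hypothetical patterns, in Python regex syntax (NOT Webinator's expression syntax).
WORD = r"[A-Za-z0-9]{2,}"        # plain alphanumeric words
PHONE = r"\d{3}-\d{3}-\d{4}"     # e.g. 520-555-0100
ZIP = r"\d{5}(?:-\d{4})?"        # e.g. 85701 or 85701-1234

def index_terms(text, expressions):
    """Collect every lowercase term matched by any of the expressions."""
    terms = set()
    for expr in expressions:
        terms.update(m.group(0).lower() for m in re.finditer(expr, text))
    return terms

# Made-up sample line, for illustration only.
sample = "SAFCU, Tucson AZ 85701-1234, phone 520-555-0100"

default_terms = index_terms(sample, [WORD])
extended_terms = index_terms(sample, [WORD, PHONE, ZIP])

# Terms that only become searchable once the extra expressions are added.
print(sorted(extended_terms - default_terms))
```

With only the plain word pattern, a phone number is stored as separate digit groups; with the extra patterns it is also stored as a single term, so a query for the full number can match it directly.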
Posted: Mon Nov 22, 2004 5:02 pm
by mark
See
http://thunderstone.master.com/texis/ma ... 417d38e610 for what gets highlighted in Acrobat. The highlighting appears to be a little further off in your landscape docs. Not sure why.
You can tell Acrobat to use a different color by modifying the <pdfxml> call in search (
http://www.thunderstone.com/site/vortexman/node178.html ), but in my experience Acrobat pretty much ignores your color choice.
Posted: Mon Nov 22, 2004 5:18 pm
by KMandalia
From all the discussion we have had, I think that searching and highlighting PDF docs in Webinator is not worth the effort, since (1) the search is not guaranteed to return what you are looking for, and (2) it will create a bad impression on the person using the 'Search within the book' feature (this is what I am trying to implement).
Does the word expression have any influence on what gets highlighted (even when the query doesn't match the word expression)?
Posted: Mon Nov 22, 2004 5:56 pm
by mark
Only what matches the query would ever be highlighted.
Posted: Thu Nov 25, 2004 10:00 pm
by KMandalia
I am 100% sure that there is some kind of problem with Webinator searching (not indexing) landscape PDF documents.
I would like you to walk the following folder:
http://www.creditunions.com/cud/
The whole folder is PDF documents, so please apply whatever settings optimize PDF searching (and let me know what they are).
Next, if you look at the individual body content in List/Edit URLs, you will find that the body content for landscape-mode PDFs is sometimes messed up at the top of the document, but all the actual info is there.
Try searching for various names of credit unions that you see in the body (I don't remember the exact page numbers, but On_Page_200.pdf to On_Page_250.pdf are good candidates for testing).
What I have found is that searching by name almost never works, but if I search for numbers, pages 200 to 250 do appear. I have no explanation for that. However, I can see that the names are often broken up, so Webinator may not be finding them.
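To make the word-break hypothesis concrete, here is a minimal Python sketch. It is not Webinator's indexing or search code, just a simplified whole-word matcher; the `tokenize` and `matches` helpers and the sample body strings are hypothetical. It shows how a name split by a bad word break can stop matching while an intact number on the same page still matches.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplified stand-in for an indexer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(query, body):
    """True if every query token appears as a whole token in the body."""
    body_tokens = set(tokenize(body))
    return all(tok in body_tokens for tok in tokenize(query))

# Made-up extracted text: in the second string the name is split by bad
# word breaks, while the number survives intact.
clean_body = "Pima Federal Credit Union 85701"
broken_body = "Pi ma Fed eral Credit Union 85701"

print(matches("pima", clean_body))    # name found in cleanly extracted text
print(matches("pima", broken_body))   # name missed once it is split in two
print(matches("85701", broken_body))  # number still found on the same page
```

If this is what is happening, comparing the body text shown in List/Edit URLs against the failing query terms should reveal the split words.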
At this point I really don't care much about highlighting, but we want our clients to trust the search engine, so that whatever shows up is definitely all there is (I don't care about search speed either). This is a 600-page document full of data, so locating a particular piece of information is all that matters.
Posted: Fri Nov 26, 2004 11:28 am
by mark
Please provide the URL of your search, what you searched for, what the unexpected result was, and all of the word definition expressions for that profile.