PDF portrait vs. landscape

KMandalia
Posts: 301
Joined: Fri Jul 09, 2004 3:50 pm

Post by KMandalia »

If a PDF file is in landscape mode, Webinator will index it but won't search it correctly.

Is this a known limitation of Webinator, or is there no way to search landscape PDF files at all?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Can you provide an example PDF URL that doesn't index correctly?
Post by KMandalia »

Actually, I said it didn't 'search' them (I guess it wouldn't build an index on them, since they are walked all right).

The following is private (later I will make one publicly available for the sake of message board users, and then you can make this message public):

http://www.creditunions.com/cud/On_Page_134.pdf
Post by mark »

Besides a few bad word breaks (not unheard of in PDF files), all the data seems to be there. What kind of problem are you having?
Post by KMandalia »

What I did was to index the whole book and try different searches.

I noticed that, in everything I searched for, "almost all" of the landscape portions of the book were missing from the results. That's when I posted the message.

Now, after adding some word forms (for phone numbers, zip codes, etc.), I re-ran a "new" walk and searched for "SAFCU" in the whole book.

This time Webinator did find the page, but it highlighted a completely different thing.

Click on:

http://search.creditunions.com/scripts/ ... query=pima

This is weird (click on a couple of the results). I am 100% sure that the first time the search didn't work on landscape mode, but now it seems to be working while highlighting completely different things (I am OK with it as long as it highlights the query terms).

BTW, I guess we can highlight it with a different color (yellow, maybe)?
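
To make the highlighting expectation concrete, here is a minimal sketch of wrapping only the query terms in a colored span. It is generic Python, not Webinator's own highlighting mechanism; the `highlight` function, its `color` parameter, and the sample text are all hypothetical.

```python
import html
import re


def highlight(text, terms, color="yellow"):
    """Return HTML with each case-insensitive occurrence of a query term
    wrapped in a span with the given background color.

    Hypothetical helper for illustration only; not part of Webinator.
    """
    terms = [t for t in terms if t]          # ignore empty terms
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(text):
        # Escape the text between matches, then wrap the match itself.
        parts.append(html.escape(text[last:m.start()]))
        parts.append('<span style="background-color: %s">%s</span>'
                     % (color, html.escape(m.group(0))))
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    # Made-up sample text; only the term "pima" should be wrapped.
    print(highlight("Pima FCU & others", ["pima"]))
```

The point of the sketch is that nothing outside the matched terms is ever wrapped, which is the behavior I would expect from the search results.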
Post by mark »

Post by KMandalia »

:)

From all the discussion we've had, I think that searching and highlighting PDF docs in Webinator is not worth the effort, since (1) the search is not guaranteed to return what you might be looking for, and (2) it will create a bad impression on the person using the 'Search within the book' feature (which is what I am trying to implement).

Do word expressions have any influence on what gets highlighted (even when the query doesn't match the word expression)?
Post by mark »

Webinator would only ever attempt to highlight what matches the query.
Post by KMandalia »

I am 100% sure that there is some kind of problem with Webinator searching (not indexing) landscape PDF documents.

I would like you to walk the following folder:

http://www.creditunions.com/cud/

The whole folder is PDF documents, so use whatever settings optimize PDF searching (and let me know what they are as well).

Next, if you look at the individual body content in List/Edit URLs, you will find that the body content for landscape-mode PDFs is sometimes messed up at the top of the document, but all the actual info is there.

Try searching for various names of credit unions that you see in the body (I don't remember the exact page numbers, but On_Page_200.pdf to On_Page_250.pdf are good candidates for testing).

What I have found is that searching by name almost always fails, but if I search for numbers, pages 200 to 250 do appear. I have no explanation for that. However, I can see that the names are often broken up, so Webinator may not be finding them.
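
The suspicion about broken-up names can be illustrated with a small sketch. It assumes a simple whole-word matcher, which is only a stand-in and not how Webinator actually matches; the function names and the sample body text are made up for the example. It shows why a number that survived extraction intact can match while a name split by a bad word break does not.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, body):
    """True if every query token appears as a whole token in body.

    Simplified stand-in for a word-based search; an assumption,
    not Webinator's real matching logic.
    """
    body_tokens = set(tokenize(body))
    return all(tok in body_tokens for tok in tokenize(query))


def matches_ignoring_breaks(query, body):
    """True if the query appears once all whitespace is removed
    from both the query and the body."""
    def squash(s):
        return re.sub(r"\s+", "", s.lower())
    return squash(query) in squash(body)


if __name__ == "__main__":
    # Made-up extracted text: the name is broken up, the number is intact.
    body = "SAF CU  Tuc son AZ 85701"
    print(matches("SAFCU", body))                  # name split by a word break
    print(matches("85701", body))                  # number extracted whole
    print(matches_ignoring_breaks("SAFCU", body))  # match after removing spaces
```

Comparing the stored body text against the query this way, outside the search engine, would at least confirm whether word breaks explain the missing name hits.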

At this point I really don't care much about highlighting, but we want our clients to trust the search engine, so that whatever shows up is definitely all there is (I don't care about search speed either). This is a 600-page document full of data, so locating a particular piece of information is all that matters.
Post by mark »

Please provide the URL of your search, what you searched for, what the unexpected result was, and all of your word definition expressions for that profile.