Pdf files and Refresh walk setting

jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

We have a dynamically generated library page that lists all of our PDF files. Only the library list is dynamic; the PDFs themselves are static. Do PDFs work with the refresh setting? The details say the site must support If-Modified-Since. How can I test whether the site supports this, and whether the PDFs return that information? 99% of the PDFs are the same each week, and I would rather have the crawl fetch only the new or updated ones.
Thanks
John
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Look under List/Edit URLs at the details of a PDF that was added during the last walk. If the Modified time is the same as the Visited time, that suggests the web server is not sending a Last-Modified header or supporting If-Modified-Since. You can usually also see the modified date in your browser by looking at the page properties.
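
For anyone who would rather test the server directly, the same check can be made with a conditional GET. The following is only a rough sketch in Python using the standard library, not anything built into the search appliance; the function names and the example URL are made up for illustration. It sends If-Modified-Since set to the time you pass in (use the current time so that any unchanged file should qualify) and reports what came back. A 304 Not Modified response means the server honors the header.

```python
import urllib.request
import urllib.error
from email.utils import formatdate


def conditional_get_status(url, since_epoch):
    """Send a GET with If-Modified-Since; return (status, Last-Modified header)."""
    # HTTP dates must be in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    headers = {"If-Modified-Since": formatdate(since_epoch, usegmt=True)}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError, so catch it here.
        if err.code == 304:
            return 304, err.headers.get("Last-Modified")
        raise


def describe(status, last_modified):
    """Turn the result of conditional_get_status into a plain-English verdict."""
    if status == 304:
        return "server honors If-Modified-Since"
    if last_modified:
        return "server sends Last-Modified but returned the full file"
    return "server sends no Last-Modified header"


# Example use (hypothetical URL, substitute one of your own PDFs):
#   import time
#   status, modified = conditional_get_status(
#       "http://www.example.com/library/report.pdf", time.time())
#   print(status, modified, describe(status, modified))
```

If the verdict is "returned the full file" for a PDF you know has not changed, the server is ignoring the condition, and a refresh walk would have to download that file every time.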

If that is the entire contents of the library, you can set the default refresh period to 1 year and make the library list a watch URL. The list will then be re-fetched on every refresh, but the PDF files won't be checked for a year.
John Turnbull
Thunderstone Software
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke »

OK, it appears that the server is correctly giving the modified date:
Indexed: 2007-10-19 23:24:46
Modified: 2004-08-29 22:23:01
Last Visit: 2007-10-19 23:24:46
Next Visit: 2008-01-17 22:24:46

I want each PDF file to be checked each week to see whether its date has changed and, if it has, to fetch the new version.
I don't want to wait a year for the new one to be picked up.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

OK, then you can set the max refresh time to one week, and each PDF should then get the relatively quick If-Modified-Since check every week. Judging by the Next Visit date, your max refresh time is probably set to 90 days at the moment.
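
The 90-day figure can be checked from the Last Visit and Next Visit times posted above. This is just a quick arithmetic sketch in Python. The gap comes out one hour short of 90 days, which is presumably the switch from daylight to standard time in between (an assumption, not something the walk details state).

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the URL details posted earlier in the thread.
last_visit = datetime.strptime("2007-10-19 23:24:46", FMT)
next_visit = datetime.strptime("2008-01-17 22:24:46", FMT)

# Naive subtraction: treats both values as plain wall-clock times.
interval = next_visit - last_visit
print(interval)  # 89 days, 23:00:00
```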
John Turnbull
Thunderstone Software