We have a dynamically generated library that houses all our PDF files. Only the library list is dynamic; all the PDFs are static. Do PDFs work with the refresh setting? The details say the site must support If-Modified-Since. How can I test whether the site supports this and whether the PDFs return that information? 99% of the PDFs are the same each week, and I would rather have the crawl fetch only the new or updated ones.
Thanks
John
If you look under List/Edit URLs at the details of a PDF that was added during the last walk and the Modified time is the same as the Visited time, that suggests the web server is not sending the Last-Modified header and so is not honoring If-Modified-Since. You can also usually see the modified date in your browser by looking at the page properties.
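If you want to test this outside the crawler, a small script can ask the server directly. The following is a minimal sketch using only the Python standard library, not anything built into the search software. The function names `classify_response` and `check_if_modified_since` are made up for this example, and it assumes the server answers HEAD requests. It reads the Last-Modified header, sends it back as If-Modified-Since, and reports whether the server replies 304 Not Modified.

```python
import urllib.error
import urllib.request


def classify_response(status, last_modified):
    """Interpret the outcome of a conditional GET (hypothetical helper)."""
    if last_modified is None:
        return "no Last-Modified header"
    if status == 304:
        return "supported"          # server skipped the body
    if status == 200:
        return "ignored"            # server re-sent the full file
    return f"unexpected status {status}"


def check_if_modified_since(url):
    """Probe one URL for If-Modified-Since support (hypothetical helper)."""
    # Step 1: ask for the headers only, to learn the Last-Modified value.
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        last_modified = resp.headers.get("Last-Modified")
    if last_modified is None:
        return classify_response(200, None)

    # Step 2: repeat the request, sending that date back to the server.
    cond = urllib.request.Request(
        url, headers={"If-Modified-Since": last_modified}
    )
    try:
        with urllib.request.urlopen(cond) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError, so read the code from it.
        status = err.code
    return classify_response(status, last_modified)
```

Calling `check_if_modified_since()` with the address of one of your PDFs should return "supported" if the server handles the header. A result of "ignored" means the whole file is sent again on every check.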
If that is the entire contents of the library, you can set the default refresh period to 1 year and make the library list a watch URL. The list will then be re-fetched on every refresh, but the PDF files won't be checked for a year.
OK, it appears that the server is correctly giving the modified date:
Indexed: 2007-10-19 23:24:46
Modified: 2004-08-29 22:23:01
Last Visit: 2007-10-19 23:24:46
Next Visit: 2008-01-17 22:24:46
I want each PDF file to be checked each week to see whether the date has changed and, if it has, to get the new one.
I don't want it to wait a year until it gets the new one.
OK, then you can set the max refresh time to a week, and it should then do the relatively quick If-Modified-Since check on each PDF every week. Judging by the Next Visit date, your max refresh time is probably 90 days at the moment.
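For reference, the 90-day figure comes from the gap between the Last Visit and Next Visit times posted above. Here is a quick sketch of that arithmetic in Python. The one-hour shortfall is probably a daylight-saving change between the two dates, but that is a guess.

```python
from datetime import datetime, timedelta

# Timestamps copied from the URL details posted earlier in this thread.
last_visit = datetime(2007, 10, 19, 23, 24, 46)
next_visit = datetime(2008, 1, 17, 22, 24, 46)

# Gap between visits: 89 days 23 hours, i.e. 90 days less one hour.
interval = next_visit - last_visit
approx_days = round(interval / timedelta(days=1))

print(interval)      # 89 days, 23:00:00
print(approx_days)   # 90
```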