I'm writing a custom walker in Vortex, similar to Webinator. The walker needs to reload, or "refresh", sites. What is the best way to implement the -e, -X, and -V options? Walking through the database and checking each stored date seems costly in processing time. How does Webinator handle this?
There's no way to know what's stale without looking, and the lookup isn't expensive for the database. That's how gw does it:
select Url,Visited from html where Visited<$THEDESIREDDATE
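For readers outside Vortex, the same stale-record selection can be sketched in Python with the standard sqlite3 module. This is an illustration under assumptions, not Texis itself. The html table, its Url and Visited columns, and storing Visited as Unix epoch seconds are hypothetical stand-ins for the real Webinator schema. The index on Visited is what keeps the lookup cheap: the database can range-scan the index instead of reading every row.

```python
import sqlite3
import time

def open_db(path=":memory:"):
    """Create a hypothetical html table (Url, Visited as epoch seconds)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS html (Url TEXT PRIMARY KEY, Visited INTEGER)"
    )
    # Index on Visited so 'Visited < ?' can be answered by an index range scan
    conn.execute("CREATE INDEX IF NOT EXISTS html_visited ON html (Visited)")
    return conn

def stale_urls(conn, max_age_seconds, now=None):
    """Return (Url, Visited) rows last visited more than max_age_seconds ago."""
    if now is None:
        now = int(time.time())
    cutoff = now - max_age_seconds  # plays the role of $THEDESIREDDATE
    cur = conn.execute(
        "SELECT Url, Visited FROM html WHERE Visited < ? ORDER BY Visited",
        (cutoff,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = open_db()
    now = 1_000_000
    conn.executemany(
        "INSERT INTO html VALUES (?, ?)",
        [
            ("http://example.com/a", now - 10 * 86400),  # visited 10 days ago
            ("http://example.com/b", now - 3600),        # visited an hour ago
        ],
    )
    # Pages not visited in the last 7 days
    print(stale_urls(conn, 7 * 86400, now=now))
```

Passing the cutoff as a bound parameter, rather than pasting it into the SQL string, keeps the query plan reusable and avoids quoting problems.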
Use <urlcp ifmodsince $Visited> to replicate the -V behavior, then check the HTTP response code: a 304 means "not modified", so the stored copy is still current.
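Outside Vortex, the -V idea corresponds to a standard HTTP conditional GET. The client sends an If-Modified-Since header built from the stored visit time and treats a 304 response as "unchanged". The Python sketch below uses only the standard library. It assumes Visited is stored as Unix epoch seconds and does not describe how <urlcp> works internally.

```python
import email.utils
import urllib.error
import urllib.request

def if_modified_since_header(visited_epoch):
    """Format a Unix timestamp as an HTTP-date, e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'."""
    return email.utils.formatdate(visited_epoch, usegmt=True)

def conditional_fetch(url, visited_epoch, timeout=30):
    """Fetch url only if the server says it changed since visited_epoch.

    Returns (304, None) when the server reports "not modified",
    or (status, body_bytes) on a 2xx response. Other HTTP errors
    (404, 500, ...) and network errors propagate to the caller.
    """
    req = urllib.request.Request(
        url,
        headers={"If-Modified-Since": if_modified_since_header(visited_epoch)},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for any non-2xx status it does not
        # handle itself, and 304 is one of those
        if err.code == 304:
            return 304, None
        raise
```

A server that ignores If-Modified-Since simply returns 200 with the full page, so the caller should treat 200 as "possibly changed" rather than "definitely changed".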
-X simply deletes the record from the database if the fetch fails.
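Putting the pieces together, a refresh pass selects stale rows, revisits each one, and (when an -X-style flag is set) deletes rows whose fetch fails. The Python sketch below shows that loop against a hypothetical sqlite3 html table. The fetch argument is a hypothetical caller-supplied function that returns an HTTP status code, or None on a network error. Several policies here are assumptions, not gw's documented behavior: any status other than 2xx or 304 counts as a failure, and a 304 still updates Visited.

```python
import sqlite3

def refresh(conn, cutoff, fetch, now, delete_missing=True):
    """Revisit rows with Visited < cutoff; delete failures if delete_missing (-X-like).

    fetch(url, visited) -> int HTTP status, or None on a network error (hypothetical).
    Returns counts of what happened to the revisited URLs.
    """
    counts = {"unchanged": 0, "updated": 0, "deleted": 0, "kept": 0}
    # Materialize the stale rows first so updates/deletes don't disturb the cursor
    rows = conn.execute(
        "SELECT Url, Visited FROM html WHERE Visited < ?", (cutoff,)
    ).fetchall()
    for url, visited in rows:
        status = fetch(url, visited)
        if status == 304:
            # Not modified: keep the stored copy, just record the new visit time
            conn.execute("UPDATE html SET Visited = ? WHERE Url = ?", (now, url))
            counts["unchanged"] += 1
        elif status is not None and 200 <= status < 300:
            # Changed (or server ignored the conditional header):
            # a real walker would re-parse and re-store the page here
            conn.execute("UPDATE html SET Visited = ? WHERE Url = ?", (now, url))
            counts["updated"] += 1
        elif delete_missing:
            # Fetch failed: drop the record, mirroring -X
            conn.execute("DELETE FROM html WHERE Url = ?", (url,))
            counts["deleted"] += 1
        else:
            # Fetch failed but deletion is off: leave the row for the next pass
            counts["kept"] += 1
    conn.commit()
    return counts

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE html (Url TEXT PRIMARY KEY, Visited INTEGER)")
    conn.executemany(
        "INSERT INTO html VALUES (?, ?)",
        [("http://example.com/same", 10), ("http://example.com/gone", 20)],
    )
    fake_status = {"http://example.com/same": 304, "http://example.com/gone": 404}
    print(refresh(conn, cutoff=100, fetch=lambda u, v: fake_status[u], now=1000))
```

Treating every non-2xx status as a failure means a transient 500 or a timeout also deletes the row. A more conservative walker might delete only on 404 or 410 and retry everything else on a later pass.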