wipe to-do table
Posted: Mon Oct 25, 2004 11:11 am
by KMandalia
I need to wipe out the todo table, since on a refresh walk Webinator resumes the thread that I no longer wish to refresh.
Other than todo.tbl, what other files need to be considered?
Can I just execute one statement from the command line, like
<sql novars "delete * from todo"></sql>
Posted: Mon Oct 25, 2004 11:44 am
by mark
That would be
"delete from todo"
which effectively will erase all resume data. You may want to be more specific if you're walking more than one site.
"delete from todo where Url matches 'http://thesite%'"
But in general "refresh" will refresh everything in the database. You can't tell it to refresh one site and not another.
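To illustrate the difference between the full delete and the site-specific one, here is a minimal sketch that uses Python's sqlite3 as a stand-in for the Texis database. The one-column todo table is a hypothetical simplification, and SQLite's LIKE is used in place of the Texis `matches` operator, with `%` matching any trailing text in both.

```python
import sqlite3

# Hypothetical stand-in for Webinator's todo table: only the Url column is modeled.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todo (Url TEXT)")
db.executemany("INSERT INTO todo (Url) VALUES (?)", [
    ("http://thesite/a.html",),
    ("http://thesite/b.html",),
    ("http://othersite/c.html",),
])

def clear_todo_for_site(db, prefix):
    """Drop pending (resume) entries for one site only.

    Texis equivalent from the thread: delete from todo where Url matches 'prefix%'
    """
    cur = db.execute("DELETE FROM todo WHERE Url LIKE ?", (prefix + "%",))
    db.commit()
    return cur.rowcount  # number of todo rows removed

removed = clear_todo_for_site(db, "http://thesite")
remaining = [row[0] for row in db.execute("SELECT Url FROM todo")]
print(removed, remaining)
```

Here the two http://thesite entries are removed and the othersite entry is left to be resumed; an unrestricted "delete from todo" would have cleared all three.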
Posted: Mon Oct 25, 2004 12:41 pm
by KMandalia
I am not quite sure how intelligent refresh decides what to refresh (since it no longer refreshes every page in the database, only those it considers may have changed), but even though I have 3 servers, it resumes on just one and doesn't seem to get out of it.
I have blocked the whole site that it is refreshing right now, yet it is still bringing in more pages that match what is already in the database.
Posted: Mon Oct 25, 2004 12:46 pm
by John
The crawl will also resume the todo list of any walks that have paused, which is probably what you are seeing, so "delete from todo" will help.
Posted: Tue Nov 02, 2004 11:59 am
by KMandalia
What if I put a base URL in the exclude list, like
http://www.somesite.com/*
Eventually all of its entries in the html and refs tables should be deleted, right?
In other words, how does Webinator treat web pages it can't update: do they sit in the tables or get wiped out eventually?
Posted: Tue Nov 02, 2004 2:15 pm
by mark
Changing the rules won't cause a page to be deleted from the database. Pages that it tries to refresh that no longer exist on the server will be deleted from the database. To remove pages from the database that still exist on the server, you need to explicitly delete them.
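An explicit cleanup of an excluded site might look like the following minimal sketch, again with sqlite3 standing in for Texis. The html and refs schemas are hypothetical: the real tables have many more columns, and both are only assumed here to carry a Url column to filter on. The `purge_site` helper is likewise hypothetical.

```python
import sqlite3

# Hypothetical stand-ins for the html and refs tables (Url column only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE html (Url TEXT)")
db.execute("CREATE TABLE refs (Url TEXT)")
pages = ["http://www.somesite.com/", "http://www.somesite.com/x.html", "http://keep.example/"]
db.executemany("INSERT INTO html (Url) VALUES (?)", [(u,) for u in pages])
db.executemany("INSERT INTO refs (Url) VALUES (?)", [(u,) for u in pages])

def purge_site(db, prefix):
    """Explicitly delete rows for every URL under prefix from both tables.

    Adding the site to the exclude list alone leaves these rows in place.
    """
    counts = {}
    for table in ("html", "refs"):
        # Table names come from the fixed tuple above, so formatting them in is safe.
        cur = db.execute(f"DELETE FROM {table} WHERE Url LIKE ?", (prefix + "%",))
        counts[table] = cur.rowcount
    db.commit()
    return counts

counts = purge_site(db, "http://www.somesite.com/")
left = [row[0] for row in db.execute("SELECT Url FROM html")]
```

After the purge, only http://keep.example/ remains in the html table; the exclude rule then stops the walker from fetching the removed pages again.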
Posted: Tue Nov 02, 2004 2:53 pm
by KMandalia
OK. So 404 errors will delete the corresponding web pages, but errors such as 'request denied' or 'server unexpectedly closed connection' will not delete the pages from the table, am I right?
Posted: Tue Nov 02, 2004 3:23 pm
by mark
All http responses 400 or greater will cause the page to be deleted as will most server errors that result in an incomplete or page.
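The rule in this reply can be summarized as a small predicate. This is a hypothetical model, not Webinator's code: it simplifies "most server errors" to "all transfers that end incomplete", and it treats responses below 400 on a complete transfer as kept, which the thread does not spell out for cases like redirects.

```python
def refresh_drops_page(status, complete=True):
    """Hypothetical model: True if a refresh would remove the page from the database.

    status   -- HTTP status code, or None if no usable response was received
    complete -- False if the transfer ended early (e.g. the server closed the connection)
    """
    if status is None or not complete:
        # Simplification: treat every incomplete or failed fetch as a delete.
        return True
    # Any HTTP response of 400 or greater causes the page to be deleted.
    return status >= 400
```

Under this model, a 403 ("request denied") is deleted just like a 404, which answers the question above: refresh_drops_page(403) and refresh_drops_page(404) are both True, while refresh_drops_page(200) is False.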
Posted: Tue Nov 30, 2004 10:09 pm
by KMandalia
Is deleting the Visited field for the URL patterns I do not wish to update (ever) an effective way of altering the todo table?
Today I wiped out the todo table, but some of the URLs I don't wish to refresh are listed in the next URLs to walk and are getting crawled right now.
Also, doesn't wiping out the todo table ensure that the refresh will start with the base URL list and try to update everything in turn?
Posted: Wed Dec 01, 2004 6:32 am
by John
The html table has a NextCheck field, which holds the time of the next refresh for that Url. You can set that 5 or 10 years into the future.
The refresh will look for Urls in the html table that have a NextCheck before the current time, and refresh those urls. In addition any urls in todo will be processed along with new urls found during the refresh.
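The selection John describes, combined with pushing NextCheck out for unwanted URLs, can be sketched as below. This is a minimal model with sqlite3 standing in for Texis: NextCheck is stored as Unix seconds here rather than as a Texis date, the helpers `postpone` and `refresh_queue` are hypothetical, and new URLs discovered during the refresh are not modeled.

```python
import sqlite3
import time

YEAR = 365 * 24 * 3600  # seconds; leap days ignored for this sketch

# Hypothetical stand-ins for the html and todo tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE html (Url TEXT, NextCheck INTEGER)")
db.execute("CREATE TABLE todo (Url TEXT)")

now = int(time.time())
db.executemany("INSERT INTO html VALUES (?, ?)", [
    ("http://a.example/", now - 60),         # overdue: will be refreshed
    ("http://skip.example/page", now - 60),  # overdue, but we will postpone it
    ("http://b.example/", now + 3600),       # not due yet
])
db.execute("INSERT INTO todo VALUES ('http://c.example/')")

def postpone(db, prefix, years, now):
    # Move NextCheck far into the future so the refresh leaves these URLs alone.
    db.execute("UPDATE html SET NextCheck = ? WHERE Url LIKE ?",
               (now + years * YEAR, prefix + "%"))

def refresh_queue(db, now):
    # URLs whose NextCheck is before the current time, plus anything left in todo.
    due = [r[0] for r in db.execute(
        "SELECT Url FROM html WHERE NextCheck < ? ORDER BY Url", (now,))]
    pending = [r[0] for r in db.execute("SELECT Url FROM todo ORDER BY Url")]
    return due + pending

postpone(db, "http://skip.example", 10, now)
queue = refresh_queue(db, now)
```

With the postponement applied, the queue holds only http://a.example/ (overdue) and http://c.example/ (left in todo). Because todo entries are processed regardless of NextCheck, a URL you want skipped should also be removed from todo.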