I've discovered that it takes over 2 days for the Webinator server to
scan all sites that comprise our state-wide presence even with the delay
set to -w0. Is there any way to do multiple smaller runs and then use
the database language to combine the results into a single database?
If it's acceptable to hit the web servers faster, you can run
multiple gw processes on the same database at the same time to speed
up walking. Try 2 to 5 copies; too many will stop helping
once you saturate your network connection.
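For example, something like the following, started at the same time in
separate command windows (a sketch only: the database path and URLs are
placeholders, and gw's exact option syntax depends on your version):

gw -d D:\public_html\webinator\db http://www.site1.example/
gw -d D:\public_html\webinator\db http://www.site2.example/
gw -d D:\public_html\webinator\db http://www.site3.example/

Each copy pulls pending pages from the same to-do list, so they share
the work rather than duplicate it.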
You would need the facilities of full Texis to merge data from multiple
walk databases.
There's no standard script. It would involve using "addtable" to add the other html and refs tables to the "main" database, then inserting all of the data from the added tables into the main html and refs tables.
1. If I add the tables (html and refs) of "help" and "news" to the main database "db", wouldn't it wipe out the original tables of the main database?
2. Besides, when I do the following operation it gives me an error:
D:\public_html\webinator\bin>gw -d- -st "addtable ..\help\html.tbl"
Sorry, you'll have to purchase Texis in order to perform that operation
See: http://www.thunderstone.com/webinator/ for details
Answer 1: Adding the tables does not wipe the database: you just attach the foreign table and then issue the SQL "insert into A select * from B".
I assume that the reason you want to merge is to have a single search across several DBs. The other way to do this is with a meta-search merge; if you issue a <fetch parallel $webinator_engine_url_list> you'll be able to do the same thing without the complexity of the DB merge.
Answer 2: Give Sales a call, or leave them a question on the contact form. (PS: using a non-hotmail email address there would add a little credibility to your request.)
You should use the -l option to addtable to give the table a different name because you probably already have a table called html in your big database. The SQL syntax for copying the data from one table to the other would be:
insert into html select * from smallhtml
Then you need to update the index just as you would at the end of a walk. (gw -index)
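Put together, the merge for one small database might look roughly like
this (a sketch only: full Texis is required, the -l placement and the
table name "smallhtml" are assumptions, and the paths are taken from the
example earlier in this thread):

texis -d D:\public_html\webinator\db -s "addtable -l smallhtml ..\help\html.tbl"
texis -d D:\public_html\webinator\db -s "insert into html select * from smallhtml"

Repeat the same two steps for the refs table, then update or remake the
index (gw -index).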
Sorry, I just realised this post is about the 'old' Webinator whereas I am running the new one. I guess the instructions will be the same using dowalk/reindex.txt? (or remakeindex?)
Right, just run the appropriate "update index" procedure for your version. For version 4 it would be dowalk/reindex.txt. That will drop and remake the index. Since you're inserting into a large pre-indexed dataset, it may be more efficient to update the index rather than remake it. You can update it with something like:
texis -d c:\path\bigdatabase -s "create metamorph inverted index xhtmlbody on html(Title\Description\Keywords\Meta\Body,Visited)"