problem combining multiple runs

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »

I've discovered that it takes over 2 days for the Webinator server to
scan all sites that comprise our state-wide presence even with the delay
set to -w0. Is there any way to do multiple smaller runs and then use
the database language to combine the results into a single database?

Thanks
Dave
Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »

If it's acceptable to hit the web servers faster, you can run
multiple gw's on the same database at the same time to speed
up walking. Try 2 to 5 copies. Beyond that, extra copies will stop
helping as you saturate your network connection.

You would need the facilities of full Texis to merge data from multiple
walk databases.


WI-User
Posts: 4
Joined: Tue Apr 10, 2001 4:19 pm

Post by WI-User »

Are there any sample scripts to merge multiple databases into one? I guess this must be a standard script.

I have databases as follows:

..\public_html\webinator\db
..\public_html\webinator\help
..\public_html\webinator\news

I would like to merge the databases "help" and "news" into db.

Appreciate any help.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

There's no standard script. It would involve using "addtable" to add the other databases' html and refs tables to the "main" database, then inserting all of the data from the added tables into the main html and refs tables.

BTW, please don't cross post.
WI-User
Posts: 4
Joined: Tue Apr 10, 2001 4:19 pm

Post by WI-User »

Couple of questions:

1. If I add the tables (html and refs) of "help" and "news" to the main database "db", wouldn't it wipe out the original tables of the main database?

2. Besides, when I run the following command, it gives me an error:

D:\public_html\webinator\bin>gw -d- -st "addtable ..\help\html.tbl"
Sorry, you'll have to purchase Texis in order to perform that operation
See: http://www.thunderstone.com/webinator/ for details

How much would Texis cost?
bart
Posts: 251
Joined: Wed Apr 26, 2000 12:42 am

Post by bart »

Answer 1: Adding the tables does not wipe the database. You just attach the foreign table and then issue the SQL "insert into A select * from B".

I assume that the reason you want to merge is to have a single search across several DBs. The other way to do this is with a meta-search merge; if you issue a <fetch parallel $webinator_engine_url_list> you'll be able to do the same thing without the complexity of the DB merge.

Answer 2: Give Sales a call, or leave them a question on the contact form. (PS: using a non-hotmail email address there would add a little credibility to your request.)
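
For readers weighing the meta-search alternative mentioned in Answer 1, here is a minimal sketch of the general idea in Python rather than Vortex: query each engine in parallel, then merge the hit lists into one ranking. Everything in it is an assumption made for illustration. The `meta_search` helper, the `(score, url)` hit format, and the two stub engines are hypothetical stand-ins, not Webinator's API, and scores returned by separate engines may not be directly comparable in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, engines, limit=10):
    """Query every engine in parallel and merge hits by descending score.

    `engines` is a non-empty list of callables that take a query string
    and return a list of (score, url) tuples. Hypothetical interface.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    # Flatten the per-engine lists, then rank the combined hits.
    merged = [hit for hits in result_lists for hit in hits]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:limit]


# Stub engines standing in for HTTP requests to each search URL.
def search_help(query):
    return [(0.9, "help/install.html"), (0.4, "help/faq.html")]


def search_news(query):
    return [(0.7, "news/release.html")]


results = meta_search("install", [search_help, search_news])
for score, url in results:
    print(score, url)
```

The databases stay separate in this approach, so nothing has to be merged or reindexed; the cost is one request per engine on every search.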
b.sims
Posts: 99
Joined: Fri Oct 26, 2001 10:40 am

Post by b.sims »

So if I wanted to combine a small Webinator run into a larger run, I would do:

addtable -d c:\path\bigdatabase c:\path\smalldatabase\db1\html.tbl

and then the same for the refs table.

Followed by:

texis -d c:\path\bigdatabase "insert * from html.tbl into c:\path\bigdatabase\db1\html.tbl"


Will this do the job of combining? Is there any reindex needed to make this operational or will Texis take care of it automatically?

mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You should use the -l option to addtable to give the table a different name because you probably already have a table called html in your big database. The SQL syntax for copying the data from one table to the other would be:
insert into html select * from smallhtml

Then you need to update the index just as you would at the end of a walk. (gw -index)
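
To illustrate the pattern (attach the other table under a different name, then append its rows with "insert into ... select *"), here is a small runnable sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since Texis itself can't be run here. The file names, the two-column html table, and the `merge_small_into_big` helper are made up for the example; SQLite's ATTACH only plays a role similar to addtable with a different table name.

```python
import os
import sqlite3
import tempfile


def make_db(path, rows):
    """Create a toy database with an html table (hypothetical schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE html (Url TEXT, Title TEXT)")
    con.executemany("INSERT INTO html VALUES (?, ?)", rows)
    con.commit()
    con.close()


def merge_small_into_big(big_path, small_path):
    """Append every row of small's html table to big's html table."""
    con = sqlite3.connect(big_path)
    try:
        # Attach under a separate name so the two html tables don't clash.
        con.execute("ATTACH DATABASE ? AS small", (small_path,))
        # Rows are appended; the rows already in big are left alone.
        con.execute("INSERT INTO main.html SELECT * FROM small.html")
        con.commit()
        con.execute("DETACH DATABASE small")
        return con.execute("SELECT COUNT(*) FROM html").fetchone()[0]
    finally:
        con.close()


workdir = tempfile.mkdtemp()
big = os.path.join(workdir, "big.db")
small = os.path.join(workdir, "small.db")
make_db(big, [("http://example.com/a", "A"), ("http://example.com/b", "B")])
make_db(small, [("http://example.com/help", "Help")])

total = merge_small_into_big(big, small)
print(total)  # 2 existing rows + 1 appended row = 3
```

One difference to keep in mind: SQLite maintains its indexes on insert, whereas in the Webinator flow described above the index still has to be updated after the insert.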
b.sims
Posts: 99
Joined: Fri Oct 26, 2001 10:40 am

Post by b.sims »

Sorry, I just realised this post is in 'old' webinator whereas I am running the new one. I guess instructions will be the same using dowalk/reindex.txt? (or remakeindex?)
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Right, just run the appropriate "update index" procedure for your version. For version 4 it would be dowalk/reindex.txt, which will drop and remake the index. Since you're inserting into a large pre-indexed dataset, it may be more efficient to just update the index rather than remake it. You can update it with something like:
texis -d c:\path\bigdatabase -s "create metamorph inverted index xhtmlbodv on html(Title\Description\Keywords\Meta\Body,Visited)"