problem combining multiple runs

Posted: Sun Feb 22, 1998 10:21 pm
by Thunderstone
I've discovered that it takes over 2 days for the Webinator server to
scan all sites that comprise our state-wide presence even with the delay
set to -w0. Is there any way to do multiple smaller runs and then use
the database language to combine the results into a single database?

Thanks
Dave

Posted: Mon Feb 23, 1998 3:10 pm
by Thunderstone



If it's acceptable to hit the web servers faster, you can run
multiple gw's on the same database at the same time to speed
up walking. Try 2 to 5 copies; too many will stop helping
once you saturate your network connection.
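As a rough illustration of running several walkers at once, here is a minimal Python sketch that starts N copies of the same command concurrently and waits for all of them. It knows nothing about gw itself: the command line is whatever you pass in (an assumption on your side), and for a real walk every copy would point at the same database. The demo uses a harmless Python one-liner as a stand-in for gw.

```python
import subprocess
import sys
from typing import List, Sequence


def launch_parallel(cmd: Sequence[str], copies: int) -> List[int]:
    """Start `copies` instances of the same command at once and wait for all.

    Returns the exit codes in launch order. The thread suggests 2 to 5
    copies; more stop helping once the network connection is saturated.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Popen returns immediately, so all copies run concurrently.
    procs = [subprocess.Popen(list(cmd)) for _ in range(copies)]
    # Only start waiting once every copy has been launched.
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere. For a real walk
    # you would pass your own gw command line here (hypothetical, not shown).
    demo = [sys.executable, "-c", "print('walker running')"]
    print(launch_parallel(demo, 3))
```

Starting every process before waiting on any of them is what makes the copies overlap; waiting inside the launch loop would run them one after another.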

You would need the facilities of full Texis to merge data from multiple
walk databases.



Posted: Tue Apr 10, 2001 4:58 pm
by WI-User
Are there any sample scripts to merge multiple databases into one? I'd assume there's a standard script for this.

I have databases as follows:

..\public_html\webinator\db
..\public_html\webinator\help
..\public_html\webinator\news

I would like to merge the databases "help" and "news" into db.

I'd appreciate any help.

Posted: Tue Apr 10, 2001 6:19 pm
by mark
There's no standard script. It would involve using "addtable" to add the other html and refs tables to the "main" database, then inserting all of the data from the added tables into the main html and refs tables.

BTW, please don't cross post.

Posted: Tue Apr 10, 2001 7:22 pm
by WI-User
Couple of questions:

1. If I add the tables (html and refs) of "help" and "news" to the main database "db", wouldn't that wipe out the original tables of the main database?

2. Also, when I run the following command, I get an error:

D:\public_html\webinator\bin>gw -d- -st "addtable ..\help\html.tbl"
Sorry, you'll have to purchase Texis in order to perform that operation
See: http://www.thunderstone.com/webinator/ for details

How much would Texis cost?

Posted: Tue Apr 10, 2001 9:20 pm
by bart
Answer 1: Adding the tables does not wipe the database. You just attach the foreign table and then issue the SQL "insert into A select * from B".
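To see why attaching and inserting leaves the existing data alone, here is a small analogy using Python's built-in SQLite, not Texis: SQLite's `ATTACH DATABASE` plays the role of `addtable`, and the `INSERT ... SELECT` is the statement described above. The toy `html` table with `Url` and `Body` columns is hypothetical, not the real Webinator schema.

```python
import os
import sqlite3
import tempfile
from typing import List


def merge_into_main(main: sqlite3.Connection, other_path: str, table: str) -> int:
    """Attach another database file and append its rows to main's table.

    ATTACH stands in for Texis addtable in this analogy. `table` is
    interpolated into the SQL, so it must be a trusted table name.
    Returns the number of rows copied; existing rows are left untouched.
    """
    main.execute("ATTACH DATABASE ? AS other", (other_path,))
    try:
        with main:  # commit on success, roll back on error
            cur = main.execute(f"INSERT INTO main.{table} SELECT * FROM other.{table}")
        return cur.rowcount
    finally:
        main.execute("DETACH DATABASE other")


def make_toy_db(path: str, urls: List[str]) -> sqlite3.Connection:
    """Create a hypothetical html table (Url, Body) holding the given URLs."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE html (Url TEXT, Body TEXT)")
    con.executemany("INSERT INTO html VALUES (?, ?)", [(u, "page text") for u in urls])
    con.commit()
    return con


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        main_db = make_toy_db(os.path.join(tmp, "db.sqlite"), ["http://a/", "http://b/"])
        help_path = os.path.join(tmp, "help.sqlite")
        make_toy_db(help_path, ["http://c/"]).close()
        copied = merge_into_main(main_db, help_path, "html")
        total = main_db.execute("SELECT COUNT(*) FROM html").fetchone()[0]
        print(copied, total)  # 1 3: one row appended, the original two kept
        main_db.close()
```

SQLite can qualify tables by schema (`main.html`, `other.html`), so the two same-named tables don't clash in this sketch. Note also that `INSERT ... SELECT` does not remove duplicates here, so a URL present in both toy databases would appear twice after the merge.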

I assume that the reason you want to merge is to have a single search across several DBs. The other way to do this is with a meta-search merge; if you issue a <fetch parallel $webinator_engine_url_list> you'll be able to do the same thing without the complexity of the DB merge.
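The meta-search idea can be sketched roughly as follows: query each search engine concurrently and merge the hit lists into one ranking. This is a Python analogy of the approach, not the Vortex `<fetch parallel>` implementation. The `search` callable and the `(url, score)` hit shape are hypothetical, and the merge assumes scores from different engines are comparable, which may not hold in practice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical hit shape: (url, score). Real engines return richer records.
Hit = Tuple[str, float]


def metasearch(
    engines: Sequence[str],
    query: str,
    search: Callable[[str, str], List[Hit]],
) -> List[Hit]:
    """Query every engine in parallel and merge the hits into one ranking.

    `search(engine, query)` is a hypothetical callable that fetches and
    parses one engine's results. Scores are assumed to be comparable.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # pool.map runs the fetches concurrently and preserves engine order.
        result_lists = list(pool.map(lambda engine: search(engine, query), engines))
    best: Dict[str, float] = {}
    for hits in result_lists:
        for url, score in hits:
            # If two engines return the same URL, keep its higher score.
            if score > best.get(url, float("-inf")):
                best[url] = score
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    canned = {  # stand-in results instead of real HTTP fetches
        "help": [("http://x/a", 0.9), ("http://x/b", 0.5)],
        "news": [("http://x/b", 0.7), ("http://x/c", 0.6)],
    }
    print(metasearch(["help", "news"], "texis", lambda e, q: canned[e]))
```

The trade-off versus merging the databases is that each query costs several fetches, but no data ever has to be copied or reindexed.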

Answer 2: Give Sales a call, or leave them a question on the contact form. (PS: using a non-Hotmail email address there would add a little credibility to your request.)

Posted: Wed Feb 13, 2002 9:04 am
by b.sims
So if I wanted to combine a small Webinator run into a larger one, I would do:

addtable -d c:\path\bigdatabase c:\path\smalldatabase\db1\html.tbl

and then the same for the refs table.

Followed by:

texis -d c:\path\bigdatabase "insert * from html.tbl into c:\path\bigdatabase\db1\html.tbl"


Will this do the job of combining? Is a reindex needed to make this operational, or will Texis take care of it automatically?

Posted: Wed Feb 13, 2002 10:37 am
by mark
You should use the -l option to addtable to give the table a different name, because you probably already have a table called html in your big database. The SQL syntax for copying the data from one table to the other would be:
insert into html select * from smallhtml

Then you need to update the index just as you would at the end of a walk. (gw -index)
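Putting the steps from this reply together, a minimal Python sketch that only builds the command lines (it runs nothing) might look like this. The `addtable -l` option, the `insert into ... select` statement and `gw -index` come from this thread; the argument order, passing `-d` to gw, and running the insert via `texis -d ... -s` are assumptions to check against your Texis documentation before use.

```python
from typing import Dict, List


def merge_commands(big_db: str, small_tables: Dict[str, str]) -> List[List[str]]:
    """Build (but do not run) the commands to merge tables into big_db.

    small_tables maps a table name in big_db (e.g. "html", "refs") to the
    .tbl file of the same table in the smaller walk database.
    Argument order and the gw/texis invocations are assumptions.
    """
    commands: List[List[str]] = []
    for table, tbl_file in small_tables.items():
        alias = "small" + table  # non-clashing name, e.g. "smallhtml"
        # Attach the foreign table under the alias (addtable -l).
        commands.append(["addtable", "-d", big_db, "-l", alias, tbl_file])
        # Append its rows to the existing table; nothing is deleted.
        sql = f"insert into {table} select * from {alias}"
        commands.append(["texis", "-d", big_db, "-s", sql])
    # Update the index as at the end of a walk (assumed: gw -d <db> -index).
    commands.append(["gw", "-d", big_db, "-index"])
    return commands


def as_cmdline(cmd: List[str]) -> str:
    """Render a command for display, double-quoting args containing spaces."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in cmd)


if __name__ == "__main__":
    for cmd in merge_commands(
        r"c:\path\bigdatabase",
        {
            "html": r"c:\path\smalldatabase\db1\html.tbl",
            "refs": r"c:\path\smalldatabase\db1\refs.tbl",
        },
    ):
        print(as_cmdline(cmd))
```

Building the commands as data first makes it easy to review the exact sequence (attach, copy, then reindex once at the end) before running any of it against a live database.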

Posted: Wed Feb 13, 2002 10:55 am
by b.sims
Sorry, I just realised this thread is about the 'old' Webinator, whereas I am running the new version. I guess the instructions will be the same, using dowalk/reindex.txt (or remakeindex)?

Posted: Wed Feb 13, 2002 11:15 am
by mark
Right, just run the appropriate "update index" procedure for your version. For version 4 it would be dowalk/reindex.txt. That will drop and remake the index. Since you're inserting into a large, already-indexed dataset, it may be more efficient to update the index rather than remake it. You can update it with something like:
texis -d c:\path\bigdatabase -s "create metamorph inverted index xhtmlbodv on html(Title\Description\Keywords\Meta\Body,Visited)"