Shared databases

barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Shared databases

Post by barry.marcus »

We would like to structure our Texis databases in such a way that one of the tables, which contains essentially static data, could be shared by databases on multiple servers. Because of the huge size of the table, our idea is to place the .tbl file itself on a network-addressable drive and use addtable to add it as a table to each of the databases on the network. In this scheme, only this table would be shared. That is, *none* of the other tables in any of the databases, nor any of the indexes, would be shared; we're thinking that even the indexes for the shared table would be unshared. One requirement we have is that, while the data in the table will never be modified (it contains patent data records), records will periodically be added to it. At those times it will be important for all of the indexes on the table (there are nearly twenty) to be maintained.

I tried setting up a simple scheme involving two servers and the network-addressable drive, to which a mapped network drive letter was assigned. (This is a Windows server network.) I put a .tbl file on the network drive and was able to use addtable to add it to a database on each of the two servers. I was also able to read from the table (via SQL SELECT) from both databases, and I created an inverted metamorph index on the table from each database. So far so good, but there's an issue.

The issue is that when I insert a record into the table from the database on one server, the index for the table updates on that server but not on the other. I tried using chkind on the second server, but that did not seem to work. Of course it was a trivial matter to recreate the index on the second server in this test scenario. But we are *loath* to recreate all of the indexes on the table in every database every time we add data to the production table; eventually, data will be added weekly. As I said, the table is huge, most of the indexes each take many hours to create, and there are nearly twenty of them. The alternative, which is the scheme we are currently using, is for each database to have its own patent data table as well as its own indexes. That seems like a huge waste of disk space, and we'd also like to streamline the (cumbersome) process of loading the weekly data into a single, shared table. Moreover, the shared scheme (assuming the indexes are maintained) seems to eliminate any possible inconsistency in the data from one server to the next.

Is the shared scheme we are envisioning a fantasy? Is there a better way to accomplish what we're looking for? Your guidance and insight would be appreciated.

Thanks.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Shared databases

Post by John »

There are a few options which might work, depending on your exact needs and the types of queries you are doing, as well as whether a brief (few seconds) downtime is acceptable while making the updates available.

It might be worth going into more detail by phone or email to make sure you are on the right path.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Shared databases

Post by mark »

The main thing is that to share the table it must be 100% read-only. The other servers have no idea what changes any other is making; any change on one system will scramble the brains of the others and possibly corrupt the table.

One approach could be to maintain the live updating table and indices in one place. Then periodically freeze updates and copy the table and indices to the shared place for read-only usage. You'd want to copy to an alternate name/location, then rename the files after the copy, to keep the search online during the copy.
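Something along these lines, for example (just a sketch; the paths, database, and index names are placeholders):

rem copy the frozen table and index files to temporary names on the share
copy D:\live\db\PATN.tbl \\server\share\db\PATN.tbl.new
copy D:\live\db\MyIndex.btr \\server\share\db\MyIndex.btr.new
rem ...one copy per index file...
rem then swap into place; only this brief step needs searches paused
move /y \\server\share\db\PATN.tbl.new \\server\share\db\PATN.tbl
move /y \\server\share\db\MyIndex.btr.new \\server\share\db\MyIndex.btr

The copies can take as long as they need; only the renames at the end have to happen while searches are paused.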

If you really need to maintain only one copy of the table you would have to freeze searches during any table or index updates. The indices must be shared as well. And you may have to figure out some way to make the searchers flush their system I/O cache of the shared table and indices before resuming searches.

Alternative scenarios could have only one server doing updates and searches on the big table, with the others hitting it via a network API of some sort, such as the Texis C API, or HTTP and Vortex with XML.
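The Vortex side of that could be as simple as something like this (a rough sketch only; the function name, the query argument, and the XML element names are made up for illustration):

<A NAME=search query>
  <results>
  <SQL "select PATN_WKU from PATN where ABST_PAx like $query">
    <hit>$PATN_WKU</hit>
  </SQL>
  </results>
</A>

The other servers could then request that URL (e.g. with Vortex's <fetch>) and parse the XML instead of opening the table themselves.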
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Shared databases

Post by barry.marcus »

OK. I've set up a test table in my database, the physical file of which resides on a network-addressable drive. The mapped drive letter for that drive is Z:, and in this scenario the file resides at Z:\PTOData\PATN_TEST.tbl. I used the following addtable command to add the table to the database:

addtable -d D:\Crosshairs\database1 Z:\PTOData\PATN_TEST.tbl

That worked fine, and when I issue a SELECT statement against the table from the TSQL prompt I get the correct results. For example, there is no problem with

select count(*) from PATN_TEST

when this is issued at the TSQL prompt or passed to tsql.exe. However, my Vortex app fails to see the table, and I get errors such as:

[Unable to open table z:\PTOData\PATN_TEST in the function opendbtbl]
[No such table: PATN_TEST in the database: D:\Crosshairs\database1\]
[SQLPrepare() failed with -1 in the function prepntexis]

Probably a simple fix, but I'm at a loss.

Thanks for your help.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Shared databases

Post by John »

Typically the user the web server runs as does not have permission to network shares, and may not even know about the drive mapping.

Using addtable is probably not the way to go at all, as you run into the index issues you noted initially. If possible, using a separate database on the network share makes more sense than trying to add the table into a local database.
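For example (a sketch only; the paths are placeholders, and check the exact creatdb/tsql invocations for your install):

creatdb \\server\share\PTOData\shareddb
tsql -d \\server\share\PTOData\shareddb "select count(*) from PATN_TEST;"

Each server then points at that one database for the shared table, rather than splicing the .tbl file into its own local database.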
John Turnbull
Thunderstone Software
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Shared databases

Post by barry.marcus »

Two questions...

1) If the web server does not typically have permission to network shares, why would it have permission to access a database on the network drive, let alone a single table?

2) As this is more or less a legacy application for which we want minimal re-architecting, and which we want to get up and running with network-addressable data as quickly as possible (ideally within the next few hours), can't we give the web server user the proper permissions to "see" the network drive? It seems that in this scenario we can deal with the indexing issues by manually copying the index files to each server following any modification to the shared data. This seems to work (that is, using TSQL), and with the proper procedure, strictly followed, it seems safe enough for us to use.
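For what it's worth, the manual copy we have in mind is along these lines (assuming the index files share the index's base name; paths here are illustrative):

rem after rebuilding the indexes once, push the files to each server's database
copy D:\Crosshairs\database1\MyIndex.* \\server2\Crosshairs\database1\

run once per index, on each server, after the weekly load.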
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Shared databases

Post by John »

Is the app that is able to access the database using a mapped drive (Z:\) or a UNC path (\\server\share\)?
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Shared databases

Post by mark »

Did some testing here.

Use UNC paths, as drive letter mappings may not be present. Ensure that both the share and the filesystem allow access by the web server application pool user.
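For example, with the web server running as a domain account (all names here are placeholders):

rem on the machine hosting the share: NTFS permissions on the folder
icacls D:\PTOData /grant "MYDOMAIN\websvc":(OI)(CI)RX

rem and share-level permissions when the share is created
net share PTOData=D:\PTOData /grant:MYDOMAIN\websvc,READ

Note that the built-in app pool identities are local to the web server machine, so accessing a remote share generally means running the pool as a domain account (or granting the web server's machine account access).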
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Shared databases

Post by barry.marcus »

Here is the latest on this issue... I copied the large table file (PATN.tbl) to the network drive, then updated the record in SYSTABLES for that table to "point" to it. Neither a UNC path with the server name (\\server\share\) nor a mapped drive letter in the value of SYSTABLES.WHAT seemed to work, but a UNC path using the internal IP address and share name DID work. That is, we have SUCCESS when the value of SYSTABLES.WHAT is updated to:

\\192.168.1.11\<mysharename>\<pathtofile>\PATN.tbl
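That is, the update was of this form (the WHERE clause assumes the table's name in SYSTABLES is PATN):

UPDATE SYSTABLES SET WHAT='\\192.168.1.11\<mysharename>\<pathtofile>\PATN.tbl' WHERE NAME='PATN';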

The Vortex code DOES work properly in this case, and recognizes the table. Moreover, when we put the same path in the SYSTABLES record in the databases on the other servers, the Vortex code on those servers runs properly. Everyone seems to be sharing very nicely, and we are very pleased with that! (Thank you John and Mark for your help on this!)

I now have a general question about indexes and the underlying data for them. I ran a few rough benchmark tests to see if moving the data from a local drive to a network drive would have any effect on the speed of our searches. (Bear in mind that the indexes for this table have not moved; they are, and will remain, on the local server.) I found that when the .tbl file for the table resides on the network drive, our searches, which strictly involve indexed fields only, take roughly 25% longer than when the .tbl file resides locally (an entire set of searches took 43 minutes with the data on the network drive versus 35 minutes with it local to the server). This confuses me. I was under the impression that once a field of a table is indexed, searches of that field use the index and not the table, so I'm wondering why the location of the .tbl file would make a difference. Just FYI, here are the commands we used to build a typical index of ours:

DROP INDEX MyIndex;
set delexp=0;
set addexp = '\alnum{1,99}';
set keepnoise=1;
set indexspace='D:\Crosshairs\index1';
set indexmeter=1;
create METAMORPH INVERTED index MyIndex on PATN(ABST_PAx,PATN_ISD);

The searches are on the field ABST_PAx, with the results sorted by PATN_ISD. That is, they are all of the form:

SELECT PATN_WKU FROM PATN WHERE ABST_PAx LIKE '<our search>' ORDER BY PATN_ISD;

My initial guess is that the slowdown is attributable to the need to retrieve the value of PATN_WKU, which might take longer if the data is remote. But so few records hit per search that it's hard to imagine those few retrievals taking that much longer from the network drive than from the local drive.

We can live with the slowdown given the advantage of having shared data. But if there is something we are doing incorrectly vis-à-vis our indexing, or anything we can do to get the same speed as before, that would be great.

Thanks for your continuing help.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Shared databases

Post by John »

That looks correct. I'm not sure how many results you actually have to look up, but for every record it reads from the table it needs to use the network, so network latency is added on top of the typical disk-access latency. It is also likely that there is less benefit from caching, as the client and server need to communicate to make sure the file hasn't changed.

The one thing that would help most is putting any large fields in BLOBs, so that they don't need to be read from disk when accessing PATN_WKU.
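For example, something like this (a sketch only; the column list and types are guesses from this thread, and the indexes would then be rebuilt against the new table):

create table PATN2(PATN_WKU varchar(20), PATN_ISD date, ABST_PAx blob);
insert into PATN2 select PATN_WKU, PATN_ISD, ABST_PAx from PATN;

With ABST_PAx stored as a blob, fetching PATN_WKU for each hit reads only the small fixed-size row, not the large abstract text.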
John Turnbull
Thunderstone Software