Indexes on large data

Post Reply
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Indexes on large data

Post by skalyanaraman »

hi, Thanks for the help with the indexing problem. It seems to be working now(I will wait to see what happens). But, looking at the amount of data and the maintanence of it we are are trying to find out other solutions here to alleviate the time it takes to index. It takes 24+hrs to index the 41.9GB blb + 2.3GB TBl file.

We were exploring other ideas and we had the following questions:

1. Could we somehow, run multiple indexes from different machine looking at the same database? would there be a locking or contention on the SYSINDEX or other sys tables?

2. can we use the indexspace set to create index on other drives and ultimately copy the index to the database and change the sysindex somehow?

3. I know that Thunderstone is working on a solution where the database could be partitioned across multiple machines physically but logically looks like a single database. What is the status on this?

4. Could we ourselves split the big table into smaller pieces and index separately and somehow merge them all to become the index for the whole table?

5. How do other Texis users get around this problem of long index times on large data?

Any help or pointers would be helpful!!

Thanks
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Indexes on large data

Post by John »

1. If you have multiple CPUs you could create several indexes at the same time on the same machine. Using several machines to index the same table or the same database is not supported.

2. Yes, you could. In addition you can specify directories to be used for temporary files, which are likely to be much of the IO.

3. That is still under development.

4. No.

5. The key is to not reindex the whole dataset frequently, but just do incremental index updates when data changes.
John Turnbull
Thunderstone Software
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Indexes on large data

Post by skalyanaraman »

Thanks.
1. So, for multi CPU box we do not need any special code right. just run multiple indexing commands at the same time? or we can somehow pin the indexes to a particular CPU.

2. In this case, we can just index the data on separate machines and copy to the main db later and just run an update statement on the SYSINDEX? What fields should we be changing on the SYSINDEX table? any other table need to be changed? How do I set the temp file directories? why would that improve performance?

5. But, if a table already has the indexes, the insert into that table seems slow as the indexes are being touched for every insert or delete right? I think we can set the index not to be touched during this processes,can we? That is why we drop the index, insert the data and reindex the whole thing. is it a bad idea?

thanks
User avatar
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Indexes on large data

Post by Kai »

When creating multiple indexes simultaneously be sure to reduce indexmem for each index so the machine doesn't thrash. The default is 40%; the sum total of indexmem for all simultaneous indexes shouldn't exceed, say 50-60% to leave room for disk caching etc. (Reducing indexmem does not generally have any adverse affect on creation speed.)
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Indexes on large data

Post by mark »

2. there's no need to move anything around. specify "set indexspace='somedir'" before creating an index and that index will be created there and texis will always expect it to be there. having indices and temp files in various disks can improve performance by reducing disk head seek times.

5. partial indexing is generally much faster than full indexing if you're not updating most of the database. the exact value of "most" will depend on your setup.
User avatar
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH
Contact:

Indexes on large data

Post by John »

If you have copies of the database on several machines you could create different indexes on different machines. You would need to copy the files, and add the entries to SYSINDEX. The FNAME field contains the filename.

set addindextmp is used to add an additional temporary directory. Balancing the IO across disks can improve performance.

Inserts and updates will be slower if the table already has indexes. The metamorph indexes are designed to do a minimum amount of work during the updates, and then allow a separate process to complete the work. You can not prevent the index from being touch during updates, as that would produce invalid indexes. It's probably worth comparing the time to insert the data and update the index to dropping the index, inserting the data, and rebuilding the index.
John Turnbull
Thunderstone Software
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Indexes on large data

Post by skalyanaraman »

After reading the above answers and relating it to what we do, I still have these questions:

1. To run simultaneous indexes on the same machine(with multiprocessors), except for indexmem's, do we have to use any special set commands to force the index to use different processors?

2. if I set up the temp directory to be on different set of drives, maybe a local set of drives on the box, how will it improve performance? Such a broad question, but still curious.

Thanks!!
User avatar
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Indexes on large data

Post by mark »

1. No. It's up to the OS to distribute the processing load amongst the processors.

2. it can read/write tmp files without causing head motion on the drives with the table or resulting index. ideally the table would be on drive, the temp on another, and the index on a third.
Post Reply