Indexes on large data

skalyanaraman · Post by **skalyanaraman** » Tue Jul 16, 2002 2:41 pm

hi, Thanks for the help with the indexing problem. It seems to be working now(I will wait to see what happens). But, looking at the amount of data and the maintanence of it we are are trying to find out other solutions here to alleviate the time it takes to index. It takes 24+hrs to index the 41.9GB blb + 2.3GB TBl file.

We were exploring other ideas and we had the following questions:

1. Could we somehow, run multiple indexes from different machine looking at the same database? would there be a locking or contention on the SYSINDEX or other sys tables?

2. can we use the indexspace set to create index on other drives and ultimately copy the index to the database and change the sysindex somehow?

3. I know that Thunderstone is working on a solution where the database could be partitioned across multiple machines physically but logically looks like a single database. What is the status on this?

4. Could we ourselves split the big table into smaller pieces and index separately and somehow merge them all to become the index for the whole table?

5. How do other Texis users get around this problem of long index times on large data?

Any help or pointers would be helpful!!

Thanks

Post by **John** » Tue Jul 16, 2002 2:52 pm

1. If you have multiple CPUs you could create several indexes at the same time on the same machine. Using several machines to index the same table or the same database is not supported.

2. Yes, you could. In addition you can specify directories to be used for temporary files, which are likely to be much of the IO.

3. That is still under development.

4. No.

5. The key is to not reindex the whole dataset frequently, but just do incremental index updates when data changes.

skalyanaraman · Post by **skalyanaraman** » Tue Jul 16, 2002 2:59 pm

Thanks.
1. So, for multi CPU box we do not need any special code right. just run multiple indexing commands at the same time? or we can somehow pin the indexes to a particular CPU.

2. In this case, we can just index the data on separate machines and copy to the main db later and just run an update statement on the SYSINDEX? What fields should we be changing on the SYSINDEX table? any other table need to be changed? How do I set the temp file directories? why would that improve performance?

5. But, if a table already has the indexes, the insert into that table seems slow as the indexes are being touched for every insert or delete right? I think we can set the index not to be touched during this processes,can we? That is why we drop the index, insert the data and reindex the whole thing. is it a bad idea?

thanks

Post by **Kai** » Wed Jul 17, 2002 10:20 am

When creating multiple indexes simultaneously be sure to reduce indexmem for each index so the machine doesn't thrash. The default is 40%; the sum total of indexmem for all simultaneous indexes shouldn't exceed, say 50-60% to leave room for disk caching etc. (Reducing indexmem does not generally have any adverse affect on creation speed.)

Post by **mark** » Wed Jul 17, 2002 10:55 am

2. there's no need to move anything around. specify "set indexspace='somedir'" before creating an index and that index will be created there and texis will always expect it to be there. having indices and temp files in various disks can improve performance by reducing disk head seek times.

5. partial indexing is generally much faster than full indexing if you're not updating most of the database. the exact value of "most" will depend on your setup.

Post by **John** » Wed Jul 17, 2002 12:07 pm

If you have copies of the database on several machines you could create different indexes on different machines. You would need to copy the files, and add the entries to SYSINDEX. The FNAME field contains the filename.

set addindextmp is used to add an additional temporary directory. Balancing the IO across disks can improve performance.

Inserts and updates will be slower if the table already has indexes. The metamorph indexes are designed to do a minimum amount of work during the updates, and then allow a separate process to complete the work. You can not prevent the index from being touch during updates, as that would produce invalid indexes. It's probably worth comparing the time to insert the data and update the index to dropping the index, inserting the data, and rebuilding the index.

skalyanaraman · Post by **skalyanaraman** » Thu Jul 18, 2002 2:42 pm

After reading the above answers and relating it to what we do, I still have these questions:

1. To run simultaneous indexes on the same machine(with multiprocessors), except for indexmem's, do we have to use any special set commands to force the index to use different processors?

2. if I set up the temp directory to be on different set of drives, maybe a local set of drives on the box, how will it improve performance? Such a broad question, but still curious.

Thanks!!

Post by **mark** » Thu Jul 18, 2002 3:10 pm

1. No. It's up to the OS to distribute the processing load amongst the processors.

2. it can read/write tmp files without causing head motion on the drives with the table or resulting index. ideally the table would be on drive, the temp on another, and the index on a third.