TIMPORT - Just show me how

Post by **barry.marcus** » Mon Jan 29, 2007 10:07 pm

OK. I can't seem to figure out how to set up a TIMPORT schema file for my data, nor can I determine from the documentation how my data should be delimited in the data files that TIMPORT will read. But since I can be very flexible in how the data is to appear in the data files, I'm just going to describe my table and my situation and let someone guide me (or better yet, just show me).

I've got a table of 90 varchar columns. Only 20 are used, but the unused columns must remain in the table for purposes not relevant here. (That is, I cannot redefine the table so that it does not include the unused columns. They will just have to remain empty.) All of the columns that are used contain data that is arbitrarily long. The data can be in excess of 100K bytes per column per record. (FYI, each record of the table contains the contents of a single text document, where each column contains a section of that document.) The data has double quotes, single quotes, linefeeds, URLs, etc., which must *all* be preserved. I *cannot* alter the data in any way (e.g., by escaping or doubling all of the double quotes). There is simply too much data to do that (potentially in excess of 100 GB!).

For the sake of this example, assume that the names of the columns in the table are MYFLD1, MYFLD2, MYFLD3, ..., MYFLD89, and MYFLD90. Also assume that the table is named MYTABLE.

One requirement: The data files *must* contain multiple records. There will typically be 300-500 records per file, so the data files will be fairly large.

What should my schema file look like? What should my delimiters look like? (It would be nice if the field and record delimiters were on lines by themselves, since that makes the data files easier to read using a simple text editor. But that is not a requirement.) What should the layout of the data files look like? As I said, I can use any type of schema file, and I can lay out the data and its delimiters any way that will work. Really, the only requirement is that the data between the delimiters must look exactly like it does in the section of the document from which it was retrieved. I just need something that *will* work.

In my mind the layout of the data files should be as simple as:
------------------------
MYFLD1DELIMITER:
...arbitrarily
long
and spaced MYFLD1 data with "quoted text"...

MYFLD15DELIMITER:
...arbitrarily long and
spaced
MYFLD15 data with more "quoted text"...

MYFLD35DELIMITER:
..arbitrarily

long

and spaced MYFLD35

data...
-----------------

And so on, repeating for record after record. (Notice how the data can and will span multiple lines per column per record.)

I just can't figure it out. Any help would be appreciated.

Barry

Post by **barry.marcus** » Tue Jan 30, 2007 9:47 am

Thanks in advance.

Barry

Post by **John** » Tue Jan 30, 2007 10:01 am

Are there any characters or strings you know you can use as delimiters because they won't occur in the text? The general problem with importing multiple records and fields is that you need to be able to identify the end of each column and each record.

You may be better off writing a slightly more custom importer, either in Vortex or in C based on the loader example.

Post by **barry.marcus** » Tue Jan 30, 2007 10:14 am

Can delimiters be any length? If so, then sure... I absolutely can choose delimiters that I'm certain will never occur as part of the regular text. Again, for the sake of the example, let's assume that these delimiter strings are "MYFLD1DELIMITER", "MYFLD2DELIMITER", "MYFLD3DELIMITER", etc., for each field I'm using. How should things be set up in that case?

Thanks
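Before committing to a set of delimiter strings, it may be worth confirming that none of them actually occurs in the text. The following is a minimal Python sketch of such a check, not part of TIMPORT; the function name `find_delimiter_collisions` and the sample strings are hypothetical, and the delimiter names are simply the ones used as examples in this thread.

```python
def find_delimiter_collisions(field_texts, delimiters):
    """Return (delimiter, field_index) pairs where a delimiter occurs inside field text."""
    collisions = []
    for index, text in enumerate(field_texts):
        for delim in delimiters:
            if delim in text:
                collisions.append((delim, index))
    return collisions


# Delimiter names from the example in this thread.
delimiters = ["MYFLD1DELIMITER", "MYFLD2DELIMITER", "MYFLD3DELIMITER"]

# Hypothetical field contents, including quotes and a linefeed.
sample_fields = [
    'Some text with "quoted text"\nand a linefeed',
    "A section that mentions MYFLD2DELIMITER by accident",
]

print(find_delimiter_collisions(sample_fields, delimiters))
```

An empty result means the chosen strings are safe for that batch of text. If the field expressions stop at a shared prefix rather than at a full delimiter name, the prefix itself would need to be checked the same way.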

Post by **John** » Tue Jan 30, 2007 10:48 am

If you create a file as:

MYDELIMITERFLD1
Field 1 data
MYDELIMITERFLD8
Field 8 data
MYDELIMITERRECORD

You can have a recdelim of MYDELIMITERRECORD and then a field definition that looks like:

field MYFLD1 varchar /MYDELIMITERFLD1\P=!MYDELIMITER* ''
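A data file in this layout could be generated with something like the Python sketch below. It is only an illustration: the helper name `write_record` and the dictionary of field numbers to text are assumptions, and the delimiter strings are the ones from the example above. The sketch adds a newline after the field text only when the text does not already end with one; how TIMPORT treats the newlines around the delimiter lines is not stated in this thread.

```python
import io

RECORD_DELIM = "MYDELIMITERRECORD"


def write_record(out, fields):
    """Write one record: a delimiter line, then the raw text, for each used field."""
    for number, text in sorted(fields.items()):
        out.write(f"MYDELIMITERFLD{number}\n")
        out.write(text)  # the field text itself is written unmodified
        if not text.endswith("\n"):
            out.write("\n")  # keep the next delimiter on a line by itself
    out.write(RECORD_DELIM + "\n")


# Usage: only the columns that hold data are written; unused ones are skipped.
buffer = io.StringIO()
write_record(buffer, {1: "Field 1 data", 8: "Field 8 data"})
print(buffer.getvalue(), end="")
```

Passing an open file object instead of the `io.StringIO` buffer would append each record to a data file, so several hundred records can go into one file.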

Post by **barry.marcus** » Tue Jan 30, 2007 11:21 am

I set up the following test files, but something still is not right.

test.data
---------

MYDELIMETERFLD1
john
Here is "data"
This is "additional" data
MYDELIMETERFLD3
smith
MYDELIMITERRECORD
MYDELIMETERFLD1
sue
MYDELIMETERFLD3
adams
More data
MYDELIMITERRECORD
MYDELIMETERFLD1
bill
blah blah
MYDELIMETERFLD3
jones
MYDELIMITERRECORD
MYDELIMETERFLD1
ed
MYDELIMETERFLD3
wilson

test.schema
-----------

database /usr/local/morph3/texis/camp
table TEST
recdelim MYDELIMETERRECORD
createtable true
keepfirst

field FLD1 varchar /MYDELIMETERFLD1\P=!MYDELIMETERFLD1* ''
field FLD2 varchar -
field FLD3 varchar /MYDELIMETERFLD3\P=!MYDELIMETERFLD3* ''
field FLD4 varchar -

My command is this:

timport -s test.schema -v test.data

Post by **mark** » Tue Jan 30, 2007 11:47 am

There are some problems with consistency in the delimiter naming and in the expressions. Try this:

Data
-----------------------
MYDELIMITERFLD1
john
Here is "data"
This is "additional" data
MYDELIMITERFLD3
smith
MYDELIMITERRECORD
MYDELIMITERFLD1
sue
MYDELIMITERFLD3
adams
More data
MYDELIMITERRECORD
MYDELIMITERFLD1
bill
blah blah
MYDELIMITERFLD3
jones
MYDELIMITERRECORD
MYDELIMITERFLD1
ed
MYDELIMITERFLD3
wilson
MYDELIMITERRECORD

Schema
-------------
database /usr/local/morph3/texis/camp
table TEST
recdelim MYDELIMITERRECORD

field FLD1 varchar />>MYDELIMITERFLD1\P=!MYDELIMITER* ''
field FLD2 varchar - ''
field FLD3 varchar />>MYDELIMITERFLD3\P=!MYDELIMITER* ''
field FLD4 varchar - ''
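To preview which rows a data file in this layout should produce, the splitting can be imitated outside TIMPORT. The Python sketch below is not how TIMPORT works internally and does not evaluate the field expressions above; it only mimics the intended result under the assumption that each delimiter sits on a line by itself. The names `split_records` and `SAMPLE` are hypothetical.

```python
import re

RECORD_DELIM = "MYDELIMITERRECORD"
# A field marker is a delimiter line such as "MYDELIMITERFLD3" plus its newline.
FIELD_MARK = re.compile(r"MYDELIMITERFLD(\d+)\n")


def split_records(data):
    """Split delimiter-formatted text into a list of {column: text} dicts."""
    records = []
    for chunk in data.split(RECORD_DELIM):
        if not chunk.strip():
            continue  # skip the empty remainder after the last record delimiter
        # With a capturing group, re.split returns
        # [leading text, number, field text, number, field text, ...].
        parts = FIELD_MARK.split(chunk)
        record = {}
        for i in range(1, len(parts) - 1, 2):
            text = parts[i + 1]
            # Drop only the single newline that precedes the next delimiter line.
            if text.endswith("\n"):
                text = text[:-1]
            record["FLD" + parts[i]] = text
        records.append(record)
    return records


# The first two records of the data shown above.
SAMPLE = (
    'MYDELIMITERFLD1\njohn\nHere is "data"\nThis is "additional" data\n'
    "MYDELIMITERFLD3\nsmith\nMYDELIMITERRECORD\n"
    "MYDELIMITERFLD1\nsue\nMYDELIMITERFLD3\nadams\nMore data\nMYDELIMITERRECORD\n"
)

for record in split_records(SAMPLE):
    print(record)
```

Comparing this preview with what actually lands in the table is one way to spot a field that was cut short or merged with its neighbor.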

Post by **barry.marcus** » Tue Jan 30, 2007 12:27 pm

YES!! THANK YOU!!! This is working fine!

I also see that I had some typos that were messing things up, too. Argh!

Thanks again.

Barry
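Typos like the ones mentioned in this thread (MYDELIMETER versus MYDELIMITER) are easy to miss by eye. A small Python sketch such as the following could flag delimiter-like tokens that do not match the expected names; it is an illustration only, the function name `find_suspect_tokens` is hypothetical, and the `MYDELIM` prefix is an assumption based on the names used here.

```python
import difflib
import re

# Any word that starts like one of the delimiters used in this thread.
TOKEN = re.compile(r"\bMYDELIM\w+")


def find_suspect_tokens(text, expected):
    """Map each delimiter-like token not in `expected` to its closest expected name."""
    suspects = {}
    for token in set(TOKEN.findall(text)):
        if token not in expected:
            close = difflib.get_close_matches(token, expected, n=1)
            suspects[token] = close[0] if close else None
    return suspects


expected = ["MYDELIMITERFLD1", "MYDELIMITERFLD3", "MYDELIMITERRECORD"]

# A fragment with the misspelling from the earlier test file.
data = "MYDELIMETERFLD1\njohn\nMYDELIMETERFLD3\nsmith\nMYDELIMITERRECORD\n"

for token, suggestion in sorted(find_suspect_tokens(data, expected).items()):
    print(f"{token}: did you mean {suggestion}?")
```

Running the same check over both the schema file and the data file would catch a mismatch on either side.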