xml parser

Post by **jlin** » Fri Apr 27, 2001 4:18 pm

Hi,
Can <timport> import and parse XML files?
Thanks.

Post by **mark** » Fri Apr 27, 2001 4:40 pm

Yes, see http://thunderstone.master.com/texis/ma ... 3aa6845821

Post by **jlin** » Mon Apr 30, 2001 10:38 am

Mark,
Does the schema have to list a database or a table in order to parse XML with timport?
I need to parse the XML document to get both the data and the action to take on each record (add, delete, or update). How should I define the schema to do this?
Thanks.

Post by **mark** » Mon Apr 30, 2001 11:06 am

A database and table are not required in Vortex <timport>, since it's up to you what to do with the parsed data. Here's a summary of the discussion linked previously.

Given the data:

<DATASET>
<RECORD>
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>

This is the schema:

xml
trimspace
field text varchar DATASET/RECORD/TEXT ''
field user varchar DATASET/RECORD/USER@name ''
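
To make the path notation concrete, here is a rough Python sketch of how the two field specs above appear to select their values: the part before `@` is read as an element path, the part after `@` as an attribute name, and a spec without `@` yields the element's text. This is only an illustration built on Python's standard `xml.etree.ElementTree`, not how timport is implemented; `resolve` is a hypothetical helper, and the `.strip()` call stands in for `trimspace`.

```python
import xml.etree.ElementTree as ET

# The sample data from the post above.
DATA = """<DATASET>
<RECORD>
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>"""


def resolve(root, spec):
    """Return the values matched by a spec such as 'DATASET/RECORD/USER@name'.

    Hypothetical helper: it only mimics the path notation used in the schema.
    """
    path, _, attr = spec.partition("@")   # split off an optional attribute name
    top, _, rest = path.partition("/")    # first path step must be the root tag
    if root.tag != top:
        return []
    nodes = root.findall(rest) if rest else [root]
    if attr:
        return [node.get(attr, "") for node in nodes]
    return [node.text or "" for node in nodes]


root = ET.fromstring(DATA)
# .strip() plays the role of trimspace for the character field.
text_values = [value.strip() for value in resolve(root, "DATASET/RECORD/TEXT")]
user_values = resolve(root, "DATASET/RECORD/USER@name")
print(text_values, user_values)
```

Each `<RECORD>` in the data would contribute one value per field, which is why the helper returns lists.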

Post by **jlin** » Mon Apr 30, 2001 11:57 am

Mark,
It is working now, thanks! BTW, what does 'trimspace' do? It seems that the script doesn't work without it.

Post by **mark** » Mon Apr 30, 2001 12:47 pm

http://www.thunderstone.com/site/texisman/node320.html

"trimspace indicates that leading and trailing whitespace should be trimmed from character fields."

It should work with or without it. It's just a matter of whether leading and trailing spaces are removed from the text fields or not. Maybe you're getting some extra space that is making you think it's not working.
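
As an illustration of why untrimmed values can look like a failure, the following Python sketch (standard library only, an analogy rather than timport itself) reads the `<TEXT>` element from the earlier sample with and without trimming. The untrimmed value carries the newlines that surround the text in the file, so an exact comparison against the expected string fails even though the parse succeeded.

```python
import xml.etree.ElementTree as ET

# <TEXT> written across several lines, as in the earlier sample data.
record = ET.fromstring("<RECORD><TEXT>\nhere's some text\n</TEXT></RECORD>")

raw = record.findtext("TEXT")   # what a parser sees without trimming
trimmed = raw.strip()           # roughly what trimspace is described as doing

print(repr(raw))
print(repr(trimmed))
print(raw == "here's some text", trimmed == "here's some text")
```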

Post by **jlin** » Tue May 01, 2001 10:51 am

I was using the XML parser yesterday and had some problems.

Here is my data file (output.txt):

<opt venueID="747" cost="$$$" name="55 Wall" description="In a landmark building, bulls and bears consume elegant, eclectic fare. Breakfast, lunch, dinner daily">
<restaurant cuisine="New American" payment="AE, DC, MC, V" />
<address neighborhood="Financial District" crossstreet="(btw. Hanover & William Sts.)" phone="212- 699-5555" street="55 Wall St." />
</opt>

And here is my vortex script:

<SCRIPT LANGUAGE=vortex>
<A NAME=main PUBLIC>
<$sch="
xml
trimspace
field ID varchar opt@venueID ''
field Cost varchar opt@cost ''
field Name varchar opt@name ''
field Description varchar opt@description ''
field Cuisine varchar opt/restaurant@cuisine ''
field Payment varchar opt/restaurant@payment ''
field Neighborhood varchar opt/address@neighborhood ''
field CS varchar opt/address@crossstreet ''
field Phone varchar opt/address@phone ''
field Street varchar opt/address@street ''
">

<read output.txt><$out=$ret>
<timport $sch $out>
<fmt $ID><fmt $Cost><fmt $Name><fmt $Description>
<fmt $Cuisine><fmt $Payment>
<fmt $Neighborhood><fmt $CS><fmt $Phone><fmt $Street>
</timport>
</A>
</SCRIPT>

The odd thing is that it didn't parse the <address> tag (the rest was printed out), but after I renamed the tag to <add>, it worked. Is the <address> tag predefined, or something like that? If so, are there any other predefined names? And is there any limit on the number of nesting levels in the XML parser?

Thanks.

Post by **mark** » Tue May 01, 2001 11:06 am

By default the xml parser attempts to allow HTML as "data" within the XML, so tags that have the same name as standard HTML elements are treated as text instead of tags. You can disable this by adding the "nohtml" option to xml.

xml nohtml
trimspace
...

The current list of recognized HTML tags is:
"A", "ADDRESS", "APP", "APPLET", "AREA", "B", "BASE", "BASEFONT", "BLOCKQUOTE", "BODY", "BR", "CDATA", "CENTER", "CITE", "CODE", "COMMENT", "DD", "DIR", "DL", "DOCTYPE", "DT", "EM", "FONT", "FORM", "FRAME", "FRAMESET", "H1", "H2", "H3", "H4", "H5", "H6", "HEAD", "HR", "HTML", "I", "IMG", "INPUT", "ISINDEX", "KBD", "LI", "LINK", "LISTING", "MAP", "MENU", "META", "NEXTID", "NOBR", "OL", "OPTION", "P", "PARAM", "PLAINTEXT", "PRE", "SAMP", "SCRIPT", "SELECT", "STRIKE", "STRONG", "TABLE", "TD", "TEXTAREA", "TH", "TITLE", "TR", "TT", "UL", "VAR", "WBR", "XMP"

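One way to find out ahead of time whether a data file needs "nohtml" is to scan its element names against the list above. The Python sketch below does that with the standard library; `html_name_collisions` is a hypothetical helper, the tag set is copied from the list in this post, and the case-insensitive comparison is an assumption rather than documented timport behavior.

```python
import xml.etree.ElementTree as ET

# Tag names copied from the list of recognized HTML tags above.
HTML_TAGS = set("""
A ADDRESS APP APPLET AREA B BASE BASEFONT BLOCKQUOTE BODY BR CDATA CENTER
CITE CODE COMMENT DD DIR DL DOCTYPE DT EM FONT FORM FRAME FRAMESET
H1 H2 H3 H4 H5 H6 HEAD HR HTML I IMG INPUT ISINDEX KBD LI LINK LISTING
MAP MENU META NEXTID NOBR OL OPTION P PARAM PLAINTEXT PRE SAMP SCRIPT
SELECT STRIKE STRONG TABLE TD TEXTAREA TH TITLE TR TT UL VAR WBR XMP
""".split())


def html_name_collisions(xml_text):
    """Return the element names in xml_text that match a recognized HTML tag.

    Assumption: names are compared case-insensitively.
    """
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter() if el.tag.upper() in HTML_TAGS})


# A cut-down version of the restaurant record from the earlier post.
sample = """<opt venueID="747">
<restaurant cuisine="New American" />
<address street="55 Wall St." />
</opt>"""

collisions = html_name_collisions(sample)
print(collisions)
```

A non-empty result suggests either renaming those elements or adding "nohtml" to the schema.
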
Post by **jlin** » Wed May 09, 2001 4:23 pm

Hi,
If I have a database with the following schema:

Field  Type
-----  ----
Id     int
user   varchar
text   text

this is the data I want to import in XML format:

<DATASET>
<RECORD id="111">
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>

and this is the XML schema:

xml
trimspace
field id int DATASET/RECORD@id ''
field user varchar DATASET/RECORD/USER@name ''
field text text DATASET/RECORD/TEXT ''

Here $id is declared as int and $text as text rather than varchar, to match the corresponding field types in the database. Can I do that? It didn't seem to work when I tried it. Does the XML schema allow the 'int' and 'text' types? Or does the XML data need to be modified so that the schema reads 'id' in as an integer?
Thanks.

Post by **mark** » Wed May 09, 2001 5:01 pm

There is no "text" type; use "varchar". Also, "user" is a reserved word in SQL, so use something like "User" for the field name instead. We generally suggest starting field names with an uppercase letter to avoid keyword conflicts.

Also, timport will automatically create a field called id of type counter unless you use the "noid" keyword in the schema.