xml parser

jlin
Posts: 27
Joined: Fri Apr 06, 2001 4:03 pm

Post by jlin »

Hi,
Can <timport> import and parse XML files?
Thanks.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »


Post by jlin »

Mark,
Does parsing XML with timport require a database or a table to be listed in the schema?
I need to parse the XML document to get both the data and the action to take on each record (add, delete or update). How should I define the schema to do this?
Thanks.

Post by mark »

A database and table are not required by Vortex <timport>, since it's up to you what to do with the parsed data. Here's a summary of the discussion mentioned previously.

Given the data:

<DATASET>
<RECORD>
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>

This is the schema:

xml
trimspace
field text varchar DATASET/RECORD/TEXT ''
field user varchar DATASET/RECORD/USER@name ''
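
For readers unfamiliar with the path notation in the schema above, here is a rough Python sketch of what the two paths select. It only illustrates the notation (ELEMENT/ELEMENT for element text, @name for an attribute) and is not how timport is implemented; lookup() is a hypothetical helper, and it only looks at the first matching record.

```python
import xml.etree.ElementTree as ET

# The sample data from the post above.
DATA = """<DATASET>
<RECORD>
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>"""


def lookup(root, path):
    """Resolve 'A/B/C' (element text) or 'A/B/C@attr' (attribute value).

    Hypothetical helper that mimics the schema's path notation;
    it follows only the first matching child at each level.
    """
    elem_path, _, attr = path.partition("@")
    parts = elem_path.split("/")
    if parts[0] != root.tag:
        return None
    node = root
    for name in parts[1:]:
        node = node.find(name)
        if node is None:
            return None
    if attr:
        return node.get(attr)
    # strip() stands in for the schema's trimspace keyword
    return (node.text or "").strip()


root = ET.fromstring(DATA)
text = lookup(root, "DATASET/RECORD/TEXT")
user = lookup(root, "DATASET/RECORD/USER@name")
print(text)
print(user)
```
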

Post by jlin »

Mark,
It is working now, thanks! BTW, what does 'trimspace' do? It seems that the script doesn't work without it.

Post by mark »

http://www.thunderstone.com/site/texisman/node320.html

"trimspace indicates that leading and trailing whitespace should be trimmed from character fields."

It should work with or without it. It's just a matter of whether leading and trailing spaces are removed from the text fields or not. Maybe you're getting some extra space that is making you think it's not working.
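
To make the "extra space" point concrete, here is a small Python sketch (an analogy only, not timport itself) showing what the TEXT element from the earlier sample holds before and after trimming. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# The TEXT element from the earlier sample, with its line breaks intact.
element = ET.fromstring("<TEXT>\nhere's some text\n</TEXT>")

raw = element.text        # what you would see without trimming
trimmed = raw.strip()     # roughly what trimspace is described as doing

print(repr(raw))
print(repr(trimmed))
```

Without trimming, the value starts and ends with a newline, which can easily look like a broken import when it is printed or compared.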

Post by jlin »

I was using the XML parser yesterday and had some problems.

Here is my data file (output.txt):

<opt venueID="747" cost="$$$" name="55 Wall" description="In a landmark building, bulls and bears consume elegant, eclectic fare. Breakfast, lunch, dinner daily">
<restaurant cuisine="New American" payment="AE, DC, MC, V" />
<address neighborhood="Financial District" crossstreet="(btw. Hanover & William Sts.)" phone="212- 699-5555" street="55 Wall St." />
</opt>

And here is my vortex script:

<SCRIPT LANGUAGE=vortex>
<A NAME=main PUBLIC>
<$sch="
xml
trimspace
field ID varchar opt@venueID ''
field Cost varchar opt@cost ''
field Name varchar opt@name ''
field Description varchar opt@description ''
field Cuisine varchar opt/restaurant@cuisine ''
field Payment varchar opt/restaurant@payment ''
field Neighborhood varchar opt/address@neighborhood ''
field CS varchar opt/address@crossstreet ''
field Phone varchar opt/address@phone ''
field Street varchar opt/address@street ''
">

<read output.txt><$out=$ret>
<timport $sch $out>
<fmt $ID><fmt $Cost><fmt $Name><fmt $Description>
<fmt $Cuisine><fmt $Payment>
<fmt $Neighborhood><fmt $CS><fmt $Phone><fmt $Street>
</timport>
</A>
</SCRIPT>

The odd thing is that it didn't parse the <address> tag (the rest get printed out). But after I changed the name of the tag to <add>, it worked. Is <address> a predefined tag name (and if so, are there any other predefined names)? Also, is there any limit on the number of nesting levels in the XML parser?

Thanks.

Post by mark »

By default the xml parser attempts to allow HTML as "data" within the XML, so tags that have the same name as standard HTML elements are treated as text instead of tags. You can disable this by adding the "nohtml" option to xml.

xml nohtml
trimspace
...
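
If you'd rather check your own tag names up front, a quick Python sketch like this would do it. The set is copied from the list of recognized HTML tags given in this post; treated_as_html() is a hypothetical helper, and the case-insensitive comparison is an assumption based on the lowercase <address> tag being affected.

```python
# Tag names copied from the list of recognized HTML tags in this post.
HTML_TAGS = set("""
A ADDRESS APP APPLET AREA B BASE BASEFONT BLOCKQUOTE BODY BR CDATA CENTER
CITE CODE COMMENT DD DIR DL DOCTYPE DT EM FONT FORM FRAME FRAMESET
H1 H2 H3 H4 H5 H6 HEAD HR HTML I IMG INPUT ISINDEX KBD LI LINK LISTING
MAP MENU META NEXTID NOBR OL OPTION P PARAM PLAINTEXT PRE SAMP SCRIPT
SELECT STRIKE STRONG TABLE TD TEXTAREA TH TITLE TR TT UL VAR WBR XMP
""".split())


def treated_as_html(tag):
    """Return True if the tag name collides with a recognized HTML tag.

    Assumption: the comparison ignores case.
    """
    return tag.upper() in HTML_TAGS


# Tag names from the restaurant data, plus the renamed <add> tag.
for tag in ("opt", "restaurant", "address", "add"):
    print(tag, treated_as_html(tag))
```

Only "address" collides, which matches the behaviour reported above: renaming the tag to <add> avoids the clash, and "nohtml" removes the need to rename at all.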

The current list of recognized HTML tags is
"A", "ADDRESS", "APP", "APPLET", "AREA", "B", "BASE", "BASEFONT", "BLOCKQUOTE", "BODY", "BR", "CDATA", "CENTER", "CITE", "CODE", "COMMENT", "DD", "DIR", "DL", "DOCTYPE", "DT", "EM", "FONT", "FORM", "FRAME", "FRAMESET", "H1", "H2", "H3", "H4", "H5", "H6", "HEAD", "HR", "HTML", "I", "IMG", "INPUT", "ISINDEX", "KBD", "LI", "LINK", "LISTING", "MAP", "MENU", "META", "NEXTID", "NOBR", "OL", "OPTION", "P", "PARAM", "PLAINTEXT", "PRE", "SAMP", "SCRIPT", "SELECT", "STRIKE", "STRONG", "TABLE", "TD", "TEXTAREA", "TH", "TITLE", "TR", "TT", "UL", "VAR", "WBR", "XMP"

Post by jlin »

Hi,
If I have a database with the following schema:

Field  Type
-----  ----
Id     int
user   varchar
text   text

this is the data I want to import in XML format:

<DATASET>
<RECORD id="111">
<TEXT>
here's some text
</TEXT>
<USER name='John Smith'>
</USER>
</RECORD>
</DATASET>

and this is the XML schema:

xml
trimspace
field id int DATASET/RECORD@id ''
field user varchar DATASET/RECORD/USER@name ''
field text text DATASET/RECORD/TEXT ''

Here $id is an int and $text is of type text rather than varchar, to match the corresponding field types in the database. Can I do that? It didn't seem to work when I tried it. Does the XML schema allow the 'int' and 'text' types? Or does the XML data need to be modified for the schema to read 'id' in as an integer?
Thanks.

Post by mark »

There is no "text" type; use "varchar". Also, "user" is a reserved word in SQL, so use something like "User" for the field name instead. We generally suggest starting field names with an uppercase letter to avoid keyword conflicts.

Also, timport will automatically create a field called id of type counter unless you use the "noid" keyword in the schema.