XML character problem

Posted: Thu Jun 13, 2002 5:40 pm
by skalyanaraman
Hi,
We are moving along fine with XML in timport, but hit a snag where sandr could not do a proper search and replace. Here is what I have:
<xml>
<rs:data> <z:row DOCTEXT='{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\'20\par
}
'/>
</rs:data>
</xml>

But when I print the text before and after the sandr (the sandr is <sandr "<=/?rs:data>" "" $data>), I get:

before,
<xml>
<rs:data>
<z:row DOCID='MSG000867433_RTF' DOCTEXT='{\rtf1\ansi\ansicpg1252\deff
0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\&#x27;20\par
}
&#0;'/>
</rs:data>
</xml>

and after,

<xml>

<z:row DOCID='MSG000867433_RTF' DOCTEXT='{\rtf1\ansi\ansicpg1252\deff
0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\&#x27;20\par
}


As you can see, there is a &#0; character in the first printout of the data. What is this? It looks like the sandr is breaking because of it.

Any help would be appreciated.
thanks!!

Posted: Fri Jun 14, 2002 12:07 pm
by Kai
There are nul bytes in your source data apparently; these will need to be removed first because <sandr> does not handle nuls (it drops everything at and after the first nul, because it handles arguments as C strings).
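
To make the C-string point concrete, here is a minimal sketch in Python rather than Vortex. It only emulates how a NUL-terminated string reader sees a buffer; the helper name `as_c_string` and the sample bytes are made up for illustration, and nothing here is taken from the <sandr> implementation itself.

```python
import ctypes

def as_c_string(buf: bytes) -> bytes:
    """Return the part of buf that a NUL-terminated (C string) reader would see."""
    # .raw would hold every byte; .value stops at the first NUL byte.
    return ctypes.create_string_buffer(buf).value

# Hypothetical sample shaped like the data in the first post.
data = b"<rs:data>}\n\x00'/>\n</rs:data>"
seen = as_c_string(data)  # everything at and after the NUL is gone
```

Anything after the first nul, including the closing tag, never reaches code that reads the buffer this way.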

One way is with <split>, which like <rex> can handle nuls:

<split "\x00+" $data><sum "%s" "" $data><$data = $ret>

Do this before the <sandr>.
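
The same idea, sketched in Python for readers who do not have Vortex at hand: split on runs of nul characters, then join the pieces back into one string. The function name `strip_nuls` is an assumption for this example, and Python's `re.split` stands in for <split>; it is not a claim about how <split> or <sum> work internally.

```python
import re

def strip_nuls(data: str) -> str:
    """Remove nul characters by splitting on runs of them and rejoining."""
    chunks = re.split(r"\x00+", data)  # list of the pieces between nul runs
    return "".join(chunks)             # one string, no nuls

cleaned = strip_nuls("}\n\x00'/>\n</rs:data>")
```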

Posted: Mon Jun 17, 2002 2:56 pm
by skalyanaraman
Thanks Kai. That worked!!
Now, one more efficiency question. The split actually returns a list, so I have to do a
<split > ....</split>
in a loop and sum up the chunks of data. The sum seems to be a drag inside a split loop when the list has a lot of items. Is there an efficient way to take out the nulls and end up with a single string before the sandr?

For example,
in one case the split found 13000 nulls, and when I do a sum inside the split loop it takes a long time and gives the impression it is not going to finish.

thanks!!

Posted: Mon Jun 17, 2002 3:48 pm
by mark
The <split> command given is not a looping op, so no </split> is needed. Yes, it returns a list, but <sum> will sum all items of the list, so there is no need to do it inside a loop. Use the code Kai gave you above.
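
As a rough illustration of the difference between the two approaches, here is a Python sketch (not Vortex). The function names are hypothetical, and the 13000 figure is borrowed from the earlier post only to size the sample. Both functions give the same string; the point is that building it piece by piece may recopy the growing result on each step, while a single join over the whole list does the work in one pass.

```python
def join_in_loop(chunks):
    """Accumulate one chunk at a time, as in a loop with a running sum."""
    out = ""
    for chunk in chunks:
        out = out + chunk  # may copy the whole accumulated string each time
    return out

def join_once(chunks):
    """Concatenate the whole list in a single operation."""
    return "".join(chunks)

# Sample data: 13000 nuls separating short pieces of text.
chunks = ("ab\x00" * 13000).split("\x00")
```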

Posted: Mon Jun 17, 2002 5:52 pm
by skalyanaraman
Hmm.. The exact syntax Kai gave did not work. I had to put it in a split loop with a cumulative sum.

Do I have to do,
<split "\x00+" $data><sum "%s" "" $ret><$data = $ret>

instead of
<split "\x00+" $data><sum "%s" "" $data><$data = $ret>

(use $ret instead of $data in the sum?)

Lemme know.
thanks!!

Posted: Mon Jun 17, 2002 6:00 pm
by mark
Yes, sorry. Problem of posting untried code.