XML character problem

skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Post by skalyanaraman »

Hi,
We are moving along fine with XML in timport, but we hit a snag where <sandr> could not do a proper search and replace. Here is what I have:
<xml>
<rs:data> <z:row DOCTEXT='{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\'20\par
}
'/>
</rs:data>
</xml>

But when I print out the text before and after the <sandr> (the call is <sandr "<=/?rs:data>" "" $data>), I get the following.

Before:
<xml>
<rs:data>
<z:row DOCID='MSG000867433_RTF' DOCTEXT='{\rtf1\ansi\ansicpg1252\deff
0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\&#x27;20\par
}
&#0;'/>
</rs:data>
</xml>

And after:

<xml>

<z:row DOCID='MSG000867433_RTF' DOCTEXT='{\rtf1\ansi\ansicpg1252\deff
0\deflang1033{\fonttbl {\f0\fswiss\fcharset0 Arial;}}
\uc1\pard\ulnone\f0\fs20\objattph\&#x27;20\par
}


As you can see, there is a &#0; character in the first printout of the data. What is this? It looks like the output is being cut off because of it.

Any help would be appreciated.
thanks!!
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

There are apparently nul bytes in your source data. These need to be removed first, because <sandr> does not handle nuls: it treats its arguments as C strings, so it drops everything at and after the first nul.

One way is with <split>, which like <rex> can handle nuls:

<split "\x00+" $data><sum "%s" "" $data><$data = $ret>

Do this before the <sandr>.
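
To see why the nuls have to go before the <sandr>, here is a small sketch of the C-string behavior described above. It is Python, not Vortex, and it only mimics the effect: `as_c_string` is a hypothetical helper and the sample data is made up.

```python
def as_c_string(text: str) -> str:
    """Mimic a C-string consumer: everything from the first nul on is lost."""
    return text.split("\x00", 1)[0]

# Made-up sample with a nul just before the closing quote, as in the printout.
sample = "<rs:data>\n<z:row DOCTEXT='abc\x00'/>\n</rs:data>"

# What a C-string based function would see with the nul still in place.
seen_with_nul = as_c_string(sample)

# What it would see once the nul has been removed beforehand.
seen_without_nul = as_c_string(sample.replace("\x00", ""))

print(repr(seen_with_nul))
print(repr(seen_without_nul))
```

With the nul in place the closing quote and the closing tag never reach the consumer, which matches the truncated "after" printout in the first post.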
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Post by skalyanaraman »

Thanks Kai. That worked!!
Now, one more efficiency question. The split actually gives a list, so I have to do a
<split > ....</split>
loop and sum the chunks of data up. The sum seems to be a drag inside a split loop if there are a lot of items in the list. Is there an efficient way to take out the nuls and come up with a single string before the <sandr>?

For example, in one case the split found 13000 nuls, and when I do a sum inside the split loop it takes so long that it looks like it is never going to finish.

thanks!!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The <split> command as given is not a looping operation, so no </split> is needed. But yes, it returns a list. <sum> will sum all items of the list, so there is no need to do it inside a loop. Use the code Kai gave you above.
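
The same idea, one split call that returns a list followed by one call that concatenates the whole list, can be sketched outside Vortex. This is a Python analogy only, not Vortex code; the data is made up, with the count of 13000 nuls taken from the case mentioned above.

```python
import re

# Made-up data: 13001 short chunks separated by single nuls (13000 nuls).
data = "\x00".join(f"chunk{i};" for i in range(13001))

# One split call returns the whole list of pieces...
pieces = re.split(r"\x00+", data)

# ...and one join call concatenates them; no per-item loop is needed.
cleaned = "".join(pieces)

print(len(pieces), "\x00" in cleaned)
```

Concatenating the list in a single call avoids rebuilding the accumulated string once per item, which is the likely reason a per-item loop feels slow on long lists.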
skalyanaraman
Posts: 109
Joined: Tue May 29, 2001 9:13 pm

Post by skalyanaraman »

Hmm. The exact syntax Kai gave did not work. I had to put it in a split loop with a cumulative sum.

Do I have to use
<split "\x00+" $data><sum "%s" "" $ret><$data = $ret>

instead of
<split "\x00+" $data><sum "%s" "" $data><$data = $ret>

(that is, use $ret instead of $data in the <sum>?)

Lemme know.
thanks!!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Yes, sorry. That is the problem with posting untested code.