Encoding Type used to store input

Posted: Wed Oct 11, 2006 12:32 pm
by mark
No. contenttypeparam is the type specified in the downloaded page; charsettxt is the encoding of the extracted text. The two can differ.

Posted: Thu Feb 08, 2007 1:17 pm
by aitchon
If I detect that <urltext> produces invalid UTF-8 data, is there a way to replace all invalid UTF-8 characters with a valid character like a space?

Posted: Thu Feb 08, 2007 2:51 pm
by mark
If you can detect them you can replace them. See <sandr>.
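
For readers working outside Vortex, the same replace-on-detect idea can be sketched in Python. This is only an illustration under assumptions, not the <sandr> approach itself: the function name replace_invalid_utf8 is hypothetical, and it relies on the decoder substituting U+FFFD for each undecodable sequence, so a legitimate U+FFFD already present in the input would also become a space.

```python
def replace_invalid_utf8(data: bytes, replacement: str = " ") -> str:
    """Decode bytes as UTF-8, swapping undecodable sequences for `replacement`.

    Hypothetical helper for illustration; not part of Texis/Vortex.
    """
    # errors="replace" makes the decoder emit U+FFFD where decoding fails.
    text = data.decode("utf-8", errors="replace")
    # Turn each U+FFFD marker into the chosen replacement character.
    return text.replace("\ufffd", replacement)


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    print(replace_invalid_utf8(b"abc\xffdef"))
```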

Posted: Thu Feb 08, 2007 4:10 pm
by aitchon
I used the procedure in message #11 to detect invalid UTF-8 characters, but I found some text produced by urltext that contained the following escape sequence, which was not detected:

&#3;

Should I just replace anything that starts with &# and ends with ;?

Posted: Thu Feb 08, 2007 4:51 pm
by mark
Sounds reasonable.
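
A minimal Python sketch of that replacement, assuming the text is already a decoded string. The name strip_numeric_refs and the exact pattern are illustrative choices, not anything from Vortex. The pattern only matches well-formed decimal or hexadecimal numeric references, so a stray "&#" followed much later by an unrelated ";" is left alone. Note that it also removes references to ordinary characters such as &#233;, which may or may not be wanted.

```python
import re

# Decimal (&#3;) or hexadecimal (&#x03;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def strip_numeric_refs(text: str, replacement: str = " ") -> str:
    """Replace every numeric character reference with `replacement`.

    Hypothetical helper for illustration; not part of Texis/Vortex.
    """
    return NUMERIC_REF.sub(replacement, text)


if __name__ == "__main__":
    print(strip_numeric_refs("before&#3;after"))
```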