Encoding Type used to store input
Posted: Wed Oct 11, 2006 12:32 pm
by mark
No. contenttypeparam is the charset specified in the downloaded page. charsettxt is the charset of the extracted text. The two can differ.
Encoding Type used to store input
Posted: Thu Feb 08, 2007 1:17 pm
by aitchon
If I detect that <urltext> produces invalid UTF-8 data, is there a way to replace all invalid UTF-8 characters with a valid character like a space?
Encoding Type used to store input
Posted: Thu Feb 08, 2007 2:51 pm
by mark
If you can detect them you can replace them. See <sandr>.
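For readers working outside Vortex, the same idea (find the bad bytes, substitute a space) can be sketched in Python. This is only an illustration and not the <sandr> approach mentioned above; the handler name "space" and the function replace_invalid_utf8 are made up for this example.

```python
import codecs


def _space_on_error(err):
    # Hypothetical handler: swap each undecodable byte run for one space
    # and resume decoding right after it.
    if isinstance(err, UnicodeDecodeError):
        return (" ", err.end)
    raise err


# "space" is an arbitrary name chosen for this sketch.
codecs.register_error("space", _space_on_error)


def replace_invalid_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, turning each invalid sequence into a space."""
    return data.decode("utf-8", errors="space")


if __name__ == "__main__":
    print(replace_invalid_utf8(b"abc\xffdef"))
```

Python's built-in errors="replace" does the same job but inserts U+FFFD instead of a space, which may be preferable if you want to see later where the damage was.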
Encoding Type used to store input
Posted: Thu Feb 08, 2007 4:10 pm
by aitchon
I used the procedure in message #11 to detect invalid UTF-8 characters, but I found some text produced by <urltext> that used the following escape sequence and wasn't detected:

Should I just replace anything that starts with &# and ends with ;?
Encoding Type used to store input
Posted: Thu Feb 08, 2007 4:51 pm
by mark
Sounds reasonable.
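As a rough illustration of that replacement, here is a Python sketch (not Vortex, and not taken from the posts above). It assumes the sequences in question are numeric character references, i.e. "&#" followed by decimal digits or by "x" and hex digits, then ";". The names NUMERIC_REF and strip_numeric_refs are hypothetical.

```python
import re

# Matches decimal (&#1234;) and hexadecimal (&#x1F;) numeric references only.
# This is deliberately narrower than "anything between &# and ;", so an
# unrelated "&#" cannot swallow text up to some later semicolon.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def strip_numeric_refs(text: str, replacement: str = " ") -> str:
    """Replace every numeric character reference with `replacement`."""
    return NUMERIC_REF.sub(replacement, text)


if __name__ == "__main__":
    print(strip_numeric_refs("before&#65533;after"))
```

Named entities such as &amp; are left alone, since they do not start with "&#".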