Encoding Type used to store input

Post by mark »

No. contenttypeparam is the type specified in the downloaded page; charsettxt is the charset of the extracted text. The two can differ.

Post by aitchon »

If I detect that <urltext> produces invalid UTF-8 data, is there a way to replace all invalid UTF-8 characters with a valid character like a space?

Post by mark »

If you can detect them you can replace them. See <sandr>.
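
For readers outside Vortex, the same detect-and-replace idea can be sketched in Python. This is only an illustration under stated assumptions, not the <sandr> approach itself: the handler name "space" and the function scrub_utf8 are hypothetical names introduced here, and the input is assumed to be the raw bytes of the extracted text.

```python
import codecs


def _space_on_error(err):
    # Called by the decoder for each undecodable byte run.
    # Substitute one space and resume decoding right after the bad bytes.
    if isinstance(err, UnicodeDecodeError):
        return (" ", err.end)
    raise err


# "space" is a hypothetical handler name chosen for this sketch.
codecs.register_error("space", _space_on_error)


def scrub_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, replacing each invalid byte run with a space."""
    return raw.decode("utf-8", errors="space")
```

Valid UTF-8 passes through unchanged; only the bytes the decoder rejects are replaced, so legitimate non-ASCII text is preserved.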

Post by aitchon »

I used the procedure in message #11 to detect invalid UTF-8 characters, but I found some text produced by urltext that contains the following escape sequence, which was not detected:

&#3;

Should I just replace anything that starts with &# and ends with ;?

Post by mark »

Sounds reasonable.
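
As a rough illustration of that replacement, here is a Python sketch (not Vortex; the names NUMERIC_REF and strip_numeric_refs are hypothetical). It assumes "starts with &# and ends with ;" means a decimal or hexadecimal numeric character reference, and matches only those rather than any text between &# and the next semicolon.

```python
import re

# Matches decimal (&#3;) and hexadecimal (&#x1F;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def strip_numeric_refs(text: str, replacement: str = " ") -> str:
    """Replace every numeric character reference in text with replacement."""
    return NUMERIC_REF.sub(replacement, text)
```

Note that this also removes references to ordinary printable characters (for example &#233;), so it is only appropriate when losing those is acceptable. Named entities such as &amp; are left alone.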