No. contenttypeparam is the character set declared by the downloaded page. charsettxt is the character set of the extracted text. The two can differ.
Encoding Type used to store input
If I detect that <urltext> produces invalid UTF-8 data, is there a way to replace all invalid UTF-8 characters with a valid character like a space?
If you can detect them you can replace them. See <sandr>.
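For readers who want to see the idea outside Vortex, here is a minimal sketch in Python (not <sandr>, and not the procedure from the earlier message). It assumes the text is available as raw bytes; the handler name "space" and the function name replace_invalid_utf8 are made up for this illustration. Each byte sequence that the UTF-8 decoder rejects is replaced with a single space.

```python
import codecs


def _space_handler(err):
    """Error handler: substitute one space for the bytes that failed to decode."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # Return the replacement text and the position at which decoding resumes.
    return (" ", err.end)


# "space" is a hypothetical handler name chosen for this sketch.
codecs.register_error("space", _space_handler)


def replace_invalid_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, turning every invalid sequence into a space."""
    return raw.decode("utf-8", errors="space")


if __name__ == "__main__":
    print(replace_invalid_utf8(b"abc\xffdef"))
```

Valid multi-byte characters pass through unchanged; only the sequences the decoder reports as errors are touched.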
I used the procedure in message #11 to detect invalid UTF-8 characters, but I found some text produced by urltext that used the following escape sequence and was not detected:

Should I just replace anything that starts with &# and ends with ;?
Sounds reasonable.
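As a rough illustration of that replacement, here is a sketch in Python rather than Vortex. It assumes the sequences in question are HTML numeric character references (decimal such as &#8217; or hexadecimal such as &#x2019;); the function name strip_numeric_refs is hypothetical. The pattern requires digits between &# and ; so that an unrelated semicolon later in the text is not swallowed.

```python
import re

# Matches decimal (&#8217;) and hexadecimal (&#x2019;) numeric references only.
_NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def strip_numeric_refs(text: str, replacement: str = " ") -> str:
    """Replace every numeric character reference with the replacement string."""
    return _NUMERIC_REF.sub(replacement, text)


if __name__ == "__main__":
    print(strip_numeric_refs("It&#8217;s here"))
```

Named entities such as &amp; are left alone, since they do not start with &#. If the goal is to keep the characters rather than blank them out, decoding the references may be a better fit than replacing them.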