Hi,
I crawl pages in French and noticed a problem with the display of accented French characters. When accented characters are entered directly instead of as HTML entities, for example é instead of "&eacute;", they are displayed incorrectly.
One page has a French title that is not HTML-entity encoded, like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
<!-- START HEAD -->
<head>
<title>Le Comité consultatif de rédaction</title>
...
And the search result comes as
Le Comit� consultatif de r�daction
for the title field.
All pages are UTF-8 encoded.
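For illustration only, here is a minimal Python sketch of one way this kind of symptom can arise: text stored as ISO-8859-1 bytes but decoded as UTF-8. It is not tied to any particular crawler, and the charset mismatch is an assumption for the sake of the example, not a confirmed diagnosis of these pages.

```python
# Hypothetical reproduction of the symptom; the Latin-1/UTF-8 mismatch
# is an assumption, not a confirmed cause for the pages described here.
title = "Le Comité consultatif de rédaction"

# Consistent round trip: encode and decode with the same charset.
utf8_bytes = title.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Mismatch: bytes written as ISO-8859-1 but read back as UTF-8.
# Each lone 0xE9 byte is invalid UTF-8 and becomes U+FFFD.
latin1_bytes = title.encode("iso-8859-1")
garbled = latin1_bytes.decode("utf-8", errors="replace")

print(round_trip)
print(garbled)
```

If the raw bytes of a page contain a single 0xE9 byte for é rather than the two-byte sequence 0xC3 0xA9, the page is not actually UTF-8 despite its declaration.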
Because the titles of these pages are defined in XML, we cannot use HTML entity encoding for French accented characters there, whereas the rest of the website uses HTML entity encoding for all accented characters. Is there a way to avoid this incorrect display?
Thank you in advance!