I recently added an additional field for the meta description tag to one of our search profiles. In the Data From Field section, I set the REX Search to .+, the Replace to \1, the From Field to Description, and the To Field to my additional field, MetaDes. I do this so that I can use the meta description when it is present and fall back to the query abstract otherwise (we crawl many sites, and not all documents have a meta description).
However, there seems to be an encoding issue with something as simple as the registered trademark symbol. When I crawl this page - http://iwww.plasticsportal.com/products/ultraform.html - I get the following in the MetaDes additional field:
<MetaDes>Acetal polyoxymethylene (POM) copolymer products under the tradename Ultraform�...</MetaDes>
The unknown character should be a ®. Then, when I load the XML document returned by the appliance, it fails with "Invalid character in the given encoding." After setting Output to No on the additional field, it works fine. Any ideas?
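To illustrate what I suspect is happening, here is a minimal Python sketch. It rests on assumptions I have not confirmed: that the page encodes ® as the single byte 0xAE (ISO-8859-1 or Windows-1252), and that the field value is copied byte-for-byte into XML output declared as UTF-8. The variable names are only for illustration and have nothing to do with the appliance itself.

```python
import xml.etree.ElementTree as ET

text = "tradename Ultraform®"

# Assumption: the source page stores ® as the single byte 0xAE (Latin-1).
raw = text.encode("latin-1")

# A lone 0xAE byte is not a valid UTF-8 sequence, so a strict decode fails.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode substitutes U+FFFD, the "unknown character" glyph.
lenient = raw.decode("utf-8", errors="replace")

# Assumption: the bytes are embedded unchanged in XML declared as UTF-8.
header = b'<?xml version="1.0" encoding="UTF-8"?>'
bad_xml = header + b"<MetaDes>" + raw + b"</MetaDes>"
try:
    ET.fromstring(bad_xml)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# Transcoding the field value to UTF-8 first yields a document that parses.
fixed = raw.decode("latin-1").encode("utf-8")
good_xml = header + b"<MetaDes>" + fixed + b"</MetaDes>"
root = ET.fromstring(good_xml)

print(strict_ok, xml_ok, root.text == text)
```

If this matches the real cause, the additional field would need to be transcoded to the output encoding the same way the other fields apparently are.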