Hello,
We need to index text that is pre-formatted in HTML within the main HTML document we are indexing. To prevent this text from appearing when a user views the page, it is also commented out. To index this as proper XML, we therefore need to 1) remove the comment delimiters and 2) escape the angle brackets (< and >).
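As a rough sketch of those two steps, assuming the page can be pre-processed with Python's standard library before it reaches the indexer (the names CommentExtractor and comments_as_escaped_text are made up for illustration, and this is not a feature of any particular indexer):

```python
from html import escape
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collects the body of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between <!-- and -->, with the delimiters removed
        self.comments.append(data.strip())


def comments_as_escaped_text(page):
    """Return each comment body with &, < and > turned into entities."""
    parser = CommentExtractor()
    parser.feed(page)
    parser.close()
    # quote=False leaves quotation marks alone; only &, < and > are escaped
    return [escape(comment, quote=False) for comment in parser.comments]


# Hypothetical input: visible text plus one commented-out HTML fragment
sample = "<html><body><p>Visible</p><!-- <b>hidden</b> text --></body></html>"
print(comments_as_escaped_text(sample))
```

The escaped strings could then be placed in whatever XML element the indexer expects. Note that this picks up every comment on the page, so in practice some marker would be needed to tell the wanted comments from ordinary ones.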
Here is an example:
Or put that text into a meta field and extract that.
Text in comments is not intended to be seen by, or generally accessible to, the user, so it is removed from indexing.
Having the indexer keep commented-out data would open a whole can of worms: it would start indexing all kinds of irrelevant cruft.
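For the meta field route, a minimal sketch of pulling the text back out, again assuming a Python pre-processing step. The meta name "index-text" and the class MetaExtractor are hypothetical, chosen only for this example:

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the content attribute of <meta> tags with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> tags
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.name:
            # The parser has already decoded entities in the attribute value
            self.values.append(attributes.get("content") or "")


# Hypothetical page: the markup is entity-escaped inside the content attribute
page = (
    "<html><head>"
    '<meta name="index-text" content="&lt;b&gt;hidden&lt;/b&gt; text">'
    "</head><body><p>Visible</p></body></html>"
)

extractor = MetaExtractor("index-text")
extractor.feed(page)
extractor.close()
print(extractor.values)
```

Because the text sits in an attribute, any quotes and angle brackets in it have to be written as entities in the page itself. The parser decodes them on the way out, so the value needs escaping again before it goes into XML.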
Thanks for your help.
I didn't mention that this text will probably have to be stored in the head section of the HTML, so that would rule out the solution with the span tag.
But is it safe to put a considerable amount of text in the meta tag (could be a few hundred characters)?