HTML entities and search
Posted: Wed Mar 08, 2006 5:06 pm
by bhiggins
We have a database in which special characters are saved as HTML entities. I'll use Guantanamo as my example. In the database it's saved as Guant&aacute;namo. Of course, a search for Guantanamo won't find it.
There are any number of ways to approach this, but I'm wondering if there is something we can do from the search side rather than doing a lot of reworking of the DB, which is huge.
If you believe sandr or regex work on the DB is required, I'd appreciate any suggestions on that end as well.
(FYI: This is full-blown Texis running on a Unix server. I'm in the unfortunate position of picking up support from a guy who has left the company.)
Thanks for any and all input.
Bob
HTML entities and search
Posted: Wed Mar 08, 2006 6:06 pm
by bhiggins
I only have to deal with 10% of the database; the rest was handled differently. Of that 10%, I'd estimate a third of the records contain accented characters.
Some background: This is a database of newspaper stories. For the older records the developer had a separate field in the DB where he stored the non-accented version of each accented word (resume, cliche, etc.). The search would then find either version of the word. When HTML entities were introduced about a year ago it broke that functionality. So that's what I'm trying to remedy.
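For illustration, an unaccented companion value like the one described above can be derived mechanically. This is a minimal sketch in Python rather than Texis/Vortex, and it assumes the text already holds real accented characters, not entities. The function name fold_accents is hypothetical and is not taken from the original developer's code.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Guant\u00e1namo", "r\u00e9sum\u00e9", "clich\u00e9"):
        print(word, "->", fold_accents(word))
```

Storing the folded form next to the original lets a search match either spelling, which is the behavior the older records had.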
Meanwhile, on equivalence: There's something I'm not doing right, apparently. I created a very simple eqvsusr.lst file with just one line:
guantanamo,guant&aacute;namo
Ran backref, no errors reported. Equivs are on in the search page. But search results are no different. What might I be missing?
Thanks,
Bob
HTML entities and search
Posted: Thu Mar 09, 2006 11:03 am
by mark
The ; character is special in the equiv file. Escape it with \
guantanamo,guant&aacute\;namo
So you're saying all new data has the entities? I'd take John's suggestion and get rid of the entities by replacing them with their equivalent characters at import time, as well as fix up the small fraction of records that already have the entities. Then the search will continue to work as before.
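As a rough sketch of the entity replacement, here is one way to do the decoding step in Python before the data reaches the import. This is not Texis/Vortex code, and the function name decode_entities is hypothetical. It assumes the stored text uses standard HTML named or numeric entities.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references with the characters they name."""
    # html.unescape handles named (&aacute;) and numeric (&#225;) references.
    return html.unescape(text)


if __name__ == "__main__":
    stored = "Guant&aacute;namo"
    print(decode_entities(stored))
```

Note that html.unescape also decodes &amp;, &lt; and &gt;, so if the stories contain markup that relies on those, a narrower replacement limited to accented-letter entities may be the safer choice.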
HTML entities and search
Posted: Thu Mar 09, 2006 12:30 pm
by bhiggins
I'll try out your suggestions, thanks.
I escaped the ; in the equivalence file, but results are the same on search. Any further suggestions welcomed.
Bob
HTML entities and search
Posted: Thu Mar 09, 2006 2:02 pm
by mark
Do you have all the requisite settings to use and allow the equivs?
<apicp eqprefix /full/path/to/your/equiv/file>
<apicp keepeqvs 1>
<apicp alequivs 1>
Make sure the equiv file is readable by the texis user.
Check the source of your results page and/or vortex.log to see if there are any errors generated.