Removing accents

Post by Thunderstone »




I checked the archives for accent-related postings, but unfortunately
I didn't find exactly what I was looking for. Is there a slick way
to keep accents out of both the database and the query-string?

The intention is to convert, for example, &agrave; (or its 8-bit
representation à) into its corresponding accentless version (a),
so that there are actually no accents in the DB.

The associated problem is to remove accents from the
query-string, so that a search for an accented word will still
retrieve the accentless documents.

I am maintaining a bilingual site (English and French), and the
interest in accentless behavior arises from the fact that while
some users search with accented French query-strings, others search
with accentless ones. By filtering out accents all the way
down to the DB, everybody will be dealing with the same text
and queries will return more matches.
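
To make the idea concrete, here is a rough sketch in Python of the kind of filter I have in mind. It is only an illustration under my own assumptions, not anything provided by the search software: the function name strip_accents is made up, and it assumes the text uses standard HTML entities or ordinary accented characters. The same function would be applied to documents before they are stored and to the query-string before searching.

```python
import html
import unicodedata


def strip_accents(text):
    """Return text with HTML entities decoded and accents removed.

    Hypothetical helper for illustration only.
    """
    # Turn entities such as &agrave; into the actual character first.
    decoded = html.unescape(text)
    # NFD decomposition splits an accented letter into a base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", decoded)
    # Drop the combining marks (Unicode category "Mn"), keep the rest.
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )


if __name__ == "__main__":
    print(strip_accents("d&eacute;j&agrave; vu"))
    print(strip_accents("été"))
```

Characters that have no decomposition, such as the ligature œ, pass through unchanged and would need a separate mapping.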

On another issue, I saw that you encourage the use of a single
database per site (if possible). In my case, wouldn't it be
better to use two databases (one for French documents and the
other for English documents), thereby reducing the risk of
exceeding the limit of 10,000 documents per database? What
are the pros and cons?

In advance, thanks.
--
RBS


