Trouble with apostrophes

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

I'm trying to index French documents, but I can't figure out how to search properly for apostrophes.

For example, a search for: migration et d'asile
does not find any results, but suggests: migration et asile

When I look at the suggestions, 'asile' is not highlighted.

What am I doing wrong?

Word forms: [\alnum\'\x80-\xff]{1,70}
Language Characters: \alpha\'\x80-\xFF
Post-processing is enabled.
Kai
Site Admin
Posts: 1270
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Is the apostrophe the same character (ASCII, U+0027) in both the query and the document? If not, the query term and document word will be considered different and not match.
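One quick way to check this is to print the code points of the apostrophe-like characters in both strings. The following is only a minimal Python sketch: the helper name `describe_apostrophes`, the set of look-alike characters, and the sample document string (which uses a typographic apostrophe) are assumptions for illustration, not anything taken from the search software.

```python
import unicodedata

# Characters that are commonly typed or rendered as an "apostrophe".
# This set is an assumption for illustration; extend it as needed.
APOSTROPHE_LIKE = {"\u0027", "\u2019", "\u02bc", "\u0060", "\u00b4"}

def describe_apostrophes(text):
    """Return (char, code point, Unicode name) for each apostrophe-like character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ch in APOSTROPHE_LIKE
    ]

query = "migration et d'asile"            # ASCII apostrophe, U+0027
document = "migration et d\u2019asile"    # hypothetical page text with U+2019

print(describe_apostrophes(query))     # [("'", 'U+0027', 'APOSTROPHE')]
print(describe_apostrophes(document))  # [('’', 'U+2019', 'RIGHT SINGLE QUOTATION MARK')]
print(query == document)               # False
```

If the two lists differ, the strings look the same on screen but will not compare equal.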

Can you post the URL to an example document that should match but does not?
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Post by michel.weber »

Yes, it's the same.

I have taken the apostrophe out of the 'word forms' and 'language characters', and now "asile" or "d'asile" match correctly.
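To illustrate what that change does to word splitting, here is a rough Python sketch. It only approximates the word-forms expression: Python's `re` has no `\alnum` class, so `[0-9A-Za-z]` stands in for it, and the pattern names are made up for this example. It is not the search engine's own tokenizer, just the same idea in a different regex dialect.

```python
import re

# Approximation of the word-forms expression WITH the apostrophe in the class.
WITH_APOSTROPHE = re.compile(r"[0-9A-Za-z'\x80-\xff]{1,70}")

# The same expression with the apostrophe taken out.
WITHOUT_APOSTROPHE = re.compile(r"[0-9A-Za-z\x80-\xff]{1,70}")

text = "migration et d'asile"

# With the apostrophe, "d'asile" stays together as a single word.
print(WITH_APOSTROPHE.findall(text))     # ['migration', 'et', "d'asile"]

# Without it, the apostrophe acts as a separator: "d" and "asile" are two words.
print(WITHOUT_APOSTROPHE.findall(text))  # ['migration', 'et', 'd', 'asile']
```

With the second pattern, "asile" exists as a word of its own, and the leftover "d" is what ends up as a candidate for the noise word list.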

I also added 'd' to the noise word list, which makes sense for French, but somehow it still shows up as highlighted.


I can't provide a link because I'm working on a test machine which is not accessible from the internet.
Kai
Site Admin
Posts: 1270
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Can you open a tech support ticket and attach (as a ZIP file) a copy of All Walk Settings, a copy of the query, and a copy of an HTML page that it should match? We'll take a look.
Kai
Site Admin
Posts: 1270
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Fixes for these issues are now in the scripts on our web site. Note that highlighting of a *phrase* such as "d asile" (typed with the quotes, as opposed to the non-phrase words d asile) will not span apostrophes in the text, because linear matching of phrases will only match whitespace between words (to avoid spanning sentence endings).
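The difference between phrase and non-phrase highlighting can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions: the function names `phrase_pattern` and `word_pattern` and the sample text are hypothetical, and the real scripts may be implemented quite differently.

```python
import re

def phrase_pattern(phrase):
    """Match the words of a phrase in order, separated only by whitespace."""
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

def word_pattern(terms):
    """Match any one of the query words on its own."""
    words = [re.escape(w) for w in terms.split()]
    return re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)

text = "politique de migration et d'asile"

# Phrase: "d" and "asile" are separated by an apostrophe, not whitespace,
# so the phrase as a whole is not highlighted.
print(phrase_pattern("d asile").search(text))   # None

# Non-phrase words: each word is highlighted individually.
print(word_pattern("d asile").findall(text))    # ['d', 'asile']
```

Allowing only whitespace between the words is also what keeps a phrase from matching across a sentence ending such as "... d. Asile ...".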