Noise words in OR statement

edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

Hi,

Our QA analyst just discovered that the OR format (a,b) seems to ignore words in the noise list:

When a noise word is entered in an "OR" list, it appears to negate every legitimate word in the list. Entering (qu',que,sa,ses,son,la peinture) returns no results, while entering la peinture on its own returns 295 results.
(son,la peinture) - no results found
(que,la peinture) - no results found
(qu',la peinture) - no results found
(sa,la peinture) - no results found
(ses,la peinture) - no results found

But ses peintures returns 191 results.
Likewise, (après,as,aussi,autre peinture) returns no results.

Is this a known bug? Thank you.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

I assume you've customized the noise list since none of the terms you mention are default noise words.
Under search settings set "Keep noise words" to "Y" if you don't want noise words stripped from the query.
edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

Thanks Mark,

My problem occurs when I set $SSc_keepnoise = N and a noise word is the first term of an OR statement.

E.g. with (the,montreal), the search returns no results because it ignores the word after the noise word.

But when I enter (montreal,the), I get the correct number of results for "montreal". So it seems a noise word cannot be the first term in an OR statement.

Is there a way to fix that? Thanks.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The first word is the root, or main, word, so if it matches a noise word the term is removed unless you set keepnoise to Y.
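
A rough way to picture that rule is the toy Python model below. It is not Texis code: the noise list and the function name are made up for illustration, and it assumes "the term" means the whole parenthesized set, which matches the (the,montreal) behavior reported above.

```python
# Toy model only. NOISE and keep_equiv_set are hypothetical names,
# not part of Texis or Webinator.
NOISE = {"the", "a", "of"}  # assumed noise list for the example


def keep_equiv_set(term: str, keepnoise: bool) -> bool:
    """Return True if a parenthesized set such as "(the,montreal)" survives."""
    members = [m.strip() for m in term.strip("()").split(",")]
    root = members[0]  # the first member is the root (main) word
    # Only the root is compared against the noise list.
    return keepnoise or root.lower() not in NOISE


print(keep_equiv_set("(the,montreal)", keepnoise=False))  # False: root is noise
print(keep_equiv_set("(montreal,the)", keepnoise=False))  # True: root is not noise
print(keep_equiv_set("(the,montreal)", keepnoise=True))   # True: noise is kept
```

In this model the order of the members matters only because the first one is treated as the root, which is why swapping the two words changes the outcome.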
John Turnbull
Thunderstone Software
edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

Thanks John.

I also tried setting $SSc_keepnoise = Y and <apicp keepnoise 1>, but when I search for (the,montreal) I get results for both "the" and "montreal". It doesn't seem to recognize that "the" is a noise word.

So whether I set $SSc_keepnoise to Y or N, I still don't get the correct results.

Any ideas how to fix this? Thanks.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

With keepnoise on, it is supposed to look for the noise words. Can you help me understand what you're trying to accomplish by searching for (the,montreal) when you don't want it to find "the"?
edev
Posts: 127
Joined: Wed Sep 14, 2005 5:10 pm

Post by edev »

This issue was brought up by our QA analyst; sorry, I gave you a bad example. What I meant to say was that if a user enters the query (the ice sculpture,paintings), the search script does not identify the noise word "the" because it sits inside an OR statement.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Paren lists aren't meant so much as an OR but as an equivalence set, where each word or phrase is taken as-is without any noise stripping. Noise stripping only occurs on individual words, not within phrases in quotes or equiv sets. The idea is that if the user asked for something that specific, they meant it.
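
To make the distinction concrete, here is a small Python sketch of that behavior. It is only an illustration under stated assumptions: the noise list, the function name, and the regex-based splitting are invented for the example and say nothing about how Texis parses queries internally. It models only the point that bare words are stripped while quoted phrases and paren sets pass through untouched; the root-word check mentioned earlier in the thread is left out.

```python
import re

# Hypothetical noise list and helper, for illustration only.
NOISE = {"the", "a", "of"}


def strip_noise(query: str) -> list[str]:
    """Drop noise from bare words; keep quoted phrases and paren sets as-is."""
    # Split the query into quoted phrases, parenthesized sets, and bare words.
    terms = re.findall(r'"[^"]*"|\([^)]*\)|\S+', query)
    kept = []
    for term in terms:
        if term.startswith('"') or term.startswith("("):
            kept.append(term)  # taken as-is, no stripping inside
        elif term.lower() not in NOISE:
            kept.append(term)  # bare word that is not noise
    return kept


print(strip_noise("the ice sculpture"))
# ['ice', 'sculpture']
print(strip_noise("(the ice sculpture,paintings)"))
# ['(the ice sculpture,paintings)']
print(strip_noise('"the ice sculpture" of montreal'))
# ['"the ice sculpture"', 'montreal']
```

So in a query like (the ice sculpture,paintings), the "the" stays because it is part of a phrase inside the set, not a standalone word.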