proximity search question

gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

I am trying to find WORD1 within 2 words of WORD2. My query looks like this:

$tsql "set withinmode='word';set indexwithin=1;select * from tbltest where author like 'Income w/6 Company'"


ID author
------------+------------+
1 Income produced by edible lipgloss brought the company ,000,000 last year.

If I change the proximity buffer from 4 to 2 (i.e., Income w/2 Company), I still get the same hit, even though Company is not within two words of Income. I am sure I am doing something wrong. Please help!

Thanks in advance.
gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

My index expressions:

tsql -q "set keepnoise='on';set delexp=0;set addexp='\punct{1,5}';set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index idxmauthor on tbltest(author);"
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You need to use
<apicp withinmode word>
instead of the SQL set variation, so you won't be able to do it in tsql, only in Vortex or C.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Which version of tsql are you using? It should work in tsql with the set withinmode='word'. The following works for me:

create table tbltest(ID int,author varchar(20));
insert into tbltest values(1,'Income produced by edible lipgloss brought the company ,000,000 last year.');
create metamorph inverted index idxmauthor on tbltest(author);
set withinmode='word';
set indexwithin=1;
select * from tbltest where author like 'income company w/1';
select * from tbltest where author like 'income company w/2';
select * from tbltest where author like 'income company w/3';
select * from tbltest where author like 'income company w/4';
select * from tbltest where author like 'income company w/5';
select * from tbltest where author like 'income company w/6';
select * from tbltest where author like 'income company w/7';
drop table tbltest;

I only get the result out for w/4 and higher.
John Turnbull
Thunderstone Software
gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

Thank you.
I am using 05.01.1138031466(20060123) version.

I tried using your tsql statements and they worked like a charm. It returned hits for w/4 or higher.

Then I added a few index expressions to the create index and ran the script again. This time I got hits for w/2 or higher. I narrowed it down to the expression “set addexp='\alnum{1,99}'”. It appears that if I remove \alnum{1,99} from my create index statement, then I get the right hits. With this expression added to the create index statement, hits are being returned for w/2 - which is obviously erroneous. Is this an expected behavior?



drop table tbltest;
create table tbltest(ID int,author varchar(20));
insert into tbltest values(1,'Income produced by edible lipgloss brought the company ,000,000 last year.');
set keepnoise='on';set delexp=0;set addexp='\punct{1,5}';set addexp='\alnum{1,99}';set addexp='>>\alpha{1,50},=\alpha{1,50}';create metamorph inverted index idxmauthor on tbltest(author);
set withinmode='word';
set indexwithin=1;
select * from tbltest where author like 'income company w/1';
select * from tbltest where author like 'income company w/2';
select * from tbltest where author like 'income company w/3';
select * from tbltest where author like 'income company w/4';
select * from tbltest where author like 'income company w/5';
select * from tbltest where author like 'income company w/6';
select * from tbltest where author like 'income company w/7';
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

If you do a tsql "set indexaccess=1; select * from idxmauthor" what do you get? It still works for me with those index expressions.
John Turnbull
Thunderstone Software
gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

I get the following. \alnum{1,99} and \punct{1,5} were included in the create index statement.

Word         Count
------------+------------+
,            1
.            1
000          1
brought      1
by           1
company      1
edible       1
income       1
last         1
lipgloss     1
produced     1
the          1
year         1
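
For anyone who wants to sanity-check a word list like the one above without access to Texis, here is a rough sketch in Python. It is not REX and not how the index is actually built: the two regular expressions are only approximate stand-ins for \alnum{1,99} and \punct{1,5}, the third index expression is ignored, and the names (ALNUM, PUNCT, tokenize) are made up for illustration. It simply lists the distinct lowercased tokens those two patterns pick out of the sample row.

```python
import re
import string

# Sample row exactly as it appears in the thread.
TEXT = "Income produced by edible lipgloss brought the company ,000,000 last year."

# Approximate Python equivalents of the REX expressions (assumption, not REX syntax).
ALNUM = r"[A-Za-z0-9]{1,99}"                              # stands in for \alnum{1,99}
PUNCT = "[" + re.escape(string.punctuation) + "]{1,5}"    # stands in for \punct{1,5}
TOKEN = re.compile(ALNUM + "|" + PUNCT)


def tokenize(text):
    """Return every token matched by either pattern, lowercased, in text order."""
    return [t.lower() for t in TOKEN.findall(text)]


tokens = tokenize(TEXT)
distinct = sorted(set(tokens))

for word in distinct:
    print(word)
```

Under these assumptions the distinct tokens come out as the same 13 entries shown in the listing, which suggests the index expressions were picked up as intended.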
gazim
Posts: 66
Joined: Sun Feb 18, 2001 1:01 pm

Post by gazim »

John, the updated version that you sent us seems to have resolved the anomaly. Thank you!

I have one more question about the withinmode settings. Sometimes it appears that some of the words within the proximity buffer are being ignored.
For example,

'Income w/4 Company' finds

"Income produced by edible lipgloss brought the company ,000,000 last year."


There are 6 words between Income and Company. It appears that 'by' and 'the' are being ignored. Does this behavior have anything to do with noise words?
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

It's not due to noise words (they always count for w/N), but the fact that linear w/N search currently matches words up to 2N left and 2N right of the anchor word, instead of N left and N right. Index search (activated with set indexwithin=7) may do the same, but usually is limited to N left and N right as expected.
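
To make the N versus 2N point concrete, here is a small Python model of the two window sizes applied to the sample row. It is only a sketch of the arithmetic, not Texis code: it assumes whitespace-separated words, assumes "within N" means the two terms are at most N word positions apart, and the function names and the reach_factor parameter are hypothetical.

```python
# Sample row exactly as it appears in the thread.
TEXT = "Income produced by edible lipgloss brought the company ,000,000 last year."


def words(text):
    """Split on whitespace and lowercase; a simplification of real tokenization."""
    return text.lower().split()


def distance(text, a, b):
    """Number of word positions between the first occurrences of a and b."""
    w = words(text)
    return abs(w.index(a) - w.index(b))


def matches(text, a, b, n, reach_factor=1):
    """True if b lies within reach_factor * n positions of the anchor word a.

    reach_factor=1 models the expected N-left/N-right window;
    reach_factor=2 models the 2N-left/2N-right window described above.
    """
    return distance(text, a, b) <= reach_factor * n


gap = distance(TEXT, "income", "company")
print("positions apart:", gap)
for n in range(1, 8):
    print("w/%d" % n,
          "N window:", matches(TEXT, "income", "company", n, 1),
          "2N window:", matches(TEXT, "income", "company", n, 2))
```

In this model "company" sits 7 positions after "income" (6 words in between), so a 2N window first matches at w/4, while a strict N window would not match until w/7. That lines up with the hits at w/4 and higher reported earlier in the thread.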