strange problem with search term "privacy"
Posted: Wed Oct 13, 2004 3:44 pm
by zoeoberon
I'm having a strange problem that I don't see any obvious answer to.
The query in question is:
select count(*) as total_count from corpus where ( Title\Keywords\Description\Content\MediaFileText likep 'privacy')
If I run this from TSQL I get a count of around 12K, which is what I expect. If I run it from a Vortex script, I get a count of zero. There are no messages in the Vortex log or the error log indicating any errors.
Here are the various settings at the top of the Vortex script:
<apicp keepeqvs no>
<apicp alpostproc yes>
<apicp alwithin yes>
<apicp qminwordlen 1>
<SQL NOVARS "set hyphenphrase=0"></SQL>
<apicp minwordlen 6>
<apicp exactphrase on> (or off, depending on whether the query has double quotes; it doesn't matter here, the problem is the same either way)
Other terms work fine including 'private', 'personal privacy' and 'privacy concerns'. Why would only 'privacy' return zero from the script?
Thanks for your help.
Posted: Wed Oct 13, 2004 5:38 pm
by Kai
I would suspect some other setting is taking effect that we're not aware of.
Assuming you are using a Metamorph index, you can set indextrace=50 to see which atomic terms are being looked up in the index after suffix processing, etc. This will produce copious error messages; just save the first few dozen or so, up to the "allmatch: N and: N ..." line. "fdbix_seek()" messages indicate words actually found in the index.
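A minimal sketch of that suggestion from tsql, reusing the original query from the first post. The value 50 is the one given above; issuing it as a plain SQL `set` statement before the query is an assumption based on the other `set` statements in this thread. In a Vortex script the same setting could presumably be issued with the `<SQL NOVARS "...">` form already used for hyphenphrase.

```sql
-- Turn on Metamorph index tracing (value suggested above), then run the query.
-- Keep only the first few dozen trace lines, up to the "allmatch:" line.
set indextrace=50;
select count(*) as total_count from corpus where ( Title\Keywords\Description\Content\MediaFileText likep 'privacy');
```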
Posted: Thu Oct 14, 2004 11:12 am
by zoeoberon
Okay, here are the results. I'm not sure what this is telling me, except that it looks like it found something?
//////////////////////////////////////
/usr/local/morph3/htdocs$ texis -traceidx -traceidx -traceidx -traceidx -traceidx sitename=directmag terms=privacy pbmmsearch/main.txt
[select count(*) as total_count from pbmmcorpus where ( Title\Keywords\Description\Content\MediaFileText likep 'privacy' ) and ( Sitedirname = 'directmag' )]
200 pbmmsearch:336: openfdbi(/pirt-dl/pbmm/db/xcorpus1, R, F) = 0x85FE768
200 pbmmsearch:336: mmap(/pirt-dl/pbmm/db/xcorpus1.tok, 0x0, 0x1AD9B0, R) = 0x402E7000
200 pbmmsearch:336: mmap()ing entire Metamorph index token file /pirt-dl/pbmm/db/xcorpus1.tok in the function openfdbi
200 pbmmsearch:336: Can't mmap() Metamorph index data file /pirt-dl/pbmm/db/xcorpus1.dat: (indexmmap & 2) off; using file I/O in the function openfdbi
200 pbmmsearch:336: 1/2 privacist
200 pbmmsearch:336: 1/1 privacist's
200 pbmmsearch:336: 1/1 privacists
200 pbmmsearch:336: 213884/220650 privacy
200 pbmmsearch:336: allmatch: 1 and: 0 set: 1 not: 0 minsets: 1
200 pbmmsearch:336: kdbf_readchunk(0x11E387BF, 0x10000) = 0x10000
[total_count=0]
No matching records found.<br>
<!-- end Texis -->
200 closefdbi(0x85FE768)
200 munmap(/pirt-dl/pbmm/db/xcorpus1.tok, 0x402E7000, 0x1AD9B0)
//////////////////////////
Here's the index being created:
set indexmem=40;
set delexp=0;
set addexp='[\alnum&\-\x80-\xff]{1,99}';
set addexp='[\alnum&\-\x80-\xff\x27]{1,99}';
create metamorph inverted index xcorpus1 on pbmmcorpus(Title\Keywords\Description\Content\MediaFileText);
////////////////////////
Have to say I'm still baffled - does this mean anything to you?
Posted: Thu Oct 14, 2004 11:17 am
by Kai
Yes, several words were found in the index that suffix-match `privacy'. So you should get results from the LIKEP.
But I noticed that your SQL now has an AND clause, which may be reducing the results. Try running the same SQL without the AND, exactly as you did in your tsql example.
Posted: Thu Oct 14, 2004 11:32 am
by zoeoberon
Sorry, I simplified the query in my initial public post so it wouldn't include client names. The exact SQL I used for all of this testing is the one with the AND clause. In other words, when I run it in TSQL I get 12K results, and from Vortex I get nothing.
Removing the AND clause does return results with the following index trace:
[select count(*) as total_count from pbmmcorpus where ( Title\Keywords\Description\Content\MediaFileText likep 'privacy' )]
200 pbmmsearch:336: openfdbi(/pirt-dl/pbmm/db/xcorpus1, R, F) = 0x85FDB28
200 pbmmsearch:336: mmap(/pirt-dl/pbmm/db/xcorpus1.tok, 0x0, 0x1AD9B0, R) = 0x402E7000
200 pbmmsearch:336: mmap()ing entire Metamorph index token file /pirt-dl/pbmm/db/xcorpus1.tok in the function openfdbi
200 pbmmsearch:336: Can't mmap() Metamorph index data file /pirt-dl/pbmm/db/xcorpus1.dat: (indexmmap & 2) off; using file I/O in the function openfdbi
200 pbmmsearch:336: 1/2 privacist
200 pbmmsearch:336: 1/1 privacist's
200 pbmmsearch:336: 1/1 privacists
200 pbmmsearch:336: 213884/220650 privacy
200 pbmmsearch:336: allmatch: 1 and: 0 set: 1 not: 0 minsets: 1
<snip out kdbf_readchunk>
[total_count=100000]
<!-- end Texis -->
200 closefdbi(0x85FDB28)
200 munmap(/pirt-dl/pbmm/db/xcorpus1.tok, 0x402E7000, 0x1AD9B0)
Posted: Thu Oct 14, 2004 12:28 pm
by John
For questions you don't want to be public you can use the Tech Support link to send a private message to tech support.
The Metamorph index portion appears to be correct, so if you run the queries with "set verbose=2;" it will show a little more about how the rest of the query is being processed.
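For example, a sketch of running the full query from this thread (with the AND clause) at that verbosity level in tsql; only the `set verbose=2;` line is new, and the table, fields and value are the ones from the traced query earlier in the thread.

```sql
-- Show more detail about how the query is planned and executed,
-- including which indexes are used for each condition.
set verbose=2;
select count(*) as total_count from pbmmcorpus where ( Title\Keywords\Description\Content\MediaFileText likep 'privacy' ) and ( Sitedirname = 'directmag' );
```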
Posted: Thu Oct 14, 2004 1:26 pm
by zoeoberon
Here's the same command but with verbose=2 in the script:
<snip>
200 pbmmsearch:337: 1/2 privacist
200 pbmmsearch:337: 1/1 privacist's
200 pbmmsearch:337: 1/1 privacists
200 pbmmsearch:337: 213884/220650 privacy
200 pbmmsearch:337: allmatch: 1 and: 0 set: 1 not: 0 minsets: 1
200 pbmmsearch:337: kdbf_readchunk(0x11E387BF, 0x10000) = 0x10000
<snip kdbf_readchunk>
200 pbmmsearch:337: Looking for index on pbmmcorpus (Sitedirname)
200 pbmmsearch:337: Opening index /pirt-dl/pbmm/db/xcorpus2 in the function ixbtindex
200 pbmmsearch:337: Comparing records
<snip Comparing records>
200 pbmmsearch:337: Expect to read 4% of the index in the function ixbtindex
999 pbmmsearch:337: Handling a table project in the function dotree
999 pbmmsearch:337: Handling a table select in the function dotree
999 pbmmsearch:337: No more rows [0] from pbmmcorpus
999 pbmmsearch:337: Deleting temp row
999 pbmmsearch:337: Handling a table project in the function dotree
[total_count=0]
///////////////////////////
The Sitedirname index is just simply:
create index xcorpus2 on pbmmcorpus(Sitedirname);
which also seems fine.
Posted: Mon Oct 18, 2004 11:49 am
by zoeoberon
Any further thoughts on this?
Posted: Tue Oct 19, 2004 5:47 pm
by John
If you reorder the clause to put the Sitedirname part first does that help?
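For reference, the reordering being suggested would look roughly like this: same table, fields and values as the traced query, with only the order of the two conditions swapped so the Sitedirname test comes first and the LIKEP test comes last.

```sql
-- Sitedirname condition first, LIKEP condition last
select count(*) as total_count from pbmmcorpus where ( Sitedirname = 'directmag' ) and ( Title\Keywords\Description\Content\MediaFileText likep 'privacy' );
```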
Posted: Wed Oct 20, 2004 2:51 pm
by zoeoberon
I moved the LIKEP terms to the last condition in the SQL, and that did fix the problem. Why would putting it through that index first make a difference?