Indexing differences between Vortex and TSQL

barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

I am confused about why I get different results when searching a table that has been indexed via a vortex script vs. when the same table has been indexed from the command line using tsql.exe. Here is the Vortex script:

<script language=vortex>
<a name=main>
<sql db=D:\myDB "set delexp=0;"></sql>
<sql db=D:\myDB "set addexp='\alnum{1,99}';"></sql>
<sql db=D:\myDB "set keepnoise=1;"></sql>
<sql db=D:\myDB "create metamorph inverted index testdata_mi on testdata (PATN_TTL\ABST_PAx\CLMS_PAx)"></sql>
</a>
</script>

Here is the tsql.exe command line:

tsql.exe -d D:\myDB -i D:\indexScript.sql

The content of D:\indexScript.sql is as follows:

set delexp=0;
set addexp='\alnum{1,99}';
set keepnoise=1;
create metamorph inverted index testdata_mi on testdata (PATN_TTL\ABST_PAx\CLMS_PAx);


As you can see, the same sequence of statements is used to create the index in both cases. However, when I index the table using the vortex script, I find that one of our metamorph queries against the table does NOT return a hit, whereas the same metamorph query DOES return a hit when the table is indexed using the tsql.exe command. I understand that there are subtle differences between vortex and tsql, but this one has me stumped. The only thing I can think of that might be going on here is in how vortex and tsql differ when it comes to handling multiple statements. But if that is so, I don't see how to get around it. We want to create the index via our vortex script, but that is not working. As I said, we are finding that there are queries that do NOT return a hit in this case, when we know that they should.

I have not included the specific query nor the data in the table testdata, since (it seems to me) that the problem is somewhere in the way that I am indexing. However, I would be happy to include both of those if it would be helpful. The query is about 4800 characters in length, and rather complex, involving many references to terms found in a large equivalence file.

FYI, our Texis version is:

Commercial Version 6.01.1342629243 20120718 (i686-intel-winnt-64-32)

Thanks for your help.
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

As a side question to this issue, I am wondering if it is reliable, or even viable, to issue a metamorph query against the data *without* the existence of a metamorph index. I know that that is folly when the data is very large. However, in the case that gave rise to this issue, the data is actually really small, typically 20 to 30 rows, and just a few paragraphs per row in both the ABST_PAx and CLMS_PAx columns. I tried running the query in question against the table without an index, and the hits are consistent with those that I get when the table is indexed via tsql.exe from the command line. That is, they are the results we expect, at least in this case. And, given the small size of the table, it is fast enough for our purposes. My only question here is whether or not results in this case are consistently reliable. In other words, is a metamorph index REQUIRED for a metamorph query.
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Indexing differences between Vortex and TSQL

Post by Kai »

Given the SQL and properties set, the index produced should be the same from both tsql and Vortex: those particular SQL properties are global (not just local to the <sql> handle), and default the same in tsql and Vortex in version 6 anyway.

Are you sure you're running the same release of tsql and texis? Run tsql -? and texis -version and compare.

Also, try making the index with tsql, then make it with Vortex -- but with a different index name. Then compare the index files produced: are they the same size? Do they compare the same with cmp (or its equivalent under Windows, I don't know what)? Note the ..._P.tbl file especially.

As for querying without a Metamorph index, in theory the results are supposed to usually match, with or without an index (ignoring perm settings like allinear which would prevent linear queries altogether). However, in practice, because of the different nature of indexed (word-based, index expressions) vs. linear (character-based) queries, differences can unavoidably occur. If consistency is paramount, always make indexes.
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

The versions are the same for tsql.exe and texis.exe. The indexes *are* different, though! Not sure why that would be?!? Perhaps we are overriding some version 6 default behavior with a setting in our texis.ini file. When we upgraded from v5 to v6 we did set a few parameters in the .ini file in order to retain the v5 behavior without a modifcation of our code. Are there any parameters that come to mind that might be coming back to haunt us now? The only one I thought we set was the one that controls arrayconvert behavior in v6. As for file comparison, I ran FC.EXE /B (file compare binary), and a considerable list of differences was returned. I don't really know how to interpret those differences, though, as it is a list of offsets where differences were encountered and byte values from each file at those offsets. It's really just a list of hex numbers!?! And the _P.tbl created by tsql is larger than that created by vortex. Are there tools available in Texis/TSQL to "dissect" these two indexes and maybe get a handle on what's what in each?
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Indexing differences between Vortex and TSQL

Post by Kai »

1) Verify you're running the texis you think you are: check the version inside the Vortex script too: <vxinfo version> $ret

2) Altering [Apicp] Texis Search Mode in texis.ini can affect Metamorph indexes; there may be others I've forgotten at the moment. Obviously any texis.ini changes would affect Vortex scripts but not tsql.

3) You can examine a foo_P.tbl via SQL with:

copy foo_P.tbl temp.tbl
addtable temp.tbl
tsql "select * from temp" (will still have some binary output)

Then you can diff two _P.tbl files.
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

Yes. As I look at our texis.ini file I see that we DO override the default value for Text Search Mode, from the default v6 value of 'unicodemulti, ignorediacritics, expandligatures, ignorewidth, ignorecase' to the (presumably) v5 value of 'ctype, ignorecase, iso-8859-1'. Hmm... I guess this was done in a knee-jerk manner, upon reading in the new INI file that the v5 default was different, in an effort to maintain v5 behavior when we upgraded to v6. Granted, my overall understanding of how all this works is a bit shaky, but here's what we do... When we update our large indexes (i.e., after periodic data imports into our data tables using TIMPORT we issue CREATE INDEX statements against the existing indexes) we use tsql.exe to execute scripts much like the one in the example above. Do I take it then that since upgrading to v6, those index updates have been subject to the v6 default apicp parameters, whereas the indexes that we create on the small tables using the vortex scripts use the v5 override parameters specified in texis.ini? (I've not yet tried setting the v6 parameters in the vortex, but I will and let you know what effect that has on the results.) But now my larger question is, if this is the reason we are seeing discrepancies, should we even be using the v5 parameters at all any more, since the indexes have been updated using v6 tsql? I don't believe that we have have completely rebuilt our indexes since upgrading to v6 from v5, so is it possible that parts of the indexes are "v5-like" while the *newer* parts are "v6-like"? (Our "master" data tables are HUGE, and each of the ten or so indexes that we have for them takes *hours* to build from scratch. In other words, it's a lengthy proposition to reindex everything. (My apologies if these questions sound dumb or naive! Like I said, my understanding of all this is a bit shaky.)
User avatar
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Indexing differences between Vortex and TSQL

Post by Kai »

Yes, you've pretty much got it: any indexes that have been created *from scratch* (not just update-pre-existing) since you upgraded to v6 -- and using either tsql, or texis/Vortex *before* changing texis.ini textsearchmode to v5 -- now have the v6 textsearchmode.

You can tell if any indexes are v5 with "select NAME, PARAMS from SYSINDEX": the textsearchmode (and stringcomparemode for B-tree indexes) will be given in PARAMS. If textsearchmode is `unicodemulti,ignorecase,ignorediacritics,expandligatures,ignorewidth', or stringcomparemode is `unicodemulti,respectcase', then the index is v6. Otherwise it is (probably) v5, and probably should be dropped and re-created as v6 for consistency going forward.
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

None of the indexes, with the exception of ivid which I guess, is one that Texis builds on the vortex table, has a value for PARAMS at all. Also, I commented out the line in texis.ini which overrode the setting for Text Search Mode, and uncommented the v6 line, but I am still seeing the same results. That is, no hit on our query when the index is created via vortex. Also, just to be thorough, I dropped and rebuilt the same index on the our large table. Any ideas?
User avatar
mark
Site Admin
Posts: 5514
Joined: Tue Apr 25, 2000 6:56 pm

Indexing differences between Vortex and TSQL

Post by mark »

When you say "via vortex" do you mean running "texis yourscript" on the command line or accessing /texis/yourscript through a web server? If a webserver which one?

In your script do
<vxinfo version> $ret
to ensure you really running the same version there as tsql.
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Indexing differences between Vortex and TSQL

Post by barry.marcus »

Either one. The webserver is IIS7. We have code that runs as CGI on that webserver, which is the code that I mean when I say "via Vortex". But in trying to isolate this issue, I also created a tiny little script that I run from the command line, i.e.,

texis myScript

In both cases (i.e., in our CGI code and in the test script) I put in the <vxinfo version> and get the same thing, namely:

Commercial Version 6.01.1342629243 20120718 (i686-intel-winnt-64-32)

When we create our large indexes we do not use vortex code. In that case, I have a file of sql create index commands which I run with:

tsql -d <mydb> -i <myindexscriptfile>

The output of tsql -? is:

Texis Version 06.01.1342629243(20120718) Copyright (c) 1988-2012 Thunderstone EPI

i.e., the versions are all the same.
Post Reply