
Max size exceeded in the function kdbf_endalloc

Posted: Mon Dec 29, 2014 6:13 pm
by rabbott
Received the following message while trying to reindex a 219 GB blob file:

006 Won't finish block at offset 0xAF9A6458 size 0x10208E9E3 in KDBF file ../idxmtblnewx_DOCTEXT.dat: Max size exceeded in the function kdbf_endalloc

Not sure if it's hardware related or the data is corrupt. If it's hardware related, it should be an easy fix; however, if the data is corrupt, I'm not sure how to identify the corrupt data.



Posted: Mon Dec 29, 2014 9:12 pm
by Kai
"Max size exceeded" in this case is a hard limit: typically around 4GB on 64-32 platforms (larger on 64-64). It is the maximum size of a single KDBF block, which here is storing the location info for one word in the Metamorph index. That word is probably noisy (occurs a lot), which makes its .dat block large.
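To see how far over the limit the failing block is, the hex values in the error message can be decoded. This is only a rough sketch: it assumes the limit is exactly 2**32 bytes (the figure above is only "around 4GB"), and `parse_block_error` and `ASSUMED_MAX_BLOCK` are names made up for this illustration, not part of Texis.

```python
import re

# Assumption for illustration only: a 32-bit size field caps a block at 2**32 bytes.
# The real limit is only described as "around 4GB" and may differ slightly.
ASSUMED_MAX_BLOCK = 2**32


def parse_block_error(message):
    """Extract the hex offset and size from a "Won't finish block" message."""
    match = re.search(r"offset 0x([0-9A-Fa-f]+) size 0x([0-9A-Fa-f]+)", message)
    if match is None:
        raise ValueError("no offset/size found in message")
    return int(match.group(1), 16), int(match.group(2), 16)


message = (
    "006 Won't finish block at offset 0xAF9A6458 size 0x10208E9E3 "
    "in KDBF file ../idxmtblnewx_DOCTEXT.dat: Max size exceeded "
    "in the function kdbf_endalloc"
)

offset, size = parse_block_error(message)
excess = size - ASSUMED_MAX_BLOCK

print(f"block offset: {offset} bytes")
print(f"block size:   {size} bytes ({size / 2**30:.2f} GiB)")
print(f"over the assumed limit by {excess} bytes ({excess / 2**20:.1f} MiB)")
```

Under that assumption the block is just past 4 GiB, which fits a single very frequent word outgrowing the per-block limit rather than corrupt data or a hardware fault.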

There are two ways around this: 1) reduce or eliminate the number of occurrences of this word, or 2) increase the max block size by switching to a 64-64 version of Texis.

Option 1a: if the word is known -- it may have been printed in an adjacent error message -- it can be added to the noise list. This makes the word unindexed and therefore only linear-searchable (time-consuming), but removes the need to make an impossibly large .dat block for it.

Option 1b: switch to a non-inverted Metamorph index. This only stores row locations, not within-doc positions, so it needs less .dat space and thus this word should fit. The downside is that search times will probably increase: without within-doc positions, full ranking (LIKEP) and phrase resolution must be done in a post-process, which can take a long time (but is generally faster than a full linear search). Fully indexable non-phrase queries with LIKE, LIKE3 or LIKER should be unaffected, however.

Option 1c: split the data into multiple tables, and index and search them separately, merging results. The downside is potentially increased search time, as the results must be merged (though probably still faster than a non-inverted Metamorph index). However, there is potential for speed *gain* here, if the tables can be split across multiple independent machines and searched in parallel (with e.g. Webinator meta search or custom <fetch parallel>).

Option 2: switch to a 64-64 version of Texis (if available for the platform), which will have a nearly-64-bit per-block size limit. The downside is of course porting tables and data to the new version, whose databases are not 64-32 compatible.
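For Option 1c, the merge step could look something like the sketch below. It is not Texis or Vortex code: the `(score, table, row_id)` tuples and the `merge_ranked` helper are hypothetical, and it assumes each table's search already returns hits sorted by descending score and that scores from separately built indexes are comparable, which may not hold in practice.

```python
import heapq
from itertools import islice


def merge_ranked(result_lists, limit=10):
    """Merge per-table hit lists, each already sorted by descending score."""
    # heapq.merge is lazy, so only the first `limit` hits are ever pulled.
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical hits as (score, table, row_id); real values would come from
# running the same query against each split table.
part_a = [(980, "docs_a", 17), (640, "docs_a", 3)]
part_b = [(910, "docs_b", 52), (700, "docs_b", 8), (120, "docs_b", 91)]

top = merge_ranked([part_a, part_b], limit=3)
for score, table, row_id in top:
    print(score, table, row_id)
```

Since the merge only reads as many hits as are asked for, its cost should stay small next to the searches themselves. This is why running the per-table searches in parallel can come out ahead overall.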