The Texis that comes with Webinator 2 limits tables loaded via Vortex to 30 MB. Webinator 4 addresses that issue. You can see the licensing levels and try the beta at http://www.thunderstone.com/texis/site/ ... ator4.html
The full version will be released "very soon now".
That table contains all of the text from the walked pages. If you have more pages, or more actual content (as opposed to HTML coding) per page, the table will be larger. If you've deleted a lot of data, there's free space in the table file that should get reused.
The default schema places the page text in the .blb file, so the .tbl file will not contain that part. There are more fields in the html table than you describe, though none of them are large. So if you're using the standard schema with the Body in a blob, the bulk of the data will be in the .blb file unless the pages are mostly empty.
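To see where the bulk of the data actually sits, it may help to list the sizes of the .tbl and .blb files in the database directory on each machine and compare them. What follows is only a rough sketch in Python, not a Texis or Webinator tool: the function names are made up for illustration, the database directory path is whatever yours happens to be, and treating "30 MB" as 30 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "30 MB" is read here as 30 * 1024 * 1024 bytes.
LIMIT_BYTES = 30 * 1024 * 1024


def table_file_sizes(db_dir):
    """Return {filename: size_in_bytes} for the .tbl and .blb files in db_dir."""
    sizes = {}
    for path in sorted(Path(db_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in (".tbl", ".blb"):
            sizes[path.name] = path.stat().st_size
    return sizes


def report(db_dir):
    """Print one line per table file, flagging any at or over the assumed limit."""
    for name, size in table_file_sizes(db_dir).items():
        flag = "  <-- at or over 30 MB" if size >= LIMIT_BYTES else ""
        print(f"{name:20s} {size / (1024 * 1024):8.2f} MB{flag}")


if __name__ == "__main__":
    import sys

    # Pass the database directory as the first argument; defaults to the current one.
    report(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Running it against the database directory on both machines gives two listings that can be compared side by side, for example to check whether html.blb exists in both places and which file is carrying the page text.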
I just ran an index of the same site from our own machine (the other process was on the client's machine) and had no problems on our side. The final html.tbl file on our side is about 1 MB; however, when the same process is run from the client machine, the file ends up being over 30 MB.
I'm assuming you started with a new empty database or wiped it first.
Are the following the same?
dowalk script
number of urls retrieved
database schema (html.tbl and html.blb both exist)
Texis version
operating system
In both cases I started with a new database. The other elements are basically the same, with the exception of the OS: WinNT vs. the WinNT Small Business Server version.
I presume the Texis version is the same, as the two copies of the software were purchased within a couple of months of each other, but I can double-check this. How do I check the version number?
Regarding the number of URLs, how can I extract this from the table or the log to compare the two?