Webinator file size limits

b.sims
Posts: 99
Joined: Fri Oct 26, 2001 10:40 am

Post by b.sims »

I recently ran a couple of test runs using Webinator 4 on Windows 2000. I stopped the first manually after a weekend; the second seemed to have stopped finding new pages, so I stopped it as well.

Both runs could have failed for other reasons, but I noticed that in each case the most recent 'pages downloaded' figure was about 2GB, while the tables themselves add up to only about 240MB per run.

Is this a coincidence or a size problem? Below are relevant extracts from each status screen:

-- 4793 pages fetched (144,382,126 bytes) from http://www.unep.org/Documents/Default.a ... mentID=108
48082 pages (2,089,237,648 bytes) so far.
4909 errors so far.
5462 duplicate pages so far.

-- 779 pages fetched (35,225,923 bytes) from http://www.marine.ie/datacentre/
43285 pages (2,091,143,626 bytes).
3172 errors.
3646 duplicate pages.


thanks
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

That byte count is the amount of data downloaded, not the amount stored. Modern HTML is very messy, and the content-to-formatting ratio is very low. Webinator stores only content, not formatting, so the table size will generally be much smaller than the download size.

The printed byte count is a signed 32-bit quantity (on 32-bit operating systems), so it can roll over once it passes the 2GB point (2,147,483,647 bytes). If you have not set a max bytes limit (i.e. left it at -1), this is not a problem.
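
To make the rollover concrete, here is a rough Python sketch of how a signed 32-bit counter would behave near 2GB. It is only an illustration and not Webinator's own code: the helper name `wrap_int32` is made up for this example, and it assumes ordinary two's-complement wraparound. The byte totals are the ones quoted in the status screens above.

```python
# Illustration only: emulates a two's-complement signed 32-bit counter.
# wrap_int32 is a hypothetical helper, not part of Webinator.
INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest signed 32-bit value


def wrap_int32(n: int) -> int:
    """Return the value a signed 32-bit counter would show for n."""
    return (n + 2**31) % 2**32 - 2**31


run1_total = 2_089_237_648  # first run, bytes so far
run2_total = 2_091_143_626  # second run, bytes so far

# Both totals are still below the limit, so neither has wrapped yet.
headroom1 = INT32_MAX - run1_total
headroom2 = INT32_MAX - run2_total

# Assume, for illustration, another 100,000,000 bytes get downloaded.
wrapped = wrap_int32(run2_total + 100_000_000)

print(headroom1, headroom2, wrapped)
```

Both reported totals are within about 60MB of the limit but have not crossed it. If the count did cross it, the printed figure would go negative, which only affects the display unless a max bytes limit is being compared against it.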