readln inconsistent

Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

Hello all,
I'm using <readln row max=-1 $filename> to read lines from an XML file, with each read immediately followed by a line counter.

Every time I run the script I get a varying number of lines. Often it reads every line in the file (550,000-odd), sometimes it reads only 214,000, and sometimes even fewer.

The file has not changed, so why does readln interpret the number of lines differently? What does readln consider a line break character, and does it have a maximum number of bytes it can read per line?

The biggest problem I have is why is it inconsistent for the same file and same code?

The original author put in the max=-1 option, which sounds very dubious to me. The Thunderstone docs do not say that -1 serves any purpose.

Thanks!
Graeme
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

A max of -1 for <readln> is indeed the same as no limit specified (the default); documentation oversight. <readln> does not have a line length limit (other than allocatable memory in general), and considers either CR or LF, followed by an optional LF or CR, respectively, as end of line.
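
To cross-check the line count outside Vortex, the end-of-line rule described above (CR or LF, optionally followed by the other one) can be sketched in Python. This is only an illustration of the rule as stated in this post, not Vortex's actual implementation; the name `split_lines` is hypothetical.

```python
import re
from typing import List

# One terminator is CR optionally followed by LF, or LF optionally followed by CR.
_EOL = re.compile(rb"\r\n?|\n\r?")

def split_lines(data: bytes) -> List[bytes]:
    """Split raw bytes into lines using the CR/LF rule described in the post."""
    parts = _EOL.split(data)
    # A terminator at the very end of the data leaves one empty trailing piece;
    # drop it so a terminated final line is not counted twice.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts
```

Running `len(split_lines(open(path, "rb").read()))` on a local copy of the file gives a reference count to compare against the script's own counter.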

Are there errors printed/logged by the script, e.g. a timeout being reached at varying points in the file?
Which version of Vortex is this (output of texis -version)? For the short-line-count runs, can you tell if the last data returned is from somewhere in the middle of the file, or is it indeed reaching EOF but misrepresenting the line count?
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

There are no errors, and the timeout is set to 3 hrs while the script takes 5 mins to run. It looks like it is truncating the input, i.e. somehow skipping lines or taking multiple lines as one line or something. The last record in the physical file is not the last record read in when it goes wrong, if you see what I mean. It is possible it is skipping records throughout the file, but I would have to investigate more carefully to see if that were the case.

Texis Web Script (Vortex) Copyright (c) 1996-2004 Thunderstone - EPI, Inc.
Commercial Version 5.00.1086121238 20040601 (i686-unknown-linux2.4.9-64-32)

Thanks
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

Here's some more info that might help.

Here's a dump from a diff file between a completely processed file and an incomplete one:

< 497297 : 462 bytes : <DescriptionHtml><![CDATA[Beautiful meticulously maintained center hall Colonial across from desirable Forest Ave school. Lovely open floor plan. Living room with fireplace and built-ins. Large modern eat-in-kitchen with center island and service bar opens to family room with fireplace. Bonus Florida room. Large deck. Central air on second floor. Built-ins galore. Separate tool house/potting shed. 1/2 block to shuttle to NY train.]]></DescriptionHtml>
> 497297 : 368 bytes : <DescriptionHtml><![CDATA[Beautiful meticulously maintained center hall Colonial across from desirable Forest Ave school. Lovely open floor plan. Living room with fireplace and built-ins. Large modern eat-in-kitchen with center island and service bar opens to family room with fireplace. Bonus Florida room. Large deck. Central air on second floor. Built-ins ga

Notice how in the 2nd case it just decided to truncate the line midway. The first number is the line number and the second number is the line length read by readln.
It then immediately went on to exit the <readln> loop and reported:
> Finished processing 10699 records out of 497297 lines.

So it just thought that the file ended right there on that line.
Any known issues with readln and large quantities of data?

Thanks
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Just a thought...
Maybe the OS is encountering a glitch and returning a read error which causes readln to stop. Are these files on a network drive such as nfs or windows share?
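
One way to test this theory outside Vortex is to read the file to EOF and compare the number of bytes actually returned with the size the filesystem reports. The Python sketch below does that; `check_full_read` is a hypothetical helper name, and note that Python itself raises `OSError` on a reported read error, so the comparison mainly catches a silent premature EOF.

```python
import os
from typing import Tuple

def check_full_read(path: str, chunk_size: int = 1 << 20) -> Tuple[int, int]:
    """Return (bytes_actually_read, size_reported_by_stat) for the file."""
    expected = os.stat(path).st_size
    seen = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                # An empty read is what EOF looks like to the caller.
                break
            seen += len(chunk)
    return seen, expected
```

If repeated runs against the same unchanged file return a first value smaller than the second, the short read is happening below the scripting layer.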
Fippy
Posts: 11
Joined: Fri Feb 09, 2007 7:15 pm

Post by Fippy »

It turns out that the path is mounted across NFS, yes. I didn't realize that. I suspect that NFS is not that keen about reading a long file line by line.

My testing indicates that simply copying the file to local disk first has solved the error.
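
A minimal sketch of that workaround in Python, assuming the copy step runs before the Vortex script is started. The function names and the destination directory are hypothetical. The verification re-reads the source over NFS, so it is a sanity check on the copy rather than a guarantee.

```python
import hashlib
import os
import shutil

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_local_verified(src: str, dst_dir: str) -> str:
    """Copy src into dst_dir and fail loudly if the copy does not match."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    if os.path.getsize(src) != os.path.getsize(dst):
        raise IOError("size mismatch after copy: %s" % dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError("checksum mismatch after copy: %s" % dst)
    return dst
```

The script would then be pointed at the returned local path instead of the NFS path.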

Thanks everyone.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Glad to hear you resolved it. As a side note, NFS shouldn't be that unreliable. You might want to check your mount options and network infrastructure.