Title: Intelligently reading a file one line at a time
Question: Have you ever needed to read a file one line at a time?
Answer:
Have you ever needed to read a file one line at a time? You can declare a TextFile and call ReadLn, but on a large file that's rather inefficient. You can also use the TStrings.LoadFromFile method, which is convenient, but it loads the entire file into memory. That's no problem for a small text file, but with a larger file it can be a real headache.
Introducing TabLineStreamer!
This object accepts any stream as input and reads it in 8K chunks (the size is set by the BlockSize constant). Reading a chunk of that size from a hard disk takes about as long as reading a single line. Each chunk is split into lines, which are held in an internal buffer (a TStringList) and returned one at a time by the GetLine function. When the buffer is empty, the next call to GetLine reads the next chunk from the stream.
This object can only be used in a forward-only mode; it simply allows you to process data one line at a time. In a future article I'll introduce a helper object that can read a CSV file efficiently.
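The splitting step described above can be sketched in isolation. This is a minimal illustration, not the unit's actual code: SplitCompleteLines is a hypothetical helper name, and the two string literals stand in for chunks read from a stream. It shows how a line that straddles a chunk boundary is carried over and completed by the next chunk.

```pascal
program ChunkSplitDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils;

// Hypothetical helper: moves every complete line of Data into Lines and
// returns the unfinished tail, which the caller prepends to the next chunk.
function SplitCompleteLines(const Data : String; Lines : TStrings) : String;
var
  Rest : String;
  BreakPos : Integer;
begin
  Rest := Data;
  BreakPos := Pos(#10, Rest);
  while BreakPos > 0 do
  begin
    // Trim drops the #13 that precedes #10 in Windows line endings
    Lines.Add(Trim(Copy(Rest, 1, BreakPos)));
    Delete(Rest, 1, BreakPos);
    BreakPos := Pos(#10, Rest);
  end;
  Result := Rest;
end;

var
  Lines : TStringList;
  Carry : String;
begin
  Lines := TStringList.Create;
  try
    // Two chunks whose boundary falls in the middle of the second line
    Carry := SplitCompleteLines('alpha'#13#10'be', Lines);
    Carry := SplitCompleteLines(Carry + 'ta'#13#10'gam', Lines);
    WriteLn(Lines.Count); // prints 2: the complete lines so far
    WriteLn(Lines[1]);    // prints beta: the line that spanned both chunks
    WriteLn(Carry);       // prints gam: unfinished tail waiting for more data
  finally
    Lines.Free;
  end;
end.
```

The carried-over tail is why the object keeps a string buffer between reads: a chunk boundary almost never lines up with a line break.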
A simple example:
var
  Streamer : TabLineStreamer;
  fStream : TFileStream;
  Line : String;
begin
  fStream := TFileStream.Create('c:\filetoread.csv', fmOpenRead or fmShareDenyNone);
  try
    Streamer := TabLineStreamer.Create(fStream);
    try
      // pbMax is assumed to be a TProgressBar on the form
      pbMax.Max := Streamer.Size;
      while not Streamer.EOF do
      begin
        // Get the line
        Line := Streamer.GetLine;
        // Do something with the line
        pbMax.Position := Streamer.Position;
        Application.ProcessMessages;
      end; // while
    finally
      Streamer.Free;
    end;
  finally
    fStream.Free;
  end;
end;
Properties and functions:
Create(Stream : TStream; OwnStream : Boolean = false);
The constructor accepts any TStream descendant, so it can be used with nearly any source (a file stream, a memory stream, etc.). The Boolean OwnStream parameter indicates whether the object should take ownership of the stream. If it is True, freeing the object also frees the stream it's maintaining.
GetLine : String;
This function returns the next available line.
Position : Integer;
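Returns the current position of the underlying stream. Because the data is read one block at a time, the value advances in block-sized steps, which is still fine for displaying a progress bar.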
Size : Integer;
Returns the size of the stream.
EOF : Boolean;
Returns True when every buffered line has been returned and the stream has reached the end of the file.
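As an illustration of the OwnStream parameter of Create, here is a minimal sketch that reads from an in-memory TStringStream instead of a file. It assumes the abStreams unit listed in this article compiles in your Delphi version and is on the search path; the sample text is made up for the example.

```pascal
program OwnStreamDemo;
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils, abStreams;
var
  Streamer : TabLineStreamer;
begin
  // OwnStream = True: the streamer takes ownership of the TStringStream
  Streamer := TabLineStreamer.Create(
    TStringStream.Create('first'#13#10'second'#13#10), True);
  try
    while not Streamer.EOF do
      WriteLn(Streamer.GetLine);
  finally
    Streamer.Free; // also frees the TStringStream
  end;
end.
```

With OwnStream left at its default of False, the caller keeps ownership and must free the stream after freeing the streamer, as in the file example above.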
I hope you found this article and function to be useful; I'd love to hear your comments, suggestions, etc.
-David Lederman
dlederman@InternetToolsCorp.com
The following is the source code for the functions described above. Feel free to use the code in your own programs, but please leave my name and address intact!
// ---------------------------ooo------------------------------ \\
// 2000 David Lederman
// dlederman@internettoolscorp.com
// ---------------------------ooo------------------------------ \\
unit abStreams;
interface
uses
  Classes, SysUtils;
const
  // Number of bytes read from the stream per block
  BlockSize = 8192;
type
  TabLineStreamer = class
  private
    DataStream : TStream;
    DataOwner : Boolean;
    CurrentLine, MaxLine, CurrentBufferLine : Integer;
    Buffer : String;
    BufferList : TStringList;
    procedure InternalBufferData;
    function GetUsableLines(DataToParse : String; var StringList : TStringList) : String;
  public
    constructor Create(Stream : TStream; OwnStream : Boolean = false);
    destructor Destroy; override;
    function GetLine : String;
    function Position : Integer;
    function Size : Integer;
    function EOF : Boolean;
  end;
implementation
{ TabLineStreamer }
constructor TabLineStreamer.Create(Stream : TStream; OwnStream : Boolean = false);
begin
  inherited Create;
  DataStream := Stream;
  CurrentLine := 0;
  MaxLine := 0;
  Buffer := '';
  BufferList := TStringList.Create;
  DataOwner := OwnStream;
  // Now prepare the stream for usage
  InternalBufferData;
end;
destructor TabLineStreamer.Destroy;
begin
  BufferList.Free;
  // Only free the stream if we were given ownership of it
  if DataOwner then FreeAndNil(DataStream);
  inherited;
end;
function TabLineStreamer.EOF: Boolean;
begin
  // Lines are still waiting in the buffer, so we are not at the end
  if CurrentLine < MaxLine then
  begin
    Result := False;
    Exit;
  end;
  // Otherwise we are done once the stream itself is exhausted
  Result := (Position = Size);
end;
function TabLineStreamer.GetLine: String;
begin
  // Refill the buffer once every buffered line has been handed out
  if CurrentLine = MaxLine then
  begin
    // Nothing left to read
    if EOF then
      raise Exception.Create('EOF: Out-of-range');
    InternalBufferData;
  end;
  // Now return the data
  Result := BufferList[CurrentBufferLine];
  Inc(CurrentBufferLine);
  Inc(CurrentLine);
end;
function TabLineStreamer.GetUsableLines(DataToParse: String;
  var StringList: TStringList): String;
var
  StartPos : Integer;
  Line : String;
begin
  // ---------------------------ooo------------------------------ \\
  // This function looks for each #10 line break (Trim removes
  // the preceding #13) and adds every complete line to the
  // stringlist. If a partial line remains then it is
  // returned and becomes the new buffer
  // ---------------------------ooo------------------------------ \\
  StartPos := Pos(#10, DataToParse);
  while StartPos > 0 do
  begin
    Line := Copy(DataToParse, 1, StartPos);
    Line := Trim(Line);
    StringList.Add(Line);
    Delete(DataToParse, 1, StartPos);
    StartPos := Pos(#10, DataToParse);
  end;
  Result := DataToParse;
end;
procedure TabLineStreamer.InternalBufferData;
var
  Chunk : AnsiString;
  DataRead : Integer;
begin
  BufferList.Clear;
  repeat
    // Step 1. Read up to BlockSize bytes straight into a string
    // (assumes single-byte ANSI text)
    SetLength(Chunk, BlockSize);
    DataRead := DataStream.Read(Chunk[1], BlockSize);
    SetLength(Chunk, DataRead);
    // Concat the buffers
    Buffer := Buffer + String(Chunk);
    // Step 2. Chop the data into a stringlist
    Buffer := GetUsableLines(Buffer, BufferList);
    // A final line without a trailing line break would otherwise be lost
    if (DataStream.Position >= DataStream.Size) and (Buffer <> '') then
    begin
      BufferList.Add(Trim(Buffer));
      Buffer := '';
    end;
  // Keep reading when a single line is longer than one block
  until (BufferList.Count > 0) or (DataRead = 0);
  // Step 3. Update the numbers
  Inc(MaxLine, BufferList.Count);
  CurrentBufferLine := 0;
end;
function TabLineStreamer.Position: Integer;
begin
  Result := DataStream.Position;
end;
function TabLineStreamer.Size: Integer;
begin
  Result := DataStream.Size;
end;
end.