Title: AnsiString speed-up tricks, and some code (1-Intro)
Question: How AnsiStrings work, some tricks and reusable code to reduce unnecessary reallocations.
Answer:
As you have probably seen in many places, Delphi's AnsiStrings (that is, the default string type in Win32 Delphi) are really pointers to a character array that is both reference-counted and #0-terminated. In this article I intend to explore some implications these characteristics bring to how you should code.
Note:
If you are not familiar with the inner workings of AnsiStrings, first read the short intro I added at the end. I would also welcome comments telling me where my explanation is not clear enough.
While these tricks can be applied everywhere, some of them --marked with "(!)"-- may decrease the legibility of your code and be a potential source of new bugs. So they should be used mainly when developing your library of code snippets and (probably) components --where anything you write will be debugged once and reused a lot--, or when you detect a bottleneck after profiling.
Trick #1: Constant strings
Many times I've found something like this:
procedure MyProc( Str : string );
There are very few cases where it should be written this way. While in old days' ShortStrings a call to MyProc:
MyProc( s );
would imply copying the string to a temporary buffer --which was why usage of const was highly recommended--, even today something like this is needed (and implicitly generated by the compiler):
procedure MyProc( Str : string );
begin
  _LStrAddRef( Str );
  // User code...
  _LStrClr( Str );
end;
This way, if inside MyProc some code tried to modify Str, it would detect that its RefCount is higher than one (that is, the string is shared), and would automatically create a copy (copy-on-write). So, you may think, what's the big deal? First, it's code running without need. Second, since MyProc could throw an exception, in reality the added code will be more like this:
procedure MyProc( Str : string );
begin
  _LStrAddRef( Str );
  try
    // User code...
  finally
    _LStrClr( Str );
  end;
end;
Which is not as efficient as what you get by simply adding a const before Str.
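For comparison, here is a minimal sketch of the const version (the body is only a placeholder). Because the compiler knows Str cannot be modified inside the routine, it should not need to emit the hidden reference-count calls or the implicit try..finally:
procedure MyProc( const Str : string );
begin
  // User code: Str is read-only here; assigning to it is a compile-time error
end;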
Trick #2: Reallocations are bad!
Every time you create a new object, dynamic array or string, or change the length of one by concatenating or calling SetLength, the memory manager kicks in, doing a lot of work behind the scenes, which besides slowing your program down can rapidly lead to memory fragmentation.
With pre-2006 Delphi versions' default memory manager, reallocations could get you into trouble real fast.
One easy and important step to speed up your application and reduce your memory footprint is to reduce unnecessary memory reallocations.
Trick #3: Use FastMM!
Even after reducing memory reallocations, a better memory manager will make your life a lot easier. So, if you're still not using FastMM, don't wait any longer, go get it! By installing its BorlndMM.dll version, your old Delphi will be noticeably faster. By using it within your apps, they might get faster too (if they allocate lots of memory, be it objects or strings, you should experience a speed-up.) You might even detect some new sneaky bugs in your code, especially with the debug build.
Trick #4: Don't concat!
You've probably seen something like this before:
function MyFunc( Count : integer ) : string;
var
  i : integer;
begin
  Result := ''; // Needed: Result is a hidden var parameter and is not guaranteed to start out empty
  for i := 1 to Count do
    Result := Result + char( i );
end;
Behind the scenes, it becomes:
for i := 1 to Count do
  _LStrCat( Result, _LStrFromChar( char(i) ) );
Now, if you take a look in System.pas, something like this should be happening inside LStrCat:
// Pseudocode: lenDest and lenSource stand for the current lengths of Dest and Source
procedure LStrCat( var Dest; const Source );
begin
  if ( Dest = nil ) or ( lenDest = 0 )
  then Dest := Source
  else
    begin
      SetLength( Dest, lenDest + lenSource );
      Move( Source, Dest+lenDest, lenSource );
    end;
end;
That is, the string is reallocated Count times, its contents are copied from their previous location up to Count - 1 times, and on every pass the new fragment (which, being a char, must first be converted to a string itself) is appended to the end of Dest. If you've been wondering why Delphi strings are so slow, now you know! ;)
We can fix it:
function MyFunc( Count : integer ) : string; // v2
var
  i : integer;
begin
  SetLength( Result, Count );
  for i := 1 to Count do
    Result[i] := char( i );
end;
And it becomes:
SetLength( Result, Count );
for i := 1 to Count do
begin
  _InternalUniqueString( Result );
  Result[i] := char( i );
end;
Which is still slower than it should be, due to the call to UniqueString. For a possible solution when you don't know beforehand the final size of the string, see the unit attached at the end.
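When the final size is not known beforehand, one common approach is to grow a buffer geometrically and trim it once at the end. What follows is only a minimal sketch of that idea under my own assumptions, not the attached unit; TStrAccum, Add and Value are hypothetical names introduced for illustration:
program StrAccumDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical helper: accumulates text in an over-allocated buffer
  TStrAccum = class
  private
    FBuf : string;  // Length( FBuf ) acts as the capacity
    FLen : integer; // Characters actually used
  public
    procedure Add( const S : string );
    function Value : string;
  end;

procedure TStrAccum.Add( const S : string );
var
  n, cap : integer;
begin
  n := Length( S );
  if n = 0 then
    Exit;
  cap := Length( FBuf );
  if FLen + n > cap then
  begin
    if cap < 16 then
      cap := 16;
    while cap < FLen + n do
      cap := cap * 2; // Doubling keeps the number of reallocations logarithmic
    SetLength( FBuf, cap );
  end;
  // Move counts bytes, hence the SizeOf( Char ) factor
  Move( S[1], FBuf[FLen + 1], n * SizeOf( Char ) );
  Inc( FLen, n );
end;

function TStrAccum.Value : string;
begin
  Result := Copy( FBuf, 1, FLen ); // One final copy, trimmed to the used part
end;

var
  acc : TStrAccum;
  i : integer;
begin
  acc := TStrAccum.Create;
  try
    for i := 1 to 1000 do
      acc.Add( 'x' );
    WriteLn( Length( acc.Value ) ); // Prints 1000
  finally
    acc.Free;
  end;
end.
The trade-off is some wasted capacity while the string is being built, in exchange for far fewer trips to the memory manager than appending one fragment at a time.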
Trick #5: PChars are faster (!)
So, how can we get rid of UniqueString? You might try:
function MyFunc( Count : integer ) : string; // v3
var
  i : integer;
begin
  SetLength( Result, Count );
  for i := 0 to Count - 1 do
    pchar(Result)[i] := char( i + 1 ); // pchars go from 0 to Len - 1
end;
This will indeed remove the call to UniqueString (and any checks on whether you're shooting yourself in the foot), but now there is a call to LStrToPChar, which we can still remove from the loop:
function MyFunc( Count : integer ) : string; // v4
var
  i : integer;
  szResult : pchar;
begin
  SetLength( Result, Count );
  szResult := pchar( Result );
  for i := 0 to Count - 1 do
    szResult[i] := char( i + 1 );
end;
Trick #6: Absolute vars are a nice way to typecast (though they don't work in .NET) (!)
All LStrToPChar does is check whether the string is empty, returning a pointer to a #0 in that case, or the same string otherwise (remember AnsiStrings are storage-compatible with PChars.) Since we know Result <> '' whenever the loop body runs (with Count = 0 the loop never executes), we can clean up the code, and delete the szResult variable:
function MyFunc( Count : integer ) : string; // v5
var
  i : integer;
  szResult : pchar absolute Result;
begin
  SetLength( Result, Count );
  for i := 0 to Count - 1 do
    szResult[i] := char( i + 1 );
end;
Note that this works anywhere you would use a typecast, removing the (usually) cumbersome syntax from your code:
procedure myTest( Obj : TObject );
var
  wEdit : TEdit absolute Obj;
  wPanel : TPanel absolute Obj;
begin
  // Crazy example:
  if Obj is TEdit
  then
    begin
      wEdit.Text := ...
      wEdit.ReadOnly := ...
    end
  else
    if Obj is TPanel
    then
      begin
        wPanel.BevelInner := ...
        wPanel.BevelOuter := ...
      end
    else ...
end;
Appendix: AnsiStrings Intro
Old Pascal strings (ShortString) are blocks of characters with a structure like this:
type
  ShortString[MaxLen] =
    packed record
      strLen : byte; // Accessed as s[0], didn't use a union to keep things more clear
      strChars : array[1..MaxLen] of char;
    end;
So, every string had a fixed size (MaxLen+1), a string could not hold more than 255 characters, and every time you copied a string the whole block was copied from one string to the other.
With an AnsiString, the string is really a reference to the block of characters, the string has a reference count (two strings can point to the same block), the length is an integer (so the max length is 2GB), and every string ends in #0 (so it can be cast to a pchar).
type
  PRealAnsiString = ^TRealAnsiString;
  TRealAnsiString =
    packed record
      strRefCount : integer;
      strLen : integer;
      strChars : array[0..strLen - 1] of char;
      strSZ : char = #0; // The last one is always #0
    end;
  AnsiString = @TRealAnsiString.strChars;
Let's take a look at some basic operations:
s1 := 'A String'; // Now s1 points to "A String"#0, strRefCount = 1, strLen = 8
s2 := s1; // Now both s1 & s2 point to the same block, strRefCount = 2, strLen = 8, no copy was made
s2[1] := 'B'; // Now s2 points to a new block holding "B String"#0, strRefCount is 1 in both strings
// The compiler-generated hidden code calls InternalUniqueString, which checks
// if strRefCount > 1, reserving a new block if needed, and copying the old contents
// to the new block. Then the normal code is executed and the first character is modified.
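The sharing and the copy-on-write step can be observed by comparing the raw pointers before and after the write. This is only a sketch: CopyOnWriteDemo is a hypothetical name, and it builds s1 with StringOfChar instead of a literal so that the string lives in a heap block (an assumption made to keep the check independent of how the compiler stores string constants):
procedure CopyOnWriteDemo;
var
  s1, s2 : string;
begin
  s1 := StringOfChar( 'A', 8 );             // Heap-allocated, reference-counted block
  s2 := s1;                                 // No copy: both variables share the block
  Assert( pointer( s1 ) = pointer( s2 ) );
  s2[1] := 'B';                             // The write triggers the hidden unique-string call
  Assert( pointer( s1 ) <> pointer( s2 ) ); // s2 now owns a private copy
  Assert( s1[1] = 'A' );                    // s1 is untouched
end;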
History:
January, 23/ Started tracking changes.
January, 23/ Changed "for i := 0 to Count - 1 do szResult[i-1] := char( i );" to "for i := 0 to Count - 1 do szResult[i] := char( i + 1 );" (Stupid mistake)