A story of optimization.
This article originally appeared in Delphi Developer
Copyright Pinnacle Publishing, Inc. All rights reserved.
Optimizing Object Pascal
This article examines optimizing Delphi code based on standard techniques, fully understanding the subject matter of the program and a case of sheer dumb luck. A poker library is used to illustrate these points.
This is a story about a simple poker library and optimizing code in Delphi. The poker library began as a Turbo Pascal project years ago, one that I had been meaning to convert to Delphi for some time. As it happened, Delphi 5 hit the shelves before I found time to make the conversion.
My first step was to redesign the library as an object-oriented project. The original code had been purely procedural. In its new format, the library provides three classes for programming poker games. These are, oddly enough - a card class, a deck class and a hand class. The deck and hand are descendants of a card list class. The library supports poker games, which use hands of five or more cards, has support for high and low variations and supports wild cards. It was this last feature that proved the most difficult. I ended up building a complete library with no wild card support and used this as the basis for the wild card version.
To validate the library, I built a series of nine test applications. These included programs to test the creation and shuffling of a deck of cards, to allow me to pick any five cards and evaluate them as a poker hand, a program that dealt and evaluated five-card hands of all standard type, a program that did the same for seven card hands, a video poker game, and four benchmarks to test the speed of the library with wild cards and without.
It took a couple of weeks to build the library and the process went smoothly. I was able to use a good portion of the old Pascal code. This proved to be both good and bad. Good, because it sped the conversion along. Bad, because I made a small mistake which I will explain later. In another couple of weeks the project was tested and complete except for one small problem. It was too slow to be of any use. Thus begins our adventure. The results of the first speed trials are listed in Table 1.
Benchmark
Hands per Second
Total Time
No Wild Cards
16,490
157.6
One Joker
3,453
830.9
Two Jokers
703
4,496.7
Deuces Wild
N/A
N/A
Table 1. Wild Card Library Benchmarks - Times are in seconds
Delphi 5 on a 400Mhz Pentium II with 128 megabytes of RAM
The benchmark programs are very simple. Each program creates all possible five-card hands in a deck of cards. It then calls the SetHighValues method of the hand object, which evaluates the hand and returns a high rating and a high name. The high rating is a long integer used as a bit field. The low order twenty bits contain the ranks of the cards in standard poker order. The high order eleven bits contain the hand type. (For a more detailed explanation, see the library source, which is available for download.) This allows a poker program to compare the values of two hands by comparing their high ratings. For example:
if Hand[1].HiRating > Hand[2].HiRating then
The benchmark program uses the high rating to determine the type of hand and keeps a count on all standard types and the total number of hands. This serves as further validation of the library routines. In a 52-card standard deck, there are 2,598,960 possible five-card hands, four royal flushes, 5,108 standard flushes etc. If the benchmark reports different numbers then we know there is a problem with the evaluation routines.
As you can see, each wild card used leads to roughly an 80% drop in speed. I didn’t bother running the deuces wild benchmark. The two jokers benchmark took roughly an hour and a quarter to complete. The deuces wild benchmark would have taken an estimated twenty-five hours to complete.
Making it Faster
The first step is to find what the problem areas are. The benchmarks themselves point to one area that needs work. The method used to rate wild card hands amounts to converting the wild card into all possible replacement cards and then calling the actual evaluation routines. It is clear that this could use some work.
To find other areas for improvement, I got out my trusty code profiler. I use StopWatch, part of Sleuth QA from TurboPower. There are others available and every programmer should have one. I ran the no wild cards benchmark under the profiler and waited for the report. Actually, I went to bed. Since the library is a wee bit slow, I ran the profile at night. The next morning, 273 million function calls later, the results were in. See Table 2.
Number of Calls
Procedure/Function
194,922,000 TcaaPCardList.GetCard
14,593,208 SwapCards
12,994,800 TcaaPokerHand.CopyCardFromDeck
7,787,360 IsStraight
5,197,916 IsStraightFlush
4,203,056 SortCardsLowToHigh
3,917,000 RankString
2,615,028 IsFlush
2,598,960 TcaaPokerHand.SetHighValues
2,598,960 SetHiValNoWC
Table 2. Ten most called routines in the poker library.
StopWatch returns a lot of information. There are timings on every routine called and the percentage of the total this amounts to. In this case, some of the most important information is in the table above.
The first thing of note, is that there are almost 195 million calls to the GetCard method. This is expected. Each hand evaluated by the SetHighValues method generates 25 calls to GetCard. Each call to CopyCardFromDeck generates ten more. The total is correct. So is the total for CopyCardFromDeck. Further, a quick glance at the listings indicates there isn’t much that can be changed in either of these routines. They are essentially just assignment statements.
It’s time to think like a poker player instead of a programmer. There are 2,598,960 possible five-card hands in a standard deck of cards. Forty of these are straight flushes and four of the straight flushes are royal flushes. There are 5,108 straights and 10,200 flushes in a standard deck. The numbers of calls to IsStraight, IsFlush and especially to IsStraightFlush are way out of line. Every hand is being tested three times to see if it is a straight. (Actually 2.996336996336996336996336996337 times, but why quibble?)
SetHiValNoWC is the workhorse of the poker library for evaluating high hands. It tests each hand passed to it and finds what type it is. Then it formats the hand, gives it a numeric rating and an appropriate name. To do this it calls the IsStraight function, and several others. It does this in a nice safe way, with the comparisons ordered from highest to lowest. That is where one of our problems lies.
Each hand is tested to see if is five of a kind. If it is, we are done. If not it is tested to see if it is a royal flush. This means testing to see if it is a straight, a flush, a straight flush and finally a royal. If it is a royal flush, it checks to see if it is a natural royal or if it uses wild cards. Either way, we are done. If it’s not a royal, it is then tested to see if it’s a straight flush, so the test for straights is repeated. See the problem? This is a bad algorithm.
Optimization 1
Our first improvement is to change the order of the tests. This time, instead of doing it the safe way, I will try to do it the clever way. There are likely to be many solutions. How many ways can ten tests be arranged? Ouch! Again, it is time to put on my poker hat.
Poker hands can be broken down into two distinct types. The first type contains two or more cards of the same rank. All pairs, two pair hands, trips, full houses, four of a kind and five of a kind hands fall into this category. In the second category are all hands with no cards of matching rank. These include royal flushes, straight flushes, straights, flushes and high card hands. (A high card hand is defined as five unrelated cards.)
This divides the problem into two smaller problems. In each case, we want to test a hand against a particular type only once and we would prefer to test for the rarer hands after the more common. Psuedo-code for a partial solution is in listing 1.
Listing 1. Pseudo-code for new hand testing order.
{test first group - hands with cards of same rank }
if isOnePair then
if isThreeOfAKind then
if isFourOfAKind then
if IsFiveOfAKind then
do five of a kind processing
else { not five, but four of a kind }
do four of a kind processing
end
end
else
if isFullHouse then
do full house processing
else { no, just trips }
do three of a kind processing
end { if isThreeOfAKind }
{ no trips, how about two pairs }
else
if isTwoPairs then
do two pair processing
end { if isTwoPairs }
else
do one pair processing
end { if isOnePair }
This looks like it should perform better but it has the problem of being a lot trickier to work with. After implementing this, I tested it and fixed the new bugs that were introduced. Using no wild cards, the library could now rate 25,800 hands per second. That’s a good improvement. However, it is still not good enough. Benchmarking for two wild cards would still take close to an hour - ten hours for four.
Optimization 2
Going back to the profile, I see that SwapCards is being called a lot. So we should check out its code and find the places from which it is being called. The code is plain vanilla. There is no way short of assembly language to make it faster. However, the sorting routine is calling SwapCards instead of doing its own swapping. The sort routine is listed below:
procedure SortCardsLowToHigh(var Hand: TcaaEvaluationHand);
var
leftCardPos, rightCardPos : integer;
begin
for leftCardPos := 2 to 5 do
for rightCardPos := 5 downto leftCardPos do
if Hand[rightCardPos-1].Rank >
Hand[rightCardPos].Rank then
SwapCards(Hand,rightCardPos-1,rightCardPos);
end;
Changing the code to do its own swapping will save a function call, each time a swap is made. The sort routine is itself called over four million times. That should give us some more speed. It may also be possible to improve the sort itself. I spent a little time trying other algorithms, but the theoretically faster sorts yielded no improvements and in many cases slowed things down. The reason for this is the length of the array combined with the overhead many of the other sorting methods.
Optimization 3 - Fixing a Mistake
It’s time to look at the IsFlush function and all of its siblings. The code is listed below:
function IsFlush(Hand: TcaaEvaluationHand): Boolean;
begin
Result := False;
if (Hand[1].Suit = Hand[2].Suit) and (Hand[1].Suit = Hand[3].Suit
and (Hand[1].Suit = Hand[4].Suit)
and (Hand[1].Suit = Hand[5].Suit) then Result := True;
end;
Nothing much to do here. Or is there? The Hand parameter is an array of records and it is being passed as a standard parameter - which means on the stack. Oops. Dumb mistake. This is code I copied form the original Pascal library. The only changes I made were assigning to the Result variable instead of the function name. I did something that wasn’t necessary instead of something that would have helped. Making Hand a const parameter will improve speed greatly.
I then went through the rest of the code to find any arrays, records or strings being passed to functions as standard parameters. Yes, I found a lot of them. With this fixed, the library can evaluate 38,900 hands per second using no wild cards. Using one joker gives a respectable 9,800 hands per second.
Optimization 4
I next looked at what the benchmark results are pointing to. Remember that each wild card is adding about an 80% speed penalty. The reasons for this were explained earlier. I waited to fix it because I hadn’t figured out what to do. Let's look at a small part of the code.
WildCards := NumWildCards(Hand);
case WildCards of
0 : { not interested in this }
1 : begin { One Wild Card }
for wc5 := caaClubAce to caaSpadeKing do
begin
TempHand := Hand;
TempHand[5].Rank := TcaaRank((wc5) mod 13);
TempHand[5].Suit := TcaaSuit((wc5) div 13);
Do I really need to test all fifty-two possible cards? It’s certainly the safest thing to do, as a first try. It works. But it sure is slow. Time to put on my poker hat again. All thirteen ranks must be tested. Do I have to test all four suits? No, I don’t. Here’s why. The only significance the suit has is in determining if the hand is some form of flush. The only way a suit can make a difference is if it matches the suit of the other cards. I can just check one of the (non-wild) cards and use only that suit. Here’s what the new code looks like.
WildCards := NumWildCards(Hand);
case WildCards of
0 : { not interested in this }
1 : begin { One Wild Card }
{ set start to start of suit in first card }
StartCardPos := Ord(Hand[1].Suit) * 13;
EndCardPos := StartCardPos + 12;
for wc5 := StartCardPos to EndCardPos do
begin
TempHand := Hand;
TempHand[5].Rank := TcaaRank((wc5) mod 13);
TempHand[5].Suit := TcaaSuit((wc5) div 13);
Now instead of testing fifty-two variants for each wild card I am only testing thirteen. This same modification needs to be made for two wild card and three wild card hands as well. (The routines for four and five wild card hands, don’t work the same way and don’t need to be changed.) The speed discrepancy between wild card and non wild card card card evaluations has been cut roughly in half. Using one Joker, the library now evaluates 22,000 hands per second and 12,000 hands per second with two jokers.
Optimization 5
The fifth optimization comes from the “I’d rather be lucky than good” school of programming. I was happy with the speed at this point, so I decided to test the library in other versions of Delphi. I started with Delphi 1. Since the string type in Delphi 1 is fixed and the default declaration is equivalent to a declaration of string[255], I created two new string types. One would hold the card name. The other would hold the hand name.
My experience with short strings has been that they are extremely fast. Inprise would probably like to phase the short string out. Multiple string types complicate implementing the language. Nevertheless, I often use them in my code and I saw no reason not to use them in all versions of the library. This would keep things simple. But, I wanted to test them. It turned out to be the single greatest “optimization” I could make. See Table 3.
Compiler
Time (Lower = Better)
Hands/Second (Higher = Better)
String Type
Delphi 1
204.6
12,705
short (what else?)
Delphi 2
22.2
117,001
short
Delphi 2
32.3
80,361
long
Delphi 3
49.9
52,112
short
Delphi 3
70.1
37,057
long
Delphi 4
54.0
48,111
short
Delphi 4
78.5
33,105
long
Delphi 5
23.5
110,759
shortM
Delphi 5
43.1
43,094
long
Table 3. Standard Poker Library Benchmark - All times are in Seconds. All tests were run on a 400Mhz Pentium II with 128 Megabytes of RAM.
The tests above were made against the standard version of the poker library. Similar speed differences exist in both. Short strings are faster using every version of Delphi. The most striking difference is in Delphi 5. Are short strings always faster? I am told that Delphi 5 converts short strings to the new dynamic strings before performing some operations. This should make short strings slower in those operations. In this case, strings are only used to provide names for cards and hands. This aids in debugging. However, the only manipulations being done are assignments and concatenations. See Table 4 for improvements using the wild card library.
Benchmark
Hands per Second
Total Time
No Wild Cards
97,161
26.7
One Joker
66,424
43.2
Two Jokers
41,719
75.8
Deuces Wild
14,016
185.4
Table 4. Wild Card Library Benchmark - Times are in seconds Delphi 5 on a 400Mhz Pentium II with 128 megabytes of RAM
What next? The library now appears fast enough. 14,016 hands per second on the slowest test should certainly be fast enough. We know it’s fast under Delphi 5. Let’s try it out under all versions and see what happens. Table 5 contains the results.
Delphi 1
Delphi 2
Delphi 3
Delphi 4
Delphi 5
No Wild Cards
12,526
108,920
44,267
40,274
97,161
One Joker
10,037
69,963
36,770
33,808
66,424
Two Jokers
7,336
43,095
27,846
25,995
41,719
Deuces Wild
2,976
13,590
11,766
11,440
14,016
Table 5. Wild Card Library Benchmark - Hands per second Delphi 5 on a 400Mhz Pentium II with 128 megabytes of RAM
A New Profile
As a final check, we’ll run the profiler again and see what we get. See Table 6.
Number of Calls
Procedure/Function
194,922,000
TcaaPCardList.GetCard
12,994,800
TcaaPokerHand.CopyCardFromDeck
4,203,056
SortCardsLowToHigh
3,917,000
RankString
2,598,960
MakeAcesLow
2,598,960
TcaaPokerHand.SetHighValues
2,598,960
SetHiValNoWC
1,876,344
SwapCards
1,125,312
MakeAcesHigh
1,098,240
SetCardOrderOnePair
Table 6. Ten most called routines after optimizations
This sure has changed. The good news is that none of the routines seem to be called more than they should. The better news is that it is probably time to stop. The library is fast enough. Could it be faster? Sure. There are several things that still could yield some extra speed. However, there doesn’t appear to be anything that will gain any huge increase.
From now on, small improvements will be the order of the day. I set up a new folder on my hard drive and copied the finished library there. This would serve as an experimental version, where ideas could be tested without breaking working code. Here are a few of the things I tried with the experimental version that have not made it to the production code.
Replacing the sorting routine with an “unrolled” bubble sort. This creates a long, ugly piece of code but gives a modest speed boost. About two to three percent.
Replacing the RankString function with an array of constants containing rank names. This also gives a modest increase in speed.
Commenting out the card name and hand name fields and associated functions. This yields a ten- percent speed improvement, at the expense of complicating debugging.
Adding {$MinEnumSize 4} compiler directive to 32 bit versions of library. This forces the compiler to make enumerated types 32 bit and adds a small speed increase.
Building hands in SetHighValues in sorted order. Bad idea. This slowed things down considerably.
Making the swap variable in the sorting routine global. This gives a very tiny speed increase.
Making the swap variable in the SwapCards routine global. This slowed the program down.
Replaced calls to hand naming routines with inline code with mixed results. Doing so for one and two pair hands sped things up. Adding the three of a kind routine slowed things down. (Puzzling!)
Finally, I removed the call to MakeAcesLow in the high hand evaluating routine. This adds a modest speed boost. Removing the code from the library altogether causes speed to fall. (Also puzzling! A problem with code location in memory?)
Source Code
Source code for the wild card library, a companion video poker library, a sample video poker game and all the utilities described in the article can be downloaded from http://charlesappel.home.mindspring.com/.