Title: Inline Assembler in Delphi (VII) - 128-bit integer arithmetic
Question: This article shows the use of inline assembler to work with 128-bit integers.
Answer:
Inline Assembler in Delphi (VII)
128-bit integer arithmetic
By Ernesto De Spirito edspirito@latiumsoftware.com
Part 1: Introduction
With 32 bits we can represent 2^32 different numbers, i.e. 4294967296 (~4 billion) different numbers, like signed integers from -2147483648 to +2147483647 or unsigned integers from 0 to 4294967295 (types Longint and Longword respectively).
That's enough for many purposes, like for example holding the position of a byte within a 4 GB file, but sometimes we need more than that, and there we have TLargeInteger (Windows unit) and Int64 (since Delphi 4) to represent 64-bit integers that can have 2^64 different values, i.e. 18446744073709551616 (~18 quintillion) values, from -9223372036854775808 to +9223372036854775807 (~9 quintillion, 18-19 decimal digits).
That number of digits is really more than enough for me, and right now I can't think of any practical use for more than that. Hey, not even Bill Gates counts his money in quintillions! ;) But from time to time I see someone in a forum asking for more digits than what the Int64 offers...
Anyway, whether useful or completely useless for a practical purpose, we'll see the implementation of many procedures and functions designed to work with 128-bit integers, which will serve as examples of the basic assembler instructions. These "large integers", "big integers" or "huge integers" can hold 2^128 different values (38-39 decimal digits).
Representation of the huge integer
I called the new type Hugeint, but for example Bigint (big integer) or Int128 could have been good names. Largeint (large integer) could be confused with the type in the Windows.pas unit which refers to a 64-bit integer.
When it comes to the representation of the new type, there are also many ways to do it. I decided the simplest is to represent it as an array of four 32-bit integers:
type
Hugeint = packed array [0..3] of Longword;
I also decided to use the little-endian format since it's the standard in the Intel architecture, and this means that the first element of the array (lowest address) will hold the low-order (least-significant) 32 bits of the large integer, and the last element of the array (highest address) will hold the high-order (most-significant) 32 bits of the large integer.
This is how the numbers 5 and 5000000000 ($12A05F200) would be represented:
       +---- Low-order 32 bits
       |
       v
+-------------+-------------+-------------+-------------+
|  $00000005  |  $00000000  |  $00000000  |  $00000000  | = 5
+-------------+-------------+-------------+-------------+
       0             1             2             3
+-------------+-------------+-------------+-------------+
|  $2A05F200  |  $00000001  |  $00000000  |  $00000000  | = 5000000000
+-------------+-------------+-------------+-------------+   $12A05F200
                                                 ^
                                                 |
                          High-order 32 bits ----+
Integers themselves are also stored in little-endian format (low-order byte first). If we see the byte representation of the numbers in a memory dump, it would look like this (byte values are represented in hexadecimal notation):
   $00000005
+-------------+-------------+-------------+-------------+
| 05 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | = 5
+-------------+-------------+-------------+-------------+
       0             1             2             3
+-------------+-------------+-------------+-------------+
| 00 F2 05 2A | 01 00 00 00 | 00 00 00 00 | 00 00 00 00 | = 5000000000
+-------------+-------------+-------------+-------------+   $12A05F200
   $2A05F200     $00000001
However, for almost all operations we can abstract away the byte order and treat the 32-bit integers as atomic units, since the processor handles the byte order transparently.
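To make the layout concrete outside Delphi, here is a minimal portable C sketch (not part of the article's source; `hugeint_t`, `huge_from_u64` and `huge_byte` are hypothetical names). It assumes the same four-limb, low-order-first order as Hugeint and derives each byte arithmetically, so it reproduces the memory dump above regardless of the host's byte order.

```c
#include <stdint.h>

/* Hypothetical C mirror of Hugeint: four 32-bit limbs,
   limb[0] holds the least-significant 32 bits (little-endian order). */
typedef struct { uint32_t limb[4]; } hugeint_t;

/* Zero-extend an unsigned 64-bit value into the four limbs. */
static hugeint_t huge_from_u64(uint64_t v)
{
    hugeint_t h;
    h.limb[0] = (uint32_t)v;          /* low-order 32 bits  */
    h.limb[1] = (uint32_t)(v >> 32);  /* next 32 bits       */
    h.limb[2] = 0;
    h.limb[3] = 0;                    /* high-order 32 bits */
    return h;
}

/* Byte i (0..15) of the little-endian memory image, computed from the
   limbs so the result does not depend on the host's byte order. */
static uint8_t huge_byte(const hugeint_t *h, unsigned i)
{
    return (uint8_t)(h->limb[i / 4] >> (8 * (i % 4)));
}
```

For 5000000000 this yields limbs $2A05F200 and $00000001, and bytes 00 F2 05 2A 01 00 00 00, matching the dump.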
A few useful instructions
Before we begin, let's see some useful instructions that we might use in this article (mainly in the continuation of this part). First, allow me to say that it isn't the purpose of this article to actually teach you assembler; all I can do in this limited space is show you examples of some instructions. For reference material, I recommend these links:
Intel 80386 Reference Programmer's Manual
An HTML version of this Intel manual. The pseudo-code helps explain the instructions and their effects on the flags. Excellent.
http://people.freebsd.org/~jhb/386htm/toc.htm
There are some broken links, but the pages are there. Try finding them in the directory index (http://people.freebsd.org/~jhb/386htm/).
iAPx86 - Norton Guide
Not as explanatory as the above document, but it covers all the instructions from the 8086 to the Pentium and Pentium Pro, with size and timing information not included in the above document.
http://www.clipx.net/ng/iapx86/index.php
The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
Instruction Set Reference
PDF Manual describing the instructions for the IA-32 processors (Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, and Xeon). Includes pseudo-code to explain the instructions and how they affect the flags in the flags register.
http://www.intel.com/design/pentium4/manuals/245471.htm
Optimization
How to optimize for the Pentium family of microprocessors
Excellent optimization guide written by Agner Fog
http://fatphil.org/x86/pentopt/index.html
Optimizations for Intel's 32-Bit Processors
Another excellent optimization guide.
http://x86.ddj.com/ftp/manuals/686/optimgd.pdf
OK, now let's get to the instructions.
Reference:
Z/ZF: Zero Flag
S/SF: Sign Flag
C/CF: Carry Flag
P/PF: Parity Flag
A/AF: Auxiliary Flag
s: sign bit (high-order bit)
o: odd bit (low-order bit)
x: bit value
0: namely, the value 0
1: namely, the value 1
r: bit is reversed from previous value
u: bit is unchanged from previous value
XX: unknown value (register, immediate, or memory reference)
In the examples, assume that before each operation AL holds sxxxxxxo (sign bit, 6 unknown bits, and odd bit).
Here are some instructions to begin:
SHL al,1 AL := xxxxxxo0 CF := s Shift left
SAL al,1 AL := xxxxxxo0 CF := s Synonym for SHL
SHR al,1 AL := 0sxxxxxx CF := o Shift right
SAR al,1 AL := ssxxxxxx CF := o Shift arithmetic right
SAR al,7 AL := ssssssss CF := x This extends the sign bit
ROL al,1 AL := xxxxxxos CF := s Rotate left
ROR al,1 AL := osxxxxxx CF := o Rotate right
RCL al,1 AL := xxxxxxoC CF := s Rotate thru carry left
RCR al,1 AL := Csxxxxxx CF := o Rotate thru carry right
AND al,al AL := uuuuuuuu CF := 0 Sets flags (see below)
AND al,-1 AL := uuuuuuuu CF := 0 -1 = $FF = 11111111
Sets flags (see below)
AND al,$01 AL := 0000000u CF := 0 $01 = 00000001
AND al,$80 AL := u0000000 CF := 0 $80 = 10000000
AND al,$5A AL := 0u0uu0u0 CF := 0 $5A = 01011010
AND al,0 AL := 00000000 CF := 0 XOR AL,AL or MOV AL,0 are better
TEST AL,XX AL := uuuuuuuu
TEST is like AND, but the result doesn't get stored in the
destination. The result is used to set flags (see below).
TEST AL,-1 It's usually better than AND AL,-1 and OR AL,AL because it
doesn't write to AL, which allows certain optimizations in
some cases.
OR al,al AL := uuuuuuuu CF := 0 Sets flags (see below)
OR al,$01 AL := uuuuuuu1 CF := 0 $01 = 00000001
OR al,$80 AL := 1uuuuuuu CF := 0 $80 = 10000000
OR al,$5A AL := u1u11u1u CF := 0 $5A = 01011010
OR al,-1 AL := 11111111 CF := 0 Same as MOV AL,-1 (but sets flags)
XOR al,al AL := 0 CF := 0 Use MOV AL,0 to preserve flags
XOR al,$5A AL := ururruru CF := 0 $5A = 01011010
XOR al,-1 AL := rrrrrrrr CF := 0 Same as NOT AL
Except for the rotation instructions (ROL, RCL, ROR, and RCR) all of the above set SF, ZF and PF based on the result of the operation:
SF = value of the high-order bit of the result
ZF = 1 ("set") if the result is zero, 0 ("cleared") otherwise
PF = 1 ("set") if the low-order byte of the result contains an even
number of 1 bits, 0 ("cleared") otherwise
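A small C model of those three rules may help when reading the tables. It is a sketch with hypothetical names and only covers an 8-bit result; for wider results SF still comes from the top bit of the result, while PF still looks only at its low-order byte.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the SF/ZF/PF rules for an 8-bit result. */
typedef struct {
    bool sf;  /* sign flag   */
    bool zf;  /* zero flag   */
    bool pf;  /* parity flag */
} flags8_t;

static flags8_t flags_from_result(uint8_t r)
{
    flags8_t f;
    unsigned ones = 0;
    for (unsigned b = 0; b < 8; b++)   /* count the 1 bits of the byte */
        ones += (r >> b) & 1u;
    f.sf = (r & 0x80u) != 0;           /* SF = high-order bit of the result */
    f.zf = (r == 0);                   /* ZF = 1 if the result is zero      */
    f.pf = (ones % 2) == 0;            /* PF = 1 if the count of 1s is even */
    return f;
}
```

For example, XOR AL,AL leaves 0 in AL, so it sets ZF and PF and clears SF.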
Let's see more instructions:
STC CF := 1 Set Carry Flag
CLC CF := 0 Clear Carry Flag
CMC CF := r Complement Carry Flag
LAHF AH := SZxAxPxC
SAHF Assuming AH is SZxAxPxC:
SF := S; ZF := Z; AF := A; PF := P; CF := C
SETc AL AL := CF Set if carry
SETs AL AL := SF Set if sign
SETz AL AL := ZF Set if zero
SETe AL AL := ZF Set if equal (synonym of SETZ)
SETp AL AL := PF Set if parity
SETpe AL AL := PF Set if parity even (synonym of SETP)
SETo AL AL := OF Set if overflow
SETnc AL AL := NOT CF Set if not carry
SETns AL AL := NOT SF Set if not sign
SETnz AL AL := NOT ZF Set if not zero
SETne AL AL := NOT ZF Set if not equal (synonym of SETNZ)
SETnp AL AL := NOT PF Set if not parity
SETpo AL AL := NOT PF Set if parity odd (synonym of SETNP)
SETno AL AL := NOT OF Set if not overflow
SETa (or SETNbe), SETae (or SETnb), SETb (or SETnae), SETbe (SETna),
SETg (or SETNle), SETge (or SETnl), SETl (or SETnge), and SETle
(SETng) set the destination byte to 1 or 0 depending on whether the
specified condition is met or not.
ADD AL,XX AL := AL+XX CF := 1 if operation generated a carry
0 otherwise
SUB AL,XX AL := AL-XX CF := 1 if operation needed a borrow
0 otherwise
SUB AL,0 AL := uuuuuuuu Set flags based on AL
SUB AL,AL AL := 0 Same as XOR AL,AL or MOV AL,0
CMP AL,XX CMP is like SUB, but the result doesn't get stored in the
destination. The operation simply sets the flags
ADC AL,XX AL := AL+XX+C CF := 1 if operation generated a carry
0 otherwise
SBB AL,XX AL := AL-C-XX CF := 1 if operation needed a borrow
0 otherwise
NEG AL AL := -AL CF := 1 if previous AL <> 0
NOT AL; INC AL is the same
NOT AL AL := rrrrrrrr CF := u XOR AL,-1 is the same
Conversion functions
These functions will help us understand the representation of these huge integers.
Longword to Hugeint
Let's start by converting a Longword into a huge integer. The lowest 32 bits of the result will be the 32 bits of the parameter and the higher 96 bits will be zero.
function UToHugeint(const x: Longword): Hugeint; overload;
// Result := Hugeint(x);
// Parameters: EAX = x; EDX = @Result;
asm
xor ecx, ecx // ECX := 0;
mov [edx+_0_], eax // Result[0] := x;
mov [edx+_1_], ecx // Result[1] := 0;
mov [edx+_2_], ecx // Result[2] := 0;
mov [edx+_3_], ecx // Result[3] := 0;
end;
Comments:
* "_0_", "_1_", "_2_", and "_3_"? What are these?
They are constants that represent the offsets of the four elements of
the array, allowing us to write cleaner code.
const
_0_ = 0;
_1_ = 4;
_2_ = 8;
_3_ = 12;
Longint to Hugeint
The lowest 32 bits of the result will be the 32 bits of the parameter. If the number is positive or zero, then the higher 96 bits will be 0, but if the number is negative, the higher 96 bits will be 1.
It might seem that we need to make a comparison or test the sign and then perform a conditional jump based on the result:
function ToHugeint(const x: Longint): Hugeint; overload;
// Result := Hugeint(x);
// Parameters: EAX = x; EDX = @Result;
asm
or eax, eax // EAX := EAX or EAX; // EAX remains unchanged
// Side effect: SF (Sign Flag) := EAX < 0;
mov ecx, 0 // ECX := 0;
jns @@not_negative // if not SF then goto @@not_negative;
dec ecx // ECX := ECX - 1; // 0 - 1 = -1 = $FFFFFFFF
@@not_negative:
mov [edx+_0_], eax // Result[0] := x;
mov [edx+_1_], ecx // Result[1] := ECX; // 0 or $FFFFFFFF
mov [edx+_2_], ecx // Result[2] := ECX; // 0 or $FFFFFFFF
mov [edx+_3_], ecx // Result[3] := ECX; // 0 or $FFFFFFFF
end;
Comments:
Notice the use of "MOV ECX, 0" instead of "XOR ECX, ECX" to avoid changing the state of the Sign Flag (SF) set in the preceding instruction (OR) and then used in the conditional jump that appears in the following instruction (JNS). Of course we could have changed the order of the operations for this to be unnecessary...
Instead of:
or eax, eax
jns @@not_negative
the following pairs of instructions would achieve the same:
* and eax, eax // EAX keeps the value, but SF gets the sign
jns @@not_negative // if SF = 0 then goto @@not_negative
* test eax, $80000000 // result will be zero only if sign bit is 0
jz @@not_negative // if ZF then goto @@not_negative
* test eax, $87654321 // any value with bit 31 (sign bit) set
jns @@not_negative // if SF = 0 then goto @@not_negative
* cmp eax, 0 // compares eax with 0
jge @@not_negative // if greater or equal then goto @@not_negative
Notice the use of "DEC ECX" to turn the value of ECX from $00000000 to $FFFFFFFF (by decrementing the value of the register). "NOT ECX" would have accomplished the same thing (by inverting the bits), at the same speed, and taking the same number of bytes to code the instruction, but NOT isn't a pairable instruction while DEC is. For this reason NOT is usually avoided and substituted as follows:
If you know beforehand that the previous value is 0, use DEC Dest
If you know beforehand that the previous value is $FFFFFFFF (-1), use INC Dest
If you don't know what the previous value is, use XOR Dest, -1
Also notice that the instructions are ordered so that none of them uses a register set by the instruction immediately before it. This is one of the conditions for pairing to occur. You'll find more information about instruction pairing in the optimization documents recommended above.
We can simplify the function thanks to the CDQ instruction, which extends the sign of EAX into EDX. This is basically how CDQ works:
if EAX >= 0 then
EDX := 0
else
EDX := $FFFFFFFF;
Here's a smaller and simpler implementation using CDQ:
function ToHugeint(const x: Longint): Hugeint; overload;
// Result := Hugeint(x);
// Parameters: EAX = x; EDX = @Result;
asm
mov ecx, edx // ECX := @Result;
cdq // EDX := IIF(x>=0, 0, $FFFFFFFF);
mov [ecx+_0_], eax // Result[0] := x;
mov [ecx+_1_], edx // Result[1] := EDX; // 0 or $FFFFFFFF
mov [ecx+_2_], edx // Result[2] := EDX; // 0 or $FFFFFFFF
mov [ecx+_3_], edx // Result[3] := EDX; // 0 or $FFFFFFFF
end;
CDQ is often replaced by a MOV/SAR pair, which has the advantage that the source doesn't have to be EAX and the destination doesn't have to be EDX (plus both instructions are pairable). Let's see an example:
function ToHugeint(const x: Longint): Hugeint; overload;
// Result := Hugeint(x);
// Parameters: EAX = x; EDX = @Result;
asm
mov ecx, eax // ECX := x;
sar ecx, 31 // ECX := IIF(x>=0, 0, $FFFFFFFF);
mov [edx+_0_], eax // Result[0] := x;
mov [edx+_1_], ecx // Result[1] := ECX; // 0 or $FFFFFFFF
mov [edx+_2_], ecx // Result[2] := ECX; // 0 or $FFFFFFFF
mov [edx+_3_], ecx // Result[3] := ECX; // 0 or $FFFFFFFF
end;
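For comparison, the same branch-free sign extension in portable C might look like this. The names are hypothetical, and the sketch derives the fill mask from the unsigned sign bit instead of an arithmetic right shift, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* Sign-extend a Longint to 128 bits without a branch.
   (uint32_t)x >> 31 is 1 for negative x and 0 otherwise; subtracting it
   from 0 gives $FFFFFFFF or 0, the same mask SAR ECX,31 produces. */
static hugeint_t huge_from_i32(int32_t x)
{
    uint32_t fill = 0u - ((uint32_t)x >> 31);
    hugeint_t h = { { (uint32_t)x, fill, fill, fill } };
    return h;
}

/* Zero-extend a Longword (the UToHugeint case). */
static hugeint_t huge_from_u32(uint32_t x)
{
    hugeint_t h = { { x, 0, 0, 0 } };
    return h;
}
```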
Hugeint to Longint
A Hugeint can be converted to a Longint by simply taking the low-order 32 bits. For the Hugeint value to be in the range of a Longint, its high-order 96 bits should be all 0 or all 1, matching the sign bit of the would-be result (bit 31), but the function doesn't check for that and performs the conversion blindly (in the same way that a Longint is converted to a Shortint, for example).
function ToLongint(const x: Hugeint): Longint; overload;
// Result := Longint(x);
// No exception is raised if the value is not within
// range (high-order 96 bits are discarded).
// Parameters: EAX = @x;
asm
mov eax, [eax+_0_] // Result := x[0];
end;
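If a caller does need the range check that ToLongint skips, it is cheap to express. The following C sketch (hypothetical names, same limb layout as Hugeint) checks that the high-order 96 bits all repeat bit 31 of the low-order limb.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* Same truncation as ToLongint: keep limb 0, discard the rest.
   memcpy reinterprets the bits as a signed value without relying on an
   implementation-defined unsigned-to-signed conversion. */
static int32_t huge_to_i32(const hugeint_t *h)
{
    int32_t r;
    memcpy(&r, &h->limb[0], sizeof r);
    return r;
}

/* The check ToLongint skips: the value fits in a Longint only when
   limbs 1..3 all repeat bit 31 of limb 0 (all zeros or all ones). */
static bool huge_fits_i32(const hugeint_t *h)
{
    uint32_t fill = 0u - (h->limb[0] >> 31);   /* 0 or $FFFFFFFF */
    return h->limb[1] == fill && h->limb[2] == fill && h->limb[3] == fill;
}
```

Note that +2147483648 (limb 0 = $80000000, other limbs 0) does not fit, even though its low 32 bits look like a valid Longint.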
Int64 to Hugeint
Int64 parameters are passed on the stack, so functions with an Int64 parameter will automatically create a stack frame. The lowest 64 bits of the result will be the 64 bits of the parameter, and the higher 64 bits of the result will extend the sign bit of the high-order integer that makes up the int64 value.
{$IFDEF DELPHI4}
function ToHugeint(const x: Int64): Hugeint; overload;
// Result := Hugeint(x);
// Parameters: x on the stack; EAX = @Result;
asm
mov edx, dword[x+_0_] // EDX := x[0];
mov ecx, dword[x+_1_] // ECX := x[1];
mov [eax+_0_], edx // Result[0] := x[0];
mov [eax+_1_], ecx // Result[1] := x[1];
sar ecx, 31 // ECX := IIF(x[1]>=0, 0, $FFFFFFFF);
mov [eax+_2_], ecx // Result[2] := ECX; // 0 or $FFFFFFFF
mov [eax+_3_], ecx // Result[3] := ECX; // 0 or $FFFFFFFF
end;
{$ENDIF}
Int64 values are stored in little-endian format, so the low-order integer comes first, at offset 0 from the base address of the variable, and the high-order integer comes second, at offset 4. In this case the base address of the variable is EBP+8 (see the first chapter of this series of articles), so the first element is at EBP+8 (EBP+8+0), and the second element is at EBP+12 (EBP+8+4). I could have used EBP+8 and EBP+12 to address the elements, but "x+_0_" and "x+_1_" refer to these addresses more transparently. The "DWORD" size specifier is mandatory because the assembler treats "x+_0_" and "x+_1_" as references to 64-bit data (since "x" itself is 64-bit data) and won't move the referenced value into a 32-bit register without it.
Hugeint to Int64
A Hugeint can be converted to an Int64 by simply taking the low-order 64 bits. For the Hugeint value to be in the range of an Int64, its high-order 64 bits should be all 0 or all 1, matching the sign bit of the would-be result (bit 63), but the function doesn't check for that and performs the conversion blindly:
{$IFDEF DELPHI4}
function ToInt64(const x: Hugeint): Int64; overload;
// Result := Int64(x)
// No exception is raised if the value is not within
// range (high-order 64 bits are discarded).
// Parameters: EAX = @x;
asm
mov edx, [eax+_1_] // EDX := x[1];
mov eax, [eax+_0_] // EAX := x[0];
// Result = EDX:EAX = x[1]:x[0]
end;
{$ENDIF}
Comment:
Int64 results are returned in EDX (high-order 32 bits) and EAX (low-order 32 bits).
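Both Int64 conversions can be sketched in C as follows. This is an illustration under the same limb layout, not the attached source, and the function names are hypothetical; memcpy is used so the signed reinterpretation is well defined in C.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* Like ToHugeint(Int64): limbs 0 and 1 take the 64-bit value, limbs 2
   and 3 repeat the sign bit of limb 1 (what SAR ECX,31 computes). */
static hugeint_t huge_from_i64(int64_t x)
{
    uint64_t u = (uint64_t)x;            /* two's-complement bit pattern */
    uint32_t hi = (uint32_t)(u >> 32);
    uint32_t fill = 0u - (hi >> 31);     /* 0 or $FFFFFFFF */
    hugeint_t h = { { (uint32_t)u, hi, fill, fill } };
    return h;
}

/* Like ToInt64: keep the low 64 bits (EDX:EAX = x[1]:x[0]), no range check. */
static int64_t huge_to_i64(const hugeint_t *h)
{
    uint64_t u = ((uint64_t)h->limb[1] << 32) | h->limb[0];
    int64_t r;
    memcpy(&r, &u, sizeof r);
    return r;
}
```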
More assembler instructions
In the source code example (attached) you'll find the implementation of some functions to operate with the Hugeint data type. The purpose is to exemplify the instructions we've seen so far along with some new ones:
BT (Bit Test):
BT dword ptr [eax], edx -- CF = value of the EDXth bit in the
memory pointed by EAX
BTS (Bit Test and Set):
BTS dword ptr [eax], edx -- sets to 1 the EDXth bit in the memory
pointed by EAX
CF = previous value of that bit
BTR (Bit Test and Reset):
BTR dword ptr [eax], edx -- sets to 0 the EDXth bit in the memory
pointed by EAX
CF = previous value of that bit
BTC (Bit Test and Complement):
BTC dword ptr [eax], edx -- toggles the value of the EDXth bit in
the memory pointed by EAX
CF = previous value of that bit
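With a memory operand and a register bit offset, these instructions treat memory as a bit string, so the offset may exceed 31 and address a bit in a later dword. That behavior can be modeled on a Hugeint in C as below; the helpers are hypothetical, and each returns the previous bit value, which is what the instruction leaves in CF.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* BT: read bit `bit` (0..127) of the 128-bit string. */
static bool huge_bt(const hugeint_t *h, unsigned bit)
{
    return (h->limb[bit / 32] >> (bit % 32)) & 1u;
}

/* BTS: set the bit, return its previous value. */
static bool huge_bts(hugeint_t *h, unsigned bit)
{
    bool old = huge_bt(h, bit);
    h->limb[bit / 32] |= (uint32_t)1 << (bit % 32);
    return old;
}

/* BTR: clear ("reset") the bit, return its previous value. */
static bool huge_btr(hugeint_t *h, unsigned bit)
{
    bool old = huge_bt(h, bit);
    h->limb[bit / 32] &= ~((uint32_t)1 << (bit % 32));
    return old;
}

/* BTC: toggle ("complement") the bit, return its previous value. */
static bool huge_btc(hugeint_t *h, unsigned bit)
{
    bool old = huge_bt(h, bit);
    h->limb[bit / 32] ^= (uint32_t)1 << (bit % 32);
    return old;
}
```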
We won't reproduce the functions here since you can find them in the source code attached, but we'll show different possible implementations of the function _IsNeg, simply to provide more examples of the instructions we've seen so far:
function _IsNeg(x: Hugeint): boolean;
// Result := x < 0;
// Parameters: EAX = @x
asm
mov eax, [eax+_3_] // EAX := High order 32 bits of x
shr eax, 31 // AL := High order bit of EAX (sign bit)
end;
function _IsNeg(x: Hugeint): boolean;
asm
cmp dword ptr [eax+_3_], 0 // if x[3] < 0 then
jl @@negative // goto @@negative;
mov al, 0 // Result := False;
ret // exit;
@@negative: // @@negative:
mov al, 1 // Result := True;
end;
function _IsNeg(x: Hugeint): boolean;
asm
// set the Sign Flag and then put it in AL
mov eax, [eax+_3_] // EAX := High order 32 bits of x
or eax, eax // SF := Sign bit of EAX
// alt.: add eax, 0
// also: sub eax, 0
// also: and eax, eax
// also: and eax, -1 // or any negative value
// also: test eax, eax
// also: test eax, -1 // or any negative value
sets al // AL := SF; // Sign Flag
// alt.: lahf; shr ax, 15
// also: lahf; rol ax, 1; and al, $1
end;
function _IsNeg(x: Hugeint): boolean;
asm
// set the Carry Flag with the Sign Bit to put it in AL
mov eax, [eax+_3_] // EAX := High order 32 bits of x
bt eax, 31 // CF := Sign bit of EAX
// alt.: shl/rol/rcl eax, 1
setc al // AL := CF; // Carry Flag
// alt.: mov al, 0; rcl al, 1
// also: mov al, 0; adc al, al
// also: lahf; mov al, ah; and al, $1
// also: lahf; ror/rcr/shr/sar ax, 1; shr al, 7
// also: lahf; ror/shr/sar ax, 8; and al, $1
// also: lahf; rol ax, 8; and al, $1
// also: lahf; rcl ax, 9; and al, $1
end;
function _IsNeg(x: Hugeint): boolean;
asm
// set the Parity Flag and then negate it in AL
mov al, [eax+_3_+3] // AL := High order 8 bits of x
and al, $80 // PF := Not Sign bit (only the sign bit is left)
// alt.: test al, $80
setnp al // AL := Not PF; // Not Parity Flag
// alt.: lahf; rol ax, 6 / rcl ax, 7;
// xor al,-1 / not al; and al, $1;
// also: lahf; ror/shr/sar ax, 10 / rcr ax, 10;
// xor al,-1 / not al; and al, $1;
end;
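As a sanity check of the parity trick, this C sketch (hypothetical names; PF is modeled on the low byte only, as the processor computes it) compares it with the plain shift version for every possible high-order byte. Note that the trick depends on masking with AND AL,$80 (or TEST AL,$80): OR AL,$7F would leave PF equal to the sign bit itself, so SETNP would return the opposite of the intended result.

```c
#include <stdint.h>
#include <stdbool.h>

/* PF: 1 when the low-order byte has an even number of 1 bits. */
static bool parity_flag(uint8_t r)
{
    unsigned ones = 0;
    for (unsigned b = 0; b < 8; b++)
        ones += (r >> b) & 1u;
    return (ones % 2) == 0;
}

/* _IsNeg, SHR version: the sign bit of the high-order limb. */
static bool is_neg_shift(uint32_t high_limb)
{
    return (high_limb >> 31) != 0;
}

/* _IsNeg, parity version: AND AL,$80 keeps only the sign bit, so PF is 1
   exactly when that bit is 0, and SETNP turns PF back into the sign bit. */
static bool is_neg_parity(uint32_t high_limb)
{
    uint8_t al = (uint8_t)(high_limb >> 24);      /* byte at offset _3_+3 */
    return !parity_flag((uint8_t)(al & 0x80u));   /* SETNP: AL := not PF  */
}
```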
In the next part we'll see functions to add, subtract, multiply and divide huge integers.
Part 2: The four fundamental operations
In this second and last part we'll finally get to the actual arithmetic, with the four fundamental operations (addition, subtraction, multiplication and division).
Before getting into them, I'd like to say that the procedures and functions introduced in the preceding two parts have been corrected and further optimized. I still haven't been able to test them as much as I'd have liked to. If you find any bugs or have any comments about the source code, please drop me an email.
Addition
How do we add two numbers, each made up of four 32-bit integers? Well, it's actually pretty easy. We simply add them in the same way that we would add two numbers of four decimal digits (like 3597 and 0015, for instance), except that here each "digit" can take 2^32 (about 4 billion) different values instead of just ten. The algorithm would be like this:
function AddWithCarry(x: Longint; y: Longint;
var Carry: Boolean): Longint; forward;
function HugeAdd(x: Hugeint; y: Hugeint): Hugeint;
// Result := x + y;
var
Carry: Boolean;
begin
Carry := False;
Result[0] := AddWithCarry(x[0], y[0], Carry);
Result[1] := AddWithCarry(x[1], y[1], Carry);
Result[2] := AddWithCarry(x[2], y[2], Carry);
Result[3] := AddWithCarry(x[3], y[3], Carry);
end;
AddWithCarry is a fictitious function that returns the low-order 32 bits of the sum of the first two arguments, plus 1 if Carry (the third argument) is True. It also stores True or False in Carry (passed by reference) depending on whether the addition generated a carry (or whether the carry is 1 or 0, if you want to see it that way). Actually, this function doesn't have to be fictitious:
function AddWithCarry(x: Longint; y: Longint;
var Carry: Boolean): integer;
asm
// if Carry then CF := 1 else CF := 0;
test byte ptr [ecx], -1 // Side-effect: CF := 0;
jz @@NoCarry
stc // CF := 1;
@@NoCarry:
// Result := x + y + CF; CF := GeneratedCarry;
adc eax, edx
// Carry := CF;
setc byte ptr [ecx]
end;
It would be more efficient to have HugeAdd coded entirely in assembler:
function HugeAdd(x: Hugeint; y: Hugeint): Hugeint;
// Result := x + y;
// Parameters: EAX = @x; EDX = @y; ECX = @Result
asm
push esi
mov esi, [eax+_0_] // ESI := x[0];
add esi, [edx+_0_] // ESI := ESI + y[0];
mov [ecx+_0_], esi // Result[0] := ESI;
mov esi, [eax+_1_] // ESI := x[1];
adc esi, [edx+_1_] // ESI := ESI + y[1] + Carry;
mov [ecx+_1_], esi // Result[1] := ESI;
mov esi, [eax+_2_] // ESI := x[2];
adc esi, [edx+_2_] // ESI := ESI + y[2] + Carry;
mov [ecx+_2_], esi // Result[2] := ESI;
mov esi, [eax+_3_] // ESI := x[3];
adc esi, [edx+_3_] // ESI := ESI + y[3] + Carry;
mov [ecx+_3_], esi // Result[3] := ESI;
pop esi
end;
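For readers who want to check HugeAdd against a reference, here is a portable C sketch of the same carry chain (hypothetical names). Bit 32 of the 64-bit intermediate sum plays the role of the Carry Flag.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* AddWithCarry: one 32-bit "digit" plus the incoming carry. The 64-bit
   sum has at most 33 significant bits; bit 32 is the outgoing carry,
   which is what ADC leaves in CF. */
static uint32_t add_with_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint64_t sum = (uint64_t)x + y + *carry;
    *carry = (uint32_t)(sum >> 32);
    return (uint32_t)sum;
}

/* HugeAdd: add limb by limb from low to high. The carry out of limb 3 is
   dropped, so the result wraps modulo 2^128 (also right for two's-complement
   signed values, as long as the true sum fits). */
static hugeint_t huge_add(hugeint_t x, hugeint_t y)
{
    hugeint_t r;
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++)
        r.limb[i] = add_with_carry(x.limb[i], y.limb[i], &carry);
    return r;
}
```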
Subtraction
Subtraction works very much like addition, but instead of generating a carry, the operation generates a borrow (also represented by the Carry Flag) if the minuend (first operand) is less than the subtrahend (second operand):
function SubtractWithBorrow(x: Longint; y: Longint;
var Borrow: Boolean): Longint; forward;
function HugeSub(x: Hugeint; y: Hugeint): Hugeint;
// Result := x - y;
var
Borrow: Boolean;
begin
Borrow := False;
Result[0] := SubtractWithBorrow(x[0], y[0], Borrow);
Result[1] := SubtractWithBorrow(x[1], y[1], Borrow);
Result[2] := SubtractWithBorrow(x[2], y[2], Borrow);
Result[3] := SubtractWithBorrow(x[3], y[3], Borrow);
end;
function SubtractWithBorrow(x: Longint; y: Longint;
var Borrow: Boolean): Longint;
asm
// if Borrow then CF := 1 else CF := 0;
test byte ptr [ecx], -1 // Side-effect: CF := 0;
jz @@NoBorrow
stc // CF := 1;
@@NoBorrow:
// Result := x - y - CF; CF := NeededBorrow;
sbb eax, edx
// Borrow := CF;
setc byte ptr [ecx]
end;
You should now be ready to write a pure assembler version of HugeSub: it is the same as HugeAdd, except that ADD and ADC are replaced with SUB and SBB respectively.
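As a reference to test such a routine against, here is a C sketch of HugeSub with the borrow chain written out explicitly (names are hypothetical; it is not the attached source).

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* SubtractWithBorrow: x - y - borrow_in. The outgoing borrow is 1 when
   that difference is negative, which is what SBB leaves in CF. */
static uint32_t sub_with_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    uint64_t need = (uint64_t)y + *borrow;            /* amount to take away  */
    uint32_t diff = (uint32_t)((uint64_t)x - need);   /* low 32 bits, wraps   */
    *borrow = (uint64_t)x < need;                     /* 1 if we went below 0 */
    return diff;
}

/* HugeSub: same limb loop as HugeAdd with SUB/SBB semantics. A borrow
   out of limb 3 is dropped, so the result wraps modulo 2^128. */
static hugeint_t huge_sub(hugeint_t x, hugeint_t y)
{
    hugeint_t r;
    uint32_t borrow = 0;
    for (int i = 0; i < 4; i++)
        r.limb[i] = sub_with_borrow(x.limb[i], y.limb[i], &borrow);
    return r;
}
```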
Opposite number
Given a number, these implementations of HugeNeg return its opposite (the two's complement):
function HugeNeg(x: Hugeint): Hugeint;
begin
// Result := (Not x) + 1;
Result := HugeAdd(HugeNot(x), IntToHuge(1));
end;
function HugeNeg(x: Hugeint): Hugeint;
begin
// Result := 0 - x;
Result := HugeSub(IntToHuge(0), x);
end;
The second one is the simplest and fastest because it involves a single operation, and now that we know how to subtract, we can implement it in assembler:
function HugeNeg(x: Hugeint): Hugeint;
// Result := -x;
// Parameters: EAX = @x; EDX = @Result
asm
// Result := 0 - x;
push esi
xor esi, esi
mov ecx, [eax+_0_] // x[0]
sub esi, ecx // 0 - x[0]
mov ecx, 0
mov [edx+_0_], esi // Result[0]
mov esi, [eax+_1_] // x[1]
sbb ecx, esi // 0 - x[1] - Borrow
mov esi, 0
mov [edx+_1_], ecx // Result[1]
mov ecx, [eax+_2_] // x[2]
sbb esi, ecx // 0 - x[2] - Borrow
mov ecx, 0
mov [edx+_2_], esi // Result[2]
mov esi, [eax+_3_] // x[3]
sbb ecx, esi // 0 - x[3] - Borrow
mov [edx+_3_], ecx // Result[3]
pop esi
end;
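Both Pascal formulations of HugeNeg can be written in C to confirm they agree. This is a sketch with hypothetical names: the first function follows the SUB/SBB chain of the assembler version, the second the (not x) + 1 form.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* HugeNeg as 0 - x, limb by limb with a borrow. */
static hugeint_t huge_neg(hugeint_t x)
{
    hugeint_t r;
    uint32_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t need = (uint64_t)x.limb[i] + borrow;
        r.limb[i] = (uint32_t)(0 - need);   /* 0 - x[i] - borrow, modulo 2^32 */
        borrow = need != 0;                 /* negative unless need == 0      */
    }
    return r;
}

/* HugeNeg as (not x) + 1; the "+ 1" enters as the initial carry. */
static hugeint_t huge_neg_not_inc(hugeint_t x)
{
    hugeint_t r;
    uint32_t carry = 1;
    for (int i = 0; i < 4; i++) {
        uint64_t s = (uint64_t)(uint32_t)~x.limb[i] + carry;
        r.limb[i] = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
    return r;
}
```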
Multiplication
One way of multiplying numbers is by means of an addition loop:
function HugeMul(x: Hugeint; y: Hugeint): Hugeint;
begin
SetZero(Result);
while not HugeIsZero(y) do begin
Result := HugeAdd(Result, x);
y := HugeSub(y, IntToHuge(1));
end;
end;
Computationally speaking, this algorithm is quite poor. For example, if the value of "y" were 4 million, the loop would repeat 4 million times! Still, the idea would be good if we could somehow accelerate the process. Let's play a little bit with algebra:
x * y = x * (y[3]*2^96 + y[2]*2^64 + y[1]*2^32 + y[0]*2^0)
= (x*y[3])*2^96 + (x*y[2])*2^64 + (x*y[1])*2^32 + (x*y[0])*2^0
Now we have reduced the problem of multiplying two Hugeint numbers to multiplying a Hugeint number by a 32-bit integer. We multiply the first operand by the four integers that make up the second operand and then we shift the partial results by 0, 32, 64, and 96 bits (to multiply them by 2^0, 2^32, 2^64 and 2^96), and finally we add these values to get the final result.
function HugeMulInt(x: Hugeint; y: Longint): Hugeint; forward;
function HugeMul(x: Hugeint; y: Hugeint): Hugeint;
begin
Result := HugeAdd(
HugeAdd(HugeShl(HugeMulInt(x, y[3]), 96),
HugeShl(HugeMulInt(x, y[2]), 64)),
HugeAdd(HugeShl(HugeMulInt(x, y[1]), 32),
HugeMulInt(x, y[0])));
end;
This is exactly the way we multiply decimal numbers when performing calculations on paper, except that here the base is 2^32 instead of ten. Let's now see how we can multiply a Hugeint by an integer:
function MultiplyWithCarry(x: Longint; y: Longint;
var Carry: Longint): Longint; forward;
function HugeMulInt(x: Hugeint; y: Longint): Hugeint;
// Result := x * y;
var
Carry: Longint;
begin
Carry := 0;
Result[0] := MultiplyWithCarry(x[0], y, Carry);
Result[1] := MultiplyWithCarry(x[1], y, Carry);
Result[2] := MultiplyWithCarry(x[2], y, Carry);
Result[3] := MultiplyWithCarry(x[3], y, Carry);
end;
function MultiplyWithCarry(x: Longint; y: Longint;
var Carry: Longint): integer;
// Result := LoDWord(x * y + Carry);
// Carry := HiDWord(x * y + Carry);
asm
// EDX:EAX := EAX * EDX; // x * y
mul edx
// Inc(EDX:EAX, Carry);
add eax, [ecx]
adc edx, 0
// Carry := EDX; // High order 32 bits of the result
mov [ecx], edx;
end;
MultiplyWithCarry is very much like AddWithCarry, but it performs a multiplication instead of an addition, and it generates a carry of 32 bits instead of just one bit (the multiplication of two 32-bit values generates a 64-bit result, while the addition of two 32-bit values can generate a 33-bit result).
MultiplyWithCarry first performs an unsigned multiplication of "x" (EAX) by "y" (EDX), using the MUL instruction. The result is a 64-bit unsigned integer in EDX:EAX, to which the function adds the Carry passed as a parameter. The function returns the lower 32 bits of this final result (located in EAX), and the higher 32 bits (EDX) constitute the carry for the next multiplication, which is stored in the Carry parameter (passed by reference).
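The same two levels, MultiplyWithCarry inside HugeMulInt and then HugeMulInt combined into a full product, can be sketched in portable C. This is an unsigned illustration with hypothetical names, not the signed, optimized assembler version in the attached source; like HugeMul, it keeps only the low 128 bits of the product.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* MultiplyWithCarry: 32x32 -> 64-bit product plus a 32-bit carry-in.
   It cannot overflow: (2^32-1)^2 + (2^32-1) < 2^64. */
static uint32_t mul_with_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint64_t p = (uint64_t)x * y + *carry;
    *carry = (uint32_t)(p >> 32);   /* high half: carry for the next limb */
    return (uint32_t)p;             /* low half: this limb of the result  */
}

/* HugeMulInt (unsigned); the final carry is discarded (mod 2^128). */
static hugeint_t huge_mul_int(hugeint_t x, uint32_t y)
{
    hugeint_t r;
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++)
        r.limb[i] = mul_with_carry(x.limb[i], y, &carry);
    return r;
}

/* Schoolbook HugeMul (unsigned, truncated to 128 bits): add x*y[j]
   shifted left by j limbs (32*j bits); limbs past 3 are dropped. */
static hugeint_t huge_mul(hugeint_t x, hugeint_t y)
{
    hugeint_t r = { { 0, 0, 0, 0 } };
    for (int j = 0; j < 4; j++) {
        hugeint_t part = huge_mul_int(x, y.limb[j]);
        uint32_t carry = 0;
        for (int i = j; i < 4; i++) {
            uint64_t s = (uint64_t)r.limb[i] + part.limb[i - j] + carry;
            r.limb[i] = (uint32_t)s;
            carry = (uint32_t)(s >> 32);
        }
    }
    return r;
}
```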
An assembler implementation of HugeMul and HugeMulInt can be found in the attached source code. For simplicity, the functions in the examples above treat the numbers as unsigned, but the assembler implementations handle signed numbers. Also, the attached version of HugeMul doesn't call HugeMulInt or HugeShl, and is highly optimized.
Alternatively, instead of considering a Hugeint as four 32-bit integers multiplied by four powers of 2^32, we can consider it as 128 1-bit integers multiplied by 128 powers of 2:
bit127 * 2^127 + bit126 * 2^126 + ... + bit1 * 2^1 + bit0 * 2^0
Since each bit can only be 0 or 1, the algorithm shown above can be greatly simplified:
function HugeMul(x: Hugeint; y: Hugeint): Hugeint;
// Result := x * y;
var
i: Longint;
begin
SetZero(Result);
for i := 0 to 127 do
if BitTest(y, i) then
Result := HugeAdd(Result, HugeShl(x, i));
end;
The idea is to add "x" multiplied by different powers of 2, depending on which bits are set in "y". For example, if "y" were 20, bits 4 and 2 would be set (20 in decimal is 10100 in binary), so only two additions would be performed, and the result would be HugeShl(x, 2) plus HugeShl(x, 4).
This algorithm can be coded quite efficiently in assembler, but still the first algorithm will work faster. The reason why I've shown this is because it'll make it easier to understand the algorithm we'll use for divisions.
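A C sketch of the bit-by-bit method (hypothetical names). Rather than computing HugeShl(x, i) from scratch for each bit, it shifts x left by one bit per iteration, which adds the same terms. Since the low 128 bits of a product are the same whether the operands are read as signed or unsigned two's-complement values, the truncated result also matches signed multiplication when the true product fits.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* Shift left by one bit across all four limbs (bit 127 falls off). */
static hugeint_t huge_shl1(hugeint_t x)
{
    hugeint_t r;
    r.limb[3] = (x.limb[3] << 1) | (x.limb[2] >> 31);
    r.limb[2] = (x.limb[2] << 1) | (x.limb[1] >> 31);
    r.limb[1] = (x.limb[1] << 1) | (x.limb[0] >> 31);
    r.limb[0] = x.limb[0] << 1;
    return r;
}

static hugeint_t huge_add(hugeint_t x, hugeint_t y)
{
    hugeint_t r;
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = (uint64_t)x.limb[i] + y.limb[i] + carry;
        r.limb[i] = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
    return r;
}

/* For every set bit i of y, add x << i (the result wraps modulo 2^128). */
static hugeint_t huge_mul_bits(hugeint_t x, hugeint_t y)
{
    hugeint_t r = { { 0, 0, 0, 0 } };
    for (unsigned i = 0; i < 128; i++) {
        if ((y.limb[i / 32] >> (i % 32)) & 1u)
            r = huge_add(r, x);   /* x currently holds the original x << i */
        x = huge_shl1(x);
    }
    return r;
}
```

With y = 20 only bits 4 and 2 are set, so only two additions take place.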
Division
Let's first see the case of a division of a Hugeint by a 32-bit integer, which should be easy to understand:
function DivideWithRemainder(x: Longint; y: Longint;
var Remainder: Longint): Longint; forward;
function HugeDivInt(x: Hugeint; y: Longint): Hugeint;
// Result := x div y;
var
Remainder: Longint;
begin
Remainder := 0;
Result[3] := DivideWithRemainder(x[3], y, Remainder);
Result[2] := DivideWithRemainder(x[2], y, Remainder);
Result[1] := DivideWithRemainder(x[1], y, Remainder);
Result[0] := DivideWithRemainder(x[0], y, Remainder);
asm
mov edx, Remainder
end;
end;
function DivideWithRemainder(x: Longint; y: Longint;
var Remainder: Longint): Longint;
// Result := Remainder:x div y;
// Remainder := Remainder:x mod y;
asm
push esi
mov esi, edx // y
mov edx, [ecx] // Remainder
// EAX := EDX:EAX div ESI;
// EDX := EDX:EAX mod ESI;
div esi
// Remainder := EDX;
mov [ecx], edx;
pop esi
end;
HugeDivInt leaves the remainder of the division in EDX, so it can be used in a function returning the remainder of the division:
function HugeModInt(dividend: Hugeint; divisor: Longint): Longint;
// Result := dividend mod divisor;
// Parameters: EAX = @dividend; EDX = divisor;
asm
sub esp, TYPE(Hugeint) // Make room on the stack for a Hugeint
mov ecx, esp // to hold the result of the division
call HugeDivInt // Perform the division
add esp, TYPE(Hugeint) // Restore the stack pointer
mov eax, edx // Result := Remainder; // was left in EDX
end;
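The division by a 32-bit value can be sketched in portable C as follows (unsigned, hypothetical names). It returns the remainder through a pointer instead of leaving it in EDX, so one function covers both HugeDivInt and HugeModInt.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* DivideWithRemainder: divide the 64-bit value rem:x by y, like DIV with
   EDX:EAX. Because rem < y on entry, the quotient always fits in 32 bits. */
static uint32_t div_with_remainder(uint32_t x, uint32_t y, uint32_t *rem)
{
    uint64_t n = ((uint64_t)*rem << 32) | x;
    *rem = (uint32_t)(n % y);
    return (uint32_t)(n / y);
}

/* Unsigned HugeDivInt: walk from the high-order limb down, storing each
   quotient limb at the same index as the dividend limb it came from.
   y must be non-zero; the final remainder is stored in *rem. */
static hugeint_t huge_div_int(hugeint_t x, uint32_t y, uint32_t *rem)
{
    hugeint_t q;
    uint32_t r = 0;
    for (int i = 3; i >= 0; i--)
        q.limb[i] = div_with_remainder(x.limb[i], y, &r);
    *rem = r;
    return q;
}
```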
For the case of two huge integers we could think of an algorithm like the one we would use to divide two numbers of four digits with paper and pencil, but it turns out to be quite complex, and it isn't actually very fast since it involves divisions, multiplications, and subtractions, and sometimes you take one step forward and two steps back. Is there another possible algorithm? Yes, there is:
function HugeDiv(dividend: Hugeint; divisor: Hugeint): Hugeint;
// Result := dividend div divisor;
begin
if HugeIsZero(divisor) then
raise EDivByZero.CreateRes(@sDivByZero);
HugeSetZero(Result);
while HugeCmp(dividend, divisor) >= 0 do begin
dividend := HugeSub(dividend, divisor);
Result := HugeAdd(Result, IntToHuge(1));
end;
end;
Of course, this algorithm turns out to be awfully slow (if we divide 12 million by 3, the loop would execute 4 million times), but we can speed things up if we subtract from the dividend the divisor multiplied by different powers of 2, from higher to lower, setting the corresponding bit of the result every time we perform a subtraction (the bit in the position of the power of 2 that was used). It's the inverse of what we did in the case of a multiplication shown above. The division process would then be reduced to just 128 subtractions at most.
In the following example, the dividend is 20 (10100 in binary) and the divisor is 3 (11 in binary):
10100 - 11 * 2^2 = 10100 - 1100 = 1000 Result := 100
1000 - 11 * 2^1 = 1000 - 110 = 10 Result := 110
Initially, 11 * 2^2 is the highest value of the form 11 * 2^k that is less than or equal to the dividend, so we subtract that value from the dividend and set bit 2 of the result, because we subtracted the divisor multiplied by two to the power of 2. So far, the remainder is 8 (1000 in binary), and 11 * 2^1 is the highest such value that is less than or equal to this remainder, so we subtract it from the remainder and set bit 1 of the result, because we subtracted the divisor multiplied by two to the power of 1. The remainder is now 2 (10 in binary), and since the divisor is greater than that value, the division stops there. The remainder of the operation is therefore 2 (10 in binary), and since bits 2 and 1 of the result were set, the result is 110 in binary, i.e. 6 in decimal.
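The same shift-and-subtract procedure on ordinary 32-bit unsigned values, as a C sketch with hypothetical names, reproduces the trace above (for 20 / 3 it first tries 3 * 2^3 = 24, which is too large, then subtracts 3 * 2^2 and 3 * 2^1):

```c
#include <stdint.h>

/* Position of the highest set bit, -1 for 0. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Unsigned shift-and-subtract division; d must be non-zero.
   Returns the quotient and stores the remainder in *rem. */
static uint32_t shift_sub_div(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    int count = top_bit(n) - top_bit(d);  /* largest k worth trying for d * 2^k */
    for (; count >= 0; count--) {
        uint32_t shifted = d << count;    /* top bit lands at most on n's top bit */
        if (shifted <= n) {
            n -= shifted;                 /* subtract d * 2^count ...            */
            q |= (uint32_t)1 << count;    /* ... and set bit count of the result */
        }
    }
    *rem = n;
    return q;
}
```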
function HugeDiv(dividend: Hugeint; divisor: Hugeint): Hugeint;
var
_r_: Hugeint; // remainder
_d_: Hugeint; // divisor
_q_: Hugeint; // quotient
BitPosR, BitPosD, count: integer;
begin
_r_ := dividend;
_d_ := divisor;
HugeSetZero(_q_);
BitPosD := HugeBitScanReverse(_d_);
if BitPosD = -1 then RaiseDivByZero;
BitPosR := HugeBitScanReverse(_r_);
count := BitPosR - BitPosD;
if count > 0 then
_d_ := HugeShl(_d_, count);
repeat
if HugeCmp(_d_, _r_) <= 0 then begin
_r_ := HugeSub(_r_, _d_);
HugeBitSet(_q_, count);
end;
_d_ := HugeShr(_d_, 1);
dec(count);
until count < 0;
Result := _q_;
asm
lea edx, _r_
end;
end;
HugeBitScanReverse is a function that returns the position of the highest set bit, searching from bit 127 down to bit 0. If all bits are zero, the result is -1. We use HugeBitScanReverse to determine the first power of two by which we should multiply the divisor to begin the iteration.
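Putting the pieces together, here is an unsigned C sketch of the HugeDiv loop with its helpers (HugeBitScanReverse, HugeCmp, HugeSub, HugeShl and HugeShr re-implemented under hypothetical names). It reports division by zero through its return value instead of raising an exception, and returns the remainder alongside the quotient.

```c
#include <stdint.h>

typedef struct { uint32_t limb[4]; } hugeint_t;  /* limb[0] = low-order */

/* HugeBitScanReverse: index of the highest set bit, -1 if zero. */
static int huge_bsr(const hugeint_t *x)
{
    for (int i = 3; i >= 0; i--)
        for (int b = 31; b >= 0; b--)
            if ((x->limb[i] >> b) & 1u)
                return i * 32 + b;
    return -1;
}

/* Unsigned compare: -1, 0 or 1. */
static int huge_cmp(const hugeint_t *a, const hugeint_t *b)
{
    for (int i = 3; i >= 0; i--)
        if (a->limb[i] != b->limb[i])
            return a->limb[i] < b->limb[i] ? -1 : 1;
    return 0;
}

/* a -= b with a borrow chain (the caller ensures a >= b). */
static void huge_sub_in_place(hugeint_t *a, const hugeint_t *b)
{
    uint32_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t need = (uint64_t)b->limb[i] + borrow;
        borrow = (uint64_t)a->limb[i] < need;
        a->limb[i] = (uint32_t)((uint64_t)a->limb[i] - need);
    }
}

/* Shift left by n bits, 0 <= n < 128. */
static hugeint_t huge_shl(hugeint_t x, int n)
{
    hugeint_t r = { { 0, 0, 0, 0 } };
    int limbs = n / 32, bits = n % 32;
    for (int i = 3; i >= limbs; i--) {
        uint32_t v = x.limb[i - limbs] << bits;
        if (bits != 0 && i - limbs - 1 >= 0)
            v |= x.limb[i - limbs - 1] >> (32 - bits);
        r.limb[i] = v;
    }
    return r;
}

/* Shift right by one bit (logical). */
static hugeint_t huge_shr1(hugeint_t x)
{
    hugeint_t r;
    r.limb[0] = (x.limb[0] >> 1) | (x.limb[1] << 31);
    r.limb[1] = (x.limb[1] >> 1) | (x.limb[2] << 31);
    r.limb[2] = (x.limb[2] >> 1) | (x.limb[3] << 31);
    r.limb[3] = x.limb[3] >> 1;
    return r;
}

/* Unsigned HugeDiv: returns -1 on division by zero, else 0. */
static int huge_divmod(hugeint_t r, hugeint_t d, hugeint_t *quot, hugeint_t *rem)
{
    hugeint_t q = { { 0, 0, 0, 0 } };
    int pos_d = huge_bsr(&d);
    if (pos_d < 0)
        return -1;                       /* division by zero */
    int count = huge_bsr(&r) - pos_d;    /* highest power of 2 worth trying */
    if (count > 0)
        d = huge_shl(d, count);
    for (; count >= 0; count--) {
        if (huge_cmp(&d, &r) <= 0) {     /* d * 2^count fits: subtract it */
            huge_sub_in_place(&r, &d);
            q.limb[count / 32] |= (uint32_t)1 << (count % 32);
        }
        d = huge_shr1(d);
    }
    *quot = q;
    *rem = r;
    return 0;
}
```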
The assembler implementation of HugeDiv that you can find attached supports signed numbers. It is just a first approximation, and it can be heavily optimized.
The function leaves in EDX the address of the remainder, so it can be used by a function returning the modulus of the division:
function HugeMod(dividend: Hugeint; divisor: Hugeint): Hugeint;
// Result := dividend Mod divisor;
// Parameters: EAX = @dividend; EDX = @divisor; ECX = @Result
asm
push ecx // @Result
call HugeDiv // Result := quotient (via ECX); EDX := @remainder;
pop eax // EAX := @Result;
call HugeMov // EAX^ := EDX^;
end;