Title: Towards a more accurate sort order
Question: Sorting Addresses is a pain at the best of times, especially when a client supplies bad data (You may define clear fields in your DB, but when the data comes in, does it fit easily??)
This attempts to resolve this issue
Answer:
unit AddrSortOrder;
{The custom sort order is used to deal with the fact that the
house and flat numbers are sorted as strings. They are stored as
strings to allow things like '150-175' as a house number, or '3a',
or perhaps even simply a flat 'A'.
The need for a custom sort order is caused by the fact that with an
ordinary ASCII sort order '4' will appear after '30'. This is not
desirable behaviour.
This approach to fix this problem is to look for the first number
in the string (if there is one) and then use this as some kind of
primary sort order. The rest of the sorting will then be done on
the remaining characters (with preceding and trailing spaces
stripped out), based on the ASCII value of their upper-
case varients. Potential problems caused by this approach include
(but are not limited to) the use of accented characters will
possibly cause strange orderings and furthermore, if there is a block
of flats with three floors A, B, C for example then supposing the
flats on those floors are A1, A2, A3, B1, B2, B3 then the ordering
of records will not be ideal - this approach will sort them as
A1, B1, A2, B2, A3, B3. This behaviour is regrettable, but
acceptable - we cannot tell that it is not flat A on floor 1 for
example. It's unlikely that we will be able to find a sort order
that always produces ideal results.
Some examples of sorted lists (not all ideal):
EXAMPLE 1 EXAMPLE 2 EXAMPLE 3
Flat 1 1 A
Flat 2 -2 B
3 2-4 C
3B 3a 1
Flat 3A 5 2
}
interface
uses SysUtils;
function CalcSortIndex(NumStr:string):double;
implementation
function CalcSortIndex(NumStr:string):double;
var strlength,i,j,tmp:integer;
found:boolean;
numpart,strpart,divisor:double;
choppedstr:string;
begin
//This function will return the sort index value for the string passed
strlength:=length(NumStr);
if strlength=0 then
begin
result:=0;
exit;
end;
found:=false;
//split the string into a 'number' and a 'string' part..
//initialise
choppedstr:=numstr;
numpart:=0;
//Locate the first digit (if there)
for i:=1 to strlength do
begin
if numstr[i] in ['0'..'9'] then
begin
found:=true; //First digit found!!
break;
end;
end; //for i..
if found then
begin
//now get the to the end of the digits..
found:=false;
for j:=i to strlength do
begin
if not(numstr[j] in ['0'..'9']) then
begin
found:=true; //end of digits found
break;
end;
end; //for j..
//Separate out the string parts
if found then
begin
//Number was embedded..
val(copy(numstr,i,j-i),numpart,tmp);
Delete(choppedstr,i,j-i);
end
else
begin
//Number went to the end of the string
val(copy(numstr,i,strlength),numpart,tmp);
Delete(choppedstr,i,strlength);
end;
end;
choppedstr:=Uppercase(trim(choppedstr));
strlength:=length(choppedstr);
//evaluate a number for the remaining part of the string
strpart:=0;
divisor:=1;
for i:=1 to strlength do
begin
divisor:=divisor/256;
//convert from Char to single using a variant conversion
strpart:=strpart+(ord(choppedstr[i])*divisor);
end;
//All done, return the value
result:=numpart+strpart;
end;
end.
NB a version of this Algorithm for MSSQL7 is also posted (Title 'Towards a more accurate sort order in MSSQL7' ArticleID 2983)