Object data must ultimately be stored in a database. This article shows you how.
Persistent Objects
In the last couple of articles we have introduced business objects and shown how simple relationships can be represented. The relationships expressed as object properties are a more formal (and neutral) representation of the relationships between data entities that can be implicit within the database design, or explicit within the database schema. Moving away from a data-centric view of entities and relationships to a more structured, object-oriented exposition has many benefits, but most languages (including Delphi) cater only for an in-memory object model; objects have no state persistence capability unless one is provided for them.
There are many design solutions to this requirement, and we have already established that the business (problem domain) objects themselves should not communicate directly with the database. An appropriate design is to have another set of classes that exist purely as an object interface to the database, and will be responsible for loading and saving problem domain object state. Our problem domain objects will know nothing about how their state is stored on disk, only that there is a particular class that is responsible for this operation. The business objects will make a method call onto such an object and delegate the work to it.
The first decision is how to map objects onto a database schema. Although a number of object databases exist, most companies have standardised on an existing RDBMS with which they are comfortable and experienced. Promoting a move to a more object-oriented approach is difficult enough without demanding that the typically large investment in a highly evolved database technology be replicated for a new, unfamiliar one. Therefore, this article will focus on mapping objects to a SQL-based RDBMS, as this constitutes the majority of installed development systems. It should, however, be stressed that one of the advantages of our object model is that it is totally architecture-neutral and can be applied to a large number of different database topologies, including ISAM and object-based ones.
Mapping Objects To Databases
There is a relatively simple conceptual correspondence between an object and a database tuple (record). Therefore, when storing object state in an RDBMS, a very common and practical solution is to map problem domain classes to tables, objects to records and properties to fields. Note that because our database will be storing object state, it must be a complete representation, and therefore it is likely that there will be more fields in the database table than a given class has public properties. In some cases a public property may not have a directly corresponding field, but the general principle is to map a property onto a database field of similar fundamental type (character, numeric, date etc.).
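The mapping described above can be sketched with a single class and its table. This is only an illustration: the TCustomer properties, the FLastOrderNo field, the CUSTOMER table and its column names and types are all hypothetical, and the DDL would need adjusting for any particular database.

```pascal
program MappingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical problem domain class: one class corresponds to one table,
  // one instance to one record.
  TCustomer = class
  private
    FID: Integer;            // column ID
    FName: String;           // column NAME
    FCreditLimit: Currency;  // column CREDIT_LIMIT
    FLastOrderNo: Integer;   // internal state with no public property,
                             // yet it still needs a column: LAST_ORDER_NO
  public
    property ID: Integer read FID;
    property Name: String read FName write FName;
    property CreditLimit: Currency read FCreditLimit write FCreditLimit;
  end;

const
  // Illustrative DDL only; column types are assumptions and vary by database.
  CustomerTableSQL =
    'CREATE TABLE CUSTOMER (' +
    'ID INTEGER NOT NULL PRIMARY KEY, ' +
    'NAME VARCHAR(60), ' +
    'CREDIT_LIMIT NUMERIC(12,2), ' +
    'LAST_ORDER_NO INTEGER)';

begin
  WriteLn(CustomerTableSQL);
end.
```

The table has four columns while the class exposes three public properties, which is the "more fields than properties" situation mentioned above.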
Let us now extend our basic framework classes to support object persistence. Our problem domain objects (TPDObject) need to permit other classes to force them to load or save their state. There are a number of alternative ways of approaching this situation; one is to force all objects to implicitly save their state before they are destroyed. In practice, applications require a finer degree of control over when object state should be persisted, and so our TPDObject will gain two new public methods, Load and Save. The Save method is parameterless, but our Load method must define exactly which object is to be loaded from the persistent store. Within our framework, all problem domain objects that have been saved are allocated a unique ID within a given context (this might be within objects of the same class, objects of the same ancestry, the application or universally). We will use this ID as a parameter to the Load method to standardise the means by which we establish object identity. Note that this is always used, even if a specific class might have a suitable alternative “primary key” concept. Standardising on a single concept of object identity is useful, as our framework can use this consistency to treat our problem domain objects in a generic and polymorphic fashion. Some might question the presence of a public property (albeit read-only) that exposes a type chosen for convenience to suit the internals of our framework. In practice, the identification of objects by ID is something that occurs almost entirely within the framework code and is rarely encountered within the actual application logic itself. Within that arena, objects are handled using concepts much more familiar to the end-user, such as “the set of customers called Smith”, rather than through developer-oriented abstractions.
Having defined the public method interface on our TPDObject problem domain class, we must now consider the implementation. In fact, this is very simple. We have already stated that the business object knows nothing about how it is stored, only that there is another object responsible for this task. Therefore, the implementation of our persistence methods on TPDObject simply delegates the work directly to another object referenced within a private field. Each of these method calls is parameterised with Self, so that the delegated object knows with which instance it is dealing. In effect, when a problem domain object is instructed to save (or load) itself, it simply instructs another object to “save me”.
Data Management
The task of actually saving the object state to a database falls to a set of classes in the data management layer. Predictably, we will have a class hierarchy allowing us to provide significant database-independent functionality. All of our classes responsible for data management will descend from an abstract TDMObject class. This class provides an object-based, database-independent interface for database operations. Specific database support is provided by creating a concrete descendant of this class that provides a connection to the database of choice using whatever technology is appropriate, and additionally provides some base services. These base services will be totally customised to the specific features of the database concerned, and will be used by application-dependent descendants customised for handling a particular TPDObject. This design does not dictate the means by which the TDMObjects communicate with the database engine: each database layer is free to choose the most suitable (easiest/fastest) technology available. This might be ADO for SQL Server 7, IBExpress for InterBase, the BDE for Paradox files or a custom API; it is also possible to interface to something like ODBC or CORBA for generic handling. My preference is to optimise the connection using a database-dependent set of components, as these generally offer the greatest functionality and speed. Note that selecting a database-dependent API does not restrict your application to running with this database; it is possible to substitute one database-dependent TDMObject layer for another. The interface to the TDMObject is purely object based, so provided our database layers implement the concrete methods correctly, the application will run identically without changes.
The actual implementation of each database-specific layer will depend upon the database concerned, but in general for SQL-capable databases it is worthwhile to provide a generic Execute method that takes a valid SQL command as a parameter. Once we have a TDMObject descendant for a chosen database we must provide a number of descendants from this class, one for every TPDObject class in our application. Each of these is customised to handle one very specific combination: saving a particular class in a particular database. The details of these classes depend upon the features provided by the data management class for the particular database, but one feature is vital: the Save operation must return the ID of the object saved. The reason for this is that in our chosen model, a TPDObject does not have an ID until the first time it is saved. A particular benefit of this model is that it is very easy for the data management layer to detect whether it is necessary to generate an INSERT or an UPDATE database action: an object without an ID has never been saved and must be inserted. Typically, the allocation of the ID might be done by another object designed purely for this purpose, but there is a very beneficial optimisation that can be made if we rely upon our data management object to do this task for us. Remember that our ID must be unique in a given context; if the context chosen is that objects of the same class have unique IDs, then this can be equated to records in the table having unique IDs. Most databases have some facility for generating sequential unique IDs for inserted records, and make this value available after the database update. Using this facility within our data management layer avoids replicating the effort to guarantee uniqueness, and in the best case saves extraneous database operations to establish the next available ID. If the chosen database lacks such features then an internal ID allocator scheme within the data management hierarchy can be used.
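For databases without a native sequence or auto-increment facility, the internal ID allocator mentioned above might look something like the following minimal sketch. TIDAllocator and its methods are hypothetical names, TObjectID is assumed here to be an Integer, and the seed value is hard-coded where a real layer would read the highest ID already in use from the table.

```pascal
program IDAllocatorSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TObjectID = Integer;  // assumption: the framework's ID type is an Integer

  // Hypothetical allocator: one instance per uniqueness context
  // (for example, one per class, and therefore one per table).
  TIDAllocator = class
  private
    FLastID: TObjectID;
  public
    constructor Create(LastUsedID: TObjectID);
    function NextID: TObjectID;
  end;

constructor TIDAllocator.Create(LastUsedID: TObjectID);
begin
  inherited Create;
  FLastID := LastUsedID;
end;

function TIDAllocator.NextID: TObjectID;
begin
  // Sequential allocation: each call hands out the next unused value.
  Inc(FLastID);
  Result := FLastID;
end;

var
  Allocator: TIDAllocator;
  FirstID, SecondID: TObjectID;
begin
  // The seed (41) stands in for the highest ID found in the table.
  Allocator := TIDAllocator.Create(41);
  try
    FirstID := Allocator.NextID;
    SecondID := Allocator.NextID;
    WriteLn(FirstID, ' ', SecondID);
  finally
    Allocator.Free;
  end;
end.
```

An in-memory counter like this is only safe when a single process writes to the table; with several clients the allocation would have to be coordinated through the database itself.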
Listing 1 shows the extensions to our Framework unit to support Load and Save operations, together with the outline of a class to handle data access to a database through ADO. It is assumed that the ADO connection has been established and that utility routines are provided within the class to execute SQL commands and to hand back the received data. In this implementation descendant classes are required to override the Load method, and also to provide Insert and Update implementations. These three methods should generate appropriate SQL code and update the object from the resultset (or vice versa). With a bit more effort it is possible to place more work within the generic TADO_DMObject class itself, imposing a more rigid interface on descendant classes that requires less implementation. An example of this would be the ancestor class generating SQL commands dynamically, given a set of property names and values.
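As a rough illustration of generating SQL dynamically from names and values, the sketch below builds INSERT and UPDATE statements from two parallel arrays. BuildInsertSQL and BuildUpdateSQL are hypothetical helpers, not part of the framework in Listing 1; the CUSTOMER table and its columns are invented, the key column is assumed to be called ID, and values are assumed to arrive already formatted as SQL literals.

```pascal
program DynamicSQLSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Builds "INSERT INTO T (A, B) VALUES (x, y)" from parallel arrays.
function BuildInsertSQL(const TableName: String;
  const Names, Values: array of String): String;
var
  I: Integer;
  NameList, ValueList: String;
begin
  Assert(High(Names) = High(Values), 'Names and Values must match');
  NameList := '';
  ValueList := '';
  for I := Low(Names) to High(Names) do
  begin
    if I > Low(Names) then
    begin
      NameList := NameList + ', ';
      ValueList := ValueList + ', ';
    end;
    NameList := NameList + Names[I];
    ValueList := ValueList + Values[I];
  end;
  Result := Format('INSERT INTO %s (%s) VALUES (%s)',
    [TableName, NameList, ValueList]);
end;

// Builds "UPDATE T SET A = x, B = y WHERE ID = n" for an existing record.
function BuildUpdateSQL(const TableName: String;
  const Names, Values: array of String; ID: Integer): String;
var
  I: Integer;
  Assignments: String;
begin
  Assert(High(Names) = High(Values), 'Names and Values must match');
  Assignments := '';
  for I := Low(Names) to High(Names) do
  begin
    if I > Low(Names) then
      Assignments := Assignments + ', ';
    Assignments := Assignments + Names[I] + ' = ' + Values[I];
  end;
  Result := Format('UPDATE %s SET %s WHERE ID = %d',
    [TableName, Assignments, ID]);
end;

var
  InsertSQL, UpdateSQL: String;
begin
  InsertSQL := BuildInsertSQL('CUSTOMER', ['NAME', 'CREDIT_LIMIT'],
    [QuotedStr('Smith'), '500']);
  UpdateSQL := BuildUpdateSQL('CUSTOMER', ['NAME', 'CREDIT_LIMIT'],
    [QuotedStr('Smith'), '750'], 42);
  WriteLn(InsertSQL);
  WriteLn(UpdateSQL);
end.
```

Concatenating literals keeps the sketch short; where the chosen database components support parameterised statements, those are generally the safer choice.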
Our class-specific data management objects (such as TCustomerDM, corresponding to TCustomer) must have intimate knowledge of the internals of the PD objects for which they are responsible. In this sense they can be viewed as “friend” classes of the problem domain object, and in Delphi this means that the implementations must be in the same unit, where private members are visible to one another.
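The "friend" relationship can be sketched as follows. To keep the example runnable without a database, TCustomerDM here descends directly from TDMObject and stores names in a small in-memory array rather than going through ADO; that substitution, the Name property, the ten-record capacity, the Integer ID type and the hand-wiring of the DMObject field in the main block are all assumptions made for illustration only.

```pascal
program CustomerDMSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  NotAssigned = 0;  // assumption: an unsaved object carries ID 0

type
  TObjectID = Integer;  // assumption
  TDMObject = class;

  TPDObject = class
  private
    DMObject: TDMObject;
    FID: TObjectID;
  public
    procedure Load(const ID: TObjectID);
    procedure Save;
    property ID: TObjectID read FID;
  end;

  TDMObject = class
  public
    procedure Load(PDObject: TPDObject; const ID: TObjectID); virtual; abstract;
    function Save(PDObject: TPDObject): TObjectID; virtual; abstract;
  end;

  TCustomer = class(TPDObject)
  private
    FName: String;
  public
    property Name: String read FName write FName;
  end;

  // In-memory stand-in for a database-specific descendant.
  TCustomerDM = class(TDMObject)
  private
    FNames: array[1..10] of String;  // the "table", indexed by ID
    FCount: Integer;                 // highest ID allocated so far
  public
    procedure Load(PDObject: TPDObject; const ID: TObjectID); override;
    function Save(PDObject: TPDObject): TObjectID; override;
  end;

procedure TPDObject.Load(const ID: TObjectID);
begin
  DMObject.Load(Self, ID);
end;

procedure TPDObject.Save;
begin
  FID := DMObject.Save(Self);
end;

procedure TCustomerDM.Load(PDObject: TPDObject; const ID: TObjectID);
var
  Customer: TCustomer;
begin
  Customer := PDObject as TCustomer;
  // Same-unit access: private fields are written directly, bypassing
  // the public interface.
  Customer.FName := FNames[ID];
  Customer.FID := ID;
end;

function TCustomerDM.Save(PDObject: TPDObject): TObjectID;
var
  Customer: TCustomer;
begin
  Customer := PDObject as TCustomer;
  if Customer.ID = NotAssigned then
  begin
    Inc(FCount);            // stands in for an INSERT
    Result := FCount;
  end
  else
    Result := Customer.ID;  // stands in for an UPDATE
  FNames[Result] := Customer.FName;
end;

var
  DM: TCustomerDM;
  Original, Reloaded: TCustomer;
  LoadedName: String;
begin
  DM := TCustomerDM.Create;
  Original := TCustomer.Create;
  Reloaded := TCustomer.Create;
  try
    Original.DMObject := DM;  // wired by hand purely for this sketch
    Reloaded.DMObject := DM;
    Original.Name := 'Smith';
    Original.Save;
    Reloaded.Load(Original.ID);
    LoadedName := Reloaded.Name;
    WriteLn(Reloaded.ID, ' ', LoadedName);
  finally
    Reloaded.Free;
    Original.Free;
    DM.Free;
  end;
end.
```

Because everything lives in one source module, TCustomerDM can read FName and set FID even though both are private; moving TCustomerDM to a separate unit would make those lines fail to compile.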
This article’s problem
Our design requires a TDMObject to be provided for every TPDObject. Where and when might this provision take place? What is the issue with this approach and, by analysing the pattern of method calls to our TDMObjects, how may it be circumvented?
((( Listing 1 – Data management objects and interfaces)))
unit Framework;

interface

// TObjectID and the NotAssigned constant are not shown in this listing;
// they are assumed to be declared ahead of these types.

type
  TDMObject = class;

  TPDObject = class
  private
    DMObject: TDMObject;
    FID: TObjectID;
  public
    procedure Load (const ID: TObjectID);
    procedure Save;
    // Read-only identity, allocated the first time the object is saved
    property ID: TObjectID read FID;
  end;

  TDMObject = class
  public
    procedure Load (PDObject: TPDObject; const ID: TObjectID); virtual; abstract;
    function Save (PDObject: TPDObject): TObjectID; virtual; abstract;
  end;

  TADO_DMObject = class (TDMObject)
  private
    FADO: TADOExpress;
  protected
    // Features to assist descendants to interact with the database
    procedure Execute (SQL: String);
    property ADO: TADOExpress read FADO;
    // Methods that descendants must provide
    function Insert (const PDObject: TPDObject): TObjectID; virtual; abstract;
    procedure Update (const PDObject: TPDObject); virtual; abstract;
  public
    function Save (PDObject: TPDObject): TObjectID; override;
  end;

implementation

// The body of TADO_DMObject.Execute is omitted from this outline; it depends
// on how the ADO connection has been established.

procedure TPDObject.Load (const ID: TObjectID);
begin
  Assert (DMObject <> nil, 'No Data Management object available');
  DMObject.Load (Self, ID);
end;

procedure TPDObject.Save;
begin
  Assert (DMObject <> nil, 'No Data Management object available');
  // Save returns the ID, which is newly allocated on the first save
  FID := DMObject.Save (Self);
end;

function TADO_DMObject.Save (PDObject: TPDObject): TObjectID;
begin
  // An object without an ID has never been saved, so it must be inserted
  if PDObject.ID = NotAssigned then begin
    Result := Insert (PDObject);
  end else begin
    Update (PDObject);
    Result := PDObject.ID;
  end;
end;

end.
((( End Listing 1 )))