![]() Join Up! 96812 members and counting! |
|
|||
A Practical Approach to Object-Relational Mapping in PHP
by Kieran Mathieson
PHP applications use various technologies, like HTML, relational databases (RDB), JavaScript,
object-oriented programming (OOP), and so on. Most of them work well together. For instance,
PHP code can easily write JavaScript, JavaScript can use a browser's document object model
(DOM) to change HTML, etc.
OOP and RDB don't fit together quite as well. When programmers write code mapping PHP classes to
RDB tables, they need to make sure the classes do not become too dependent on each other. Otherwise,
changes in one class will require changes in others, raising development and maintenance costs.
If software changes become too expensive and take too much time, applications will restrict business
innovation, rather than enabling it.
This paper offers a practical solution to the object-relational problem in PHP. "Practical" means
an approach meeting the constraints that usually apply in application development:
We'll use the term "application utility tasks" to refer to the things listed in the last bullet point.
They often consume significant resources, sometimes more than the rest of the application. Carefully
managing application utility tasks is important to application stability and performance. It's usually
best to start building them in at the very beginning of a project, rather than try to add them later.
This paper assumes you're familiar with PHP, OOP, and relational databases. If you're not, you might
want to consult these tutorials:
Background
Suppose your company breeds hippos for the army. Some are used as beasts of burden. Others are equipped with
weapons, like pie throwers, and spit projectors. Your company doesn't install the weapons, just the sockets
the weapons plug into. Armed hippos can have more than one mount point.
First, we'll create the hippos table, with the fields id and name.
Now some PHP. For simplicity, methods not directly related to the topic we're discussing have been omitted.
Hippo is a "business classes," that is, a class describing something meaningful to
nontechnical people in the business. Some classes do not correspond to business entities. For
example, classes for database objects, like Table, Record, and Field, might not mean anything
to the average sales representative. However, he or she will understand what a Hippo is, and
will know that they have id numbers and names.
Now let's write
Hippo::load() and Hippo::save().
(We're using the notation Hippo::load() to
refer to the load method of Hippo. In this context, the notation has nothing to do with static
methods.) Code for getting and setting properties, error checking, etc., has been omitted for
simplicity.
In
Hippo:load(), Line 3 copies the method's parameter into the id property.
Lines 4 to 7 open the database and retrieve the name of a hippo. Hippo::save() uses the id property to
determine whether an object represents a new hippo, or one that is already in the database (line 13).
If the id is null, the method creates a new record. Otherwise, it updates an existing one. The call to the
function application_utility_tasks()(lines 8 and 25) are just to remind us that we need to do these
things somewhere.
Let's create a subclass of Hippo called
ArmedHippo. Hippo has the attributes
id and name. ArmedHippo
inherits these attributes, and adds one of its own: number of mount points.
The next step is to write
ArmedHippo::load() and
ArmedHippo::save(). It's in data storage methods in subclasses
like this that object-relational mapping becomes a problem. If we're not cautious, we can
make the OOP code hard to maintain. Further, unless we carefully prepare the groundwork now,
future application updates will make things even worse.
We're going to look at several ways of doing the same three things:
The best case is that each change won't disturb existing code. The more code that is
disturbed, the more costly it will be to maintain the application.
Let's look at some options.
Option 1: Repeating Code
The easiest way to implement ArmedHippo's RDB methods is to first add a new column to the hippos table:
Then we can cut-and-paste
Hippo::load() and Hippo::save()
into ArmedHippo, and add the num_mount_points field.
The save() method looks at the id property on line 15. If id is null, the method adds a new
record, and gets the id number allocated to it by MySQL on line 21. If id is not null,
save() uses it an a primary key value in an SQL UPDATE statement (line 25).
This will work, but loses some of the advantages of OOP. If we add a weight property to Hippo, we'll
also need to change
ArmedHippo::load() and ArmedHippo::save().
If we add StealthHippo, we'll introduce more code that needs to be changed as Hippo changes.
Further, all versions of the load() and save()
methods call application_utility_tasks() separately. If we need to change the call, the code must be
updated in every place.
Option 2: Extra Queries for New Fields
Another option is to add num_mount_points to the hippos table as above, but change
ArmedHippo
so it calls Hippo::load() to handle the id and name, and then deals with
num_mount_points itself. For example, we can implement
ArmedHippo::load() as:
Line 2 calls
Hippo::load() to get id and name from the database. The
method then opens the database, executes an SQL statement, and fetches num_mount_points.
How will this work when we add the weight property? Quite well, since changes to the hippos table
only affect Hippo's methods. However, this approach reduces application performance.
ArmedHippo::load() calls Hippo::load(),
which open the database and runs an SQL statement. Then ArmedHippo::load()
connects to the database again, and runs another SQL statement. Add
StealthHippo, and things get worse. Caching and connection
pooling will reduce the performance hit, but not eliminate it. Further,
application_utility_tasks() is still called all over the place.
This illustrates a general problem with OOP. Classes hide implementation details, including
their use of resources. Sometimes this can adversely affect system performance. And since
you don't normally look at all the code in existing classes when you extend them, you might
not know why performance has suddenly decreased.
Option 3: New Tables for Subclasses
A third option is to leave the hippos table as it is, and add a new table for the
ArmedHippo class:
The id field is a foreign key into the hippos table. For example, a record with an
id of 25 in the hippos table will also have an id of 25 in the new table.
This has the same disadvantages as option 2, and adds another all of its own. Decision
makers often want to explore data, looking for ways to improve production, gain market share,
reduce costs, etc. They want to use friendly tools to query databases, like Access and Excel.
Introducing extra tables and links makes their task more difficult. It's not too bad for one
inherited class, but a real application might have a dozen of them.
Option 4: A New Base Class
Fortunately, there's a way out. Suppose we introduce a new class, called
BusinessBase,
and make it the base class of all business classes. BusinessBase performs the application
utility tasks (permissions, transaction logging, etc.). It also handles every SQL statement.
None of the business classes ever connect to a database directly. Instead, they use a data
structure to exchange information with BusinessBase.
Here's the new Hippo:
The Hippo constructor creates the data structure that exchanges information with BusinessBase.
First, it sets the name of the RDB table used to store hippo data (line 6). Then it specifies
each field in the table, supplying the name of the field, and a DataField object (lines 7 and 8).
A DataField object contains the data type of a field, and a flag showing whether or not it is
the primary key.
Hippo::load() passes a record id to
BusinessBase::load() (line 12), then extracts the values it needs
(id and name in this case, in lines 13 and 14). Hippo::save()
stores the values it wants to save (lines 18 and 19), and calls BusinessBase::save()
(line 20). If Hippo::save() is creating a new record,
BusinessBase::save() supplies its id number, which
Hippo::save() then retrieves (line 22).
The code for ArmedHippo is quite simple:
The constructor calls Hippo's constructor (line 5), then adds a new data field,
num_mount_points (line 6). ArmedHippo::load()
calls Hippo::load() (line 11), then extracts the value for
the field it manages (line 12). ArmedHippo::save()
stores the value for num_mount_points (line 16), then calls
Hippo::save() (line 17).
Now we've got what we wanted.
ArmedHippo can be added without disturbing
Hippo. A weight property can be added to Hippo,
without requiring changes to ArmedHippo. StealthHippo
can inherit from ArmedHippo, with no changes to existing code. Further,
no calls to functions that perform application utility tasks appear in Hippo,
ArmedHippo, or StealthHippo. They are
centralized in BusinessBase.
BusinessBase starts like this:
$table_name is, of course, the name of the RDB table for the class.
$record is an associative array describing each record in the table.
The array's index is the name of a field in the table (e. g., id). The value in each array element
is a DataField object:
Lines 2 and 3 essentially define two constants, though they are really static methods and are called as such
(see lines 7 and 8 in the Hippo code above, and line 7 in ArmedHippo). Declaring them this way subjects
them to the same scope and inheritance rules as other methods.
BusinessBase::addField() (called by Hippo and ArmedHippo) adds a field to
$records:
BusinessBase::load() looks like this:
BusinessBase::load() calls find_id_field()
in line 3 to locate the primary key field, and store its name in $this->_id_field for later use.
The method then opens the database (line 4), builds a SELECT query (line 7), and executes it
(line 9). Lines 11-13 move the data into $record, where it can be retrieved with a statement
like this from Hippo:
$this->id = $this->getDBValue('id');
You can download all of the sample code here.
Conclusion
This paper shows one way to handle object-relational mapping. The technique has much to recommend it:
The code above is simplified, of course, to focus on the object-relational problem. The code
doesn't check for errors, like bad database connections, or repeated field names
(e. g.,
ArmedHippo defining a field that Hippo
has already defined). Application utility tasks haven't been fully specified. Collection classes
(e. g., HippoHerd) haven't been discussed. Compound primary keys are not handled. A database
abstraction layer (e. g., ADODB) should be used. Parameters like database user name and password
should not be hard-coded. And so on.
However, all of these problems can be addressed within the architecture outlined here.
BusinessBase::save() could check for errors, log database changes,
use an RDB abstraction layer, and so on. Adding these things to BusinessBase
would yield a robust, capable, reusable class that simplifies business application development. |