Converting XML Into A PHP Data Structure
Does It Taste As Good As It XMLs?
Dante Lorenso
In the last few years, XML has received great media attention, and most
languages now support parsing XML documents and extracting data from them.
Besides being a great three-letter acronym to sprinkle on your
resumé, XML is a genuinely useful data storage format
for PHP programmers.
Before you begin to use XML, you must first determine whether your project
really needs what XML offers. There are alternative data storage formats
like fixed-width column files, tab-delimited files, CSV files, and database
tables, but these formats typically manage only a simple grid of
rows and columns. XML provides several additional benefits for
programmers, including:
...but you already know that. What you want to do is use XML data inside
your sparkling new web application. We'll explore one simple way to do
this in the remainder of this article.
A Look At The XML File Structure
XML files store data in a tagged format similar to what you'd see in an HTML
document, and they can be validated against a DTD. The tags aren't predefined:
you make them up to suit your data (a DTD, if you use one, declares which tags
are allowed), and because tags nest inside one another, a document forms a tree
structure. Here is an example of some XML:
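For reference, a document with the shape described in the next paragraph might look like the one held in the PHP string below. The folder and file names match the output shown at the end of this article, but the `name` attribute is an assumption. The snippet also passes the document through PHP's parser once, with no handlers registered, to confirm that it is well-formed.

```php
<?php
// Hypothetical sample document: one <drive>, two <folder> tags, and two
// <file> tags in each folder. The "name" attribute is an assumption.
$xml = <<<XML
<drive>
  <folder name="folder01">
    <file name="a.txt"></file>
    <file name="b.txt"></file>
  </folder>
  <folder name="folder02">
    <file name="c.txt"></file>
    <file name="d.txt">
      This is a comment about file d.
      We like comments.</file>
  </folder>
</drive>
XML;

// Parse once with no handlers, only to check that the XML is well-formed.
$parser = xml_parser_create();
$ok = xml_parse($parser, $xml, true);   // 1 on success, 0 on failure
if (!$ok) {
    echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
} else {
    echo "Sample document is well-formed\n";
}
```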
If you look at this document, you'll notice that there is one
<drive> tag with two
<folder> tags inside it, and
each <folder> tag contains two <file> tags.
This nesting creates a tree-like structure of data. Now we would
like to access this data from within PHP. There are several ways to do
that, including parsing the text manually, using PHP's DOM support, or
using PHP's event-based (SAX-style) XML parser.
The manual option is not a robust solution, and the DOM support in
PHP is still experimental, so I've chosen the SAX parser route.
However, unlike other similar solutions, I'd like to write a class in
PHP that parses the XML document into a PHP data structure, so that I can
access the data like any other PHP data instead of writing a custom
parser each time I use XML in an application.
Using An Array As A Data Structure
Knowing that the structure of the XML file is a tree, we need to find the best
way to represent that “tree” data in PHP. My first idea was a PHP array.
Another option would be to build objects similar to the DOM parser approach.
I've decided not to write a DOM parser, though (which you could easily do),
because PHP's DOM support is coming along quickly enough. Why duplicate
that effort?
For simple XML, PHP arrays are perfect for the task because you can create
arrays of arrays of arrays and thereby build a tree structure, which is exactly
what we need for this learning exercise. Besides, the core PHP language
already has a plethora of built-in functions for iterating through arrays,
pushing, popping, shifting, unshifting, splitting, joining, slicing,
and so on.
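As a quick illustration of that point, the sketch below builds a small tree out of nested arrays and then works with it using array_push, a recursive walk, and array_column. The `label` and `children` keys and the helper name `count_nodes()` are made up for this example; they are not the node structure the article settles on below.

```php
<?php
// A tree built from arrays of arrays: each node is an array whose
// 'children' entry is itself an array of nodes.
// ('label', 'children', and count_nodes() are hypothetical names.)
$tree = ['label' => 'drive', 'children' => []];

$folder = ['label' => 'folder01', 'children' => []];
array_push($folder['children'], ['label' => 'a.txt', 'children' => []]);
array_push($folder['children'], ['label' => 'b.txt', 'children' => []]);
array_push($tree['children'], $folder);

// Recursively count every node in the tree, including the root.
function count_nodes(array $node): int
{
    $total = 1;
    foreach ($node['children'] as $child) {
        $total += count_nodes($child);
    }
    return $total;
}

echo count_nodes($tree), "\n";

// Built-in array functions work at any level, e.g. array_column()
// to pull out the labels of a folder's children.
echo implode(', ', array_column($tree['children'][0]['children'], 'label')), "\n";
```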
Taking the DOM model as inspiration, though, we'll need to store several
pieces of information about each XML tag. There are four that we want to
keep for every tag: its name, its attributes, its character data, and its
child tags (sub-nodes).
A PHP array that can represent this simple XML tag (also referred to as a
node in the tree) might look as follows:
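A plausible version for the sample's d.txt file tag is sketched below, following the keys described in the next paragraph. It assumes attribute names keep their original lowercase spelling; the exact whitespace stored in `_DATA` is also an assumption.

```php
<?php
// One XML node as a PHP array. Attribute name/value pairs sit at the
// top level; the underscore-prefixed keys are internal bookkeeping.
$node = [
    'name'      => 'd.txt',            // attribute: name="d.txt"
    '_NAME'     => 'file',             // the tag name
    '_DATA'     => "This is a comment about file d.\nWe like comments.",
    '_ELEMENTS' => [],                 // child nodes (none for this tag)
];

echo $node['_NAME'], ' ', $node['name'], ' has ', count($node['_ELEMENTS']), " children\n";
```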
What I've done here is create an array of key/value pairs for all the
attributes in the node. Then I've added three internal-use-only keys,
'_NAME', '_DATA', and '_ELEMENTS', to store the tag name, tag data, and
sub-node array. The underscore ('_') prefix makes a conflict with an
attribute name unlikely, though not impossible, since XML allows attribute
names that begin with an underscore. Using the sub-node array, we can now
create arrays of arrays of arrays and build our tree.
Using our XML example again, suppose you wanted to read some information
about the file whose name is 'd.txt'. You'd first convert the XML into a
PHP array of arrays and then access the data with code like the following:
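Assuming the converter wraps everything in a 'root' node (as described under "Building The Array Tree") and keeps attribute names in lowercase, reading d.txt's data comes down to indexing through the `_ELEMENTS` arrays. The tree below is written out by hand so the snippet runs on its own, and `find_by_name()` is a hypothetical helper, not part of the article's class.

```php
<?php
// Hand-written stand-in for the converter's output (assumed shape):
// root -> drive -> folders -> files.
$file = fn (string $name, string $data = '') =>
    ['name' => $name, '_NAME' => 'file', '_DATA' => $data, '_ELEMENTS' => []];

$root = ['_NAME' => 'root', '_DATA' => '', '_ELEMENTS' => [
    ['_NAME' => 'drive', '_DATA' => '', '_ELEMENTS' => [
        ['name' => 'folder01', '_NAME' => 'folder', '_DATA' => '',
         '_ELEMENTS' => [$file('a.txt'), $file('b.txt')]],
        ['name' => 'folder02', '_NAME' => 'folder', '_DATA' => '',
         '_ELEMENTS' => [$file('c.txt'), $file('d.txt', 'This is a comment about file d. We like comments.')]],
    ]],
]];

// Direct access: root -> drive (index 0) -> folder02 (index 1) -> d.txt (index 1).
$d = $root['_ELEMENTS'][0]['_ELEMENTS'][1]['_ELEMENTS'][1];
echo $d['name'], ': ', $d['_DATA'], "\n";

// Or search the tree depth-first for the node whose name attribute matches.
function find_by_name(array $node, string $name): ?array
{
    if (($node['name'] ?? null) === $name) {
        return $node;
    }
    foreach ($node['_ELEMENTS'] as $child) {
        $found = find_by_name($child, $name);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

echo find_by_name($root, 'd.txt')['_DATA'], "\n";
```

Hard-coded indexes are brittle if the document changes order, which is why a small search helper is usually the more practical choice.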
Make PHP Do The Hard Work
PHP has a built-in parser for XML documents. You pass a string of XML
text to the
xml_parse function, and as the
document is parsed, the handlers you configured for each event are called
as many times as necessary. Some of the events you can write handlers for
are 'StartElement', 'EndElement', and 'CharacterData'. Here is some sample
code for defining a class and the three event handlers to parse XML:
Once we've built this class to wrap the PHP parser, we can create an instance
of the class and have it parse the sample XML document we described above.
Some sample code to do this would look as follows:
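A sketch combining the class and its usage follows. It owns a parser, registers the three handlers, and prints each event in the START/END/DATA format used in the next section. The class name `XMLTracer`, its `parse()` method, and the `$events` property are hypothetical. The handlers are passed as `[$this, 'method']` callables, which the xml_set_* functions accept directly. Case folding is turned off because PHP's parser uppercases tag names by default.

```php
<?php
// Hypothetical tracing class: wraps PHP's xml parser and reports each event.
class XMLTracer
{
    private $parser;
    public array $events = [];   // recorded so callers can inspect them

    public function __construct()
    {
        $this->parser = xml_parser_create();
        // Report tag names as written; PHP uppercases them by default.
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($this->parser, [$this, 'startElement'], [$this, 'endElement']);
        xml_set_character_data_handler($this->parser, [$this, 'characterData']);
    }

    // Parse a complete document in one call; returns true on success.
    public function parse(string $xml): bool
    {
        return xml_parse($this->parser, $xml, true) === 1;
    }

    public function startElement($parser, string $name, array $attrs): void
    {
        $this->log("START: [$name]");
    }

    public function endElement($parser, string $name): void
    {
        $this->log("END: [$name]");
    }

    public function characterData($parser, string $data): void
    {
        $this->log("DATA: [$data]");
    }

    // Record the event and echo it.
    private function log(string $line): void
    {
        $this->events[] = $line;
        echo $line, "\n";
    }
}

// Usage: create an instance and feed it a small document.
$tracer = new XMLTracer();
$ok = $tracer->parse('<drive><folder name="folder01"><file name="a.txt"/></folder></drive>');
```

This document has no whitespace between tags, so no DATA events appear; the indented sample document produces the extra DATA lines shown next.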
Watching The XML Parsing Events: Callback Functions
What do we expect to happen when the above code is executed? Each time
the xml_parse function encounters an XML tag in our document, it fires an
event by calling the function we registered for it. A function used this
way is often referred to as a 'callback function': we want PHP to call
us back at a given function each time it triggers an event of a certain
type.
By using the function
xml_set_element_handler, we
let the PHP parser know that each open tag should invoke a method in
our class named 'startElement' and each close tag should invoke a method in
our class named 'endElement':
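To see exactly what xml_set_element_handler passes to each callback, the procedural sketch below registers two closures and records the tag name and attribute array of every opening tag. It parses the same tiny document twice, once with PHP's default settings and once with XML_OPTION_CASE_FOLDING disabled, because by default the parser uppercases both element and attribute names. The helper name `collect_starts()` is hypothetical.

```php
<?php
// Record every opening tag's name and attributes (hypothetical helper).
function collect_starts(string $xml, bool $caseFolding): array
{
    $starts = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        // Called for every opening tag: parser, tag name, attribute array.
        function ($p, string $name, array $attrs) use (&$starts) {
            $starts[] = [$name, $attrs];
        },
        // Called for every closing tag: parser, tag name (unused here).
        function ($p, string $name) {}
    );
    xml_parse($parser, $xml, true);
    return $starts;
}

print_r(collect_starts('<file name="d.txt"/>', true));   // default: names uppercased
print_r(collect_starts('<file name="d.txt"/>', false));  // names kept as written
```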
Additionally, we want to capture all the character data between tags, so we
use the function
xml_set_character_data_handler to define
the callback function as 'characterData':
Callback functions are a very powerful tool that many languages offer, and
they work well in this specific case. Until I write an article on using
callback functions, just accept that it 'simply works', and let's see what
events are fired as we parse our sample XML document:
START: [drive]
DATA: []
DATA: [ ]
START: [folder]
DATA: []
DATA: [ ]
START: [file]
END: [file]
DATA: []
DATA: [ ]
START: [file]
END: [file]
DATA: []
DATA: [ ]
END: [folder]
DATA: []
DATA: [ ]
START: [folder]
DATA: []
DATA: [ ]
START: [file]
END: [file]
DATA: []
DATA: [ ]
START: [file]
DATA: []
DATA: [ This is a comment about file d.]
DATA: []
DATA: [ We like comments.]
END: [file]
DATA: []
DATA: [ ]
END: [folder]
DATA: []
END: [drive]
Did you expect to see that? Notice that each time a tag is opened, we
see a 'START: [tagname]' line printed. When a tag is closed,
we see an 'END: [tagname]' line. Finally, whenever character data is
encountered, we get a 'DATA: [...]' line. Notice, though, that the data lines
are not necessarily together. The parser does not guarantee that the text
between two tags will arrive in one chunk; in fact, it's likely that
it will NOT. The PHP parser is allowed to call your
characterData callback method as many times as it needs to,
and you'll have to concatenate the strings until the element's end tag is reached.
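A minimal way to handle that is to keep one buffer per open element: append every fragment the character data callback receives, and only use the text once the end tag fires. The sketch below does this with a plain stack of buffers. `collect_text()` is a hypothetical helper, keying results by tag name is a simplification (a repeated tag would overwrite the earlier text), and collapsing the whitespace is a choice made here, not something the parser does.

```php
<?php
// Join character data fragments per element, using a stack of buffers
// that mirrors the currently open elements (hypothetical helper).
function collect_text(string $xml): array
{
    $buffers = [];   // one buffer per open element
    $texts = [];     // finished text, keyed by tag name
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$buffers) {
            $buffers[] = '';                 // open a new, empty buffer
        },
        function ($p, string $name) use (&$buffers, &$texts) {
            // The element is closed, so its text is now complete.
            $raw = array_pop($buffers);
            // Collapse runs of whitespace (including newlines) to single spaces.
            $texts[$name] = trim(preg_replace('/\s+/', ' ', $raw));
        }
    );
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$buffers) {
        // May be called several times for one run of text; just append.
        if ($buffers) {
            $buffers[count($buffers) - 1] .= $data;
        }
    });
    xml_parse($parser, $xml, true);
    return $texts;
}

$xml = "<file name=\"d.txt\">\n  This is a comment about file d.\n  We like comments.</file>";
print_r(collect_text($xml));
```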
Building The Array Tree
At this point, we have a functioning class that will parse an XML document
and fire events. What we'll need to do now is modify the event handlers to
build our array tree using the array structure we defined above.
A simple algorithm for developing this code goes as follows:
Before we start parsing the XML, it might help to push a 'root' node onto
an empty stack. This way, when parsing is complete, we expect to
find only the root node remaining on the stack, with all the subnodes built
beneath it.
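One way to implement this stack approach is sketched below without the XML parser, so the logic is easy to follow: a start event pushes a new node, character data is appended to the node on top of the stack, and an end event pops the finished node and appends it to its parent's `_ELEMENTS`. The event tuples and the `build_tree()` name are assumptions made for this sketch.

```php
<?php
// Build the node tree from a flat list of parse events using a stack.
// Events are hypothetical tuples: ['start', name, attrs], ['data', text], ['end', name].
function build_tree(array $events): array
{
    // Seed the stack with a root node so every real element has a parent.
    $stack = [['_NAME' => 'root', '_DATA' => '', '_ELEMENTS' => []]];

    foreach ($events as $event) {
        switch ($event[0]) {
            case 'start':
                // New node; with "+", the internal keys win over any same-named attribute.
                $stack[] = ['_NAME' => $event[1], '_DATA' => '', '_ELEMENTS' => []] + $event[2];
                break;
            case 'data':
                // Text may arrive in pieces, so append to the open node.
                $stack[count($stack) - 1]['_DATA'] .= $event[1];
                break;
            case 'end':
                // Pop the finished node and attach it to its parent.
                $node = array_pop($stack);
                $stack[count($stack) - 1]['_ELEMENTS'][] = $node;
                break;
        }
    }
    // When all events are consumed, only the root should remain.
    return $stack[0];
}

$root = build_tree([
    ['start', 'folder', ['name' => 'folder01']],
    ['start', 'file', ['name' => 'a.txt']],
    ['data', 'first '],
    ['data', 'part'],
    ['end', 'file'],
    ['end', 'folder'],
]);
echo $root['_ELEMENTS'][0]['_ELEMENTS'][0]['_DATA'], "\n";
```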
The Completed XMLToArray Class
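The sketch below is one way to put the pieces together, not necessarily the article's exact class: it wires PHP's xml parser to the stack-based approach described in the previous section and returns the root node. The `parse()` method, throwing a RuntimeException on malformed input, and trimming each node's `_DATA` are assumptions; case folding is disabled so tag and attribute names keep their original case.

```php
<?php
// Sketch of an XML-to-array converter built on PHP's event-based parser.
class XMLToArray
{
    private array $stack = [];

    // Parse $xml and return the root node; throws on malformed input.
    public function parse(string $xml): array
    {
        // Start with a single root node so every element has a parent.
        $this->stack = [['_NAME' => 'root', '_DATA' => '', '_ELEMENTS' => []]];

        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, [$this, 'startElement'], [$this, 'endElement']);
        xml_set_character_data_handler($parser, [$this, 'characterData']);

        if (xml_parse($parser, $xml, true) !== 1) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
        return $this->stack[0];
    }

    // Opening tag: push a new node holding the tag name and attributes.
    public function startElement($parser, string $name, array $attrs): void
    {
        $this->stack[] = ['_NAME' => $name, '_DATA' => '', '_ELEMENTS' => []] + $attrs;
    }

    // Text: append, because one run of text may arrive in several calls.
    public function characterData($parser, string $data): void
    {
        $this->stack[count($this->stack) - 1]['_DATA'] .= $data;
    }

    // Closing tag: pop the finished node and attach it to its parent.
    public function endElement($parser, string $name): void
    {
        $node = array_pop($this->stack);
        $node['_DATA'] = trim($node['_DATA']);   // drop indentation whitespace
        $this->stack[count($this->stack) - 1]['_ELEMENTS'][] = $node;
    }
}

$tree = (new XMLToArray())->parse('<drive><folder name="folder01"><file name="a.txt">hi</file></folder></drive>');
echo $tree['_ELEMENTS'][0]['_ELEMENTS'][0]['name'], "\n";
```

Trimming `_DATA` is a convenience for indented documents like the sample; drop it if leading or trailing whitespace matters in your data.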
How To Use The Class
The XMLToArray class we just built is rather simple in function. It
parses your XML document into a multidimensional array. Here is some
code that shows you how you might use this now:
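Assuming the converter returns a root node shaped as described earlier (root, then drive, then folders and files, with attribute names in lowercase), a loop like the one below walks the folders and files and builds the listing shown next. The tree is written out by hand here so the snippet runs on its own.

```php
<?php
// Hand-written stand-in for the converter's output (assumed shape).
$root = ['_NAME' => 'root', '_DATA' => '', '_ELEMENTS' => [
    ['_NAME' => 'drive', '_DATA' => '', '_ELEMENTS' => [
        ['_NAME' => 'folder', 'name' => 'folder01', '_DATA' => '', '_ELEMENTS' => [
            ['_NAME' => 'file', 'name' => 'a.txt', '_DATA' => '', '_ELEMENTS' => []],
            ['_NAME' => 'file', 'name' => 'b.txt', '_DATA' => '', '_ELEMENTS' => []],
        ]],
        ['_NAME' => 'folder', 'name' => 'folder02', '_DATA' => '', '_ELEMENTS' => [
            ['_NAME' => 'file', 'name' => 'c.txt', '_DATA' => '', '_ELEMENTS' => []],
            ['_NAME' => 'file', 'name' => 'd.txt', '_DATA' => 'This is a comment about file d. We like comments.', '_ELEMENTS' => []],
        ]],
    ]],
]];

// Walk drive -> folders -> files and collect one line per node.
$lines = [];
$drive = $root['_ELEMENTS'][0];
foreach ($drive['_ELEMENTS'] as $folder) {
    $lines[] = 'FOLDER: ' . $folder['name'];
    foreach ($folder['_ELEMENTS'] as $file) {
        $lines[] = 'FILE: ' . $file['name'];
    }
}
echo implode("\n", $lines), "\n";
```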
This code would produce output similar to the following:
FOLDER: folder01
FILE: a.txt
FILE: b.txt
FOLDER: folder02
FILE: c.txt
FILE: d.txt
In Summary
I use a version of this class to quickly import XML documents into a
multidimensional PHP array where I can then use PHP functions to manipulate
the array's contents. You might be able to make this class faster by using
references on your stack, or you might optimize it by building a
Node class instead of using our simple array.
The real purpose of this article is not just to give you a working PHP
class for XML, but rather to show you how you might develop your own
XML parsing class and toolset. There are other PHP resources, such as
XPath, that let you search and extract values from XML documents
more quickly than this method. Additionally, as the DOM parser
matures, you may find that it performs this parsing for you in C code,
which is much faster. For simple XML needs, however, speed of execution
is rarely the bottleneck for your application, and this approach is
sufficient and sometimes even a 'powerful' solution for
getting the job done.