Using XML: A PHP Developer's Primer
Adam Delves
This series of articles will
focus on XML, its
applications in modern day web development and how PHP fits into this
niche. In this article, we will focus specifically on the tools
provided to us by PHP which enable us to manipulate XML data sources.
What is XML and Why Use it?
XML (eXtensible Markup Language) is a W3C standard designed to allow
the easy exchange, storage and use of data between web applications and
services.
Data encoded using the XML standard has meaning and structure which can
be interpreted easily by humans and computers.
XML data is platform and
application independent. Need I say more? This in itself makes XML
an ideal data exchange format for the internet (it was in fact
developed for this very purpose). The recent increase in
speed of broadband connections and the ever growing need by the
consumer for feature-rich applications able to
communicate with each other and share data across any medium means
that
XML web services and applications are becoming more and more abundant.
XML was created to address the problem of describing richly structured
data on the web, which had thus far only been addressed loosely through
the clever use of HTML.
Below is an example of an XML document:
<?xml version="1.0"?>
<party>
    <location>My House</location>
    <time>7pm</time>
    <guest>
        <name>John Bloggs</name>
        <item>Crate of Fosters</item>
    </guest>
    <guest>
        <name>Sara Bloggs</name>
        <item>Umbrella</item>
    </guest>
    <guest>
        <name>David Fig</name>
        <item>Bombay Mix</item>
    </guest>
</party>
If you have not seen XML before, you may be thinking that it looks a
lot like HTML. HTML is an application of SGML, the standard of which XML
is a simplified subset. The similarities, however, end with the
familiar-looking tag delimiters.
Just looking at the XML fragment above, we can see that the
data
is describing a party with some guests, each of which is bringing an
item. The names of the tags used to describe the data are
entirely the choice of the author; all the XML standard requires is
that the data be consistent and the tags used to describe it be well
formed. We can further enforce data integrity with a document type
definition (DTD) or an XML schema. For simplicity, however,
we will work with plain, ordinary XML in this tutorial.
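To make "well formed" a little more concrete, here is a minimal sketch of how PHP's DOM extension (covered later in this article) can be used to check a string. The helper name isWellFormedXml() is my own and not part of PHP; the second sample string is deliberately broken.

```php
<?php
/* Hypothetical helper (not part of PHP): true when $source parses as well-formed XML. */
function isWellFormedXml($source)
{
    /* collect parse errors internally instead of emitting PHP warnings */
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument('1.0');
    $ok  = $doc->loadXML($source);   // returns false when parsing fails

    /* restore the previous error handling state */
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$good = '<?xml version="1.0"?><party><location>My House</location></party>';
$bad  = '<?xml version="1.0"?><party><location>My House</party>';

var_dump(isWellFormedXml($good)); // every tag is closed and properly nested
var_dump(isWellFormedXml($bad));  // <location> is never closed
```

Note that this only checks well-formedness; validating against a DTD or schema is a separate step that this sketch does not attempt.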
Applications of XML
We have already seen how XML can
be used to
describe any kind of data. XML is already in use today in many web
applications, a few of which are described below:
- XHTML - this is one of the most widely used applications of
XML.
It is similar to the SGML-based HTML used to describe how data is
displayed on a web page. XHTML uses a DTD to ensure that all
documents conform to the standard. The emergence of XHTML has helped to
make the lives of web programmers somewhat easier; however, a
web
browser which is fully compliant with both the CSS and XHTML standards is yet to
emerge.
- XML-RPC - Remote procedure call (RPC) is used by
distributed
applications to invoke procedures on remote computers. XML-RPC encodes
the information about the procedure call using XML and sends it to the
receiving computer using HTTP. The return value from the procedure is
then again encoded in XML and sent back over the HTTP connection to the
calling computer.
- RSS - Really Simple Syndication / Rich Site Summary is a
method
used to syndicate and aggregate web content such as news, articles,
share prices and links. A special application (an aggregator)
regularly updates the RSS feeds on users' PCs. The RSS data is encoded and
transferred using XML.
- AJAX - Asynchronous JavaScript And XML allows web
developers to create feature-rich, event-driven web applications which
run in the web browser. JavaScript is used to send
XML-encoded data to server-side scripts and receive it back, allowing live page updates
without the need to refresh all content.
The above is only a minute sample of the possible uses of XML. In
future articles we will be looking into how we can use some of these
applications in PHP.
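As a small taste of one of these applications, the sketch below reads the headlines out of an RSS feed using SimpleXML, which is introduced later in this article. The feed is a made-up example kept inline so the code runs on its own; a real aggregator would fetch it over HTTP.

```php
<?php
/* a made-up RSS 2.0 feed, kept inline so the example is self-contained */
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

/* collect the headlines, indexed by their link */
$headlines = array();
foreach ($feed->channel->item as $item) {
    $headlines[(string) $item->link] = (string) $item->title;
}

foreach ($headlines as $link => $title) {
    echo $title, ' (', $link, ")\n";
}
```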
Using XML in PHP
With PHP 5, the options PHP gives us for working with XML have
broadened significantly. The best that
the latest version of PHP 4 was able to offer was the unstable,
non-W3C-compliant DOM XML extension.
I will be focusing on three
of the methods PHP 5 provides for interacting with XML: DOM, SimpleXML and XPath.
Where possible I will suggest
a situation and kind of data best suited to the method in question.
All sample code will use a simple XML data source describing a library and
its books.
<?xml version="1.0"?>
<library>
    <categories>
        <category cid="1">Web Development</category>
        <category cid="2">Database Programming</category>
        <category cid="3">PHP</category>
        <category cid="4">Java</category>
    </categories>
    <books>
        <book>
            <title>Apache 2</title>
            <author>Peter Wainwright</author>
            <publisher>Wrox</publisher>
            <category>1</category>
        </book>
        <book>
            <title>Advanced PHP Programming</title>
            <author>George Schlossnagle</author>
            <publisher>Developer Library</publisher>
            <category>1</category>
            <category>3</category>
        </book>
        <book>
            <title>Visual FoxPro 6 - Programmers Guide</title>
            <author>Eric Stroo</author>
            <publisher>Microsoft Press</publisher>
            <category>2</category>
        </book>
        <book>
            <title>Mastering Java 2</title>
            <author>John Zukowski</author>
            <publisher>Sybex</publisher>
            <category>4</category>
        </book>
    </books>
</library>
DOM
The DOM PHP extension allows
operations on
XML documents using the W3C DOM API. Before PHP 5, the DOM (through the
DOM XML extension mentioned above) was the only way PHP offered to work
with an XML document as a tree of objects. If you have used DOM in
JavaScript, you will recognize that the object model is all but
identical.
While the DOM method may be a long-winded way of traversing and
manipulating an XML document, any DOM compliant code has the distinct
advantage of being portable with any other API which implements the
same W3C compliant
object model.
In the example code below we use DOM to display information about each
book. We first
traverse the list of categories, loading their IDs and corresponding
names into an array indexed by category ID. Then we display a short
description for each book:
PHP:
<?php
/* here we must specify the version of XML, i.e. 1.0 */
$xml = new DOMDocument('1.0');
$xml->load('xml/library.xml');

/* first, build a list of categories indexed by category ID */
$categories = array();
$XMLCategories = $xml->getElementsByTagName('categories')->item(0);
foreach ($XMLCategories->getElementsByTagName('category') as $categoryNode) {
    /* notice how we get attributes */
    $cid = $categoryNode->getAttribute('cid');
    $categories[$cid] = $categoryNode->firstChild->nodeValue;
}
?>
<html>
<head>
<title>XML Library</title>
</head>
<body>
<?php foreach ($xml->getElementsByTagName('book') as $book):
    /* find the title */
    $title = $book->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    /* find the author - for simplicity we assume there is only one */
    $author = $book->getElementsByTagName('author')->item(0)->firstChild->nodeValue;
    /* list categories */
    $bookCategories = $book->getElementsByTagName('category');
    $catList = '';
    foreach ($bookCategories as $category) {
        $catList .= $categories[$category->firstChild->nodeValue] . ', ';
    }
    /* strip the trailing comma and space */
    $catList = substr($catList, 0, -2); ?>
<div>
<!-- htmlspecialchars() escapes characters such as & and < taken from the XML -->
<h2><?php echo htmlspecialchars($title) ?></h2>
<p><b>Author:</b> <?php echo htmlspecialchars($author) ?></p>
<p><b>Categories:</b> <?php echo htmlspecialchars($catList) ?></p>
</div>
<?php endforeach; ?>
</body>
</html>
Modifying the XML with DOM is, again, a little long-winded. To add a
category, for example:
PHP:
function addCategory(DOMDocument $xml, $catID, $catName)
{
    $catName = $xml->createTextNode($catName);   // create a node to hold the text
    $category = $xml->createElement('category'); // create a category element
    $category->appendChild($catName);            // add the text to the category element
    $category->setAttribute('cid', $catID);      // set the category id
    $XMLCategories = $xml->getElementsByTagName('categories')->item(0);
    $XMLCategories->appendChild($category);      // add the new category
}
Saving the XML
You can transform the DOM representation back to its XML string
representation using either the save() or the saveXML() method. The
save() method saves the XML to a file with the specified name, whereas the
saveXML() method returns a string from part or all of the document.
$xml->save('xml/library.xml'); // save the whole file
$categories = $xml->saveXML($XMLCategories); // return a string containing just the categories
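The following is a small, self-contained sketch of both methods at work. It assumes a cut-down version of the library document loaded from a string, and it writes to a temporary file rather than to xml/library.xml; the variable names are only for illustration.

```php
<?php
$xml = new DOMDocument('1.0');
$xml->loadXML('<library><categories><category cid="1">Web Development</category></categories></library>');

/* add a second category, following the same steps as addCategory() */
$category = $xml->createElement('category');
$category->appendChild($xml->createTextNode('PHP'));
$category->setAttribute('cid', '3');

$XMLCategories = $xml->getElementsByTagName('categories')->item(0);
$XMLCategories->appendChild($category);

/* serialize just the <categories> subtree to a string */
$fragment = $xml->saveXML($XMLCategories);
echo $fragment, "\n";

/* serialize the whole document, XML declaration included */
$whole = $xml->saveXML();

/* write the whole document to a temporary file; save() returns the bytes written */
$path  = tempnam(sys_get_temp_dir(), 'lib');
$bytes = $xml->save($path);
unlink($path); // clean up the temporary file
```

Passing a node to saveXML() is what limits the output to that part of the tree; calling it with no argument serializes everything.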
To show just how easy it is to port DOM-compliant code to another
language, here is the same code in JavaScript:
Javascript:
function doXML() {
    /* xml is assumed to be an XML document object that has already been loaded */
    /* first, build a list of categories */
    var categories = {};
    var XMLCategories = xml.getElementsByTagName('categories')[0];
    var theCategories = XMLCategories.getElementsByTagName('category');
    for (var i = 0; i < theCategories.length; i++) {
        /* notice how we get attributes */
        var cid = theCategories[i].getAttribute('cid');
        categories[cid] = theCategories[i].firstChild.nodeValue;
    }

    /* open the output document once, before writing any books */
    document.open();
    var theBooks = xml.getElementsByTagName('book');
    for (var i = 0; i < theBooks.length; i++) {
        var book = theBooks[i];
        /* find the title */
        var title = book.getElementsByTagName('title')[0].firstChild.nodeValue;
        /* find the author - for simplicity we assume there is only one */
        var author = book.getElementsByTagName('author')[0].firstChild.nodeValue;
        /* list categories */
        var bookCategories = book.getElementsByTagName('category');
        var catList = '';
        for (var j = 0; j < bookCategories.length; j++) {
            catList += categories[bookCategories[j].firstChild.nodeValue] + ', ';
        }
        catList = catList.substring(0, catList.length - 2);

        document.write("<h2>" + title + "</h2>");
        document.write("<p><b>Author:</b> " + author + "</p>");
        document.write("<p><b>Categories:</b> " + catList + "</p>");
    }
    document.close();
}
SimpleXML
SimpleXML really is simple. It
allows access to an XML
document, its elements and attributes using object and array access
syntax. The way this is modeled is simple:
- Elements
- These are represented as properties of the SimpleXMLElement object.
Where more than one element of the same name exists as a child of the
document or element, each one can be accessed using array index notation.
$xml->books; // returns the element "books"
$xml->books->book[0]; // returns the first book inside the books element
- Attributes
- Attributes of elements are accessed and set using associative array
notation, where an index corresponds to the attribute name.
$category['cid']; // returns the value of the cid attribute
- Element Data
- To retrieve the text data contained inside an element, it must be
converted to a string explicitly using (string) or output using print
or echo. If the element contains more than one text node, they will be
concatenated in the order they were found.
echo ($xml->books->book[0]->title); // displays the title of the first book
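The three access rules above can be tried out with a short sketch like the one below. It assumes a cut-down version of the library document, loaded from a string with simplexml_load_string() so that no file is needed.

```php
<?php
$xml = simplexml_load_string(
    '<library>'
    . '<categories><category cid="3">PHP</category></categories>'
    . '<books>'
    . '<book><title>Apache 2</title><category>1</category></book>'
    . '<book><title>Advanced PHP Programming</title>'
    . '<category>1</category><category>3</category></book>'
    . '</books>'
    . '</library>'
);

/* elements: property access, with an array index to pick one of several */
$firstTitle = (string) $xml->books->book[0]->title;

/* repeated elements can be counted and iterated */
$secondBookCategories = count($xml->books->book[1]->category);

/* attributes: associative array notation */
$cid = (string) $xml->categories->category['cid'];

/* element data: cast to string, or simply echo it */
echo $firstTitle, "\n";
echo $xml->categories->category, "\n";
```

The explicit (string) casts matter whenever a value is stored or compared, since each access otherwise yields another SimpleXMLElement object rather than a plain string.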
Here is the original example converted to use SimpleXML. To load the
XML file, we use the simplexml_load_file() function. This parses the
XML file and loads it into a SimpleXMLElement object:
PHP:
<?php
$xml = simplexml_load_file('xml/library.xml');

/* load a list of categories into an array */
$categories = array();
foreach ($xml->categories->category as $category) {
    $categories[(string) $category['cid']] = (string) $category;
}
?>
<html>
<head>
<title>XML Library</title>
</head>
<body>
<?php foreach ($xml->books->book as $book):
    /* list categories */
    $catList = '';
    foreach ($book->category as $category) {
        $catList .= $categories[(string) $category] . ', ';
    }
    /* strip the trailing comma and space */
    $catList = substr($catList, 0, -2); ?>
<div>
<!-- htmlspecialchars() escapes characters such as & and < taken from the XML -->
<h2><?php echo htmlspecialchars((string) $book->title) ?></h2>
<p><b>Author:</b> <?php echo htmlspecialchars((string) $book->author) ?></p>
<p><b>Categories:</b> <?php echo htmlspecialchars($catList) ?></p>
</div>
<?php endforeach; ?>
</body>
</html>
Modifying the XML
Although text data and attribute values can be set using SimpleXML,
the original PHP 5.0 release gave it no way to create new elements or
attributes (the addChild() and addAttribute() methods arrived in PHP 5.1.3).
However, SimpleXML does provide a way of
converting between DOMElement objects and SimpleXMLElement objects. I
have modified the addCategory() function to show how the
simplexml_import_dom() function can be used to add a category and
convert the document back to a SimpleXMLElement object:
PHP:
function addCategory(SimpleXMLElement &$sXML, $catID, $catName)
{
    /* rebuild the document as a DOMDocument from its XML string */
    $xml = new DOMDocument;
    $xml->loadXML($sXML->asXML());
    $catName = $xml->createTextNode($catName);   // create a node to hold the text
    $category = $xml->createElement('category'); // create a category element
    $category->appendChild($catName);            // add the text to the category element
    $category->setAttribute('cid', $catID);      // set the category id
    $XMLCategories = $xml->getElementsByTagName('categories')->item(0);
    $XMLCategories->appendChild($category);      // add the new category
    /* convert back to SimpleXML; the caller's variable is updated by reference */
    $sXML = simplexml_import_dom($xml);
    return $sXML;
}
Similarly, the asXML() method of the SimpleXMLElement object can be
used to retrieve the XML string and save it back to a file.
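On PHP 5.1.3 and later, the round trip through DOM can be avoided, because SimpleXMLElement itself offers addChild() and addAttribute(). The sketch below assumes such a PHP version and a cut-down library document; the category name and ID are made up for illustration.

```php
<?php
$sXML = simplexml_load_string(
    '<library><categories>'
    . '<category cid="1">Web Development</category>'
    . '</categories></library>'
);

/* addChild() creates the element and returns it as a SimpleXMLElement */
$category = $sXML->categories->addChild('category', 'Perl');
$category->addAttribute('cid', '5');

$count = count($sXML->categories->category);

/* asXML() with no argument returns the document as a string */
echo $sXML->asXML();
```

Passing a file name to asXML() writes the document to that file instead of returning it as a string.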
XPath
XPath is without a doubt the
cherry on top of the XML cake. XPath allows you to use SQL-like queries
to search for specific information in an XML document. Both DOM and
SimpleXML have built in support for XPath, which, like SQL, can be used
to extract just about anything you wish from an XML document.
- //category
- find all occurrences of category anywhere in the document
- /library/books
- find all occurrences of books which are children of library
- /library/categories/category[@cid]
- find all occurrences of category which are children of
library/categories and have an attribute named cid
- /library/categories/category[@cid='2']
- find all occurrences of category which are children of
library/categories and have an attribute named cid with a value of 2
- /library/books/book[title='Apache 2']
- find all occurrences of book which are children of /library/books and
whose title element has a value of Apache 2
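To see what queries like these return, here is a brief sketch that runs several of them through the xpath() method of SimpleXMLElement. It assumes a cut-down library document loaded from a string; xpath() returns an array of SimpleXMLElement objects.

```php
<?php
$xml = simplexml_load_string(
    '<library>'
    . '<categories>'
    . '<category cid="1">Web Development</category>'
    . '<category cid="2">Database Programming</category>'
    . '</categories>'
    . '<books>'
    . '<book><title>Apache 2</title><category>1</category></book>'
    . '<book><title>Mastering Java 2</title><category>4</category></book>'
    . '</books>'
    . '</library>'
);

/* every category element, including those nested inside books */
$allCategories = $xml->xpath('//category');

/* only the category elements under /library/categories that carry a cid attribute */
$withCid = $xml->xpath('/library/categories/category[@cid]');

/* the category whose cid attribute equals 2 */
$cidTwo = $xml->xpath("/library/categories/category[@cid='2']");

/* the book whose title is Apache 2 */
$apache = $xml->xpath("/library/books/book[title='Apache 2']");

echo count($allCategories), " category elements in total\n";
echo (string) $cidTwo[0], "\n";
```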
This is only the tip of the XPath iceberg. You can create hugely
complex queries with XPath to select almost any kind of information
from your document. I have modified the sample code once again to show
just how elegant XPath can make it.
PHP:
<?php
$xml = simplexml_load_file('xml/library.xml');
?>
<html>
<head>
<title>XML Library</title>
</head>
<body>
<?php foreach (((array) $xml->xpath("/library/books/book")) as $book):
    /* list categories */
    $catList = '';
    foreach ($book->category as $category) {
        /* get the category with this ID */
        $matches = $xml->xpath("/library/categories/category[@cid='$category']");
        $catList .= (string) $matches[0] . ', ';
    }
    /* strip the trailing comma and space */
    $catList = substr($catList, 0, -2); ?>
<div>
<!-- htmlspecialchars() escapes characters such as & and < taken from the XML -->
<h2><?php echo htmlspecialchars((string) $book->title) ?></h2>
<p><b>Author:</b> <?php echo htmlspecialchars((string) $book->author) ?></p>
<p><b>Categories:</b> <?php echo htmlspecialchars($catList) ?></p>
</div>
<?php endforeach; ?>
</body>
</html>
DOM and XPath
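Evaluating XPath queries in DOM requires creating a DOMXPath object.
When the expression selects nodes, the evaluate() method returns a
DOMNodeList of the matching nodes; where the expression yields a single
value, such as a count, it returns that typed result instead.
$xPath = new DOMXPath($xml);
$books = $xPath->evaluate("/library/books/book[title='Apache 2']");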
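A slightly fuller sketch follows, assuming a cut-down library document loaded from a string. It uses query(), which returns a DOMNodeList for node-selecting expressions, and evaluate() for an expression that yields a number.

```php
<?php
$xml = new DOMDocument('1.0');
$xml->loadXML(
    '<library><books>'
    . '<book><title>Apache 2</title><author>Peter Wainwright</author></book>'
    . '<book><title>Mastering Java 2</title><author>John Zukowski</author></book>'
    . '</books></library>'
);

$xPath = new DOMXPath($xml);

/* query() returns a DOMNodeList, which can be used in foreach */
$books = $xPath->query("/library/books/book[title='Apache 2']");

$authors = array();
foreach ($books as $book) {
    $authors[] = $book->getElementsByTagName('author')->item(0)->nodeValue;
}

/* evaluate() can also return a typed result, here a number */
$bookCount = $xPath->evaluate('count(/library/books/book)');

echo implode(', ', $authors), "\n";
```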
Conclusion
Now that we have seen the tools
PHP provides for interacting with XML, we are armed and ready to
start delving into some of the applications of XML. In my next article
we will look at AJAX and show how sites like Google are able to
do this
(go on, type a search query).