Using XML: A PHP Developer's Primer
This series of articles will focus on XML, its applications in modern-day web development and how PHP fits into this niche. In this article, we will focus specifically on the tools PHP provides for manipulating XML data sources.
What is XML and Why Use it?
XML (eXtensible Markup Language) is a W3C standard designed to allow the easy exchange, storage and use of data between web applications and services.

Data encoded using the XML standard has meaning and structure which can be interpreted easily by humans and computers. XML data is platform and application independent. Need I say more? This in itself makes XML an ideal data exchange format for the internet (it was in fact developed for this very purpose). The recent increase in speed of broadband connections and the ever growing need by the consumer for feature-rich applications able to communicate with each other and share data across any medium means that XML web services and applications are becoming more and more abundant.

XML was created to address the problem of describing richly structured data on the web, which had thus far only been addressed loosely through the clever use of HTML.

Below is an example of an XML document:
<?xml version="1.0"?>
<party>
    <location>My House</location>
    <time>7pm</time>
    <guest>
      <name>John Bloggs</name>
      <item>Crate of Fosters</item>
    </guest>
    <guest>
      <name>Sara Bloggs</name>
      <item>Umbrella</item>
    </guest>
    <guest>
      <name>David Fig</name>
      <item>Bombay Mix</item>
    </guest>
</party>
If you have not seen XML before, you may be thinking that it looks a lot like HTML. HTML is an application of SGML, a standard which XML is a subset of. The similarities end with the familiar looking tag delimiters however.

Just looking at the XML fragment above, we can see that the data is describing a party with some guests, each of whom is bringing an item. The names of the tags used to describe the data are entirely the choice of the author; all the XML standard requires is that the data be consistent and the tags used to describe it be well formed (every opening tag has a matching closing tag, and elements are properly nested). We can further enforce data integrity with a document type definition (DTD) or an XML schema. For reasons of simplicity, however, we will work with plain, ordinary XML in this tutorial.
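
To make "well formed" concrete, here is a minimal sketch of a well-formedness check using the DOM extension covered later in this article. The helper name isWellFormed() and the two sample strings are my own inventions for illustration; the check relies on DOMDocument::loadXML() returning false when the parser rejects the input.

```php
<?php
// Hypothetical helper (not part of PHP): true when $source parses as well-formed XML.
function isWellFormed($source)
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok  = $doc->loadXML($source);

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$good = '<party><guest><name>John Bloggs</name></guest></party>';
$bad  = '<party><guest><name>John Bloggs</guest></name></party>'; // tags overlap

var_dump(isWellFormed($good)); // bool(true)
var_dump(isWellFormed($bad));  // bool(false)
```

Note that this only checks syntax; it says nothing about whether the document follows a particular DTD or schema.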

Applications of XML
We have already seen how XML can be used to describe any kind of data. XML is already in use today in many web applications, a few of which are described below:
  • XHTML - this is one of the most widely used applications of XML. It is similar to the SGML-based HTML used to describe how data is displayed on a web page. XHTML uses a DTD to ensure that all documents conform to the standard. The emergence of XHTML has helped to make the lives of web programmers somewhat easier; however, a web browser which is fully compliant with both the CSS and XHTML standards is yet to emerge.
  • XML-RPC - Remote procedure call (RPC) is used by distributed applications to invoke procedures on remote computers. XML-RPC encodes the information about the procedure call using XML and sends it to the receiving computer using HTTP. The return value from the procedure is then again encoded in XML and sent back over the HTTP connection to the calling computer.
  • RSS - Really Simple Syndication / Rich Site Summary is a method used to syndicate and aggregate web content such as news, articles, share prices and links. A special application (an aggregator) regularly updates the RSS feeds on users' PCs. The RSS data is encoded and transferred using XML.
  • AJAX - Asynchronous JavaScript And XML allows web developers to create feature-rich, event-driven web applications which run in the web browser. JavaScript is used to send XML-encoded data to, and receive it from, server-side scripts, allowing live page updates without the need to refresh all content.
The above is only a minute sample of the possible uses of XML. In future articles we will be looking into how we can use some of these applications in PHP.
Using XML in PHP
Since PHP 5, the options PHP offers for interacting with XML have broadened significantly. The best that the latest version of PHP 4 was able to offer was the unstable, non-W3C-compliant DOM XML extension.
I will be focusing on three of the methods provided to us in PHP 5 which allow us to interact with XML: DOM, SimpleXML and XPath. Where possible I will suggest a situation and data which are best suited to the method in question. All sample code will use a simple XML data source describing a library and its books.


<?xml version="1.0"?>
<library>
    <categories>
        <category cid="1">Web Development</category>
        <category cid="2">Database Programming</category>
        <category cid="3">PHP</category>
        <category cid="4">Java</category>
    </categories>
    <books>
        <book>
            <title>Apache 2</title>
            <author>Peter Wainwright</author>
            <publisher>Wrox</publisher>
            <category>1</category>
        </book>
        <book>
            <title>Advanced PHP Programming</title>
            <author>George Schlossnagle</author>
            <publisher>Developer Library</publisher>
            <category>1</category>
            <category>3</category>
        </book>
        <book>
            <title>Visual FoxPro 6 - Programmers Guide</title>
            <author>Eric Stroo</author>
            <publisher>Microsoft Press</publisher>
            <category>2</category>
        </book>
        <book>
            <title>Mastering Java 2</title>
            <author>John Zukowski</author>
            <publisher>Sybex</publisher>
            <category>4</category>
        </book>
    </books>
</library>
DOM
The DOM PHP extension allows operations on XML documents using the W3C DOM API. It replaces the DOM XML extension of PHP 4, which did not follow the W3C standard. If you have used DOM in JavaScript, you will recognize that the object model is all but identical.


While the DOM method may be a long-winded way of traversing and manipulating an XML document, any DOM compliant code has the distinct advantage of being portable with any other API which implements the same W3C compliant object model.


In the example code below we use DOM to display information about each book. We first traverse the list of categories, loading their IDs and corresponding names into an array indexed by category ID. Then we display a short description for each book:

PHP:
<?php
/* here we must specify the version of XML, i.e. 1.0 */
$xml = new DOMDocument('1.0');
$xml->load('xml/library.xml');

/* first, build a list of categories indexed by category ID */
$categories = array();
$XMLCategories = $xml->getElementsByTagName('categories')->item(0);

foreach ($XMLCategories->getElementsByTagName('category') as $categoryNode) {
    /* notice how we get attributes */
    $cid = $categoryNode->getAttribute('cid');
    $categories[$cid] = $categoryNode->firstChild->nodeValue;
}
?>
<html>
    <head>
        <title>XML Library</title>
    </head>
    <body>
        <?php foreach ($xml->getElementsByTagName('book') as $book):
            /* find the title */
            $title = $book->getElementsByTagName('title')->item(0)->firstChild->nodeValue;

            /* find the author - for simplicity we assume there is only one */
            $author = $book->getElementsByTagName('author')->item(0)->firstChild->nodeValue;

            /* list categories */
            $bookCategories = $book->getElementsByTagName('category');

            $catList = '';
            foreach ($bookCategories as $category) {
                $catList .= $categories[$category->firstChild->nodeValue] . ', ';
            }

            /* strip the trailing ", " */
            $catList = substr($catList, 0, -2); ?>

        <div>
            <h2><?php echo($title) ?></h2>
            <p><b>Author:</b> <?php echo($author) ?></p>
            <p><b>Categories:</b> <?php echo($catList) ?></p>
        </div>
        <?php endforeach; ?>
    </body>
</html>

Again, modifying the XML is a little long winded. To add a category for example:

PHP:
function addCategory(DOMDocument $xml, $catID, $catName)
{
    $catName = $xml->createTextNode($catName); // create a node to hold the text
    $category = $xml->createElement('category'); // create a category element
    $category->appendChild($catName); // add the text to the category element
    $category->setAttribute('cid', $catID); // set the category id

    $XMLCategories = $xml->getElementsByTagName('categories')->item(0);
    $XMLCategories->appendChild($category); // add the new category
}


Saving the XML
You can transform the DOM representation back to the XML string representation using either the save() or the saveXML() method. The save() method writes the whole document to a file with a specified name, whereas saveXML() returns a string containing all of the document or, when passed a node, just that part of it.


$xml->save('xml/library.xml'); // save the whole file
$categories = $xml->saveXML($XMLCategories); // return a string containing just the categories
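
To see the whole round trip in one place, here is a small self-contained sketch that adds a category and then serialises the result both ways. The inline document, the new category (cid 5, "XML") and the temporary file are assumptions made purely for illustration; the steps are the same ones addCategory() performs.

```php
<?php
$xml = new DOMDocument('1.0');
$xml->loadXML(
    '<library><categories>'
    . '<category cid="1">Web Development</category>'
    . '</categories></library>'
);

// Build <category cid="5">XML</category> and append it to <categories>.
$category = $xml->createElement('category');
$category->appendChild($xml->createTextNode('XML'));
$category->setAttribute('cid', '5');

$categoriesNode = $xml->getElementsByTagName('categories')->item(0);
$categoriesNode->appendChild($category);

// saveXML() with a node argument serialises only that subtree.
$fragment = $xml->saveXML($categoriesNode);
echo $fragment, "\n";

// save() writes the whole document to disk and returns the number of
// bytes written, or false on failure.
$path  = tempnam(sys_get_temp_dir(), 'lib'); // hypothetical scratch file
$bytes = $xml->save($path);
echo "wrote $bytes bytes\n";
unlink($path);
```

Checking the return value of save() is worthwhile in real code, since a read-only directory will otherwise fail silently.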
To show just how easy it is to port DOM-compliant code to another language, here is the same code in JavaScript:


Javascript:
function doXML()
{
    /* xml is assumed to be an XML DOM document that has already been loaded */

    /* first to create a list of categories, keyed by category ID */
    var categories = {};
    var XMLCategories = xml.getElementsByTagName('categories')[0];

    var theCategories = XMLCategories.getElementsByTagName('category');
    for (var i = 0; i < theCategories.length; i++) {
        /* notice how we get attributes */
        var cid = theCategories[i].getAttribute('cid');
        categories[cid] = theCategories[i].firstChild.nodeValue;
    }

    /* open the output document once, before writing any books */
    document.open();

    var theBooks = xml.getElementsByTagName('book');
    for (var i = 0; i < theBooks.length; i++) {
        var book = theBooks[i];

        /* find the title */
        var title = book.getElementsByTagName('title')[0].firstChild.nodeValue;

        /* find the author - for simplicity we assume there is only one */
        var author = book.getElementsByTagName('author')[0].firstChild.nodeValue;

        /* list categories */
        var bookCategories = book.getElementsByTagName('category');

        var catList = '';
        for (var j = 0; j < bookCategories.length; j++) {
            catList += categories[bookCategories[j].firstChild.nodeValue] + ', ';
        }

        /* strip the trailing ", " */
        catList = catList.substring(0, catList.length - 2);

        document.write("<h2>" + title + "</h2>");
        document.write("<p><b>Author:</b> " + author + "</p>");
        document.write("<p><b>Categories:</b> " + catList + "</p>");
    }

    document.close();
}

SimpleXML
SimpleXML really is simple. It allows access to an XML document, its elements and attributes using object and array access methods. The way this is modeled is simple:

  • Elements - These are represented as properties of the SimpleXMLElement object. Where more than one of that element exists as a child of the document or element, each element can be accessed using array index notation.
    $xml->books; // returns the element "books"
    $xml->books->book[0]; // returns the first book inside the books element
  • Attributes - Attributes of elements are accessed and set using associative array notation, where an index corresponds to the attribute name.
    $category['cid']; // returns the value of the cid attribute
  • Element Data - To retrieve the text data contained inside an element, it must be converted to a string explicitly using (string) or output using print or echo. If the element contains more than one text node, they will be concatenated in the order they were found.
    echo ($xml->books->book[0]->title); // displays the title of the first book
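
The three access rules above can be tried out on a cut-down copy of the library document. This is only a sketch: the inline XML is a shortened version I made up for the example, and the variable names are arbitrary.

```php
<?php
$xml = simplexml_load_string(
    '<library>
        <categories>
            <category cid="1">Web Development</category>
            <category cid="3">PHP</category>
        </categories>
        <books>
            <book><title>Apache 2</title></book>
            <book><title>Advanced PHP Programming</title></book>
        </books>
    </library>'
);

// Second <category> element, selected with array index notation.
$second = $xml->categories->category[1];

$cid   = (string) $second['cid'];              // attribute via associative array notation
$name  = (string) $second;                     // element text via an explicit string cast
$title = (string) $xml->books->book[0]->title; // child elements via properties
$count = count($xml->books->book);             // number of <book> children

echo "$cid: $name\n";           // 3: PHP
echo "$title ($count books)\n"; // Apache 2 (2 books)
```

The explicit (string) casts matter whenever a value is used as an array key or compared, because without them you are holding a SimpleXMLElement object rather than text.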
Here is the original example converted to use SimpleXML. To load the XML file, we use the simplexml_load_file() function. This parses the XML file and loads it into a SimpleXMLElement object:

PHP:
<?php
$xml = simplexml_load_file('xml/library.xml');

/* load a list of categories into an array */
$categories = array();
foreach ($xml->categories->category as $category) {
    $categories[(string) $category['cid']] = (string) $category;
}
?>
<html>
    <head>
        <title>XML Library</title>
    </head>
    <body>
        <?php foreach ($xml->books->book as $book):
            /* list categories */
            $catList = '';
            foreach ($book->category as $category) {
                $catList .= $categories[(string) $category] . ', ';
            }

            /* strip the trailing ", " */
            $catList = substr($catList, 0, -2); ?>

        <div>
            <h2><?php echo($book->title) ?></h2>
            <p><b>Author:</b> <?php echo($book->author) ?></p>
            <p><b>Categories:</b> <?php echo($catList) ?></p>
        </div>
        <?php endforeach; ?>
    </body>
</html>

 

Modifying the XML
Early PHP 5 releases let SimpleXML set text data and attribute values but gave it no methods for creating new elements or attributes (addChild() and addAttribute() arrived in PHP 5.1.3). However, SimpleXML does provide a way of converting between DOMElement objects and SimpleXMLElement objects. I have modified the addCategory() function to show how the simplexml_import_dom() function can be used to add a category and convert the document back to SimpleXML format:

PHP:

function addCategory(SimpleXMLElement &$sXML, $catID, $catName)
{
    $xml = new DOMDocument;
    $xml->loadXML($sXML->asXML());

    $catName = $xml->createTextNode($catName); // create a node to hold the text
    $category = $xml->createElement('category'); // create a category element
    $category->appendChild($catName); // add the text to the category element
    $category->setAttribute('cid', $catID); // set the category id

    $XMLCategories = $xml->getElementsByTagName('categories')->item(0);
    $XMLCategories->appendChild($category); // add the new category

    $sXML = simplexml_import_dom($xml);
    return $sXML;
}
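
The conversion also works in the other direction: dom_import_simplexml() returns a DOMElement that shares the same underlying document as the SimpleXMLElement it was given, so the XML does not have to be serialised and parsed again. The sketch below is one way this might look; addCategoryInPlace() is a hypothetical name of my own, and the inline document is a cut-down sample.

```php
<?php
// Hypothetical variant of addCategory() that edits the SimpleXML document in place.
function addCategoryInPlace(SimpleXMLElement $sXML, $catID, $catName)
{
    // DOMElement view of <categories>; it shares the underlying document with $sXML.
    $categoriesNode = dom_import_simplexml($sXML->categories);
    $doc = $categoriesNode->ownerDocument;

    $category = $doc->createElement('category');            // create a category element
    $category->appendChild($doc->createTextNode($catName)); // add the text to it
    $category->setAttribute('cid', $catID);                 // set the category id
    $categoriesNode->appendChild($category);                // add the new category
}

$sXML = simplexml_load_string(
    '<library><categories>'
    . '<category cid="1">Web Development</category>'
    . '</categories></library>'
);
addCategoryInPlace($sXML, '5', 'XML');

echo count($sXML->categories->category), "\n";             // 2
echo (string) $sXML->categories->category[1], "\n";        // XML
echo (string) $sXML->categories->category[1]['cid'], "\n"; // 5
```

Because both objects are views of one document, there is nothing to return or reassign: the caller's SimpleXMLElement sees the new category immediately.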


Similarly, the asXML() function of the SimpleXMLElement object can be used to retrieve the XML string and save it back to a file.
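
As a rough illustration of asXML(), the sketch below builds a category with addChild() and addAttribute() (available from PHP 5.1.3 onward, so this assumes a reasonably recent PHP 5 or later) and then serialises the document both to a string and to a file. The category values and the temporary file are made up for the example.

```php
<?php
$sXML = simplexml_load_string('<library><categories/></library>');

// addChild() appends <category>XML</category> and returns the new element.
$category = $sXML->categories->addChild('category', 'XML');
$category->addAttribute('cid', '5');

// With no argument, asXML() returns the document as a string.
$string = $sXML->asXML();
echo $string;

// Given a filename, asXML() writes the file instead and returns true or false.
$path  = tempnam(sys_get_temp_dir(), 'lib'); // hypothetical scratch file
$saved = $sXML->asXML($path);
$onDisk = file_get_contents($path);
unlink($path);
```

On older installations without addChild(), the DOM-based addCategory() approach shown above remains the way to create new elements.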

XPath
XPath is without a doubt the cherry on top of the XML cake. XPath allows you to use SQL-like queries to search for specific information in an XML document. Both DOM and SimpleXML have built in support for XPath, which, like SQL, can be used to extract just about anything you wish from an XML document.
  • //category - find all occurrences of category anywhere in the document.
  • /library/books - find all occurrences of books which are children of library
  • /library/categories/category[@cid] - find all occurrences of category which are children of library/categories with an attribute named cid
  • /library/categories/category[@cid='2'] - find all occurrences of category which are children of library/categories with an attribute named cid which has a value of 2
  • /library/books/book[title='Apache 2'] - find all occurrences of book which are children of /library/books and whose title element has a value of Apache 2
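
To get a feel for what expressions like these select, the following sketch runs five of them against a cut-down library document and counts the matches. The inline XML is an abbreviated sample of my own, so the counts apply to it rather than to the full library file.

```php
<?php
$xml = simplexml_load_string(
    '<library>
        <categories>
            <category cid="1">Web Development</category>
            <category cid="2">Database Programming</category>
        </categories>
        <books>
            <book><title>Apache 2</title><category>1</category></book>
            <book><title>Visual FoxPro 6</title><category>2</category></book>
        </books>
    </library>'
);

$queries = array(
    '//category',                              // 4: two under categories, one per book
    '/library/books',                          // 1
    '/library/categories/category[@cid]',      // 2
    "/library/categories/category[@cid='2']",  // 1
    "/library/books/book[title='Apache 2']",   // 1
);

$counts = array();
foreach ($queries as $query) {
    // xpath() returns an array of SimpleXMLElement objects.
    $counts[$query] = count($xml->xpath($query));
    echo $query, ' => ', $counts[$query], " match(es)\n";
}
```

Notice that //category also picks up the category elements inside each book, which is why an absolute path is usually the safer choice.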
This is only the tip of the XPath iceberg. You can create hugely complex queries with XPath to select almost any kind of information from your document. I have modified the sample code once again to show you just how elegant the use of XPath can make it.

PHP:

<?php
$xml = simplexml_load_file('xml/library.xml');
?>
<html>
    <head>
        <title>XML Library</title>
    </head>
    <body>
        <?php foreach (((array) $xml->xpath("/library/books/book")) as $book):
            /* list categories */
            $catList = '';
            foreach ($book->category as $category) {
                /* get the category with this ID */
                $matches = $xml->xpath("/library/categories/category[@cid='$category']");
                $catList .= (string) $matches[0] . ', ';
            }

            /* strip the trailing ", " */
            $catList = substr($catList, 0, -2); ?>

        <div>
            <h2><?php echo($book->title) ?></h2>
            <p><b>Author:</b> <?php echo($book->author) ?></p>
            <p><b>Categories:</b> <?php echo($catList) ?></p>
        </div>
        <?php endforeach; ?>
    </body>
</html>
 


DOM and XPath
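Evaluating XPath queries in DOM requires creating a DOMXPath object. For an expression that selects nodes, the evaluate() function returns a DOMNodeList of the matching nodes; for an expression that yields a scalar, such as count(), it returns the typed value.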
$xPath = new DOMXPath($xml);
$xPath->evaluate("/library/books/book[title='Apache 2']");
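
Here is a slightly fuller sketch of the same idea, run against a small inline document that I made up for the example. It uses query(), which returns a DOMNodeList for a valid expression, to walk the matching books, and evaluate() to compute a scalar with the XPath count() function.

```php
<?php
$xml = new DOMDocument('1.0');
$xml->loadXML(
    '<library><books>'
    . '<book><title>Apache 2</title><author>Peter Wainwright</author></book>'
    . '<book><title>Mastering Java 2</title><author>John Zukowski</author></book>'
    . '</books></library>'
);

$xPath = new DOMXPath($xml);

// query() returns a DOMNodeList, which can be iterated with foreach.
$books = $xPath->query("/library/books/book[title='Apache 2']");
$authors = array();
foreach ($books as $book) {
    $authors[] = $book->getElementsByTagName('author')->item(0)->nodeValue;
}
echo implode(', ', $authors), "\n"; // Peter Wainwright

// evaluate() can return a scalar; count() yields a number (a float in PHP).
$total = $xPath->evaluate('count(/library/books/book)');
echo $total, "\n"; // 2
```

A DOMNodeList is not a PHP array, so use its length property or foreach rather than count() on older PHP versions.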


Conclusion
Now that we have seen the tools PHP provides for interacting with XML, we are armed and ready to start delving into some of the applications of XML. In my next article we will look at AJAX and show how sites like Google are able to do this (go on, type a search query).
