Parsing XML with the DOM Extension for PHP 5

By Octavia Andreea Anghel

on October 27, 2010

DOM (Document Object Model) is a W3C standard made up of a set of interfaces for representing an XML or HTML document as a tree of objects. A DOM tree defines the logical structure of a document and the way the document is accessed and manipulated. Using DOM, developers can build XML or HTML documents, navigate their structure, and add, modify, or delete elements and content.

The DOM can be used with any programming language, but in this article we will use the DOM extension for PHP 5. This extension is part of the PHP core and doesn’t need any installation.

The DOM extension represents a document as a tree of PHP objects (nodes) whose types follow the XML node types. Some of the most familiar DOM nodes are:

The document node, represented by the DOMDocument interface
The element node, represented by the DOMElement interface
The attribute node, represented by the DOMAttr interface
The comment node, represented by the DOMComment interface
The text node, represented by the DOMText interface
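To see how these node types map onto PHP objects, the short sketch below builds a tiny document by hand and prints the class name and the nodeType value of one node of each kind. It uses only standard DOMDocument factory methods. The element name, attribute name and text are made up for illustration.

```php
<?php
// Build a one-element document by hand (names are illustrative only)
$doc = new DOMDocument('1.0', 'UTF-8');
$book = $doc->createElement('book');
$doc->appendChild($book);

$book->setAttribute('lang', 'en');                         // attribute node
$book->appendChild($doc->createComment('XML Processing')); // comment node
$book->appendChild($doc->createTextNode('Hello, DOM'));    // text node

// One node of each kind listed above; attributes are not children,
// so firstChild is the comment and lastChild is the text
$nodes = array(
    $doc,                            // document node
    $book,                           // element node
    $book->getAttributeNode('lang'), // attribute node
    $book->firstChild,               // comment node
    $book->lastChild,                // text node
);

foreach ($nodes as $node) {
    // nodeType holds one of the XML_*_NODE constants, e.g. XML_ELEMENT_NODE
    printf("%-12s nodeType = %d\n", get_class($node), $node->nodeType);
}
```

Running it should list DOMDocument, DOMElement, DOMAttr, DOMComment and DOMText with nodeType values 9, 1, 2, 8 and 3. The value 3 (XML_TEXT_NODE) is the one the recursive function later in this article tests for.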

Extracting the PHP Object Tree Associated with an XML Document

In this section, you will see how to extract some elements and their values from the PHP object tree. You will need the XML document Book.xml, which is listed below:


<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
   <book>
      <!--XML Processing [part I] -->
      <name>XML Processing I</name>
      <author>John Smith Jr.</author>
      <publisher>HisOwnTM</publisher>
      <ISBN>111-222-333-4441</ISBN>
      <contents>
         <chapter_I>
            <title>What is XML about ?</title>
            <content>XML (Extensible Markup Language) is a ...</content>
         </chapter_I>
         <chapter_II>
            <title>SAX</title>
            <content>SAX is a simple API for ...</content>
         </chapter_II>
         <chapter_III>
            <title>StAX</title>
            <content>Much powerful and flexible, StAX, is very...</content>
         </chapter_III>
         <chapter_IV>
            <title>DOM
               <subtitle>DOM concept
                  <continut>Starting to use DOM...</continut>
               </subtitle>
               <subchapter_IV_I>
                  <title>First DOM application...</title>
                  <content>Here it is your first DOM          
                       application...
                  </content>
               </subchapter_IV_I>              
            </title>          
         </chapter_IV>        
         <end>The end...</end>       
      </contents>
   <!-- See you in XML Processing [part II] -->
   </book>

Save this document in the same directory as the PHP scripts that follow.

The application below loads Book.xml into an object tree and displays the first occurrence of the <name>, <author> and <publisher> elements using the getElementsByTagName method, which is available on both DOMDocument and DOMElement. DOMNodeList DOMDocument::getElementsByTagName(string $name) returns a list of all descendant elements whose tag name is $name, in document order.


<?php
 //Create a document instance
 $doc = new DOMDocument();
 //Load the Book.xml file; load() returns false if the file is missing or not well-formed
 if (!$doc->load('Book.xml')) {
     exit("Could not load Book.xml\n");
 }
 //Search for all elements with the "author" tag name
 $authors = $doc->getElementsByTagName("author");
 //Get the text of the first "author" element found
 $author = $authors->item(0)->nodeValue;
 //Search for all elements with the "publisher" tag name
 $publishers = $doc->getElementsByTagName("publisher");
 //Get the text of the first "publisher" element found
 $publisher = $publishers->item(0)->nodeValue;
 //Search for all elements with the "name" tag name
 $titles = $doc->getElementsByTagName("name");
 //Get the text of the first "name" element found
 $title = $titles->item(0)->nodeValue;
 //Print the values found
 echo "$title - $author - $publisher\n";
?>

The output is:

XML Processing I - John Smith Jr. - HisOwnTM
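Because getElementsByTagName searches the whole subtree rather than only direct children, it also finds nested elements, which matters for Book.xml, where <chapter_IV> nests one <title> inside another. The sketch below illustrates this on a trimmed copy of Book.xml that is inlined and loaded with loadXML instead of load, an assumption made only so that it runs without the file.

```php
<?php
// Trimmed copy of Book.xml, inlined so the sketch runs on its own
$xml = <<<EOT
<book>
  <name>XML Processing I</name>
  <contents>
    <chapter_I><title>What is XML about ?</title></chapter_I>
    <chapter_IV>
      <title>DOM
        <subchapter_IV_I><title>First DOM application...</title></subchapter_IV_I>
      </title>
    </chapter_IV>
  </contents>
</book>
EOT;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Searching from the document returns every <title>, nested ones included,
// in document order
$titles = $doc->getElementsByTagName('title');
echo "Titles in the document: ", $titles->length, "\n";
for ($i = 0; $i < $titles->length; $i++) {
    // firstChild is the leading text node, so the outer "DOM" title does not
    // also print the text of the <title> nested inside it
    echo $i, ": ", trim($titles->item($i)->firstChild->nodeValue), "\n";
}

// Searching from an element only looks at that element's descendants
$chapterIV = $doc->getElementsByTagName('chapter_IV')->item(0);
echo "Titles under <chapter_IV>: ", $chapterIV->getElementsByTagName('title')->length, "\n";
```

It should report three titles in the whole document but only two under <chapter_IV>, which is why the first listing reads item(0) rather than assuming a single match.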

Browsing Recursively Through a PHP Object Tree

The function below walks recursively through the whole subtree rooted at the node passed as its argument ($node) and lists the name and value of each node it encounters.


function getNodesInfo($node)
{
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $subNode) {
            //Skip text nodes (nodeType 3, XML_TEXT_NODE) that contain only white space
            if (($subNode->nodeType != 3) || (($subNode->nodeType == 3)
                && (strlen(trim($subNode->wholeText)) >= 1))) {
                echo "Node name: " . $subNode->nodeName . "\n";
                echo "Node value: " . $subNode->nodeValue . "\n";
            }
            //Recurse into the children of the current node
            getNodesInfo($subNode);
        }
    }
}

Note: Whitespace-only text nodes are skipped to make the output easier to read. This is done by the following condition:


if (($subNode->nodeType != 3) || (($subNode->nodeType == 3)&&(strlen(trim($subNode->wholeText))>=1)))

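An alternative to this condition is the preserveWhiteSpace property of DOMDocument. It defaults to TRUE, which keeps all white space; setting it to FALSE before calling load() tells the parser to discard redundant white space, so most whitespace-only text nodes are never created.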
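The sketch below contrasts the two settings on a small inline document, which is an assumption made for illustration. With the default, the indentation inside <book> becomes separate text nodes; with preserveWhiteSpace set to FALSE before loadXML(), they disappear. Without a DTD, libxml2 decides heuristically which white space is redundant, so whitespace inside mixed content (such as the <title>DOM ...</title> element of Book.xml) may still survive; the explicit filter in getNodesInfo is the more predictable option.

```php
<?php
// Small indented document (illustrative); the indentation is white space only
$xml = "<book>\n   <name>XML Processing I</name>\n   <author>John Smith Jr.</author>\n</book>";

// Default: preserveWhiteSpace is TRUE, so each run of indentation between
// the elements is kept as its own DOMText node
$withBlanks = new DOMDocument();
$withBlanks->loadXML($xml);

// The property must be set BEFORE loading: it changes how the parser builds
// the tree, not a tree that already exists
$noBlanks = new DOMDocument();
$noBlanks->preserveWhiteSpace = false;
$noBlanks->loadXML($xml);

foreach (array('default' => $withBlanks, 'preserveWhiteSpace = false' => $noBlanks) as $label => $doc) {
    $names = array();
    foreach ($doc->documentElement->childNodes as $child) {
        $names[] = $child->nodeName; // "#text" for text nodes
    }
    echo $label, ": ", implode(', ', $names), "\n";
}
```

With the default setting, the #text entries between the element names are exactly the nodes that the condition in getNodesInfo filters out.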

The next application uses Book.xml and the recursive function getNodesInfo to list the entire object tree associated with the XML document:


<?php
 //Create a document instance
 $doc = new DOMDocument();
 //Load the Book.xml file
 $doc->load('Book.xml');
 //Set the root of the object tree: the <book> element
 //(documentElement is safer than firstChild, which could be a comment or DOCTYPE)
 $root = $doc->documentElement;
 //The recursive function that lists all the nodes of the tree
 function getNodesInfo($node)
 {
     if ($node->hasChildNodes()) {
         foreach ($node->childNodes as $subNode) {
             //Skip text nodes (nodeType 3, XML_TEXT_NODE) that contain only white space
             if (($subNode->nodeType != 3) || (($subNode->nodeType == 3)
                 && (strlen(trim($subNode->wholeText)) >= 1))) {
                 echo "Node name: " . $subNode->nodeName . "\n";
                 echo "Node value: " . $subNode->nodeValue . "\n";
             }
             //Recurse into the children of the current node
             getNodesInfo($subNode);
         }
     }
 }
 //The getNodesInfo function call
 getNodesInfo($root);
?>

Figure 1 shows a screen capture of a small part of the listing result.


Figure 1. The Whole Object Tree Associated to Book.xml
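Because getNodesInfo prints a flat list, the nesting shown in Figure 1 is hard to see. The describeTree function below is a hypothetical variant, not part of the original listings: it applies the same whitespace filter, collects one line per node, indents each line by depth, and labels elements as <name>. It is run here on a small inline fragment of Book.xml, which is an assumption made so it does not need the file.

```php
<?php
// Hypothetical helper: returns one line per element, comment or non-blank
// text node below $node, indented two spaces per level of depth
function describeTree(DOMNode $node, $depth = 0)
{
    $lines = array();
    foreach ($node->childNodes as $child) {
        // Skip whitespace-only text nodes, as getNodesInfo() does
        if ($child->nodeType == XML_TEXT_NODE && trim($child->nodeValue) === '') {
            continue;
        }
        $label = ($child->nodeType == XML_ELEMENT_NODE)
            ? '<' . $child->nodeName . '>'
            : $child->nodeName . ': ' . trim($child->nodeValue);
        $lines[] = str_repeat('  ', $depth) . $label;
        // Only nodes that actually have children need the recursive call
        if ($child->hasChildNodes()) {
            $lines = array_merge($lines, describeTree($child, $depth + 1));
        }
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML('<book><!--XML Processing [part I] --><name>XML Processing I</name>'
    . '<contents><chapter_I><title>What is XML about ?</title></chapter_I></contents></book>');
$lines = describeTree($doc->documentElement);
echo implode("\n", $lines), "\n";
```

Returning an array instead of echoing directly keeps the traversal separate from the output, so the same helper could feed a log or a test.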


Adding New Nodes

To add a new node to the object tree, you can use one of two methods of the DOMNode interface: appendChild or insertBefore. The difference between them is that appendChild adds the new node as the last child of the node it is called on, while insertBefore inserts the new node before a specified child node.

The following are the prototypes of those two methods.

1. DOMNode DOMNode::appendChild(DOMNode $newnode): This method appends $newnode to the end of the node's existing list of children, or creates that list if the node has no children yet, and returns the appended node. The child can be created using one of two methods of the DOMDocument interface, createElement or createTextNode, described below:

DOMElement DOMDocument::createElement(string $name [, string $value ]): This method creates an instance of the DOMElement class. The $name argument is the tag name of the element and the optional $value argument is its text value, which can also be set later using the DOMElement->nodeValue property. The new element is not part of the tree until it is inserted with appendChild or insertBefore.
DOMText DOMDocument::createTextNode(string $content): This method creates an instance of the DOMText class. The $content argument is the text of the new text node.
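The two factory methods differ in how they treat special characters. According to the PHP manual, the $value passed to createElement is not escaped for &, so a bare ampersand is read as the start of an entity reference; text added through createTextNode is stored literally and escaped when the document is serialized. The sketch below uses createElement with a value for plain text and createTextNode for a publisher name containing an ampersand; the names and text are illustrative.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
// appendChild() returns the node it appended
$book = $doc->appendChild($doc->createElement('book'));

// createElement() with a value: fine for text that contains no '&'
$book->appendChild($doc->createElement('name', 'XML Processing I'));

// createElement() without a value, plus a DOMText child made with
// createTextNode(): the text is stored literally, '&' included
$publisher = $doc->createElement('publisher');
$publisher->appendChild($doc->createTextNode('Smith & Sons'));
$book->appendChild($publisher);

// saveXML() escapes the ampersand as &amp; when serializing
echo $doc->saveXML($book), "\n";
```

This prints `<book><name>XML Processing I</name><publisher>Smith &amp; Sons</publisher></book>`, while the node's textContent still holds the unescaped `Smith & Sons`.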

The following example uses appendChild to add a <bibliography> element as the last child of the root element, <book>:


//Create a new element and append it to the root element
//(assumes $doc and $root from the previous listing)
$newElement = $doc->createElement('bibliography', 'Martin Didier, Professional XML');
//The appendNewChild function call
appendNewChild($root, $newElement);
//The function which appends a new child to the given node
function appendNewChild($currentNode, $node)
{
    //appendChild() adds $node as the last child of $currentNode
    $currentNode->appendChild($node);
}

Figure 2 shows the results.


Figure 2. Appending the <bibliography> Node at the End of the Document
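The listing above is a fragment that relies on $doc and $root from the earlier application. Here is a self-contained sketch of the same step, assuming a trimmed, inline version of Book.xml instead of the file. It appends <bibliography> to the root element and confirms that it became the last child of <book>.

```php
<?php
// Trimmed inline stand-in for Book.xml
$doc = new DOMDocument();
$doc->loadXML('<book><name>XML Processing I</name><author>John Smith Jr.</author>'
    . '<publisher>HisOwnTM</publisher></book>');
$root = $doc->documentElement;

$newElement = $doc->createElement('bibliography', 'Martin Didier, Professional XML');
// appendChild() returns the node it appended
$appended = $root->appendChild($newElement);

// The new element is now the last child of <book>
echo $root->lastChild->nodeName, "\n";

// Pretty-print the modified document
$doc->formatOutput = true;
echo $doc->saveXML();
```

Because appendChild returns the inserted node, you can chain further calls on it, for example to set attributes on the new element.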

2. DOMNode DOMNode::insertBefore(DOMNode $newnode [, DOMNode $refnode ]): This method inserts $newnode as a child of the node it is called on, immediately before the existing child $refnode, and returns the inserted node. The $newnode argument is the node to insert and $refnode is the reference node. If $refnode is omitted, the new node is appended at the end of the children, just as appendChild would do.

In this example the <foreword> child is added before the <publisher> child.


//Create the new <foreword> element (assumes $doc from the previous listings)
$newElement = $doc->createElement('foreword', 'What I love about this book is that it grew out of just such a process, and shows it on every page.');
//Set the reference node: the first <publisher> element
$allPublishers = $doc->getElementsByTagName('publisher');
$publisher = $allPublishers->item(0);
//The insertNewChild function call
insertNewChild($publisher, $newElement);
//The function which inserts $node immediately before $referenceNode;
//insertBefore() must be called on the parent of the reference node
function insertNewChild($referenceNode, $node)
{
    $referenceNode->parentNode->insertBefore($node, $referenceNode);
}

Figure 3 shows the results.


Figure 3. Insert a New Child Node Before the <publisher> Node
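A self-contained sketch of the insertBefore step follows. It assumes a trimmed inline document, a shortened foreword text, and a hypothetical <appendix> element that exists only to show the fallback. Note that insertBefore is called on the parent of the reference node, so <foreword> lands inside <book> right before <publisher>; a second call without a reference node shows the append-like behavior described above.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<book><name>XML Processing I</name><author>John Smith Jr.</author>'
    . '<publisher>HisOwnTM</publisher></book>');
$book = $doc->documentElement;

// Reference node: the first <publisher> element
$publisher = $doc->getElementsByTagName('publisher')->item(0);

// Call insertBefore() on the parent, passing the reference node second
$foreword = $doc->createElement('foreword', 'Foreword text (illustrative)');
$publisher->parentNode->insertBefore($foreword, $publisher);

// Without a reference node, insertBefore() appends like appendChild()
$book->insertBefore($doc->createElement('appendix'));

// List the children of <book> in document order
$names = array();
foreach ($book->childNodes as $child) {
    $names[] = $child->nodeName;
}
echo implode(' ', $names), "\n";
```

It prints `name author foreword publisher appendix`. Calling insertBefore on <publisher> itself would instead place <foreword> inside <publisher>, in front of its text.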

Download: DOMSourceCode.zip
