Dynamic XML Conversion Using The SAX Parser And A Stack
This article describes an alternative way of converting XML to HTML using the SAX parser.
For each tag you want to convert, you write a conversion function. This function is called
with two arguments: contents and attributes. The return value of the function will replace
the tag and its contents in the finished document.
Introduction
We all know it: XML is great. (If you don't, look at some of the great articles at this site.)
But why is it so complicated to use? You have to learn about DTDs, XSLT, DOM, XPath, XPointer...
That is a lot of work, and most of these techniques are not really necessary to build a website.
In this article you will learn how to build a simple converter for your XML.
The idea is this: You have a web page in clean XHTML. Most of the HTML will be
tables, menu images and other design stuff. So why not replace the 30 kilobytes of code for the
menu with a nice little "<menu />"?
When the document is requested from a webserver, a small PHP script replaces the <menu /> tag with
the correct table HTML. So you have a much cleaner document, and the end result is the same. You
can compose your whole page using these meta-tags, and even dynamic tags like <uppercase> or
<showdate /> are easy to program.
These are the steps we will take:
Think up a set of tags that can be used to build an example website.
We will use the SAX parser to build the converter. To learn more about SAX and expat
(which is the name of the SAX implementation that PHP uses), please read Justin Grant's article
"PHP and XML: using expat functions".
Thinking Up Some XML Tags
First you have to identify repeating elements of your web page. This can be menus, headlines,
links, shopping cart products and so on. Then look at the parameters you want to assign to your
elements. Look at this example XML, and you will get the idea:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
<bigheadline>
Pizza Palace - Our Menu for <dayofweek />
</bigheadline>
<br /><br />
<b>Buon appetito!!!</b>
<br /><br />
<nicebox bordercolor="green">
<product id="0" /><br />
<product id="1" /><br />
<product id="2" /><br />
<product id="3" /><br />
<product id="4" /><br />
</nicebox>
</doc>
Dynamically Constructing XML
PHP doesn't care if it is embedded in HTML or XML.
So with a little trick we can use PHP to construct our XML.
The trick is a function that creates an output buffer, opens and executes a file using include,
and returns the contents of the output buffer.
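A minimal sketch of such a function, assuming the hypothetical name parse_file_to_string and a throwaway template written to a temporary file (neither is taken from the original listing):

```php
<?php
// Hypothetical helper: run a file through include and capture what it prints.
function parse_file_to_string($filename)
{
    ob_start();            // start capturing everything that would be printed
    include $filename;     // embedded PHP is executed, everything else is echoed
    return ob_get_clean(); // hand back the captured text and drop the buffer
}

// Demo template (an assumption for illustration): XML with an embedded PHP loop.
$source = <<<'PXML'
<nicebox>
<?php for ($i = 0; $i < 3; $i++) { ?>
<product id="<?php echo $i; ?>" />
<?php } ?>
</nicebox>
PXML;

$template = tempnam(sys_get_temp_dir(), 'pxml');
file_put_contents($template, $source);
$xml = parse_file_to_string($template);
unlink($template);

echo $xml; // plain XML: the loop has been replaced by three <product /> tags
```

The string returned here is ordinary XML, so it can be handed to the parser like a static file.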
So now we can create the XML on the fly:
file test.pxml:
<doc title="Pizza menu" bgcolor="lightblue">
<bigheadline>
Pizza Palace - Our Menu for <dayofweek />
</bigheadline>
<br /><br />
<b>Buon appetito!!!</b>
<br /><br />
<nicebox bordercolor="green">
Creating the conversion functions
Each conversion function has two arguments: the contents of the tag and its attributes.
So the function to handle this tag: <font face="bold">blah</font>
receives "blah" as its contents and face="bold" as its only attribute.
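As a rough illustration, assume the attributes arrive as an associative array keyed by attribute name. The function name convert_font is hypothetical, and the casing of the keys depends on the parser's case-folding setting. This function simply rebuilds the tag from its two arguments, which shows that they carry everything needed:

```php
<?php
// Assumed argument shapes for <font face="bold">blah</font>.
$contents   = 'blah';                  // everything between opening and closing tag
$attributes = array('face' => 'bold'); // attribute name => attribute value

// Hypothetical conversion function: puts the original tag back together.
function convert_font($contents, $attributes)
{
    $html = '<font';
    foreach ($attributes as $name => $value) {
        $html .= ' ' . $name . '="' . htmlspecialchars($value) . '"';
    }
    return $html . '>' . $contents . '</font>';
}

echo convert_font($contents, $attributes);
```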
Here is an example conversion function:
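A minimal sketch of what such a function might look like for the <bigheadline> tag. The name convert_bigheadline and the markup it produces are assumptions made for illustration:

```php
<?php
// Hypothetical conversion function for <bigheadline>.
function convert_bigheadline($contents, $attributes)
{
    // $contents already holds the converted output of any tags nested inside.
    return '<h1 style="font-family: sans-serif">' . trim($contents) . '</h1>';
}

echo convert_bigheadline(' Pizza Palace - Our Menu for Monday ', array());
```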
The function wraps the contents it receives into some HTML to make it look like a headline.
It then returns this modified content to the conversion framework, which uses it as part of the
contents parameter for the function that converts the tag surrounding it.
Here is the function to convert the <dayofweek /> - tag:
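One way to write it, again under the assumed naming scheme convert_ plus tag name:

```php
<?php
// Hypothetical conversion function for <dayofweek />.
function convert_dayofweek($contents, $attributes)
{
    // Both arguments are deliberately unused.
    return date('l'); // 'l' formats the full English weekday name
}

echo convert_dayofweek('this text is thrown away', array());
```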
The function ignores the contents and attributes that are passed and just returns the name of the current
weekday. So if you put some text between <dayofweek> and </dayofweek>, it will not appear in the output.
Building the Conversion Framework
As mentioned before, we will use the SAX parser to build the conversion framework. SAX is an event-based parser. It works like this: It chops up the document into elements. There are three element types: start tags, end tags and character data. (Actually there are some more, but we won't need them in this example.) Then, for each element it encounters in the document, the parser calls the function assigned to that element type.
The element handling functions receive these parameters:
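In PHP's XML extension, handlers are registered with xml_set_element_handler and xml_set_character_data_handler. The sketch below only logs what each handler receives. The variable names and the sample input are made up for illustration:

```php
<?php
$events = array(); // log of everything the parser reports

// Start tag handler: parser, tag name, associative array of attributes.
$onStart = function ($parser, $name, $attributes) use (&$events) {
    $events[] = array('start', $name, $attributes);
};
// End tag handler: parser and tag name only.
$onEnd = function ($parser, $name) use (&$events) {
    $events[] = array('end', $name);
};
// Character data handler: parser and a chunk of text.
$onText = function ($parser, $data) use (&$events) {
    $events[] = array('text', $data);
};

$parser = xml_parser_create();
// By default names are upper-cased; switch that off to keep them as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $onStart, $onEnd);
xml_set_character_data_handler($parser, $onText);
xml_parse($parser, '<b size="2">hi</b>', true);

print_r($events);
```

Note that the attributes are only passed to the start tag handler, and that the parser may deliver longer text in several chunks.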
We want to combine these three functions so that the attribute data is available when we handle the contents.
The attributes arrive with the opening tag and the contents arrive as character data, but our conversion
functions are called only when the parser encounters the closing tag.
So we need a way to store these values as we receive them, so that we have them at hand when the
parser reaches the closing tag.
What Is A Stack?
A stack is a simple data structure.
It has two operations: put data onto the stack ("push") and take data from the stack ("pop").
Imagine a stack of pizza boxes: You can put pizza no. 1 on the stack, then pizza no. 2, then pizza no. 3.
When you now take the pizzas from the top of the stack, you get them in reverse order: pizza 3, pizza 2, pizza 1.
Here is the code for the stack:
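A minimal sketch, assuming a plain PHP array as storage and the hypothetical helper names stack_push and stack_pop:

```php
<?php
// Put an item on top of the stack.
function stack_push(&$stack, $item)
{
    array_push($stack, $item);
}

// Take the top item off the stack and return it (null if the stack is empty).
function stack_pop(&$stack)
{
    return array_pop($stack);
}

$pizzas = array();
stack_push($pizzas, 'Pizza 1');
stack_push($pizzas, 'Pizza 2');
stack_push($pizzas, 'Pizza 3');

$order = array();
while (count($pizzas) > 0) {
    $order[] = stack_pop($pizzas);
}
echo implode(', ', $order); // Pizza 3, Pizza 2, Pizza 1
```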
In well-formed XML, tags must not overlap, and for every opening tag there is a closing tag. The SAX parser walks through
the document, and for every opening tag it reaches, our script puts the tag's attributes onto the stack. Then, when it
reaches a closing tag, it takes the top level off the stack.
So when the parser is converting a document, and it has already processed 13 opening tags and 8 closing tags,
there will be 5 elements on the stack.
As in XML the number of opening tags has to equal the number of closing tags, the stack will be empty when
the parser reaches the end of the document. And as there are no overlapping tags, the data sets are always
fetched in the correct order.
Here is a list of the steps our script will take to walk through a short piece of XML (the XML file contains
no character data, so only the opening and closing functions are called by SAX).
<doc>
<tag1 parameter="Param 1">
<tag2 parameter="Param 2">
<tag3 parameter="Param 3">
</tag3>
</tag2>
</tag1>
</doc>
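The walk through this document can be traced with a few lines of PHP. This is a sketch that only registers the opening and closing handlers and records the stack depth after every step. The variable names are chosen for illustration:

```php
<?php
$stack = array(); // one entry of attributes per currently open tag
$trace = array(); // human-readable log of every push and pop

$xml = '<doc> <tag1 parameter="Param 1"> <tag2 parameter="Param 2">'
     . ' <tag3 parameter="Param 3"> </tag3> </tag2> </tag1> </doc>';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep names as written
xml_set_element_handler(
    $parser,
    // Opening tag: push its attributes.
    function ($parser, $name, $attributes) use (&$stack, &$trace) {
        array_push($stack, $attributes);
        $trace[] = "open  $name: push, depth " . count($stack);
    },
    // Closing tag: pop the attributes that belong to this tag.
    function ($parser, $name) use (&$stack, &$trace) {
        $attributes = array_pop($stack);
        $param = isset($attributes['parameter']) ? $attributes['parameter'] : '(none)';
        $trace[] = "close $name: pop \"$param\", depth " . count($stack);
    }
);
xml_parse($parser, $xml, true);

echo implode("\n", $trace), "\n";
```

Each closing tag pops exactly the attributes that its own opening tag pushed, and the depth is back at zero after </doc>.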
The element handling functions
These functions are called by the SAX parser.
These are the handling functions for opening tags and character data:
In addition to the attribute list, we also store the character data found inside the tag on the stack.
The tag closing function is the most complicated part of the script. It works like this:
Character data is passed through, as are tags that are not handled by a conversion function.
So you don't have to write a handling function for every tag in your document, because they
stay unchanged. You can take a valid XHTML document as input for the converter, and the output
will be the same document except for the tags replaced by your conversion functions.
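Putting the three handlers together, a compact sketch of such a framework might look like this. The function name pxml_to_html, the convert_ naming scheme and the two sample conversion functions are assumptions made for illustration, not the original listing:

```php
<?php
// Sample conversion functions (hypothetical scheme: convert_ + tag name).
function convert_bigheadline($contents, $attributes)
{
    return '<h1>' . trim($contents) . '</h1>';
}
function convert_uppercase($contents, $attributes)
{
    return strtoupper($contents);
}

function pxml_to_html($xml)
{
    // The bottom frame collects the finished document.
    $stack = array(array('attributes' => array(), 'contents' => ''));

    $open = function ($parser, $name, $attributes) use (&$stack) {
        // New nesting level: remember the attributes, start with empty contents.
        $stack[] = array('attributes' => $attributes, 'contents' => '');
    };
    $text = function ($parser, $data) use (&$stack) {
        // The parser hands over decoded text, so re-escape it for the output.
        $stack[count($stack) - 1]['contents'] .= htmlspecialchars($data, ENT_NOQUOTES);
    };
    $close = function ($parser, $name) use (&$stack) {
        $frame = array_pop($stack);
        $function = 'convert_' . $name;
        if (function_exists($function)) {
            $result = $function($frame['contents'], $frame['attributes']);
        } else {
            // No conversion function: put the tag back unchanged.
            $result = '<' . $name;
            foreach ($frame['attributes'] as $key => $value) {
                $result .= ' ' . $key . '="' . htmlspecialchars($value) . '"';
            }
            $result .= $frame['contents'] === ''
                ? ' />'
                : '>' . $frame['contents'] . '</' . $name . '>';
        }
        // The result becomes part of the contents of the surrounding tag.
        $stack[count($stack) - 1]['contents'] .= $result;
    };

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, $open, $close);
    xml_set_character_data_handler($parser, $text);
    if (!xml_parse($parser, $xml, true)) {
        return 'XML error: ' . xml_error_string(xml_get_error_code($parser));
    }
    return $stack[0]['contents'];
}

echo pxml_to_html(
    '<p><bigheadline>Menu for <uppercase>today</uppercase></bigheadline><br /></p>'
);
```

Because the inner tag is closed first, its converted output is already part of the contents that the outer conversion function receives. One simplification in this sketch: an unhandled tag with empty contents is always written in the short form, so <td></td> comes out as <td />.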
Here's the whole script:
file test.php:
Please save the files test.php and test.pxml to an accessible directory on your webserver and open
test.php in a browser. Note that the PHP version on the server has to include the expat parser (most do).
Closing Note
The conversion process is pretty fast, so you can do it on-the-fly.
If you are concerned about web server loads, you can put the converted output into a cache-file.
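A simple cache could be sketched like this, assuming a hypothetical helper cached_convert that regenerates the output only when the source file is newer than the cache file. The stand-in converter in the demo just upper-cases the file and counts its calls:

```php
<?php
// Hypothetical helper: $convert is any callable that turns a source file into HTML.
function cached_convert($sourceFile, $cacheFile, $convert)
{
    // Reuse the cache if it exists and is at least as new as the source.
    if (file_exists($cacheFile) && filemtime($cacheFile) >= filemtime($sourceFile)) {
        return file_get_contents($cacheFile);
    }
    $html = $convert($sourceFile);
    file_put_contents($cacheFile, $html);
    return $html;
}

// Demo with throwaway files.
$source = tempnam(sys_get_temp_dir(), 'pxml');
file_put_contents($source, '<menu />');
$cache = $source . '.cache';

$calls = 0;
$convert = function ($file) use (&$calls) {
    $calls++; // count how often the expensive conversion really runs
    return strtoupper(file_get_contents($file));
};

$first  = cached_convert($source, $cache, $convert);
$second = cached_convert($source, $cache, $convert); // served from the cache file
unlink($source);
unlink($cache);

echo $calls, "\n";
```

This only works for pages whose output does not change between requests. A tag like <dayofweek /> would need the cache to expire at least once a day.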
I have used this technique myself on quite a few sites. I was able to build some tag libraries for form
processing, shopping carts and database tables. With some add-ons, this stack processing is quite
powerful, but the code for the processing functions can become rather complicated when you add
intelligence to them and give them access to other levels of the stack (the parent tags).
Some other ideas for tags:
<image databaseid="3901" />
<openinpopup size="big"> Doh </openinpopup>
<showflash src="bla.swf" />
© 2003 Martin Scheffler