The Ultimate Guide To Parsing XML Part 2: Using The DOM in PHP
Posted on October 7th, 2006
Welcome to The Ultimate Guide to Parsing XML: Part 2 (Using the DOM). Let's jump right in.

We already covered one way to parse the eXtensible Markup Language in PHP using expat. The other way that's built into PHP is using the DOM, or the Document Object Model.

The Document Object Model is a tree-like representation of an XML document. XHTML has one, which is how JavaScript interacts with it. The DOM builds a methodical representation of the document: there is a root element, then children of the root, then children of those children. The script navigates the tree by following children. It loads things in a very object-oriented way, so if you don't have any knowledge of object-oriented programming, you could look at a tutorial on it, such as Building a Web Page with Object Oriented Programming.
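To make the tree idea concrete, here is a minimal sketch that walks a small document and prints each node indented under its parent. It assumes the DOMDocument class from PHP 5's DOM extension, which replaced the PHP 4 DOM XML functions this tutorial uses; dumpTree() and the sample XML are made up for illustration.

```php
<?php
// Sketch only: uses PHP 5+'s DOMDocument, not the xmldoc() family.
// dumpTree() is a hypothetical helper, not part of any library.
function dumpTree(DOMNode $node, int $depth = 0): string
{
    $indent = str_repeat('  ', $depth);

    // Text nodes are leaves: show their text, ignoring pure whitespace.
    if ($node->nodeType === XML_TEXT_NODE) {
        $text = trim($node->nodeValue);
        return $text === '' ? '' : $indent . '"' . $text . "\"\n";
    }
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return '';
    }

    // Elements print their tag name, then recurse into their children.
    $out = $indent . $node->nodeName . "\n";
    foreach ($node->childNodes as $child) {
        $out .= dumpTree($child, $depth + 1);
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadXML('<tutorial><name>Using OOP for PHP</name><author>Ben</author></tutorial>');
echo dumpTree($doc->documentElement);
```

The root prints first, its children one level in, and their text nodes one level further, which is the root, children, children-of-children shape described above.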

The difference between parsing with the DOM and parsing with expat is that the DOM loads the whole XML document into memory and then navigates it using the commands you specify, while expat parses the document on the fly, handling each piece as it receives it. As you'll see, the DOM is much easier to use, but it can slow down the script a bit. Expat requires a bit more thought as to how to use it, but can be faster.
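For contrast, here is a rough sketch of the expat style: the document arrives in two chunks and the parser reports each opening tag as soon as it sees it, without building a tree. It uses PHP's XML Parser functions; the sample XML and the $seen array are illustrative assumptions.

```php
<?php
// Sketch: collect element names as expat reports them, chunk by chunk.
$seen = [];

$parser = xml_parser_create();
// Keep tag names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$seen) {
        $seen[] = $name;            // called for every opening tag
    },
    function ($parser, $name) {
        // closing tags are ignored in this sketch
    }
);

// Feed the document in two pieces; the last call marks the end of input.
xml_parse($parser, '<tutorial><name>Using OOP', false);
xml_parse($parser, ' for PHP</name></tutorial>', true);

echo implode(', ', $seen);
```

Nothing is kept in memory beyond what the handlers choose to store, which is why this approach suits long feeds.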

Each does have its own purpose, though. Expat is built for parsing things like RSS feeds, which can be quite long and put a strain on the memory. Parsing with the DOM can be very useful for parsing things like the XML for a single article because it's not hard to write the script to parse all the information easily.

Now with the DOM there are three ways to go. You could parse something from a file, from a string you define in the script, or from an XML document you create on the fly. We'll cover the first two in this tutorial.

So now onto the basics. We need something simple to parse. You can see the finished script here.

tutorial.xml


<?xml version="1.0" encoding="iso-8859-1"?>
<tutorial>
<name>Using OOP for PHP</name>
<author>Ben</author>
<email>[email protected]</email>
<content>Using Object Oriented Programming to Build a Web Page

I. Intro
1. The Overall Idea of the Tutorial
2. What is Object Oriented Programming
3. How can it help you
4. Basics of classes
II. The Page Class
1. Code
2. Explanation
III. The Page
1. Code
2. Explanation
IV. Extending
1. Code
2. Explanation
3. How it can reduce hassle and code



I. Intro
1. The Overall Idea of the Tutorial

This tutorial will demonstrate the basics of Object Oriented Programming. It will also put the ideas to practical use in creating and extending a page object.

To see the actual finished product, click here: My Example

2. What is Object Oriented Programming?

Object Oriented Programming, or OOP, is making a class and using a class in creating different PHP applications. The class acts as a basic outline containing variables and functions. Creating an instance of the class allows you to access the variables and functions and use the class.

3. How can it help you

OOP is very useful in all sorts of applications. It can be used as a general outline of a page, as it is in this tutorial, it's used in the very popular Invision Power Board forums, and it can be used as a shopping cart mechanism. The versatility of the classes relies in the fact that it can be extended, allowing a class to be modified for different needs. This use will be better explained later on.</content>
</tutorial>


That's just a basic XML file. It has the root (tutorial) and four children (name, author, email, and content). Each of the children has one text node. Very simple to parse.

To start, we load it into memory using xmldocfile(), then we'll find the root and the root's children.

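<?php

// Load the whole file into memory, or stop if it cannot be loaded.
if(!$doc = xmldocfile("tutorial.xml")){
    die("Error loading xml file");
}

$root = $doc->root();           // the <tutorial> element
$children = $root->children();  // array of the root's child nodes

?>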


This is just a simple if statement that tries to load the XML file. Use xmldocfile(filename) to load a file, or xmldoc(string) to load an XML document defined in a string. So now you have the DomDocument object loaded into the $doc variable. The DomDocument object has 7 properties:

1. Name - The name of the XML document
2. URL - The URL
3. Version - The version of the XML document
4. Standalone - Whether the document is standalone or not
5. Type - Corresponds to a DOM node type (integer that will be covered later)
6. Compression - Whether the file was compressed or not
7. Charset - The charset used

All of this information can be accessed the same way as finding the root. To find the name, just use $doc->name. Finding the version would be $doc->version, and so on.
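If you are on PHP 5 or later, the DOM XML functions above are not available, and the closest counterpart is the DOMDocument class. The following is a minimal sketch under that assumption, showing a load from a string and two of the document-level properties; the property names here (xmlVersion, xmlEncoding) belong to DOMDocument, not to the older DomDocument object described in this tutorial.

```php
<?php
// Sketch only: DOMDocument (PHP 5+) stands in for xmldoc()/xmldocfile().
$xml = '<?xml version="1.0" encoding="iso-8859-1"?>'
     . '<tutorial><name>Using OOP for PHP</name></tutorial>';

$doc = new DOMDocument();

// loadXML() parses a string; $doc->load('tutorial.xml') would parse a file.
if (!$doc->loadXML($xml)) {
    die('Error loading xml');
}

// Both values come from the XML declaration on the first line.
$version  = $doc->xmlVersion;
$encoding = $doc->xmlEncoding;

echo $version, "\n";
echo $encoding, "\n";
```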

So now that you have the object loaded, you can find the root element and the children of the root, which is pretty self-explanatory. The DomDocument Object has 5 functions.

1. root() - Returns a DomElement object for the document element
2. dtd() - Returns a DTD object that has information about the DTD
3. add_root() - Creates a new document element and returns a DomElement object (will be covered in a later tutorial on creating an XML document using the DOM)
4. dumpmem() - Dumps the structure into a string variable
5. xpath_new_context() - Creates an XPathContext object for XPath evaluation

All are accessed the same way. $root = $doc->root(), $dtd = $doc->dtd(), etc.
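As a rough modern counterpart to three of these functions, the sketch below uses DOMDocument and DOMXPath from PHP 5's DOM extension. The pairings in the comments are approximate equivalents chosen for illustration, not a one-to-one mapping from the old API.

```php
<?php
// Sketch only: PHP 5+ DOM extension, not the DOM XML functions listed above.
$doc = new DOMDocument();
$doc->loadXML('<tutorial><name>Using OOP for PHP</name><author>Ben</author></tutorial>');

$root  = $doc->documentElement;     // roughly root()
$dump  = $doc->saveXML($root);      // roughly dumpmem(), here for a single node
$xpath = new DOMXPath($doc);        // roughly xpath_new_context()

// Evaluate an XPath expression against the loaded document.
$authors = $xpath->query('/tutorial/author');
$author  = $authors->item(0)->nodeValue;

echo $root->nodeName, "\n";
echo $dump, "\n";
echo $author, "\n";
```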

So now we get to the next part of parsing.

foreach($children as $child){
    // Each child element holds its text in a child text node.
    $text = $child->children();
    if(isset($text[0]->content)){
        echo $child->tagname.": ".nl2br($text[0]->content)."<br />";
    }
}


So, if you remember, $children has all the children of the root element stored in it. We loop through each of the children. $text is the variable that holds the text nodes of each child element. We then check whether the content of the first text node is set. If it is, we echo the tag name of the current child, then a colon to provide a segue into the content, then the content of the text node, and finally a line break. I added the nl2br() function just to make it presentable.
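The same loop can be sketched with PHP 5's DOMDocument, which is an assumption on my part since the tutorial's own code targets the older DOM XML functions. The shortened sample document and the $lines array are illustrative only.

```php
<?php
// Sketch only: the loop above, rewritten for DOMDocument (PHP 5+).
$xml = "<tutorial>\n<name>Using OOP for PHP</name>\n<author>Ben</author>\n</tutorial>";

$doc = new DOMDocument();
$doc->loadXML($xml);

$lines = [];
foreach ($doc->documentElement->childNodes as $child) {
    // The newlines between tags are text nodes too; only elements are wanted.
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    // firstChild is the element's text node in this sample document.
    $lines[] = $child->nodeName . ': ' . nl2br($child->firstChild->nodeValue) . '<br />';
}

echo implode("\n", $lines);
```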

We have to use $text[0] because there can be multiple text nodes under one child. For example, one could have something like:

<content>Hello, my name is <name>Ben</name>, how are you?</content>


In this, we have two text nodes in content, and one for name. Since we know there is only going to be one text node in this document, we can just use $text[0], but we can just as easily cycle through each node by adding a simple for loop.
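Cycling through every text node might look like the sketch below. It assumes PHP 5's DOMDocument rather than the DOM XML functions used in this tutorial, and $texts is just an illustrative variable.

```php
<?php
// Sketch only: collect every text node directly under <content>.
$doc = new DOMDocument();
$doc->loadXML('<content>Hello, my name is <name>Ben</name>, how are you?</content>');

$texts = [];
foreach ($doc->documentElement->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $texts[] = $node->nodeValue;
    }
}

// The <name> element sits between the two text nodes and is skipped.
echo implode('|', $texts);
```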

Now when we say $children = $root->children(), it returns an array of all the children of the root element. Each entry in the array is an instance of the DomElement class, which has two fields: the type and the tagname. Both are accessed the same way on a single entry: $child->type or $child->tagname. The DOM node types are:

1. XML_ELEMENT_NODE - Element
2. XML_ATTRIBUTE_NODE - Attribute
3. XML_TEXT_NODE - Text
4. XML_CDATA_SECTION_NODE - CDATA section
5. XML_ENTITY_REF_NODE - Entity reference
7. XML_PI_NODE - Processing instruction
8. XML_COMMENT_NODE - Comment
9. XML_DOCUMENT_NODE - XML document
12. XML_NOTATION_NODE - Notation

Each of these is listed with the integer value of the type, then the constant name. Either can be used in a comparison, like $child->type == 1 or $child->type == XML_ELEMENT_NODE. Both test for an element.
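Here is a small sketch of branching on the node type. It assumes PHP 5's DOMDocument, where the field is called nodeType rather than type; describeNode() and the sample XML are hypothetical.

```php
<?php
// Sketch only: map a node's type constant to a readable label.
// describeNode() is a hypothetical helper.
function describeNode(DOMNode $node): string
{
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            return 'element';
        case XML_TEXT_NODE:
            return 'text';
        case XML_COMMENT_NODE:
            return 'comment';
        default:
            return 'other (' . $node->nodeType . ')';
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root>some <text color="red">text</text><!-- note --></root>');

$kinds = [];
foreach ($doc->documentElement->childNodes as $child) {
    $kinds[] = describeNode($child);
}

echo implode(', ', $kinds);
```

Comparing against the constant rather than the bare integer keeps the intent readable, even though both compare equal.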

Each DomElement object has seven different functions to use with it. Each is accessed the same way on a single element, e.g. $child->parent().

1. children() - Returns an array of DomElement objects of the children
2. parent() - Returns a DomElement object of its parent
3. attributes() - Returns an array of DomAttribute objects of the attributes of the node
4. get_attribute() - Returns the value of an attribute of this node
5. new_child() - Creates a new DomElement object and attaches it as a child of this node
6. set_attribute() - Sets the value of an attribute of this node
7. set_content() - Sets the content of this node
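For readers on PHP 5 or later, the sketch below shows approximate counterparts of four of these functions using DOMDocument and DOMElement. The pairings in the comments are my own rough mapping, and the sample XML is made up.

```php
<?php
// Sketch only: PHP 5+ DOM extension equivalents, chosen for illustration.
$doc = new DOMDocument();
$doc->loadXML('<root><text color="red">text</text></root>');

$text = $doc->documentElement->firstChild;   // the <text> element

$color  = $text->getAttribute('color');      // roughly get_attribute()
$parent = $text->parentNode->nodeName;       // roughly parent()

$text->setAttribute('color', 'blue');        // roughly set_attribute()

// Roughly new_child(): create an element, then attach it to this node.
$text->appendChild($doc->createElement('note', 'new'));

$result = $doc->saveXML($doc->documentElement);

echo $color, ' ', $parent, "\n";
echo $result, "\n";
```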

So now there are two more classes that we'll go through in this tutorial before we end it: the DomText class and the DomAttribute class.

The DomText class has two fields: the type, which should always be 3, and the content field, which is the content of the text node. Finding a text node is no different from finding any other node: it is simply a node with a type of 3, or XML_TEXT_NODE.

To get the text value of this, you would do something like:

<?php

$xml = "<root><text>text</text></root>";

$doc = xmldoc($xml);

$root = $doc->root();
$children = $root->children();
$text = $children[0]->children();
echo $text[0]->content;

?>


Here we have one root, one child of that root, and one text node inside that child. We access the root using the root() method, then find the children of the root. Since $children is an array of DomElement objects, we reach the text element through the first entry, $children[0], and ask for its children. Only one node comes back in that array: an instance of the DomText class, with its two fields. We access it with $text[0] and echo its content. If the XML was instead:

<root><text>some <color>red</color> text</text></root>


Then we'd find "some" with $text[0]->content and "text" with $text[2]->content, because "some" is the first child, the color element is the second child, and "text" is the third child.
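The index positions can be checked with a short sketch. It assumes PHP 5's DOMDocument, where children are reached through childNodes->item() instead of an array; the variable names are illustrative.

```php
<?php
// Sketch only: inspect the three children of <text> by position.
$doc = new DOMDocument();
$doc->loadXML('<root><text>some <color>red</color> text</text></root>');

$nodes = $doc->documentElement->firstChild->childNodes;

$first  = $nodes->item(0)->nodeValue;  // text node before <color>
$second = $nodes->item(1)->nodeName;   // the <color> element itself
$third  = $nodes->item(2)->nodeValue;  // text node after <color>

echo trim($first), ' / ', $second, ' / ', trim($third);
```

Note that the surrounding spaces are part of each text node, so the raw values are "some " and " text".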

The last thing to do is find the attributes of an element.

To get the attributes of a single element, call the attributes() method of the element the same way you find the children. To start, we'll set up the basics of the script.

<?php

$xml = "<root>some <text color='red'>text</text></root>";

$doc = xmldoc($xml);
$root = $doc->root();
$children = $root->children();

?>


So it's a very simple XML document. One root with two children: the first, a text node, and the second, an element. The element "text" has one attribute and one child, a text node.

To start, we'll set up something to parse the first child, the text node.

for($x = 0; $x < sizeof($children); $x++){
    // Text nodes directly under the root are printed as-is.
    if($children[$x]->type == XML_TEXT_NODE){
        echo $children[$x]->content;
    }
}


It's an easy for loop: it sets $x to zero, loops while $x is less than the number of children, and goes up by one each loop.

First we check whether the current child is a text node, which finds the text node "some". Then we echo the content of that.

That we've covered before. The second thing is to find the attribute. Here's the simple code for that, which goes inside the same for loop.

if($children[$x]->type == XML_ELEMENT_NODE){
    $text = $children[$x]->children();
    $attributes = $children[$x]->attributes();
    // Print the attribute's value, a space, then the element's text.
    echo $attributes[0]->value." ".$text[0]->content;
}


This branch checks whether the child is an element node. If it is, we find the children of this element and put the array into $text. Next we find the attributes of this element and put them into $attributes. Since we know there's only one attribute, we can use $attributes[0], but if there were more than one, you could set up another for loop in the same way we set up the first one. Finally we echo the attribute's value, a space for readability, and the content of this element.
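Putting both branches together, a complete version might look like the sketch below. It is written against PHP 5's DOMDocument, which is an assumption since the snippets above use the older DOM XML functions, and it loops over all attributes instead of reading only the first.

```php
<?php
// Sketch only: text-node and element branches combined, for DOMDocument.
$doc = new DOMDocument();
$doc->loadXML("<root>some <text color='red'>text</text></root>");

$out = '';
foreach ($doc->documentElement->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        // Plain text directly under the root.
        $out .= $child->nodeValue;
    } elseif ($child->nodeType === XML_ELEMENT_NODE) {
        // Every attribute value, each followed by a space.
        foreach ($child->attributes as $attribute) {
            $out .= $attribute->value . ' ';
        }
        // Then the element's own text.
        $out .= $child->firstChild->nodeValue;
    }
}

echo $out;
```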

Those are the basics of parsing XML with the Document Object Model. There are two more facets we can explore in a later tutorial: creating XML documents on the fly using PHP's DOM abilities, and traversing the document using the DOM's XPath abilities.
Blitz

I'm two months from 21 and these were written when I was 17. Fun how time flies huh?