RSS Parsing using built-in libraries in PHP5
Posted on January 16th, 2007
PHP Coding
Feeding me the News

In today's world of dynamic content, news and information pass from site to site and user to user within seconds (or minutes, on the occasional site that has just been dugg).  One way the information superhighway keeps you in sync with your favorite websites, like Pixel2Life, is through RSS feeds.

Running a quick Google Search, you will find out the following:

RSS is a family of XML file formats for web syndication used by news websites and weblogs. They are used to provide items containing short descriptions of web content together with a link to the full version of the content. This information is delivered as an XML file called RSS feed, webfeed, RSS stream, or RSS channel.
~ http://en.wikipedia.org/wiki/RSS_Feed

Programs have been developed that let people keep up with their favorite sites from their desktops, and even from other websites. My good old pal Dan Richard has an RSS parser running on his blog which shows visitors the latest tutorials from P2L.  Showing and sharing information is always good, isn't it?

So now that you have some background on RSS feeds, let's get down to the dirty work.

Sorry to you folks still running sloppy PHP4, this tutorial is not for you.  This requires PHP5 and the SimpleXML extension.  For more information, visit http://www.php.net/SimpleXML





Harvesting the Good News
Since the SimpleXML extension is compiled into PHP5 by default, you do not need to load any files to get its classes.  All you have to do is create a new instance of SimpleXMLElement.

In this tutorial, we will create an RSSParser class that fetches the feed information and all the feed items, so you can use that information on your own site.  Let's start, shall we?

Keep your classes organized by files.  It is good to store one lengthy class in its own file for easy debugging and editing.  Plus, it's the cool thing to do =)

To create our class, we first need to define some internal variables that will be shared from function to function within an instance.  While we create the class, let's keep the following in mind:
  • URL in which feed is located
  • The source of that feed ( raw page source )
  • The instance of the SimpleXML class
  • Information about the feed ( title, url, description, avatar, and so forth )
  • The items themselves: news posts, tutorials, or whatever content we are delivering
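
The list above maps onto a class skeleton along these lines.  This is only a sketch: the property names follow the full source at the end of the tutorial, while the name RSSParserSkeleton and the use of the PHP 5 `public` keyword in place of `var` are assumptions made for this example.

```php
<?php
// Sketch only: property names mirror the full listing; "public" is the
// PHP 5 spelling of the PHP 4 "var" keyword.
class RSSParserSkeleton
{
    public $url;              // (string) URL of the feed
    public $xml;              // (object) SimpleXMLElement instance
    public $channel;          // (object) the <channel> element
    public $feed;             // raw source first, feed details array later
    public $items = array();  // (array)  parsed <item> entries
}

$skeleton = new RSSParserSkeleton();
var_dump($skeleton->items);   // stays an empty array until a feed is parsed
```

Starting $items off as an empty array is a small design choice: code that loops over the items keeps working even when the feed turns out to be empty.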



Construction of the __construct
Since PHP 5.0 there has been a really handy method called __construct().  It is the constructor, and it runs automatically whenever a new instance of the class is created.  PHP 4 did the same job with a method named after the class itself, and PHP 5 still accepts that form.

class className
{
    // PHP 5
    function __construct ()
    {
   
    }
   
    // PHP 4 style ( PHP 5 still accepts it )
    function className ()
    {
   
    }
}
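
To see when the constructor actually fires, here is a tiny sketch.  The Greeter class and its $log property are made up purely for illustration and are not part of the parser.

```php
<?php
// Hypothetical class, used only to show when __construct() runs.
class Greeter
{
    public $log = array();

    function __construct($name)
    {
        // Runs once, automatically, for every "new Greeter(...)".
        $this->log[] = 'constructed for ' . $name;
    }
}

$greeter = new Greeter('P2L');
print_r($greeter->log);
```

Notice that nothing calls __construct() by name; creating the object with `new` is enough.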


If a script will only work in PHP 5, it is a good idea to say so up front.  That said, let's make the class itself foolproof by stopping the little kiddies still running PHP 4, or even 3, dead in their tracks.

if ( intval( phpversion() ) < 5 )
{
    die ( 'PHP5 is required to execute this class.' );
}


With this, if the PHP version is anything lower than 5, the script dies immediately and nothing else is done.

else if ( !class_exists ( 'SimpleXMLElement' ) )
{
    die ( 'Please re-compile PHP5 with the SimpleXML extension.' );
}


If they do not have the SimpleXML extension in their installation either, we stop them as well.  These are the two core requirements for running the class.
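
If you prefer something more explicit than intval(), both checks can also be written with version_compare() and extension_loaded().  This is a sketch of an alternative, not what the class in this tutorial does, and the helper name rssParserRequirementsError is made up for the example.

```php
<?php
// Hypothetical helper: returns an error message, or null when all is well.
function rssParserRequirementsError()
{
    // version_compare() understands full version strings such as "5.2.17".
    if (version_compare(PHP_VERSION, '5.0.0', '<')) {
        return 'PHP5 is required to execute this class.';
    }

    // SimpleXML can be switched off at compile time, so check for it too.
    if (!extension_loaded('SimpleXML') || !class_exists('SimpleXMLElement')) {
        return 'The SimpleXML extension is not available.';
    }

    return null;
}

$error = rssParserRequirementsError();
echo ($error === null) ? 'Ready to parse.' : $error;
```

Returning a message instead of calling die() leaves it up to the calling page to decide how loudly to fail.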


Finishing the Construction
Next in the __construct function, we set the URL of the RSS feed we are parsing, using a function called setRSS().  We could easily have done $this->url = $url, but that doesn't look cool.  So we will dedicate an entire function to it.

function setRSS ( $url )
{
    $this->url = $url;
}


Now that the URL is stored in the class, we need to get the raw source of that file so we can parse the RSS feed and pull out its values.  One of the easiest functions for the job is file_get_contents().  It is widely supported on shared web hosts (for remote URLs it needs the allow_url_fopen setting turned on) and it is decently fast.  Let's just say, fast enough to get the job done.  We will store the source code in the $this->feed variable.

function getRSS ()
{
    $this->feed = file_get_contents ( $this->url )
        or die ( 'RSS feed was not found' );
}
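
If you would rather not kill the whole page when a feed is down, you can check the return value yourself, since file_get_contents() returns false on failure.  A minimal sketch, assuming a made-up helper called fetchFeedSource and using a temporary local file as a stand-in for the remote feed:

```php
<?php
// Hypothetical helper: returns the raw feed source, or false on failure.
function fetchFeedSource($url)
{
    // The @ hides the warning PHP raises for a missing file or dead host.
    return @file_get_contents($url);
}

// A temporary local file stands in for a real feed URL here.
$path = tempnam(sys_get_temp_dir(), 'rss');
file_put_contents($path, '<rss version="2.0"><channel></channel></rss>');

$found   = fetchFeedSource($path);
$missing = fetchFeedSource($path . '.does-not-exist');

// Compare with === false so an empty but valid response is not
// mistaken for a failure.
var_dump($found !== false);
var_dump($missing === false);

unlink($path);
```

The `or die()` form used in the class is shorter, but it also stops the script on an empty response, because an empty string counts as false in PHP.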


Remember that $this-> refers to the current instance of the class.  In PHP5 you can also use self:: to reach static members and constants of the class itself, but that is another story =)
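
For the curious, here is the short version of that other story.  The FeedCounter class is hypothetical and has nothing to do with the parser; it only contrasts the two forms.

```php
<?php
// Hypothetical class contrasting $this-> (instance) with self:: (class).
class FeedCounter
{
    const FORMAT = 'RSS 2.0';     // belongs to the class
    public static $created = 0;   // shared by every instance
    public $url;                  // belongs to one instance

    function __construct($url)
    {
        $this->url = $url;        // this particular object
        self::$created++;         // the class as a whole
    }

    function describe()
    {
        return self::FORMAT . ' feed at ' . $this->url;
    }
}

$a = new FeedCounter('http://example.com/a.xml');
$b = new FeedCounter('http://example.com/b.xml');

echo $a->describe(), "\n";
echo FeedCounter::$created, "\n";
```

Each object keeps its own $url, while the counter is shared, which is why it reflects both objects.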



Power to the PHP
So now your class stores the URL of the feed internally, as well as the source code of that feed.  Now we get into the dirty work of some real PHP at play.  Let's start by creating our SimpleXML object, the real core of this class.

$this->xml = new SimpleXMLElement ( $this->feed );


This creates a new SimpleXMLElement from the feed source we fetched previously and stores that object in the $this->xml variable.

The SimpleXMLElement class stores everything in objects, not arrays.  Objects are generally cleaner to work with.  Unlike arrays, though, you can't serialize() SimpleXMLElement objects and transport them wherever you want.
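
A quick way to convince yourself of both points is the sketch below.  The one-line XML document is invented for the example, and the serialize() behaviour shown is what current PHP versions do; very old PHP 5 builds may behave differently.

```php
<?php
// A one-element document, invented for this example.
$xml = new SimpleXMLElement('<channel><title>Pixel2Life</title></channel>');

// Each child is itself a SimpleXMLElement, not a plain string...
var_dump($xml->title instanceof SimpleXMLElement);

// ...so cast it when you need the text.
$title = (string) $xml->title;
var_dump($title);

// Current PHP versions refuse to serialize a SimpleXMLElement and
// throw an exception instead.
try {
    serialize($xml);
    $serializable = true;
} catch (Exception $e) {
    $serializable = false;
}
var_dump($serializable);
```

This is the reason the class copies everything it needs into plain strings and arrays, which can be cached or stored without trouble.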

$this->channel = $this->xml->channel;


Looking at the RSS structure, you will see that the information about the RSS feed, or channel, is stored in <channel> tags.  Everything inside those tags is now stored in that variable.  Let's put the information about the feed into an array so we can call on it later if needed.

Try dumping the information that you have stored in the class; you will find it very interesting how PHP sees the $this variable.  Do it for yourself! print_r( $this );

$this->feed = array
(
    'title' => $this->clean ( $this->channel->title ),
    'description' => $this->clean ( $this->channel->description ),
    'link' => $this->clean ( $this->channel->link ),
    'date' => $this->clean ( $this->channel->pubDate ),
    'image' => ( $this->channel->image->url ) ? $this->clean ( $this->channel->image->url ) : false,
);


What you see here is an array with the information stored about the RSS feed.  I have chosen these pieces of data because they are the ones you will find on nearly every RSS feed: the RSS 2.0 specification requires a channel to have a title, link and description, while pubDate and image are optional but widely used.  We get the title of the RSS feed, along with the feed description, the link provided, the date of publication and the image that the feed uses as its avatar.
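
To see how the optional pieces behave when a feed leaves them out, here is a small standalone sketch.  The feed is invented for the example, and plain (string) casts stand in for the class's clean() function.

```php
<?php
// A small feed invented for this example; it has no <pubDate> or <image>.
$source = '<rss version="2.0"><channel>'
        . '<title>Example Feed</title>'
        . '<link>http://example.com/</link>'
        . '<description>Latest tutorials</description>'
        . '</channel></rss>';

$xml     = new SimpleXMLElement($source);
$channel = $xml->channel;

$feed = array(
    'title' => (string) $channel->title,
    'link'  => (string) $channel->link,
    // A missing element comes back empty, which casts to ''.
    'date'  => (string) $channel->pubDate,
    // Same test the tutorial uses: a missing element is falsy.
    'image' => ($channel->image->url) ? (string) $channel->image->url : false,
);

print_r($feed);
```

Reading an element that is not there does not raise an error, which is why the class can ask for the image without first checking that the feed has one.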

Look closely and you will notice a function in the class: $this->clean().  It is something we will take a look at later on.



Wrapping it all up

So all we need now is to put the meat of the RSS feed into an array, so we can play with it and display or use it however we want.  Not all RSS feeds are created equal, though: some are sometimes empty, like the comment feeds here on Pixel2Life.

So we first check that there are <item> objects, and then loop through all of them.

if ( is_object ( $this->channel->item ) && count( $this->channel->item ) )
{
    foreach ( $this->channel->item as $item )
    {
        $this->items[] = array
        (
            'title' => $this->clean ( $item->title ),
            'link' => $this->clean ( $item->link ),
            'description' => $this->clean ( $item->description ),
            'category' => $this->clean ( $item->category ),
            'image' => ( $item->enclosure['url'] ) ? $this->clean ( $item->enclosure['url'] ) : false,
        );
    }
}


Since the SimpleXMLElement class stores its values in objects, we make sure there are <item> elements available, then loop through all of them and add each one to our items array.  If an item has an image, we add that as well; otherwise we store false and do not worry about it.
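
The same loop can be tried outside the class on a feed you type in by hand.  In this sketch the helper name collectItems and both sample feeds are made up, and (string) casts replace clean() to keep the example short.

```php
<?php
// Hypothetical helper that mirrors the tutorial's loop on a source string.
function collectItems($source)
{
    $xml   = new SimpleXMLElement($source);
    $items = array();   // start with an array so an empty feed still works

    // Iterating over a missing <item> simply loops zero times.
    foreach ($xml->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            // Attributes are read with array syntax on the element.
            'image' => ($item->enclosure['url'])
                     ? (string) $item->enclosure['url']
                     : false,
        );
    }

    return $items;
}

$full  = '<rss version="2.0"><channel>'
       . '<item><title>First</title>'
       . '<enclosure url="http://example.com/a.png" /></item>'
       . '<item><title>Second</title></item>'
       . '</channel></rss>';
$empty = '<rss version="2.0"><channel><title>Quiet</title></channel></rss>';

print_r(collectItems($full));
print_r(collectItems($empty));
```

Note the difference in syntax: child elements are read with ->, while attributes such as the enclosure's url are read with square brackets.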

Looking at Clean

If you look through the entire class, you will notice that the clean function is used repeatedly.  It strips the SimpleXMLElement object wrapper off a value and turns it into something more user friendly.  Anything run through clean comes back as a plain string, and to be more XHTML friendly we first decode any HTML entities and then escape the special HTML characters ( &, <, > and " ) back into entities.

function clean ( $i )
{
    return (string) htmlspecialchars ( html_entity_decode ( $i ) );
}
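
Here is the same function run on its own, so you can see what each half does.  The sample title is invented and deliberately double-encoded, the way some feeds deliver their text.

```php
<?php
// Same body as the tutorial's clean(), written as a plain function.
function clean($i)
{
    return (string) htmlspecialchars(html_entity_decode($i));
}

// The XML parser turns "&amp;amp;" into the text "&amp;" and
// "&lt;b&gt;" into the text "<b>".
$xml = new SimpleXMLElement('<title>Fish &amp;amp; Chips &lt;b&gt;</title>');

$cleaned = clean($xml);

var_dump(is_string($cleaned));
echo $cleaned, "\n";
```

html_entity_decode() first turns the leftover "&amp;" into a bare "&", and htmlspecialchars() then escapes it once, along with the angle brackets.  Skipping the decode step would escape the already escaped ampersand a second time.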


Try using the class without the clean function, what do you notice?



Full Source Code

[code=PHP]<?php

/* ------------------------------------------------- */
##    RSSParser
/* ------------------------------------------------- */
/* ------------------------------------------------- */
// Using the SimpleXML extension built into
// PHP5, take any RSS feed and parse the contents.
//
// Author: Jamie Chung ( Chaos King )
// Email: jamie [--a.t.--] notanotherportfolio.com
/* ------------------------------------------------- */


class RSSParser
{
    var $url;
    # (string) - URL of feed
   
    var $page;
    # (string) - Raw file contents of RSS Feed ( unused: getRSS stores the source in $feed )
   
    var $xml;
    # (object) - Object data of RSS Feed
   
    var $channel;
    # (object) - Channel Object containing feed information and items
   
    var $items;
    # (array) - RSS Items
   
    var $feed;
    # (array) - Feed Information ( title, desc, publish date )
   
    /*
        Class Constructor
        Arguments:
            url: Feed URL, can be a local file, or online ( http:// )
            - url is required in order to execute the constructor
    */
   
    function __construct ( $url )
    {
        /*
            Do we have PHP5 Installed?
            If we do not have it installed,
            Kill the script immediately.
        */
        if ( intval( phpversion() ) < 5 )
        {
            die ( 'PHP5 is required to execute this class.' );
        }
        /*
            Does the extension class exist?
            Since it is an internal class
            Compiled into PHP5, we can check
            Whether it is installed or not.
        */
        else if ( !class_exists ( 'SimpleXMLElement' ) )
        {
            die ( 'Please re-compile PHP5 with the SimpleXML extension.' );
        }
       
        // Set the URL of the feed internally.       
        $this->setRSS ( $url );
       
        // Get the page contents of that feed.
        $this->getRSS ();
       
        // Parse RSS information
        $this->parseRSS ();
    }
   
    /*
        Function: setRSS
        Arguments:
            url - RSS Feed url which is set internally
            - url is required to run this function
    */
   
    function setRSS ( $url )
    {
        $this->url = $url;
    }
   
    /*
        Function getRSS
        - Get the feed source of the rss feed
    */
   
    function getRSS ()
    {
        $this->feed = file_get_contents ( $this->url )
            or die ( 'RSS feed was not found' );
    }
   
    /*
        Function: parseRSS
        - Parses the rss source
        - Places feed items in array: $this->items
        - Places feed details in array: $this->feed
    */
   
    function parseRSS ()
    {
        // Since the extension is loaded, let's create a new
        // instance of this class.
        $this->xml = new SimpleXMLElement ( $this->feed );
       
        // The XML Object has another child called channel.
        // It holds the RSS details as well as items
        $this->channel = $this->xml->channel;
       
        // Let's set the feed details
        // - Information about the RSS Feed
        $this->feed = array
        (
            'title' => $this->clean ( $this->channel->title ),
            'description' => $this->clean ( $this->channel->description ),
            'link' => $this->clean ( $this->channel->link ),
            'date' => $this->clean ( $this->channel->pubDate ),
            'image' => ( $this->channel->image->url ) ? $this->clean ( $this->channel->image->url ) : false,
        );
       
        // Checks if we have any items present.
        // Yes, it is possible that a feed is empty =/
       
        if ( is_object ( $this->channel->item ) && count( $this->channel->item ) )
        {
            // Let's loop through all the <item> objects
            foreach ( $this->channel->item as $item )
            {
                // Add an item to the array
                $this->items[] = array
                (
                    'title' => $this->clean ( $item->title ),
                    'link' => $this->clean ( $item->link ),
                    'description' => $this->clean ( $item->description ),
                    'category' => $this->clean ( $item->category ),
                    'image' => ( $item->enclosure['url'] ) ? $this->clean ( $item->enclosure['url'] ) : false,
                );
            }
        }
    }
   
    /*
        Function clean
        Arguments:
            i - string in which to clean.
                Cleans off the object tag from an object variable.
    */
   
    function clean ( $i )
    {
        return (string) htmlspecialchars ( html_entity_decode ( $i ) );
    }
}

$RSSParser = new RSSParser ( 'http://www.pixel2life.com/feeds/latest_20_tuts.xml' );

echo '<pre>';
print_r( $RSSParser->feed );
print_r( $RSSParser->items );

?>[/code]
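
Once the items are in an array, showing them on your own page is a short loop.  The sketch below uses made-up sample data in the same shape as $RSSParser->items, so it runs without fetching a real feed.

```php
<?php
// Sample data shaped like $RSSParser->items; the values are invented.
$items = array(
    array('title' => 'First &amp; foremost', 'link' => 'http://example.com/1'),
    array('title' => 'Second',               'link' => 'http://example.com/2'),
);

$html = "<ul>\n";
foreach ($items as $item) {
    // clean() has already escaped these values, so they go in as-is.
    $html .= '  <li><a href="' . $item['link'] . '">'
           . $item['title'] . "</a></li>\n";
}
$html .= "</ul>\n";

echo $html;
```

Because clean() escaped everything when the feed was parsed, the values should not be run through htmlspecialchars() again at this point.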
Chaos King

Experienced Web Developer and a Senior Developer at Pixel2Life.com. Being a role model to many future web developers, it's what I do best. I enjoy long walks on the beach and arguing with Faken about life.