Basic Parse With PHP – Part Two
In part one of PHP Parsing, I showed how you could take information presented on another website and use it on your own. In part two, I will show how to do something more useful: how to parse RSS. RSS is now an important part of the internet, and many sites have adopted it as a way of keeping readers informed of their latest news and offerings.
In the example script, I will show you how to parse an eBay RSS feed based on search criteria passed to it via a URL variable named st, e.g. parse.php?st=searchtext. The code is split into sections, and I explain how each section works underneath it.
<?php
// create rssItem class
class rssItem {
public $rssItemTitle;
public $rssItemLink;
public $rssItemDescription;
}
// working variables
$feedTitle = "";
$feedLink = "";
$feedDescription = "";
$arItems = array();
$itemCount = 0;
// feed variables, expects ?st= on URL
// urlencode() turns spaces into + and escapes other unsafe characters
$searchTerm = urlencode(isset($_GET['st']) ? $_GET['st'] : "");
$rssFile = "http://rss.api.ebay.com/ws/rssapi?FeedName=SearchResults&siteId=15&language=en-AU&output=RSS20&saprchi=&sacat=-1&saprclo=&ftrv=1&from=R10&satitle=$searchTerm&ftrt=1&catref=C6&fsop=1&fsoo=1";
// show item descriptions? (true or false)
$showDescriptions = true;
The first part of the script creates a class and defines variables. The class, rssItem, holds the data for one RSS item: its title, link and description. We also define working variables for the feed’s own title, link and description, an array ($arItems) to hold the item objects, a counter ($itemCount), and the URL of the RSS file we want. Finally, the boolean $showDescriptions determines whether the item descriptions are displayed.
The next three functions are based on functions found in Chapter 22 of the book PHP Developer’s Cookbook by Sterling Hughes [Sams], an absolutely invaluable source of PHP inspiration. At the bottom of this post, I list the four PHP books I own and refer to when solving my PHP problems.
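Because the search term ends up inside a query string, it has to be URL-encoded before it is added to the feed address. The following is a minimal sketch of what that encoding does; buildFeedUrl() and the example.com address are hypothetical names used only for illustration, not part of the script or of eBay's API.

```php
<?php
// Hypothetical helper: appends an encoded search term to a base URL.
// The base URL and the way it is used here are placeholders for illustration.
function buildFeedUrl($baseUrl, $rawTerm) {
    // urlencode() turns spaces into "+" and escapes characters such as "&"
    // that would otherwise be read as the start of a new parameter.
    return $baseUrl . "?satitle=" . urlencode($rawTerm);
}

echo buildFeedUrl("http://example.com/rss", "wacom intuos3"), "\n";
echo buildFeedUrl("http://example.com/rss", "pens & tablets"), "\n";
```

The first call shows the familiar space-to-plus conversion; the second shows why replacing spaces alone is not enough when the term contains an ampersand.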
function startElement($parser, $name, $attrs) {
global $curTag;
$curTag .= "^$name";
}
function endElement($parser, $name) {
global $curTag;
$caret_pos = strrpos($curTag,'^');
$curTag = substr($curTag,0,$caret_pos);
}
function characterData($parser, $data) {
global $curTag;
// get the Channel information first
global $feedTitle, $feedLink, $feedDescription;
$titleKey = "^RSS^CHANNEL^TITLE";
$linkKey = "^RSS^CHANNEL^LINK";
$descKey = "^RSS^CHANNEL^DESCRIPTION";
if ($curTag == $titleKey) {
$feedTitle = $data;
}
elseif ($curTag == $linkKey) {
$feedLink = $data;
}
elseif ($curTag == $descKey) {
$feedDescription = $data;
}
// now get the items
global $arItems, $itemCount;
$itemTitleKey = "^RSS^CHANNEL^ITEM^TITLE";
$itemLinkKey = "^RSS^CHANNEL^ITEM^LINK";
$itemDescKey = "^RSS^CHANNEL^ITEM^DESCRIPTION";
if ($curTag == $itemTitleKey) {
// make new rssItem
$arItems[$itemCount] = new rssItem();
// set new item object's properties
$arItems[$itemCount]->rssItemTitle = $data;
}
elseif ($curTag == $itemLinkKey) {
$arItems[$itemCount]->rssItemLink = $data;
}
elseif ($curTag == $itemDescKey) {
$arItems[$itemCount]->rssItemDescription = $data;
// increment item counter
$itemCount++;
}
}
These functions (startElement, endElement and characterData) extract the data contained inside the XML document. To parse an XML document in PHP this way, you define three functions to handle what the parser encounters:
- the start of an element: function startElement, e.g. <item>
- the end of an element: function endElement, e.g. </item>
- the data between these tags: function characterData, e.g. This is a test item.
We keep track of where we are in the document with a global variable ($curTag), a string containing the current tag and all of its parent tags, each preceded by a caret (^). You could use any other separator, such as a comma or a colon, if you wish. startElement appends the tag name to the string, and endElement removes the last one. While the parser is inside an <item> tag, for example, $curTag holds ^RSS^CHANNEL^ITEM. The names are in upper case because PHP’s XML parser folds tag names to upper case by default.
Whenever the parser finds character data, it calls characterData, which checks whether $curTag matches one of the paths we want to extract and, if so, assigns the data to our variables. The function picks up the general channel information (title, link and description) as well as every item it comes across. For each item title it encounters, it creates a new rssItem and inserts it into our $arItems array; the link and description that follow are stored on the same object, and the item counter is incremented once the description has been read.
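To see the caret-separated path grow and shrink, here is a small self-contained sketch. It uses the same push and pop logic as the script, but collectPaths() is a hypothetical helper and it keeps its state in closures (PHP 5.3 or later) rather than in a global variable, purely so that it can run on its own.

```php
<?php
// Illustrative helper (not part of the script): returns the path recorded
// each time an element opens while parsing $xml.
function collectPaths($xml) {
    $path = "";
    $seen = array();
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$path, &$seen) {
            $path .= "^$name";   // push: append the tag name
            $seen[] = $path;
        },
        function ($p, $name) use (&$path) {
            // pop: cut everything from the last caret onwards
            $path = substr($path, 0, strrpos($path, '^'));
        }
    );
    xml_parse($parser, $xml, true);
    return $seen;
}

print_r(collectPaths(
    "<rss><channel><title>T</title><item><title>A</title></item></channel></rss>"
));
```

The recorded paths include both ^RSS^CHANNEL^TITLE and ^RSS^CHANNEL^ITEM^TITLE, which is how the feed title and an item title can be told apart even though the tag name is the same.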
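One thing to be aware of is that PHP's XML parser may deliver the text of a single element in several pieces (for example around an entity such as &amp;), so a character data handler can be called more than once per tag. A possible variation, sketched below under that assumption, collects the text in a buffer and only stores it when the closing tag arrives. The function name parseItems() and the use of plain arrays instead of rssItem objects are choices made for this illustration only.

```php
<?php
// Sketch of a buffering variant: text is accumulated while inside a tag and
// stored when the tag closes, so split character data is joined up again.
function parseItems($xml) {
    $path = "";
    $buffer = "";
    $current = null;
    $items = array();

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$path, &$buffer, &$current) {
            $path .= "^$name";
            $buffer = "";
            if ($path == "^RSS^CHANNEL^ITEM") {
                // a new item starts here, whatever order its children come in
                $current = array("title" => "", "link" => "", "description" => "");
            }
        },
        function ($p, $name) use (&$path, &$buffer, &$current, &$items) {
            $text = trim($buffer);
            if ($path == "^RSS^CHANNEL^ITEM^TITLE") {
                $current["title"] = $text;
            } elseif ($path == "^RSS^CHANNEL^ITEM^LINK") {
                $current["link"] = $text;
            } elseif ($path == "^RSS^CHANNEL^ITEM^DESCRIPTION") {
                $current["description"] = $text;
            } elseif ($path == "^RSS^CHANNEL^ITEM") {
                $items[] = $current;   // the item is complete
                $current = null;
            }
            $buffer = "";
            $path = substr($path, 0, strrpos($path, '^'));
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            $buffer .= $data;          // append, never overwrite
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        return false;
    }
    return $items;
}

$demo = '<rss><channel><title>Feed</title>'
      . '<item><title>Pens &amp; Tablets</title><link>http://example.com/1</link>'
      . '<description>First</description></item></channel></rss>';
print_r(parseItems($demo));
```

Finishing each item on its closing tag also means an item without a description is still counted, which the title-then-description approach does not guarantee.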
// main program portion - start parser
$xml_parser = xml_parser_create();
// use our above functions when elements or data is found
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
We now start the actual parsing. Luckily, PHP has a built-in XML parser. We create one by calling xml_parser_create() and keeping the parser it returns in $xml_parser, which we then pass to PHP’s other XML functions. The next two calls tell the parser to use our functions whenever it comes across a start tag, an end tag, or character data.
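The keys used in characterData are written in upper case (^RSS^CHANNEL^TITLE) even though the feed itself uses lower-case tags. That works because a parser created with xml_parser_create() has case folding switched on by default. The sketch below shows the option in both states; tagNamesSeen() is a hypothetical helper written just for this demonstration.

```php
<?php
// Illustrative helper: returns the element names as the start handler
// receives them, with case folding switched on or off.
function tagNamesSeen($xml, $caseFolding) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($p, $name) {
            // nothing to do when an element closes
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

print_r(tagNamesSeen("<rss><channel/></rss>", true));   // folded to upper case
print_r(tagNamesSeen("<rss><channel/></rss>", false));  // left as written
```

If you switch case folding off in your own copy of the script, the keys compared against $curTag would have to be written in lower case as well.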
// open the RSS feed as specified in $rssFile
if (!($fp = fopen($rssFile,"r"))) {
die ("could not open RSS for input");
}
// if successfully opened, parse the file
while ($data = fread($fp, 8192)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
}
}
fclose($fp);
xml_parser_free($xml_parser);
This portion opens the RSS file specified in $rssFile and, if it succeeds, stores the file handle in the variable $fp. We then read the feed in chunks of 8192 bytes and pass each chunk to the xml_parse() function until there is no more data to process. If the RSS file contains an XML error, the script stops and reports the error message and line number. Once done, we close the file handle with fclose() and release the parser with xml_parser_free() so that we reclaim any used memory.
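The chunked reading can be tried without a network connection by feeding the parser from an in-memory stream. The sketch below is one way to do that; countElementsInStream() is a hypothetical helper, and it loops on feof() rather than on the return value of fread() so that the parser always receives a final call with its third argument set to true.

```php
<?php
// Illustrative helper: parses an already opened stream in small chunks.
// Returns the number of elements found, or an error message on failure.
function countElementsInStream($fp, $chunkSize) {
    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$count) {
            $count++;
        },
        function ($p, $name) {
            // nothing to do when an element closes
        }
    );
    while (!feof($fp)) {
        $data = fread($fp, $chunkSize);
        // the third argument tells the parser whether this is the last chunk
        if (!xml_parse($parser, $data, feof($fp))) {
            return sprintf(
                "XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
        }
    }
    return $count;
}

// php://memory stands in for the remote feed in this demonstration
$fp = fopen("php://memory", "r+");
fwrite($fp, "<rss><channel><item/><item/></channel></rss>");
rewind($fp);
echo countElementsInStream($fp, 8), "\n";
fclose($fp);
```

A chunk size of 8 bytes is used only to force several passes through the loop; the 8192 bytes used in the script is a more sensible size for a real feed.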
// write out the items
echo ("<html><head><meta name=\"description\" content=\"" . htmlspecialchars($feedDescription) . "\"></head>");
echo ("<body bgcolor=\"#ffffff\" style=\"font:normal 13pt 'Trebuchet MS', Georgia, 'Times New Roman', Times;color:#333333;\">");
//echo ("Link to feed: <a href=\"$feedLink\">$feedTitle</a><br/><br/>");
for ($i=0;$i<count($arItems);$i++) {
$trssItem = $arItems[$i];
echo ("<a href=\"" . htmlspecialchars($trssItem->rssItemLink) . "\"><strong>" . htmlspecialchars($trssItem->rssItemTitle) . "</strong></a><br/>");
if ($showDescriptions) {
echo ($trssItem->rssItemDescription);
echo ("<br/><br/>");
}
}
echo ("</body></html>");
?>
After we have successfully parsed the RSS file, our data is held in the objects and variables we declared, which makes formatting it on the screen relatively simple. Essentially, you loop through the array with a for loop, from zero up to the upper boundary of the array. You assign the current array item to a temporary variable and print out each of its elements. As each element in the array is an rssItem object, you access the data with something like $trssItem->rssItemTitle to get the item title. This continues until you have displayed all of the array’s elements.
Let’s now see the script in action. Click here to run the script, which will search ebay.com.au with the search term “wacom intuos3” and parse the returned RSS. Enjoy this script and make your own copy on your site.
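The same loop can also be written with foreach, building the markup as a string so that it can be reused or inspected before it is printed. The following is only a sketch: renderItems() is a hypothetical helper, and it uses a stdClass object with the same property names as rssItem so that it runs on its own.

```php
<?php
// Illustrative helper: builds the HTML for a list of item objects that have
// rssItemTitle, rssItemLink and rssItemDescription properties.
function renderItems($items, $showDescriptions) {
    $html = "";
    foreach ($items as $item) {
        // titles and links are plain text, so escape them for HTML output
        $html .= '<a href="' . htmlspecialchars($item->rssItemLink) . '"><strong>'
               . htmlspecialchars($item->rssItemTitle) . "</strong></a><br/>";
        if ($showDescriptions) {
            // the description is assumed to contain HTML meant to be displayed,
            // so it is passed through unchanged
            $html .= $item->rssItemDescription . "<br/><br/>";
        }
    }
    return $html;
}

$demo = new stdClass();
$demo->rssItemTitle = "Pens & Tablets";
$demo->rssItemLink = "http://example.com/?a=1&b=2";
$demo->rssItemDescription = "<p>Sample</p>";
echo renderItems(array($demo), true), "\n";
```

Passing a description through unescaped is only reasonable when you trust the feed; for a feed you do not control, you may prefer to escape or filter it as well.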
Recommended PHP Reading
- PHP Developer’s Cookbook by Sterling Hughes [Sams]
- Web Database Applications with PHP and MySQL, 2nd Edition by Hugh E. Williams and David Lane [O'Reilly]
- PHP and MySQL for Dynamic Web Sites : Visual QuickPro Guide by Larry Ullman [Peachpit Press]
If you are an absolute PHP beginner, one of the best books to learn with is PHP/MySQL Programming for the Absolute Beginner by Andy Harris.