
©2002 — 2024


Syndication's What You Need
(An RSS How To)

Introduction

RSS, standing variously for "Rich Site Summary," "RDF Site Summary," "Really Simple Syndication" and many more, heralded a new era of content sharing. Vast content and news repositories, such as BBC News or Nature magazine, could boil down their articles into a standard format, or at least a handful of standard formats. These could be hoovered up by anyone or anything that understood the format, then aggregated and syndicated across the rest of the web. RSS software which turns feeds into readable, clickable, web-enabled content is available for most platforms and browsers.

This short essay describes how you can syndicate content from any publicly accessible RSS feed onto your webpage. It should cover the technology required in as much detail as is possible without being specific to one setup: beyond that, you may need to ask your systems administrator or, failing that, a mature adult for assistance.

A final word of caution: I provide no warranty for the below. Use it all at your own risk. If you cut and paste this code without understanding the underlying languages, then that's your decision. You should only ever use programs you understand and can trust, in an environment where they're proven to be trustworthy. As far as I'm concerned it all works fine, but I'm certainly not proven to be trustworthy. Don't say I didn't warn you.

The process

In order to play to the strengths of each technology, I used a two-step process to syndicate the content. The two steps are the ones marked "Your" in the middle of the chain below:

Content → RSS feed → Your PHP → Your Javascript → Your website

The remote website has content, and offers an RSS feed to the world: you use a scrap of PHP to read that feed and put it into a format you want; you then use a second scrap of Javascript to fetch the output of the PHP and drop it into your webpage.

I make no claims that this is the perfect method. It works, surprisingly easily, as I'll explain below.

  1. Content

    First catch your hare. What content do you want to host on your webpage? This will largely be a stylistic decision on your part, but try not to host the most obvious content, especially if it's not related to your own site.

    If you want any hints then most, if not all, LiveJournal syndications work off RSS feeds, so you can always trace them to their source. More usefully, LiveJournals themselves expose an RSS interface, if you know where to look: for a given user (call them username), an RSS feed will be available at http://www.livejournal.com/users/username/rss. In addition, browsers will nowadays alert you to the presence of an RSS feed for a particular site: Firefox, for example, will show a little orange icon at the bottom right of the page.

  2. RSS Feed

    Once you have an RSS feed's URL, you might want to look at the feed itself, to see what you'll be able to access. Typically the feed will only contain a digest of each article: an introductory paragraph, or the first few sentences of the article itself. It will, however, provide you with a title and a link back to the article. Looking at this now will help you to recognise when your syndication is working: don't expect to see the whole of the Fox website tastefully grafted into your own page.
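
    As a rough illustration of what to look for, here is a small sketch in Javascript. The feed text and its news.example addresses are made up, and the helper names firstTag and listItems are hypothetical. Real feeds come in several RSS flavours and deserve a proper XML parser rather than regular expressions; the sketch only shows that each item carries a title, a link and a short description.

```javascript
// A made-up RSS 2.0 feed, trimmed to the parts that matter for syndication.
var sampleFeed = [
  '<?xml version="1.0"?>',
  '<rss version="2.0">',
  '  <channel>',
  '    <title>Example News</title>',
  '    <link>http://news.example/</link>',
  '    <item>',
  '      <title>First story</title>',
  '      <link>http://news.example/1</link>',
  '      <description>Opening sentences of the first story.</description>',
  '    </item>',
  '    <item>',
  '      <title>Second story</title>',
  '      <link>http://news.example/2</link>',
  '      <description>Opening sentences of the second story.</description>',
  '    </item>',
  '  </channel>',
  '</rss>'
].join("\n");

// Return the text inside the first <name>...</name> pair, or "" if absent.
// Assumption: plain tags only (no attributes, CDATA or namespaces).
function firstTag(xml, name) {
  var match = xml.match(new RegExp("<" + name + ">([\\s\\S]*?)</" + name + ">"));
  return match ? match[1].replace(/^\s+|\s+$/g, "") : "";
}

// Collect the title and link of every <item> in the feed text.
function listItems(xml) {
  var chunks = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  var items = [];
  for (var i = 0; i < chunks.length; i++) {
    items.push({
      title: firstTag(chunks[i], "title"),
      link: firstTag(chunks[i], "link")
    });
  }
  return items;
}

console.log(listItems(sampleFeed));
```

    Note that the channel has its own title and link as well as the items: it is the items you will be turning into a list.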

  3. Your PHP

    Understanding, getting, using PHP

    I won't dwell on this. I'm not your O'Reilly. Put as quickly as possible, PHP is a programming language, similar to (but simplified from) Perl, that is embedded in webpages. Before a PHP-enabled webserver delivers a page to a remote browser, it runs any PHP within the page and replaces the fragments of code with whatever output they generate.

    You need to have PHP capability already on your system. Check with your webserver administrators. It may only work in pages (a) suffixed ".php" and (b) in your CGI directory. It might work anywhere. Every web server is a unique snowflake, more's the pity.

    PHP for RSS

    I initially plumped for an all-Javascript method of content syndication. However, while Javascript has its strengths, support for XML transformations (creating basic (X)HTML from an RSS feed) seemed to be patchy, and I was nearly resorting to laborious and dirty DOM work. I was also only supporting one of the RSS standards, and support for further standards would require a lot of work.

    Along came Magpie RSS. This is a PHP-based RSS reader that was very simple to install. After downloading and unzipping the distribution from the website, do the following:

    • Make a subdirectory of your CGI directory called, say, magpierss/.
    • Copy the four *.inc files and the directory extlib from the Magpie RSS distribution into this directory. This is all the "configuration" necessary.
    • Write a document called e.g. rss.php in the directory containing magpierss/, including the following:
    <!-- Some HTML here, if you want: remember
    it will be picked up by the Javascript later -->
    <?php
      // Load Magpie's fetch_rss() function
      require_once('magpierss/rss_fetch.inc');
    ?>
    <!-- Some more HTML here: again, caveat scriptor -->

    The PHP page will now be able to fetch an RSS feed. Now you need to specify a feed URL, and convert the result into HTML. The following example, to be included in the same page, fetches an RSS feed and returns a list of links to the articles mentioned in the feed:

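    <!-- HTML here, including the above
    declaration -->
    <?php
      // Short codes mapped to the only feeds this page will fetch
      $urls = array( "BBC" => "http://location.of.bbc.rss/",
                "NAT"  => "http://location.of.nature.rss/");

      // Ignore any code that is not in the list above
      $code = isset($_GET['url']) ? $_GET['url'] : '';
      $rss  = isset($urls[$code]) ? fetch_rss($urls[$code]) : false;

      echo "<ul>";
      // fetch_rss() returns false if the feed could not be read
      if ($rss) {
        foreach ($rss->items as $item) {
          // Escape feed text so it cannot break the markup
          $href  = htmlspecialchars($item['link'], ENT_QUOTES);
          $title = htmlspecialchars($item['title'], ENT_QUOTES);
          echo "<li><a href='$href'>$title</a></li>";
        }
      }
      echo "</ul>";
    ?>
    <!-- Final splash of HTML here
    if you want a footer -->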

    Remember that any HTML in this document will be scarfed by the Javascript: moreover, if you want the Javascript to be able to modify the HTML easily, you should ensure that it is also well-formed XML: close all tags, quote all attributes etc.

    We have hidden the URLs you want to request within the PHP. The Javascript just passes a predefined code to this page. It's bad form to have the PHP script willing to request any page, because the Javascript can be reverse-engineered. That means: anyone can find your PHP page. And if it can be used to access any RSS feed on the web and make it readable, then anyone could use your PHP page for their own website, unless you restrict the PHP to a select few RSS feeds.
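
    The same short-code idea can be mirrored in the Javascript, purely as a convenience: the check in the PHP is still the one that matters, since anyone can bypass your Javascript. A minimal sketch, assuming the codes BBC and NAT from the PHP above and a placeholder address of http://location.of/rss.php; the names KNOWN_FEEDS and buildFeedUrl are hypothetical.

```javascript
// Codes the page is prepared to ask for. Assumption: these match the keys
// of the $urls array in the PHP page.
var KNOWN_FEEDS = { BBC: true, NAT: true };

// Build the address of the PHP page for a given code,
// or return null if the code is not one we recognise.
function buildFeedUrl(code) {
  if (!Object.prototype.hasOwnProperty.call(KNOWN_FEEDS, code)) {
    return null;
  }
  // encodeURIComponent keeps odd characters from mangling the query string
  return "http://location.of/rss.php?url=" + encodeURIComponent(code);
}

console.log(buildFeedUrl("BBC"));
console.log(buildFeedUrl("http://somewhere.else/feed"));
```

    Only the short code ever appears in the address, so a reader of your page source learns nothing about which feeds sit behind it.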

    We now have a stand-alone page which accesses a remote RSS feed and turns it into browser-readable HTML. Now to embed it in another page.

  4. Javascript: the XMLHttpRequest object

    Javascript has improved a great deal since the days when a sniff of it could crash Netscape, or cause IE to crash Windows on your behalf. One of the most recent add-ons is the XMLHttpRequest object (note the capitalisation: Javascript is case-sensitive). This makes an HTTP request, just as browsers normally do for webpages, and expects to find XML in return. It can accept non-XML text too, although that limits its functionality somewhat: you get the raw text in responseText, but no parsed document in responseXML.

    Implementations of the object vary from browser to browser. IE uses a Windows ActiveX component; recent Firefox and Safari versions have aimed to keep the object native to Javascript. IE's method bites twice, as the ActiveX component needs to be recreated every time.

    We will aim to do the following:

    • If the browser has native XMLHttpRequest functionality, create a one-off object
    • Otherwise, create an object every time
    • Use the object to grab our PHP page
    • Drop it, verbatim, into a named HTML element

    As preparation, write the following HTML where you want the syndication to appear:

    <a href="javascript:getSynd('BBC');"
    >BBC News</a> <div id="syndicElem"></div>

    This should go in your homepage somewhere, or the templates for your blog pages: anywhere you want the syndication to happen. The <div> element should be empty, but make sure you give it an id attribute.

    The following Javascript, embedded in <script> tags in your page, should create the necessary objects and provide the getSynd() function shown above.

    // The request object, shared by both functions below
    var getFileReq;

    // non-IE: create this object once.
    if (window.XMLHttpRequest) {
      getFileReq = new XMLHttpRequest();
    }
    
    // The main function
    function getSynd(code) {
      // IE still needs a fresh request object every time
      if (!window.XMLHttpRequest) {
        getFileReq = new ActiveXObject("Microsoft.XMLHTTP");
      }
    
      // Whenever the state of the request changes
      // (for example, when the HTTP transfer is over),
      // the object will run the function assigned here
      getFileReq.onreadystatechange = procReqChange;
    
      // Open an asynchronous request for the PHP page and send it
      // (a GET request has no body, hence null)
      getFileReq.open("GET",
        "http://location.of/rss.php?url=" + encodeURIComponent(code),
        true);
      getFileReq.send(null);
    }
    
    function procReqChange() {
      // This function is called every time readyState changes.
      // State 4 means the transfer is complete: populate the
      // element with the HTML we've received
      if (getFileReq.readyState == 4) {
    
        // Get the element with ID 'syndicElem'
        var elem = document.getElementById('syndicElem');
        elem.innerHTML = getFileReq.responseText;
      }
    }

    If you include this in the same page as the <a> tag shown above, then clicking on the link should cause the space below it to be filled with the results of your PHP script. Or, alternatively, filled with the error messages that are just as likely to appear. Consider error messages to be Mother Nature's signposts on the road to enlightenment. Only, Mother Nature speaks nothing but Ogham, so they rarely help.
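
    One way to make those signposts slightly more legible is to decide what to display from the request's readyState and HTTP status before touching the page. The sketch below is an assumption about how you might do that, not part of the script above: describeResponse is a hypothetical helper, and the wording of its message is invented.

```javascript
// Decide what HTML to show for a request object.
// Returns null while the transfer is still in progress.
function describeResponse(req) {
  // Check readyState first: the status is only meaningful once
  // the transfer is complete
  if (req.readyState !== 4) {
    return null;
  }
  if (req.status === 200) {
    return req.responseText; // success: show what the PHP sent
  }
  return "<p>Sorry, the feed could not be loaded (HTTP " + req.status + ").</p>";
}

// A stand-in for a finished request that failed
console.log(describeResponse({ readyState: 4, status: 404, responseText: "" }));
```

    Inside procReqChange() you could call describeResponse(getFileReq) and only assign to innerHTML when the result is not null. PHP's own warnings usually arrive with a status of 200, so they would still be shown as before.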

Done?

You should now have working syndication, albeit in a rudimentary manner. Now you have to ask yourself: How does the output look? Does it work in all the browsers I want it to work in? What happens if the link is re-clicked? Do I want reloads to happen multiple times? Is the system secure? This is all beyond this introductory article, however: there are plenty of resources on Javascript and PHP out there, and several on the security of web-visible resources.
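
To make one of those questions concrete (what happens if the link is re-clicked?), one option is to remember what has already been fetched and skip the second request. This is a minimal sketch, under the assumption that a slightly stale list is acceptable until the page is reloaded. The names makeCachedLoader and fetchFn are hypothetical, and fetchFn stands in for whatever actually performs the request.

```javascript
// True only for properties set directly on the object
function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Wrap a fetching function so each feed code is only requested once.
// fetchFn(code, callback) must call callback(html) when the HTML arrives.
function makeCachedLoader(fetchFn) {
  var cache = {};   // code -> HTML already received
  var pending = {}; // code -> true while a request is in flight

  return function load(code, callback) {
    if (has(cache, code)) {
      callback(cache[code]); // re-click: reuse the earlier result
      return;
    }
    if (has(pending, code)) {
      return; // a request for this code is already under way
    }
    pending[code] = true;
    fetchFn(code, function (html) {
      delete pending[code];
      cache[code] = html;
      callback(html);
    });
  };
}

// Stand-in fetcher for demonstration: it counts how often it is called
var calls = 0;
var load = makeCachedLoader(function (code, done) {
  calls += 1;
  done("<ul><li>" + code + "</li></ul>");
});
load("BBC", function () {});
load("BBC", function () {}); // served from the cache
console.log(calls); // prints 1
```

Whether caching is the right answer depends on how often the feed changes: for a busy news feed you might prefer to refetch after a few minutes instead.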

As a footnote, Google News has recently put the XMLHttpRequest object to very good use indeed, fetching and displaying configuration options for the user. It's not as visually striking as Google Maps, but it still shows that a fairly basic function, doing what web browsers do a billion times a day, can pull many times its own weight when yoked to the right code. So go out there and write the right code.

© 2005 jps
March