Regular visitors to my blog (as if!) will have noted that I have a rather unhealthy fixation with my newsfeed aggregator application. I wrote it about a year ago and despite the passage of time (a year is a long time in Java development) I am still very proud of it.
A scribbly application architecture diagram (UML, I wish I understood you).

It was originally modelled after the JavaRSS approach. At the time I had also written a three-pane newsfeed aggregator application, which is currently part of our University portal. I personally prefer this one-page view. I have tried to tweak it to work with a larger selection of newsfeeds (GoFetch), but the expand/collapse paradigm doesn't quite work for me on this occasion.
Anyhoo, I have been revisiting some of the code recently and have tidied it up considerably. I have always been happy to share the code with those that asked but it came with a big disclaimer that it was somewhat messy. Having sorted it out, I now think I have something approaching professional. I may even set up a SourceForge project for it although that arena is somewhat overcrowded already. An application shortly to join this throng is Microsoft's RSS Platform.
My feed aggregator application does have some very nice features which I will describe shortly. The simplicity of the application has come at the cost of being reliant on numerous third party libraries, namely Spring, EHCache, EHCache Constructs, Quartz, ROME, ROME Fetcher, XMLWriter, JSTL, Xalan, Xerces and the usual suspects from the Jakarta Commons.
The application is now very much simpler than before and at its heart it only makes use of six Java classes.
XBEL based feed collection management
The basis of my feed collection management is the XBEL file format (I could have easily used OPML also). The collection of feeds is stored in XBEL and I create what I call a "populated" XBEL output. This populated XBEL includes the actual feed items that I want to display. For example:
XBEL
<?xml version="1.0"?>
<xbel>
<folder>
<title>Blogs</title>
<bookmark title="Mark McLaren's Weblog" href="http://blog.mark-mclaren.info"/>
</folder>
</xbel>
becomes
XBEL populated
<?xml version="1.0"?>
<xbel>
<folder>
<title>Blogs</title>
<folder>
<bookmark title="Mark McLaren's Weblog" href="http://blog.mark-mclaren.info"/>
<folder>
<bookmark title="Feed caching using EHCache, Spring and ROME FeedFetcher" href="http://blog.mark-mclaren.info/2006/03/feed-caching-using-ehcache-spring-and.html" info="1142964250859"/>
<bookmark title="The RSS Platform concept" href="http://blog.mark-mclaren.info/2006/03/rss-platform-concept.html" info="1142964250859"/>
<bookmark title="Keith Donald answers my, rather stupid, questions on Spring Web Flow" href="http://blog.mark-mclaren.info/2006/03/keith-donald-answers-my-rather-stupid.html" info="1142964250859"/>
<bookmark title="Yet another Google Suggest Clone - Take 2" href="http://blog.mark-mclaren.info/2006/02/yet-another-google-suggest-clone-take-2.html" info="1142964250859"/>
<bookmark title="How did I miss Type-Ahead behaviour?" href="http://blog.mark-mclaren.info/2006/02/how-did-i-miss-type-ahead-behaviour.html" info="1142964250859"/></folder>
</folder>
</folder>
</xbel>
One source of complexity in my original application was that I used Castor to support the XBEL format (I had this code lying around as I was using this elsewhere). I decided that I didn't actually need Castor or an object representation of XBEL in this case. I now use DOM/SAX directly to traverse and extract data from XBEL files and I make use of XMLWriter to create populated XBEL files.
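As a rough illustration of reading XBEL directly with DOM, here is a minimal sketch that pulls the feed URLs out of an unpopulated collection file like the first example above. The class and method names (XbelFeedReader, feedUrls) are made up for this post and it only uses the JDK's built-in parser, so treat it as a sketch rather than the code from my application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the feed URLs held in an XBEL collection. */
final class XbelFeedReader {

    /** Returns the href of every bookmark element, in document order. */
    static List<String> feedUrls(String xbel) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xbel)));

            // Bookmarks can sit at any folder depth, so search the whole tree.
            NodeList bookmarks = doc.getElementsByTagName("bookmark");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < bookmarks.getLength(); i++) {
                Element bookmark = (Element) bookmarks.item(i);
                urls.add(bookmark.getAttribute("href"));
            }
            return urls;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read XBEL", e);
        }
    }
}
```

Calling XbelFeedReader.feedUrls(...) on the small XBEL example gives a one-element list containing http://blog.mark-mclaren.info. Run against a populated file it would also return the item links, since those are bookmark elements too.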
EHCache and ROME Fetcher powered feed fetch mechanism
I talked about this in my last entry. The feed fetching engine is ROME and ROME Fetcher. I have created an instance of ROME Fetcher that makes use of EHCache. In reviewing the code, I noticed that although technically I had three caches, I only needed two EHCaches (since SyndFeedInfo includes the SyndFeed object). I also noticed that I had been caching the entire SyndFeed instead of just the URL string. Fixing these issues gave me some performance improvements right there! Moving to a Spring Framework powered EHCache implementation has also nicely reduced the complexity of the code involved.
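To make the caching idea a bit more concrete, here is a minimal sketch of one reading of it: a single cache entry per feed, keyed by the URL string, holding both the feed and the validator needed for a conditional GET. Everything in it is a stand-in I have invented for illustration. FeedInfo plays the role of SyndFeedInfo, a ConcurrentHashMap plays the role of an EHCache cache, and the Http interface hides the real network call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical fetcher that reuses a cached feed when the server reports no change. */
final class CachingFeedFetcher {

    /** Stand-in for SyndFeedInfo: the feed together with its validator. */
    static final class FeedInfo {
        final String feed;  // stands in for the parsed SyndFeed
        final String eTag;  // validator the server sent with the feed

        FeedInfo(String feed, String eTag) {
            this.feed = feed;
            this.eTag = eTag;
        }
    }

    /** Hypothetical HTTP layer: returns null for a 304 Not Modified answer. */
    interface Http {
        FeedInfo get(String url, String knownETag);
    }

    // Assumption: a plain map replaces the EHCache cache used in the application.
    private final Map<String, FeedInfo> cache = new ConcurrentHashMap<>();
    private final Http http;

    CachingFeedFetcher(Http http) {
        this.http = http;
    }

    /** Returns the current feed, downloading it only when it has changed. */
    String retrieveFeed(String url) {
        FeedInfo cached = cache.get(url);
        FeedInfo fresh = http.get(url, cached == null ? null : cached.eTag);
        if (fresh == null) {
            if (cached == null) {
                throw new IllegalStateException("304 received with nothing cached: " + url);
            }
            return cached.feed; // unchanged on the server, so reuse the cached copy
        }
        cache.put(url, fresh); // one entry carries both the feed and its validator
        return fresh.feed;
    }
}
```

Because the entry already contains the feed, there is no need for a separate cache of feeds alongside the cache of feed metadata.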
Quartz based scheduling
I wrote a ServletContextListener that begins polling the feeds in the background when the application starts up. I did look into replacing this with a Spring powered implementation, but on this occasion there weren't any great advantages to be had in doing so. It would mean replacing two classes with two alternative classes, and it would add an unnecessary dependency and an external configuration file to maintain.
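The start-up polling pattern looks roughly like the sketch below. To keep it self-contained I have swapped Quartz and the servlet API for the JDK's ScheduledExecutorService, so this is not the code from the application. The start() and stop() methods correspond to what a listener would do in contextInitialized() and contextDestroyed(), and the class name BackgroundFeedPoller is hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller: start() on application start-up, stop() on shutdown. */
final class BackgroundFeedPoller {
    private final Runnable pollAllFeeds;
    private final long periodMillis;
    private ScheduledExecutorService scheduler;

    BackgroundFeedPoller(Runnable pollAllFeeds, long periodMillis) {
        this.pollAllFeeds = pollAllFeeds;
        this.periodMillis = periodMillis;
    }

    /** Begins polling immediately, then again after each period. */
    synchronized void start() {
        if (scheduler != null) {
            return; // already running
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "feed-poller");
            t.setDaemon(true); // never keep the JVM alive just for polling
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollAllFeeds.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                System.err.println("Feed poll failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops polling; safe to call more than once. */
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }
}
```

With Quartz the scheduling call would be a job plus a trigger instead, but the lifecycle is the same: schedule on start-up, shut the scheduler down when the web application stops.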
EHCache and EHCache Constructs based dynamic page caching
My feed aggregator view is cached via a page caching filter. This means the page itself supports the conditional GET mechanism. Once the page has been accessed it is cached in the browser for a period of time, which reduces the load on the backend processing.
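For anyone unfamiliar with conditional GET, the decision a server makes boils down to something like the sketch below. It is a simplified illustration of the HTTP rules, not the page caching filter's actual code. The class name is hypothetical, a value of -1 means the If-Modified-Since header was absent, and ETags are compared as plain strings (no weak validators or lists).

```java
/** Hypothetical helper showing the core of a conditional GET decision. */
final class ConditionalGet {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    /**
     * Picks the response status for a cached page.
     *
     * @param ifNoneMatch      request's If-None-Match value, or null if absent
     * @param ifModifiedSince  request's If-Modified-Since in millis, or -1 if absent
     * @param pageETag         ETag of the cached page
     * @param pageLastModified time the cached page was generated, in millis
     */
    static int statusFor(String ifNoneMatch, long ifModifiedSince,
                         String pageETag, long pageLastModified) {
        // If-None-Match takes precedence over If-Modified-Since when both are sent.
        if (ifNoneMatch != null) {
            return ifNoneMatch.equals(pageETag) ? NOT_MODIFIED : OK;
        }
        // HTTP dates only have one-second resolution, so compare whole seconds.
        if (ifModifiedSince >= 0 && pageLastModified / 1000 <= ifModifiedSince / 1000) {
            return NOT_MODIFIED;
        }
        return OK;
    }
}
```

A 304 response carries no body, so the browser redisplays its own copy and the backend never has to render the page again.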
The final rendering and JavaScript enabled functionality
The final rendering is achieved via an XSL transformation of the populated XBEL (I use JSTL to do this but I could have easily used a servlet). HTML DOM processing is performed by JavaScript which, with the aid of cookies, highlights new feeds to the user. To support this I had to bend the XHTML standard slightly by adding a custom attribute.
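The transformation step could be done from plain Java along the lines of the sketch below, which uses the JDK's javax.xml.transform API. The stylesheet here is a tiny one invented for illustration (it is not my real stylesheet and it ignores the info attribute), and the class name PopulatedXbelRenderer is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical renderer: turns populated XBEL into an HTML fragment. */
final class PopulatedXbelRenderer {

    // Illustrative stylesheet: category heading, feed link, then a list of items.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/xbel'>"
        +   "<div><xsl:apply-templates select='folder'/></div>"
        + "</xsl:template>"
        + "<xsl:template match='/xbel/folder'>"
        +   "<h2><xsl:value-of select='title'/></h2>"
        +   "<xsl:apply-templates select='folder' mode='feed'/>"
        + "</xsl:template>"
        + "<xsl:template match='folder' mode='feed'>"
        +   "<h3><a href='{bookmark/@href}'><xsl:value-of select='bookmark/@title'/></a></h3>"
        +   "<ul><xsl:for-each select='folder/bookmark'>"
        +     "<li><a href='{@href}'><xsl:value-of select='@title'/></a></li>"
        +   "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given populated XBEL document. */
    static String toHtml(String populatedXbel) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(populatedXbel)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform XBEL", e);
        }
    }
}
```

Passing the populated XBEL example to PopulatedXbelRenderer.toHtml(...) produces a heading for the "Blogs" folder, a link to the feed and a list of links to its items.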
I'm happy for people to download and use my code. I'd be very interested to hear what you are using it for and any modifications you make to it.
I'm sure I could improve it still further by removing more hard-coded variable references, but it does the job.
I now plan to take it in a JSON powered portlet direction...