One of the planned features for The Open Source Zone is an RSS aggregator that could be used to fetch and aggregate news channels from project websites, blogs, Freshmeat announcements and the like.
Obviously, I want to reuse the best Open Source foundations available for this task. Looking around for Java-based solutions, I found Rome and the Jakarta FeedParser.
The latter seems somewhat more mature, and it already includes “an advanced networking layer which meets the requirements necessary for providing XML aggregations services over HTTP. This includes support for If-None-Match (ETags), If-Modified-Since (HTTP 304 Not Modified), gzip content encoding (compression), User Agent modification, non-infinite timeouts, event callbacks for download progress, support for setting HTTP Referrer headers, maximum content downloads (no files larger than N bytes), ability to use custom HTTP methods (HEAD, GET, PUT, POST) etc.”
It also supports autodiscovery and apparently it is being used by Rojo, so it’s not vaporware.
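To make the quoted feature list a bit more concrete, here is a minimal sketch of a conditional, gzip-aware feed fetch using nothing but the JDK’s `HttpURLConnection`. It is not FeedParser’s or Rome’s API. The class name `ConditionalFeedFetcher`, the `ExampleAggregator/0.1` user agent and the 10-second timeouts are assumptions made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of FeedParser or Rome.
class ConditionalFeedFetcher {

    /** Builds the request headers for a conditional, compressed GET. */
    static Map<String, String> requestHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "ExampleAggregator/0.1"); // assumed agent name
        headers.put("Accept-Encoding", "gzip");
        // Validators remembered from the previous fetch, if there was one.
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }

    /** Returns the feed body, or null if the server answered 304 Not Modified. */
    static byte[] fetch(String feedUrl, String etag, String lastModified, int maxBytes)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        conn.setConnectTimeout(10_000); // non-infinite timeouts (assumed values)
        conn.setReadTimeout(10_000);
        for (Map.Entry<String, String> h : requestHeaders(etag, lastModified).entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());
        }
        try {
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // feed unchanged since the last fetch
            }
            InputStream in = conn.getInputStream();
            if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
                in = new GZIPInputStream(in); // transparently decompress
            }
            try (InputStream body = in) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = body.read(buf)) != -1) {
                    // Enforce the "no files larger than N bytes" limit.
                    if (out.size() + n > maxBytes) {
                        throw new IOException("feed larger than " + maxBytes + " bytes");
                    }
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

An aggregator would store the `ETag` and `Last-Modified` response headers alongside each feed and pass them back into `fetch` on the next poll, so that unchanged feeds cost only a 304 response.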
On the other hand, a suitable networking layer is available for Rome as a subproject. Moreover, there is at least one implementation of a persistence mechanism for feeds (Aqueduct-Prevayler) while there doesn’t seem to be one for FeedParser.
Everything considered, I’d be inclined to start experimenting with FeedParser, unless you, my dear readers, have some suggestions to make. In which case, please leave a comment.
Update: my first brush with FeedParser didn’t exactly inspire confidence. There is no downloadable distribution, so you have to check the source out with Subversion, and the SVN URL on the website is wrong (hint: the correct one seems to be http://svn.apache.org/repos/asf/jakarta/commons/proper/feedparser/trunk/). No build instructions are provided either. It looks like it uses Maven :(. Luckily, a plain Ant build file is also included, and I managed to build a JAR file.

