RSS Aggregation Models

By Deane Barker on September 23, 2004

It struck me last night that there are two models of RSS aggregation: “real-time” and “stored” (yes, I just made those two terms up...).

Real-time aggregators are tools like Mozilla’s Sage extension. This model fetches the feed in real time and displays it on demand. In a lot of ways, they’re not even aggregators. They’re just different ways of looking at the content on your site – just like rearranging and simplifying the HTML version.

On the other hand, we have the stored aggregators. When this model refreshes the feed, it creates autonomous “objects” for each post. These objects persist beyond the connection to the RSS feed. Additionally, you can manipulate the objects. For instance, with aggregators built into mail clients (like NewsGator and the awesome new RSS functionality in Thunderbird), it’s very easy to just forward a post like an email.
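To make the distinction concrete, here is a minimal Python sketch of the two models. Everything in it is hypothetical: `live_feed` is a plain dictionary standing in for a real RSS feed, and `StoredPost`, `realtime_view`, and `StoredAggregator` are illustrative names, not code from any actual aggregator. It also assumes the simplest storage behavior, where a post is copied once and never touched again.

```python
from dataclasses import dataclass


@dataclass
class StoredPost:
    """An autonomous copy of a post, independent of the feed it came from."""
    guid: str
    title: str
    body: str


# Stand-in for the publisher's live feed: guid -> (title, body).
live_feed = {"post-1": ("My new TV obsession", "First draft.")}


def realtime_view(feed):
    """Real-time model: render straight from the live feed and keep nothing."""
    return [f"{title}: {body}" for title, body in feed.values()]


class StoredAggregator:
    """Stored model: copy each post into local objects on refresh."""

    def __init__(self):
        self.posts = {}

    def refresh(self, feed):
        for guid, (title, body) in feed.items():
            # Assumption: only the first copy of a post is kept.
            self.posts.setdefault(guid, StoredPost(guid, title, body))


stored = StoredAggregator()
stored.refresh(live_feed)

# The author edits the post after it has already been picked up.
live_feed["post-1"] = ("My new TV obsession", "Third revision.")
stored.refresh(live_feed)

print(realtime_view(live_feed))      # rendered from the current feed
print(stored.posts["post-1"].body)   # whatever was copied on the first refresh
```

The stored copy can be forwarded, filed, or searched without the feed being reachable, which is the whole appeal. The cost is that it no longer tracks the source.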

(I thought about calling them “caching” instead of “stored,” but the term “cache” implies that it’s holding something for a set period of time for efficiency reasons. I don’t think that applies here.)

Now, this differentiation isn’t earth-shaking, except that the latter type of aggregator is essentially the same thing as someone visiting your site and hitting “Save page as...” and storing copies of your site on their machine.

This effectively circumvents the implicit advantage of a Web site in that it’s supposed to always contain the most up-to-date information. Users with an aggregator that stores posts can very easily be looking at an old copy of some content from your site.

For instance, I was opining about my newest TV obsession over at my fledgling personal blog. I’m using WordPress over there (love it, incidentally), and it’s much easier (much, much easier) than Movable Type to keep working on something in draft. However, eventually I published the post and then kept right on editing – I made quite a few changes after it hit the site (and the feed) for the first time.

This morning, I get to work, and Thunderbird has a copy of the post that’s about three versions old. I look in Sage, and, of course, it’s the latest version.

It all comes down to how your aggregator handles modified posts. I know that NewsGator can be configured to download them new every time they change. But this ends up being kind of goofy because if someone like me keeps editing, you get a new post for every edit that the aggregator detects.

Other aggregators will just highlight an edited post as “unread” (the convention is to make it bold). Others (like Thunderbird, for instance) will apparently not do anything – the first version is the one they keep.
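The three behaviors described above can be sketched as policies on a single store. This is an illustration only, not how NewsGator, Thunderbird, or any other client is actually implemented. The names `EditPolicy`, `PostStore`, and `replay` are made up for the example, and detecting an edit by hashing the post body is an assumption.

```python
import hashlib
from enum import Enum


class EditPolicy(Enum):
    NEW_COPY = "store every detected edit as a new post"
    MARK_UNREAD = "overwrite the stored post and flag it unread"
    KEEP_FIRST = "ignore edits and keep the first version"


def fingerprint(body):
    """Hash the body so an edit can be detected without comparing full text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class PostStore:
    def __init__(self, policy):
        self.policy = policy
        self.entries = []  # each entry: {"guid", "body", "digest", "unread"}

    def ingest(self, guid, body):
        digest = fingerprint(body)
        versions = [e for e in self.entries if e["guid"] == guid]
        new_entry = {"guid": guid, "body": body, "digest": digest, "unread": True}
        if not versions:
            self.entries.append(new_entry)  # first sighting of this post
            return
        latest = versions[-1]
        if latest["digest"] == digest:
            return  # nothing changed since the last refresh
        if self.policy is EditPolicy.NEW_COPY:
            self.entries.append(new_entry)
        elif self.policy is EditPolicy.MARK_UNREAD:
            latest.update(body=body, digest=digest, unread=True)
        # KEEP_FIRST: deliberately do nothing.


def replay(policy, revisions, guid="post-1"):
    """Feed successive revisions of one post through a store, one per refresh."""
    store = PostStore(policy)
    for body in revisions:
        store.ingest(guid, body)
    return store


revisions = ["First draft.", "Second pass.", "Third revision."]
for policy in EditPolicy:
    store = replay(policy, revisions)
    print(policy.name, [entry["body"] for entry in store.entries])
```

With a compulsive editor on the publishing side, `NEW_COPY` piles up one entry per detected edit, `MARK_UNREAD` keeps a single current entry, and `KEEP_FIRST` leaves the reader with the oldest text.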

Like I said, this isn’t anything earth-shaking, but it’s worth keeping in mind. We tend to think of the Web as being a “they see what I have up there right now” affair. With RSS, this may or may not be true.

Comments (2)

Joe says:

SharpReader has a feature that specifically marks edited posts differently from unread posts (bold italics vs bold).

Unfortunately, there’s no way to tell if edited means “There’s been an update added to this article that provides new information”, or if edited means “Turns out you don’t spell ‘potato’ with 5 o’s”. I don’t think there’s a very good way around that. Maybe if there are new lines added? Readers that track that difference should also figure out how different the articles are, and adjust accordingly. Differences of less than a few percent could be ignored.

Deane says:

“Differences of less than a few percent could be ignored.”

I thought about that too, but in the English language, a change of just a few characters could make a HUGE difference in the meaning of a phrase.
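A rough sketch of the threshold idea shows the problem. Neither comment names a way to measure "how different," so this assumes Python's standard `difflib.SequenceMatcher` as the similarity measure and a 5% cutoff as "a few percent". The function names and sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def change_fraction(old, new):
    """Return roughly how much of the text changed, from 0.0 (same) to 1.0."""
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def is_minor_edit(old, new, threshold=0.05):
    """Treat an edit as ignorable when less than `threshold` of the text changed."""
    return change_fraction(old, new) < threshold


before = "The patch is now safe to deploy."
after = "The patch is not safe to deploy."

print(f"changed: {change_fraction(before, after):.1%}")
print("ignored as minor:", is_minor_edit(before, after))
```

One character separates the two sentences, so a percentage cutoff files the edit under "minor" even though the meaning has reversed.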

I know – let’s add versioning to the RSS spec! So you either edit the existing version (small change), or publish a new version (big change).

I’m sure it’s coming.