adelaide-blog-fu

Web

James has deployed Adelaide Blogs v3. I'm not sure whether the pinging stuff is working yet, but v3 seems to be a compromise between the last two big revisions; let's call them v1.5 and v2.

I liked v1.5 because the recent-entries stuff seemed to work properly. V2 had some big issues picking up new entries on my blog. I suspect it was just doing some basic screen-scraping, or just checking the Last-Modified HTTP header of the cover page, and his service provider's transparent proxy was getting in the way. Also, 12-hour updates seem unreasonable, especially during the day and evening. The scraping has semantic issues as well: it would pick up changes to people's pages if they changed their blog's chrome or altered its style without posting a new entry. Posting entries is what it's all about, right? ;)

Still, v1.5 had a bigger problem: anyone hitting the site would cause it to fetch every known RSS syndication feed to check for recent entries. This clearly sux0rz because of the excessive number of hits on people's feeds.

I think a better compromise can be found. First, ping feeds more frequently, say every hour or two. When fetching a feed, record the sequence of items in the feed and the response's Last-Modified and Expires headers (if present). If no Last-Modified response header was provided, use the current date as the feed's last-modified date.
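Here's a minimal Python sketch of that bookkeeping step, assuming the feed's items have already been parsed into a list of identifiers (say, each item's link or guid). `FeedState`, `record_fetch` and `parse_http_date` are hypothetical names for illustration, not anything Adelaide Blogs actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Sequence, Tuple


@dataclass
class FeedState:
    """What we remember about a feed between fetches (hypothetical structure)."""
    items: Tuple[str, ...]       # identifiers of the feed's RSS items, in feed order
    last_modified: datetime      # from Last-Modified, or the fetch time if absent
    expires: Optional[datetime]  # from Expires, if present and parseable


def parse_http_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP date header; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    # Dates without a usable zone come back naive; HTTP dates are GMT, so assume UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def record_fetch(headers: Mapping[str, str], items: Sequence[str],
                 now: Optional[datetime] = None) -> FeedState:
    """Remember the item sequence plus the Last-Modified and Expires headers."""
    if now is None:
        now = datetime.now(timezone.utc)
    # No Last-Modified header: fall back to the current date, as described above.
    last_modified = parse_http_date(headers.get("Last-Modified")) or now
    expires = parse_http_date(headers.get("Expires"))
    return FeedState(tuple(items), last_modified, expires)
```

The headers can be any mapping; a `urllib` response's `headers` object looks names up case-insensitively, whereas a plain dict needs the exact spelling. An unparseable Expires value such as `0` is simply treated as absent, so the feed gets fetched next time.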

Then, before re-fetching later on: if the last fetch produced an Expires response header and the current date is earlier than the date it specifies, don't fetch the feed again yet. Otherwise, fetch it with the If-Modified-Since request header set to the feed's last-modified date. You'll get a 304 (Not Modified) response if the server understood the header and the feed has not been modified, or a normal response otherwise. If a 304 is returned, the blog hasn't got any new content. If a normal response is returned, examine the item sequence in the feed; if it differs from the last sequence, the blog has been updated.
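A rough Python sketch of the re-fetch decision, assuming timezone-aware datetimes throughout and leaving RSS parsing to whatever parser you like. `should_fetch`, `conditional_headers`, `conditional_get` and `has_new_entries` are hypothetical helpers; the HTTP calls are plain standard-library `urllib`.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional, Sequence, Tuple


def should_fetch(expires: Optional[datetime], now: datetime) -> bool:
    """Skip the fetch while the Expires date from the last fetch is still in the future."""
    return expires is None or now >= expires


def conditional_headers(last_modified: datetime) -> Dict[str, str]:
    """Build the If-Modified-Since header; HTTP dates are always written in GMT.

    last_modified must be timezone-aware.
    """
    stamp = format_datetime(last_modified.astimezone(timezone.utc), usegmt=True)
    return {"If-Modified-Since": stamp}


def conditional_get(url: str, last_modified: datetime) -> Tuple[int, bytes]:
    """Fetch url conditionally; returns (status, body), with an empty body for a 304."""
    req = urllib.request.Request(url, headers=conditional_headers(last_modified))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for any non-2xx status, including 304 Not Modified.
        if err.code == 304:
            return 304, b""
        raise


def has_new_entries(status: int, old_items: Sequence[str],
                    new_items: Optional[Sequence[str]]) -> bool:
    """A 304 means nothing new; otherwise compare the old and new item sequences."""
    if status == 304:
        return False
    return tuple(new_items or ()) != tuple(old_items)
```

On a normal (200) response you'd parse the body into item identifiers, call `has_new_entries`, and then store the new Last-Modified, Expires and item sequence for next time.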

The Expires header lets a savvy user control how often their feed is pinged. Using the If-Modified-Since header should reduce the amount of traffic when pinging feeds. And the user gets accurate, timely recent posts from their favourite Adelaide blogs.

Unfortunately, it seems that my server isn't doing the right thing when sent an If-Modified-Since header. I'll have to look into that. >:|

Update: For blogs with no RSS feed, still do the same as above, but use an MD5 hash of the blog's content to determine if the blog has changed rather than examining the RSS item sequence. Easy!
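For feedless blogs, only the comparison step changes; the Expires and 304 handling stay the same. A minimal sketch, where `content_digest` and `page_changed` are hypothetical helpers and the body is whatever bytes the conditional GET returned:

```python
import hashlib
from typing import Optional


def content_digest(body: bytes) -> str:
    """MD5 of the raw page bytes, used only as a change detector, not for security."""
    return hashlib.md5(body).hexdigest()


def page_changed(old_digest: Optional[str], body: bytes) -> bool:
    """True if there's no previous digest or the page's digest differs from it."""
    return old_digest != content_digest(body)
```

Hashing the whole page will also flag chrome or style changes as new content, the same semantic issue raised above for screen-scraping, which is why it only makes sense as the fallback for blogs without a feed.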

Posted Monday, January 12, 2004 at 11:25.

TrackBacks

TrackBack URL for this post: http://volition.vee.net/mt/mt-idle-trackback.cgi/191
