Thursday, February 05, 2009

Serialized Feeds

A serialized feed is one in which posts are arranged in a linear order and where subscribers always begin with the first post, no matter when they subscribe to the feed. This contrasts with an ordinary RSS feed, in which a subscriber will begin with today's post, no matter when the feed started.

The idea of serialized feeds has been around for a while. This page from 2005, for example, allows you to read Cory Doctorow's novel Someone Comes to Town, Someone Leaves Town by RSS. And Russell Beattie offers serialized books via his Mobdex serialized feeds system. In 2006, a company called FeedCycle offered what it called cyclic feeds. "For example, if you were to take Moby Dick and divide it into 100 parts, and publish them all in one huge RSS feed, that would be a cyclic RSS feed." Feed cycles, as they have come to be called, have also been used for podcasts. Tony Hirst has written about serialized feeds, demonstrating the concept with services like OpenLearn Daily.

There is no academic literature discussing the use of serialized feeds to support online learning, though the subject of paced online learning has been discussed. Anderson, Annand and Wark (2005) examine the question of pacing from the perspective of student interactions. "Increased peer interaction can boost participation and completion rates, and result in learning outcome gains in distance education courses." But the use of serialized feeds does not automatically increase interactions. It is also arguable that pacing itself improves learning outcomes.


Serialized Feeds: Basic Approach

A serialized feed is basically a personalized feed, because each person begins at a different time. Personalized web data is typically managed by CGI or some other server process, which gathers relevant information about the user (such as the time he or she subscribed to the feed) and generates the resulting feed. This feed is then typically identified with a serial number, which is processed when the RSS feed is requested by an aggregator.

This approach, however, raises some concerns:
  • First, it creates a scalability issue. RSS feed readers typically access a web site once an hour. If a CGI process is run for each feed, then each user results in 24 CGI requests a day. Even if the frequency is scaled back, having large numbers of users can place a considerable load on server processing.
  • Second, it creates a coordination issue. If each feed is personalized then in order for interaction to occur there needs to be some mechanism created to identify users of relevantly similar feeds.
These problems were addressed by adopting a cohort system for serialized feeds. But first, some discussion on the structure of a serialized feed.
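To put rough numbers on the scalability concern, here is a minimal sketch in Python. The function names and the subscriber and cohort counts are assumptions chosen for illustration; only the hourly polling rate comes from the discussion above.

```python
def cgi_runs_per_day(subscribers, polls_per_day=24):
    """Per-user feeds: every poll by every subscriber runs a CGI process."""
    return subscribers * polls_per_day


def cohort_renders_per_day(active_cohorts):
    """Cohort feeds: one feed is generated per cohort per day,
    however many subscribers share that cohort."""
    return active_cohorts


# Hypothetical figures: 1,000 subscribers spread over 10 cohorts.
print(cgi_runs_per_day(1000))       # 24000
print(cohort_renders_per_day(10))   # 10
```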

In order to simplify coding, the gRSShopper framework was used. This allowed courses to be constructed out of two basic elements: the page and the post.

The page corresponds to a given course. It consists of typical page elements, such as page title, content and, where appropriate, a file location, along with default templates and project information. Page content defines the RSS header content. Pages are identified with a page ID number. The page also has a creation date, which establishes its start date, set by default to the exact time and date the page was created.

The post corresponds to an individual RSS feed item. While a person subscribes to an RSS feed as a whole (corresponding to a page), he or she receives individual posts as RSS items over time. A course thus consists basically of a page and a series of posts. Posts are identified by post ID numbers. Posts are associated with pages with a thread value corresponding to the ID number of the page.
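The page and post relationship can be sketched as two simple records. This is an illustration in Python only, not the gRSShopper schema: the class and field names are assumptions, apart from the thread value that links a post to its page.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    page_id: int        # page ID number
    title: str
    content: str        # supplies the RSS header content
    created: datetime   # creation date, used as the course start date


@dataclass
class Post:
    post_id: int        # post ID number
    thread: int         # ID number of the page this post belongs to
    title: str
    description: str


def posts_for_page(page, posts):
    """Return the posts whose thread value matches the page's ID."""
    return [post for post in posts if post.thread == page.page_id]
```

A course is then just one Page plus whatever posts_for_page returns for it.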


Serialized Feeds: Pacing

Pacing is managed through two basic elements.

First, each page defines a cohort number. This number establishes the size of the cohort, in days. Thus, if a page's cohort number is '7', then a new edition of the course will start every 7 days. In the gRSShopper serialized feeds system, a new, serialized, page is created for each cohort. This page is identified by (a) the ID number of the original master page, and (b) the offset from that page, in total number of days, from the start date of the master page. These serialized pages are stored as records in the database.

Second, each post is assigned an offset number. This number defines the number of days after the start of the course that the post is to appear in the RSS feed. For example, suppose the course starts March 10. Suppose the post has an offset number of 6. Then the post should appear in the RSS feed on March 16.
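The date arithmetic in this example can be checked with a short Python sketch. The function name is an assumption, and the year 2009 is assumed because the example gives only the month and day.

```python
from datetime import date, timedelta


def publish_date(course_start, post_offset_days):
    """Date on which a post appears: course start plus the post's offset."""
    return course_start + timedelta(days=post_offset_days)


# Course starts March 10; the post has an offset number of 6.
print(publish_date(date(2009, 3, 10), 6))  # 2009-03-16
```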

This creates everything we need to create a serialized feed. To begin, we have a master page and series of associated posts:
  • Page Master (time t days, cohort size c)
    • Post 1 (t+o days)
    • Post 2 (t+o days)
    • etc.
The page also has a set of serialized pages, created as needed, each corresponding to an individual cohort:
  • Page Master (time t days)
    • Serialized Page 1 (t+(c*1)days)
    • Serialized Page 2 (t+(c*2)days)
    • etc.
Each serialized page has a start date d, which is t+(c*n) days for the nth serialized page. By computing the interval i, in days, between that start date and the current date, we can determine which post should be published in its RSS feed: it will be the post or posts with an offset value of i.
  • Page Master (time t days)
    • Serialized Page 1 (t+(c*1)days)
      • Post i1
    • Serialized Page 2 (t+(c*2)days)
      • Post i2
    • etc.
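The selection rule above can be sketched in Python as follows. This is a sketch under stated assumptions, not the gRSShopper code: the function names are hypothetical, posts are represented as a plain mapping from post ID to offset, and n is the cohort's index as in the outline above.

```python
from datetime import date, timedelta


def cohort_start(master_start, cohort_size, n):
    """Start date d of the nth serialized page: t + (c * n) days."""
    return master_start + timedelta(days=cohort_size * n)


def posts_due(master_start, cohort_size, n, post_offsets, today):
    """Return the IDs of posts whose offset equals the interval i
    between the cohort's start date and today."""
    i = (today - cohort_start(master_start, cohort_size, n)).days
    return [post_id for post_id, offset in post_offsets.items() if offset == i]


# Hypothetical course: master page created March 10, cohort size 7 days.
offsets = {101: 0, 102: 6, 103: 6, 104: 7}
print(posts_due(date(2009, 3, 10), 7, 2, offsets, date(2009, 3, 30)))
# [102, 103]
```

Cohort 2 starts on March 24, so on March 30 the interval i is 6 and both posts with offset 6 are due.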

Serialized Feeds: Processes

Processing to produce the serialized feed occurs in three stages:
  • First, the author creates or edits a master page. This creates database records for the master page and for each of the posts associated with the master page.
  • Second, the script creates a series of pages for a given cohort. This occurs when a potential subscriber invokes the subscribe script. Essentially, the script creates the RSS feed content for each day the course runs. These are stored in the database and identified with a cohort number and a publish date.
  • Third, a nightly cron job prints the daily page for each cohort for each course. The idea here is that the script creates a static page that may be accessed any number of times without creating a CGI process. Static pages are stored in a standardized location: base directory/course ID number/cohort offset number
Subscribing to a serialized feed thus becomes nothing more than a matter of pointing a browser to the appropriate page. For example, pointing the browser to http://course.downes.ca/course/127/17.xml allows the learner to subscribe to course number 127 (the Fallacies course), cohort number 17 (the cohort that started 17 days after the course was first created). This link is created and displayed by the subscribe script.
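The nightly step of writing a static file per cohort might look something like the following Python sketch. The function names are hypothetical, and the feed contents and the use of Python's standard library are assumptions for illustration. Only the path layout (base directory, course ID number, cohort offset number) follows the description above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def render_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document from (title, body) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_body in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "description").text = item_body
    return ET.tostring(rss, encoding="unicode")


def feed_path(base_dir, course_id, cohort_offset):
    """Static location: base directory / course ID / cohort offset .xml"""
    return Path(base_dir) / str(course_id) / f"{cohort_offset}.xml"


def write_feed(base_dir, course_id, cohort_offset, rss_xml):
    """Write the day's feed so it can be served without a CGI process."""
    path = feed_path(base_dir, course_id, cohort_offset)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(rss_xml, encoding="utf-8")
    return path
```

Because the file is regenerated once a night, each aggregator request is an ordinary static file fetch.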

This process has several advantages. First, it fixes the content of the course to what is currently defined when the student signs up to the course; the course may be edited for subsequent users without changing what was originally defined for previous users. Second, processing time is minimized and front-loaded, allowing the system to scale massively. Third, and most significantly, multiple users are served by the same RSS file. Not only does this save significantly on processing, it also sets up an environment where interaction may be facilitated.

4 comments:

  1. Great stuff, Stephen:-) I'm pushed for the time at the mo, so will comment on this in separate chunks...

    First up, though, here's me trying to get my own thinking straight wrt the history of serialised feeds on OUseful.info - I think I first started thinking around the potential of serialised/daily learning feeds in Paced Content Delivery via RSS, some time before the openlearndaily demo, which used third party tools (feedcycle) to serialise the delivery of OpenLearn content.

    There is no good reason why OpenLearn could not have developed a simple Moodle plugin to start delivering OpenLearn content in this way as a matter of course at that time, apart from the fact that no-one could see the point.... (sigh)

    The feedcycle demo was okay as a demo, but it was completely unsustainable (feeds had to be created by cutting and pasting content by hand into forms to define each feed item), so I followed it with a demonstration of Daily Learning feeds for several dozen OpenLearn courses on openlearnigg, using dynamic URLs that included a subscription timestamp and feed serialisation via a Yahoo pipe. For example, Serial Web Feeds via Yahoo Pipes is my first reference to the pipe; OpenLearn_Daily feeds, via OpenLearnigg describes its integration in OpenLearnigg (and suggests a first attempt at an icon); and Static RSS Feed Content, Delivered Daily provides screenshots and an actual description of the pipe.

    More recently, we commissioned a Wordpress plugin to deliver daily feeds from Wordpress blogs, as mentioned in OpenLearn WordPress Plugins; this plugin has been used in a proof of concept to deliver OpenLearn unit content imported into and republished within a WPMU farm: Serialised OpenLearn Daily RSS Feeds via WordPress.

    There are a great many ways of importing content into Wordpress, so this plugin (I would argue;-) made daily/serialised feeds generally possible...

    I think it might be handy to pull a round-up of daily feed interfaces together: screenshots I have of interfaces around the web are included in these posts: Daily Me, Daily Book Feeds, Serialised OpenLearn Daily RSS Feeds via WordPress (covers podiobooks, daily lit); here are some bookmarked resources I've tagged "daily".

    Hoping this little history doesn't sound too "look what I did"...;-) I'm really looking forward to spending more time reading through your solution and hopefully adding to the discussion around it.

  2. > Hoping this little history doesn't sound too "look what I did"...;-)

    Not at all, this is fantastic. I want to be sure to get the history of this right. I really appreciate the references.

    I haven't found any literature on serialized feeds in learning, and even material on pacing is hard to find - anything you run across would be useful.

  3. Hi Stephen - My general reaction to serialized feeds is positive. It seems like a good idea. But your post left a lot of unanswered questions. For example, how personalized is it to assume that everyone in a cohort will benefit from the same feed schedule? I suppose for someone who would like a slower schedule, the feeds just build up in his or her reader. But what about someone who wants a faster schedule; ie, someone who reads the first post and then wants the second post immediately? Another question: how does the author determine the order of serialized feeds? Is it solely on the basis of date-time of the original post? Or does the order bounce around in time based on some interior logic to the topic? I suppose either might be useful, but I'm not certain how personalized they are. Anyway, these are just some initial impressions. I'm just trying to better understand how it all would work.
    Gary

  4. The order of the posts is determined by the offset - offset 1 is sent first, offset 2 is sent second, and so on.

    Right now I am concentrating on creating a serialized feed and cohort to travel with it. I will address issues of variable speeds once the basic system is working.
