Feeding a repository

There has been some discussion recently about mechanisms for remote or bulk deposit in repositories and similar services. David Flanders ran a very thought-provoking and lively show-and-tell meeting a couple of weeks ago looking at deposit. In part this is familiar territory: looking at and tweaking the work that the creators of the SWORD profile have done based on APP (the Atom Publishing Protocol), or looking again at WebDAV. But there is also a newly emerging approach of using RSS or Atom feeds to populate repositories, a sort of feed-deposit. Coincidentally, we also received a query at CETIS from a repository that is looking to collect outputs of the UKOER programme, asking for help in firming up the requirements for bulk or remote deposit and asking how RSS might fit into this.

So what is this feed-deposit idea? The first thing to be aware of is that, as far as I can make out, a lot of the people who talk about this don’t necessarily have the same idea of “repository” and “deposit” as I do. For example, the Nottingham Xpert rapid innovation project and the Ensemble feed aggregator are both populated by feeds (you can also disseminate material through iTunesU this way). But (I think) these are all links-only collections, so I would call them catalogues, not repositories, and I would say that they work by metadata harvest(*), not deposit. However, they do show that feeds can do much more than show the last ten items published, and people who think of RSS or Atom only in those terms should take note.

The other thing to take note of is podcasting. By that I don’t mean sticking audio files on a web server and letting people find them; I mean feeds that either carry or point to audio/video content so that applications and devices, such as phones and wireless-network-enabled media players, can automatically load that content. If you combine what Xpert and Ensemble are doing to get information about entire collections with the way that podcasts let you automatically download content, then you could populate a repository through feeds.
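To make the podcasting point a bit more concrete, here is a minimal Python sketch of the receiving end. It is my own illustration, not a description of how Xpert, Ensemble or any podcast client actually works. It reads an RSS 2.0 feed and picks out the items whose enclosure element points at actual content, which is what a repository (as opposed to a catalogue) would need to fetch. The feed and its example.org URLs are made up.

```python
import xml.etree.ElementTree as ET

# A hypothetical podcast-style RSS 2.0 feed: each <item> carries metadata and,
# optionally, an <enclosure> pointing at the content itself.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example OER collection</title>
    <item>
      <title>Lecture 1</title>
      <link>http://example.org/oer/lecture1</link>
      <enclosure url="http://example.org/oer/lecture1.mp3" length="1024" type="audio/mpeg"/>
    </item>
    <item>
      <title>Reading list</title>
      <link>http://example.org/oer/reading</link>
    </item>
  </channel>
</rss>"""


def enclosures(feed_xml):
    """Return (title, url, mime_type) for every item that carries an enclosure.

    Items without an enclosure are links-only entries: a catalogue could record
    them, but there is no content for a repository to download.
    """
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is not None:
            found.append((item.findtext("title"), enc.get("url"), enc.get("type")))
    return found
```

Each URL returned could then be fetched (for example with urllib.request.urlopen) and stored. The "Reading list" item is skipped because it is exactly the kind of links-only entry that gives a catalogue something to record but a repository nothing to download.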

The trouble is, though, that once you get down to details there are several problems and several different ways of overcoming them.

For example, how do you go beyond having a feed for just the last 10 resources? Putting everything into one feed doesn’t scale. If your content is broken down into manageable-sized collections (e.g. the OU’s OpenLearn courses, and I guess many other OER projects), you could put everything from each collection into a feed and then have an OPML file to say where all the different feeds are. That works up to a point, especially if the feeds will be fairly static, until your OPML file gets too large. Or you could have an API that allows the receiver of the feed to specify how they want to chunk up the data: OpenSearch should be useful here, and YouTube might be worth looking at as an example. Then there are similar choices to be made for how just about every piece of metadata, and the content itself, is expressed in the feed, starting with the choice of flavour(s) of RSS or Atom.
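As a rough sketch of the two chunking options, in Python and with purely hypothetical feed and search URLs: the first function reads an OPML file to find the per-collection feeds, and the second generates paged request URLs from an OpenSearch-style template using the {searchTerms}, {startIndex} and {count} parameters. A real template would come from the provider's OpenSearch description document, and a robust client would also cope with optional parameters such as {startIndex?}, which this sketch does not.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Option 1: one feed per collection, listed in an OPML file (placeholder URLs).
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example collection feeds</title></head>
  <body>
    <outline text="Course A" type="rss" xmlUrl="http://example.org/feeds/course-a.rss"/>
    <outline text="Course B" type="rss" xmlUrl="http://example.org/feeds/course-b.rss"/>
  </body>
</opml>"""


def feeds_from_opml(opml_xml):
    """Return the xmlUrl of every outline element, however deeply nested."""
    root = ET.fromstring(opml_xml)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


# Option 2: the receiver pages through results via an OpenSearch-style URL
# template. This particular template (host, query names, format) is hypothetical.
TEMPLATE = "http://example.org/search?q={searchTerms}&start={startIndex}&n={count}&format=atom"


def page_urls(template, terms, total, count):
    """Fill the template once per page of `count` items.

    OpenSearch result indexes start at 1 unless the provider says otherwise.
    """
    urls = []
    for start in range(1, total + 1, count):
        url = (template.replace("{searchTerms}", quote(terms))
                       .replace("{startIndex}", str(start))
                       .replace("{count}", str(count)))
        urls.append(url)
    return urls
```

With the template above, page_urls(TEMPLATE, "open education", 25, 10) gives three URLs, starting at items 1, 11 and 21. The point of the second option is that the receiver, rather than the publisher, decides how big each chunk is.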

But feed-deposit is a potential solution, and it’s not good to start with a solution and then articulate the problem. The problem that needs addressing (by the repository that made the query I mentioned above) is how best to deposit hundreds of items given (1) a local database that contains the necessary metadata and (2) enough programming expertise to read that metadata from the database and republish it or post it to an API. The answer does not involve someone sitting for a week copying and pasting into a web form that the repository provides as its only means of deposit.
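For the depositor with a local database, the "republish" route need not be a big job. Here is a minimal Python sketch, assuming a hypothetical SQLite table called resources with title, landing-page URL, content URL, MIME type and last-updated columns (the table, column and URL names are all my assumptions). It writes one Atom entry per row, with a rel="alternate" link to the landing page and a rel="enclosure" link to the content itself, so that a harvesting repository could fetch the resource rather than just record a link to it.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace (Clark notation)."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_feed(conn, feed_id, feed_title, author_name):
    """Publish every row of the hypothetical `resources` table as an Atom entry."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = feed_title
    ET.SubElement(feed, atom("updated")).text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = author_name

    rows = conn.execute(
        "SELECT id, title, page_url, content_url, mime_type, updated "
        "FROM resources ORDER BY id")
    for rid, title, page_url, content_url, mime_type, updated in rows:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = "%s/%d" % (feed_id, rid)
        ET.SubElement(entry, atom("title")).text = title
        ET.SubElement(entry, atom("updated")).text = updated
        # Landing page for human readers...
        ET.SubElement(entry, atom("link"), rel="alternate", href=page_url)
        # ...and, podcast-style, a pointer to the content itself for a harvester.
        ET.SubElement(entry, atom("link"), rel="enclosure",
                      href=content_url, type=mime_type)
    return ET.tostring(feed, encoding="unicode")
```

Served from a stable URL and regenerated whenever the database changes, a file like this would be enough for a listening repository to find the resources, though with hundreds of items it would soon run into the chunking questions raised above.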

There are several ways of dealing with that. So far a colleague who is in this position has had success depositing into Flickr, SlideShare and Scribd by repeated calls to their respective APIs for remote deposit, which you could call a depositor-push approach. An alternative is for her to put the resources somewhere and provide information telling repositories where they are, so that any repository that listens can come and harvest them. That would be more like a repository-pull approach, in which case feed-deposit might be the solution.
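On the repository-pull side, the listening repository needs to remember what it has already harvested, so that repeated visits to the same feed only pick up new items. A minimal Python sketch, assuming an Atom feed like the hypothetical one below and using each entry's atom:id as the key; storing the set of seen ids between visits, and actually downloading the content, are left out.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# A hypothetical feed published by the depositor (placeholder URLs).
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/oer/feed</id>
  <title>Example OER outputs</title>
  <updated>2010-03-02T00:00:00Z</updated>
  <author><name>Example project</name></author>
  <entry>
    <id>http://example.org/oer/feed/1</id>
    <title>Lecture 1</title>
    <updated>2010-03-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/oer/1"/>
    <link rel="enclosure" href="http://example.org/oer/1.pdf" type="application/pdf"/>
  </entry>
  <entry>
    <id>http://example.org/oer/feed/2</id>
    <title>Lecture 2</title>
    <updated>2010-03-02T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/oer/2"/>
    <link rel="enclosure" href="http://example.org/oer/2.pdf" type="application/pdf"/>
  </entry>
</feed>"""


def new_items(feed_xml, seen_ids):
    """Return (id, title, content_url) for entries not harvested before.

    Newly found ids are added to `seen_ids`, so the next visit skips them.
    """
    root = ET.fromstring(feed_xml)
    items = []
    for entry in root.findall("atom:entry", NS):
        entry_id = entry.findtext("atom:id", namespaces=NS)
        if entry_id in seen_ids:
            continue  # already pulled on an earlier visit
        content_url = None
        for link in entry.findall("atom:link", NS):
            if link.get("rel") == "enclosure":
                content_url = link.get("href")
        seen_ids.add(entry_id)
        items.append((entry_id, entry.findtext("atom:title", namespaces=NS), content_url))
    return items
```

The design choice here is that the depositor does nothing beyond keeping the feed up to date; all the bookkeeping about what has been collected sits with each repository that chooses to listen.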

[* Yes, I know about OAI-PMH; the comparison is interesting, but this is a long post already.]