Feeding a repository

Phil Barker — Wed, 28 Oct 2009 12:18:58 +0000

There has been some discussion recently about mechanisms for remote or bulk deposit in repositories and similar services. David Flanders ran a very thought-provoking and lively show-and-tell meeting a couple of weeks ago looking at deposit. In part this is familiar territory: looking at and tweaking the work that the creators of the SWORD profile have done based on APP, or looking again at WebDAV. But there is also a newly emerging approach of using RSS or Atom feeds to populate repositories, a sort of feed-deposit. Coincidentally, we also received a query at CETIS from a repository which is looking to collect outputs of the UKOER programme, asking for help in firming up the requirements for bulk or remote deposit and asking how RSS might fit into this.

So what is this feed-deposit idea? The first thing to be aware of is that, as far as I can make out, a lot of the people who talk about this don’t necessarily have the same idea of “repository” and “deposit” as I do. For example, the Nottingham Xpert rapid innovation project and the Ensemble feed aggregator are both populated by feeds (you can also disseminate material through iTunesU this way). But (I think) these are all links-only collections, so I would call them catalogues, not repositories, and I would say that they work by metadata harvest(*), not deposit. However, they do show that you can do something with feeds that the people who think RSS or Atom is only about things like showing the last ten items published should take note of. The other thing to take note of is podcasting, by which I don’t mean sticking audio files on a web server and letting people find them; I mean feeds that either carry or point to audio/video content so that applications and devices like phones and wireless-network-enabled media players can automatically load that content. If you combine what Xpert and Ensemble are doing by way of getting information about entire collections with the way that podcasts let you automatically download content, then you could populate a repository through feeds.

The trouble is, though, that once you get down to details there are several problems and several different ways of overcoming them.

For example, how do you go beyond having a feed for just the last 10 resources? Putting everything into one feed doesn’t scale. If your content is broken down into manageably sized collections (e.g. the OU’s OpenLearn courses and, I guess, many other OER projects) you could put everything from each collection into a feed and then have an OPML file to say where all the different feeds are (which works up to a point, especially if the feeds will be fairly static, until your OPML file gets too large). Or you could have an API that allowed the receiver of the feed to specify how they wanted to chunk up the data: OpenSearch should be useful here, and it might be worth looking at YouTube as an example. Then there are similar choices to be made for how just about every piece of metadata, and the content itself, is expressed in the feed, starting with the choice of flavour(s) of RSS or Atom feed.
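To make the OPML option a bit more concrete, here is a minimal sketch of how a receiving service might read an OPML file to find one feed per collection. The OPML content, the collection names and the example.org URLs are all made up for illustration, and the function name feed_urls is just a label I have chosen; a real harvester would then fetch and parse each feed in turn.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML document: one outline per collection feed.
# The titles and example.org URLs are placeholders, not real feeds.
OPML = """<opml version="2.0">
  <head><title>Example OER collections</title></head>
  <body>
    <outline text="Collection A" type="rss"
             xmlUrl="http://example.org/feeds/collection-a.xml"/>
    <outline text="Collection B" type="rss"
             xmlUrl="http://example.org/feeds/collection-b.xml"/>
    <outline text="A note, not a feed"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return (title, feed URL) pairs for each outline that points at a feed."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # skip outlines that carry no feed location
            feeds.append((outline.get("text"), url))
    return feeds


if __name__ == "__main__":
    for title, url in feed_urls(OPML):
        print(title, url)
```

The point of the design is that each collection feed can stay small and fairly static, while the OPML file is the only thing the receiver needs to be told about.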

But feed-deposit is a potential solution, and it’s not good to try to start with a solution and then articulate the problem. The problem that needs addressing (by the repository that made the query I mentioned above) is how best to deposit hundreds of items given (1) a local database which contains the necessary metadata and (2) enough programming expertise to read that metadata from the database and republish it or post it to an API. The answer does not involve someone sat for a week copy-and-pasting into a web form that the repository provides as its only means of deposit.

There are several ways of dealing with that. So far a colleague who is in this position has had success depositing into Flickr, SlideShare and Scribd by repeated calls to their respective APIs for remote deposit, which you could call a depositor-push approach. An alternative is that she puts the resources somewhere and provides information to tell repositories where they are, so that any repository that listens can come and harvest them. That would be more like a repository-pull approach, in which case feed-deposit might be the solution.
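As a rough illustration of the repository-pull side, the sketch below reads an RSS 2.0 feed and lists what a repository would need to fetch, using the enclosure element in the way podcast clients do. The feed content and example.org URLs are invented, and harvest_list and the dictionary keys are names I have made up; nothing here is taken from any particular repository's software.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed published by the depositor.
# The first item carries a file via <enclosure>; the second is links-only.
RSS = """<rss version="2.0">
  <channel>
    <title>Example resources</title>
    <item>
      <title>Lecture 1 slides</title>
      <link>http://example.org/resources/1</link>
      <enclosure url="http://example.org/files/lecture1.pdf"
                 length="102400" type="application/pdf"/>
    </item>
    <item>
      <title>Links-only record</title>
      <link>http://example.org/resources/2</link>
    </item>
  </channel>
</rss>"""


def harvest_list(rss_text):
    """Return one record per item, with the enclosure URL if there is one."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        has_file = enclosure is not None
        records.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # file_url is None for links-only items (catalogue-style records)
            "file_url": enclosure.get("url") if has_file else None,
            "mime_type": enclosure.get("type") if has_file else None,
        })
    return records


if __name__ == "__main__":
    for record in harvest_list(RSS):
        print(record)
```

Items with a file_url are the ones a repository could actually download and deposit; items without one only give it enough to build a catalogue entry.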

[* Yes, I know about OAI-PMH, the comparison is interesting, but this is a long post already.]

A short update on Repository specs

Phil Barker — Mon, 10 Sep 2007 09:46:21 +0000

The end of summer, with people coming back from their holidays, seems a reasonable time for a quick update on where various specification and standardization activities have got to. I’ll deal with specifications for interacting with repositories in this post; there’ll be another post soon covering metadata specifications and standards.

IMS LODE

A project group of IMS members led by Martin Morrey of Intrallect and David Massart of European Schoolnet is currently well advanced in writing a charter for work on Learning Object Discovery and Exchange (LODE, though originally the work went under the name of Federated Architectures). Charter writing is the first step of the IMS specification process, which essentially involves setting out the scope of a proposed specification activity. The draft charter is a private IMS document until it is approved, but there is a little information about LODE on the IMS website. My understanding is that the intention is for the LODE specification work to focus on profiling existing specs and standards for interoperability between repositories and eLearning systems. We hope that the specification will result in an agreed protocol for searching a remote repository for resources with specific educational attributes by using, say, SRU and LOM, and to that end JISC CETIS will be providing some input into the development of this spec.

OASIS Search Web Services

An OASIS technical committee has been formed to define web services for search and retrieval based on standards such as SRU and Simple Query Interface (SQI) (and possibly others): their charter is on the OASIS website. Profiles are envisaged for applications such as bibliographic and geospatial metadata, or e-government, but not for education per se.

OAI-ORE

The Open Archives Initiative’s Object Reuse and Exchange project is working towards developing specifications that will allow repositories to exchange information about objects within them, particularly “compound information objects” or aggregations of several discrete resources. One product of the work so far is a discussion document on how such objects could be represented as named graphs in order to allow the repository to expose the constituent parts to the wider world.

Deposit API / Sword

Two years ago at the CETIS November conference, Andy Powell highlighted deposit of an object in a remote repository as an area of interoperability for which there was no agreed mechanism (though there were plenty of candidates for such a mechanism). Since then UKOLN have encouraged a group of interested parties to discuss approaches, culminating in a JISC-funded project, Simple Web-service Offering Repository Deposit (SWORD). The protocol developed through these activities is a profile of the Atom Publishing Protocol, and was described by Julie Allinson, the SWORD project manager, at the June SIG meeting.
