considering OAI-PMH | John Robertson

OAI-PMH is an odd thing:

a protocol almost universally implemented in repositories and consequently (usually) publishing metadata about repository contents to the world
a protocol frequently reviled by anyone trying to aggregate feeds from different repositories and build discovery tools and services on top of that aggregate.

I’m not going to repeat OAI-PMH’s problems in detail (PERX, the experience of the NSDL with metadata quality, Andy Powell, Jim Downing and many others have done that); suffice to say there are issues about how the protocol was implemented by software, how it is used by metadata creators, and how web-unfriendly and niche it remains.

However, I realised recently that I’d begun to think that it must be better by now – surely the teething problems are done with – implementations more mature, record quality better, and aggregation more stable. This is, in part, because it remains the standard for sharing repository metadata and because in a number of settings it works well. There are plenty of communities using it to establish and provide services, either by creating ‘closed’ controlled conditions (communally enforced ways of recording information, application profiles, guidelines and ‘political’ agreements in addition to the protocol) or by creating tools that simply hack their way around whatever data they get and offer good enough services.

So it was with interest that I picked up on a discussion on Twitter about what’s wrong with OAI-PMH and an upcoming paper on using Atom. [edit: I’m updating this list with fragments of conversation about OAI-PMH if I see them, and the odd link or two.] It’s not the first time I’ve caught fragments of conversations on the use of feeds – for example the earlier RSS and repositories discussion.

There are a slew of issues around trying to standardise feed types (as we discovered in the discussions organised around RSS as a possible metadata deposit mechanism). See for example Feed Deposit, OER, RSS, and JorumOpen, and the two later review articles (1, 2), as well as the email list discussions. Given the increasing use of RSS or Atom in some of the OER discovery tools, the work listed above, and the wider promotion of it in the UKOER pilot projects, why am I so interested in this discussion about OAI-PMH and about another effort to use Atom/RSS?

I’m happy to see this debate crop up again in the wider (library) repository community for two reasons:

1) perhaps obviously it reaffirms the issues with OAI-PMH, that they haven’t changed, and the possibilities RSS/Atom offers,

2) more importantly it’s bringing the discussion about the feeds produced by repositories into the library/scholarly works communities. Like it or not, those are the communities who are most using repositories and the communities who can to some degree shape the development of repository software and specifications. Few repository platforms natively support much customisation of the feeds they produce, and until the wider repository community wants that type of functionality or control and begins to think about how to update or move beyond OAI-PMH* there’s little reason for repository developers to work on the problem.

Without those changes anyone wanting to manage learning materials in a repository still has to hack their own fixes, build their own repository – or not use repositories (but that’s another question).

*I should note: I like OAI-PMH – I can play with it in a browser, Repository Explorer is a good tool, and using OAI-PMH I can get and interact with someone else’s ‘raw’ metadata. I’m just no longer convinced it’s the right tool to share metadata, in part because of how few successful discovery services there are which use it.
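
To illustrate the "play with it in a browser" point: an OAI-PMH request is just a URL with a `verb` and a few query parameters. The sketch below builds two such URLs using only the Python standard library. The endpoint `https://repository.example.org/oai` and the helper name `oai_url` are made up for illustration; the verbs (`Identify`, `ListRecords`) and parameters (`metadataPrefix`, `from`) are the ones defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute a real repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL that can be pasted into a browser."""
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_';
            # the trailing underscore is stripped here.
            query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Ask the repository to describe itself.
identify_url = oai_url(BASE_URL, "Identify")

# Ask for unqualified Dublin Core records changed since a given date.
records_url = oai_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2011-01-01"
)

print(identify_url)
print(records_url)
```

Either URL can be opened directly in a browser, which returns the raw XML response.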

4 thoughts on “considering OAI-PMH”

Dorothea Salo says:

January 21, 2011 at 1:51 pm

I think you might be just a tiny bit optimistic about the influence repository managers have on repository developers.

Aside from that, agree wholeheartedly!

Pingback: Tweets that mention considering OAI-PMH -- Topsy.com
Roger Hyam says:

January 12, 2012 at 5:57 pm

Hi,

I was wondering if you could name an aggregation protocol that is not ‘reviled’ by whoever has to use it. Aggregation is just like that!

I am wanting to suggest the use of OAI-PMH in upcoming projects. I feel there must be alternatives but have yet to find them. The main issue I have is that the main feed technologies don’t appear to allow you to go back in time. If you find out that your harvester has been unreliable for the last couple of weeks you can’t go back and say “give me everything that has changed since…” and page through to the present day. (I may be missing something in the ATOM spec but I don’t see it).
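
For comparison, this "give me everything that has changed since…" request is what OAI-PMH's selective harvesting provides: a `from` date on `ListRecords`, then paging with the `resumptionToken` the server hands back. A minimal sketch of that loop follows, assuming a repository that exposes `oai_dc` records. The function names `harvest_since` and `http_fetch` are illustrative, not part of any library, and a real harvester would also need error handling and retry logic.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher: return the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def harvest_since(base_url, since, fetch=http_fetch, metadata_prefix="oai_dc"):
    """Yield the identifier of every record changed since `since` (YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": since}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        # A missing or empty resumptionToken means this was the last page.
        token = root.findtext(f".//{OAI_NS}resumptionToken")
        if not token:
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Calling `list(harvest_since("https://repository.example.org/oai", "2012-01-01"))` against a real endpoint (the URL here is a placeholder) would page through to the present day, one request per batch. The `fetch` parameter exists so the loop can be exercised with canned responses instead of a live server.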

People seem to complain about OAI-PMH but as you point out that is usually to do with metadata standards not the protocol. Nobody actually proposes an alternative.

What am I missing…

JohnR says:

January 16, 2012 at 11:55 am

Hi Roger,

I agree that many issues with aggregation are ultimately about underlying data quality. There are some issues with OAI-PMH itself, as it’s currently specified, which are discussed in the tweets captured above. However, for all the problems which it may have, my colleague (Phil Barker) was recently pointing out that one of the difficulties with OAI-PMH is that it’s implemented by comparatively so few pieces of software that if you want to write a harvester or endpoint interface and address some of the underlying data quality issues you probably have to write everything yourself. Phil would contend that most feed technologies (RSS/Atom) are so ubiquitous that there is a much greater chance that there are freely available software libraries addressing data quality issues that you can draw on to create your software.

In terms of going back in time … this is perhaps the greatest weakness of RSS/Atom as it’s often implemented. The issue isn’t in the spec (as I understand it) but that it is often implemented as a newsfeed of the 10 most recent items; the spec can be implemented to provide a complete listing of resources (for example listing everything, or everything matching particular criteria). If you have control over the resource providers you could use RSS in this way quite effectively. However, if they’re already set up and supporting OAI-PMH, or using commercial software (which is likely to restrict how you can output the RSS), it may be more straightforward to keep using OAI-PMH (despite the above limitations).
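
One way a feed can expose a complete listing is by splitting it into pages linked with `rel="next"`, as described in RFC 5005 (Feed Paging and Archiving). The sketch below walks such a feed from the first page to the last. It assumes the provider actually publishes those links, which many feeds do not; the names `all_entries` and `http_fetch` are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

# XML namespace of the Atom Syndication Format (RFC 4287).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def http_fetch(url):
    """Default fetcher: return the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def all_entries(feed_url, fetch=http_fetch):
    """Yield (id, updated) for every entry, following rel="next" links."""
    seen = set()  # guards against feeds whose pages link in a loop
    while feed_url and feed_url not in seen:
        seen.add(feed_url)
        feed = ET.fromstring(fetch(feed_url))
        for entry in feed.findall(ATOM_NS + "entry"):
            yield entry.findtext(ATOM_NS + "id"), entry.findtext(ATOM_NS + "updated")
        # Find the link to the next page, if the feed provides one.
        next_href = next(
            (link.get("href")
             for link in feed.findall(ATOM_NS + "link")
             if link.get("rel") == "next"),
            None,
        )
        # The href may be relative, so resolve it against the current page.
        feed_url = urljoin(feed_url, next_href) if next_href else None
```

A harvester that has missed a couple of weeks could filter the yielded `updated` values against its last successful run. Unlike the OAI-PMH `from` parameter, though, that filtering happens on the client after every page has been fetched.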