Many of you who have an interest in open educational resources and the Academy / JISC OER Pilot Programme will already be following the development of JorumOpen. JorumOpen will enable users worldwide to search, browse and download open educational resources deposited by UK Further and Higher Education Institutions and licensed under a Creative Commons licence. OER Projects already have access to the JorumOpen deposit tool, and the service itself, which is based on a customised version of DSpace, will be available to the wider community from the 19th of January 2010.
CETIS have been liaising closely with the Jorum team in order to support the OER Programme’s requirements. One early requirement that has emerged is the need for bulk deposit, and a request from some projects that this be partially fulfilled by ingest of RSS feeds. Sounds simple enough, but as with all such requirements the devil is in the detail. In this case some of the details include: which version of RSS to support, how to encode and handle licence information, what metadata to include, how to process said metadata, and what a realistic size for feeds is.
In order to kick off a discussion on these and other issues Gareth Waller, Jorum’s Technical Manager (and DSpace wrangler extraordinaire) has written a short white paper titled Issues surrounding feed deposit into institutional repositories which presents the pros and cons of using syndicated feed formats to facilitate deposit into repositories such as JorumOpen. Gareth presents a succinct overview of syndicated feed formats and raises a number of questions relating to: item identification, item updates and deletions, missing items, polling periods, feed formats, metadata formats, local metadata application profiles, handling licences and whether to store links or download resources.
JISC, CETIS and Jorum would like to move towards the development of an agreed RSS profile for open educational resources so we are actively seeking comments from the community on the kind of issues Gareth raises in his paper. If you would like to comment or raise additional issues please post your comments here or on the Jorum Community Bay. You can download Gareth’s paper here or from the Community Bay.
Apologies if you tried to download Gareth’s paper from the link above and got a “not found” error. Some kind of WordPress weirdness with document permissions. The link should work now.
Just to more publicly reiterate some of the points I made in reply to Laura Shaw’s recent email to UKOER projects that are already looking into the use of RSS to share OERs.
I’ve already become aware of one or two issues with our feed through correspondence with the Xpert project at Nottingham University – http://www.nottingham.ac.uk/xpert/. We are using intraLibrary, and the RSS URL generated by that software has a dynamic element or token, which means that the URL is different each time it is generated. So every time Xpert harvests the feed, they see these URLs as new resources, whereas they are actually duplicates. Perhaps this won’t currently be an issue for JorumOpen, as the feed will not be continually polled for content and our feed should be adequate to provide a “snapshot”; it may become an issue, however, if JorumOpen develops and regularly harvests our feed for new content.
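The duplicate problem described here could be worked around on the harvester side by canonicalising URLs before comparing them. A minimal sketch follows; it assumes the volatile element is a query parameter (here `rss_feed_id`, as seen in the intraLibrary feed URL later in this thread), which may not hold for other repositories.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query parameters assumed to be dynamic tokens rather than part of the
# resource's identity; adjust per repository. This set is illustrative.
VOLATILE_PARAMS = {"rss_feed_id"}

def canonical_url(url):
    """Strip volatile query parameters so repeated harvests of the
    same resource compare equal."""
    parts = urlparse(url)
    stable = [(k, v) for k, v in parse_qsl(parts.query)
              if k not in VOLATILE_PARAMS]
    return urlunparse(parts._replace(query=urlencode(stable)))

seen = set()

def is_duplicate(url):
    """Record the canonical form and report whether it was seen before."""
    key = canonical_url(url)
    if key in seen:
        return True
    seen.add(key)
    return False
```

A harvester would call `is_duplicate` on each item link as it walks the feed, skipping items that have already been ingested under a different token.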
A more pressing issue, perhaps, is that, as far as I can tell, the only information published as RSS from intraLibrary is:
This obviously lacks a lot of important metadata including licence information – I don’t think I am able to customise this in intraLibrary but will double check with Intrallect.
From my perspective, there is also an issue in that we are currently developing an Open Search interface to provide unauthenticated access to OER that returns a list of results that link through to a unique HTML page for a specific resource (e.g. http://repository.leedsmet.ac.uk/main/view_record.php?identifier=823&SearchGroup=Open+Educational+Resources). Ideally it is this page that I would want the RSS link to point to; however, the feed currently contains an intraLibrary-generated URL which points directly to the resource itself (either the file in intraLibrary or an external URL).
I don’t really know enough about RSS to know how easy or difficult it would be to customise or perhaps generate my own feeds that would be more appropriate to our specific requirements and those of JorumOpen; if anybody can offer advice or point me towards useful tools it would be greatly appreciated.
The URL below is our current, generic OER feed – it would be fairly easy for me to generate individual feeds on the basis of individual classifications (caveats above notwithstanding!)
http://repository-intralibrary.leedsmet.ac.uk/IntraLibrary-RSS?rss_feed_id=6a6176612e7574696c2e52616e646f6d4031396633636564&rss_2.0.xml
The XML snippet seems to have been stripped from my post above! The fields that are published as RSS from intraLibrary are:
title / link / description / pubDate / guid / dc:date
Thanks
Nick
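For anyone in Nick’s position wanting to generate a richer feed than the repository’s built-in export, a minimal sketch using Python’s standard library is below. The channel details and the use of `dc:creator` and `dc:rights` to carry author and licence information are illustrative assumptions, not an agreed profile; richer schemes such as the Creative Commons RSS module also exist.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def build_feed(channel_title, channel_link, items):
    """Build an RSS 2.0 feed carrying dc:creator and dc:rights
    alongside the basic title/link/guid fields."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for it in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = it["title"]
        # Point the link at the human-readable record page, not the
        # raw file, so aggregators land users somewhere sensible.
        ET.SubElement(item, "link").text = it["link"]
        ET.SubElement(item, "guid", isPermaLink="true").text = it["link"]
        ET.SubElement(item, "{%s}creator" % DC).text = it["creator"]
        ET.SubElement(item, "{%s}rights" % DC).text = it["licence"]
    return ET.tostring(rss, encoding="unicode")
```

The output could then be served as a static file regenerated on a schedule, sidestepping the dynamic-URL problem entirely.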
Just to say: the HumBox project would be very keen to consider RSS in relation to getting resources into JorumOpen. We have 800+ resources and are becoming concerned about how we get these into Jorum – we don’t have the resource allocation to pay someone to spend time uploading stuff. RSS seems like the solution.
Kate
Pat is in a much better position to comment than I am, but in developing Xpert we would echo many of Gareth’s points: the data is provided in all sorts of inconsistent ways, some of which we can handle and some we can’t, and there are other issues.
Xpert launched in September, and now indexes some 2500 resources. Clearly a lot of people in the OER movement are exposing their resources using RSS, rightly or wrongly, because it is easy to do, and there is no obvious alternative. RSS + Dublin Core namespace is possibly as good as the state of the art gets right now? If I were looking for a way to expose OER, that is probably what I’d choose, in the absence of any other simple alternatives, so I would imagine that major players in the repository world are going to have to accommodate it somehow.
To add to Julian’s comments, through Xpert we have seen many of the issues described. Here is a link to a report from Pat Lockley (Oct 09) exploring some of the same points.
http://webapps.nottingham.ac.uk/elgg/xpert/weblog/2382.html
There are issues with using RSS. However, it also has some key benefits, namely:
– continued automated submission beyond current lifespan of projects
– simple distribution. Nottingham started using RSS for U-Now because this was a requirement for the OCWC. Since publishing the RSS feed, our OER is not only ‘findable’ in OCWC and Xpert, but also OER Recommender, OER Commons and Discover Ed, all without any additional effort by the team.
Pingback: John’s JISC CETIS blog » JISC and MIT: comparing notes on ed tech
Pingback: OER and RSS «
We attended the JISCRI deposits tools event, and whilst the state of the art is reasonably advanced for depositing research papers, it appeared that deposit tools for learning resources were not really on the radar. We left with the view that learning resources have some quite different considerations, and that these weren’t being well represented in the uber-specs the day sought to produce.
The OER programme is focused on learning resources, and well positioned to inform future developments I think.
I’m a bit late commenting on this so many of the things I would have said have already been said: RSS gets you into a number of different places like OCWC, OERRecommender; RSS and DC and CC extensions are easy so lowest common denominator; OpenLearn has too many resources for a manual upload.
The paper concludes that OAI-PMH metadata harvesting is a better option for JorumOpen. I’d strongly agree. I realise it is technically more challenging to offer an OAI interface, but there are many open source libraries available which you can plug in to Java or PHP apps.
There’s no well-maintained LOM extension for RSS which makes it a risky implementation if you want your feeds to pass an external validation too. OAI would also allow you to have a custom xml format for import into Jorum if you felt it necessary.
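For comparison, an OAI-PMH harvest is just a sequence of HTTP GET requests with a small, fixed vocabulary of parameters. A minimal sketch of building those request URLs is below (the base URL is hypothetical; per the OAI-PMH 2.0 spec, a `resumptionToken` must be the only argument besides the verb on follow-up requests).

```python
from urllib.parse import urlencode

def oai_list_records(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    On the first request, metadataPrefix (and optionally a 'from' date
    for incremental harvesting) are supplied; on subsequent pages, only
    the resumptionToken returned by the repository is sent."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

Incremental updates then fall out naturally: store the date of the last harvest and pass it as `from_date` next time, rather than re-fetching everything.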
I was concerned by the author’s suggestions for item update identification. As he states, checksums etc. will not work for website notification harvesting. But adding custom text like “jorum:update” to an RSS feed will annoy all the other users of my feed and pollute my data in their search results. Shouldn’t we just recommend people use the existing date tags appropriately and compare a date stored in the repository with the date on the feed?
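The date-comparison approach suggested here is straightforward to sketch. The function below assumes the harvester keeps a mapping from item URL to the `pubDate` recorded at the last harvest (the names are illustrative); RSS `pubDate` values use RFC 822 format, which the standard library can parse.

```python
from email.utils import parsedate_to_datetime

def needs_update(stored_dates, item_url, pub_date_str):
    """Compare an item's RSS pubDate (RFC 822 format) against the
    date stored in the repository at the last harvest.

    stored_dates: dict mapping item URL -> datetime of last ingest.
    Returns True for new items and for items whose feed date is newer."""
    feed_date = parsedate_to_datetime(pub_date_str)
    stored = stored_dates.get(item_url)
    return stored is None or feed_date > stored
```

This keeps the feed clean for human subscribers: no Jorum-specific markers, just correct use of the date tags RSS already defines.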
Deletion of an asset is a tricky area. OAI would let you mark an item deleted. I’m more worried by the suggestion that a feed has a finite number of items in it, and I can see you are too, as many of the other problems listed as drawbacks stem from this assumption. There’s no such restriction in the RSS standard, and no expectation of a limited list in other OER aggregators like OERRecommender, which expects you to list everything you’ve got. Have you talked to the developers of that tool to see how they handle updates, deletions, missing items etc.? I know it was knocked together pretty quickly, but I expect they’ll have thought of these things. I can put you in touch if you need an email, as I don’t think Joel works for USU COSL any more.
One final thought – if you get everyone to offer a single feed format into JorumOpen, then one advantage of JorumOpen can be to offer all the other feed formats so that all the UKOER projects can be aggregated elsewhere.
Gareth’s paper raises a number of interesting issues. In the introductory part he reviews technologies for bulk transfer/deposit of metadata from one source to another. However most of the paper expands on his point “At first glance using syndicated feeds to achieve this would be an obvious choice”. I would like to examine whether this is still true at second glance and to suggest why the other technologies, which were developed expressly for this purpose, are more appropriate.
First let’s look at RSS feeds. In their favour we have:
– They are commonly used and familiar to everyone
– Many repositories can expose resources through RSS feeds
– They offer “continuous update”
However, the implementation suggested for JorumOpen negates many of these benefits:
– The standard form of an RSS feed is not suitable for ingest into JorumOpen because it requires suitable metadata and licence information to be added to the feed. This would require everyone who creates feeds to ensure they are enhanced for JorumOpen and will require clarity on what metadata formats are required and what minimum fields are needed for acceptance into JorumOpen. How are people to know if their feeds do not meet these minimum standards?
– Who can submit feeds? Presumably resources in feeds can be submitted on behalf of others. The metadata will need to make clear who are the owners of resources.
– The “continuous update” is not implemented as stated by Laura Shaw “The feed isn’t continually polled for new content (and obviously no functionality for deletes/updates within a feed)”
– “Items within a feed are not auto classified within Jorum.” Even if the classification information is provided within the metadata?
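The question of how depositors would know whether their feeds meet a minimum standard could be answered with a simple validation step. The sketch below checks items against a hypothetical minimum profile; the actual required fields for JorumOpen ingest would need to be agreed by the programme.

```python
# Illustrative minimum profile only; not an agreed JorumOpen requirement.
REQUIRED_FIELDS = ("title", "link", "description", "dc:rights")

def check_item(item):
    """Return the list of required fields missing from a feed item,
    given as a dict of tag name -> text."""
    return [f for f in REQUIRED_FIELDS if not item.get(f)]

def feed_report(items):
    """Map each item's link to its missing fields, so a depositor can
    see at a glance which records would be rejected."""
    return {item.get("link", "(no link)"): check_item(item)
            for item in items}
```

A self-service checker like this, published alongside the agreed profile, would let projects test their feeds before submission rather than discovering problems at ingest time.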
If no other alternatives are available then RSS feeds are perhaps a suitable quick solution.
However, given the availability of OAI-PMH and SWORD it could be quite straightforward to harvest metadata from one repository in the required metadata format (and to set up periodic updates) and to deposit these metadata records using SWORD, ensuring that all the metadata from the original resource is preserved – including classification metadata and ownership information.
Licence information is always likely to be more problematic, as a CC licence will need to be selected for JorumOpen.
We have also hit many of those issues, but what strikes me is that many people are looking for a simple way to expose their resources, and, to their mind, RSS fits that bill – we know that because it is what many people are choosing to use.
OAI-PMH, SWORD, etc. are big technical barriers for many people who have resources to expose – anyone can make a feed. Along with some good guidelines on how best to use the feeds, this surely presents a good opportunity for Jorum, especially while significant collections such as OER Recommender continue to use feeds, and people find them easy to use?
Hello,
I am the aforementioned Pat.
Working from the topics listed in the paper / from my experience with xpert
Item identification
——————-
Xpert works on the basis that the URL for each item is unique (well it best be) so this is the “key” in database terms. This is how we distinguish between one item and another.
Item updates
————
If you’re harvesting URLs, I don’t think there is an issue with this. I would assume that all feeds are very much alive and continually changing or being modified.
Xpert empties all its metadata each night and reharvests each time, so it takes a new copy of the metadata each time. So if the metadata has been updated, xpert will reflect that.
I would agree with Jenny that any specific information is going to be confusing to end users, and possibly limit the ability to submit the same feed to other repositories.
Item deletions
————–
As xpert deletes its metadata every night, an item being taken from the feed will still be in the system (although we record when it was last found in a feed) but will be less likely to be found in a search result.
We’ve not explicitly had a problem with this though
Missing items
————-
I’m covering this later in a new section, but feeds can contain a tag on how often they should be harvested.
Polling period
————–
See above, but we harvest once a day. Each day brings about 10 or so new items.
Feed formats / Metadata formats
——————————-
DC seems by far the most widely used. I would worry more about whether people will stick to it and be consistent with the data they input.
Repository Required Metadata Profile
————————————
Xpert doesn’t have a minimum setting, other than it needs to have a link. Items with less metadata will just be found less often, but I wouldn’t not harvest just because metadata was sparse – people may still find it.
Licensing Content
—————–
You can have an overall licence per feed, or information per item. I am not sure why subsequent release is an issue, because the items are already released?
Links or resource download
————————–
We only take links, no downloading occurs.
Other issues
————
Who is the end user? – I think it’s acceptable to have an RSS feed for harvesters and an RSS feed for people. Most feeds contain all their site’s content (I think only a few limit) and so are well suited to being harvested without any worries over missing items due to harvest frequency. That to me is better than trying to make one feed for all.
It might be logical that if we are making a second RSS for harvesting we might use some other technology instead.
Feeds that aren’t valid – Of all 60 feeds Xpert takes, 5 aren’t valid XML, and approximately 20 aren’t valid RSS. The harvesting service has to be flexible to deal with this. It also needs to handle lots of character sets, as a lot of content isn’t in English.
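A harvester can get surprisingly far with a strict-then-lenient parsing strategy. The sketch below tries strict XML parsing first, then applies one crude repair (escaping stray ampersands, a very common cause of invalid XML in feeds) and retries; a production harvester would more likely fall back to a forgiving parser such as feedparser, but the shape of the fallback is the same.

```python
import re
import xml.etree.ElementTree as ET

def try_parse_feed(raw_bytes):
    """Parse feed bytes strictly; on failure, escape bare ampersands
    and retry. Returns (tree_or_None, was_valid_first_time)."""
    try:
        return ET.fromstring(raw_bytes), True
    except ET.ParseError:
        pass
    # Escape '&' not already starting a recognised entity reference.
    repaired = re.sub(rb"&(?!amp;|lt;|gt;|quot;|apos;|#)",
                      b"&amp;", raw_bytes)
    try:
        return ET.fromstring(repaired), False
    except ET.ParseError:
        return None, False
```

Logging the `was_valid_first_time` flag per feed also gives you the kind of validity statistics quoted above for free.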
Format of presentation – I think a link should go to a page. Some xpert links prompt for download, and some go off to a scorm package – which when accessed outside a scorm client / service looks absolutely awful to an end user (and would almost certainly put them off using it). It’s not an ideal situation.
Quality of metadata – It’s often very thin indeed
Author format – preferred way of displaying author
Making metadata consistent – one problem we have in Xpert is the different forms of attribution: some organisations put themselves in dc:creator, some use it for the individual creator. This makes presenting meaningful, consistent data to someone searching awkward.
DC nodes – Personally, I think they are a bit weak, and more categories are needed for more relevant information (level, accessibility, duration).
Pingback: Really (not so) Simple Syndication « Repository News
If you can face it, I’ve just posted more on this at http://repositorynews.wordpress.com/2010/01/07/really-not-so-simple-syndication/
Thanks for all the comments and discussion folks. There’s certainly no “one size fits all” solution emerging but there’s a huge amount of valuable information here. We will be attempting to synthesise some useful outputs from this discussion shortly.
Pingback: John’s JISC CETIS blog » RSS for deposit, Jorum and UKOER: part 2 commentary
Pingback: JIF10 at Royal Holloway « Repository News
Pingback: OER repositories and preservation – the elephant (not in) the room? « Repository News