John Robertson » repositories

UKOER 2: Content management platforms

johnr — Fri, 26 Aug 2011 16:00:26 +0000

What platforms are UKOER 2 projects using to host and manage their content? What types of content are they releasing? This is a post in the UKOER 2 technical synthesis series.

[These posts should be regarded as drafts for comment until I remove this note]

OER types:

Projects in UKOER 2 have released resources at various levels of granularity from individual images and documents through to whole courses.

A variety of mime types are used by the projects. These include: doc, pdf, spss, wiki, ppt, prezi, wmv, html5, javascript, wav, MS Office, DOM, RTF, GIF, JPEG, PNG, AVI, MPEG, DivX, QuickTime, MP3, mp4, HTML, zip, xml, qti, swf, flv.

Platform overview:

Overview of platforms in use in the UKOER 2 programme

As can be seen, this graph is somewhat misleading: it aggregates the total use of web 2.0 tools, giving a very large result in relation to the other options. However, no projects have solely used externally hosted web 2.0 platforms, and the detailed graph that follows gives a more useful view. The overview graph is mainly useful for comparing the use of the other platforms.

NB usage figures are not mutually exclusive – a good number of projects used multiple platforms

Detailed view:

Content management platforms in use in the UKOER 2 programme

NB usage figures are not mutually exclusive – a good number of projects used multiple platforms
Repository = repository software platform, rather than use of another platform as a repository
There is strong use of SlideShare and YouTube but relatively little use of iTunes.
There’s a diverse range of CMSs used to manage content, but in aggregate their use parallels the use of repositories.
ALTO and Tiger added a CMS layer on top of a repository to improve their user interface.
There’s a notable interest in WordPress (especially in Strand C) as a lightweight platform to collect and aggregate OER.
One reason for this interest in WordPress is SEO (Scooter).
The EdShare variant of ePrints is the most popular repository. Interestingly, a number of projects have chosen to use hosted versions of EdShare. One noted key influence in this choice is the success of the UKOER 1 Humbox project in community development – at least one project (DHOER) is depositing content into Humbox.
Plugins for WordPress to support better metadata and licensing are being explored and developed by Triton, DelOREs, and CSAPOER [tbc].

Differences in managing learning materials?

johnr — Wed, 10 Aug 2011 16:04:19 +0000

Last week CETIS organised a workshop at the Repository Fringe 2011 (#rfringe11) on Advances in Open Systems for Learning Materials (#rfCETIS). Phil has collected the blog posts and presentations.

This post briefly captures some of the discussion around the warm-up act: our attempt to help the workshop participants think about some of the different challenges that arise when managing learning materials. The aim was both to help participants coming from a more general repository background think through any differences that managing learning materials might make to their practice and systems, and to remind participants of the different requirements that emerge from different types of learning materials.

The activity was to consider the differences between an OER collection (of any type(s) of material) and a collection of high stakes summative question items (and answers/ rubrics). It was framed in terms of quickly identifying some of the key issues in managing collections, the key discovery mechanisms, and what role or functionality users might expect the chosen ‘asset management system’ to support. In retrospect the timing of the activity was perhaps too short for group discussions but a few people asked me to write up some of the group feedback, so..

Issues in managing materials for learning and teaching

Can students contribute?
Can you use external content?
Do you need a formal deposit/ management workflow? (more likely to be needed if content open)
Do you need to manage IPR?
Do you need to worry about producing a final copy? maintaining version control?
How do you judge / promote / surface quality materials?
Do you need to quality screen resources?
Do you need to update resources or provide a mechanism for them to go out of date?
How do you manage security for assessment items? how do you manage time-to-live or other date restrictions?

Discovery issues for learning and teaching materials

Can you find it in Google? (if not – forget it?)
How do you navigate balance of Google and local indexing/ discovery tools?
How much metadata do you need? from who? how much of a time commitment is it?
How do you apply licences?
How do you tie into/ relate to/ develop discipline specific social networks?
How does your system overlap with/ integrate/ relate to the VLE?
How do you support course based discovery?
What issues are there in sharing data (data artefacts and types of data about resources) [including individual’s data/ corporate image]?
What level and type of info do you want around assessment items?

Issues for users

How does the system enhance learning experience?
How does the system suggest/ support discovery of additional related resources?
Can you find out anything about past exams? what can you find out?

There was plenty of feedback I’ve not managed to capture in this summary, but this gives a bit of a flavour of some of the issues which emerged and helped frame the approaches that the subsequent presenters discussed.

OER Hackday: initial reflections

johnr — Tue, 05 Apr 2011 13:25:37 +0000

On Thursday and Friday CETIS and UKOLN ran OERHack.

Last time I counted we had a little over 250 tweets.

Once we take ‘OERHack’ and RT out of the picture we see:

250 tweets isn’t that many for 2 days and ~40 people, but that’s because everyone was busy.

Before the event we had some discussions on blogs.

OER Hack day Wiki

Ideas that we outlined but didn’t develop fully:

PORSCHE thoughts
Document Import/Export Service
OER Playlist picker
Additions to OERbit
Prototyping a new OERca

Things that got hacked (and more or less documented)

Extend the utility of WordPress as a host / presentation vehicle for OER collections.
Bookmarking tools for OER
Generating Paradata from mediawiki pages
Hacking a Google CSE for course directories
Metadata Extraction Tools
SWORD desktop app
Email-based deposit plugin for SWORD
JS widget to Wookie widget (Jorum and OER Recommender)

These are described in more detail on the wiki as well as some notes about other things we tested and a useful list of wordpress plugins for learning resources. A fuller write-up about the event will be forthcoming but we (CETIS OERTIG, UKOLN DEVCSI, and everyone else) had a productive, fun, and busy two days.

opened10: brief thoughts

johnr — Fri, 05 Nov 2010 19:57:14 +0000

Highlight thoughts

(all of these deserve posts in their own right):

what is the difference ‘open’ makes? (D Wiley)
when we meet – when are we going to do and not just talk? (unattributed)
how do you respond as an individual? (and why do you care about OpenEd/OER)? (Gourley; but equally could have been if i’d been in their sessions: Winn, Hall, Neary – though they’ve quite a different perspective)
if you have rubrics and marks as semantic data can you analyse for ‘soft skills’ across a programme of study?
how do i articulate what HE does that P2PU can’t, what can i learn from P2PU and what should i stop doing cause they do it better? (drumbeat)
why don’t HE courses create badges too? (drumbeat)

I’ll need to go back through the programme and remind myself of some of the sessions but as a first pass of some of the stuff that caught my attention emerging from opened10. not yet adequately linked or marked up and doubtless will grow a bit over time as different parts of my brain kick in.

All the conference papers are available in the UOC repository .

with apologies to those i know or have heard recently (Brian Lamb, Scott Leslie, Suzanne Hardy, Jane Williams, Simon Thomson, Jakki Sheridan-Ross, all the wonderful folk from the Open University, and my colleague Li Yuan) – i’m too familiar with your work for it to make this first pass but i do think it’s great!

things to use now

some of the stuff that was presented is out there now to use:
smarthistory.org

smarthistory.org website

fantastic opened site for art history – working towards being a viable alternative to OER – two art history teachers making stuff as they go to help students offset the massive cost of introductory art history textbooks for foundation courses.

twhistory.org

Twhistory website

historical recreations on twitter: Gettysburg, 1847 pioneer trek, the sinking of the Titanic, the American revolution, possibly about to start working with UK national archives to cabinet war room twitter account of world war 2. Tom Caswell’s presentation.

edufeedr

a feedreader for running open courses – a tutor sets up a blog-based course and edufeedr aggregates content from across blogging platforms designed to gather together student feedback based from wherever they blog it.

information and stats

we’re finally getting to the point where we can make or not make business cases and informed decisions. (links will follow)

OER use and attitudes surveys Joseph Hardin Mujo Research present survey results from instructors at University of Michigan and University of Valencia – surveying their willingness to use and to publish OER.

OER use and attitudes iNacol (online schools, K-12) surveyed their members about awareness around OER – the data and paper aren’t published yet unfortunately

Rory McGreal – examining the difference Open Access makes for a university press, comparing the Amazon rank of Athabasca University’s press, which is OA, with three other Canadian university presses. results didn’t indicate any significant difference for bought physical copies, but this is only one metric and doesn’t account for the greater access provided by OA downloads.

David Wiley offered some figures around Brigham Young University Independent Study Unit looking at the sustainability of making content open – if content is made open, can costs be covered by sustained or increased enrollment? the short answer: yes, just.

under development

Open Rubrics and the semantic web – Megan Kohler (Penn State) and Brian Panulla – well i’d call them feedback or assessment criteria but either way they’ve developed an OWL ontology and reference implementations for sharing and storing marking rubrics (and associated marks) – in terms of technical developments i think this is potentially the most important thing from the conference.

stuff to think about more

building courses with OER: Griff Richards presented about a project he’d worked on to create course syllabi for a master’s course in instructional design. one to follow up after the final report and syllabi are out. [personally it brings me back to thinking about course syllabi around OER for librarians – but that’s another post in a month or so]. His metaphor of clothes shopping for looking for learning materials is also worth keeping around (Tailored: expensive, perfect, emperor’s new clothes; Off the Shelf: not quite fit, but do the job, reasonable price; Charity Shop: nearly free, hard to find what you want, might just find something perfect).

David Wiley the difference of Openness. the challenge is what does ‘open’ allow us to do pedagogically that we can’t otherwise do [open specifically not all the good stuff that often is triggered by open]? Identifying Concrete Pedagogical Benefits of OER

David Wiley: Why do we need 'open'?

Dublin City University – took the OER as marketing angle and did some extensive work on how to best brand OERs using product placement and advertising methodology – this presentation made me profoundly uncomfortable but it is the logical extension of some of the advice and case for OER that many of us (including me) have made. i’m going to have to read their paper and think about this.

Erik Duval said a lot of things but there’s something fundamentally important about not being afraid to disrupt learning – oers probably have more quality assurance than the rest of course delivery.

Erik Duval you can afford to disrupt learning

I can’t help but finish with the work of those I presented in the same session as: Julià Minguillón (UOC), Pieter Kleymeer and Molly Kleinman from University of Michigan- we all raised questions, limits and possibilities around the role of libraries in OER. It was great to find other people asking similar questions.

CETIS OER Gathering

johnr — Tue, 08 Jun 2010 09:39:44 +0000

We’re organising a developer event on harvesting, aggregating and collecting OERs, creating an opportunity for developers to work on some of the issues around collecting and using OERs. We’re looking at technical issues around collecting OERs into your ‘system’ and sharing content from your ‘system’ with dynamic collections. More specifically we hope to:

learn more about ICoper – a major European project – which is working on building tools for the discovery, recommendation, and annotation of learning materials.
explore the issues in incorporating third-party OERs in a repository,
explore the technologies available for the dynamic thematic collections envisaged by the OER phase 2 call for proposals, and
investigate what needs to be done to implement these technologies.

More details of the day can be found at http://wiki.cetis.org.uk/OER_Gathering which we’ll continue to update as we get feedback.

Tag: #cetisgath

We’d originally intended to run this as two back to back events but as a result of some of our expected participants (including a number of colleagues from ICoper) having conflicting commitments and being unable to attend we’ve decided to run the two days we’d planned for the OER Gathering as a single day: June 22nd.

To help structure the day and make sure that the concentrated event is able to focus on what participants are most interested in and the questions the community has in this area, we’d like some feedback from participants and other interested parties.

If you’re attending:

What’s your background? (developer, manager, researcher, …)
Within the scope of the event, what are you most interested in discussing?
If applicable, what would you like to demonstrate at the event?
What are you most interested in hearing more about/ seeing demonstrated?
If you’re a developer, what languages do you know/ what development environments/tools do you work with?
If applicable, which metadata standards are you familiar with?
If you run a repository or service that you’d like involved in the event can you provide us with some details about it (e.g. OAI-PMH base url / api functionality / feeds)?

Whether attending or not if you have any ideas of development challenges which you’d like to work on or see further specified at the event let us know (comment, email, or add them to the wiki).

Custom ‘repository’ developments in the UKOER programme

johnr — Wed, 31 Mar 2010 15:58:26 +0000

One interesting development in the UKOER programme has been how many projects have chosen to build their own repository/database to manage their content in some form. Normally the phrase ‘we’ve built our own repository’ makes me worry in the same way that ‘we’re developing our own standard’ or ‘our own controlled vocabulary’ does. However, these projects have had a wide variety of good reasons for doing so – all of which bear closer examination. Their approach is a reminder that there are circumstances under which ‘build your own’ is both necessary and a good idea. Some projects also make a case for lightweight and disposable approaches.

All the custom developments have used MySQL and all of those taking this option have been subject strand projects.

CORE Materials
- they have built a database for the central management of resources prior to uploading to web 2.0 sites; their own solution was required to support interaction with the APIs of web 2.0 tools.
Medev OOER
- they have built a database as a staging ground for preparing OERs – JorumOpen is their primary deposit. They are also considering a local repository in the longer term.
- MySQL was chosen to be able to interact with Subject Centre website.
- They are also looking at web2.0 api interoperability
Open Educational Repository in Support of Computer Science
- built a lightweight disposable solution as management and publishing tool and staging ground for Jorum deposit
- Jorum as the primary repository and copy of record/ preservation copy.
Phorus
- primary cataloguing of OERs is into Intute which is then harvested via OAI-PMH into their local database
- they then aim to harvest resources into JORUM
- they may also move resources to host institution’s (Fedora) repository
Simulation OER
- developed local repository both as continuation of earlier work and as available repository options did not meet the key requirement of being able to preview simulations.

Use of repository software in the UKOER programme

johnr — Wed, 31 Mar 2010 10:30:36 +0000

In the UKOER programme a number of projects have chosen to use repository software to manage their educational materials. Such software may be commercial, open source, or hosted (often using open source). Alongside research information systems, repositories occupy an increasingly well established position in institutional infrastructure for managing and sharing research materials (including theses, preprints, and metadata about articles). Consequently for many institutions they offer a natural choice to manage and share OERs.

When I’m aware of a repository holding research content as well as OERs I’ve noted this: educational materials only or mixed materials.

Fedora

http://www.fedora-commons.org/

Phorus
- may harvest their MySQL based solution into their host institution’s repository (outwith project- presumably mixed materials).
Skills for Scientists
- will move all resources into host institutional repository for preservation/ long term access.
- not all content suitable for Jorum e.g. Scottish ~CC licensed stuff. (mixed materials)

Intralibrary

http://www.intrallect.com/index.php/intrallect/products

Unicycle
- mixed materials

Equella

http://www.thelearningedge.com.au/products.php

Berlin
- educational materials only
- OpenCourseWare branded
OCEP
- mixed materials

Harvest Road Hive

http://www.giuntilabs.com/HarvestRoad_Hive/index.php

OpenStaffs
- unknown from context probably educational materials only

ePrints

http://www.eprints.org/

ADM OER partner
- unknown collection composition
HumBox
- educational materials only
ChemistryFM
- will be using institutional ePrints as preservation store
- mixed materials

DSpace

http://www.dspace.org/

Open Exeter
- developed support for Content Packages and a LOM mapping
- educational materials
C-Change
- local DSpace repository was considered but rejected in favour of Jorum only approach (counter-use)

Note:
I’ll be blogging shortly about the other approaches taken for managing and sharing OERs, I’ll comment on the patterns at that point – but feel free to add any suggestions or comments about repositories here.

The use of OAI-PMH and OAI-ORE in the UKOER programme

johnr — Tue, 30 Mar 2010 13:01:03 +0000

OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH; http://www.openarchives.org/OAI/openarchivesprotocol.html ) “provides an application-independent interoperability framework based on metadata harvesting.” The protocol is widely used by repository software to make metadata about the resources it stores available. The repository acts as a data provider, which can then be harvested by a harvester. Two issues of note:

although most repositories can function as data providers the data harvesting aspect of the protocol often requires separate software and is much less widely implemented.
OAI-PMH specifies a minimal base metadata set of OAI_DC (~ the simple DC element set) therefore any implementation of it should be able to provide this as a minimum. Other metadata standards such as DC Terms or IEEE LOM can also be made available for harvesting.
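
As a rough illustration of the data provider / harvester split, here is a minimal sketch in Python of the harvester side: building a ListRecords request for the mandatory oai_dc format and reading identifiers and titles out of the response. The base URL, the sample response, and the function names are all made up for illustration; nothing here is taken from a project's actual setup, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request from a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response in oai_dc."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(OAI + "header/" + OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        records.append((identifier, title))
    return records


# Hand-written sample response (hypothetical, not from a real repository).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Introduction to titration</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("http://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL, handle errors, and store what it finds, which is part of why harvesting tends to need separate software from the repository itself.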

Although OAI-PMH is a well established standard which is widely used, at this point its use for open educational resources is somewhat limited. OAI-PMH is not currently in use by Jorum for metadata harvesting and, as far as I know, there are not many OAI-PMH based harvesters offering aggregated search services for educational materials (outside of those within particular closed/ or semi-closed communities). DiscoverEd from Creative Commons does offer an OAI-PMH based harvest but prefers RSS/Atom based approaches (Enhanced Search for Educational Resources – A Perspective and a Prototype from CCLearn (2009) http://learn.creativecommons.org/wp-content/uploads/2009/07/discovered-paper-17-july-2009.pdf , p12). UPDATE: Please see comment from Jenny Gray below.

OAI-PMH is being used or is supported by:

ChemistryFM (option once content is in backup ePrints repository)
Phorus
Unicycle
OCEP
Open Exeter
OpenStaffs
OERP (use unknown)
Humbox

OAI-PMH is in active use by (as opposed to out of the box support):

Phorus (harvesting catalogued resources from Intute.)
TRUE (using a Drupal plug in?)
ADM OER
ChemistryFM (option once content is in backup ePrints repository)

OAI-ORE

“Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources.” (http://www.openarchives.org/ore/)

As a standard for describing aggregated or compound resources ORE has the potential to be highly relevant to some types of education materials made up of distributed web resources; its use, however, with educational materials has thus far been somewhat limited. It has however been used as an exchange mechanism for moving repository contents from one system to another.

Projects using OAI-ORE:

ChemistryFM (export function from WordPress – will use to backup content to repository)
ADMOER (export function from ePrints)
HumBox (export function from ePrints)

RSS for deposit, Jorum and UKOER: part 2 commentary

johnr — Thu, 04 Feb 2010 14:23:17 +0000

Following on from part 1 which reviewed Jorum’s requirements for RSS-based deposit, this section synthesises the comments and feedback emerging in response to it.

Community views

In response to the requirements and position papers a number of feeds were submitted for testing and there has been some thoughtful reflection on the issues in the blogs and by email. This is a brief summary of responses to the key issues:

Issues around generating the RSS

Although most platforms in use can easily create RSS feeds and some can create a feed from any search result, it has become clear that the RSS profile that is created is frequently fixed and does not match the profile requested by Jorum (which is very similar to the profile suggested by OCWC).

Irrespective of the repository software or other OER management ‘platforms’ in use, adjusting the RSS output profile has proved to be a non-trivial task. Emerging issues in adjusting the RSS outputs include:

Users of commercial platforms may have to rely on the company’s developers and development schedule.
Open source platforms may require additional local coding or at the least will require adjusting an XSLT.
It is likely that the RSS output of web 2.0 platforms will simply not be editable.

In all three cases there may also be possible solutions that utilise independent tools, such as Yahoo Pipes, to process the feed after production or create a feed from another interface. However, such an approach to adjust the RSS profile is either still reliant on the information present in the original source feed or is dependent on adding standard profile information or extracting additional information and creating a new feed. See for example http://repositorynews.wordpress.com/2010/01/07/really-not-so-simple-syndication/.

Xml validity

Xpert note that of the 60 feeds they harvest, 5 aren’t valid xml and 20 aren’t valid RSS (see comment on Lorna’s post). It is worth noting that aggregators can and do deal with poorly formed data, however, in the timescale of the programme the kind of manual effort involved in dealing with poorly formed feeds (and the quality of the item metadata they would generate) is not likely to be supportable.
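
To make the distinction between "not valid xml" and "not valid RSS" concrete, here is a minimal sketch in Python of a first-pass check. It is an assumption about how such a check might look, not Xpert's method: it only tests whether a feed parses as XML and whether it has the basic RSS 2.0 shape (an rss root with a channel child). It is not a full RSS validator, and an Atom feed would land in the middle category. The function name and sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text):
    """Classify a feed as broken XML, XML that is not RSS-shaped, or RSS-shaped."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed XML"
    # Rough structural test only: RSS 2.0 has an <rss> root containing <channel>.
    if root.tag != "rss" or root.find("channel") is None:
        return "well-formed XML but not RSS 2.0 shaped"
    return "looks like RSS 2.0"


# Hypothetical feeds: one missing its closing </rss> tag, one Atom-like, one minimal RSS.
BROKEN = "<rss><channel><title>Example</title></channel>"
NOT_RSS = "<feed><title>Example</title></feed>"
MINIMAL_RSS = "<rss version='2.0'><channel><title>Example</title></channel></rss>"

for feed in (BROKEN, NOT_RSS, MINIMAL_RSS):
    print(check_feed(feed))
```

Running something like this over every submitted feed before harvest would at least separate the feeds that cannot be parsed at all from those that need closer manual inspection.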

Conclusion

With the exception of some web 2.0 platforms and (potentially) some commercial repositories a revised or reprocessed RSS feed to meet Jorum’s requirements is, in theory, a possibility. However, few projects currently produce RSS meeting Jorum’s required RSS profile. From the project trials thus far, one project has been able to conform sufficiently to permit successful harvest. In the context of UKOER, adjusting the RSS profile requires the right congruence of platform(s), skills, and time within the project team. Hence it is unlikely that this solution will work for all projects who need a bulk upload option. In terms of the longer term feasibility of using RSS to facilitate bulk deposit, this may change over time, particularly if the OCWC profile is adopted more widely.

Issues around metadata

The following issues were noted about the feed content:

There is tentative agreement that it would be good if RSS feeds used a DC namespace where possible and ideally supporting the OCWC profile.
The addition of custom elements (for example for the purposes of tracking OER currency) is not regarded as a good idea.
It cannot be taken for granted that the item identifier in a feed is the same as the identifier of the OER within the platform.
It cannot currently be assumed that the item identifier in a feed is either unique or persistent. This is a critical issue for processing the feed.
There may be multiple feed identifiers for a given OER.
Feeds may contain more than one namespace and / or feed formatting
In a number of repository platforms the identifier supplied frequently points to the splash page rather than the OER itself. This is an issue if the resource itself is to be harvested.
Few feeds have rights information for the items or for the feed itself. Including this information is regarded as good practice.
1. Feeds that use one of the variants of Creative Commons encoding may allow aggregators and Jorum to provide enhanced services.
2. Projects should clearly license their feeds and underlying items.
3. In the last resort projects should state clearly on their site or by telling Jorum what rights and licensing exists in connection to their feeds.
Metadata quality (including completeness) in feeds is variable.
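
Several of the issues above (missing or repeated item identifiers, missing rights information) can be spotted mechanically. The following is a minimal sketch in Python, assuming RSS 2.0 items that carry their identifier in guid and their licence in a Dublin Core dc:rights element; real feeds may encode either differently. The sample feed and function name are hypothetical. Note that a single snapshot like this can show duplicates but cannot show whether identifiers are persistent over time.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC = "{http://purl.org/dc/elements/1.1/}"


def audit_feed(xml_text):
    """Count items lacking a guid or dc:rights, and list guids used more than once."""
    root = ET.fromstring(xml_text)
    items = root.findall("./channel/item")
    guids = [item.findtext("guid") for item in items]
    counts = Counter(g for g in guids if g)
    return {
        "items": len(items),
        "missing_guid": sum(1 for g in guids if not g),
        "duplicate_guids": sorted(g for g, n in counts.items() if n > 1),
        "missing_rights": sum(1 for i in items if i.find(DC + "rights") is None),
    }


# Hypothetical feed: two items share a guid, one has none, only one states rights.
SAMPLE_FEED = """
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example OER feed</title>
    <item>
      <title>Lecture slides</title>
      <guid>http://example.org/oer/1</guid>
      <dc:rights>http://creativecommons.org/licenses/by/2.0/uk/</dc:rights>
    </item>
    <item>
      <title>Lecture slides (PDF)</title>
      <guid>http://example.org/oer/1</guid>
    </item>
    <item>
      <title>Worksheet</title>
    </item>
  </channel>
</rss>
""".strip()

print(audit_feed(SAMPLE_FEED))
```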

Issues around processing the RSS

There are a number of issues about feed size, currency, updating, identifiers and OER deletion but these depend on whether the service is collecting information to help point to current OERs (like an aggregator) or whether it is seeking to provide a central collection of OERs (a library – even if some of those OERs are actually elsewhere). These distinctions are not clearly made in much of the discussion.

Feed classification:

Pushing everything into one classification seems to negate the classification work done by projects and many projects will be producing OERs with more than one JACS code. This could be addressed by having multiple feeds per platform (either from the platform or by subsequently dividing the feed) but there are potential duplication management issues with multiple feeds.

Feed setup

News feeds are the most common use of RSS or Atom; they are typically limited to a fixed number of most recent items (this does not preclude multiple subject based feeds from a repository as mentioned above). Feeds can, however, contain the entire contents of the repository or all the results for a search term.

If the feed contains the entire contents of the repository (or search) it inevitably becomes very large. Large feeds tend to time out in browsers and can be difficult to ingest (as outlined in Xpert’s paper). However, many aggregator services prefer this approach as it provides a straightforward way to maintain currency, avoid duplication, and not have to consider partial deletion. This is because each time the feeds are polled/ gathered the previous index created by the aggregator is deleted. Only the content currently in the feed is indexed.

Magazine type feeds are the most common form of RSS and are more likely to be the default feed produced by repositories or other platforms; they are usually small. However, building an aggregation service or collection on them would require items from feeds to be stored in an incrementally built index (i.e. new items from feeds are added to a persistent index that retains their information even after they are no longer present in the feed). This works if there are unique and persistent identifiers for feed items or OERs included in the feed record and OERs do not end up with multiple feed identifiers.
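
The difference between the two indexing approaches can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular aggregator works: items are plain dictionaries, the index is keyed on a hypothetical "id" field, and that identifier is assumed to be unique and persistent.

```python
def replace_index(index, feed_items):
    """Full-feed approach: throw the old index away and index only the current feed."""
    return {item["id"]: item for item in feed_items}


def merge_index(index, feed_items):
    """Incremental approach: keep earlier items and add or overwrite by identifier."""
    merged = dict(index)
    for item in feed_items:
        merged[item["id"]] = item
    return merged


# Two successive polls of the same (made-up) feed: item "a" has dropped out by the second.
first_poll = [{"id": "a", "title": "Slides"}, {"id": "b", "title": "Worksheet"}]
second_poll = [{"id": "b", "title": "Worksheet"}, {"id": "c", "title": "Quiz"}]

replaced = replace_index(replace_index({}, first_poll), second_poll)
merged = merge_index(merge_index({}, first_poll), second_poll)

print(sorted(replaced))  # only what is in the feed now
print(sorted(merged))    # everything ever seen
```

With replacement, an item that leaves the feed leaves the index, so deletion comes for free but a small magazine feed only ever gives a snapshot. With merging, a small feed can build up the whole collection over time, but nothing is ever removed and any change of identifier creates a duplicate.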

OER currency

I’d suggest that the discussion about how to tell if a given OER has updated is a management and policy question to do with versioning and should be out of scope for this discussion. If a uri/url is provided for an OER, I think subsequent versions of the OER should have different urls as they are different things! There is a difference between an academic’s view of an OER as constantly in flux and a digital asset management perspective which needs a clear notion of the persistence or fixity of a released OER.

Feed currency

The discussion of how often feeds should be polled to check for new items is something which has to be agreed. It will impact on a number of issues and is affected by the type of feeds being consumed (magazine feeds will need to be polled more frequently) and will impact on the performance of the index.

Upload

Jorum have currently indicated that uploading OERs via RSS is out of scope. Upload would probably require some form of persistent and locally unique identifier for each OER to be included in the feed.

Deletion

There are wider questions in connection to deletion from Jorum, but in the context of RSS link deposit, deletion is only an issue if Jorum opts for some form of incrementally built index. RSS is not designed to manage the deletion of items.

Overview of combinations of RSS options

I’ve created this table to try to pull together some of the interdependent issues relating to feed processing.

	A	B	C	D	E	F
	Feed of all OERS	Feed of all OERS	Subject feed of OERS	Subject feed of OERS	Magazine feed of OERS	Magazine feed of OERS
Feed size	Very Big	Very Big	‘Medium’	‘Medium’	Small	Small
Update	Replace	Incremental addition	Update	Incremental addition	Incremental addition	Update
Coverage	Whole current collection	Whole cumulative collection	Current subject collection	Cumulative subject collection	Whole collection (gradually)	Transient snapshot of collection
Deletion	occurs as a feed is replaced	does not occur automatically	occurs as a feed is replaced	does not occur automatically	does not occur automatically	occurs as a feed is replaced
OER Deduplication?	Not significant	Issue	Minor issue	Issue	Issue	Not significant

Other options

OAI-PMH

As a precursor, JorumOpen does not currently act as an OAI-PMH harvester so this is a somewhat moot point (Note: the software required to harvest via OAI-PMH is distinct from the software needed to act as a data provider).

Within the programme not all OER producers are using repositories and, of those that are, not all repositories have OAI-PMH enabled. So, although there is some established practice of harvesting metadata via OAI-PMH it would be at best a partial solution.
There are pieces of software which can add support for OAI-PMH export but adapting and implementing them creates an additional development task for projects.

OAI-PMH harvesting has some built in support for resumption (incremental harvesting of metadata from large repositories) and has some support for record deletion but this is not always well supported.
OAI-PMH harvesting services have a mixed record – a key point of note is that they invariably need time to set up.
OAI-PMH harvesting will face many similar issues to RSS harvesting – in that identifiers will point to splash pages and that resources themselves are not harvested.
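To make the resumption and deletion points concrete, here is a minimal Python sketch of one step of an OAI-PMH harvest. The verb, the resumptionToken argument and the status="deleted" header attribute come from the OAI-PMH 2.0 protocol; the base URL, the function names and the trimmed sample response are made up for illustration, and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, resumption_token=None, from_date=None):
    """Build a ListRecords request; a resumption token is sent without other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (live identifiers, deleted identifiers, next token or None)."""
    root = ET.fromstring(xml_text)
    live, deleted = [], []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    token = root.findtext(".//" + OAI + "resumptionToken")
    return live, deleted, (token or None)


# Trimmed sample response: real records also carry datestamps and metadata.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header status="deleted"><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

live, deleted, token = parse_page(SAMPLE)
print(live, deleted)
print(list_records_url("http://example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each new token until none is returned. Deleted records only show up this way if the repository keeps track of its deletions, which is the part that is not always well supported.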

SRU

As a precursor: JorumOpen does not support SRU-based harvesting, so this is another moot point.
Considering SRU would require this functionality both in contributing repositories or other platforms and in Jorum. It is, however, not yet widely supported or used, though some commercial repositories do implement it and there are open source clients that can be bolted on to repositories or other platforms. This requires developer time and is likely to be only a partial solution.
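For readers who have not met SRU, a request is just a URL carrying a CQL query. The Python sketch below builds one using the standard SRU 1.1 searchRetrieve parameters; the endpoint address, the function name and the example query are hypothetical, and the sketch only constructs the URL rather than calling a server.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, page_size=20):
    """Build an SRU 1.1 searchRetrieve request for one page of results."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. a Dublin Core index and a term
        "startRecord": start,        # 1-based position of the first record wanted
        "maximumRecords": page_size,
    }
    return base_url + "?" + urlencode(params)


url = sru_search_url("http://example.org/sru", 'dc.subject = "chemistry"')
print(url)
```

Both the contributing platform (to answer such requests) and Jorum (to issue them and process the results) would need code along these lines.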

Deposit API?

The current deposit tool is based on third party software MRCUTE which runs through Moodle – as such it cannot easily be adapted by Jorum to provide an API.
Jorum are, however, exploring the addition of a SWORD deposit endpoint. Suitable SWORD deposit tools would need to be identified (i.e. those that can handle the right metadata and cope with something that isn’t a research paper; given the research focus of SWORD tool development, these are likely to be some of the less-developed tools).

RSS for deposit, Jorum and UKOER: part 1 review

johnr — Thu, 04 Feb 2010 14:17:20 +0000

Over the past few months CETIS and Jorum have been discussing approaches to bulk deposit to support the projects in the UKOER programme as they deposit or represent their OERs in Jorum. Based on feedback from projects gathered through our technical reviews of projects, we’ve investigated approaches which might work for the programme.

One option we have investigated is the use of RSS. Gareth Waller from Jorum produced a set of feed requirements and a discussion paper suggesting possible issues with the use of RSS. A number of projects have trialled their feeds and provided feedback on Lorna’s blog post introducing Gareth’s paper and outlining the issues. The Xpert project has also produced a briefing paper looking at issues around RSS-based deposit. (Considerations and evaluations of the development of distributed repositories when using RSS aggregation as a submission protocol, by Pat Lockley, The University of Nottingham http://webapps.nottingham.ac.uk/elgg/xpert/files/-1/803/xpert+metadata+final.pdf )

Many thanks to Gareth and Laura from Jorum and everyone else who’s contributed to the discussion thus far. This post summarises that discussion and comments on the other options suggested.

Please note this conversation is shaped by the constraints of the programme. The discussion below focuses on the relation of a single OER project producing a feed or feeds of resources to contribute to Jorum. Issues of how Jorum addresses and combines data feeds from different projects and provides standardised data are a separate discussion.

RSS

Suitability

Although submission to a repository isn’t the primary purpose of RSS, it does have functionality and features that may make it suitable for such a purpose. The investigation of RSS as an option for submitting content to Jorum began with the observations:

Many of the diverse choices of platforms used across the programme are capable of producing RSS
RSS is the preferred format for most current OER discovery services (aggregators). For example,
- DiscoverEd http://learn.creativecommons.org/wp-content/uploads/2009/07/discovered-paper-17-july-2009.pdf
- Ensemble/Steeple http://www.steeple.org.uk/wiki/Ensemble
- Xpert http://xpert.nottingham.ac.uk/
OCWC have produced recommendations for an application profile for RSS feeds for OERs http://wiki.ocwconsortium.org/index.php?title=RSS_feeds
iTunes(U) uses RSS and a number of institutions are making material available through iTunesU

Jorum’s requirements

Jorum produced an outline of their minimum requirements for feed-based ingest and a briefing paper summarizing their current take on issues around RSS for deposit.

Feed format and content

Jorum‘s current requirements are:

RSS version 2.0 feed
At least one element belonging to one of the following namespace directly under the channel element. Metadata for all items must be represented in elements belonging to this namespace.
Licence information on each item (in the relevant metadata element). This must contain a v2 Eng & Wales CC licence url e.g.
1. DC : rights e.g. Licensed under a Creative Commons Attribution – NonCommercial-ShareAlike 2.0 Licence – see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
2. IMSMD : rights/description/langstring
3. LOM: rights/description/string”
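As an illustration only, the following Python sketch checks a feed against two of these requirements: that it is RSS 2.0 and that each item carries a licence URL. It assumes the licence sits in a Dublin Core rights element and that a version 2.0 England & Wales licence URL follows the pattern of the example above; the function name, the pattern and the sample feed are my own, not Jorum's. The namespace requirement is not checked.

```python
import re
import xml.etree.ElementTree as ET

DC_RIGHTS = "{http://purl.org/dc/elements/1.1/}rights"
# Assumed pattern for a v2.0 England & Wales licence URL, based on the example above.
CC_UK_V2 = re.compile(r"http://creativecommons\.org/licenses/[a-z-]+/2\.0/uk/?")


def check_feed(xml_text):
    """Return a list of problems; an empty list means these checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "rss" or root.get("version") != "2.0":
        problems.append("not an RSS 2.0 feed")
    for number, item in enumerate(root.iter("item"), start=1):
        rights = item.findtext(DC_RIGHTS) or ""
        if not CC_UK_V2.search(rights):
            problems.append(f"item {number}: no CC 2.0 UK licence URL in dc:rights")
    return problems


FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example OER feed</title>
    <item>
      <title>Licensed item</title>
      <link>http://example.org/oer/1</link>
      <dc:rights>Licensed under http://creativecommons.org/licenses/by-nc-sa/2.0/uk/</dc:rights>
    </item>
    <item>
      <title>Unlicensed item</title>
      <link>http://example.org/oer/2</link>
    </item>
  </channel>
</rss>"""

print(check_feed(FEED))
```

A project could run something like this over its own feed before offering it for deposit; feeds using IMSMD or LOM would need the equivalent rights element looked up instead.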

Feed processing

Jorum currently processes the feed as follows:

“The feed is *not* continually polled for new content. […] The current functionality simply reads the feed when it is deposited and all the items are created in DSpace. It’s a snapshot in time of that RSS feed. If you add in the same feed again, it will store duplicates.
The physical data of a resource in the feed is not stored in JorumOpen. A link is simply created pointing to the resource as indicated by the RSS feed (the “link” element).”
“The feed MUST be valid XML – if the XML coming back isn’t valid in the first place then we cannot process it (neither can any validator, XML reader etc). ”
“Items within a feed are not auto classified within Jorum. In other words, every item in a feed is stored within a single collection as chosen by the admin user i.e. a top level JACS or LearnDirect classification. Having individual feed for each classification such as the OpenLearn model would ensure that items are classified correctly as these feeds can be deposited separately.”
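The behaviour described in these quotes can be mimicked in a few lines of Python. This is a sketch of the described behaviour, not Jorum's code: the list called store stands in for DSpace, and the function name, collection name and sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <item><title>Cell biology slides</title><link>http://example.org/oer/1</link></item>
</channel></rss>"""


def ingest_snapshot(xml_text, collection, store):
    """Append one link-only record per item; nothing is compared with earlier deposits."""
    root = ET.fromstring(xml_text)  # raises ParseError if the feed is not well-formed
    for item in root.iter("item"):
        store.append({
            "collection": collection,       # chosen once by the admin for the whole feed
            "title": item.findtext("title"),
            "link": item.findtext("link"),  # the resource itself is not copied
        })
    return store


store = []
ingest_snapshot(FEED, "Biological Sciences", store)
ingest_snapshot(FEED, "Biological Sciences", store)  # the same feed deposited again
print(len(store))  # 2: the second deposit duplicated the first
```

Because every item lands in the one collection passed in, a project wanting correct classification would deposit a separate feed per subject, as the last quote suggests.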

Possible issues about the use of feeds

In his paper Gareth raises a number of issues and questions including the following:

RSS items need to contain the unique id of the OER
It’s not yet clear how to tell from the feed if an OER has changed or been deleted
Feeds should not contain the whole repository contents
There is the possibility that OERs might fall between arbitrary limits for feed creation (for example, if a feed lists only the 50 most recent items and is polled every day, any resources beyond that number are missed)
The richness of the metadata held in the platform creating the RSS may be reduced to a subset of the available fields by the feed creation process or the feed consumption process.
Feed deposit needs to make assumptions about licensing
Current exploration of feed deposit relates only to harvesting metadata, not to harvesting resources.
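The point about arbitrary feed limits is simple arithmetic, sketched here in Python. The limit of 50 comes from the example in the list above; the function name and the sample figures are hypothetical.

```python
def missed_items(published_since_last_poll, feed_limit=50):
    """Items pushed out of the feed window before the harvester next looked."""
    return max(0, published_since_last_poll - feed_limit)


print(missed_items(30))  # 0: everything new is still in the feed
print(missed_items(80))  # 30: the oldest 30 new items were never seen
```

A bulk upload by a project is exactly the situation in which the second case would arise.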

Part 2 of this post will look at the community responses to this proposal and look at emerging issues.