RSS for deposit, Jorum and UKOER: part 2 commentary

Posted on February 4, 2010 by johnr

Following on from part 1 which reviewed Jorum’s requirements for RSS-based deposit, this section synthesises the comments and feedback emerging in response to it.

Community views

In response to the requirements and position papers a number of feeds were submitted for testing, and there has been some thoughtful reflection on the issues in blogs and by email. This is a brief summary of responses to the key issues:

Issues around generating the RSS

Although most platforms in use can easily create RSS feeds, and some can create a feed from any search result, it has become clear that the RSS profile created is frequently fixed and does not match the profile requested by Jorum (which is very similar to the profile suggested by OCWC).

Irrespective of the repository software or other OER management ‘platforms’ in use, adjusting the RSS output profile has proved to be a non-trivial task. Emerging issues in adjusting the RSS outputs include:

Users of commercial platforms may have to rely on the company’s developers and development schedule.
Open source platforms may require additional local coding or, at the least, adjusting an XSLT.
It is likely that the RSS output of web 2.0 platforms will simply not be editable.

In all three cases there may also be possible solutions that utilise independent tools, such as Yahoo Pipes, to process the feed after production or create a feed from another interface. However, such an approach to adjust the RSS profile is either still reliant on the information present in the original source feed or is dependent on adding standard profile information or extracting additional information and creating a new feed. See for example http://repositorynews.wordpress.com/2010/01/07/really-not-so-simple-syndication/.

Xml validity

Xpert note that of the 60 feeds they harvest, 5 aren’t valid xml and 20 aren’t valid RSS (see comment on Lorna’s post). It is worth noting that aggregators can and do deal with poorly formed data. However, within the timescale of the programme, the kind of manual effort involved in dealing with poorly formed feeds (and the quality of the item metadata they would generate) is not likely to be supportable.
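To illustrate the difference between a feed that isn’t valid xml and one that isn’t valid RSS, here is a minimal Python sketch using only the standard library. The function name classify_feed and the structural checks (an rss root with a version attribute, and a channel containing title, link and description) are my own assumptions about a reasonable first-pass check; this is not how Xpert or Jorum actually validate feeds, and it is far less thorough than a full feed validator.

```python
import xml.etree.ElementTree as ET


def classify_feed(text):
    """Return 'not-xml', 'not-rss' or 'ok' for a feed supplied as a string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed: no XML parser can process it.
        return "not-xml"
    if root.tag != "rss" or root.get("version") is None:
        return "not-rss"
    channel = root.find("channel")
    if channel is None:
        return "not-rss"
    # RSS 2.0 requires these three channel elements.
    for required in ("title", "link", "description"):
        if channel.find(required) is None:
            return "not-rss"
    return "ok"
```

A feed that fails the first check cannot be processed at all, whereas one that only fails the later checks can still be read but may yield poor item metadata.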

Conclusion

With the exception of some web 2.0 platforms and (potentially) some commercial repositories, a revised or reprocessed RSS feed to meet Jorum’s requirements is, in theory, a possibility. However, few projects currently produce RSS meeting Jorum’s required RSS profile. From the project trials thus far, one project has been able to conform sufficiently to permit successful harvest. In the context of UKOER, adjusting the RSS profile requires the right congruence of platform(s), skills, and time within the project team. Hence it is unlikely that this solution will work for all projects that need a bulk upload option. The longer term feasibility of using RSS to facilitate bulk deposit may change over time, particularly if the OCWC profile is adopted more widely.

Issues around metadata

The following issues were noted about the feed content:

There is tentative agreement that it would be good if RSS feeds used a DC namespace where possible and ideally supporting the OCWC profile.
The addition of custom elements (for example for the purposes of tracking OER currency) is not regarded as a good idea.
It cannot be taken for granted that the item identifier in a feed is the same as the identifier of the OER within the platform.
It cannot currently be assumed that the item identifier in a feed is either unique or persistent. This is a critical issue for processing the feed.
There may be multiple feed identifiers for a given OER.
Feeds may contain more than one namespace and/or feed formatting.
In a number of repository platforms the identifier supplied frequently points to the splash page rather than the OER itself. This is an issue if the resource itself is to be harvested.
Few feeds have rights information for the items or for the feed itself. Including this information is regarded as good practice.
1. Feeds that use one of the variants of Creative Commons encoding may allow aggregators and Jorum to provide enhanced services.
2. Projects should clearly license their feeds and underlying items.
3. In the last resort projects should state clearly on their site, or by telling Jorum, what rights and licensing exist in connection with their feeds.
Metadata quality (including completeness) in feeds is variable.
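As an illustration of the identifier issues above, the following Python sketch reports items with no identifier and identifiers that occur more than once within a single RSS 2.0 feed. It assumes the identifier is carried in the RSS guid element; the function name identifier_report and the shape of the returned dictionary are hypothetical. Note that a single snapshot like this can only show uniqueness within the feed; it cannot show whether identifiers are persistent over time or match the identifiers used inside the platform.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def identifier_report(feed_text):
    """Summarise identifier problems in an RSS 2.0 feed supplied as a string."""
    root = ET.fromstring(feed_text)
    guids = []
    missing = 0
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is None or not guid.strip():
            # The item has no usable identifier at all.
            missing += 1
        else:
            guids.append(guid.strip())
    counts = Counter(guids)
    duplicates = sorted(g for g, n in counts.items() if n > 1)
    return {"items": len(guids) + missing, "missing": missing, "duplicates": duplicates}
```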

Issues around processing the RSS

There are a number of issues about feed size, currency, updating, identifiers and OER deletion but these depend on whether the service is collecting information to help point to current OERs (like an aggregator) or whether it is seeking to provide a central collection of OERs (a library – even if some of those OERs are actually elsewhere). These distinctions are not clearly made in much of the discussion.

Feed classification:

Pushing everything into one classification seems to negate the classification work done by projects and many projects will be producing OERs with more than one JACS code. This could be addressed by having multiple feeds per platform (either from the platform or by subsequently dividing the feed) but there are potential duplication management issues with multiple feeds.

Feed setup

News feeds are the most common use of RSS or Atom; they are typically limited to a fixed number of the most recent items (this does not preclude multiple subject-based feeds from a repository, as mentioned above). Feeds can, however, contain the entire contents of the repository or all the results for a search term.

If the feed contains the entire contents of the repository (or search) it inevitably becomes very large. Large feeds tend to time out in browsers and can be difficult to ingest (as outlined in Xpert’s paper). However, many aggregator services prefer this approach as it provides a straightforward way to maintain currency, avoid duplication, and not have to consider partial deletion. This is because each time the feeds are polled/gathered the previous index created by the aggregator is deleted. Only the content currently in the feed is indexed.

Magazine type feeds are the most common form of RSS and are more likely to be the default feed produced by repositories or other platforms; they are usually small. However, to build an aggregation service or collection based upon them would require items from feeds to be stored in an incrementally built index (i.e. new items from feeds are added to a persistent index that retains their information even after they are no longer present in the feed). This works if there are unique and persistent identifiers for feed items or OERs included in the feed record and OERs do not end up with multiple feed identifiers.
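The two approaches described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each feed item is represented as a dictionary with a hypothetical "id" key holding its identifier, and the index is a plain dictionary keyed on that identifier. The function names are my own.

```python
def replace_index(index, feed_items):
    """Whole-collection approach: discard the old index and keep only what is in the feed now."""
    return {item["id"]: item for item in feed_items}


def add_incrementally(index, feed_items):
    """Magazine-feed approach: keep earlier records and add or refresh those in the feed."""
    updated = dict(index)
    for item in feed_items:
        updated[item["id"]] = item
    return updated
```

If one poll returns items oer-1 and oer-2 and the next returns oer-2 and oer-3, the replaced index ends up holding only oer-2 and oer-3, while the incremental index holds all three. The incremental version also shows why identifiers matter: if the same OER turned up under a different identifier it would simply be stored twice.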

OER currency

I’d suggest that the discussion about how to tell if a given OER has been updated is a management and policy question to do with versioning and should be out of scope for this discussion. If a uri/url is provided for an OER, I think subsequent versions of the OER should have different urls as they are different things! There is a difference between an academic’s view of an OER as constantly in flux and a digital asset management perspective which needs a clear notion of the persistence or fixity of a released OER.

Feed currency

The discussion of how often feeds should be polled to check for new items is something which has to be agreed. It will impact on a number of issues and is affected by the type of feeds being consumed (magazine feeds will need to be polled more frequently) and will impact on the performance of the index.

Upload

Jorum have currently indicated that uploading OERs via RSS is out of scope. Upload would probably require some form of persistent and locally unique identifier for each OER to be included in the feed.

Deletion

There are wider questions in connection to deletion from Jorum, but in the context of RSS link deposit, deletion is only an issue if Jorum opts for some form of incrementally built index. RSS is not designed to manage the deletion of items.

Overview of combinations of RSS options

I’ve created this table to try to pull together some of the interdependent issues relating to feed processing.

	A	B	C	D	E	F
	Feed of all OERs	Feed of all OERs	Subject feed of OERs	Subject feed of OERs	Magazine feed of OERs	Magazine feed of OERs
Feed size	Very Big	Very Big	‘Medium’	‘Medium’	Small	Small
Update	Replace	Incremental addition	Update	Incremental addition	Incremental addition	Update
Coverage	Whole current collection	Whole cumulative collection	Current subject collection	Cumulative subject collection	Whole collection (gradually)	Transient snapshot of collection
Deletion	occurs as a feed is replaced	does not occur automatically	occurs as a feed is replaced	does not occur automatically	does not occur automatically	occurs as a feed is replaced
OER Deduplication?	Not significant	Issue	Minor issue	Issue	Issue	Not significant

Other options

OAI-PMH

As a precursor, JorumOpen does not currently act as an OAI harvester so this is a somewhat moot point (Note: the software required to expose metadata via OAI-PMH is distinct from the software needed to harvest it).

Within the programme not all OER producers are using repositories and, of those that are, not all repositories have OAI-PMH enabled. So, although there is some established practice of harvesting metadata via OAI-PMH, it would be at best a partial solution.
There are pieces of software which can add support for OAI-PMH export but adapting and implementing them creates an additional development task for projects.

OAI-PMH harvesting has some built in support for resumption (incremental harvesting of metadata from large repositories) and has some support for record deletion but this is not always well supported.
OAI-PMH harvesting services have a mixed record – a key point of note is that they invariably need time to set up.
OAI-PMH harvesting will face many similar issues to RSS harvesting – in that identifiers will point to splash pages and that resources themselves are not harvested.
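For readers unfamiliar with how OAI-PMH resumption works, the sketch below shows the request side of it in Python. In the protocol a harvester issues a ListRecords request, and if the response contains a non-empty resumptionToken element it repeats the request with that token until the token comes back empty or absent. The function names and the example base URL in the usage note are hypothetical, and the sketch deliberately leaves out the HTTP fetching, error handling, and deleted-record handling that a real harvester needs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def first_request(base_url, metadata_prefix="oai_dc"):
    """Build the URL for the first ListRecords request."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def next_request(base_url, response_text):
    """Return the URL for the next batch, or None when the list is complete."""
    root = ET.fromstring(response_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        # No token, or an empty token, marks the final batch.
        return None
    # resumptionToken is an exclusive argument, so metadataPrefix is not repeated.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
```

A harvester would fetch first_request("http://example.org/oai"), pass each response body to next_request, and stop when it returns None.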

SRU

As a precursor: JorumOpen does not support SRU based harvesting so this is another moot point.
Considering SRU would require this functionality in both contributing repositories or other platforms and in Jorum. It is, however, not yet widely supported or used, though some commercial repositories do implement it and there are open source clients to bolt on to repositories or other platforms. This requires developer time and is likely to be a partial solution only.

Deposit API?

The current deposit tool is based on third party software MRCUTE which runs through Moodle – as such it cannot easily be adapted by Jorum to provide an API.
Jorum are, however, exploring the addition of a SWORD deposit endpoint. Suitable SWORD deposit tools would need to be identified (i.e. those that can handle the right metadata and cope with something that isn’t a research paper – given the research focus of SWORD tool development these are likely to be some of the less-developed tools).

RSS for deposit, Jorum and UKOER: part 1 review

Posted on February 4, 2010 by johnr

Over the past few months CETIS and Jorum have been discussing approaches to bulk deposit to support the projects in the UKOER programme as they deposit or represent their OERs in Jorum. Based on feedback from projects gathered through our technical reviews of projects, we’ve investigated approaches which might work for the programme.

One option we have investigated is the use of RSS. Gareth Waller from Jorum produced a set of feed requirements and a discussion paper suggesting possible issues with the use of RSS. A number of projects have trialled their feeds and provided feedback on Lorna’s blog post introducing Gareth’s paper and outlining the issues. The Xpert project has also produced a briefing paper looking at issues around RSS-based deposit. (Considerations and evaluations of the development of distributed repositories when using RSS aggregation as a submission protocol. By Pat Lockley, The University of Nottingham http://webapps.nottingham.ac.uk/elgg/xpert/files/-1/803/xpert+metadata+final.pdf )

Many thanks to Gareth and Laura from Jorum and everyone else who’s contributed to the discussion thus far. This post is a summary of that discussion and comment about other options suggested.

Please note this conversation is shaped by the constraints of the programme. The discussion below focuses on the relation of a single OER project producing a feed or feeds of resources to contribute to Jorum. Issues of how Jorum addresses and combines data feeds from different projects and provides standardised data are a separate discussion.

RSS

Suitability

Although submission to a repository isn’t the primary purpose of RSS, it does have functionality and features that may make it suitable for such a purpose. The investigation of RSS as an option for submitting content to Jorum began with the observations:

Many of the diverse choices of platforms used across the programme are capable of producing RSS
RSS is the preferred format for most current OER discovery services (aggregators). For example,
- DiscoverEd http://learn.creativecommons.org/wp-content/uploads/2009/07/discovered-paper-17-july-2009.pdf,
- Ensemble/Steeple http://www.steeple.org.uk/wiki/Ensemble,
- Xpert http://xpert.nottingham.ac.uk/
OCWC have produced recommendations for an application profile for RSS feeds for OERs http://wiki.ocwconsortium.org/index.php?title=RSS_feeds
iTunes(U) uses RSS and a number of institutions are making material available through iTunesU

Jorum’s requirements

Jorum produced an outline of their minimum requirements for feed-based ingest and a briefing paper summarizing their current take on issues around RSS for deposit.

Feed format and content

Jorum’s current requirements are:

RSS version 2.0 feed
At least one element belonging to one of the following namespaces directly under the channel element. Metadata for all items must be represented in elements belonging to this namespace.
Licence information on each item (in the relevant metadata element). This must contain a v2 Eng & Wales CC licence url e.g.
1. DC : rights e.g. Licensed under a Creative Commons Attribution – NonCommercial-ShareAlike 2.0 Licence – see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
2. IMSMD : rights/description/langstring
3. LOM: rights/description/string
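To make these requirements a little more concrete, here is a hedged Python sketch of how a project might check its own feed before submitting it. It covers only the Dublin Core variant of the licence requirement (dc:rights), not the IMSMD or LOM variants. The function name check_minimum_requirements, the wording of the problem messages, and the regular expression for a version 2.0 England & Wales licence url (modelled on the example url above) are all my own assumptions; this is not Jorum’s ingest code.

```python
import re
import xml.etree.ElementTree as ET

# Dublin Core rights element in ElementTree's {uri}tag notation.
DC_RIGHTS = "{http://purl.org/dc/elements/1.1/}rights"
# Assumed pattern for a version 2.0 England & Wales Creative Commons licence URL.
CC_UK_V2 = re.compile(r"https?://creativecommons\.org/licenses/[a-z-]+/2\.0/uk/?")


def check_minimum_requirements(feed_text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(feed_text)
    problems = []
    if root.tag != "rss" or root.get("version") != "2.0":
        problems.append("not an RSS 2.0 feed")
    for number, item in enumerate(root.iter("item"), start=1):
        rights = item.findtext(DC_RIGHTS) or ""
        if not CC_UK_V2.search(rights):
            problems.append(f"item {number}: no CC 2.0 UK licence URL in dc:rights")
    return problems
```

Feed text that isn’t well-formed xml will raise a parse error before any of these checks run, which mirrors the requirement that the feed must be valid XML in the first place.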

Feed processing

Jorum currently processes the feed as follows:

“The feed is *not* continually polled for new content. […] The current functionality simply reads the feed when it is deposited and all the items are created in DSpace. It’s a snapshot in time of that RSS feed. If you add in the same feed again, it will store duplicates.
The physical data of a resource in the feed is not stored in JorumOpen. A link is simply created pointing to the resource as indicated by the RSS feed (the “link” element).”
“The feed MUST be valid XML – if the XML coming back isn’t valid in the first place then we cannot process it (neither can any validator, XML reader etc). ”
“Items within a feed are not auto classified within Jorum. In other words, every item in a feed is stored within a single collection as chosen by the admin user i.e. a top level JACS or LearnDirect classification. Having individual feed for each classification such as the OpenLearn model would ensure that items are classified correctly as these feeds can be deposited separately.”

Possible issues about the use of feeds

In his paper Gareth raises a number of issues and questions including the following:

RSS items need to contain the unique id of the OER
It’s not yet clear how to tell from the feed if an OER has changed or been deleted
Feeds should not contain the whole repository contents
There is the possibility that OERs might fall between arbitrary limits for feed creation (50 most recent items polled every day misses resources above this number)
The richness of metadata which exists within the platform creating the RSS may be reduced to a subset of the available fields by the feed creation process or the feed consumption process.
Feed deposit needs to make assumptions about licensing
Current exploration of feed deposit relates only to harvesting metadata, not to harvesting resources.

Part 2 of this post will look at the community responses to this proposal and look at emerging issues.

DepoST : what would a repository deposit tool look like for learning materials?

Posted on November 6, 2009 by johnr

The morning sessions at the recent JISCRI deposit tools show and tell meeting in London (DepoST) offered a whirlwind of elevator pitches for the many existing repository deposit tools. Details of the tools from the pitches have been neatly captured by David Flanders on the JISCinvolve blog.

In the midst of the afternoon sessions there were a few of us with an interest in learning materials (and particularly Open Educational Resources) who had a think about what might be different about a tool for depositing learning materials in a repository (Rory McNichol, Richard Davies, Julian Tenney, Pat Lockley, Phil Barker, J.M.Gray, Antony Corfield and myself). In our discussions we didn’t talk that much about mechanisms but focused more on the features that such a tool might require. [Subsequently Phil has blogged an initial view on the possible deposit/harvest mechanisms http://blogs.cetis.org.uk/philb/2009/10/28/feed-deposit/ – his post is about the questions we need to address now; this post and our discussions on the day looked at the what next question]

Our short list of possible differences centred not so much on technical differences as on the importance of context, in particular the context of the use of the learning material. We thought that future developments should look not only at the deposit of a learning material but also consider the ongoing ‘deposit’ of usage information in some form, allowing the repository to gather feedback about the resource. From this point, it’s fair to say that our conception of a deposit tool veered somewhat towards including elements of a repository interface (tool or otherwise) that would allow discovery and ongoing data exchange about a learning material. As such the following isn’t so much a requirements specification as an attempt to pin down information from the user or other systems that would help improve how learning materials are managed and accessed.

Our shortlist of key features was:

richer user profiles both for depositors and users
resources to include a link to the source/ master object
import asset plus usage info (such as which courses it’s used for) from VLE
import asset plus usage info (such as comments and tags) from Web 2 tools
need support for institutional management and release of assets

Having written this I’m very aware that SWORD works because it’s so simple. Partly this is because putting papers into repositories is, mostly, a one directional technical process [it is of course a much more interactive social/political/administrative process] and SWORD has been very careful to limit what it is trying to do. Consequently any work in this area looking to expand the scope of deposit tool/repository interface functionality should be very cautious in adding mandatory extras. However, feedback and usage information are becoming increasingly important for scholarly communications, and data sets are likely to prove to be much more interactive resources (in a similar way to learning materials), as how they’re being used is key information. In a similar way institutions (as well as authors) are increasingly becoming the creators and/or distributors of resources so the ‘corporate’ deposit interface is likely to become more prominent.

Our discussion created more questions than answers in my mind, but it’s clear that, however deposit tools develop, we’d like them to be able to capture more context, but that this has to be done in lightweight ways that reuse rather than recreate information – we’ve had complex standards that ask for this type of information for a while but we have always asked users to input it.

Our full discussion is pictured below.

Notes about features of a repository deposit tool for learning materials

Notes from the web: Making Standards that Work and a Sordid History of Learning Object Repositories

Posted on November 5, 2009 by johnr

A few quick items of interest from the web this week. Two offer a perspective on the process of making standards (looking at OAI-PMH); another is an interview with Brian Lamb reviewing the history of Learning Object Repositories.

Talking to DC [Washington] (Adam Bosworth, Adam Bosworth’s Weblog)

In a post based on his experiences with standards development, Adam outlines seven guidelines for good standards development
http://adambosworth.net/2009/10/29/talking-to-dc/

Keep the standard as simple and stupid as possible.
The data being exchanged should be human readable and easy to understand.
Standards work best when they are focused.
Standards should have precise encodings.
Always have real implementations that are actually being used as part of design of any standard.
Put in hysteresis for the unexpected.
Make the spec itself free, public on the web, and include lots of simple examples on the web site.

Making Standards that Work (Dorothea Salo, The Book of Trogool)

http://scienceblogs.com/bookoftrogool/2009/11/making_standards_that_work.php
Reflecting on Adam’s post, Dorothea relates some of those principles to her own experience and view of standards development, in particular commenting on OAI-PMH.
OAI-PMH is an interesting example because it’s so widely used and, as Dorothea says, so simple. When it works, it works well (even if we might now like to change some of it to be more web friendly). However, metadata sharing works best in defined communities. When OAI-PMH doesn’t work, it’s a mess as frequently it’s the data harvesters who notice but who are dependent on the data providers (and potentially also their technical support) to change anything. Interestingly the OAI-PMH Static repository specification pushed some of the emphasis back onto the data provider – as their base information in xml had to be valid xml before it would be mediated by a gateway (but SRs are a whole other story with lots of potential but their own problems.)

The Sordid History of Learning Object Repositories or, a chat with Brian Lamb (Jim Groom, bavatuesdays)

One of the interesting things about the UKOER programme is how much freedom projects have to choose how they are going to store, describe, manage, and share their resources. They are using a wide variety of approaches, which include repositories, content management systems, and web pages of rss feeds. They’re also using a wide variety of ways to describe stuff. All of the approaches though are some way from the sort of educational world which the original learning object repositories envisaged and which Brian Lamb reflects on here:
http://bavatuesdays.com/the-sordid-history-of-learning-object-repositories-or-a-chat-with-brian-lamb/. I still think repositories have a lot to offer the management of learning materials but they’re not the only option, are better as part of a wider suite of tools, and I really hope they aren’t going to ask users about semantic density. To my mind Brian’s reflections highlight a number of reasons why the UKOER programme is implementation neutral.

I’ve also realised that I’ve not yet pointed to a related resource from CC Talks about OERs: A chat with Stephen Downes on OER (http://creativecommons.org/weblog/entry/17860). I was also going to talk about Pete Johnston’s latest installment about Simple Dublin Core, but that’ll have to wait for another day (http://efoundations.typepad.com/efoundations/2009/11/simple-dc-revisited.html).

Open Repositories 2009

Posted on June 5, 2009 by johnr

Held at Georgia Tech, Open Repositories 2009 hosted 326 delegates from 23 countries. The conference ran smoothly and managed to provide robust wifi for anyone who wanted it. The conference dinner was held at the Georgia Aquarium; the aquarium was inspiring (though the theme music was a bit much after a while) and the dinner was splendid.

Atlanta Aquarium Jellyfish

The conference proper once again provided a state of the art view of repository software and emerging trends in institutional (& organisational) approaches to managing digital assets. It proved to be a thought-provoking few days.

Of particular note was the impact of SWORD which was, frankly, everywhere. I’m not going to say a great deal about it beyond noting that:

at least for this conference, it has become the de facto standard for deposit tools
there are now a good number of desktop tools and application plug-ins supporting SWORD deposit

It will be interesting to see how it evolves from here. For those of you on Twitter @swordapp tweets updates of SWORD related developments.
[disclaimer: I have in the past worked with both of the SWORD project managers]

Another feature of the conference was another RepoChallenge. Sponsored by JISC and Microsoft this again attracted a lot of interest; David Flanders, the organiser, has blogged about the event, participants, and winners (Winner: MentionIt by Tim Donohue; Runner up: FedoraFS by Rebecca Koesar)
http://dev8d.jiscinvolve.org/2009/05/20/repochallenge-winners/.

I’ve got about 8 pages of tweets covering the rest of the conference – rather than bore you with them I’ll offer this wordle as a summary and instead comment on what I felt to be the most important developments.

I’ll also note in passing the approach one session took to supporting twitterers – programme session 7 a – chaired by Robert Macdonald http://twitter.com/mcdonald/ added a hashtag for their session, #ps7a, alongside #or09. This would allow the separate retrieval and analysis of that session’s tweets from the more general stream of conference tweets.

Trends that I noticed emerging from the conference:

managing datasets is firmly entering institutional agendas. In part this is pushed by funding bodies, in part through a desire for more open data.
the merger of the DSpace and Fedora organisations should provide a more stable future for the software platforms and in the longer term greater opportunities for collaborative development.
California Digital Library (John Kunze’s presentation) are shifting to repository microservices and beginning to move beyond single software products. (I couldn’t help being reminded of the eFramework but CDL seems to be beginning with a specific local business case)
there appears to be a growing interest in Open Access journal publishing in North America: both in the production of new journals and through collaboration with university presses; Microsoft Research have also developed a hosted open access journal service.
Zentity – it is unclear yet to what extent Microsoft Research’s repository will gain traction in the community but the impact of their engagement with the sector and tools like Zentity and the SWORD deposit plugin for MS Office is significant in itself.
@mire and mediashelf are two commercial companies heavily involved in the development of additional functionality or support services for DSpace and Fedora (respectively). It was striking how many projects doing innovative stuff had worked with one or other of them. Eprints has offered customisation, hosting, and related services for a while but that initiative emerged from the ECS department at Southampton where the software originated, whereas these two companies seem to have emerged more independently.

It was also interesting to note a few presentations touching on managing learning materials.

Other blogs reporting on OR09 are listed at:
http://repositorynews.wordpress.com/2009/05/28/open-repositories-2009/

Recent JISC project outputs at Open Repositories 2009

Posted on May 15, 2009 by johnr

RRT are attending Open Repositories 2009 in Atlanta.

Our poster highlights nine recent ready-to-use outputs from JISC’s repositories and preservation work that the conference might not otherwise have heard about.

RRT poster for OR09

Repository software update

Posted on April 17, 2009 by johnr

Over the past couple of months I’ve had a chance to hear updates from a number of repository software developers (at a Fedora training day, at DEV8D and on a number of blogs). Albeit slightly delayed by holidays, here’s a bit of a snapshot of where ePrints, DSpace, Fedora, and Microsoft’s repository are at. There’s a lot more information about Fedora than the others as I’ve heard a couple of updates from them. The usual caveat that I may have misunderstood what some of these are or how developed they are should apply. Much of this development is building up to releases at Open Repositories 2009.

Fedora

(in the process of writing I’ve noted that in-depth coverage of most of the Fedora items can be found on the fedora Hatcheck newsletter blog: http://www.fedora-commons.org/resources/newsletter.php )

Recent/current development

A new version of the rest api has been developed by the Fedora community
An updated Muradora front end.
“The Muradora project aims to develop a web front-end for Fedora repository and to re-factor Fedora authentication and authorization into pluggable middleware components.” Some of this security work is likely to become part of fedora proper
https://fedora-commons.org/confluence/display/MURADORA/Muradora
Islandora – new drupal based front end
http://www.fedora-commons.org/confluence/display/ISLANDORA/Islandora
An update of the fedora GSearch service plugin ( full text indexing) http://expertvoices.nsdl.org/hatcheck/2008/02/27/fedora-gsearch-20-release-takes-advantage-of-lucene/
Plug-ins for popular applications are under development (wikis, zotero, MSWord)

Developments (preOR09)

improve out of box administrative gui – move towards a web-based gui
improved api for backend storage (akubra api)
This is linked to discussions with DSpace and ePrints on a common storage abstraction to develop pluggable storage sub-system integration.
Support for SWORD 1.3

Longer term developments

Work on webdav – to lower ingest barriers by supporting drag and drop
More enhanced content models
Active Fedora (based on/similar to active record in Ruby)
Hydra – working towards an out of the box Fedora to support faculty in creating/storing objects directly; longer term support for more complex arrays of digital objects. http://www.fedora-commons.org/confluence/display/hydra/The+Hydra+Project

duraspace: DSpace and Fedora collaboration

http://expertvoices.nsdl.org/hatcheck/2008/11/11/dspace-foundation-and-fedora-commons-receive-grant-from-the-mellon-foundation-for-duraspace/
Moving to sharable module development – the initial project will be the development of a storage module. The investigation of a possible durable storage service layer (broker) offering: pluggable storage, ‘Cloud’ storage, ‘interCloud’- university offered storage services

DSpace

Jim Downing presented an update on DSpace at Dev8D but (afaik) most of what he presented either related to the work on duraspace mentioned above or is now part of the new 1.5.2 DSpace release. The details of this release have been summarized in Stuart Lewis’s blog post http://blog.stuartlewis.com/2009/04/15/dspace-152-whats-in-it-for-me/. A few of the new things from his highlights are:

Support for SWORD 1.3
“Shibboleth support has been added.”
More refined ldap integration options
support for uketd_dc and exposing it via OAI-PMH (out of the box)
export tools have been improved

ePrints

ePrints is now around 10 yrs old and despite close ties to the Open Access movement, ePrints is also developing support for the gamut of institutional processes. In particular, it’s developing greater support for statistics, research management, and better desktop integration.

ePrints are planning to have a beta version of ePrints 3.2 by OR09. Key updates planned for this release:

metadata values can now be resolvable references not just literals.
extendable data types – will support CERIF (Common European Research Information Format) out of the box.
integration of work of IrStats project – stats and citations improvements to system.
Support for a tiered storage layer plugin. http://repositoryman.blogspot.com/2009/02/cloud-researcher-and-repository.html

Edit: a fuller list of updates in this release is available http://wiki.eprints.org/w/New_Features_Proposed_for_EPrints_3.2

Microsoft

Microsoft Research’s team working on repositories and scholarly communications have produced a number of free tools based on Microsoft products (http://www.microsoft.com/mscorp/tc/scholarly_communication.mspx). I’ve talked about the Creative Commons plugin before but they’ve also developed beta versions of an ejournal service, a document conversion service, an ontology plugin for Word, a research information centre (with the British Library), and a research repository. They’ve also worked with ePrints to develop a Windows-based version of ePrints.

Version 1 of the research repository is going to be formally released at a workshop at OR2009 (https://or09.library.gatech.edu/workshops.php). Work on related tools for the desktop and mobile devices is planned after this launch.

The debate about free / somewhat open tools built on commercial products is a separate issue but it’s worth remembering that most institutions are going to have and support all the required commercial software anyway – irrespective of which repository software they consider (I’ll come back to this in another post).

Microsoft have also released some of their development tools to education. In an initiative called dreamspark users can download full versions of Microsoft development software under an academic license. Computer Science departments have had this sort of deal for a while but the two good things about this are: it’s open to any student/ academic and it’s no longer ‘mediated’; rather, it uses Shibboleth and your own institutional login to verify status.

Notes from the web: metadata related reports

Posted on March 11, 2009 by johnr

There have been two reports relating to metadata released recently that I’ve been meaning to read and blog about: OCLC’s What We’ve Learned from the RLG Partners Metadata Creation Workflows Survey and DLF’s Future Directions in Metadata Remediation for Metadata Aggregators. However, I’m not going to get a chance to do more than skim these for a while – so while they’re fresh here are the links.

What We’ve Learned from the RLG Partners Metadata Creation Workflows Survey

http://www.oclc.org/programs/publications/reports/2009-04.pdf
There’s an interesting comparison of tools and people used to create MARC and non-MARC materials.
Of particular interest is how little libraries seem to be involved in ‘educational’ metadata – 8 out of 78 respondents are involved in creating metadata for learning objects. Now I know there’s a world of difference between learning materials with educational metadata and learning object metadata but looking at the metadata standards being used, educational description still seems to be out of scope. It is also of note how little of the respondents’ metadata is currently being pushed to/ used by newer forms of exposure – web2.0 tools and SRU.

Future Directions in Metadata Remediation for Metadata Aggregators

http://www.diglib.org/pubs/dlf110.pdf
Many repository and digital library services operate on the premise that exposing their metadata will enable information about their content to be made available in larger resource discovery services that aggregate metadata from many sources. Such services provide a valuable service but often have well documented problems with variation in the metadata which they harvest. Whether the metadata is of poor quality or simply designed to support local needs without consideration of a wider context, aggregated metadata needs to be cleaned and otherwise processed to provide a better discovery service. This report examines key services/ features that an aggregated search service would hope to provide and for each documents how metadata supports that service, what tools exist to ‘fix’ harvested metadata, what tools are desired and provides a bibliography and comments. As such this report should represent an overview of the state of the art (and I really need to read it soon…).
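
To give a flavour of the kind of ‘fixing’ an aggregator might do, here is a minimal Python sketch of cleaning a single harvested record. It is my own illustration, not a tool described in the report: the record shape (a dict of field names to lists of strings), the function name, and the specific clean-up rules are all assumptions.

```python
import re


def remediate_record(record):
    """Return a cleaned copy of a harvested record.

    Assumes `record` maps field names to lists of string values.
    Field names are lower-cased, whitespace is collapsed, and empty or
    duplicate values (ignoring case) are dropped.
    """
    cleaned = {}
    seen = {}
    for field, values in record.items():
        name = field.strip().lower()
        for value in values:
            # Collapse runs of whitespace (including line breaks) to one space.
            text = re.sub(r"\s+", " ", value).strip()
            key = text.casefold()
            if not text or key in seen.setdefault(name, set()):
                continue
            seen[name].add(key)
            cleaned.setdefault(name, []).append(text)
    return cleaned


if __name__ == "__main__":
    harvested = {
        "Title": ["  A  study of\nfeeds "],
        "subject": ["Metadata", "metadata ", ""],
        "Subject": ["OER"],
    }
    print(remediate_record(harvested))
```

Real remediation goes well beyond this (date normalisation, vocabulary mapping and so on), but even simple rules like these remove a lot of the variation between sources.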

Reflections on dev8D: vle and repositories sessions

Posted on March 10, 2009 by johnr

#dev8D

Developer happiness days http://www.dev8d.org/ was a week long JISC-sponsored event organised by David Flanders and Andy McGregor. It set out to bring “together the cream of the crop of educational software developers along with coders from other sectors, users, and technological tinkerers in an exciting new forum.” The event was a success and its own blog http://dev8d.jiscinvolve.org/ contains short profiles of some of the development projects that sprang up and interviews with some key developers in the domain. As I’m more of a user than a programmer I attended the two community days of dev8D. The first day I went to a session on virtual learning environments in the morning and a session on repositories in the afternoon. The second day was also about repositories with updates and plans from four repository systems (ePrints, DSpace, Fedora, and Microsoft’s ‘Famulus’) – I’ll blog about that day and the updates and plans of the other repository systems separately.

VLE session

The vle session kicked off with three mini case studies from members of the vle teams at Birkbeck, Imperial, and LSE who are using Blackboard, WebCT->Blackboard, and Moodle respectively. The presenters talked about the different integration or development issues ongoing in their institution. There was then some general discussion and demos of Blackboard 9 and Sakai. The session identified three key areas for development (one from each presenter but fitting the experience of those present as a whole); these are:

support for anonymous marking;
automation of enrollment (at module level/ integration with registry systems);
integration with learning object repository.

There was one development team in the room but they were already working on their project – the SpACE tool (Blackboard API and IMS Tools Interoperability specification), being codebashed by a team from Edinburgh, Strathclyde, and Blackboard. See http://spvsoftwareproducts.com/powerlinks/space-w/ for more details.

Repositories session

The afternoon session on repositories went quite differently with Les Carr steering us to think about repository heresies, and question the current norms within the repository community. There was a lively discussion which ended up clustering around a couple of key themes: the problem of managers shaping development, the problem of the paper-based format, and the opportunity of preservation. The discussion roughly went as follows:

The problem of managers shaping development

As institutional managers, driven by new models of research assessment or demonstrating value, become more interested in statistics there is a risk that development of repository software may be skewed to focus on support for reporting functions at the expense of other, more critical, development [such as functionality to support content ingest, content visibility, and end user services]. There was a general consensus that, in part, this concern is obviated as long as the repository provides suitable APIs and access. Much of the data needed for institutional reporting should be able to be provided to external applications – the repository software itself doesn’t need to be customised to include these functions.

The problem of the paper-based format

There was a clear feeling that repositories are still tied to paper-based formats; organisationally and technically they are not particularly suited to web-enhanced documents and born digital/linked documents. This is not to say that they can’t cope with such documents, but that they don’t cope with them well and inevitably stifle their richness. Participants noted that there needs to be a revolution in publishing to create web native publications. One area where this is beginning to happen is in the repository-supported linking of datasets and publications. There is, however, an even greater potential to enhance articles through supporting better facilities to link articles and comment inline.

The opportunity of preservation

Throughout the discussion there seemed to be an ongoing thread about the role of repositories in preservation. This touched on many areas including the problems with pdfs (both as a web-based format and as a preservation format) and the possible role for repositories in overcoming difficulties in preserving wikis and some web2.0 content. The underlying thread seemed to be that a repository is more of a state of mind than a particular piece of software. An institutional repository should be able to change between products or switch between all-in-one repositories and suites of tools without fundamentally changing what it does.

demos

ePrints Soton demonstrated a javascript plugin that automatically scans a webpage for citations and creates previews from an identified repository on the fly
a research community in the humanities which is using WordPress was demonstrated http://ap0riasofar.wordpress.com/ (I think it’s providing a forum for discussion around bits of data but I’m not exactly sure; the site’s about page linked to a youtube video but the video has been removed…)
Indirectly splashurl was demonstrated – this creates a shortened url or QR code and displays it in a large font on the webpage for projection splashurl.net

reflections:

vle session

The presentations were interesting but there was perhaps a slight mismatch as, at least initially, the presenters were speaking to the audience as if we were developers. Unfortunately developers were thin on the ground in our session as it suffered from being in parallel not only with a strand on OPACs but also with the Dragon’s Den event for developers; as a result I suspect our session had many more users/ vle administrators than developers.

repositories session

Our discussion about repositories kept returning to preservation. Although I think this is a vital role that repositories have and there is much to discuss about how well repositories preserve stuff, I feel very uneasy about the dominance of this idea and its apparent status as the key use case. Questions about repositories, preservation, and learning materials are a blog post in their own right but my concern with preservation as the use case for repositories is, in part, simply that it doesn’t sell particularly well, it’s really quite unproven, and frankly we’ve had the idea of a single key use case before with Open Access (which was hardly mentioned in the discussions). The reason repositories (in the technical and organisational sense) work is that they don’t just do one thing. They may provide the basis for initiatives for any of the following: open access, preservation, institutional research management, knowledge management, asset management and storage, and new forms of publication.

Having expressed that concern, I’d note that the discussion about the role of repositories in archiving web2.0 and web native publication formats was really useful and reinforced the idea that repositories may be maturing to the point where they are able to become part of the background/ institutional infrastructure.

A wordle of my tweets during the repository session is available http://tinyurl.com/b5wncy

I’m glad I was at dev8d but arriving as the coding at the event tailed off meant I missed much of the frenetic bar camp atmosphere and, as it worked out, saw very little of the coding projects in progress. I can appreciate why the dragon’s den wasn’t open but hope that any future events find a way to showcase the projects in progress a bit more. As it was, much of the coding seemed to pass Thursday’s events by.

An ecology of repositories?

Posted on December 3, 2008 by johnr

[this is a copy of my post on the Repository Research Team blog http://jiscrrt.wordpress.com/]

In issue 57 of Ariadne, Phil, Mahendra, and myself have an article introducing some of our work on ecological models of repository and service interaction. “A Bug’s Life?: How Metaphors from Ecology Can Articulate the Messy Details of Repository Interactions”

In our introduction we outline the issues that our work is addressing.
“The development, implementation, and support of real services challenges how we have traditionally articulated, represented, and tried to communicate the context of those services. We need abstract visions of an information environment, recommended standards, and models of software architectures (or component software functions) that can inform how we begin to develop local repositories and services. However, we, as a community of managers, librarians, researchers, and developers of technology, also need approaches that help us engage with the complex details of local contexts that shape how and why particular repository implementations succeed or fail”

The article goes on to outline different types of complex systems that exist, provides an overview of some relevant concepts from ecology and how they might be of use, and provides an example of using them to examine an academic’s dissemination of presentations (which was also the subject of our poster at DC2008).

Our article is available here http://www.ariadne.ac.uk/issue57/robertson-et-al . Comments and feedback are very welcome.