John Robertson » Jorum (Cetis Blogs): http://blogs.cetis.org.uk/johnr

UKOER 2: Dissemination protocols in use and Jorum representation
http://blogs.cetis.org.uk/johnr/2011/08/26/ukoer-2-dissemination-protocols-in-use-and-jorum-representation/
Fri, 26 Aug 2011 16:01:54 +0000

What technical protocols are projects using to share their resources, and how are they planning to represent their resources in Jorum? This is a post in the UKOER 2 technical synthesis series.

[These posts should be regarded as drafts for comment until I remove this note]

Dissemination protocols

[Chart: Dissemination protocols in use in the UKOER 2 programme]

The chosen dissemination protocols are usually already built into the platforms in use by projects; adding or customising an RSS feed is possible but often intricate, and adding an OAI-PMH feed is likely to require substantial technical development. DelOREs investigated existing OAI-PMH plugins for WordPress that they could use but didn’t find anything usable within their project.

As will be discussed in more detail when considering Strand C, the programme’s evidence suggests that RSS is not only the most widely supported dissemination protocol but also the most used in building specialist discovery services for learning and teaching materials. The demand for an OAI-PMH interface for learning resources remains unknown. [debate!]

Jorum representation

[Chart: Methods of uploading to Jorum chosen in the UKOER 2 programme]

  • The statistics on Jorum upload method should be read as expressions of intent – projects and Jorum are still working through these options.
  • Currently RSS upload to Jorum (along with all other forms of bulk upload) is set up to create a metadata record, not to deposit content.
  • Three of the uploaders using RSS are using the EdShare/EPrints platform (this platform was successfully configured to deposit metadata in bulk via RSS into Jorum in UKOER phase 1).
  • Jorum uses RSS ingest as a one-time process – as I understand it, it does not revisit the feed for changes or updates [TBC].
  • As far as I know, PORSCHE are the only project who have arranged an OAI-PMH based harvest (experimental OAI-PMH harvest for Jorum upload is under investigation as part of an independent project – thanks to Nick Shepherd for the update on this HEFCE-funded work: see the comments; more information is available on the ACErep blog).
OER Hackday: initial reflections
http://blogs.cetis.org.uk/johnr/2011/04/05/oer-hackday/
Tue, 05 Apr 2011 13:25:37 +0000

On Thursday and Friday CETIS and UKOLN ran OERHack.

[Wordle image: OERhack1]

Last time I counted, we had a little over 250 tweets.

Once we take ‘OERHack’ and RT out of the picture we see:

[Wordle image: Oerhack 3]

250 tweets isn’t that many for two days and ~40 people, but that’s because everyone was busy.

Before the event we had some discussions on blogs.

OER Hack day Wiki

Ideas that we outlined but didn’t develop fully:

  • PORSCHE thoughts
  • Document Import/Export Service
  • OER Playlist picker
  • Additions to OERbit
  • Prototyping a new OERca

Things that got hacked (and more or less documented)

  • Extend the utility of WordPress as a host / presentation vehicle for OER collections.
  • Bookmarking tools for OER
  • Generating Paradata from mediawiki pages
  • Hacking a Google CSE for course directories
  • Metadata Extraction Tools
  • SWORD desktop app
  • Email-based deposit plugin for SWORD
  • JS widget to Wookie widget (Jorum and OER Recommender)

These are described in more detail on the wiki, along with some notes about other things we tested and a useful list of WordPress plugins for learning resources. A fuller write-up about the event will be forthcoming, but we (CETIS OERTIG, UKOLN DEVCSI, and everyone else) had a productive, fun, and busy two days.

RSS for deposit, Jorum and UKOER: part 2 commentary
http://blogs.cetis.org.uk/johnr/2010/02/04/rss-for-deposit-jorum-and-ukoer-part-2-commentary/
Thu, 04 Feb 2010 14:23:17 +0000

Following on from part 1, which reviewed Jorum’s requirements for RSS-based deposit, this post synthesises the comments and feedback emerging in response to it.

Community views

In response to the requirements and position papers, a number of feeds were submitted for testing and there has been some thoughtful reflection on the issues on blogs and by email. This is a brief summary of responses to the key issues:

Issues around generating the RSS

Although most platforms in use can easily create RSS feeds, and some can create a feed from any search result, it has become clear that the RSS profile created is frequently fixed and does not match the profile requested by Jorum (which is very similar to the profile suggested by OCWC).

Irrespective of the repository software or other OER management ‘platforms’ in use, adjusting the RSS output profile has proved to be a non-trivial task. Emerging issues in adjusting the RSS outputs include:

  • Users of commercial platforms may have to rely on the company’s developers and development schedule.
  • Open source platforms may require additional local coding or, at the least, adjusting an XSLT.
  • It is likely that the RSS output of web 2.0 platforms will simply not be editable.

In all three cases there may also be possible solutions that use independent tools, such as Yahoo Pipes, to process the feed after production or to create a feed from another interface. However, such an approach to adjusting the RSS profile is still either reliant on the information present in the original source feed, or dependent on adding standard profile information or extracting additional information and creating a new feed; a sketch of this kind of reprocessing is given below. See for example http://repositorynews.wordpress.com/2010/01/07/really-not-so-simple-syndication/.
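To make this concrete, here is a minimal Python sketch of that kind of post-hoc reprocessing (outside Yahoo Pipes). It fetches an existing RSS 2.0 feed, adds a dc:rights element to any item that lacks one, and writes out a new feed. The feed URL and licence text are placeholders, it assumes the source feed is well-formed RSS 2.0, and it obviously cannot conjure up metadata that the source feed does not contain.

```python
# A minimal sketch of reprocessing an existing RSS 2.0 feed so that each item
# carries a dc:rights statement. The feed URL and licence text are placeholders.
import urllib.request
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

FEED_URL = "http://example.org/oer/feed.rss"   # hypothetical source feed
LICENCE = ("Licensed under a Creative Commons Attribution-NonCommercial-"
           "ShareAlike 2.0 Licence - see "
           "http://creativecommons.org/licenses/by-nc-sa/2.0/uk/")

with urllib.request.urlopen(FEED_URL) as response:
    tree = ET.parse(response)

for item in tree.getroot().iter("item"):
    # Only add rights information where the source feed has none.
    if item.find("{%s}rights" % DC_NS) is None:
        rights = ET.SubElement(item, "{%s}rights" % DC_NS)
        rights.text = LICENCE

tree.write("reprocessed-feed.xml", encoding="utf-8", xml_declaration=True)
```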

XML validity

Xpert note that of the 60 feeds they harvest, 5 aren’t valid XML and 20 aren’t valid RSS (see the comment on Lorna’s post). It is worth noting that aggregators can and do deal with poorly formed data; however, within the timescale of the programme the kind of manual effort involved in dealing with poorly formed feeds (and the quality of the item metadata they would generate) is not likely to be supportable. A basic well-formedness check of the sort sketched below is cheap to run before submitting a feed.
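By way of illustration, a rough well-formedness check using the Python feedparser library is given below; feedparser sets a "bozo" flag when a feed is not well formed. The feed URL is a placeholder, and this is only a first-pass sanity check, not a validation against Jorum's profile.

```python
# A rough well-formedness check for a feed, assuming the feedparser library
# is installed (pip install feedparser). The URL is a placeholder.
import feedparser

feed = feedparser.parse("http://example.org/oer/feed.rss")

if feed.bozo:
    # bozo is set when the feed is not well formed; the exception explains why.
    print("Feed is not well formed:", feed.bozo_exception)
else:
    print("Parsed %d items" % len(feed.entries))
    for entry in feed.entries:
        # Missing titles or links are a hint that item metadata will be thin.
        if not entry.get("title") or not entry.get("link"):
            print("Item with missing title or link:", entry.get("id", "<no id>"))
```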

Conclusion

With the exception of some web 2.0 platforms and (potentially) some commercial repositories, a revised or reprocessed RSS feed meeting Jorum’s requirements is, in theory, a possibility. However, few projects currently produce RSS meeting Jorum’s required profile, and from the project trials thus far only one project has been able to conform sufficiently to permit a successful harvest. In the context of UKOER, adjusting the RSS profile requires the right congruence of platform(s), skills, and time within the project team, so it is unlikely that this solution will work for all projects that need a bulk upload option. In terms of the longer-term feasibility of using RSS to facilitate bulk deposit, this may change over time, particularly if the OCWC profile is adopted more widely.

Issues around metadata

The following issues were noted about the feed content:

  1. There is tentative agreement that it would be good if RSS feeds used a DC namespace where possible and, ideally, supported the OCWC profile.
  2. The addition of custom elements (for example for the purposes of tracking OER currency) is not regarded as a good idea.
  3. It cannot be taken for granted that the item identifier in a feed is the same as the identifier of the OER within the platform.
  4. It cannot currently be assumed that the item identifier in a feed is either unique or persistent. This is a critical issue for processing the feed (a quick automated check for this and for missing rights information is sketched after this list).
  5. There may be multiple feed identifiers for a given OER.
  6. Feeds may contain more than one namespace and/or more than one feed format.
  7. In a number of repository platforms the identifier supplied frequently points to the splash page rather than the OER itself. This is an issue if the resource itself is to be harvested.
  8. Few feeds have rights information for the items or for the feed itself. Including this information is regarded as good practice.
    1. Feeds that use one of the variants of Creative Commons encoding may allow aggregators and Jorum to provide enhanced services.
    2. Projects should clearly license their feeds and underlying items.
    3. In the last resort projects should state clearly on their site or by telling Jorum what rights and licensing exists in connection to their feeds.
  9. Metadata quality (including completeness) in feeds is variable.
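As a rough illustration of how a project might spot the identifier and rights problems above before submitting a feed, here is a short sketch using the Python feedparser library (which normalises dc:rights and copyright elements into a single "rights" field). The feed URL is a placeholder and the checks are deliberately simplistic.

```python
# A sketch of checking feed items for duplicate identifiers and missing rights
# information, assuming feedparser is installed. The feed URL is a placeholder.
from collections import Counter

import feedparser

feed = feedparser.parse("http://example.org/oer/feed.rss")

# Item identifiers should be unique; falling back to the link is itself a smell.
ids = [entry.get("id") or entry.get("link") for entry in feed.entries]
for identifier, count in Counter(ids).items():
    if identifier and count > 1:
        print("Identifier used by %d items: %s" % (count, identifier))

for entry in feed.entries:
    # feedparser maps dc:rights / copyright onto the 'rights' key.
    if not entry.get("rights"):
        print("No rights information for:", entry.get("link", "<no link>"))
```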

Issues around processing the RSS

There are a number of issues about feed size, currency, updating, identifiers and OER deletion but these depend on whether the service is collecting information to help point to current OERs (like an aggregator) or whether it is seeking to provide a central collection of OERs (a library – even if some of those OERs are actually elsewhere). These distinctions are not clearly made in much of the discussion.

Feed classification

Pushing everything into one classification seems to negate the classification work done by projects and many projects will be producing OERs with more than one JACS code. This could be addressed by having multiple feeds per platform (either from the platform or by subsequently dividing the feed) but there are potential duplication management issues with multiple feeds.

Feed setup

News feeds are the most common use of RSS or Atom; they are typically limited to a fixed number of the most recent items (this does not preclude multiple subject-based feeds from a repository, as mentioned above). They can, however, contain the entire contents of the repository or all the results for a search term.

If the feed contains the entire contents of the repository (or search) it inevitably becomes very large. Large feeds tend to time out in browsers and can be difficult to ingest (as outlined in Xpert’s paper). However, many aggregator services prefer this approach as it provides a straightforward way to maintain currency, avoid duplication, and not have to consider partial deletion. This is because each time the feeds are polled the previous index created by the aggregator is deleted; only the content currently in the feed is indexed.

Magazine-type feeds are the most common form of RSS and are more likely to be the default feed produced by repositories or other platforms; they are usually small. However, to build an aggregation service or collection based upon them would require items from feeds to be stored in an incrementally built index (i.e. new items from feeds are added to a persistent index that retains their information even after they are no longer present in the feed); a rough sketch of this approach follows. This works only if unique and persistent identifiers for feed items or OERs are included in the feed record and OERs do not end up with multiple feed identifiers.
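For illustration, a minimal sketch of the incremental approach is given below (this is not Jorum's or any aggregator's actual implementation). Each time the feed is polled, any item whose identifier has not been seen before is added to a persistent index kept in a local JSON file; the feed URL and file name are placeholders and the feedparser library is assumed.

```python
# A sketch of incrementally building an index from a magazine-style feed.
# Assumes feedparser is installed; the feed URL and index file are placeholders.
import json
import os

import feedparser

INDEX_FILE = "oer_index.json"
FEED_URL = "http://example.org/oer/recent.rss"

# Load the persistent index (identifier -> stored metadata), if it exists.
index = {}
if os.path.exists(INDEX_FILE):
    with open(INDEX_FILE) as f:
        index = json.load(f)

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Fall back to the link when no guid is supplied; this is exactly where
    # non-unique or non-persistent identifiers cause duplicates.
    identifier = entry.get("id") or entry.get("link")
    if not identifier or identifier in index:
        continue
    index[identifier] = {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "updated": entry.get("updated", ""),
    }

with open(INDEX_FILE, "w") as f:
    json.dump(index, f, indent=2)
```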

OER currency

I’d suggest that the discussion about how to tell if a given OER has been updated is a management and policy question to do with versioning, and should be out of scope for this discussion. If a URI/URL is provided for an OER, I think subsequent versions of the OER should have different URLs, as they are different things! There is a difference between an academic’s view of an OER as constantly in flux and a digital asset management perspective, which needs a clear notion of the persistence or fixity of a released OER.

Feed currency

How often feeds should be polled to check for new items is something which has to be agreed. It is affected by the type of feeds being consumed (magazine feeds will need to be polled more frequently) and will have an impact on a number of issues, including the performance of the index.

Upload

Jorum have currently indicated that uploading OERs via RSS is out of scope. Upload would probably require some form of persistent and locally unique identifier for each OER to be included in the feed.

Deletion

There are wider questions in connection with deletion from Jorum, but in the context of RSS link deposit, deletion is only an issue if Jorum opts for some form of incrementally built index. RSS is not designed to manage the deletion of items.

Overview of combinations of RSS options

I’ve created this table to try to pull together some of the interdependent issues relating to feed processing.

|                    | A                          | B                            | C                          | D                             | E                             | F                                |
|--------------------|----------------------------|------------------------------|----------------------------|-------------------------------|-------------------------------|----------------------------------|
| Feed type          | Feed of all OERs           | Feed of all OERs             | Subject feed of OERs       | Subject feed of OERs          | Magazine feed of OERs         | Magazine feed of OERs            |
| Feed size          | Very big                   | Very big                     | ‘Medium’                   | ‘Medium’                      | Small                         | Small                            |
| Update             | Replace                    | Incremental addition         | Update                     | Incremental addition          | Incremental addition          | Update                           |
| Coverage           | Whole current collection   | Whole cumulative collection  | Current subject collection | Cumulative subject collection | Whole collection (gradually)  | Transient snapshot of collection |
| Deletion           | Occurs as feed is replaced | Does not occur automatically | Occurs as feed is replaced | Does not occur automatically  | Does not occur automatically  | Occurs as feed is replaced       |
| OER deduplication? | Not significant            | Issue                        | Minor issue                | Issue                         | Issue                         | Not significant                  |

Other options

OAI-PMH

As a precursor: JorumOpen does not currently act as an OAI-PMH harvester, so this is a somewhat moot point (note: the software required to harvest metadata over OAI-PMH is distinct from the software needed to expose it for harvesting).

Within the programme not all OER producers are using repositories and, of those that are, not all repositories have OAI-PMH enabled. So, although there is some established practice of harvesting metadata via OAI-PMH, it would be at best a partial solution.
There are pieces of software which can add support for OAI-PMH export, but adapting and implementing them creates an additional development task for projects.

OAI-PMH harvesting has some built-in support for resumption (incremental harvesting of metadata from large repositories) and some support for record deletion, though this is not always well implemented.
OAI-PMH harvesting services have a mixed record – a key point of note is that they invariably need time to set up.
OAI-PMH harvesting will face many of the same issues as RSS harvesting, in that identifiers will point to splash pages and the resources themselves are not harvested.
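For comparison with the RSS sketches above, harvesting metadata from a repository that does expose OAI-PMH is typically only a few lines with an off-the-shelf client. The sketch below uses the Python Sickle library against a placeholder endpoint; note that, as with RSS, it retrieves metadata records (and deletion flags, where the repository supports them) rather than the resources themselves.

```python
# A sketch of harvesting Dublin Core metadata over OAI-PMH, assuming the
# Sickle library is installed (pip install sickle). The endpoint is a placeholder.
from sickle import Sickle

sickle = Sickle("http://example.org/repository/oai")

# ListRecords handles resumption tokens internally, so large repositories
# are harvested incrementally across multiple requests.
for record in sickle.ListRecords(metadataPrefix="oai_dc", ignore_deleted=False):
    header = record.header
    if header.deleted:
        # Deleted records are only flagged if the repository supports this.
        print("Deleted:", header.identifier)
        continue
    titles = record.metadata.get("title", [])
    print(header.identifier, titles[0] if titles else "<no title>")
```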

SRU

As a precursor: JorumOpen does not support SRU based harvesting so this is another moot point.
Considering SRU would require this functionality both in the contributing repositories or other platforms and in Jorum. It is, however, not yet widely supported or used, though some commercial repositories do implement it and there are open source clients to bolt on to repositories or other platforms. This requires developer time and is likely to be a partial solution only.

Deposit API?

The current deposit tool is based on the third-party software MRCUTE, which runs through Moodle – as such it cannot easily be adapted by Jorum to provide an API.
Jorum are, however, exploring the addition of a SWORD deposit endpoint. Suitable SWORD deposit tools would need to be identified (i.e. those that can handle the right metadata and cope with something that isn’t a research paper – given the research focus of SWORD tool development, these are likely to be some of the less-developed tools).
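To give a flavour of what a SWORD-based route would involve, here is a very rough sketch of a SWORD v1-style deposit as a plain HTTP POST using the Python requests library. The endpoint URL, credentials, package file, and packaging identifier are all placeholders and assumptions; a real Jorum SWORD endpoint would dictate the exact headers, package format, and metadata it accepts, and a purpose-built SWORD client would normally be used instead.

```python
# A very rough sketch of a SWORD-style deposit: an HTTP POST of a packaged
# resource to a collection/deposit URI. Endpoint, credentials, packaging
# identifier, and file are all placeholders, not a real Jorum configuration.
import requests

DEPOSIT_URI = "http://example.org/sword/deposit/oer-collection"   # hypothetical

with open("oer-package.zip", "rb") as package:
    response = requests.post(
        DEPOSIT_URI,
        data=package,
        auth=("project-account", "secret"),            # placeholder credentials
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": "filename=oer-package.zip",
            # SWORD v1 endpoints typically expect a packaging identifier:
            "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
        },
    )

# A successful SWORD deposit normally returns 201 Created plus an Atom entry
# describing the newly created item.
print(response.status_code)
print(response.text[:500])
```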

RSS for deposit, Jorum and UKOER: part 1 review
http://blogs.cetis.org.uk/johnr/2010/02/04/rss-for-deposit-jorum-and-ukoer-part-1-review/
Thu, 04 Feb 2010 14:17:20 +0000

Over the past few months CETIS and Jorum have been discussing approaches to bulk deposit to support the projects in the UKOER programme as they deposit or represent their OERs in Jorum. Based on feedback gathered through our technical reviews of projects, we’ve investigated approaches which might work for the programme.

One option we have investigated is the use of RSS. Gareth Waller from Jorum produced a set of feed requirements and a discussion paper suggesting possible issues with the use of RSS. A number of projects have trialled their feeds and provided feedback on Lorna’s blog post introducing Gareth’s paper and outlining the issues. The Xpert project has also produced a briefing paper looking at issues around RSS-based deposit (Considerations and evaluations of the development of distributed repositories when using RSS aggregation as a submission protocol, by Pat Lockley, The University of Nottingham: http://webapps.nottingham.ac.uk/elgg/xpert/files/-1/803/xpert+metadata+final.pdf).

Many thanks to Gareth and Laura from Jorum and everyone else who’s contributed to the discussion thus far. This post is a summary of that discussion and comment about other options suggested.

Please note this conversation is shaped by the constraints of the programme. The discussion below focuses on the relation of a single OER project producing a feed or feeds of resources to contribute to Jorum. Issues of how Jorum addresses and combines data feeds from different projects and provides standardised data are a separate discussion.

RSS

Suitability

Although submission to a repository isn’t the primary purpose of RSS, it does have functionality and features that may make it suitable for such a purpose. The investigation of RSS as an option for submitting content to Jorum began with the observations:

Jorum’s requirements

Jorum produced an outline of their minimum requirements for feed-based ingest and a briefing paper summarizing their current take on issues around RSS for deposit.

Feed format and content

Jorum’s current requirements are as follows (an illustrative sketch of a feed along these lines is given after the list):

  1. RSS version 2.0 feed
  2. At least one element belonging to one of the following namespaces directly under the channel element. Metadata for all items must be represented in elements belonging to this namespace.
    1. http://www.imsglobal.org/xsd/imsmd_v1p2
    2. http://ltsc.ieee.org/xsd/LOM
    3. http://purl.org/dc/elements/1.1/
  3. Licence information on each item (in the relevant metadata element). This must contain a v2 Eng & Wales CC licence URL, e.g.
    1. DC : rights e.g. Licensed under a Creative Commons Attribution – NonCommercial-ShareAlike 2.0 Licence – see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
    2. IMSMD : rights/description/langstring
    3. LOM: rights/description/string”
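As a rough illustration (one reading of the requirements above, not an official Jorum example), the sketch below generates an RSS 2.0 feed whose item metadata is carried in Dublin Core elements, including a dc:rights statement containing the CC BY-NC-SA 2.0 England & Wales licence URL. All titles, links, and identifiers are placeholder values; in practice the feed would be produced by the project's own platform.

```python
# A sketch of generating an RSS 2.0 feed with Dublin Core item metadata of the
# kind described above. All values are placeholders.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example OER project feed"
ET.SubElement(channel, "link").text = "http://example.org/oer/"
ET.SubElement(channel, "description").text = "OERs released by the example project"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Introductory statistics lecture slides"
ET.SubElement(item, "link").text = "http://example.org/oer/123"
ET.SubElement(item, "{%s}identifier" % DC_NS).text = "http://example.org/oer/123"
ET.SubElement(item, "{%s}rights" % DC_NS).text = (
    "Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike "
    "2.0 Licence - see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"
)

print(ET.tostring(rss, encoding="unicode"))
```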

Feed processing

Jorum currently processes the feed as follows:

  1. “The feed is *not* continually polled for new content. […] The current functionality simply reads the feed when it is deposited and all the items are created in DSpace. It’s a snapshot in time of that RSS feed. If you add in the same feed again, it will store duplicates.
  2. The physical data of a resource in the feed is not stored in JorumOpen. A link is simply created pointing to the resource as indicated by the RSS feed (the “link” element).”
  3. “The feed MUST be valid XML – if the XML coming back isn’t valid in the first place then we cannot process it (neither can any validator, XML reader etc). ”
  4. “Items within a feed are not auto classified within Jorum. In other words, every item in a feed is stored within a single collection as chosen by the admin user i.e. a top level JACS or LearnDirect classification. Having individual feed for each classification such as the OpenLearn model would ensure that items are classified correctly as these feeds can be deposited separately.”

Possible issues about the use of feeds

In his paper Gareth raises a number of issues and questions including the following:

  1. RSS items need to contain the unique id of the OER
  2. It’s not yet clear how to tell from the feed if an OER has changed or been deleted
  3. Feeds should not contain the whole repository contents
  4. There is the possibility that OERs might fall between arbitrary limits for feed creation (a feed of the 50 most recent items polled every day misses resources above this number)
  5. The richness of the metadata which exists within the platform creating the RSS may be restricted to a subset of the available fields by the feed creation or feed consumption process.
  6. Feed deposit needs to make assumptions about licensing
  7. Current exploration of feed deposit relates only to harvesting metadata, not to harvesting resources.

Part 2 of this post will look at the community responses to this proposal and at the emerging issues.
