The #cetis10 Locate, Collate and Aggregate extravaganza

Posted on November 8, 2010 by Lorna Campbell

Next week Phil, John and I will be running a session at the JISC CETIS conference with the snappy title Locate Collate and Aggregate. The aim of this session is to explore innovative technical approaches related to, but not confined to, the JISC / HEA OER 2 Programme which are applicable to finding, using and managing content for teaching and learning, including:

Building collections of OERs
Drawing together information about learning resources
Building rich descriptions from disparate sources of information

We’ve got an eclectic bunch of contributors lined up including David Kay, Sero; Vic Lyte, MIMAS; James Burke, deBurca; Chris Taylor, oErbital; Rob Pearce, Engineering a Lo-Carbon Future; Pierre Far, OCW Search; Pat Lockley, Xpert and some bloke called Phil Barker. Our contributors will be presenting and leading short discussions on a diverse range of topics including cross-silo semantic search opportunities, using mainstream and niche search engines to discover OERs and automatic selection of resources for a UKOER collection.

We’ve also been promised the world premiere of the long awaited dogme masterpiece The Plight of Metadata by acclaimed repository manager and film maker Pat Lockley. Mr Lockley assures us that the film will be “awesome, despite the limited CGI budget.”

So who should attend this Locate, Collate and Aggregate extravaganza? Anyone interested in open content, innovative use and management of teaching and learning resources, techies, geeks, rss wranglers, data miners and even the odd repository manager.

And what do we want? We want ideas! Lots of them! We want ideas, comments and input to other people’s ideas. We’re also looking for ideas for JISC CETIS technical mini-projects we can potentially take forward to run in parallel with the OER 2 Programme.

We’re not quite sure what the outputs of this session will be but we’re aiming to go beyond the boundaries of JISC programmes and domain focussed initiatives and we’re hoping for cross pollination and propagation of innovation ~~throughout the nation~~.

cetiswmd Activities

Posted on October 29, 2010 by Lorna Campbell

Phil has already blogged a summary of last week’s memorably tagged What Metadata or cetiswmd meeting. During the latter part of the meeting we split up to discuss practical tasks and projects that the community could undertake with support from CETIS and JISC to explore the kind of issues that were raised at the meeting. We agreed to draft a rough outline of some of these potential activities and then feed them back to the community for comment and discussion. So if you have any thoughts or suggestions please let us know. CETIS are proposing to set up a task group or working group of some kind to develop this work and to provide a forum to explore technical issues relating to resource description, management and discovery in the context of open educational resources.

I helped to facilitate the breakout group that focused on what we might be able to achieve by looking at existing metadata collections. Here’s an outline of the activity we discussed.

Textual Analysis of Metadata Records

A large number of existing collections of metadata records were identified by participants, including NDLR, JorumOpen, OU OpenLearn and US data.gov collections, all of which could be analysed to ascertain which fields are used most widely and how they are described. Clearly this metadata exists in a wide range of heterogeneous formats so the task is not as simple as comparing like with like. The “traditional” way to compare different metadata schemas and records is through the use of crosswalks. However, developing crosswalks is a non-trivial task that in itself requires considerable time and resource.

An alternative approach was put forward by ADL’s Dan Rehak who suggested treating the metadata collections as text, stripping out fields and formatting and running the raw data through a semantic analysis tool such as Open Calais. Open Calais uses natural language processing, machine learning and other methods to analyse documents and find the entities within them. Calais claim to go “well beyond classic entity identification and return the facts and events hidden within your text as well.”
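
As a rough illustration of the "treat the metadata as text" step, here is a minimal Python sketch. It assumes the records are available as XML; the sample record, its element names and the helper functions are all hypothetical. It does not call Open Calais at all: a plain term count stands in for the semantic analysis, simply to show what the stripped text looks like before it is handed to such a tool.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample record; real collections use many different schemas.
SAMPLE_RECORD = """<record>
  <title>Introduction to Thermodynamics</title>
  <description>Lecture notes on heat, work and energy for first year engineering students.</description>
  <subject>Engineering</subject>
</record>"""


def record_to_text(xml_string):
    """Discard element names and attributes, keeping only the text content."""
    root = ET.fromstring(xml_string)
    chunks = (chunk.strip() for chunk in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)


def term_counts(text):
    """Count lower-cased word tokens in the raw text (a stand-in for real analysis)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


raw_text = record_to_text(SAMPLE_RECORD)
counts = term_counts(raw_text)
```

Note that once the markup is gone, the word "Engineering" from the subject field is indistinguishable from the same word in the description, which is exactly the loss of field-level information that stripping implies.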

Applying data mining and semantic analysis techniques to a large corpus of educational metadata records would be an interesting exercise in itself but until we attempt such an analysis it’s hard to speculate what it might be possible to achieve with the output data. It would certainly be valuable to compare frequently occurring terms and relationships with an analysis of search web logs to see if the metadata records are actually describing the characteristics that users are searching for.

There was general agreement amongst participants that this would be an interesting and innovative project. Participants felt it would be advisable to start small with a comparison of two or three metadata collections, possibly those of JorumOpen, Xpert and the OU OpenLearn, before taking this further.

One thing I am slightly unsure about regarding this method is that Open Calais identifies the relationship between words but once we strip out the metadata encoding of our sample records this information will be lost. I don’t know enough about how these semantic analysis tools work to know whether this is a problem or if they are clever enough for this not to be an issue. I suppose the only way we’ll find out if the results are sensible or useful is to give it a try!

I’d also be very interested to hear how this approach compares with work being undertaken on a much larger scale by the Digging into Data Challenge projects and Mimas’ Bringing Meaning into Search initiative.

Other Activities

Phil has already summarised the other possible tasks and activities put forward by the other breakout groups which include:

Establishing a common format for sharing search logs.
Identify which fields are used on advanced forms and how many people use advanced search facilities.
Analysis of the relative proportion of users who search and browse for resources and how many people click onwards from the initial resources.
Further development of the search questionnaire used by David Davies. If sufficient responses could be gathered to the same questions this would facilitate meta analysis of the results.
Work with communities around specific repositories and find out what works and doesn’t work across individual platforms and installations.
Create a research question inventory on the CETIS wiki and invite people to put forward ideas.

If anyone has any comments or suggestions on any of the above ideas we’d love to hear from you!

Time travelling at RepoFringe10

Posted on September 10, 2010 by Lorna Campbell

Last week I attended the Repository Fringe in Edinburgh. Unsurprisingly openness was one of the key themes of this year’s event and Tony Hirst kicked off with a typically inspiring keynote on content liberation. Tony suggested that although repositories play an important role in preserving institutional memory they are less good at presenting content as data. In addition, many document formats, such as PDF, lock data within them, preventing other people from representing that data in useful and interesting ways. Charts and graphs presented as images are dead data. Open document formats help to liberate content and get the data flowing. And once data has been opened up it can be combined and reconnected in new and interesting ways. In addition to open data, open queries enable us to see the assumptions that are embedded in queries and how results and reports have been generated. Tony went on to demonstrate some powerful information processing tools, based on Mendeley, Yahoo Pipes, ticTOCs, YQL and RSS, that require little or no coding.

Tony also pointed out that most document stores have a structure comprised of how documents relate to each other, but we are not good at making use of that structure. He then demonstrated how Gephi can be used to visualise structures and data clusters across multiple data stores. This presents new ways of navigating the content and can be used to provide topic or facet based browsing on the cheap. Earlier this week Tony demonstrated exactly this kind of data visualisation by using Gephi, yahoo pipes and google custom search to analyse altc-2010 twitter streams.

Tony concluded by reinforcing the importance of Ranganathan’s five laws of library science:

First law: Books are for use
Second Law: Every reader his or her book
Third Law: Every book its reader
Fourth Law: Save the time of the reader
Fifth Law: The library is a growing organism

I am quite sure that I was the only person in the audience that had never come across these laws before but it seems they could apply equally usefully to educational resources, open or otherwise.

Herbert Van de Sompel presented another new and interesting way to access data online with Memento: Timetravel for the Web. Resources on the web change continually over time and Memento uses the HTTP protocol to navigate these resources through time by linking current resources and archived resources. Memento uses a “timegate” to redirect URIs to a time specific version of a web page. The Memento tool suite includes the MementoFox plug-in that allows you to set a datetime to navigate the Web with. For MediaWiki servers there is the Memento plug-in that supports responding to datetime content negotiation requests issued by clients and for Wayback Archives a plug-in that adds Memento Timegate and TimeBundle support. I’m a big fan of the internet archives and Memento looks like it could be an invaluable tool for browsing old web resources. Before Herbert had finished his presentation a large number of Fringe delegates had installed the MementoFox plug-in and were enthusiastically browsing backwards and forwards* in time. This despite the fact that, as Herbert informed us, the first journal paper on Memento had been rejected with a comment along the lines of “who would be interested in looking at old websites”.
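
To make the idea of datetime content negotiation a little more concrete, here is a minimal Python sketch that prepares, but does not send, the kind of request a client might issue to a timegate. The timegate address and the helper function are hypothetical, and the `Accept-Datetime` header name is taken from the later published Memento specification; it is an assumption that it matches what was demonstrated at the Fringe.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical timegate address, for illustration only.
TIMEGATE = "http://timegate.example.org/timegate/"


def build_timegate_request(original_uri, when):
    """Return the URL and headers for a datetime negotiation request (not sent)."""
    # HTTP dates are expressed in GMT, so convert the requested datetime first.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    # Assumption: the timegate accepts the original URI appended to its base address.
    return TIMEGATE + original_uri, {"Accept-Datetime": http_date}


url, headers = build_timegate_request(
    "http://www.cetis.org.uk/", datetime(2010, 4, 1, tzinfo=timezone.utc)
)
```

A timegate receiving such a request would be expected to redirect the client to the archived version closest to the requested datetime.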

Time travel of a different kind was in evidence during a round table discussion on teaching and learning resources which spiralled back in time into “what’s the definition of a learning object” territory. It became apparent that learning objects were being conflated with the specific standards often used to describe and package them, specifically IEEE LOM and IMS Content Packaging. Open educational resources were put forward as a more useful and viable alternative to learning objects. I’m inclined to think that both learning objects and open educational resources can take any form and that the main distinction between the two is the presence of an open license for the latter. However the round table did give us an opportunity to talk to Patrick Sweeney of the University of Southampton about their repository usage stats which could potentially reveal some very useful information about how users search for teaching and learning resources. This is exactly the kind of data we hope to explore at the forthcoming CETIS What metadata is really useful? or CETISWMD event in October.

We also had the opportunity for another interesting conversation with Herbert Van de Sompel regarding the use of OAI ORE to aggregate open educational resources. This is something I’ve been curious about for a while, as there has been little exploration of the use of ORE in the context of educational materials. It seems to me that the resources produced by the JISC HEA OER Programmes, which are scattered all over the web and accompanied by highly variable metadata, would provide an interesting usecase for OAI ORE. Herbert was able to point us towards one promising example of OAI ORE implementation in an educational context. Unlocking the Archive is a project based at the Jewish Women’s Archive which has developed an OAI-ORE presentation tool. As far as I can gather the aim of the project is to produce a PowerPoint-like web-based tool which will allow users to aggregate any kind of content housed by the archive and view it using any standard web browser. The open source code developed by the project is available from drupal.org. We haven’t had an opportunity to investigate this project further but it certainly sounds like it’s worthy of further exploration in the context of the new OER2 Programme.

Of course the real high point of the entire two day event was Phil Barker’s pecha kucha (20 slides, 20 seconds per slide) presentation “An open and closed case for educational resources.” I may be biased, but I thought it was an excellent presentation and I was astonished that Phil could talk that fast! You can enjoy a video of the fast version of Phil’s session here and a rather more leisurely write up of the same on his blog.

All in all I found RepoFringe10 a very interesting and thought provoking event. I went to Repository Fringe last year and came away rather frustrated as I found that the event was trying so hard not to be a conference that the format actually got in the way of what could have been an opportunity for really interesting discussion. Not so this year: the event had a good balance of presentations, roundtable discussions and some very slick pecha kucha sessions, and there was plenty of opportunity for networking and discussion. Well done guys!

* Okay you can’t really browse forward in time, I made that bit up

Then and Now

Posted on April 16, 2010 by Lorna Campbell

A position paper for the ADL Repositories and Registries Summit by Lorna M. Campbell, Phil Barker and R. John Robertson

Between 2002 and 2010 the UK Joint Information Systems Committee (JISC)¹ funded a wide range of development programmes with the aim of improving access within the UK Further and Higher Education (F/HE) sector to content produced by F/HE institutions and to establish policies and technical infrastructure to facilitate its discovery and use. The Centre for Educational Technology and Interoperability Standards (CETIS)² is a JISC innovation support centre that provides technical and strategic support and guidance to the JISC development programmes and F/HE sector. CETIS contributed to scoping the technical requirements of the programmes summarised here.

Programmes such as Exchange for Learning (X4L, 2002 – 2006)³ focused on the creation of reusable learning resources and tools to facilitate their production and management while Re-purposing and Re-use of Digital University-level Content⁴ (RePRODUCE, 2008 – 2009) aimed to encourage the re-use of high quality externally produced materials and to facilitate the transfer of learning content between institutions. At the same time the Digital Repositories⁵ (2005 – 2007) and Repositories Preservation Programmes⁶ (2006 – 2009) focused on establishing technical infrastructure within institutions and across the sector.

These programmes were informed by a strategic and technical vision which was expressed through initiatives including the e-Learning Framework⁷, the e-Framework⁸, the Information Environment Technical Architecture⁹ and the Digital Repositories Roadmap¹⁰. The IE Architecture for example sought to “specify a set of standards and protocols intended to support the development and delivery of an integrated set of networked services that allowed the end-user to discover, access, use and publish digital and physical resources as part of their learning and research activities.”

These programmes and initiatives have met with varying degrees of success across the different sectors of the UK F/HE community. The rapid growth in the number of open access institutional repositories of scholarly works including both journal papers and e-theses may be attributed directly to the impact of JISC funding and policy. The number of open access institutional repositories has approximately doubled since 2007 to 172¹¹ currently. Arguably there has been less success supporting and facilitating access to teaching and learning materials. Although the number of repositories of teaching and learning materials is growing slowly, few institutions have policies for managing these resources. Indeed one of the final conclusions of the Repositories and Preservation Programme Advisory Group, which advised the JISC repositories programmes, was that teaching and learning resources have not been served well by the debate about institutional repositories seeking to cover both open access to research outputs and management of teaching and learning materials, as the issues relating to their use and management are fundamentally different¹². The late Rachel Heery also commented that greater value may be derived from programmes that focus more on achieving strategic objectives (e.g. improving access to resources) and less on a specific technology to meet these objectives (e.g. repositories). In addition the findings of the RePRODUCE Programme¹³ suggested that projects had significantly underestimated the difficulty of finding high quality teaching and learning materials that were suitable for copyright clearance and reuse.

Rather than a radical shift in policy these conclusions should be regarded as reflecting a gradual development in policy, licensing and technology right across the web. This includes the advent of web 2.0, the appearance of media specific dissemination platforms such as slideshare, youtube, flickr, iTunesU, interaction through RESTful APIs, OpenID, OAuth and other web-wide technologies, increasing acceptance of Creative Commons licenses and the rise of the OER movement. As a result there has been a movement away from developing centralised education specific tools and services and towards the integration of institutional systems with applications and services scattered across the web. Furthermore there has been growing awareness of the importance of the web itself as a technical architecture as opposed to a simple interface or delivery platform.

These developments are reflected in current JISC development programmes where the priority is less on using a particular technology (e.g. repositories) or implementing a particular standard and more on getting useful, usable content out to the UK F/HE community and beyond by whatever means possible. The JISC Higher Education Academy Open Educational Resources Pilot Programme¹⁴ (OER, 2009 – 2010) is a case in point. To illustrate how both strategic policy and technology have developed it is interesting to compare and contrast the 2002 X4L Programme and the current OER Pilot Programme.

X4L Programme 2002 – 2006

The X4L programme aimed to explore the re-purposing of existing content suitable for use in learning. Part of this activity was to explore the process of integrating interoperable learning objects with VLEs. A small number of tools projects were funded to facilitate this task: an assessment management system (TOIA), a content packaging tool (Reload) and a learning object repository (Jorum). Projects were given a strong steer to use interoperability standards such as IMS QTI, IMS Content Packaging, ADL SCORM and IEEE LOM. A mandatory application profile of the IEEE LOM was developed for the programme and formal subject classification vocabularies identified including JACS and Dewey. Projects were strongly recommended to deposit their content in the Jorum repository and institutions were required to sign formal licence agreements before doing so. Access to content deposited in Jorum was restricted to UK F/HE institutions only.

OER Pilot Programme 2009 – 2010

The aim of the OER Pilot Programme is to make a significant volume of existing teaching and learning resources freely available online and licensed in such a way to enable them to be reused worldwide. Projects may release any kind of content in any format and although projects are encouraged to use open standards where applicable proprietary formats are also acceptable. CETIS advised projects on the type of information they should record about their resources but not how to go about recording it. There is no programme specific metadata application profile and no formal metadata standard or vocabularies have been recommended. The only mandatory metadata that projects were directed to record was the programme tag #ukoer. Projects were given free rein to use any dissemination platform they chose provided that the content is freely available and under an open licence. In addition, projects must also represent their resources in JorumOpen either by linking or through direct deposit. All resources represented in JorumOpen are freely available worldwide and released under Creative Commons licences.

During the course of the OER Pilot Programme CETIS have interviewed all 29 projects to record their technical choices and the issues that have surfaced. This information has been recorded in the CETIS PROD¹⁵ system and has been synthesised in a series of blog posts¹⁶. CETIS is also undertaking additional exploratory work to investigate different methods of aggregating and tracking resources produced by the OER Programme. The contrast between the two programmes is marked and the success or otherwise of the technical approach adopted by the OER Pilot Programme remains to be seen. The programme concludes in April 2010 and a formal programme level synthesis and evaluation is already underway.

References
1. Joint Information Systems Committee, JISC, http://www.jisc.ac.uk/
2. Centre for Educational Technology and Interoperability Standards, CETIS, http://www.cetis.org.uk
3. Exchange for Learning Programme, X4L, http://www.jisc.ac.uk/whatwedo/programmes/x4l.aspx
4. Re-purposing and Re-use of Digital University-level Content Programme, RePRODUCE, http://www.jisc.ac.uk/whatwedo/programmes/elearningcapital/reproduce.aspx
5. Digital Repositories Programme, http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005.aspx
6. Repositories Preservation Programmes, http://www.jisc.ac.uk/whatwedo/programmes/reppres.aspx
7. E-Learning Framework, http://www.elframework.org/
8. E-Framework, http://www.e-framework.org/
9. JISC Information Environment Technical Architecture, http://www.jisc.ac.uk/whatwedo/themes/informationenvironment/iearchitecture.aspx
10. Digital Repositories Roadmap, http://www.jisc.ac.uk/whatwedo/themes/informationenvironment/reproadmaprev.aspx
11. The Directory of Open Access Repositories, OpenDOAR, http://www.opendoar.org/
12. Exclude Teaching and Learning Materials from the Open Access Repositories Debate. Discuss, http://blogs.cetis.org.uk/lmc/2008/10/27/exclude-teaching-and-learning-materials-from-the-open-access-repositories-debate-discuss/
13. RePRODUCE Programme Summary Report, http://www.jisc.ac.uk/media/documents/programmes/elreproduce/jisc_programme_summary_report_reproduce.doc
14. JISC Academy Open Educational Resources Pilot Programme, http://www.jisc.ac.uk/whatwedo/programmes/elearning/oer.aspx
15. CETIS PROD, monitoring projects, software and standards, http://prod.cetis.org.uk/query.php?theme=UKOER
16. John’s JISC CETIS Blog, http://blogs.cetis.org.uk/johnr/category/ukoer/
17. OER Synthesis and Evaluation Project, http://www.caledonianacademy.net/spaces/oer/

JISC Persistent Identifiers Meeting: Teaching and Learning Materials

Posted on February 9, 2010 by Lorna Campbell

During the second half of the JISC Persistent Identifiers Meeting participants split into five groups to discuss identifier requirements for the following resource types: research papers, research data, learning materials, cultural heritage, administrative information.

Phil Barker, Matt Jukes, Chris Awre and I composed the small group that discussed teaching and learning materials and these were our conclusions.

Constraints

Much of the discourse of the day did not sit comfortably with the teaching and learning domain. There was an implicit assumption that resources reside in repositories of some kind and are accompanied by quality-controlled metadata.

In reality teaching and learning materials are stored in many different places that can not be regarded as repositories “no matter how big the quotation marks”. These resources tend to be unmanaged and are not persistent.

Learning materials have relationships to many other entities e.g. the concept being learned, educational activities, course instance, individual people and social networks. These entities are poorly understood and modelled and are difficult to identify.

There is still a “craft” view of the process and practice of teaching and consequently there is some resistance to formalising the management of resources and activities.

There is no clearly identifiable lifecycle for teaching and learning materials and frequently no formal mechanism for their management.

Learning materials are “made public” but they are not “published” in the formal sense and metadata is often poor or non existent.

Use Cases

Composite objects – learning materials are frequently composite objects that may be ordered in one or more ways. Identifiers need to be able to identify the component parts, specify the order and potentially also to recompose and reorder them.

Open educational resources – once resources are released under an open license there are likely to be multiple different copies, formats and versions all over the place. How do you express relationship between these multiple entities?

Resource / course relationship – what is the relationship between learning materials and concepts such as educational activity? It is notoriously difficult to assign an educational level to a learning resource but it is often much easier to assign an educational level to a course. Is it possible to extrapolate from the course to the resource?

Drivers

Institutions are beginning to recognise that learning materials are valuable for the core business of higher education, i.e. teaching and learning; and that it may be beneficial to manage them for quality and efficiency gains.

The OER movement may be a significant driver for further work in this area.

What approaches are being used at present?

There is no clearly identifiable workflow behind the use of learning materials. The URL of a learning resource tends to become its identifier and is dependent on where the resource is stored e.g. VLE, repository, slideshare. Clearly, however, the URL refers to a specific instantiation of a resource in a specific location.

There is very little in the way of established practice in terms of management and identification of teaching and learning materials. Everything is in flux. In the terminology of the Repository Ecology report things are still a “mess.” A mess being:

“a complex issue that is not well formulated or defined”

Issues regarding sustainability and scalability

Do teaching and learning materials actually need to persist? There are usecases for persistence e.g. non-repudiation. Also teachers have to be confident that a resource will be there next time they need to use it.

Does it actually matter if resources are scattered all over the place with metadata that is poor to nonexistent?

And finally…
…if you know the answer to that last question please comment below!

JISC Persistent Identifiers Meeting: General Discussion

Posted on February 9, 2010 by Lorna Campbell

Last week I attended a very productive and unusually amicable meeting on identifiers run by JISC and ably facilitated by Chris Awre. Besides their obvious critical relevance, my interest in identifiers goes back to an international symposium on the topic that CETIS hosted way back in 2003. That particular event generated a voluminous report and a series of usecases that I believe are still relevant today. The Digital Curation Centre ran a subsequent identifiers event in 2005 which presented various identifier technologies, a series of case studies and sparked considerable debate. I was interested to attend last week’s meeting to see how the debate regarding identifier requirements and technologies had moved forward given the significant developments of the intervening years, including Web 2.0, social networking, and OER.

And you know what? I think the debate has matured significantly. There was much greater acceptance that one size will never fit all, that there will always be multiple technologies to choose from, that choice of identifier scheme frequently depends on choice of technology platform (e.g. if you run DSpace you will use Handles) and that the technology is the easy part to solve. Previous identifier events tended to degenerate into holy wars but there was admirably little crusading evident last week. Although there was some flak flying around on the back channel.

I was slightly frustrated that, as usual, much of the debate focused implicitly on scholarly works and a particular form of “publication”. However there was much that was of relevance to the teaching and learning domain too. Here are some of the statements from the event that I would endorse:

Chris Awre, University of Hull

The emphasis on identifiers themselves can be distracting, it’s better to focus on the role and purpose of identifiers.

Identifying digital content at different phases of its lifecycle is key to the management of that content.

Identifiers need to have an associated meaning. An identifier is only an identifier if it is associated with a thing, otherwise it is just a string.

Identifiers need to disambiguate what they are identifying.

Henry Thompson, University of Edinburgh

Any naming schemes for sharing on the web are only as good as the services behind them.

Persistence of activity is critical, not persistence of technology. There are no purely technical solutions to vulnerabilities.

The only naming scheme of any technical sophistication is the Linnaean taxonomic scheme. (!)

Make it easy for ordinary users to mint good URIs.

Les Carr, University of Southampton

Persistence of URIs can be made difficult by institutions’ view of the web purely as a marketing tool.

Bas Cordewener, SURF Foundation

DOI is the only system that has a business model, but it can be expensive for repositories to implement.

Commercial influences should be kept at bay but we need to recognise that there are many different systems meeting different requirements.

Hugh Glaser, University of Southampton

Authority is established not bestowed.

Conclusion and JISC Interventions

The general conclusion of this event was that technology is not the problem, sufficient infrastructure already exists and one size will never fit all.

There was some debate regarding appropriate JISC interventions in this space but there was some consensus that JISC could usefully work with bodies such as UCISA, SCONUL and the Research Councils to provide advice on policy and business cases illustrating the appropriate use of identifiers. Case studies and demonstrators that situate solutions in context, articulate specific workflows and promote good practice in managing identifiers would also be of considerable value.

I’ll post a second piece shortly summarising the breakout group that focused specifically on identifier requirements within the teaching and learning domain.

OER, RSS and JorumOpen

Posted on December 9, 2009 by Lorna Campbell

Many of you who have an interest in open educational resources and the Academy / JISC OER Pilot Programme will already be following the development of JorumOpen. JorumOpen will enable users worldwide to search, browse and download open educational resources deposited by UK Further and Higher Education Institutions and licensed under Creative Commons licences. OER Projects already have access to the JorumOpen deposit tool and the service itself, which is based on a customised version of DSpace, will be available to the wider community from the 19th of January 2010.

CETIS have been liaising closely with the Jorum team in order to support the OER Programme’s requirements. One early requirement that has emerged is the need for bulk deposit and a request from some projects that this may be partially fulfilled by ingest of RSS feeds. Sounds simple enough but as with all such requirements the devil is in the detail. And in this case some of the details include which version of RSS to support, how to encode and handle licence information, what metadata to include, how to process said metadata, and what is a realistic size for feeds.

In order to kick off a discussion on these and other issues Gareth Waller, Jorum’s Technical Manager (and DSpace wrangler extraordinaire) has written a short white paper titled Issues surrounding feed deposit into institutional repositories which presents the pros and cons of using syndicated feed formats to facilitate deposit into repositories such as JorumOpen. Gareth presents a succinct overview of syndicated feed formats and raises a number of questions relating to: item identification, item updates and deletions, missing items, polling periods, feed formats, metadata formats, local metadata application profiles, handling licences and whether to store links or download resources.
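To make a few of these questions concrete, here is a minimal sketch of what polling a feed for deposit might involve. It is not how JorumOpen handles feeds. It assumes an RSS 2.0 feed, uses the guid element (falling back to link) as the item identifier, and reads licence information from the Creative Commons RSS module; all of these are assumptions for illustration, and the function names parse_items and diff_polls are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Creative Commons RSS module. Using it for licence
# information is an assumption, not an agreed profile.
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"


def parse_items(feed_xml):
    """Return {identifier: item details} for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = {}
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        # Prefer <guid> as the identifier; fall back to <link> if it is absent.
        identifier = item.findtext("guid", default="").strip() or link
        if not identifier:
            continue  # an item with neither cannot be identified, so skip it
        items[identifier] = {
            "title": item.findtext("title", default="").strip(),
            "link": link,
            "licence": item.findtext(f"{{{CC_NS}}}license", default="").strip(),
        }
    return items


def diff_polls(previous, current):
    """Compare two polls of the same feed, keyed by item identifier."""
    shared = set(previous) & set(current)
    return {
        "new": sorted(set(current) - set(previous)),
        "missing": sorted(set(previous) - set(current)),
        "changed": sorted(k for k in shared if previous[k] != current[k]),
    }
```

Even this toy version shows where the difficulties lie: an item listed as "missing" may have been deleted or may simply have dropped off the end of a size-limited feed, and an item whose identifier changes between polls will look like one deletion plus one new deposit.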

JISC, CETIS and Jorum would like to move towards the development of an agreed RSS profile for open educational resources so we are actively seeking comments from the community on the kind of issues Gareth raises in his paper. If you would like to comment or raise additional issues please post your comments here or on the Jorum Community Bay. You can download Gareth’s paper here or from the Community Bay.

When automatic metadata generation goes bad…

Posted on November 24, 2009 by Lorna Campbell

Or the strange case of Drs E. Embuggerance and H. Feisty.

This has already been reported on several other blogs but it’s too good not to share again. Looks like Google Scholar needs to work on its automatic metadata generation algorithm:

Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today 1, no. 04: 47-47.

This curious incident was originally reported by Stephen Chrisomalis and subsequently picked up by Language Log. The comments on the latter post are particularly entertaining. Mark Liberman helpfully provides Google Scholar’s BibTeX citation:

@article{embuggerance2008linguistics,
title={{The linguistics of laughter}},
author={Embuggerance, E. and Feisty, H.},
journal={English Today},
volume={1},
number={04},
pages={47--47},
year={2008},
publisher={Cambridge Univ Press}
}

And goes on to suggest

“Perhaps we should continue the tradition of metonymic names for new linguistic natural kinds, and use embuggerance for cases where the automatic tagging of entities and relations goes astray.”

Over on Stephen Chrisomalis’s blog the comments have taken a rather different turn and degenerated into a rather thoughtful discussion of the relative merits of Google Scholar and JSTOR and automatic metadata generation vs. human indexing. One commenter, Laughingrat, is appalled that any academic would even consider using Google Scholar:

“…unless your college or university is extremely underfunded, the school library should have access to high-quality databases which contain records indexed by information professionals rather than unqualified hirelings or, worse, computers.”

Another commenter, Dale, puts forward a robust argument in favour of Google Scholar in particular and automatic metadata generation in general:

“Many in libraries and academia are keen to point out all of the warts in Google Scholar, but are less keen to be so critical of the databases for which they pay. That the MLA Bibliography, for example, is years behind in indexing scores of journals, and has incredibly poor coverage in many non-English languages (despite the International boast in its name) is a little known or explored fact in libraries. Other fee-based databases evince similar flaws (Library Literature, ironically, is one of the worst), but it isn’t nearly as much fun to pick on them as it is to shellac Google.”

The last word also has to go to Dale:

“What it comes down is machine indexing vs. human indexing. I cannot get the image of John Henry out of my mind when I think about this matchup. I think the human indexers can only win by extreme effort, and we all know what happened to poor John.”

Amen to that!

Orders from the Roundtable

Posted on November 13, 2009 by Lorna Campbell

The CETIS conference always strives to address current and cutting edge issues in the domain of education technology; however, the OER Technical Roundtable session was arguably more timely than most given that it coincided with a Guardian article on open courseware and open educational resources: Any student, any subject, anywhere.

The session was attended by over thirty participants representing a wide range of projects and initiatives, all of whom brought a plethora of technical issues to the table. These issues were ably captured by my colleague R. John Robertson using some recalcitrant mind mapping software which he is still fighting with. John has already posted the raw list of issues on his blog and on Slideshare.

As expected the range of issues was considerable but the following broad themes did emerge:

Tracking – metrics, Google analytics, statistics to support advocacy.
Usability of repositories – deposit and the role of SWORD, discovery and use.
Streaming large media files.
Licensing and rights encoding.
Resource description – metadata and JorumOpen, portability and interoperability, tagging, automatic metadata generation, identification of derivative works, SEO, Google discovery, how do users search for resources?
Aggregators to manage distributed resources – metadata aggregation, resource aggregation, iTunes & iTunesU, OER broadcasting, batch upload, Flickr & Slideshare APIs.
Granularity – disaggregation and reuse, content packaging, dependencies between resources.

Participants voted with their feet and broke into groups to discuss tracking, resource description, aggregators and granularity. We’ll try to synthesise the outputs of these breakout groups in later blog posts but in the meantime here’s a summary of the potential activities the groups identified that JISC and CETIS could take forward to benefit both the OER Programme and the community more generally:

Develop an agreed RSS / Atom profile for open educational resources.
Undertake research to analyse how teachers and learners actually search for educational resources. What terms do they search for and what metadata is actually necessary to facilitate their searches? Synthesise data from projects, including Jorum and Steeple, that are already gathering information about search terms, techniques and characteristics.
Investigate how successful commercial systems such as Amazon and iTunes create and manage resource descriptions. What can we learn from them?
Open up access to analytics and anonymised user data. Encourage the sharing of Google analytics data between projects.
Set up shared Piwik or Google analytics accounts for each JISC programme.
Share and synthesise good practice in resource tracking. Record and disseminate case studies.
Identify requirements and minimum recommendations for resource tracking.
Fund mini-projects on esoteric approaches to tracking.

We intend to discuss these recommendations with JISC in the not too distant future with a view to taking some of them forward. Hopefully we’ll be in a position to discuss progress in some if not all of these areas at #cetis10!

Metadata Guidelines for the OER Programme

Posted on March 30, 2009 by Lorna Campbell

Following the HEFCE / JISC / Academy OER Programme Community Briefing day at the end of January I blogged about the programme’s technical and metadata requirements. The successful OER projects haven’t even been announced yet and we’re already receiving enquiries asking for clarification on the resource description requirements briefly outlined.

In line with the innovative pilot nature of the OER Programme JISC have decided to take a new approach to metadata. Rather than mandating a formal application profile based on a single open standard we are instead identifying the type of information that projects must record for the resources they create without mandating how this should be done. Hopefully this will give projects considerably greater flexibility as to how they describe their resources and ultimately we hope that this will result in richer descriptions that are of value to end users. However we do recognise that this freer approach is likely to have some impact on interoperability. We hope to learn a lot from this pilot regarding what works and what doesn’t and where the balance between formal and informal metadata lies.

We can’t provide definitive answers to all the questions that are likely to arise but here are some draft guidelines as a starting point.

Mandatory Metadata

All mandated metadata relates to the resource being described and not to the description of the resource.

Programme tag ukoer
All resources produced as a result of the HEFCE/Academy/JISC OER Programme must be tagged ukoer. Many applications provide a mechanism for adding such tags, however we need to consider how this tag may also be accommodated within LOM and DC metadata.

Title
The title of the resource being described.

Author / owner / contributor (from user profile)
Most systems, be they repositories, VLEs or applications such as SlideShare, YouTube, etc., allow registered users to create a user profile detailing their name and other relevant details. When a user uploads a resource to such a system these details are usually associated with the resource.

Date
This is difficult to define in the context of open educational resources which have no formal publication date. Most applications are likely to record the date a resource is uploaded but it will also be important to record date of creation so users can judge the currency of a resource.

URL
Metadata must include a URL that locates the resource being described. This is not as straightforward as it sounds as there are likely to be multiple copies of resources in multiple locations.

Technical Information
Includes file format, file size and other relevant information. Many applications will generate this information automatically.
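
As a rough illustration of the mandatory set above, the sketch below checks a resource description for gaps. The ResourceDescription class, its field names and the missing_mandatory function are all hypothetical, since the guidelines deliberately do not mandate how the information is recorded. Treating the programme tag as case-insensitive is also an assumption.

```python
from dataclasses import dataclass, fields

PROGRAMME_TAG = "ukoer"


@dataclass
class ResourceDescription:
    """Hypothetical container for the mandatory information listed above."""
    title: str = ""
    author: str = ""       # author / owner / contributor, e.g. from a user profile
    date: str = ""         # creation or upload date, in whatever form is available
    url: str = ""
    technical: str = ""    # file format, file size and similar
    tags: tuple = ()


def missing_mandatory(desc):
    """Return the names of mandatory pieces of information that are absent."""
    problems = [
        f.name
        for f in fields(desc)
        if f.name != "tags" and not getattr(desc, f.name).strip()
    ]
    # Assumption: the tag is matched case-insensitively.
    if PROGRAMME_TAG not in {tag.lower() for tag in desc.tags}:
        problems.append("programme tag")
    return problems
```

A check like this only confirms that something has been recorded; it says nothing about whether the date is a creation or upload date, or which of several copies the URL points to.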

Recommended Metadata

Language
The language of the resource.

Subject classifications
JISC will not mandate the use of specific subject classifications for the OER Programme. However projects are recommended to use subject classifications that are already being used by their subject and domain communities. JACS is one such example of a subject classification that is widely used in the UK F/HE community. It is not recommended that projects attempt to create new subject classification vocabularies. Further guidance on working with vocabularies will be provided.

Keywords
May be selected from controlled vocabularies or may be free text.

Tags
Tags are similar to keywords. They may be entered by the creator / publisher of a resource and by users of the resource and they are normally free text. Many applications such as Flickr, SlideShare and YouTube support the use of tags.

Comments
Are usually generated by users of a resource and may describe how that resource has been used, in what context and whether its use was successful or otherwise.

Description
In contrast to comments, descriptions are usually generated by the creator / publisher of a resource and tend to be more authoritative. Descriptions may provide a wide range of additional information about a resource including information on how it may be used or repurposed.
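
For projects that do end up exposing simple Dublin Core, one possible mapping of the information above is sketched below. This is an illustration only and not a programme recommendation. In particular, carrying tags (including ukoer) in dc:subject, putting the URL in dc:identifier, and the unqualified metadata wrapper element are all assumptions, as are the to_simple_dc function and the dictionary keys it expects.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from record keys to simple Dublin Core elements.
FIELD_TO_DC = [
    ("title", "title"),
    ("creator", "creator"),
    ("date", "date"),
    ("url", "identifier"),
    ("format", "format"),
    ("language", "language"),
    ("description", "description"),
]


def to_simple_dc(record):
    """Serialise a dict describing a resource as simple Dublin Core XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element

    def add(name, value):
        if value:  # leave out elements we have no value for
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

    for key, dc_name in FIELD_TO_DC:
        add(dc_name, record.get(key))
    # Assumption: tags and keywords, including the programme tag, go in dc:subject.
    for tag in record.get("tags", []):
        add("subject", tag)
    return ET.tostring(root, encoding="unicode")
```

Comments from users have no obvious home in this mapping, which is one small example of the interoperability trade-off mentioned earlier.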

These guidelines are likely to change and develop as the OER Programme progresses and we learn more about issues specifically relating to the description of open educational resources in distributed environments. If you have any comments on these draft guidelines please feel free to comment here or via the CETIS Metadata SIG list at cetis-metadata@jiscmail.ac.uk.

Lorna Campbell
