Self description and licences

One of the things that I noticed when I was looking for sources of UKOERs was that when I got to a resource there was often no indication on it that it was open: no UKOER tag, no CC-licence information or logo. There may have been some indication of this somewhere along the way, e.g. on a repository’s information page about that resource, but that’s no good if someone arrives from a Google search or a direct link to the resource, or once someone has downloaded the file and put it on their own VLE.

Naomi Korn has written a very useful briefing paper on embedding metadata about Creative Commons licences into digital resources as part of the OER IPR Support project starter pack. All the advice in it is worth following, but please also make sure that licence and attribution information is visible on the resource itself. John has written about this in general terms in his excellent post on OERs, metadata, and self-description, where he points out that this type of self description “is just good practice” which is complemented, not supplanted, by technical metadata.

So, OER resources, when viewed on their own, as if someone had found them through Google or a direct link, should display enough information about authorship, provenance, etc. for the viewer to know that they are open without needing an application to extract the metadata. The cut-and-paste legal text and technical code generated by the licence selection form on the Creative Commons website is good for this. (Incidentally, for HTML resources this code also includes technical markup so that the displayed text works as encoded metadata, which has been exploited recently by the OpenAttribute browser add-on. I know the OpenAttribute team are working on embedding tools for licence selection and code generation into web content management systems and blogs.)

Images, videos and sounds present their own specific problems for including human-readable licence text. Following practice from the publishing industry suggests that small amounts of text discreetly tucked away at the bottom or side of an image can be enough. That example was generated by the Xpert attribution tool from an image of a bridge found on flickr. The Xpert tool also does useful work for sounds and videos; but for sounds it is also possible to follow the example of the BBC podcasts and provide spoken information at the beginning or end of the audio, and for videos of course one can have scrolling credits at the end.

UKOER Sources

I have been compiling a directory of how people can get at the resources released by the UKOER pilot phase projects: that is, the websites for human users and the “interoperability end points” for machines, i.e. the RSS and ATOM feed URLs, SRU targets, OAI-PMH base URLs and API documentation. This wasn’t nearly as easy as it should have been: I would have hoped that just listing the main URL for each project would have been enough for anyone to get at the resources they wanted or the interoperability end point in a click or two, but that often wasn’t the case.

So here are some questions I would like OER providers to answer by way of self assessment, which will hopefully simplify this in the future.

Does your project website have a very prominent link to where the OERs you have released may be found?

The technical requirements for phase 1 for delivery platforms said:

Projects are free to use any system or application as long as it is capable of delivering content freely on the open web. … In addition projects should use platforms that are capable of generating RSS/Atom feeds, particularly for collections of resources

So: what RSS feeds do you provide for collections of resources and where do you describe these? Have you thought about how many items you have in each feed and how well described they are?

Are your RSS feed URLs and other interoperability endpoints easy to find?

Do your interoperability end points work? I mean, have you tested them? Have you spoken to people who might use them?
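
By way of illustration only, here is a minimal Python sketch of what testing an RSS endpoint might involve: parse the feed and flag items that lack a title or link. The sample feed and URLs below are invented for the example; a real check would start by fetching the project’s actual feed URL.

```python
# Minimal sanity check for an RSS interoperability end point: parse the
# feed and report how many items it contains and whether each item has
# a non-empty title and a link. SAMPLE_FEED is a made-up example; in
# practice you would fetch the project's real feed with urllib.request.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example OER collection</title>
    <item><title>Intro to mitosis</title>
          <link>http://example.ac.uk/oer/mitosis</link></item>
    <item><title>Quantum mechanics 101</title>
          <link>http://example.ac.uk/oer/qm101</link></item>
  </channel>
</rss>"""

def check_feed(xml_text):
    """Return (item_count, problems) for an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = root.findall("./channel/item")
    problems = []
    for i, item in enumerate(items):
        if not (item.findtext("title") or "").strip():
            problems.append("item %d has no title" % i)
        if item.find("link") is None:
            problems.append("item %d has no link" % i)
    return len(items), problems

count, problems = check_feed(SAMPLE_FEED)
print(count, problems)
```

Even a quick script like this would catch feeds that are empty, malformed, or full of undescribed items before anyone else tries to consume them.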

While you’re thinking about interoperability end points: have you ever thought of your URI scheme as one? If for example you have a coherent scheme that puts all your OERs under a base URI, and, better, provides URIs with some easily identifiable pattern for those OERs that form some coherent collection, then building simple applications such as Google Custom Search Engines becomes a whole lot easier. A good example is how MIT OCW is arranged: most of the URIs follow the pattern http://ocw.mit.edu/courses/[department]/[courseName]/[resourceType]/[filename].[ext] (the exceptions are things like video recordings where the actual media file is held elsewhere).
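
To illustrate how such a URI scheme acts as a lightweight interoperability end point, here is a hedged Python sketch that recovers collection information from a URL matching the MIT OCW-style pattern above. The example course URL is invented for illustration; real OCW paths will differ.

```python
import re

# If OER URIs follow a coherent pattern like
#   /courses/[department]/[courseName]/[resourceType]/[filename].[ext]
# then collection membership can be recovered from the URL alone,
# without any metadata end point. The example URL below is made up.
OCW_PATTERN = re.compile(
    r"^http://ocw\.mit\.edu/courses/"
    r"(?P<department>[^/]+)/(?P<course>[^/]+)/"
    r"(?P<resource_type>[^/]+)/(?P<filename>[^/]+)$"
)

def parse_ocw_uri(uri):
    """Return the URI's components as a dict, or None if it doesn't match."""
    m = OCW_PATTERN.match(uri)
    return m.groupdict() if m else None

example = ("http://ocw.mit.edu/courses/physics/"
           "8-01-physics-i/lecture-notes/lec01.pdf")
print(parse_ocw_uri(example))
```

A pattern like this is also exactly what a Google Custom Search Engine needs to scope a search to one department or one resource type.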

Call for Papers: Semantic Technologies for Learning and Teaching Support in Higher Education

Our friends at the University of Southampton, Hugh Davis, David Millard and Thanassis Tiropanis (with whom we worked on the SemTech project and who organised a subsequent workshop) are guest editing a Special Section of IEEE Transactions on Learning Technology on Semantic Technologies for Learning and Teaching Support in Higher Education.

The call for papers (pdf) is available from the IEEE Computer Society. Deadline for submission: 1 April 2011.

JISC CETIS OER Technical Mini Projects Call

JISC has provided CETIS with funding to commission a series of OER Technical Mini Projects to explore specific technical issues that have been identified by the community during CETIS events such as #cetisrow and #cetiswmd and which have arisen from the JISC / HEA OER Programmes.

Mini project grants will be awarded as a fixed fee of £10,000 payable on receipt of agreed deliverables. Funding is not restricted to UK Higher and Further Education Institutions. This call is open to all OER Technical Interest Group members, including those outwith the UK. Membership of the OER TIG is defined as those members of oer-discuss@jiscmail.ac.uk who engage with the JISC CETIS technical discussions.

The CETIS OER Mini Projects are building on rapid innovation funding models already employed by the JISC. In addition to exploring specific technical issues these Mini Projects will aim to make effective use of technical expertise, build capacity, create focussed pre-defined outputs, and accelerate sharing of knowledge and practice. Open innovation is encouraged: projects are expected to build on existing knowledge and share their work openly.

It is expected that three projects will be funded in the first instance. If this model proves successful, additional funding may be made available for further projects.

Technical Mini Project Topics
Project 1: Analysis of Learning Resource Metadata Records

The aim of this mini project is to identify those descriptive characteristics that are frequently recorded for, or associated with, learning resources and that collection managers deem to be important.

The project will undertake a semantic analysis of a large corpus of educational metadata records to identify what properties and characteristics of the resources are being described. Analysis of textual descriptions within these records will be of particular interest e.g. free text used to describe licence conditions, educational levels and approaches.

The data set selected for analysis must include multiple metadata formats (e.g. LOM and DC) and be drawn from at least ten collections. The data set should include metadata from a number of open educational resource collections but it is not necessary for all records to be from OER collections.
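
As a sketch of the element-frequency part of such an analysis, the following Python fragment tallies which Dublin Core elements are populated across a set of records. The two records are invented examples; a real analysis would also need LOM handling, the semantic analysis of free-text fields, and a much larger corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two invented simple-DC records, standing in for a real corpus.
RECORDS = [
    """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:title>Cell biology basics</dc:title>
         <dc:subject>Biology</dc:subject>
         <dc:rights>CC BY</dc:rights>
       </record>""",
    """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:title>Bridge design</dc:title>
         <dc:description>Worked examples for beginners</dc:description>
       </record>""",
]

DC = "{http://purl.org/dc/elements/1.1/}"

def tally_elements(records):
    """Count how many records populate each DC element with non-empty text."""
    counts = Counter()
    for xml_text in records:
        root = ET.fromstring(xml_text)
        seen = {el.tag[len(DC):] for el in root.iter()
                if el.tag.startswith(DC) and (el.text or "").strip()}
        counts.update(seen)
    return counts

print(tally_elements(RECORDS))
```

A tally like this shows at a glance which characteristics collection managers actually bother to record, which is the question the project is asking.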

For further background information on this topic and for a list of potential metadata sources please see Lorna’s blog post on #cetiswmd activities.

Funding: £10,000 payable on receipt of agreed deliverables.

Project 2: Search Log Analysis

Many sites hosting collections of educational materials keep logs of the search terms used by visitors to the site when searching for resources. The aim for this mini project is to develop a simple tool that facilitates the analysis of these logs to classify the search terms used with reference to the characteristics of a resource that may be described in the metadata. Such information should assist a collection manager in building their collection (e.g. by showing what resources were in demand) and in describing their resources in such a way that helps users find them.

The analysis tool should be shown to work with search logs from a number of sites (we have identified some who are willing to share their data) and should produce reports in a format that is readily understood, for example a breakdown of how many searches were for “subjects” and which were the most popular subjects searched for. It is expected that a degree of manual classification will be required, but we would expect the system to be capable of learning how to handle certain terms and to share this learning between users: a user should not have to tell the system that “Biology” is a subject once they or any other user has done so. The analysis tool should be free to use or install without restriction and should be developed as Open Source Software.
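
A minimal sketch of the classification and reporting step might look like the following; the learned dictionary and log entries are purely illustrative, and a real tool would persist the dictionary and share it between users rather than hard-code it.

```python
from collections import Counter

# Shared dictionary of terms that have already been classified (by this
# user or any other), mapping each to a resource characteristic, so that
# "Biology" never has to be classified twice. Entries are illustrative.
LEARNED = {
    "biology": "subject",
    "english civil war": "subject",
    "beginners": "educational level",
    "power point": "resource type",
}

def classify(term, learned=LEARNED):
    """Classify a search term, falling back to 'unclassified'."""
    return learned.get(term.lower().strip(), "unclassified")

def report(search_log):
    """Break a list of search terms down by resource characteristic."""
    return Counter(classify(term) for term in search_log)

log = ["Biology", "biology", "power point", "mitosis"]
print(report(log))
```

The "unclassified" bucket is where the manual work (and the learning) happens: each term classified by hand would be added to the shared dictionary.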

Further information on the sort of data that is available and what it might mean is outlined in my blog post Metadata Requirements from the Analysis of Search Logs.

Funding: £10,000 payable on receipt of agreed deliverables.

Project 3: Open Call

Proposals are invited for one short technical project or demonstrator in any area relevant to the management, distribution, discovery, use, reuse and tracking of open educational resources. Topics that applicants may wish to explore include, but are not restricted to: resource aggregations, presentation / visualisation of aggregations, embedded licences, “activity data”, sustainable approaches to RSS endpoint registries, common formats for sharing search logs, analysis of use of advanced search facilities, use of OAI ORE.

Funding: £10,000 payable on receipt of agreed deliverables.

Guidelines

Proposals must be no more than 1500 words long and must include the following information:

  1. The name of the mini project.
  2. The name and affiliation and full contact details of the person or team undertaking the work plus a statement of their experience in the relevant area.
  3. A brief analysis of the issues the project will be addressing.
  4. The aims and objectives of the project.
  5. An outline of the project methodology and the technical approaches the project will explore.
  6. Identification of proposed outputs and deliverables.

Proposals are not required to include a budget breakdown, as projects will be awarded a fixed fee on completion.

All projects must be completed within six months of date of approval.

Submission Dates

In order to encourage open working practices, project proposals must be submitted to the oer-discuss mailing list at oer-discuss@jiscmail.ac.uk by 17.00 on Friday 8th April. List members will then have until the 17th of April to discuss the proposals and to provide constructive comments. Proposals will be selected by a panel of JISC and CETIS representatives, who will take into consideration comments put forward by OER TIG members. Successful bidders will be notified by the 21st of April and projects are expected to start in May and end by 31st October 2011.

Successful bidders will be required to disseminate all project outputs under a relevant open licence, such as CC-BY. Projects must post regular short progress updates and all deliverables including a final report to the oer-discuss list and to JISC CETIS.

We encourage all list members to engage with the Mini Projects and to input comments, suggestions and feedback through the list.

If you have any queries about this call please contact Phil Barker at phil.barker@hw.ac.uk

Metadata requirements from analysis of search logs

Many sites hosting collections of educational materials keep logs of the search terms used by visitors to the site who search for resources. Since it came up during the CETIS What Metadata (CETISWMD) event I have been thinking about what we could learn about metadata requirements from the analysis of these search logs. I’ve been helped by having some real search logs from Xpert to poke at with some Perl scripts (thanks Pat).

Essentially the idea is to classify the search terms used with reference to the characteristics of a resource that may be described in metadata. For example, terms such as “biology”, “English civil war” and “quantum mechanics” can readily be identified as relating to the subject of a resource; “beginners”, “101” and “college-level” relate to educational level; “power point”, “online tutorial” and “lecture” relate in some way to the type of the resource. We believe that knowing such information would assist a collection manager in building their collection (by showing what resources were in demand) and in describing their resources in such a way that helps users find them. It would also be useful to those who build standards for the description of learning resources to know which characteristics of a resource are worth describing in order to facilitate resource discovery. (I had an early run at doing this when OCWSearch published a list of top searches.)

Looking at the Xpert data has helped me identify some complications that will need to be dealt with. Some of the examples above show how a search phrase with more than one word can relate to a single concept, but in other cases, e.g. “biology 101” and “quantum mechanics for beginners”, the search term relates to more than one characteristic of the resource. Some search terms may be ambiguous: “French” may relate to the subject of the resource or the language (or both); “Charles Darwin” may relate to the subject or the author of a resource. Some terms are initially opaque but on investigation turn out to be quite rich: for example 15.822 is the course code for an MIT OCW course, and so implies a publisher/source, a subject and an educational level. Also, in real data I see the same search term being used repeatedly in a short period of time: I guess this is an artifact of how someone paging through results is logged as a series of searches. Should these be counted as a single search or multiple searches?
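
One possible way of handling the paging artifact, sketched in Python under the assumption that the log supplies (term, timestamp) pairs in time order, is to collapse repeats of the same term within a short window into a single logical search:

```python
from datetime import datetime, timedelta

def collapse_repeats(entries, window=timedelta(minutes=5)):
    """Collapse repeated searches for the same term within `window`.

    entries: list of (term, datetime) pairs, assumed time-ordered.
    Repeats inside the window are treated as one logical search
    (e.g. someone paging through results); repeats after a longer
    gap count as genuinely new searches.
    """
    collapsed = []
    last_seen = {}
    for term, when in entries:
        prev = last_seen.get(term)
        if prev is None or when - prev > window:
            collapsed.append((term, when))
        last_seen[term] = when
    return collapsed

# Invented example log: two of the three "biology" entries are paging.
t0 = datetime(2011, 1, 1, 12, 0)
log = [("biology", t0),
       ("biology", t0 + timedelta(minutes=1)),   # paging through results
       ("biology", t0 + timedelta(hours=2))]     # a genuinely new search
print(len(collapse_repeats(log)))
```

The five-minute window is an arbitrary choice; as the text says, different people may reasonably want to deal with this differently, so it should be a user-settable parameter.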

I think these are all tractable problems, though different people may want to deal with them in different ways. So I can imagine an application that would help someone do this analysis. In my mind it would import a search log and allow the user to go through search by search, classifying each with respect to the characteristic of the resource to which the search term relates. Tedious work, perhaps, but it wouldn’t take too long to classify enough search terms to get an adequate statistical snapshot (you might want to randomise the order in which the terms are classified in order to help ensure the snapshot isn’t looking at a particularly unrepresentative period of the logs). The interface should help speed things up by allowing the user to classify most searches with a single key press. There could be some computational support: the system would learn how to handle certain terms, and this learning would be shared between users; a user should not have to tell the system that “Biology” is a subject once they or any other user has done so. It may also be useful to distinguish between broad top-level subjects (like biology) and more specific terms like “mitosis”, or alternatively to know that specific terms like “mitosis” relate to the broader term “biology”: in other words, the option to link to a thesaurus might be useful.

This still seems achievable and useful to me.

The modern art of metadata

At a meeting relating to the Resource Discovery Task Force the other day, Caroline Williams compared the variety of resource description metadata standards and profiles in Libraries, Archives, Museums and beyond to a Jackson Pollock picture.
[image: Jackson Pollock drip painting]
She wouldn’t call it a mess (nor would I), but I think we could agree that it’s a bit chaotic.

That got me wondering what the alternatives might be…

Yves Klein?
Monochrome bleu sans titre [Untitled blue monochrome], 1960
Everything the same shade of blue. No, not achievable even if that is what you want.

How about Piet Mondrian?
[image: Piet Mondrian composition]
Nice and neat, but compartmentalized, and quite a few empty compartments.

Or Picasso,
[image: Picasso portrait]
trying to fit several perspectives into one picture with the result that… well, two noses.

Actually Caroline had the answer when I asked her what she would like to see,

Bridget Riley
June, 1992-2002
Diversity, but working together.

Some downside to OER?

As part of my non-CETIS work I occasionally go out to evaluate teaching practice in Engineering for the HE Academy Engineering Subject Centre. This involves me going to some University and talking to an Engineering lecturer and some students about an approach they are using for teaching and learning. I especially enjoy this because it brings me close to the point of education and helps keep me in touch with what is really happening in universities around the UK. During a recent evaluation the following observations came up which are incidental to what I was actually evaluating but relevant, I think, to UKOER. They concern a couple of points raised, one by the lecturer and one by the students, that reflect genuine problems people might have with OER release.

Part of the lecturer’s approach involves a sequence of giving students some problems to solve each week, and then providing online and face-to-face support for these problems. The online support is good stuff; it’s video screen captures of worked model solutions with pretty good production values. Something like the Khan Academy but less rough. It would be great if this were released as OER; however, doing so would compromise the pedagogic strategy that the tutor has adopted. I don’t want to go into the specifics of why this lecturer has adopted this strategy, but in general it may be important that the students try the problems before they look at the support, and this is an interesting example of how OER release isn’t pedagogically neutral.

The point raised by the students concerned the reuse of OERs rather than their release. They really liked what their lecturer had done, and part of what they liked about it was that it was personal. This was important to them not just because it meant that there was an exact fit between the resources and their course but because they took it as showing that the lecturer had taken a good deal of time and made a real effort in preparing their course. They were right in that, but they also went on to say that if the lecturer had taken resources from elsewhere, that they themselves could have found, they would have drawn the opposite inference. We may think that the students would be wrong in this, or that their expectations are unrealistic, but what’s important is that they felt that they would have been demotivated had the lecturer reused resources created elsewhere. I think this fits into a wider picture of how reuse of materials affects the relationship between teacher and student.

I’m not claiming that either of these observations is conclusive or in any way a compelling argument against the release of OER, and I will object to anyone claiming that a couple of data points is ever conclusive or compelling, but I did find them interesting.

Important advice on licensing

It frequently comes to the attention of the CETIS-pedantry department that certain among the people with whom we interact, while they have much to say and write that is worth heeding, do not know when to use “licence” and when to use “license”. Those of you who prefer to use US English can stop reading now, unless you’re intrigued by the convolutions of the UK variant of the language: this won’t ever be an issue for you.

It’s quite simple: licence is a noun, it’s the thing; license is a verb, it’s what you do. But how to remember that? Well, hopefully you’ll see that advice is a noun but advise is a verb; similarly device (noun), devise (verb); practice (noun), practise (verb). Words ending –ise are normally verbs[*]. So license/licence sticks to the pattern of c for noun, s for verb.

Hope this helps.

[* OK, you may prefer –ize, which isn’t just for US usage in some cases, but that’s a different story]

Hopes and fears for eReaders and eTextBooks

About 15 years ago, when I was first starting to promote the use of resources for “computer aided learning” the message was fairly clear: reading text off a screen is problematic so don’t use computers for this, use them for what they are good at. For me, in physical sciences at that time, they were good at multimedia presentation and the calculations necessary for creating interactive models that allow active engagement with the physics being taught. More generally, computers were good at things that allowed more pedagogically appropriate approaches to teaching and learning.

I’ve been disappointed since then: the most widely adopted applications of technology in teaching and learning have been to project presentations instead of transparencies on an OHP, and to use VLEs to distribute course info and handouts. In both examples the net impact of the computer is to do the same thing in a slightly more convenient way. Now a platform has reached maturity that allows a slightly more convenient way to read books, reproducing the text-on-paper experience. It’s bound to be the next big thing.

So, what to do about this? Admit that in practice technology will enhance learning by making small incremental improvements to established practice? Press for enhanced capability where it will facilitate good pedagogy? Work in anticipation of some revolutionary change driven by factors outwith the HE system?

In the meantime, some relevant stuff elsewhere:

An open e-Textbook usecase, our contribution to the ISO/IEC JTC1 SC36 Study Period on e-Textbooks.

Digital Textbooks, a blog devoted to documenting significant initiatives that relate to any and all aspects of digital textbooks, most notably their use in higher education.

Wolfram assistants: the sort of good stuff that could find its way into a digital textbook.

Amazon Kindle customer review on the bad stuff: problems with footnotes in academic eTexts.

Google custom search for UKOER

It has become very clear to me over the last week or so that I haven’t done enough to publicise some work done over the summer by my colleague Lisa Scott (Lisa Rogers, as she then was) on showing how you can create a Google Custom Search Engine to search for OER materials. In summary, it’s very easy, very useful, but not quite the answer to all your UKOER discovery problems.

A Google Custom Search Engine (Google CSE) allows one to use Google to search only certain selected pages. The pages to be searched can be specified individually or as URL patterns identifying a part of a site or an entire site. Furthermore the search query can be modified by adding terms to those entered by the user.

The custom search engine can be accessed through a search box that can be hosted on Google or embedded in a web page, blog etc. Likewise, the search results page can be presented on Google or embedded in another site. Embedding of both search box and results page utilises javascript hosted on the Google site.

The pages to be searched can be specified either directly by entering the URL patterns via the Google CSE interface, listed in an XML or TSV (tab-separated values) file which is uploaded to the Google CSE site, or as a feed from any external site. This latter option offers powerful possibilities for dynamic or collective creation of Custom Search Engines, especially since Google provide a javascript snippet which will use the links on a page as the list of URLs to search. So, for example, a group of people could have access to a wiki on which they list the sites they wish to search and thus build a CSE for their shared interest, whatever that may be.

A refinement that is sometimes useful is to label pages or sites that are searched. Labels might refer to sub-topics of the theme of the custom search engine or to some other facet such as resource type. So a custom search engine for engineering OERs might label pages as different branches of engineering {mechanical, electronic, civil, chemical, …} or by the type of resource to be found {presentation, image, movie, simulation, articles, …}. In practice, whatever the categorisation chosen for labels, there will often be pages or sites that mix resources from different categories, so use of this feature requires thought as to how to handle this.
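
For anyone maintaining their own configuration files, a labelled annotations file can be generated programmatically. The sketch below emits XML in the Annotation/Label shape used by Google CSE annotation files; the site patterns and label names are illustrative, and you should check Google’s current documentation for the exact schema before relying on it.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping of URL patterns to CSE labels; real patterns
# and label names would come from your own site list.
SITES = {
    "http://open.jorum.ac.uk/*": "repository",
    "http://www.slideshare.net/*": "presentation",
}

def annotations_xml(sites):
    """Build a Google CSE-style annotations document from a dict."""
    root = ET.Element("Annotations")
    for pattern, label in sites.items():
        ann = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(root, encoding="unicode")

xml_out = annotations_xml(SITES)
print(xml_out)
```

Generating the file from a single mapping like this makes it much easier to keep labels consistent as the list of sites grows, which is exactly where manual editing through the web forms starts to break down.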

A Google CSE for UKOER
Our example of a simple Google CSE can be found hosted on Google.

This works as a Google search limited to pages at the domains/directories listed below; where the URL pattern doesn’t lead only to content that is UKOER, the term ‘+UKOER’ is added to the search terms entered by the user. The ‘+’ in the added term means that only those pages which contain the term UKOER are returned. This is possible since the programme mandated that all resources should be associated with the tag UKOER. Each site was labelled so that, after searching, the user could limit results to those found on any one site (e.g. just those on Jorum Open) or strand of UKOER. The domains/directories searched are:

* http://open.jorum.ac.uk/
* http://www.vimeo.com/
* http://www.youtube.com/
* http://www.slideshare.net/
* http://www.scribd.com/
* http://www.flickr.com/
* http://repository.leedsmet.ac.uk/main/
* http://openspires.oucs.ox.ac.uk/
* http://unow.nottingham.ac.uk/
* https://open.exeter.ac.uk/repository/
* http://web.anglia.ac.uk/numbers/
* http://www.multimediatrainingvideos.com/
* http://www.cs.york.ac.uk/jbb
* http://www.simshare.org.uk/
* http://fetlar.bham.ac.uk/repository/
* http://open.cumbria.ac.uk/moodle/
* http://skillsforscientists.pbworks.com/
* http://core.materials.ac.uk/search/
* http://www.humbox.ac.uk/

These were chosen as they were known to be used by a number of UKOER projects for disseminating resources. We must stress that these are meant to be illustrative of sites where UKOER resources may be found, they are definitely not intended to be a complete or even a sufficient set of sites.

This is the simplest option: the configuration files are hosted on Google and managed through forms on the Google website. Expanding it to cover other web sites requires being given permission to contribute by the original creator and then adding URLs as required.

Reflections
Setting up this search engine was almost trivially easy. Embedding it in a website is also straightforward (Google provides code snippets to cut and paste).

The approach will only be selective for OERs if those resources can be identified through a term or tag added to the user-entered search query, or if they can be selected through a specific URL pattern (including the case where a site consists wholly or predominantly of OERs). This wasn’t always the case.

Importantly, not all expected results appear. This is possibly because the resources on these sites aren’t tagged as UKOER, or may be due to the pages not being indexed by Google. However, sometimes the omission seems inexplicable: for example, a search for “dental” limited to the Core materials website on Google yields the expected results, but the equivalent search on the CSE yields no results.

While hosting the configuration files off Google and editing them as XML files, or modifying them programmatically, allows some interesting refinements of the approach, we found this to be less easy. One difficulty is that the documentation on Google is somewhat fragmented and frequently confusing. Different parts of it seem to have been added by different people at different times, and it was often the case that a “link to more information” about something we were trying to do failed to resolve the difficulty that had been encountered. This was compounded by some unpredictable behaviour which may have been caused by caching (maybe in serving the configuration files, or Google reading them, or Google serving the results), or by delays in updating the indexes for the search engine, which made testing changes to the configuration files difficult. These difficulties can be overcome, but we were unconvinced that there would be much benefit in this case and so concentrated our effort elsewhere.

Conclusions
If it works for the sites you’re interested in, we recommend the simple Google custom search as a very quick method for providing a search across a subset of resources from a specified range of hosts. We reserve judgement on the facility for creating dynamic search engines by hosting the configuration files on one’s own server.