Licence information in schema.org and LRMI

When the Learning Resource Metadata Initiative (LRMI) technical working group started its work, it focused on identifying the properties and relationships that were important for educational resources but could not be adequately expressed using schema.org as it then stood. One of those important pieces of information was the licence under which a resource was released, and so the LRMI spec from the start had the property useRightsUrl: “The URL where the owner specifies permissions for using the resource.” When schema.org adopted most of the LRMI properties, useRightsUrl was an exception: it was not adopted pending further consideration, which is not surprising really given the wide-ranging applicability of licence information beyond learning resources.

Back in June the good news came that version 1.6 of schema.org included a license property for Creative Works that does all that LRMI wanted, and more.

What does this mean for LRMI adopters?

Some adopters of LRMI have already started using useRightsUrl.  Such implementations are valid LRMI but not valid schema.org, which means that they will only be understood by applications that have been written specifically to understand LRMI and not by the general purpose web-scale search applications. This is sub-optimal.

In passing, let me mention another complication. With schema.org you have a choice of syntax: microdata and RDFa 1.1 Lite. With RDFa there was already a mechanism for identifying a link to a licence, that is rel="license". Just to complicate a little more, RDFa allows namespacing, and the term license appears in at least three widely used namespaces: HTML5, Dublin Core Terms, and the Creative Commons Rights Expression Language (hopefully this will never matter to you). To exemplify one of these options I’ll use the HTML that you get when you use the Creative Commons License Chooser (but let’s be absolutely clear, what I am writing about applies to any type of licence, whether the terms be open or commercial):

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>

The good news is that all these options play nicely together: you can have the best of all worlds.

If you are already using itemprop="useRightsUrl" to identify the link to a licence using LRMI in microdata, you can also use the license property and rel="license". The following LRMI microdata with a bit of RDFa thrown in works:

<html>
  <body itemscope itemtype="http://schema.org/CreativeWork">
    <a itemprop="license useRightsUrl" rel="license"
        href="https://creativecommons.org/licenses/by/4.0/">
        Creative Commons Attribution 4.0 International (CC BY 4.0) licence
    </a>
  </body>
</html>

If you are using LRMI / schema.org in RDFa, then the following is valid:

<html>
  <body vocab="http://schema.org/" typeof="CreativeWork">
    <a rel="license useRightsUrl"
       href="https://creativecommons.org/licenses/by/4.0/">
       Creative Commons Attribution 4.0 International (CC BY 4.0) licence
    </a>
  </body>
</html>
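
To see why the doubled-up markup is harmless, it can help to look at it from the consuming side. The following is a minimal Python sketch, not a description of how any search engine actually works: it uses only the standard library's html.parser, treats itemprop, rel and property as plain space-separated token lists, and ignores vocabularies, prefixes and item scoping, all of which a real microdata or RDFa processor would handle. The names LicenceLinkParser and find_licences are made up for illustration.

```python
from html.parser import HTMLParser

# Terms this sketch treats as "this link points at the licence".
LICENCE_TERMS = {"license", "useRightsUrl"}


class LicenceLinkParser(HTMLParser):
    """Collect href values from <a>/<link> elements flagged as licence links."""

    def __init__(self):
        super().__init__()
        self.licences = []  # list of (matched terms, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # itemprop (microdata), rel and property (RDFa) all hold
        # space-separated tokens, so gather them into one set.
        tokens = set()
        for name in ("itemprop", "rel", "property"):
            tokens.update((attr.get(name) or "").split())
        matched = tokens & LICENCE_TERMS
        if matched:
            self.licences.append((sorted(matched), href))


def find_licences(html):
    """Return every (terms, href) pair found in the given HTML string."""
    parser = LicenceLinkParser()
    parser.feed(html)
    return parser.licences


MICRODATA = """
<body itemscope itemtype="http://schema.org/CreativeWork">
  <a itemprop="license useRightsUrl" rel="license"
     href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>
</body>
"""

print(find_licences(MICRODATA))
```

A consumer that only knows one of the terms still finds the same URL, which is the point: adding license alongside useRightsUrl costs nothing and widens the set of applications that can read the link.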

License does what LRMI asked for and more

In my opinion the schema.org license property is superior to the LRMI useRightsUrl for a few reasons. It does everything that LRMI wanted by way of identifying the URL of the licence under which the creative work is released, but also:

  • It belongs to a more widely recognised namespace, which is especially important if you want to generate RDF data.
  • I prefer the semantics of the name and definition: a license can include restrictions on use as well as grant rights and permissions.
  • The range, i.e. the type of value that can be provided, includes Creative Works as well as URLs.

That last point allows one to encode the name, URL, description, date, accountable person and a whole host of other information about the licence (albeit at the cost of not being able to do so alongside LRMI’s useRightsUrl quite so simply).

Summary

The inclusion in schema.org of the license property is good news for the aims of LRMI. If you use LRMI and care about licensing you should tag the information you provide about the licence with it. If you already use LRMI’s useRightsUrl or RDFa’s rel="license" there is no need to stop doing so.

Who is using LRMI metadata?

One question that we always get asked about LRMI is “who is using it?” There are two sides to this: use by search service providers and use by resource providers. This post touches on the latter.

In phase 2 of the LRMI project, various organizations were given small amounts of money to implement LRMI in their systems and workflows. Those organizations are listed on the Creative Commons web site, and Lorna is in the process of gathering together the lessons they learnt, which will be reported back shortly. Perhaps more important, at least from the point of view of sustainability, are implementations that arise spontaneously, either by organizations with learning resources to disseminate who make a conscious decision to use LRMI, or by those who in using schema.org markup find that one of the properties that LRMI added is appropriate. Of course no one doing this is under any obligation to inform us of what they are doing, so it is harder to keep track of such use. Fortunately the Google Custom Search Engine Wilbert and I cobbled together can be used to discover such implementations. It’s a bit hit-and-miss (you need to search for common topics such as Math or English and trawl through the results for new sites), but it’s better than nothing.

Heads up for HEDIIP

A while back I summarised the input about semantics and academic coding that Lorna and I had made on behalf of Cetis for a study on possible reforms to JACS, the Joint Academic Coding System. That study has now been published.

JACS is maintained by HESA (the Higher Education Statistics Agency) and UCAS (Universities and Colleges Admissions Service) as a means of classifying UK university courses by subject; it is also used by a number of other organisations for classification of other resources, for example teaching and learning resources. The report (with appendices) considers the varying requirements and uses of subject coding in HE and sets out options for the development of a replacement for JACS.

Of course, this is all only of glancing interest, until you realise that stuff like Unistats and the Key Information Set (KIS) are powered by JACS.

ebooks 2013

Every year for the past dozen or so years the Department of Information Sciences at UCL have organised a meeting on ebooks. I’ve only been to one of them before, two or three years ago, when the big issues were around what publishers’ DRM requirements for ebooks meant for libraries. I came away from that musing on what the web would look like if it had been designed by publishers and librarians (imagine questions like: “when you lend out our web page, how will you know that the person looking at the screen is a member of your library?”…). So I wasn’t sure what to expect when I decided to go to this year’s meeting. It turned out to be far more interesting than I had hoped. I latched on to three themes of particular interest to me: changing paradigms (what is an ebook?), eTextBooks and discovery.

Changing paradigms

With the earliest printed books, or incunabula, such as the Gutenberg Bible, printers sought to mimic the handwritten manuscripts with which 15th-century scholars were familiar, in much the same way as publishers now seek to replicate printed books as ebooks.

In the first presentation of the day Lorraine Estelle, chief executive of Jisc Collections, focussed on access to electronic resources. Access not lending; resources not ebooks. She highlighted the problems of using yesterday’s language and thinking in this context, like having a “horseless carriage” and buying it hay. [This is my chance to make the analogy between incunabula and ebooks again, see right.] The sort of discussions I recalled from the previous meeting I attended reflect this thinking: publishers wanting a digital copy of a book to be equivalent to the physical book, only lendable to one person at a time and requiring replacement after a certain number of loans.

We need to treat digital content as offering new possibilities and requiring new ways of working. This might be uncomfortable for publishers (some more than others) and there was some discussion about how we cannot assume that all students will naturally see the advantages, especially if they have mostly encountered problematic content that presents little that could not be put on paper but is encumbered with DRM to the point that it is questionable whether they really own the book. But there is potential as well as resistance. Of course there can be more interesting, more interactive content: Will Russell of the Royal Society of Chemistry described how they have been publishing to mobile devices, with tools such as Chem Goggles that will recognise a chemical structure and display information about the chemical. More radically, there can also be new business models: Lorraine suggested institutions could become publishers of their own teaching content, and later in the day Caren Milloy, also of Jisc Collections, and Brian Hole of Ubiquity Press pointed to the possibilities of open access scholarly publishing.

Caren’s work with the OAPEN Library is worth looking through for useful information relating to quality assurance in open monographs, such as notifying readers of updates or errata. Caren also talked about the difficulties in advertising that a free online version of a resource is available when much of the dissemination and discovery ecosystem (you know, Amazon, Google…) is geared around selling stuff, difficulties that work with EDItEUR on the ONIX metadata scheme will hopefully address soon.

Brian described how Ubiquity Press can publish open access ebooks by driving down costs and being transparent about what they charge for. They work from XML source, created overseas, from which they can publish in various formats including print on demand, and explore economies of scale by working with university presses, resulting in a charge to the author (or their funders) of about £150 for a chapter, assuming there is nothing too complex in that chapter.

eTextBooks

All through the day there were mentions of eTextBooks, starting again with Lorraine, who highlighted the paperless medic and how his quest to work only with digital resources is complicated by the non-articulation of the numerous systems he has to use. When she said that what he wanted was all his content (ebooks, lecture handouts, his own notes etc.) on the same platform, integrated with knowledge about when and where he had to be for lectures and when he had exams, I really started to wonder how much functionality you can put into an eContent platform before it really becomes a single-person content-oriented VLE. And when you add in the ability to share notes with the social and communication capability of most mobile devices, what then do you have?

A couple of presentations addressed eTextBooks directly, from a commercial point of view. Jenni Evans spoke about Vital Source and Andrejs Alferovs about Kortext, both of which are in the business of working with institutions distributing online textbooks to students. Both seem to have a good grasp of what students want, which I think should be useful requirements to feed into eTextBook standardization efforts such as eTernity. These include:

  • ability to print
  • offline access
  • availability across multiple devices
  • reliable access under load
  • integration with VLE
  • integration with syllabus/curriculum
  • epub3 interactive content
  • long term access
  • ability for student to highlight/annotate text and share this with chosen friends
  • ability to search text and annotations

Discovery

There was also a theme of resource discovery running through the day, and I have already mentioned in passing that this referenced Google and Amazon, but also social media. Nick Canty spoke about a survey of library use of social media. I thought it interesting that there seemed to be some sophisticated use of the immediacy of Twitter to direct people to more permanent content, e.g. to engagement on Facebook or the library website.

Both Richard Wallis of OCLC and Robert Faber of OUP emphasized that users tend to use Google to search and gave figures for how much of the access to library catalogue pages came direct from Google and other external systems, not from their own catalogue search interface. For example the Bibliothèque nationale de France found that 80% of access to their catalogue pages came directly from web search engines, not catalogue searches, and Robert gave similar figures for access to Oxford Journals. The immediate consequence of this is that if most people are trying to find content using external systems then you need to make sure that at least some (as much as possible, in fact) of your content is visible to them. This feeds in to arguments about how open access helps solve discoverability problems. But Richard went further: he spoke about how the metadata describing the resources needs to be in a language that Google/Bing/Yahoo understand, and that language is schema.org. He did a very good job of acknowledging the usefulness of specialist metadata schema for exchanging precise information between libraries or publishers while arguing that, when trying to pass general information to Google:

it’s no use using a language only you speak.

Richard went on to speak about the Google Knowledge Graph and their “things not strings” approach facilitated by linked data. He urged libraries to stop copying text and to start linking, for example not to copy an author name from an authority file but to link to the entry in that file; in Eric Miller’s words, to move from cataloguing to “catalinking”.

ebooks?

So was this really about ebooks? Probably not, and the point was made that over the years the name of the event has variously stressed ebooks and econtent and that over that time what is meant by “ebook” has changed. I must admit that for me there is something about the idea of an [e]book that I prefer over a “content aggregation”, but if we use the term ebook, let’s use it acknowledging that the book of the future will be as different from what we have now as what we have now is from the medieval scroll.

Picture Credit
Scanned image of page of the Epistle of St Jerome in the Gutenberg bible taken from Wikipedia. No Copyright.

Learning Resource Metadata is Go for Schema

The Learning Resource Metadata Initiative aimed to help people discover useful learning resources by adding to the schema.org ontology properties to describe educational characteristics of creative works. Well, as of the release of schema draft version 1.0a a couple of weeks ago, the LRMI properties are in the official schema.org ontology.

Schema.org represents two things: 1, an ontology for describing resources on the web, with a hierarchical set of resource types each with defined properties that relate to their characteristics and relationships with other things in the schema hierarchy; and 2, a syntax for embedding these into HTML pages–well, two syntaxes, microdata and RDFa lite. The important factor in schema.org is that it is backed by Google, Yahoo, Bing and Yandex, which should be useful for resource discovery. The inclusion of the LRMI properties means that you can now use schema.org to mark up your descriptions of the following characteristics of a creative work:

audience: the educational audience for whom the resource was created, who might have educational roles such as teacher, learner, parent.

educational alignment: an alignment to an established educational framework, for example a curriculum or frameworks of educational levels or competencies. Expressed through an abstract thing called an Alignment Object, which allows a link to and description of the node in the framework to which the resource aligns, and specifies the nature of the alignment, which might be that the resource ‘assesses’, ‘teaches’ or ‘requires’ the knowledge/skills/competency to which the resource aligns or that it has the ‘textComplexity’, ‘readingLevel’, ‘educationalSubject’ or ‘educationLevel’ expressed by that node in the educational framework.

educational use: a text description of the purpose of the resource in education, for example assignment, group work.

interactivity type: the predominant mode of learning supported by the learning resource. Acceptable values are ‘active’, ‘expositive’, or ‘mixed’.

is based on url: a resource that was used in the creation of this resource. Useful for when a learning resource is a derivative of some other resource.

learning resource type: the predominant type or kind characterizing the learning resource. For example, ‘presentation’, ‘handout’.

time required: approximate or typical time it takes to work with or through this learning resource for the typical intended target audience.

typical age range: the typical range of ages of the content’s intended end user.

Of course, much of the other information one would want to provide about a learning resource (what it is about, who wrote it, who published it, when it was written/published, where it is available, what it costs) was already in schema.org.

Unfortunately one really important property suggested by LRMI hasn’t yet made the cut, that is useRightsUrl, a link to the licence under which the resource may be used, for example the Creative Commons licence under which it has been released. This was held back because of obvious overlaps with non-educational resources. The managers of schema.org want to make sure that there is a single solution that works across all domains.

Guides and tools

To promote the uptake of these properties, the Association of Educational Publishers has released two new user guides.

The Smart Publisher’s Guide to LRMI Tagging (pdf)

The Content Developer’s Guide to the LRMI and Learning Registry (pdf)

There is also the InBloom Tagger described and demonstrated in this video.

LRMI in the Learning Registry

As the last two resources show, LRMI metadata is used by the Learning Registry and services built on it. For what it is worth, I am not sure that is a great example of its potential. For me the strong point of LRMI/schema.org is that it allows resource descriptions in human readable web pages to be interpreted as machine readable metadata, helping create services to find those pages; crucially the metadata is embedded in the web page in a way that Google trusts because the values of the metadata are displayed to users. Take away the embedding in human readable pages, which is what seems to happen when used with the Learning Registry, and I am not sure there is much of an advantage for LRMI compared to other metadata schema, though to be fair I’m not sure that there is any comparative disadvantage either, and the effect on uptake will be positive for both sides. Of course the Learning Registry is metadata agnostic, so having LRMI/schema.org metadata in there won’t get in the way of using other metadata schema.

Disclosure (or bragging)

I was lucky enough to be on the LRMI technical working group that helped make this happen. It makes me very happy to see this progress.

What could a GPS for learner journeys look like?

Last weekend, a motley crew of designers, students, developers, business and government people came together in Edinburgh to prototype designs and apps to help learners manage their journeys. With help, I built a prototype that showed how curriculum and course offering data can be combined with e-portfolios to help learners find their way.

The first official Scottish government data jam, facilitated by Snook and supported by TechCube, is part of a wider project to help people navigate the various education and employment options in life, particularly post 16. The jam was meant to provide a way to quickly prototype a wide range of ideas around the learner journey theme.

While many other teams at the jam built things like a prototype social network, or great visualisations to help guide learners through their options, we decided to use the data that was provided to help see what an infrastructure could look like that supported the apps the others were building.

In a nutshell, I wanted to see whether a mash-up of open data in open standard formats could help answer questions like:

  • Where is the learner in their journey?
  • Where can we suggest they go next?
  • What can help them get there?
  • Who can help or inspire them?

Here’s a slide deck that outlines the results. For those interested in the nuts and bolts read on to learn more about how we got there.

Where is the learner?

To show how you can map where someone is on their learning journey, I made up an e-portfolio. Following an excellent suggestion by Lizzy Brotherstone of the Scottish Government, I nicked a story about ‘Ryan’ from an Education Scotland website on learner journeys. I recorded his journey in a Mahara e-portfolio, because it outputs data in the standard LEAP2a format; I could have used PebblePad as well for the same reason.

I then transformed the LEAP2a XML into very rough but usable RDF using a basic stylesheet I made earlier. Why RDF? Because it makes it easy for me to mash up the portfolios with other datasets; other data formats would also work. The made-up curriculum identifiers were added manually to the RDF, but could easily have been taken from the LEAP2a XML with a bit more time.

Where can we suggest they go next?

I expected that the Curriculum for Excellence would provide the basic structure to guide Ryan from his school qualifications to a college course. Not so, or at least, not entirely. The Scottish Qualifications Framework gives a good idea of how courses relate in terms of levels (i.e. from basic to a PhD and everything in between), but there’s little to join subjects. After a day of head scratching, I decided to match courses to Ryan’s qualifications by level and by comparing the text of titles. We ought to be able to do better than that!

The course data set that was provided to us was a mixture of course descriptions from the Scottish Qualifications Authority and actual running courses offered by Scottish colleges, all in one CSV file. During the jam, Devon Walshe of TechCube made a very comprehensive data set of all courses that you should check out, but too late for me. I had a brief look at using XCRI feeds like the ones from Adam Smith college too, but went with the original CSV in the end. I tried using LOD Refine to convert the CSV to RDF, but it got stuck on editing the RDF harness for some reason. Fortunately, the main OpenRefine version of the same tool worked its usual magic, and four made-up SQA URIs later, we were in business.

This query takes the email of Ryan as a unique identifier, then finds his qualification subjects and level. That’s compared to all courses from the data jam course data set, and whittled down to those courses that match Ryan’s qualifications and are above the level he already has.

The result: too many hits, including ones that are in subjects that he’s unlikely to be interested in.

So let’s throw in his interests as well. Result: two courses that are ideal for Ryan’s skills, but are a little above his level. So we find out all the sensible courses that can take him to his goal.
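
The matching logic described above (a shared word in the title, a higher level, then a filter on interests) can be sketched in a few lines. This is a plain-Python stand-in rather than the query itself, which ran over the RDF data, and every title, level, interest and the stopword list in it is invented for illustration.

```python
import re
from dataclasses import dataclass

# Small words that should not count as a title match (an assumption).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


@dataclass(frozen=True)
class Qualification:
    title: str
    level: int  # position in a levels framework; the numbers below are invented


@dataclass(frozen=True)
class Course:
    title: str
    level: int


def title_words(title):
    """Lower-cased words of a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def suggest_courses(qualifications, courses, interests=()):
    """Courses above a held qualification's level that share a title word with it.

    If interests are given, a course title must also mention at least one of them.
    """
    interest_words = {word.lower() for word in interests}
    suggestions = []
    for course in courses:
        words = title_words(course.title)
        builds_on = any(
            course.level > q.level and words & title_words(q.title)
            for q in qualifications
        )
        if not builds_on:
            continue
        if interest_words and not words & interest_words:
            continue
        suggestions.append(course)
    return suggestions


# Made-up data standing in for Ryan's portfolio and the course data set.
ryan = [Qualification("Computing Studies", 5), Qualification("Mathematics", 5)]
courses = [
    Course("Computing: Software Development", 7),
    Course("Applied Mathematics", 6),
    Course("Introduction to Computing", 4),
    Course("Hospitality Management", 7),
]

print([c.title for c in suggest_courses(ryan, courses)])
print([c.title for c in suggest_courses(ryan, courses, interests=["software"])])
```

Comparing title words is exactly as crude as it looks: it only works when course and qualification titles happen to share vocabulary, which is why a proper model connecting subjects across levels would be so much better.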

What can help them get there?

One other quirk about the Curriculum for Excellence appears to be that there are subject taxonomies, but they differ per level. Intralect implemented a very nice one that can be used to tag resources up to level 3 (we think). So Intralect’s Janek exported the vocabulary in two CSV files, which I imported into my triple store. He then built a little web service in a few hours that takes the outcome of this query, and returns a list of all relevant resources in the Intralibrary digital repository for stuff that Ryan has already learned, but may want to revisit.

Who can help or inspire them?

It’s always easier to have someone along for the journey, or to ask someone who’s been before you. That’s why I made a second e-portfolio for Paula. Paula is a year older than Ryan, is from a different, but nearby school, and has done the same qualifications. She’s picked the same qualification as a goal that we suggested to Ryan, and has entered it as a goal on her e-portfolio. Ryan can get in touch with her over email.

This query takes the course suggested to Ryan, matches it to someone else’s stated academic goal, and reports on what she’s done, what school she’s from, and her contact details.
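
As a rough illustration of that lookup, here is a minimal Python sketch. It works on in-memory records rather than on portfolio data in a triple store, and the Portfolio fields, school names and email addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    name: str
    email: str
    school: str
    achievements: list = field(default_factory=list)
    goals: list = field(default_factory=list)


def find_peers(course_title, portfolios, learner_email):
    """Other learners who list the suggested course as one of their goals."""
    return [
        p for p in portfolios
        if p.email != learner_email and course_title in p.goals
    ]


# Invented stand-ins for the two e-portfolios.
portfolios = [
    Portfolio("Ryan", "ryan@example.org", "School A", ["Computing Studies"]),
    Portfolio(
        "Paula",
        "paula@example.org",
        "School B",
        ["Computing Studies"],
        ["Computing: Software Development"],
    ),
]

for peer in find_peers("Computing: Software Development", portfolios, "ryan@example.org"):
    print(peer.name, peer.school, peer.achievements, peer.email)
```

The learner's own email is passed in so that they are never suggested to themselves, mirroring the use of the email address as the unique identifier.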

Conclusion

For those parts of the Curriculum for Excellence for which experiences and outcomes have been defined, it’d be very easy to be very precise about progression, future options, and what resources would be particularly helpful for a particular learner at a particular part of the journey. For the crucial post 16 years, this is not really possible in the same way right now, though it’s arguable that it’s all the more important to have solid guidance at that stage.

Some judicious information architecture would make a lot more possible without necessarily changing the syllabus across the board. Just a model that connects subject areas across the levels, and school and college tracks would make more robust learner journey guidance possible. Statements that clarify which course is an absolute pre-requisite for another, and which are suggested as likely or preferable would make it better still.

We have the beginnings of a map for learner journeys, but we’re not there yet.

Other than that, I think agreed identifiers and data formats for curriculum parts, electronic portfolios or transcripts and course offerings can enable a whole range of powerful apps of the type that others at the data jam built, and more. Thanks to standards, we can do that without having to rely on a single source of truth or a massive system that is a single point of failure.

Find out all about the other great hacks on the learner journey data jam website.

All the data and bits of code I used are available on GitHub.