Where to put your EPUB metadata

Posted on 15/01/2014 by Phil Barker

Even in the knowledge that current mainstream EPUB readers and applications for managing eBooks will most likely ignore all but the most trivial metadata, we still have use cases that involve more sophisticated metadata. For example, we would like to use the LRMI alignment object in schema.org to say that a particular subsection of a book can be useful in the context of a specific unit in a shared curriculum.

So, without evaluating pros and cons, starting from the most basic/most common, what are the options? This summary takes information from Garrish and Gylling, EPUB 3 Best Practices, O’Reilly 2013 (which I take to be authoritative and also as an example of best practice with regard to the metadata in the epub file), as well as the EPUB 3.0 Publications and Content Documents specifications. Any comments would be greatly appreciated.

1. Simple Dublin Core

Within the OEBPS directory of an unpacked EPUB3 is the content.opf file. It pretty much equates to the manifest of an IMS Content Package. The top-level element is <package> and <metadata> is a required first child of <package>.

The default metadata vocabulary is the Dublin Core Metadata Element Set (DCMES, simple DC), with prefix dc:. Three elements are mandatory (title, identifier and language); the others are optional. For example, in /OEBPS/content.opf

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/" [...]>
    <metadata>
        <dc:identifier>urn:isbn:9781449325299</dc:identifier>
        <dc:title>EPUB 3 Best Practices</dc:title>
        <dc:language>en</dc:language>
        <dc:rights>Copyright © 2013 Matt Garrish and Markus Gylling</dc:rights>
[...]
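As a rough sketch of how a tool might check for the three mandatory elements, assuming Python's standard xml.etree.ElementTree: the sample package string and the function names required_dc and missing_required are illustrative, not taken from the spec or the book.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, cut-down package document used only for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         version="3.0" unique-identifier="pub-identifier">
  <metadata>
    <dc:identifier id="pub-identifier">urn:isbn:9781449325299</dc:identifier>
    <dc:title>EPUB 3 Best Practices</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def required_dc(opf_xml):
    """Return the values found for the three mandatory DC elements."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        raise ValueError("no <metadata> child found in <package>")
    found = {}
    for name in ("identifier", "title", "language"):
        elements = metadata.findall(f"{{{DC_NS}}}{name}")
        found[name] = [el.text.strip() for el in elements if el.text]
    return found


def missing_required(opf_xml):
    """List which mandatory elements are absent or empty."""
    return [name for name, values in required_dc(opf_xml).items() if not values]
```

Calling missing_required on a real content.opf (read as a string) would give a quick indication of whether the minimum metadata is present; it does no further validation.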

2. Other metadata schemas

The package element has a prefix attribute that may be used to declare prefixes for metadata schemas other than DCMES. Four vocabularies are reserved, i.e. the prefix may be used without a declaration: dcterms, marc, onix and media (the vocabulary used for EPUB3 media overlays). Example

<dcterms:title>EPUB 3 Best Practices</dcterms:title>

Other vocabularies may be used by providing a prefix and a URL, in a way so similar to xmlns that it makes you wonder why they didn’t just use xmlns.

<package prefix="prism: http://prismstandard.org/namespaces/basic/3.0/" [...]>
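The value of the prefix attribute is a whitespace-separated run of "name: URL" pairs. A minimal sketch of turning it into a lookup table, assuming that syntax; the function name parse_prefix_attribute and the ex: mapping in the usage note are hypothetical.

```python
def parse_prefix_attribute(value):
    """Split a prefix attribute value into a {prefix: url} dictionary."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'prefix:' and URL tokens")
    mapping = {}
    for name, url in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":") or len(name) < 2:
            raise ValueError(f"prefix token {name!r} must end with a colon")
        mapping[name[:-1]] = url
    return mapping


# One declaration, as in the example from the post.
PRISM_ONLY = parse_prefix_attribute(
    "prism: http://prismstandard.org/namespaces/basic/3.0/"
)
```

Several declarations can share one attribute, so parse_prefix_attribute("prism: http://prismstandard.org/namespaces/basic/3.0/ ex: http://example.org/terms/") would return two entries.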

3. The meta element

If used without the refines attribute (see below) the meta element can provide information about the package as a whole, e.g.

<meta property="dcterms:title">EPUB 3 Best Practices</meta>

I have no idea what would be the benefit of this over <dcterms:title>.

4. Refining metadata elements: id attribute and the meta element

The id attribute can be used to provide an identifier for any element in the metadata so that it may be refined. One example of this is mandatory, i.e. one occurrence of the dc:identifier element must be the publication identifier:

<dc:identifier id="pub-identifier">urn:isbn:9781449325299</dc:identifier>

In general, refinements are described using the meta element with the refines attribute and a property attribute that specifies the nature of the refinement. It’s kind of like RDF reification. The default vocabulary for the property attribute includes “file-as” (an alternative string for a name, to be used when filing), “identifier-type” (a way to distinguish between different identifiers), “meta-auth” (the authority for a given instance of metadata) and “title-type” (which of the six forms of title is being provided).

<dc:creator id="1234">Matt Garrish</dc:creator>
<meta refines="#1234" property="file-as" id="5678">Garrish, Matt</meta>
<meta refines="#1234" property="role">Author</meta>

Terms from other vocabularies may be used for “property” so long as a prefix is declared.

Refinements may have ids and so may be refined.

<meta refines="#5678" property="meta-auth">Phil Barker</meta>

And so you can make statements about your metadata statements to your heart’s content (though including the whole of the linked data graph in each epub would be silly).
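To see how these chains of refinements hang together, here is a minimal sketch that indexes meta elements by the id they refine and then walks the chain. It assumes Python's standard xml.etree.ElementTree; the ids creator01 and fileas01 and the function names are hypothetical, chosen so that the ids start with a letter.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Hypothetical sample mirroring the creator example, with letter-first ids.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="3.0">
  <metadata>
    <dc:creator id="creator01">Matt Garrish</dc:creator>
    <meta refines="#creator01" property="file-as" id="fileas01">Garrish, Matt</meta>
    <meta refines="#creator01" property="role">Author</meta>
    <meta refines="#fileas01" property="meta-auth">Phil Barker</meta>
  </metadata>
</package>"""


def refinements(opf_xml):
    """Map each refined id to a list of (property, value, own_id) tuples."""
    root = ET.fromstring(opf_xml)
    by_target = {}
    for meta in root.iter(OPF + "meta"):
        target = meta.get("refines")
        if not target:
            continue  # a meta without refines describes the whole package
        entry = (meta.get("property"), (meta.text or "").strip(), meta.get("id"))
        by_target.setdefault(target.lstrip("#"), []).append(entry)
    return by_target


def describe(target_id, by_target, depth=0, seen=None):
    """Return indented 'property: value' lines for a refinement chain."""
    seen = set() if seen is None else seen
    if target_id in seen:
        return []  # guard against circular refines
    seen.add(target_id)
    lines = []
    for prop, value, own_id in by_target.get(target_id, []):
        lines.append("  " * depth + f"{prop}: {value}")
        if own_id:
            lines.extend(describe(own_id, by_target, depth + 1, seen))
    return lines
```

Printing "\n".join(describe("creator01", refinements(SAMPLE_OPF))) lists the refinements of the creator, with the meta-auth statement nested under the file-as value it refines.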

The scheme attribute may be used to identify the controlled vocabulary from which the meta element’s value is drawn. For example, if the identifier is a DOI (which in onix is apparently entry 06 of codelist 5) you can have

<dc:identifier id="pub-id">urn:doi:10.1016/j.iheduc.2008.03.001</dc:identifier>
<meta refines="#pub-id"
      property="identifier-type"
      scheme="onix:codelist5">06</meta>

Or, using the MARC relator code aut to specify author:

<meta refines="#1234" property="role" scheme="marc:relators">aut</meta>

5. Sub-package level metadata

The id attribute may be used to provide an identifier for a subelement of <package> or any element in the XHTML content documents, down to a span element around a phrase, word or character. So if a chapter has id="chap1", then we can use meta elements in the metadata to describe it separately from the rest of the epub:

<meta refines="#chap1" property="prism:contentType">bookChapter</meta>

6. Links to metadata records

The link element is an optional, repeatable subelement of <metadata>, “used to associate resources with a publication, such as metadata records”. The metadata record may be within the package or anywhere on the web.
Example

<link rel="marc21xml-record" href="pub/meta/nor-wood-marc21.xml" />
<link refines="#chap1" rel="ex:schema_org-record"
      media-type="application/ld+json"
      href="http://example.org/nor-wood-lrmi.json" />
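A sketch of how an application might collect these link elements and tell packaged records from remote ones, assuming Python's standard xml.etree.ElementTree and urllib.parse. The function name metadata_links and the "remote" flag are illustrative; no EPUB reader is known to work this way.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

OPF = "{http://www.idpf.org/2007/opf}"

# Hypothetical package fragment built around the two links shown above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <link rel="marc21xml-record" href="pub/meta/nor-wood-marc21.xml" />
    <link refines="#chap1" rel="ex:schema_org-record"
          media-type="application/ld+json"
          href="http://example.org/nor-wood-lrmi.json" />
  </metadata>
</package>"""


def metadata_links(opf_xml):
    """Return one dictionary per <link> element in the package metadata."""
    root = ET.fromstring(opf_xml)
    links = []
    for link in root.iter(OPF + "link"):
        href = link.get("href", "")
        links.append({
            "rel": link.get("rel"),
            "href": href,
            "refines": link.get("refines"),      # None means the whole publication
            "media_type": link.get("media-type"),
            # Treat http(s) URLs as remote; anything else as inside the package.
            "remote": urlparse(href).scheme in ("http", "https"),
        })
    return links
```

With the sample above, the MARC record is reported as packaged and the LRMI record for chap1 as remote.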

Metadata embedded in the XHTML5 content

As far as I can see, the EPUB3 specs are silent on metadata in the HTML of the content documents, e.g. as html:meta elements or as microdata or RDFa, but there doesn’t seem to be any reason why one should not put metadata here. I wouldn’t expect any EPUB system to look that deeply into the package, but it would be a good approach to helping the metadata travel with the resource if the EPUB is disaggregated and passed into a non-EPUB3 CMS.

Embed innovation or implant potential?

Posted on 12/08/2013 by Phil Barker

This thought on etextbooks is an overflow from a conversation I was having on skype with Li and Tore about a workshop aimed at scoping what we would like the etextbooks of the future to look like. We were talking about how the idea of a textbook–its role in teaching and learning and hence (perhaps) its nature–was different in different cultures (Europe, US, Asia) and educational settings (school, higher education), when Tore said something along the lines of “why are we discussing this, shouldn’t we be talking about educational requirements”. Of course we should be talking about educational requirements and how they might be met by technologies such as ebooks, but I think there is more than that. My immediate reply was that by defining an area of interest as “etextbooks” we were implying a continuity with textbooks. I don’t think continuity implies a simple like-for-like replacement because I think the potential for etextbooks is far greater than that for paper textbooks, so moving to etextbooks should radically shift the trajectory of change. But the implication seems to be that etextbooks will pick up where paper textbooks leave off. That, I think, is different from 20 or so years ago when we were talking about how computer based learning (or more recently online courses and technology enhanced learning) marked a step change in how education was delivered. In that case much of the talk was about how technology will radically change education. Even if my characterisation of the two cases as opposing is a bit crude (as it is), it’s worth comparing the two approaches. I’ll do that here, just briefly.

The technology-will-revolutionise-education approach runs the risk of alienating the people who you most need on your side if that revolutionary change is to be an improvement, that is the teachers and students. I remember we used to talk about technology as a Trojan Horse for introducing pedagogic improvement in HE, something that I stopped doing when I went to a presentation where the speaker pointed out that the Trojan Horse was an act of war in the context of a bloody siege, and perhaps that isn’t the way learning technologists should approach teachers. More importantly, introducing technology probably isn’t the best way to approach improving education. Introducing technology is not straightforward and it will take attention away from other matters: whatever the initial intent, it will distract from thinking about teaching and learning. If you want to improve education you should focus on that and probably not do something else that is really difficult in its own right at the same time.

So the start-with-something-familiar approach has an advantage here in that it simply focuses on planting a technology with higher potential into existing practice. The risk is that substitution is seen as all that needs to be done, or that requirements that arise from this objective are over prioritised. For example, I have seen requirements for page-faithful display (i.e. the ability to reproduce on the ebook reader exactly what would be on paper) and page numbers as requirements for etextbooks. They may be desirable for marketing purposes, and there are real functional requirements relating to how content is presented and how it may be referenced, but building-in these restrictions as requirements would, in my view, be a mistake. Let’s have a strategy where we aim to embed but with a view to enhancing.

A triangle of objectives for etextbook technology; from the bottom: cost, availability, portability, functionality, innovation.

The path forward suggested for the US by the EDUCAUSE/Internet2 etextbook pilot. Start with a basis aimed at increasing adoption and move forward to improvements in functionality and transformation.
Image from Grajek, Susan, Understanding what higher education needs from e-textbooks: an EDUCAUSE/Internet2 pilot (Research Report), EDUCAUSE July 2013.

I think this is the approach which is suggested by the recent report on the EDUCAUSE/Internet2 pilots, Understanding what higher education needs from e-textbooks, summarised in the image on the right. I must admit that I find this somewhat depressing: I am interested in getting to the peak of that pyramid as quickly as possible, but I would rather get there with teachers and learners than be touting some theoretical improvement that is divorced from real teaching and learning. And of course, it’s important to be thinking from the outset about what functionality and innovation should be built once the technology is in people’s hands.

I am presenting a session at Alt-C 2013 entitled Into the Mainstream? New developments in eTextBooks next month where I hope to discuss ideas like this.

Book now available. Into the Wild – Technology for Open Educational Resources

Posted on 21/03/2013 by Phil Barker

Into the Wild (Book cover)

With great pleasure and more relief I can now announce the availability of Into the wild – technology for open educational resources, a book of our reflections on the technology involved in three years of the UK OER Programmes.

From the blurb:

Between 2009 and 2012 the Higher Education Funding Council funded a series of programmes to encourage higher education institutions in the UK to release existing educational content as Open Educational Resources. The HEFCE-funded UK OER Programme was run and managed by the JISC and the Higher Education Academy. The JISC CETIS “OER Technology Support Project” provided support for technical innovation across this programme. This book synthesises and reflects on the approaches taken and lessons learnt across the Programme and by the Support Project.

This book is not intended as a beginners guide or a technical manual, instead it is an expert synthesis of the key technical issues arising from a national publicly-funded programme. It is intended for people working with technology to support the creation, management, dissemination and tracking of open educational resources, and particularly those who design digital infrastructure and services at institutional and national level.

You may remember Lorna writing back in August that Amber Thomas, Martin Hawksey, Lorna and I had written 90% of this book together in a Book Sprint. Well, the last 10% and the publication turned into a bit of a marathon-relay, something about which I might write some time, but now the book is available in a variety of formats:

If you want glossy-covered paperback, then you can order it print-on-demand from Lulu (£3.36); if you’re not so fussed about the glossy cover and binding then there is a print-quality pdf you can print yourself.
If you have an ePub reader, there is a free download of an epub2 file.
If you have a Kindle, you can download the .mobi file and transfer it, or if you prefer the convenience of Amazon’s distribution over whisper-net you can buy it from them (77p, they don’t seem to distribute for free unless you agree to give them exclusive rights for all electronic formats).
Finally, if you prefer your ebook reading as PDFs, there is one of those too.

All varieties are free or at minimum cost for the distribution channel used; the content is cc-by licensed and editable versions are available if you wish to remix and fix what we’ve done.

Available via the Cetis publications site.

eTextBooks Europe

Posted on 21/01/2013 by Phil Barker

I went to a meeting for stakeholders interested in the eTernity (European textbook reusability networking and interoperability) initiative. The hope is that eTernity will be a project of the CEN Workshop on Learning Technologies with the objective of gathering requirements and proposing a framework to provide European input to ongoing work by ISO/IEC JTC 1/SC36, WG6 & WG4 on eTextBooks (which is currently based around Chinese and Korean specifications). Incidentally, as part of the ISO work there is a questionnaire asking for information that will be used to help decide what that standard should include. I would encourage anyone interested to fill it in.

The stakeholders present represented many perspectives from throughout Europe: publishers, publishing industry specification bodies (e.g. IDPF who own EPUB3, and DAISY), national bodies with some sort of remit for educational technology, and elearning specification and standardisation organisations. I gave a short presentation on the OER perspective.

Many issues were raised through the course of the day, including (in no particular order)

Interactive and multimedia content in eTextbooks
Accessibility of eTextbooks
eTextbooks shouldn’t be monolithic and immutable chunks of content, it should be possible to link directly to specific locations or to disaggregate the content
The lifecycle of an eTextbook. This goes beyond initial authoring and publishing
Quality assurance (of content and pedagogic approach)
Alignment with specific curricula
Personalization and adaptation to individual needs and requirements
The ability to describe the learning pathway embodied in an eTextbook, and vary either the content used on this pathway or to provide different pathways through the same content
The ability to describe a range of IPR and licensing arrangements of the whole and of specific components of the eTextbook
The ability to interact with learning systems with data flowing in both directions

If you’re thinking that sounds like a list of the educational technology issues that we have been busy with for the last decade or two, then I would agree with you. Furthermore, there is a decade or two’s worth of educational technology specs and standards that address these issues. Of course not all of those specs and standards are necessarily the right ones for now, and there are others that have more traction within digital publishing. EPUB3 was well represented in the meeting (DITA is the other publishing standard mentioned in the eTernity documentation, but no one was at the meeting to talk about that) and it doesn’t seem impossible to meet the educational requirements outlined in the meeting within the general EPUB3 framework. The question is which issues should be prioritised and how should they be addressed.

Of course a technical standard is only an enabler: it doesn’t in itself make any change to teaching and learning; change will only happen if developers create tools and authors create resources that exploit the standard. For various reasons that hasn’t happened with some of the existing specs and standards. A technical standard can facilitate change but there needs to be a will or a necessity to change in the first place. One thing that made me hopeful about this was a point made by Owen White of Pearson, that he did not think of the business he is in as being centred around content creation and publishing but around education and learning, and that leads away from the view of eBooks as isolated static aggregations.

For more information keep an eye on the eTernity website

Text and Data Mining workshop, London 21 Oct 2011

Posted on 21/10/2011 by Phil Barker

There were two themes running through this workshop organised by the Strategic Content Alliance: technical potential and legal barriers. An important piece of background is the Hargreaves report.

The potential of text and data mining is probably well understood in technical circles, and was well articulated by John McNaught of NaCTeM. Briefly, the potential lies in the extraction of new knowledge from old through the ability to surface implicit knowledge and show semantic relationships. This is something that could not be done by humans, not even crowds, because of the volume of information involved. Full text access is crucial: John cited a finding that only 7% of the subject information extracted from research papers was mentioned in the abstract. There was a strong emphasis, from for example Jeff Lynn of the Coalition for a Digital Economy and Philip Ditchfield of GSK, on the need for business and enterprise to be able to realise this potential if it were to remain competitive.

While these speakers touched on the legal barriers it was Naomi Korn who gave them a full airing. They start in the process of publishing (or before) when publishers acquire copyright, or a licence to publish with enough restriction to be equivalent. The problem is that the first step of text mining is to make a copy of the work in a suitable format. Even for works licensed under the most liberal open access licence academic authors are likely to use, CC-by, this requires attribution. Naomi spoke of attribution stacking, a problem John had mentioned when a result is found by mining 1000s of papers: do you have to attribute all of them? This sort of problem occurs at every step of the text mining process. In UK law there are no copyright exceptions that can apply: it is not covered by fair dealing (though it is covered by fair use in the US and by similar exceptions in Norwegian and Japanese law, but nowhere else); the exceptions for transient copies (such as in a computer’s memory when reading online) only apply if that copy has no intrinsic value.

The Hargreaves report seeks to redress this situation. Copyright and other IP law is meant to promote innovation not stifle it, and copyright is meant to cover creative expressions, not the sort of raw factual information that data mining processes. Ben White of the British Library suggested an extension of fair dealing to permit data mining of legally obtained publications. The important thing is that, as parliament acts on the Hargreaves review, people who understand text mining and care about legal issues make sure that any legislation is sufficient to allow innovation, otherwise innovators will have to move to those jurisdictions like the US, Japan and Norway where the legal barriers are lower (I’ll call them ‘text havens’).

Thanks to JISC and the SCA for organising this event; there’s obviously plenty more for them to do.

Amazon kindle and textbooks

Posted on 10/08/2011 by Phil Barker

Amazon are renting textbooks for the kindle. Over the last couple of months I’ve been using a Kindle. We bought it with the idea of seeing how it might be useful for educational content, eTextbooks at the most basic level, though I’ve already written about my misgivings on that score. Well, we quickly came to the conclusion that the Kindle device wasn’t much good for eTextbooks: no colour, screen refresh too slow for dynamic content, not good for non-linear content (breakout boxes, footnotes, even multiple columns)–sure it displays pdfs and HTML, but it’s difficult to get a magnification that works well and navigating around the page is clunky, and it doesn’t do ePub. But it’s fine for novels and there may be some educational utility in the bookmarking, note making and the sharing of these that is possible (though making notes using the kindle isn’t a great user experience). Anyway, given all that it’s interesting to note that the textbooks shown in the Amazon ad I mention above rely on colour, are non-linear, and both of them would be really engaging if the diagrams were interactive or even just animated. Neither of them is being displayed on a Kindle.

[Aside: textbook rental might be an attractive idea for some, but pricing based on a typical rental of 30 days!?]

Important advice on licensing

Posted on 04/02/2011 by Phil Barker

It frequently comes to the attention of the CETIS-pedantry department that certain among the people with whom we interact, while they have much to say and write that is worth heeding, do not know when to use “licence” and when to use “license”. Those of you who prefer to use US English can stop reading now, unless you’re intrigued by the convolutions of the UK variant of the language: this won’t ever be an issue for you.

It’s quite simple: licence is a noun, it’s the thing; license is a verb, it’s what you do. But how to remember that? Well, hopefully you’ll see that advice is a noun but advise is a verb; similarly device (noun), devise (verb); practice (noun), practise (verb). Words ending –ise are normally verbs[*]. So license/licence sticks to the pattern of c for noun, s for verb.

Hope this helps.

[* OK, you may prefer –ize, which isn’t just for US usage in some cases–but that’s a different ~~rant~~ story]

“Marketing” and open educational resources

Posted on 09/03/2009 by Phil Barker

I went to the CETIS Education Content SIG meeting on Open Educational Resources in Milton Keynes at the end of February. I came away with two thoughts about OER and marketing: first about the role of the OER content in marketing courses, second about the need to market the concept of OER in UK HE.

George Orwell is blogging

Posted on 19/10/2008 by Phil Barker

George Orwell is blogging, so is Samuel Pepys. And quite aside from the content (I’m an Orwell fan, the merits of this content were discussed when the blog was launched here, and here), I think this is a brilliant way of putting diaries online as open content[1]. Delivery, at least, relies on software anyone can use for free[2]; you and I can get the text in a machine readable format, HTML and RSS; each entry gets a URI; the entries can be tagged and commented on; locations can be mapped on Google (Orwell, Pepys), other concepts mentioned linked to encyclopaedia entries; the blog owners could, at least in principle, export the whole lot in XML and stick it in a database to process, and anyone can process entries with text mining software or by setting up a Google custom search engine or . . . .

The two examples above are slightly short of perfect. I like to see the dates for the blog entries matching the dates for the diary entries (the Pepys diaries do this, Orwell managed it at first, but then slipped). And I think it would make more sense if the monthly archives were arranged to be read top-to-bottom in chronological order. Also I wonder if hosting on wordpress.com is the best idea. It has its attractions, but the tags in the Orwell blog link to posts from other blogs which are well out of scope while the Pepys diary has some very interesting customizations; also if the Orwell blog owners do ever find a way to go back to posting against the diary entry date I imagine they would have problems setting up redirects so that links to the current posts still work.

Notes:
1. I guess I should be clear: I’m not saying that these diaries are open content. The Pepys text is from Project Gutenberg, I don’t know the licensing arrangements for other aspects of the blog; Orwell’s text is still copyright in many countries (including the UK and the US), I don’t know the licensing arrangements for the blog.
2. The Orwell diary is on WordPress.com; Pepys uses a customized installation of Movable Type.

Why share?

Posted on 02/09/2008 by Phil Barker

In a comment to a previous post of mine, Gayle reminded me of the point made by the ACETS project:

“Re-use is not in itself a good or bad thing and it should not be encouraged or discouraged as a matter of dogma. Rather it should be nurtured and supported where it can provide benefits and not where it will not.”

So the question we should ask is: when will re-use provide benefits? Here are some links to recent and ongoing work relating to the benefits of sharing, reuse and open content.

Phil Barker

Cetis Blog

Category Archives: educational content