Embed innovation or implant potential?

This thought on etextbooks is an overflow from a conversation I was having on Skype with Li and Tore about a workshop aimed at scoping what we would like the etextbooks of the future to look like. We were talking about how the idea of a textbook–its role in teaching and learning and hence (perhaps) its nature–was different in different cultures (Europe, US, Asia) and educational settings (school, higher education), when Tore said something along the lines of “why are we discussing this, shouldn’t we be talking about educational requirements”. Of course we should be talking about educational requirements and how they might be met by technologies such as ebooks, but I think there is more to it than that. My immediate reply was that by defining an area of interest as “etextbooks” we were implying a continuity with textbooks. I don’t think continuity implies a simple like-for-like replacement because I think the potential for etextbooks is far greater than that for paper textbooks, so moving to etextbooks should radically shift the trajectory of change. But the implication seems to be that etextbooks will pick up where paper textbooks leave off. That, I think, is different from 20 or so years ago when we were talking about how computer based learning (or more recently online courses and technology enhanced learning) marked a step change in how education was delivered. In that case much of the talk was about how technology would radically change education. Even if my characterisation of the two cases as opposing is a bit crude (as it is), it’s worth comparing the two approaches. I’ll do that here, just briefly.

The technology-will-revolutionise-education approach runs the risk of alienating the people who you most need on your side if that revolutionary change is to be an improvement, that is the teachers and students. I remember we used to talk about technology as a Trojan Horse for introducing pedagogic improvement in HE, something that I stopped doing when I went to a presentation where the speaker pointed out that the Trojan Horse was an act of war in the context of a bloody siege, and perhaps that isn’t the way learning technologists should approach teachers. More importantly, introducing technology probably isn’t the best way to approach improving education. Introducing technology is not straightforward and it will take attention away from other matters: whatever the initial intent, it will distract from thinking about teaching and learning. If you want to improve education you should focus on that and probably not do something else that is really difficult in its own right at the same time.

So the start-with-something-familiar approach has an advantage here in that it simply focuses on planting a technology with higher potential into existing practice. The risk is that substitution is seen as all that needs to be done, or that requirements that arise from this objective are over-prioritised. For example, I have seen page-faithful display (i.e. the ability to reproduce on the ebook reader exactly what would be on paper) and page numbers listed as requirements for etextbooks. They may be desirable for marketing purposes, and there are real functional requirements relating to how content is presented and how it may be referenced, but building in these restrictions as requirements would, in my view, be a mistake. Let’s have a strategy where we aim to embed but with a view to enhancing.

A triangle of objectives for etextbook technology; from the bottom: cost, availability, portability, functionality, innovation.

The path forward suggested for the US by the Educause/Internet2 etextbook pilot. Start with a basis aimed at increasing adoption and move forward to improvements in functionality and transformation.
Image from Grajek, Susan, Understanding what higher education needs from e-textbooks: an EDUCAUSE/Internet2 pilot (Research Report), EDUCAUSE July 2013.

I think this is the approach which is suggested by the recent report on the Educause/Internet2 pilots, Understanding what higher education needs from e-textbooks, summarised in the image on the right. I must admit that I find this somewhat depressing: I am interested in getting to the peak of that pyramid as quickly as possible. But I would rather get there with teachers and learners than be touting some theoretical improvement that is divorced from real teaching and learning. And of course, it’s important to be thinking from the outset about what functionality and innovation should be built once the technology is in people’s hands.

I am presenting a session at Alt-C 2013 entitled Into the Mainstream? New developments in eTextBooks next month where I hope to discuss ideas like this.

ebooks 2013

Every year for the past dozen or so years the Department of Information Sciences at UCL has organised a meeting on ebooks. I’ve only been to one of them before, two or three years ago, when the big issues were around what publishers’ DRM requirements for ebooks meant for libraries. I came away from that musing on what the web would look like if it had been designed by publishers and librarians (imagine questions like: “when you lend out our web page, how will you know that the person looking at the screen is a member of your library?”…). So I wasn’t sure what to expect when I decided to go to this year’s meeting. It turned out to be far more interesting than I had hoped. I latched on to three themes of particular interest to me: changing paradigms (what is an ebook?), eTextBooks and discovery.

Changing paradigms

With the earliest printed books, or incunabula, such as the Gutenberg Bible, printers sought to mimic the hand written manuscripts with which 15th cent scholars were familiar; in much the same way as publishers now seek to replicate printed books as ebooks.

In the first presentation of the day Lorraine Estelle, chief executive of Jisc Collections, focussed on access to electronic resources. Access not lending; resources not ebooks. She highlighted the problems of using yesterday’s language and thinking in this context, like having a “horseless carriage” and buying it hay. [This is my chance to make the analogy between incunabula and ebooks again, see right.] The sort of discussions I recalled from the previous meeting I attended reflect this thinking: publishers wanting a digital copy of a book to be equivalent to the physical book, only lendable to one person at a time and requiring replacement after a certain number of loans.

We need to treat digital content as offering new possibilities and requiring new ways of working. This might be uncomfortable for publishers (some more than others) and there was some discussion about how we cannot assume that all students will naturally see the advantages, especially if they have mostly encountered problematic content that presents little that could not be put on paper but is encumbered with DRM to the point that it is questionable as to whether they really own the book. But there is potential as well as resistance. Of course there can be more interesting, more interactive content–Will Russell of the Royal Society of Chemistry described how they have been publishing to mobile devices, with tools such as Chem Goggles that will recognise a chemical structure and display information about the chemical. More radically, there can also be new business models: Lorraine suggested Institutions could become publishers of their own teaching content, and later in the day Caren Milloy, also of Jisc Collections, and Brian Hole of Ubiquity Press pointed to the possibilities of open access scholarly publishing.

Caren’s work with the OAPEN Library is worth looking through for useful information relating to quality assurance in open monographs, such as notifying readers of updates or errata. Caren also talked about the difficulties in advertising that a free online version of a resource is available when much of the dissemination and discovery ecosystem (you know, Amazon, Google…) is geared around selling stuff, difficulties that work with EDItEUR on the ONIX metadata scheme will hopefully address soon.

Brian described how Ubiquity Press can publish open access ebooks by driving down costs and being transparent about what they charge for. They work from XML source, created overseas, from which they can publish in various formats including print on demand, and explore economies of scale by working with university presses, resulting in a charge to the author (or their funders) of about £150 for a chapter, assuming there is nothing too complex in that chapter.


All through the day there were mentions of eTextBooks, starting again with Lorraine who highlighted the paperless medic and how his quest to work only with digital resources is complicated by the non-articulation of the numerous systems he has to use. When she said that what he wanted was all his content (ebooks, lecture handouts, his own notes etc.) on the same platform, integrated with knowledge about when and where he had to be for lectures and when he had exams, I really started to wonder how much functionality can you put into an eContent platform before it really becomes a single-person content-oriented VLE. And when you add in the ability to share notes with the social and communication capability of most mobile devices, what then do you have?

A couple of presentations addressed eTextBooks directly, from a commercial point of view. Jenni Evans spoke about Vital Source and Andrejs Alferovs about Kortext, both of which are in the business of working with institutions distributing online textbooks to students. Both seem to have a good grasp of what students want, which I think should provide useful requirements to feed into eTextBook standardization efforts such as eTernity. These include:

  • ability to print
  • offline access
  • availability across multiple devices
  • reliable access under load
  • integration with VLE
  • integration with syllabus/curriculum
  • epub3 interactive content
  • long term access
  • ability for student to highlight/annotate text and share this with chosen friends
  • ability to search text and annotations


There was also a theme of resource discovery running through the day, and I have already mentioned in passing that this referenced Google and Amazon, but also social media. Nick Canty spoke about a survey of library use of social media. I thought it interesting that there seemed to be some sophisticated use of the immediacy of Twitter to direct people to more permanent content, e.g. to engagement on Facebook or the library website.

Both Richard Wallis of OCLC and Robert Faber of OUP emphasized that users tend to use Google to search and gave figures for how much of the access to library catalogue pages came direct from Google and other external systems, not from their own catalogue search interface. For example the Bibliothèque nationale de France found that 80% of access to their catalogue pages came directly from web search engines, not catalogue searches, and Robert gave similar figures for access to Oxford Journals. The immediate consequence of this is that if most people are trying to find content using external systems then you need to make sure that at least some (as much as possible, in fact) of your content is visible to them–this feeds in to arguments about how open access helps solve discoverability problems. But Richard went further: he spoke about how the metadata describing the resources needs to be in a language that Google/Bing/Yahoo understand, and that language is schema.org. He did a very good job of distinguishing between the usefulness of specialist metadata schemas for exchanging precise information between libraries or publishers and what is needed when trying to pass general information to Google, where:

it’s no use using a language only you speak.

Richard went on to speak about the Google Knowledge graph and their “things not strings” approach facilitated by linked data. He urged libraries to stop copying text and to start linking, for example not to copy an author name from an authority file but to link to the entry in that file, in Eric Miller’s words to move from cataloguing to “catalinking”.
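To make the difference between copying and linking a little more concrete, here is a minimal sketch in Python of a schema.org description of a book in which the author is identified by a link to an authority record rather than only by a copied name string. It is my own illustration, not something shown at the meeting: the JSON-LD syntax is just one of the ways schema.org data can be expressed, the helper name book_description is made up, and the title, ISBN and example.org authority URI are placeholders, not real records.

```python
import json


def book_description(title, isbn, author_name, author_authority_uri):
    """Return a schema.org Book description as a JSON-LD style dictionary.

    The author is described as a Person whose sameAs property links to an
    entry in an authority file, so the name string is a display label and
    the link carries the identity ("things not strings").
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "isbn": isbn,
        "author": {
            "@type": "Person",
            "name": author_name,
            # Hypothetical authority-file URI: a link, not a copied string.
            "sameAs": author_authority_uri,
        },
    }


# All values below are placeholders for illustration only.
record = book_description(
    title="An Example Textbook",
    isbn="0000000000000",
    author_name="A. N. Author",
    author_authority_uri="https://authority.example.org/person/12345",
)

if __name__ == "__main__":
    # Serialised like this, the description could be embedded in a
    # catalogue page for search engines to read.
    print(json.dumps(record, indent=2))
```

The point of the design is that two catalogue pages that link to the same authority entry are talking about the same person, whatever spelling of the name each happens to display.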


So was this really about ebooks? Probably not, and the point was made that over the years the name of the event has variously stressed ebooks and econtent and that over that time what is meant by “ebook” has changed. I must admit that for me there is something about the idea of an [e]book that I prefer over a “content aggregation”, but if we use the term ebook, let’s use it acknowledging that the book of the future will be as different from what we have now as what we have now is from the medieval scroll.

Picture Credit
Scanned image of page of the Epistle of St Jerome in the Gutenberg bible taken from Wikipedia. No Copyright.

eTextBooks Europe

I went to a meeting for stakeholders interested in the eTernity (European textbook reusability networking and interoperability) initiative. The hope is that eTernity will be a project of the CEN Workshop on Learning Technologies with the objective of gathering requirements and proposing a framework to provide European input to ongoing work by ISO/IEC JTC 1/SC36, WG6 & WG4 on eTextBooks (which is currently based around Chinese and Korean specifications). Incidentally, as part of the ISO work there is a questionnaire asking for information that will be used to help decide what that standard should include. I would encourage anyone interested to fill it in.

The stakeholders present represented many perspectives from throughout Europe: publishers, publishing industry specification bodies (e.g. IDPF, who own EPUB3, and DAISY), national bodies with some sort of remit for educational technology, and elearning specification and standardisation organisations. I gave a short presentation on the OER perspective.

Many issues were raised through the course of the day, including (in no particular order):

  • Interactive and multimedia content in eTextbooks
  • Accessibility of eTextbooks
  • eTextbooks shouldn’t be monolithic and immutable chunks of content; it should be possible to link directly to specific locations or to disaggregate the content
  • The lifecycle of an eTextbook. This goes beyond initial authoring and publishing
  • Quality assurance (of content and pedagogic approach)
  • Alignment with specific curricula
  • Personalization and adaptation to individual needs and requirements
  • The ability to describe the learning pathway embodied in an eTextbook, and either to vary the content used on this pathway or to provide different pathways through the same content
  • The ability to describe a range of IPR and licensing arrangements of the whole and of specific components of the eTextbook
  • The ability to interact with learning systems with data flowing in both directions

If you’re thinking that sounds like a list of the educational technology issues that we have been busy with for the last decade or two, then I would agree with you. Furthermore, there is a decade or two’s worth of educational technology specs and standards that address these issues. Of course not all of those specs and standards are necessarily the right ones for now, and there are others that have more traction within digital publishing. EPUB3 was well represented in the meeting (DITA is the other publishing standard mentioned in the eTernity documentation, but no one was at the meeting to talk about that) and it doesn’t seem impossible to meet the educational requirements outlined in the meeting within the general EPUB3 framework. The question is which issues should be prioritised and how should they be addressed.

Of course a technical standard is only an enabler: it doesn’t in itself make any change to teaching and learning; change will only happen if developers create tools and authors create resources that exploit the standard. For various reasons that hasn’t happened with some of the existing specs and standards. A technical standard can facilitate change but there needs to be a will or a necessity to change in the first place. One thing that made me hopeful about this was a point made by Owen White of Pearson that he did not think of the business he is in as being centred around content creation and publishing but around education and learning, and that leads away from the view of eBooks as isolated static aggregations.

For more information keep an eye on the eTernity website

The Challenge of ebooks

Yesterday I was in London, along with a group of people with a wide range of experience in digital resource management, OERs, and publishing for a workshop which was part of the Challenge of eBooks project. Here’s a quick summary and some reflections.

To kick off, Ken Chad defined eBooks for the purpose of the workshop, and I guess the report to be delivered by the project, as anything delivered digitally that was longer than a journal article. I’ll come back to what I think are the problems with that later, but we didn’t waste time discussing it. It did mean that we included in the discussion such things as scanned copies of texts such as those that can be made under the CLA licence, and the difficulties around managing and distributing those.

For the earliest printed books, or incunabula, such as the Gutenberg Bible, printers sought to mimic the hand written manuscripts with which 15th cent scholars were familiar; in much the same way publishers now seek to replicate printed books as ebooks.

The main part of the workshop was organised around a “jobs to be done” framework. The idea of this is to focus on what people are trying to do: “people don’t want a 5mm drill bit, they want a 5mm hole”. I found that useful in distinguishing ebooks in the domain of HE from the vast majority of those sold. In the latter case the job to be done is simply reading the book: the customer wants a copy of a book simply because they want to read that book, or a book by that author, or a book of that genre, but there isn’t necessarily any further motive beyond wanting the experience of reading the book. In HE the job to be done (ultimately) is for the student or researcher to learn something, though other players may have a job to do that leads to this, for example providing a student with resources that will help them learn something. I have views on how the computing power in the delivery platform can be used for more than just making the delivery of text more convenient: how it can be used to make the content interactive, or to deliver multimedia content, or to aid discussion or just connect different readers of the same text (I was pleased that someone mentioned the way a Kindle will show which passages have been bookmarked/commented on by other readers).

The issues raised in discussion included rights clearance, the (to some extent technical, but mostly legal) difficulties of creating course packs containing excerpts of selected texts, the diversity of platforms and formats, disability access, and relationships with publishers.

It was really interesting that accessibility featured so strongly. Someone suggested that this was because the mismatch between an ebook and the device on which it is displayed creates an impairment so frequently that accessibility issues are plain for all to see.

A lot of the issues seem to go back to publishers struggling with a new challenge, not knowing how they can meet it and keep their business model intact. It was great to have Suzanne Hardy of the PublishOER project there with her experience of how publishers will respond to an opportunity (such as getting more information about their users through tracking) but need help in knowing what the opportunities are when all they can see is the threat of losing control of their content. Whether publishers can make the necessary changes in currently print-oriented business processes to realise these benefits was questioned. Also there are challenges to libraries in HE, who are used to being able to buy one copy of a book for an institution whereas publishers now want to be able to sell access to individuals–partly, I guess, so that they can make that link between a user and the content they provide, but also because one digital copy can go a lot further than a single physical copy.

Interestingly, the innovation in ebooks is coming not from conventional publishers but from players such as Amazon, Apple and from publishers such as O’Reilly and Pearson. (Note that Pearson have a stake in education that includes an assessment business, online courses and colleges and so go beyond being a conventional publisher.) Also, the drive behind these innovations comes from new technology making new business models possible, not from evolution of current business, nor, arguably, from user demand.

So, anyway, what is an ebook? I am not happy with a definition that includes web sites of additional content created to accompany a book, or pages of a physical book that have been scanned. That doesn’t represent the sort of technical innovation that is creating new and interesting opportunities and the challenges that come with them. Yes, there are important (long-standing) issues around digital content in general, some of which will overlap with ebooks, but I will be disappointed if the report from this project is full of issues that could have been written about 10 years ago. That’s not because I think those issues are dead but because I think ebooks are something different that deserves attention. I’ll suggest two approaches to defining what that something is:

1. an ebook is what the ebook reading devices and apps read well. By-and-large that means content in mobi or ePub format. Ebook readers don’t handle scanned page images well. They don’t read most PDFs well (though that depends on the tool and the nature of the PDF used; the aim of PDF was to maintain page layout, which is exactly what you don’t want on an ebook reader). Word processed files are borderline, but mostly word processed documents are page-oriented, which raises the same issue as with PDFs. In short, WYSIWYG and ebooks don’t match.

2. an ebook is aggregated content, packaged so that it can be moved from server to device, with more-or-less linear navigation. In the aggregation (which is often a zip file under another extension name) are assets (the text, images and other content that are viewed) plus metadata that describes the book as a whole (and maybe the assets individually) and information about how the assets should be navigated (structural metadata describing the organisation of the book). That’s essentially what mobi and ePub are. It’s also what IMS Content Packaging and offspring like SCORM and Common Cartridge are; and for that matter it’s what the MS Office and Open Office formats are.
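As a rough illustration of this second definition, here is a minimal sketch in Python that builds an ePub-like package in memory and then reads the navigation order back out of it. It follows the general ePub layout (a mimetype entry, a META-INF/container.xml that points to a package file, and a package file with metadata, a manifest of assets and a spine giving the linear order), but it leaves out parts a real ePub needs, such as a navigation document, so treat it as a sketch of the idea rather than a valid ebook. The function names, the OEBPS folder name, the title and the chapter files are all made up for the example.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Tells a reading system where to find the package file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The package file: metadata about the whole book, a manifest listing the
# assets, and a spine giving the more-or-less linear navigation order.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>An Example eTextbook</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""

# Placeholder assets (the content that is actually read).
CHAPTERS = {
    "chapter1.xhtml": "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>One</p></body></html>",
    "chapter2.xhtml": "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Two</p></body></html>",
}


def build_package():
    """Aggregate assets and metadata into a single zip file, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF)
        for name, body in CHAPTERS.items():
            zf.writestr("OEBPS/" + name, body)
    return buffer.getvalue()


def reading_order(package):
    """Return asset hrefs in spine order, found via container.xml and the package file."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    manifest = {
        item.get("id"): item.get("href")
        for item in opf.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [manifest[ref.get("idref")] for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]


if __name__ == "__main__":
    print(reading_order(build_package()))
```

Nothing here is specific to books: swap the package file for a different manifest format and the same zip-plus-metadata-plus-structure pattern describes the other packaging formats mentioned above.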

I had a short discussion with Zak Mensah of JISC Digital Media about whether the content should be mostly text based. I would like to see as much non-text material as is useful, but clearly there is a limit. It would be perverse to take a set of videos, sequence them one after another with screen of text between each one like a caption frame in a silent movie, and then call it a book. However, there is something more than text that would make sense as a book: imagine replacing all the illustrations in a well-illustrated text book with models, animations, videos … for example, a chemistry book with interactive models of chemical structures, graphs that change when you alter the parameters; or a Shakespeare text with videos of performance in parallel with text…that still makes sense as a book.

[image of page from Gutenberg Bible taken from wikipedia]

Text and Data Mining workshop, London 21 Oct 2011

There were two themes running through this workshop organised by the Strategic Content Alliance: technical potential and legal barriers. An important piece of background is the Hargreaves report.

The potential of text and data mining is probably well understood in technical circles, and was well articulated by John McNaught of NaCTeM. Briefly, the potential lies in the extraction of new knowledge from old through the ability to surface implicit knowledge and show semantic relationships. This is something that could not be done by humans, not even crowds, because of the volume of information involved. Full text access is crucial: John cited a finding that only 7% of the subject information extracted from research papers was mentioned in the abstract. There was a strong emphasis, from for example Jeff Lynn of the Coalition for a Digital Economy and Philip Ditchfield of GSK, on the need for business and enterprise to be able to realise this potential if it were to remain competitive.

While these speakers touched on the legal barriers it was Naomi Korn who gave them a full airing. They start in the process of publishing (or before) when publishers acquire copyright, or a licence to publish with enough restriction to be equivalent. The problem is that the first step of text mining is to make a copy of the work in a suitable format. Even for works licensed under the most liberal open access licence academic authors are likely to use, CC-by, this requires attribution. Naomi spoke of attribution stacking, a problem John had also mentioned: when a result is found by mining 1000s of papers, do you have to attribute all of them? This sort of problem occurs at every step of the text mining process. In UK law there are no copyright exceptions that can apply: it is not covered by fair dealing (though it is covered by fair use in the US and by similar exceptions in Norwegian and Japanese law, but nowhere else); the exceptions for transient copies (such as in a computer’s memory when reading online) only apply if that copy has no intrinsic value.

The Hargreaves report seeks to redress this situation. Copyright and other IP law is meant to promote innovation not stifle it, and copyright is meant to cover creative expressions, not the sort of raw factual information that data mining processes. Ben White of the British Library suggested an extension of fair dealing to permit data mining of legally obtained publications. The important thing is that, as parliament acts on the Hargreaves review, people who understand text mining and care about legal issues make sure that any legislation is sufficient to allow innovation; otherwise innovators will have to move to those jurisdictions like the US, Japan and Norway where the legal barriers are lower (I’ll call them ‘text havens’).

Thanks to JISC and the SCA for organising this event; there’s obviously plenty more for them to do.

Hopes and fears for eReaders and eTextBooks

About 15 years ago, when I was first starting to promote the use of resources for “computer aided learning” the message was fairly clear: reading text off a screen is problematic so don’t use computers for this, use them for what they are good at. For me, in physical sciences at that time, they were good at multimedia presentation and the calculations necessary for creating interactive models that allow active engagement with the physics being taught. More generally, computers were good at things that allowed more pedagogically appropriate approaches to teaching and learning.

I’ve been disappointed since then: the most widely adopted applications of technology in teaching and learning are to use them to project presentations instead of transparencies on an OHP, and as VLEs to distribute course info and handouts. In both examples the net impact of the computer is to do the same thing in a slightly more convenient way. Now a platform has reached maturity that allows a slightly more convenient way to read books, reproducing the text-on-paper experience. It’s bound to be the next big thing.

So, what to do about this? Admit that in practice technology will enhance learning by making small incremental improvements to established practice? Press for enhanced capability where it will facilitate good pedagogy? Work in anticipation of some revolutionary change driven by factors outwith the HE system?

In the meantime, some relevant stuff elsewhere:

An open e-Textbook usecase, our contribution to the ISO/IEC JTC1 SC36 Study Period on e-Textbooks.

Digital Textbooks, a blog devoted to documenting significant initiatives that relate to any and all aspects of digital textbooks, most notably their use in higher education.

Wolfram assistants, the sort of good stuff that could find its way into a digital textbook.

Amazon Kindle customer review on the bad stuff: problems with footnotes in academic eTexts.

A short update on Ramlet

Ramlet, or Resource Aggregation Model for Learning, Education and Training (which is working group 13 of the IEEE Learning Technology Standards Subcommittee) is an ongoing piece of work which aims to define a conceptual model that includes an ontology and a nomenclature for enabling the interpretation of externalized representations of digital aggregates of resources for learning, education, and training applications. In other words, it will help show semantic relationships between content aggregation formats such as IMS CP, ATOM, MPEG 21 DID and OAI-ORE.

Like many standardization efforts, progress is slow and gradual so it’s difficult to know when it’s worth giving an update. But last week the RAMLET technical editor, Scott Lewis, sent this message about the conceptual model:

This standard has taken a long time, but it is a complex standard that presents an ontology for resource aggregation and down-loadable files to help implement the ontology.

The good news is that virtually all of the technical work has been done for the standard and for a series of IEEE recommend practices that will be published after the standard is published. The working group expects to have a draft of the base standard for internal review by year’s end and a balloting draft submitted to IEEE in Q1 of 2010. The series of recommended practices that specify mappings for IMS CP, ATOM, METS, MPEG-21 DID, and OAI-PMH ORE will be published as soon as possible after the standard is published. Again, the technical work for these recommend practice has been done, and it is just a matter of converting that work to IEEE recommended practices after the base standard has been approved.

CETIS’s Wilbert Kraan is taking part in the RAMLET work, working on a proof of concept implementation using standard open source components.