Even knowing that current mainstream EPUB readers and eBook management applications will most likely ignore all but the most trivial metadata, we still have use cases that involve more sophisticated metadata. For example, we would like to use the LRMI alignment object in schema.org to say that a particular subsection of a book can be useful in the context of a specific unit in a shared curriculum.
So, without evaluating pros and cons, and starting from the most basic and most common, what are the options? This summary draws on Garrish and Gylling, EPUB 3 Best Practices, O’Reilly 2013 (which I take to be authoritative, and also an example of best practice with regard to the metadata in the EPUB file itself), as well as the EPUB 3.0 Publications and Content Documents specifications. Any comments would be greatly appreciated.
1 Simple Dublin Core
Within the OEBPS directory of an unpacked EPUB3 is the content.opf file. It pretty much equates to the manifest of an IMS Content Package. The top-level element is <package>, and <metadata> is a required first child of <package>.
The default metadata vocabulary is the Dublin Core Metadata Element Set (DCMES, simple DC), with prefix dc:. Three elements are mandatory (title, identifier and language); the others are optional. For example, in /OEBPS/content.opf:
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/" [...]>
  <metadata>
    <dc:identifier>urn:isbn:9781449325299</dc:identifier>
    <dc:title>EPUB 3 Best Practices</dc:title>
    <dc:language>en</dc:language>
    <dc:rights>Copyright © 2013 Matt Garrish and Markus Gylling</dc:rights>
    [...]
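To get a feel for how a tool might read these elements, here is a minimal Python sketch using only the standard library. It is an illustration, not a validator. The sample OPF is a cut-down, completed version of the example above: the OPF namespace and the version and unique-identifier attributes are filled in as I understand the spec to require them, and the function name read_simple_dc is made up for this post.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# Hypothetical minimal package document, based on the example above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         version="3.0" unique-identifier="pub-identifier">
  <metadata>
    <dc:identifier id="pub-identifier">urn:isbn:9781449325299</dc:identifier>
    <dc:title>EPUB 3 Best Practices</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def read_simple_dc(opf_text):
    """Return the values of the three mandatory DC elements as lists."""
    root = ET.fromstring(opf_text)
    metadata = root.find(f"{{{OPF}}}metadata")
    result = {}
    for name in ("identifier", "title", "language"):
        elements = metadata.findall(f"{{{DC}}}{name}")
        result[name] = [el.text.strip() for el in elements if el.text]
    return result


dc = read_simple_dc(SAMPLE_OPF)
# Any mandatory element with no value would show up here.
missing = [name for name, values in dc.items() if not values]
print(dc)
print("missing:", missing)
```

An empty list for missing means all three mandatory elements are present; it says nothing about whether their values are sensible.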
2 Other metadata schemas
The package element has a prefix attribute that may be used to declare prefixes for metadata schemas other than DCMES. Four vocabularies are reserved, i.e. the prefix may be used without a declaration: dcterms, marc, onix and media (the vocabulary used for EPUB3 media overlays). Example
<dcterms:title>EPUB 3 Best Practices</dcterms:title>
Other vocabularies may be used by providing a prefix and a URL, in a way so similar to xmlns that it makes you wonder why they didn’t just use xmlns.
<package prefix="prism: http://prismstandard.org/namespaces/basic/3.0/" [...]>
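As a sketch of how the prefix attribute could be unpacked, the following Python assumes that its value is a whitespace-separated series of "prefix: URL" pairs, which is how I read the spec and how the example above is written. The function names parse_prefix_attribute and expand are my own, and the reserved prefixes are deliberately left out, so a real tool would need to add those mappings itself.

```python
def parse_prefix_attribute(value):
    """Turn 'p1: URL1 p2: URL2' into {'p1': 'URL1', 'p2': 'URL2'}."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating prefix and URL tokens")
    mapping = {}
    for prefix, url in zip(tokens[0::2], tokens[1::2]):
        if not prefix.endswith(":"):
            raise ValueError(f"prefix {prefix!r} should end with a colon")
        mapping[prefix[:-1]] = url
    return mapping


def expand(prop, prefixes):
    """Expand 'prefix:term' to a full URL; leave unprefixed terms alone."""
    prefix, sep, local = prop.partition(":")
    if not sep:
        return prop  # a term from the default vocabulary, e.g. file-as
    return prefixes[prefix] + local  # KeyError if the prefix is undeclared


prefixes = parse_prefix_attribute(
    "prism: http://prismstandard.org/namespaces/basic/3.0/"
)
print(expand("prism:contentType", prefixes))
```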
3 The meta element
If used without the refines attribute (see below), the meta element can provide information about the package as a whole, e.g.
<meta property="dcterms:title">EPUB 3 Best Practices</meta>
I have no idea what would be the benefit of this over <dcterms:title>.
4 Refining metadata elements: id attribute and the meta element
The id attribute can be used to provide an identifier for any element in the metadata so that it may be refined. One example of this is mandatory: one occurrence of the dc:identifier element must be the publication identifier:
<dc:identifier id="pub-identifier">urn:isbn:9781449325299</dc:identifier>
In general, refinements are described using the meta element with the refines attribute and a property attribute that specifies the nature of the refinement. It’s kind of like RDF reification. The default vocabulary for the property attribute includes “file-as” – an alternative string for a name, to be used when filing; “identifier-type” – a way to distinguish between different identifiers; “meta-auth” – the authority for a given instance of metadata; and “title-type” – which of the six forms of title is being provided.
<dc:creator id="1234">Matt Garrish</dc:creator>
<meta refines="#1234" property="file-as" id="5678">Garrish, Matt</meta>
<meta refines="#1234" property="role">Author</meta>
Terms from other vocabularies may be used for “property” so long as a prefix is declared.
Refinements may have ids and so may be refined.
<meta refines="#5678" property="meta-auth">Phil Barker</meta>
And so on: you can make statements about your metadata statements to your heart’s content (though including the whole of the linked data graph in each EPUB would be silly).
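The chaining of refinements can be followed mechanically. Here is a rough Python sketch that indexes meta elements by the id they refine and then walks the chain from a starting element. The ids creator01 and fileas01 and the function names are invented for the illustration (the ids start with a letter, since XML ids cannot begin with a digit), and the walk assumes there are no circular refinements.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"

# Hypothetical metadata, modelled on the creator example above.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="3.0">
  <metadata>
    <dc:creator id="creator01">Matt Garrish</dc:creator>
    <meta refines="#creator01" property="file-as" id="fileas01">Garrish, Matt</meta>
    <meta refines="#fileas01" property="meta-auth">Phil Barker</meta>
  </metadata>
</package>"""


def refinements_by_target(opf_text):
    """Map each refined id to the list of meta elements that refine it."""
    root = ET.fromstring(opf_text)
    index = {}
    for meta in root.iter(f"{{{OPF}}}meta"):
        target = meta.get("refines")
        if target and target.startswith("#"):
            index.setdefault(target[1:], []).append(meta)
    return index


def describe(element_id, index, depth=0):
    """List refinements of element_id, indenting refinements of refinements."""
    lines = []
    for meta in index.get(element_id, []):
        lines.append("  " * depth + f"{meta.get('property')}: {meta.text}")
        child_id = meta.get("id")
        if child_id:  # this refinement can itself be refined
            lines.extend(describe(child_id, index, depth + 1))
    return lines


index = refinements_by_target(SAMPLE)
tree_lines = describe("creator01", index)
print("\n".join(tree_lines))
```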
The scheme attribute may be used to identify the controlled vocabulary from which the meta element’s value is drawn. For example, if the identifier is a DOI (which in onix is apparently entry 06 of codelist 5) you can have
<dc:identifier id="pub-id">urn:doi:10.1016/j.iheduc.2008.03.001</dc:identifier>
<meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
Or, using the MARC relator code aut to specify author:
<meta refines="#1234" property="role" scheme="marc:relators">aut</meta>
5 Sub-package level metadata
The id attribute may be used to provide an identifier for a subelement of <package> or for any element in the XHTML content documents, down to a span element around a phrase, word or character. So if a chapter has id="chap1", we can use meta elements in the metadata to describe it separately from the rest of the EPUB:
<meta refines="#chap1" property="prism:contentType">bookChapter</meta>
6 Links to metadata records
The link element is an optional, repeatable subelement of <metadata>, “used to associate resources with a publication, such as metadata records”. The metadata record may be within the package or anywhere on the web.
Example
<link rel="marc21xml-record" href="pub/meta/nor-wood-marc21.xml" />
<link refines="#chap1" rel="ex:schema_org-record" media-type="application/ld+json" href="http://example.org/nor-wood-lrmi.json" />
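For what it is worth, finding these linked records is straightforward. The Python sketch below lists the link elements from a package document and flags whether each href points outside the package, judged simply by whether it has a URL scheme. The sample wraps the two links above in a minimal package element, and list_metadata_links is a name I have made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

OPF = "http://www.idpf.org/2007/opf"

# Hypothetical package document containing the two links shown above.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <link rel="marc21xml-record" href="pub/meta/nor-wood-marc21.xml"/>
    <link refines="#chap1" rel="ex:schema_org-record"
          media-type="application/ld+json"
          href="http://example.org/nor-wood-lrmi.json"/>
  </metadata>
</package>"""


def list_metadata_links(opf_text):
    """Return one dict per link element in the package metadata."""
    root = ET.fromstring(opf_text)
    links = []
    for link in root.iter(f"{{{OPF}}}link"):
        href = link.get("href", "")
        links.append({
            "rel": link.get("rel"),
            "href": href,
            # A URL scheme (http, https, ...) means the record is remote.
            "remote": bool(urlparse(href).scheme),
            "refines": link.get("refines"),  # None for package-level links
        })
    return links


links = list_metadata_links(SAMPLE)
for entry in links:
    print(entry)
```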
7 Metadata embedded in the XHTML5 content
As far as I can see the EPUB3 specs are silent on metadata in the HTML of the content documents, e.g. as html:meta elements, microdata or RDFa, but there doesn’t seem to be any reason why one should not put metadata there. I wouldn’t expect any EPUB system to look that deeply into the package, but it would be a good way of helping the metadata travel with the resource if the EPUB is disaggregated and passed into a non-EPUB3 CMS.
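As a small illustration of the simplest case, html:meta elements, the following Python sketch pulls name/content pairs out of an XHTML content document. The sample document and its values are invented, the function name head_meta is my own, and microdata or RDFa would need a proper parser rather than this.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical content document; the meta values are made up.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <meta name="description" content="Introduction to EPUB metadata"/>
    <meta name="keywords" content="epub, metadata"/>
  </head>
  <body><p>Some text.</p></body>
</html>"""


def head_meta(xhtml_text):
    """Return {name: content} for html:meta elements that have a name."""
    root = ET.fromstring(xhtml_text)
    return {
        meta.get("name"): meta.get("content")
        for meta in root.iter(f"{{{XHTML}}}meta")
        if meta.get("name")
    }


embedded = head_meta(SAMPLE_XHTML)
print(embedded)
```

This only works because EPUB3 content documents are XHTML and so can be read with an XML parser; the same approach would not cope with ordinary tag-soup HTML.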