About metadata & resource description (pt 2)

Trying to show how resource description on sites such as Flickr relates to metadata…

Some people have looked at the metadata requirements for the UK OER programme and taken them as a prescription for which LOM or Dublin Core elements they should use. After all, that’s what metadata is, isn’t it? But UK OER projects are also encouraged to use Web 2.0 or social sharing platforms (Flickr, YouTube, SlideShare, etc.) to make their resources available, and these sites don’t know anything about metadata, do they?

Well, in my previous post I tried to distinguish between resource description and metadata, where resource description is pretty much any information about anything, and metadata is structured information about a resource (acknowledging that not everyone makes this distinction). I think that some of the “metadata” requirements given for OER in various discussions are better seen, at first, as resource description requirements.

The second problem with seeing the UK OER metadata requirements as a prescription for which elements to use is that, to me at least, it misses the point of what metadata does best. I think that the best view of metadata is that it shows the relationship between resources. “Resources” here means anything (information resources like the OERs, people, places, things, organizations, abstract concepts) so long as it can be identified. What metadata does is express or assert a relationship such as “this OER was created by this person”.

So, looking at an image’s “canonical” page on Flickr, we see a resource description that links to the photostream of the person who uploaded it (me), and from there to my profile page on Flickr. That’s done with metadata, but how do we get at it?

Well, in the HTML for the image page the link is rendered as:

<a href="/photos/philbarker/"
   title="Link to phil barker's photostream"
   rel="dc:creator cc:attributionURL">
       <b>phil barker</b>
</a>

The rel="dc:creator cc:attributionURL" attribute tells a computer what the relationship is between this page and the URL, i.e. that the URL identifies the creator of the page and should be used for attribution. That’s not great, because I’m not my photostream; in fact, my photostream doesn’t even describe me.
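To make that concrete, here is a minimal sketch of a program reading the rel values off that link. It uses only Python’s standard html.parser and is not how Flickr or any particular application does it. Each space-separated rel value is a separate relationship between the page and the URL. In RDFa the dc: and cc: prefixes are mapped to full vocabulary URIs elsewhere in the page; this sketch does not resolve them.

```python
from html.parser import HTMLParser

# The anchor from the Flickr image page, as quoted above.
SNIPPET = """<a href="/photos/philbarker/"
   title="Link to phil barker's photostream"
   rel="dc:creator cc:attributionURL">
       <b>phil barker</b>
</a>"""


class RelLinkParser(HTMLParser):
    """Collect (relationship, target URL) pairs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href is None or rel is None:
            return
        # rel holds a space-separated list of relationship names;
        # prefixes such as dc: are kept as-is, not expanded to URIs.
        for relationship in rel.split():
            self.relations.append((relationship, href))


rel_parser = RelLinkParser()
rel_parser.feed(SNIPPET)
for relationship, href in rel_parser.relations:
    print(relationship, "->", href)
```

Both relationships point at the same URL, which is exactly the problem: the link says “creator”, but the thing at the end of it is a photostream, not a person.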

Things are better on the photostream page, though: its HTML includes

<link rel="alternate"
  title="Flickr: phil barker's Photostream RSS feed"

which points any application that can read HTML and RSS to the RSS feed for my photostream, where the entry for that picture contains the following:

<author flickr:profile="http://www.flickr.com/people/philbarker/">nobody@flickr.com (phil barker)</author>
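Here is a similar sketch of those two steps: finding the feed from the photostream page, then reading the profile link out of the feed. The href and type on the <link> element are placeholders, since they aren’t shown in the snippet above, and the namespace URI bound to the flickr: prefix is an assumption made for illustration; a real feed declares its own.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical photostream markup: href and type are placeholder values.
PHOTOSTREAM_HTML = """<head>
<link rel="alternate" type="application/rss+xml"
  title="Flickr: phil barker's Photostream RSS feed"
  href="https://example.org/photostream.rss">
</head>"""

FLICKR_NS = "urn:flickr:user"  # assumed namespace URI for the flickr: prefix

# A cut-down feed containing just the <author> element quoted above.
FEED_XML = """<rss version="2.0" xmlns:flickr="urn:flickr:user">
<channel><item>
<author flickr:profile="http://www.flickr.com/people/philbarker/">nobody@flickr.com (phil barker)</author>
</item></channel></rss>"""


class FeedLinkFinder(HTMLParser):
    """Find <link rel="alternate"> elements that point to RSS or Atom feeds."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rels
                and attributes.get("type") in self.FEED_TYPES):
            self.feeds.append(attributes.get("href"))


def author_profiles(feed_xml):
    """Return (author text, profile URL) for each <author> element in a feed."""
    root = ET.fromstring(feed_xml)
    profile_attr = "{%s}profile" % FLICKR_NS  # namespaced attribute name
    return [(author.text, author.get(profile_attr))
            for author in root.iter("author")]


finder = FeedLinkFinder()
finder.feed(PHOTOSTREAM_HTML)
print(finder.feeds)
print(author_profiles(FEED_XML))
```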

As well as the description of me (my name and not-my-email-address), there is the link to my profile page. The HTML for that profile page not only generates a human-readable rendering in a browser, but also includes the following:

<div class="vcard">
    <span class="nickname">phil barker</span>
    <span class="RealName">/
        <span class="fn n">
           <span class="given-name">Phil</span>
           <span class="family-name">Barker</span>

That is a computer-readable hCard microformat version of my contact information (coincidentally, hCard is based on vCard, the same underlying schema for person data that is used in the LOM).
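And a sketch of reading that hCard. It is deliberately simplified: it handles only a few hCard class names (fn, nickname, given-name, family-name) and assumes well-formed markup without void elements such as <br>. A dedicated microformats parser would cover much more of the format.

```python
from html.parser import HTMLParser

# The hCard fragment from the profile page, with closing tags included.
HCARD = """<div class="vcard">
    <span class="nickname">phil barker</span>
    <span class="RealName">/
        <span class="fn n">
           <span class="given-name">Phil</span>
           <span class="family-name">Barker</span>
        </span>
    </span>
</div>"""

# A small subset of hCard property class names.
PROPERTIES = {"fn", "nickname", "given-name", "family-name"}


class HCardParser(HTMLParser):
    """Collect text for selected hCard properties inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class lists of the currently open elements
        self.card = {}   # property name -> accumulated text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        open_classes = [c for classes in self.stack for c in classes]
        if not text or "vcard" not in open_classes:
            return
        # Text counts towards every property whose element encloses it,
        # so "fn" gathers both the given and the family name.
        for prop in PROPERTIES.intersection(open_classes):
            self.card[prop] = (self.card.get(prop, "") + " " + text).strip()


card_parser = HCardParser()
card_parser.feed(HCARD)
print(card_parser.card)
```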

So there’s your Author metadata on Flickr. And I’ll note that all this happened without me ever thinking that I was “cataloguing”!

To generalise and idealise slightly, the resource pages (the canonical page for the image, the photostream page, my profile page) have one or more of the following embedded in them:

  • links that describe the relationships of the resources described on those pages to each other in a computer-readable manner
  • links to alternative descriptions in machine-readable metadata, e.g. an RSS or Atom XML file for the resource described on the page
  • embedded computer-readable metadata, e.g. vCard person data embedded in the hCard microformat.
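One way to see what these three mechanisms add up to is to write each assertion they carry as a subject, predicate, object triple and then follow the links. The sketch below does this in plain Python rather than with any RDF tooling. The image page URL is a hypothetical placeholder, the predicate labels such as flickr:profile and hcard:fn are informal names chosen for illustration, and the photostream URL is the relative link from the image page resolved against www.flickr.com.

```python
from collections import deque
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifier: PHOTO_ID stands in for a real image page URL.
IMAGE = "http://www.flickr.com/photos/philbarker/PHOTO_ID/"
STREAM = "http://www.flickr.com/photos/philbarker/"
PROFILE = "http://www.flickr.com/people/philbarker/"

TRIPLES = [
    Triple(IMAGE, "dc:creator", STREAM),          # rel link on the image page
    Triple(IMAGE, "cc:attributionURL", STREAM),   # same link, second rel value
    Triple(STREAM, "flickr:profile", PROFILE),    # attribute in the RSS feed
    Triple(PROFILE, "hcard:fn", "Phil Barker"),   # hCard on the profile page
    Triple(PROFILE, "hcard:nickname", "phil barker"),
]


def reachable(start, triples):
    """Breadth-first walk returning every triple reachable from start."""
    seen, found = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for t in triples:
            if t.subject == node:
                found.append(t)
                if t.obj not in seen:
                    seen.add(t.obj)
                    queue.append(t.obj)
    return found


for t in reachable(IMAGE, TRIPLES):
    print(t.subject, t.predicate, t.obj)
```

Starting from the image, the walk reaches my name in three hops, even though no single page states “this image was created by Phil Barker”.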

See also Adam’s post Objects in this Mirror are Closer than they Appear: Linked Data and the Web of Concepts.

About metadata & resource description (pt 1)

Trying to distinguish between metadata and resource description…

In our online support session for the UKOER programme, which John has partly summarized (1 2 3), instead of giving participants a definition of metadata we gave them a choice and asked them to vote on what they understood the word to mean.

The options were:
A: data about data
B: structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
C: pretty much any information about anything.
D: any of the above.

You might recognise option A as the etymological definition and B as NISO’s definition, found in Understanding Metadata [pdf]. I was interested in how many people included C in what they understood when they used or heard the term metadata. This was prompted by a comment (I forget from whom and in what context) that the idea of metadata defined in option B was fine in a specialized academic sense, but that the word was used so widely and loosely that you could no longer rely on that being what people meant. In other words, you could not assume that someone who said they had metadata would be able to provide you with nicely structured, machine-readable XML/RDF/HTML-Meta tagged information.

Our sample of participants in the online session wasn’t scientifically chosen. Everyone had some connexion with the UK OER programme, either working for a project or helping to manage or provide advice to the programme; there was approximately equal representation of managers and technical people (with some overlap, I guess), and one person had a library/information science background (that was my co-presenter, John!). The vote came out as:
5 for A: Data about data;
14 for B: Structured information…;
0 for C: any information about anything;
10 for D: any of the above.

In retrospect it’s not surprising that no one voted for C, since the people in our audience who recognise that as a meaning are likely to have come across A and B as well.

As someone said during the vote, you can tell B is the “right” answer because it is the longest and most formal-looking option :-). For me, data about data is too restrictive in range, and I think it would be helpful not to call option C/D metadata. I would rather use the term resource description to cover all options and reserve metadata for structured information about a resource (which includes, but is broader than, data about data). So metadata tells a computer that 2009-09-11 is to be interpreted as a date in ISO 8601 format, and is the sort of structured information found in LOM and Dublin Core. Resource description may be metadata or may be free text for people to read. Computers such as those run by Google can do a pretty good job of processing information aimed at people; people (on the whole) aren’t very good at processing information aimed at computers.
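A small sketch of that point about dates: the same ten characters only become a date once something declares how to read them. The statement structure and its field names are hypothetical, not taken from LOM or Dublin Core; the parsing uses Python’s standard datetime.date.fromisoformat.

```python
from datetime import date

# A hypothetical metadata statement: the value plus a declaration of how
# to read it. Field names are illustrative only.
statement = {"property": "date", "value": "2009-09-11", "scheme": "ISO 8601"}


def interpret(stmt):
    """Turn a raw value into a typed value according to its declared scheme."""
    if stmt["scheme"] == "ISO 8601":
        # ISO 8601 calendar dates are YYYY-MM-DD, so there is no doubt
        # about which number is the month and which is the day.
        return date.fromisoformat(stmt["value"])
    # With no recognised scheme, the value stays an uninterpreted string.
    return stmt["value"]


value = interpret(statement)
print(repr(value))         # datetime.date(2009, 9, 11)
print(value.isoweekday())  # 5 (Friday)
```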

I think that the best view of metadata is that it shows the relationship between resources. “Resources” here means anything that can be identified (if you cannot identify it you cannot show how things are related to it), including: information resources like the OERs, people, places, things, organizations, abstract concepts. What metadata does is express the assertion that this OER (for example) was created by this person. I’ll try to show how this allows the mixing up of metadata and resource description (in a good way) in my next post.