Repositories and the Open Web

On 19 April, CETIS are holding a meeting in London on Repositories and the Open Web. The theme of the meeting is how repositories and social sharing / web 2.0 web sites compare as hosts for learning materials: how well does each facilitate the tasks of resource discovery and resource management; what approaches to resource description do the different approaches take; and are there any lessons that users of one approach can draw from the other?

Both the title of the event (does the ‘and’ imply a distinction? why not repositories on the open web?) and the tag CETISROW may be taken as slightly provocative. Well, the tag is meant lightheartedly, of course, and yes there is a rich vein of work on how repositories can work as part of the web. Just looking back at previous CETIS events, I would like to highlight these contributions to previous meetings:

  • Lara Whitelaw presented on the PROWE Project, about using wikis and blogs as shared repositories to support part-time distance tutors in June 2006.
  • David Davies spoke about RSS, Yahoo! Pipes and mashups in June 2007.
  • Roger Greenhalgh talked about the National Rural Knowledge Exchange in the May 2008 meeting. And many of us remember his “what’s hot in pigs” intervention in an earlier meeting.
  • Richard Davis talked about SNEEP (social network extensions for ePrints) at the same meeting.

Most recently we’ve seen a natural intersection between the aims of Open Educational Resources initiatives and the use of hosting on web 2 and social sharing sites, so, for example, the technical requirements suggested for the UKOER programme said this under delivery platforms:

Projects are free to use any system or application as long as it is capable of delivering content freely on the open web. However all projects must also deposit their content in JorumOpen. In addition projects should use platforms that are capable of generating RSS/Atom feeds, particularly for collections of resources e.g. YouTube channels. Although this programme is not about technical development projects are encouraged to make the most of the functionality provided by their chosen delivery platforms.

We have followed this up with some work looking at the use of distribution platforms for UKOER resources which treats web 2 platforms and repository software as equally useful for that task.

So, there’s a longstanding recognition that repositories live on the open web, and that formal repositories aren’t the only platform suitable for the management and dissemination of learning materials. But I would be missing something I think important if I left it at that. For some time I’ve had misgivings about the direction that conceptualising your resource management and dissemination as a repository leads. A while back a colleague noticed that a description of some proposed specification work, which originated from repository vendors, developers and project managers, talked about content being “hidden inside repositories”, which we thought revealing. Similarly, I’ve written before that repository-think leads to talk of interoperability between repositories and repository-related services (I’m sure I’ve written that before). Pretty soon one ends up with a focus on repositories and repository-specific standards per se and not on the original problem of resource management and dissemination. A better solution, if you want to disseminate your resources widely, is not to “hide them in repositories” in the first place. Also, in repository-world the focus is on metadata, rather than resource description: the encoding of descriptive data into fields can be great for machines, but I don’t think that we’ve done a great job of getting that encoding right for educational characteristics of resources, and I think that this has been at the expense of providing suitable information for people.

Of course not every educational resource is open, and so the open web isn’t an appropriate place for all collections. Also, once you start using some of the web 2.0 social sharing sites for resource management you begin to hit some problems (no option for creative commons licensing, assumptions that the uploader created/owns the resource, limitations on export formats, etc.)–though there are some exceptions. It is, however, my belief that all repository software could benefit from the examples shown by the best of the social sharing websites, and my hope that we will see that in action during this meeting.

Detail about the meeting (agenda, location, etc.) will be posted on the CETIS wiki.

Registration is open, through the CETIS events system.

Resource description requirements for a UKOER project

CETIS have provided information on what we think are the metadata requirements for the UK OER programme, but we have also said that individual projects should think about their own metadata requirements in addition to these. As an example of what I mean by this, here is what I produced for the Engineering Subject Centre’s OER project.

Like it says on the front page it’s an attempt to define what information about a resource should be provided, why, for whom, and in what format, where:

“Who” includes project funders (HEFCE + JISC and Academy as their agents), project partner contributing resource, project manager, end users (teachers and students), aggregators—that is people who wish to build services on top of the collection.

“Why” includes resource management, selection and use as well as discovery through Google or otherwise, etc. etc.

“Format” includes free text for humans to read (which is incidentally what Google works from) and encoded text for machine operations (e.g. XML, RSS, HTML metatags, microformats, metadata embedded in other formats or held in the database of whatever content management system lies behind the host we are using).

You can read it on Scribd: Resource description requirements for EngSC OER project

[I should note that I work for the Engineering Subject Centre as well as CETIS and this work was not part of my CETIS work.]

It would be useful to know if other projects have produced anything similar. . .

About metadata & resource description (pt 2)

Trying to show how resource description on sites such as Flickr relates to metadata…

Some people have looked at the metadata requirements for the UK OER programme and taken them as a prescription for which LOM or Dublin Core elements they should use. After all that’s what metadata is, isn’t it? But UK OER projects are also encouraged to use Web2.0 or social sharing platforms (Flickr, YouTube, SlideShare etc.) to make their resources available, and these sites don’t know anything about metadata, do they?

Well, in my previous post I tried to distinguish between resource description and metadata, where resource description is pretty much any information about anything, and metadata is the structured information about a resource (acknowledging that the distinction is not always made by everyone). I think that some of the “metadata” requirements given for OER in various discussions are actually better seen at first as resource description requirements.

The second problem with seeing the UK OER metadata requirements as a prescription for which elements to use is that, to me at least, it misses the point of what metadata does best. I think that the best view of metadata is that it shows the relationship between resources. “Resources” here means anything (information resources like the OERs, people, places, things, organizations, abstract concepts) so long as the thing can be identified. What metadata does is express or assert a relationship such as “this OER was created by this person”.
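One way to picture “metadata as relationships” is as a list of subject, predicate, object statements. The following is only a minimal sketch in Python: the example.org identifiers, the `Statement` type, the `memberOf` predicate and the `related` helper are all made up for illustration and are not part of any UKOER requirement.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One asserted relationship between two identified things."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, invented purely for this illustration.
OER = "http://example.org/oer/42"
PERSON = "http://example.org/people/phil"
ORG = "http://example.org/orgs/engsc"

statements = [
    Statement(OER, "dc:creator", PERSON),   # "this OER was created by this person"
    Statement(PERSON, "memberOf", ORG),     # a person related to an organization
]


def related(subject, predicate, data):
    """Return every object linked to `subject` by `predicate`."""
    return [s.obj for s in data if s.subject == subject and s.predicate == predicate]


creators = related(OER, "dc:creator", statements)
print(creators)
```

The point of the sketch is simply that each statement only works if both ends can be identified, which is why identifiers matter more than the particular element set.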

So looking at an image’s “canonical” page on Flickr, we see a resource description which has a link to the photo stream of the person who uploaded it (me) and from there there is a link to my profile page on Flickr. That’s done with metadata, but how do we get at it?

Well, in the HTML for the image page the link is rendered as

<a href="/photos/philbarker/"
   title="Link to phil barker's photostream"
   rel="dc:creator cc:attributionURL">
       <b>phil barker</b>
</a>

The rel="dc:creator cc:attributionURL" attribute tells a computer what the relationship between this page and the URL is, i.e. that the URL identifies the creator of the page and should be used for attribution. That’s not great because I’m not my photostream; in fact my photostream doesn’t even describe me.
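To give a feel for how a program might “get at” that relationship, here is a rough sketch using only Python’s standard library HTML parser. The class and function names are my own, and this is not how Flickr or any particular harvester actually does it.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, [rel tokens]) for each <a> or <link> that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rel = attr.get("rel")
        if rel:
            # rel holds a space-separated list of relationship names
            self.links.append((attr.get("href"), rel.split()))


# The anchor element quoted above from the image page.
SNIPPET = '''<a href="/photos/philbarker/"
   title="Link to phil barker's photostream"
   rel="dc:creator cc:attributionURL">
       <b>phil barker</b>
</a>'''


def extract_relations(html):
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


relations = extract_relations(SNIPPET)
for href, rels in relations:
    for rel in rels:
        print(f"this page --{rel}--> {href}")
```

Each printed line is one assertion of the form “this page has relationship X to that URL”, which is all the rel attribute is encoding.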

Things are better on the photostream page though, it has in its HTML

<link rel="alternate"
  type="application/rss+xml"
  title="Flickr: phil barker's Photostream RSS feed"
  href="http://api.flickr.com/services/feeds/photos_public.gne?id=56583935@N00&lang=en-us&format=rss_200">

which points any application that knows how to read HTML and RSS to the RSS feed for my photostream, where we see in the entry for that picture the following:

<author flickr:profile="http://www.flickr.com/people/philbarker/">nobody@flickr.com (phil barker)</author>
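As an illustration of how an application could read that element, here is a small sketch with Python’s standard XML library. The namespace URI bound to the flickr prefix and the wrapping item element are assumptions added to make the fragment parseable on its own; check the real feed for the actual declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "flickr" prefix; the feed itself declares the real one.
FLICKR_NS = "urn:flickr:user"

# The author element quoted above, wrapped in a minimal item for parsing.
ITEM = f'''<item xmlns:flickr="{FLICKR_NS}">
  <author flickr:profile="http://www.flickr.com/people/philbarker/">nobody@flickr.com (phil barker)</author>
</item>'''


def author_info(item_xml):
    """Split the RSS author text into email and name, and pick up the profile link."""
    item = ET.fromstring(item_xml)
    author = item.find("author")
    text = (author.text or "").strip()
    email, _, rest = text.partition(" (")   # "address (display name)" convention
    name = rest.rstrip(")")
    profile = author.get(f"{{{FLICKR_NS}}}profile")
    return {"email": email, "name": name, "profile": profile}


info = author_info(ITEM)
print(info)
```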

As well as the description of me (my name and not-my-email-address) there is the link to my profile page. Looking at the HTML for that profile page, not only does it generate a human readable rendering in a browser, but it includes the following


<div class="vcard">
    <span class="nickname">phil barker</span>
...
    <span class="RealName">/
        <span class="fn n">
           <span class="given-name">Phil</span>
           <span class="family-name">Barker</span>
        </span>
    </span>
...
</div>

That is a computer readable hCard microformat version of my contact information (coincidentally it’s the same underlying schema for person-data that is used in the LOM)
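A rough sketch of how a program might pull the person-data out of that markup, again with just the Python standard library. It only looks for the three class names visible in the snippet and is in no way a full hCard parser; the class and variable names are mine.

```python
from html.parser import HTMLParser

# Only the hCard properties that appear as leaf elements in the snippet above.
WANTED = {"nickname", "given-name", "family-name"}


class HCardParser(HTMLParser):
    """Record the text of elements whose class attribute names a wanted property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        hits = WANTED.intersection(classes)
        self._current = hits.pop() if hits else None

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.card[self._current] = text


# The profile-page fragment as quoted above, elisions included.
HCARD = '''<div class="vcard">
    <span class="nickname">phil barker</span>
...
    <span class="RealName">/
        <span class="fn n">
           <span class="given-name">Phil</span>
           <span class="family-name">Barker</span>
        </span>
    </span>
...
</div>'''

parser = HCardParser()
parser.feed(HCARD)
parser.close()
card = parser.card
print(card)
```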

So there’s your Author metadata on Flickr. And I’ll note that all this happened without me ever thinking that I was “cataloguing”!

To generalise and idealise slightly, the resource pages (the canonical page for the image, the photostream page, my profile page) have embedded in them one or more of the following

  • links which describe the relationship of the resources described on those pages to each other in a computer-readable manner
  • links to alternative descriptions in machine readable metadata, e.g. an RSS or ATOM XML file for the resource described on the page
  • embedded computer readable metadata, e.g. vCard person-data embedded in the hCard microformat.

See also Adam’s post Objects in this Mirror are Closer than they Appear: Linked Data and the Web of Concepts.

About metadata & resource description (pt 1)

Trying to distinguish between metadata and resource description…

In our online support session for the UKOER programme, some of which John has summarized (1 2 3), instead of giving participants a definition of what metadata is we gave them a choice and asked them to vote on what they understood the word to mean.

The options were:
A: data about data
B: structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
C: pretty much any information about anything.
D: any of the above.

You might recognise option A as the etymological definition, B as NISO’s definition, found in Understanding Metadata [pdf]. I was interested in how many people included C in what they understood when they used/heard the term metadata. This was prompted by a comment, I forget from whom and in what context, that the idea of metadata defined in option B was fine in a specialized academic sense, but the word was used more widely and so loosely that you could no longer rely on that being what people meant. In other words you could not assume that someone who said they had metadata would be able to provide you with nicely structured machine readable XML/RDF/HTML-Meta tagged information.

Our sample of participants in the online session wasn’t scientifically chosen. Everyone had some connexion with the UK OER programme either working for a project or helping to manage or provide advice to the programme; there was approximately equal representation of managers and technical people (with some overlap, I guess), and one person had a library/information science background (that was my co-presenter, John!). The vote came out as
5 for A: Data about data;
14 for B: Structured information…;
0 for C: any information about anything;
10 for D: any of the above.

In retrospect it’s not surprising that no one voted for C, since the people in our audience who recognise that as a meaning are likely to have come across A and B as well.

Like someone said during the vote, you can tell B is the “right” answer because it is the longest and most formal looking option :-). For me, data about data is too restrictive in range and I think it would be helpful not to call option C/D metadata. I would rather use the term resource description to cover all options and reserve metadata for the structured information about a resource (which includes but is broader than data about data). So metadata tells a computer that 2009-09-11 is to be interpreted as a date in ISO8601 format and is the sort of structured information found in LOM and Dublin Core. Resource description may be metadata or may be free text for people to read. Computers such as those run by Google can do a pretty good job of processing information aimed at people; people (on the whole) aren’t very good at information aimed at computers.
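To make the ISO 8601 point concrete, here is a tiny sketch in Python. The `interpret` function and its `scheme` argument are hypothetical, standing in for whatever the metadata says about how a value is encoded.

```python
from datetime import date


def interpret(value, scheme=None):
    """Return a real date when the metadata declares the encoding, else the raw text."""
    if scheme == "ISO8601":
        return date.fromisoformat(value)  # raises ValueError if the text is not a valid date
    return value


as_text = interpret("2009-09-11")
as_date = interpret("2009-09-11", scheme="ISO8601")
print(type(as_text).__name__, type(as_date).__name__)
```

Without the declared scheme the program only has a string that happens to look like a date; with it, the value can be sorted, compared and filtered as a date.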

I think that the best view of metadata is that it shows the relationship between resources. “Resources” here means anything that can be identified (if you cannot identify it you cannot show how things are related to it), including: information resources like the OERs, people, places, things, organizations, abstract concepts. What metadata does is express the assertion that this OER (for example) was created by this person. I’ll try to show how this allows the mixing up of metadata and resource description (in a good way) in my next post.