Phil Barker » lrmi

LRMI, Open badges and alignment objects

Phil Barker — Thu, 03 Apr 2014 11:59:55 +0000

I had the pleasure yesterday of talking on the Mozilla Open Badges community call about how LRMI and Open Badges may intersect. Open Badges are a means of displaying digital recognition of skills and achievements; there’s a technical framework behind the badges that offers the means of providing data in support of the claimed achievement. A particular part of this technical framework is the assertion specification, which includes a pointer from each badge to “the educational standards this badge aligns to, if any”. This parallels the LRMI alignment object very closely: in short the educationalAlignment property that LRMI added to schema.org allows encoding of statements along the lines of “this resource [teaches|assesses|requires|has level] X” where X is some point in a shared educational framework, e.g. of attainment standards, topics or educational levels or shared curriculum. Diagrammatically:

The creative work aligns with a node in an educational framework. The alignment object identifies that node and the nature of the alignment.

The Mozilla badge alignment object is described thus:

Property	Expected Type	Description
name	Text	Name of the alignment.
url	URL	URL linking to the official description of the standard.
description	Text	Short description of the standard

and an example is provided

{
  "name": "Awesome Robotics Badge",
...
  "alignment": [
    { "name": "CCSS.ELA-Literacy.RST.11-12.3", 
      "url": "http://www.corestandards.org/ELA-Literacy/RST/11-12/3", 
      "description": "Follow precisely a complex multistep procedure when carrying out experiments, taking measurements, or performing technical tasks; analyze the specific results based on explanations in the text."
    }]
...
}

Diagrammatically:

The badge information includes an assertion that the skill or achievement aligns with some point in an educational standard

Not only do the LRMI and Open Badge alignment objects both do the same thing, they seem to have the following semantically equivalent properties relating to identifying the thing that is aligned to:

OpenBadge alignment object url == LRMI alignment object targetUrl
OpenBadge alignment object name == LRMI alignment object targetName
OpenBadge alignment object description == LRMI alignment object targetDescription

(I like to think that this is not coincidence, but I don’t know how the similarity arose.)
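The equivalence can be sketched as a small mapping function. This is only an illustration, not part of either specification: the function name and its optional arguments are hypothetical, and the property name targetUrl follows schema.org's spelling. Because an Open Badges alignment entry carries neither an alignment type nor a framework name, those two values are only set when the caller supplies them.

```python
def badge_alignment_to_lrmi(badge_alignment, alignment_type=None, framework=None):
    """Map an Open Badges alignment entry onto LRMI AlignmentObject properties.

    `badge_alignment` is a dict with the keys from the Open Badges table
    (name, url, description). The function name and the two optional
    arguments are hypothetical, introduced only for this sketch.
    """
    lrmi = {
        "@type": "AlignmentObject",
        "targetName": badge_alignment["name"],
        "targetUrl": badge_alignment["url"],
    }
    # Copy the description only if the badge entry provides one.
    if "description" in badge_alignment:
        lrmi["targetDescription"] = badge_alignment["description"]
    # Open Badges has no equivalent of these, so they come from the caller.
    if alignment_type is not None:
        lrmi["alignmentType"] = alignment_type
    if framework is not None:
        lrmi["educationalFramework"] = framework
    return lrmi


# The alignment entry from the Open Badges example above.
badge_alignment = {
    "name": "CCSS.ELA-Literacy.RST.11-12.3",
    "url": "http://www.corestandards.org/ELA-Literacy/RST/11-12/3",
    "description": (
        "Follow precisely a complex multistep procedure when carrying out "
        "experiments, taking measurements, or performing technical tasks; "
        "analyze the specific results based on explanations in the text."
    ),
}

lrmi_alignment = badge_alignment_to_lrmi(
    badge_alignment, framework="Common Core State Standards"
)
```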

The differences:

Open Badges do not identify the type of alignment. They have no need, I guess, since the alignment is always one of “asserts ability at” or something similar. LRMI currently recommends no relevant value.
Open Badges do not name the framework; I guess they assume that identifying the node will lead to knowledge of the framework. LRMI felt that this would not always be enough.
The LRMI alignment object can be used in conjunction with a property of schema.org/CreativeWork. I don’t think Mozilla open badge assertions are creative works in that sense; I think they are some type of schema.org/Intangible.
Syntactically, OpenBadge assertions are made using JSON; I don’t think they use microdata. Through schema.org, LRMI uses microdata and JSON-LD.

aligning the alignment objects

The discussion that I hope to kick off with the Mozilla Open Badge and LRMI communities is whether we should/could make the similarities between the two alignment objects more explicit. This would give developers a two-for-one offer: understand the way Open Badges expresses alignment and you’ve understood what LRMI does, and vice versa. I don’t suppose either group wants to change a spec that is in productive use, but an informative statement about the similarities could be provided without changing either.

Beyond that I wonder if the Open Badge community have thought about use of schema.org when advertising badges, i.e. if you provide a webpage saying “we offer the following badges for X, Y and Z” would there be benefit in marking this up with schema.org microdata to improve discoverability by search engines? If there is benefit in doing so, then it would be worth thinking about what type of schema.org Thing badges are and how the LRMI alignment object might be attached to it.

The bigger picture is that someone working with the starting point of wanting to learn about something could find resources to help them learn it with the help of LRMI alignments and discover the means of showing that they had learnt it via Open Badge alignments.

Explaining the LRMI Alignment Object

Phil Barker — Thu, 06 Mar 2014 15:04:20 +0000

The educational alignment property and the associated alignment object that LRMI introduced into schema.org have been described as the “killer feature” for LRMI. However, I know from the number of questions asked about the alignment object and from examples I have seen of it being used wrongly that it is not the easiest construct to understand.

Perhaps the problems come from the nature of the alignment object as a conceptual abstraction, so maybe it will help to show some concrete examples of how it may be used. However, bear in mind that the abstraction was a deliberate design decision made so that the alignment object should be more widely applicable than the examples given here. So I will first discuss a little about why some simpler more direct approaches were considered and rejected (as were some approaches that would be even more abstract).

basic use case

The general use case that the alignment object was introduced to meet was, in brief,

“help people find resources that can be useful in teaching or learning in some specific scenario.”

That looks deceptively simple. The complications come when defining the “specific scenario” and unpacking the word “useful”.

enter “educational frameworks”

One practical approach to defining various aspects of the specific scenario involves reference to an educational framework of some sort. By educational framework I mean a structured description of educational concepts such as a shared curriculum, syllabus or set of learning objectives, or a vocabulary for describing some other aspect of education such as educational levels or reading ability.

“Educational framework” is a deliberately broad concept as we wanted LRMI to be applicable globally and across many levels and modes of education. Some specific examples are school-level curricula or attainment standards such as:

the US Common Core State Standards initiative, which defines learning objectives for K12 Maths and English,
the National Curriculum, which does similar for England but is less specific and covers other disciplines as well,
Scotland’s Curriculum for Excellence, which covers learning experiences as well as outcomes.

Perhaps more relevant to higher education many professional bodies define the competencies required to become a member of their profession, for example:

the UK General Medical Council’s “Tomorrow’s Doctors”,
the UK Engineering Council’s “Standard for Professional Engineering Competence” and
the Law Society of Scotland’s Professional Education and Training requirements.

As well as having a role in defining competences and outcomes, measures of academic level or difficulty may be useful independently as reference points, for example:

the US K12 grade levels are well understood in terms of school level,
the more formally defined Scottish Credit and Qualifications Framework (SCQF) level descriptors
various empirical measures of reading difficulty, for example the general idea of “reading age” and the specific measures of reading ability and text level used by Lexile.

On the other hand you may just want to specify the subject being taught, or the educational discipline for which it is being taught. Various classification schemes for academic subjects are available, for example:

in the UK, JACS, the Joint Academic Coding System, is used to describe the subject of a course
the W3C open linked education community group aims to build a subject database of wider scope.

All of these frameworks (and many others) may be used to describe aspects of an educational scenario.

ways of being useful

Life isn’t simple enough for us to meet the use case described above by adding a single property to schema.org Creative Works to say that the resource “aligns with” (i.e. is useful in the context defined by) some entry or node in an educational framework. In prescribing a “useful” resource we would want to distinguish between resources that teach and resources that assess a topic; we also want a resource that assumes suitable previous knowledge, or requires some specific reading level, or assumes a certain general academic level. There may be other forms of alignment. There isn’t agreement on a minimum core set of properties required to address that word “useful” in the use case, but there is agreement that a resource can “align” with an “educational framework” in several ways, some of which we can enumerate. Hence the birth of the alignment property and abstract Educational Alignment object.

the abstraction

I think of it like this:

We start with a Creative work:

and an educational framework:
(Note, there is no schema.org class of type EducationalFramework, but we assume that we can refer to some of the following properties pertaining to it: some text that identifies the framework as a whole (let’s call it a name), and the URLs, names and/or descriptions of nodes within the framework.)

The alignment object was created to describe the relationship between the two. The following properties of alignment objects are defined: educationalFramework, which can be used to hold text that identifies the educational framework you are pointing to; targetDescription, targetName and targetUrl, which can hold the values that correspond to properties we assumed that nodes in the educational framework would have. It also has an alignmentType property that I think of as switching the object to specify the different types of alignment that are possible. So we can put them together to express an alignment between a creative work and some node in an educational framework:
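One way to picture the abstraction is as a pair of simple data structures. The sketch below is only a mental model, not how schema.org is implemented: the Python class and field names are my own, with the corresponding schema.org property noted in comments, and the example values (worksheet, framework name, example.org URL) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlignmentObject:
    """Describes how a creative work relates to one node in a framework."""
    alignment_type: Optional[str] = None         # alignmentType
    educational_framework: Optional[str] = None  # educationalFramework
    target_name: Optional[str] = None            # targetName
    target_description: Optional[str] = None     # targetDescription
    target_url: Optional[str] = None             # targetUrl


@dataclass
class CreativeWork:
    """Only the fields needed to show the alignment are modelled here."""
    name: str
    # educationalAlignment: a work may align with several nodes.
    educational_alignment: List[AlignmentObject] = field(default_factory=list)


# Hypothetical values, chosen only to show the shape of the relationship.
worksheet = CreativeWork(name="Fractions worksheet")
worksheet.educational_alignment.append(
    AlignmentObject(
        alignment_type="teaches",
        educational_framework="Example Maths Curriculum",
        target_name="EX.M.3",
        target_description="Add fractions with a common denominator",
        target_url="http://example.org/curriculum/maths/3",
    )
)
```

The framework itself never appears as an object: it is only referred to by name, and its node by name, description and URL, which matches the note above that schema.org has no EducationalFramework class.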

common mistakes

I have seen both of these mistakes in actual markup of webpages.

1. the alignment object on its own is fairly meaningless. Unless it is referenced by the educationalAlignment property of a creative work it’s as useful as half a link.

2. since the alignment object is a proper schema.org Thing (to be specific a subtype of an Intangible Thing) it inherits the properties that every schema.org Thing has, e.g. a name, a url, a description, an image. Some of these make some sense in some cases (see below) but importantly, none of them are used in expressing the alignment: the url of an alignment object is not the same as the url of the creative work or the node to which it aligns.
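The first mistake is easy to check for mechanically. The sketch below assumes the markup on a page has already been extracted into a flat list of dictionaries, each with an "id" and a "type", with educationalAlignment holding the id (or list of ids) of the alignment objects it refers to. That representation and the function name are hypothetical; real microdata or JSON-LD parsers return richer structures.

```python
def find_orphan_alignments(items):
    """Return ids of AlignmentObjects that no educationalAlignment points to."""
    referenced = set()
    for item in items:
        value = item.get("educationalAlignment", [])
        # Accept either a single id or a list of ids.
        refs = value if isinstance(value, list) else [value]
        referenced.update(refs)
    return [
        item["id"]
        for item in items
        if item.get("type") == "AlignmentObject" and item["id"] not in referenced
    ]


# Hypothetical extracted items: align2 is the "half a link" case.
items = [
    {"id": "work1", "type": "CreativeWork", "educationalAlignment": "align1"},
    {"id": "align1", "type": "AlignmentObject", "targetName": "Node A"},
    {"id": "align2", "type": "AlignmentObject", "targetName": "Node B"},
]

orphans = find_orphan_alignments(items)
```

Here orphans contains only "align2", the alignment object that nothing references.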

real-world examples of alignment assertions

I would like to use two real-world examples of where services provide information that can be seen as an assertion that a resource is useful in connection with (i.e. aligns with) an educational framework:

1. Kritikos, where students can tell other students what is useful for their course.

Screen shot of Kritikos information page about an MIT OCW lecture video. See it in kritikos.

Kritikos is a custom search engine for visual media relevant to teaching and learning engineering. In part the customisation comes through the use of a Google CSE, but more relevant to this post is the part that comes through allowing users to classify whether resources found on it are useful for specific courses [aside: this part of the kritikos service is built on a Learning Registry node].

The example shown here is the kritikos information page for a video of a lecture from MIT Open CourseWare. It includes “what others are saying about this resource” with the information from a year 3 MEng Aerospace Engineering student that it is relevant to “Flight Dynamics and Control”. The link from this assertion leads to other resources deemed useful by users for that module. “Flight Dynamics and Control” is a module at the University of Liverpool (code AERO317) that exists within the framework of Liverpool’s Aerospace Engineering programme. It is worth noting that kritikos can also be used to record when a resource is not relevant to a course; this is useful for weeding out false positives that get through the Google custom search engine. [Disclosure/bragging: I had an advisory role in the project that led to kritikos.]

So, there’s an expression of an educational alignment; how does it relate to the alignment object?

The creative work in question is the MIT lecture (to be precise it’s a http://schema.org/VideoObject), we could describe a few of its characteristics with schema.org properties:
name = “Lec 7 | MIT 16.885J Aircraft Systems Engineering, Fall 2005”
url=http://www.youtube.com/watch?v=2QRfkG7jOfY
duration = PT110M22S
I’m not guessing this, the YouTube page has Schema.org microdata in it.

The node in the educational framework is a bit less well defined, but we would be justified in calling the module description a node in a framework called “University of Liverpool Modules” and saying the name for this node is “AERO317”, its description is “Flight Dynamics and Control”. It has a page on the web which gives us a url, http://tulip.liv.ac.uk/mods/vital/vital_AERO317_200809.htm. So we can express the alignment:

item type=http://schema.org/VideoObject
    name = "Lec 7 | MIT 16.885J Aircraft Systems Engineering, Fall 2005"
    url = http://www.youtube.com/watch?v=2QRfkG7jOfY
    duration = PT110M22S
    educationalAlignment = item1

item1 type= http://schema.org/AlignmentObject
    alignmentType = "Teaches"
    educationalFramework = "University of Liverpool Modules"
    targetName = "AERO317"
    targetDescription = "Flight Dynamics and Control"
    targetUrl = http://tulip.liv.ac.uk/mods/vital/vital_AERO317_200809.htm
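
The same alignment could also be written out as JSON-LD, one of the syntaxes schema.org supports. The following is a minimal sketch that builds the structure in Python and serialises it. The property values are the ones given above; nesting the alignment object directly inside the video description, rather than referring to it by identifier, is my own choice for the illustration.

```python
import json

# The MIT lecture video with its alignment to the Liverpool module,
# using the values from the example above.
video = {
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "name": "Lec 7 | MIT 16.885J Aircraft Systems Engineering, Fall 2005",
    "url": "http://www.youtube.com/watch?v=2QRfkG7jOfY",
    "duration": "PT110M22S",
    # The alignment object is nested as the value of educationalAlignment,
    # so it can never end up unreferenced.
    "educationalAlignment": {
        "@type": "AlignmentObject",
        "alignmentType": "Teaches",
        "educationalFramework": "University of Liverpool Modules",
        "targetName": "AERO317",
        "targetDescription": "Flight Dynamics and Control",
        "targetUrl": "http://tulip.liv.ac.uk/mods/vital/vital_AERO317_200809.htm",
    },
}

json_ld = json.dumps(video, indent=2)
print(json_ld)
```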

What about the other properties of the AlignmentObject, the ones it inherited by virtue of being an official Intangible Thing in the schema.org hierarchy? Well you could envisage the image property pointing to the screenshot above, and the url property being a url with a fragment identifier that points to the “what others are saying” part of the kritikos page. Sure, you can give it a name and description if you want to. Maybe these aren’t especially useful, but the point is that they are clearly different from the url, name and description of the University of Liverpool course to which the MITOCW video aligns.

2. OER Commons, aligning to US Common Core State Standards

I’ll cover this in less detail. The main problem with the example above is that the educational framework, while locally useful, is somewhat ad hoc: we had to kind of look at the course structure at Liverpool University in a certain way to see it as an educational framework. Better examples of more widely shared and more formally constructed educational frameworks are those of the US Common Core State Standards Initiative. OER Commons is a repository and search engine for Open Educational Resources that expresses alignment to these frameworks in its descriptions.

Screenshot from a resource description on OERCommons showing educational alignment information on the right.

The screenshot on the left shows such an alignment being displayed (the image links to the actual page in question, which is more legible). You see that in this case the creative work called “Chocolate Chocolate Chocolate” aligns with the Common Core Standard “CCSS.ELA-Literacy.RL.1.9 : Compare and contrast the adventures and experiences of characters in stories.”

Interestingly there is some other information given about the “degree of alignment”, i.e. how good a match that resource is to teaching that State Standard.

justification for the abstraction of the alignment object

In part the motivation for creating an alignment object class in schema.org was the issue mentioned above about not knowing what might be all the possible forms of alignment between a resource and an educational framework used to characterise some aspect of a teaching and learning scenario. However I hope the examples above go some way to showing that alignments are real (if intangible) things: you can give them URLs and names if you want. Furthermore they do have properties. For example, they are asserted by someone: a student at Liverpool University in the kritikos example and a user of OER Commons in the other. In the OER Commons example there is other information about the degree of alignment. This goes some way to convincing me that the alignment object isn’t just some computer science trick of indirection.

Learning Resource Metadata is Go for Schema

Phil Barker — Wed, 24 Apr 2013 13:44:39 +0000

The Learning Resource Metadata Initiative aimed to help people discover useful learning resources by adding to the schema.org ontology properties to describe educational characteristics of creative works. Well, as of the release of schema draft version 1.0a a couple of weeks ago, the LRMI properties are in the official schema.org ontology.

Schema.org represents two things: 1, an ontology for describing resources on the web, with a hierarchical set of resource types each with defined properties that relate to their characteristics and relationships with other things in the schema hierarchy; and 2, a syntax for embedding these into HTML pages–well, two syntaxes, microdata and RDFa lite. The important factor in schema.org is that it is backed by Google, Yahoo, Bing and Yandex, which should be useful for resource discovery. The inclusion of the LRMI properties means that you can now use schema.org to mark up your descriptions of the following characteristics of a creative work:

audience: the educational audience for whom the resource was created, who might have educational roles such as teacher, learner, parent.

educational alignment: an alignment to an established educational framework, for example a curriculum or frameworks of educational levels or competencies. Expressed through an abstract thing called an Alignment Object which allows a link to and description of the node in the framework to which the resource aligns, and specifies the nature of the alignment, which might be that the resource ‘assesses’, ‘teaches’ or ‘requires’ the knowledge/skills/competency to which the resource aligns or that it has the ‘textComplexity’, ‘readingLevel’, ‘educationalSubject’ or ‘educationLevel’ expressed by that node in the educational framework.

educational use: a text description of the purpose of the resource in education, for example assignment, group work.

interactivity type: the predominant mode of learning supported by the learning resource. Acceptable values are ‘active’, ‘expositive’, or ‘mixed’.

is based on url: a resource that was used in the creation of this resource. Useful for when a learning resource is a derivative of some other resource.

learning resource type: the predominant type or kind characterizing the learning resource. For example, ‘presentation’, ‘handout’.

time required: approximate or typical time it takes to work with or through this learning resource for the typical intended target audience.

typical age range: the typical range of ages of the content’s intended end user.
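
To give a feel for how these properties look on a real description, here is a minimal sketch that builds one as a JSON-LD structure in Python. The property names are the schema.org ones that correspond to the list above (educational alignment is left out here). The resource itself, its name, the example.org URL and all the values are hypothetical, picked from the example values mentioned above where possible.

```python
import json

# A hypothetical learning resource described with the LRMI properties.
resource = {
    "@context": "http://schema.org",
    "@type": "CreativeWork",
    "name": "Example worksheet",                 # hypothetical resource
    "audience": {                                # audience
        "@type": "EducationalAudience",
        "educationalRole": "teacher",
    },
    "educationalUse": "assignment",              # educational use
    "interactivityType": "active",               # active | expositive | mixed
    "isBasedOnUrl": "http://example.org/original-worksheet",  # hypothetical URL
    "learningResourceType": "handout",           # learning resource type
    "timeRequired": "PT30M",                     # ISO 8601 duration: 30 minutes
    "typicalAgeRange": "7-9",                    # typical age range
}

print(json.dumps(resource, indent=2))
```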

Of course, much of the other information one would want to provide about a learning resource (what it is about, who wrote it, who published it, when it was written/published, where it is available, what it costs) was already in schema.org.

Unfortunately one really important property suggested by LRMI hasn’t yet made the cut, that is useRightsURL, a link to the licence under which the resource may be used, for example the Creative Commons licence under which it has been released. This was held back because of obvious overlaps with non-educational resources. The managers of schema.org want to make sure that there is a single solution that works across all domains.

Guides and tools

To promote the uptake of these properties, the Association of Educational Publishers has released two new user guides.

The Smart Publisher’s Guide to LRMI Tagging (pdf)

The Content Developer’s Guide to the LRMI and Learning Registry (pdf)

There is also the InBloom Tagger described and demonstrated in this video.

LRMI in the Learning Registry

As the last two resources show, LRMI metadata is used by the Learning Registry and services built on it. For what it is worth, I am not sure that is a great example of its potential. For me the strong point of LRMI/schema.org is that it allows resource descriptions in human readable web pages to be interpreted as machine readable metadata, helping create services to find those pages; crucially the metadata is embedded in the web page in a way that Google trusts because the values of the metadata are displayed to users. Take away the embedding in human readable pages, which is what seems to happen when used with the Learning Registry, and I am not sure there is much of an advantage for LRMI compared to other metadata schema, though to be fair I’m not sure that there is any comparative disadvantage either, and the effect on uptake will be positive for both sides. Of course the Learning Registry is metadata agnostic, so having LRMI/schema.org metadata in there won’t get in the way of using other metadata schema.

Disclosure (or bragging)

I was lucky enough to be on the LRMI technical working group that helped make this happen. It makes me very happy to see this progress.

Where does schema.org fit in the (semantic) web?

Phil Barker — Thu, 16 Aug 2012 16:02:46 +0000

Over the summer I’ve done a couple of presentations about what schema.org is and how it is implemented (there are links below). Quick reminder: schema.org is a set of microdata terms (itemtypes and properties) that big search engines have agreed to support. I haven’t said much about why I think it is important, with the corollary of “what it is for?”.

The schema.org FAQ answers that second question with:

…to improve the web by creating a structured data markup schema supported by major search engines. On-page markup helps search engines understand the information on web pages and provide richer search results. … Search engines want to make it easier for people to find relevant information on the web.

So, the use case for schema.org is firmly anchored around humans searching the web for information. That’s important to keep in mind because when you get into the nitty gritty of what schema.org does, i.e. identifying things and describing their characteristics and relationships to other things, in the context of the web, then you are bound to run into people who talk about the semantic web, especially because the RDFa semantic web initiative covers much of the same ground as schema.org. To help understand where schema.org fits into the semantic web more generally it is useful to think about what various semantic web initiatives cover that schema doesn’t. Starting with what is closest to schema.org, this includes: resource description for purposes other than discovery; descriptions not on web pages; data feeds for machine to machine communication; interoperability for raw data in different formats (e.g. semantic bioinformatics); ontologies in general, beyond the set of terms agreed by schema.org partners, and their representation. RDFa brings some of this semantic web thinking to the markup of web pages, hence the overlap with schema.org. Thankfully, there is now an increasing overlap between the semantic web community and the schema.org community, so there is an evolving understanding of how they fit with each other. Firstly, the schema.org data model is such that:

“[The] use of Microdata maps easily into RDFa Lite. In fact, all of Schema.org can be used with the RDFa Lite syntax as is.”
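
To illustrate how direct that mapping is for a simple, flat item, the sketch below renders the same type and properties in both syntaxes. It is a simplification under stated assumptions: the two function names are hypothetical, only text-valued properties in span elements are handled, and nested items, links and multiple vocabularies are ignored. The attribute names are the standard ones: itemscope, itemtype and itemprop for microdata; vocab, typeof and property for RDFa Lite.

```python
from html import escape


def render_microdata(type_url, properties):
    """Render a flat item using microdata attributes."""
    spans = "".join(
        f'<span itemprop="{escape(name)}">{escape(value)}</span>'
        for name, value in properties.items()
    )
    return f'<div itemscope itemtype="{escape(type_url)}">{spans}</div>'


def render_rdfa_lite(type_url, properties):
    """Render the same item using RDFa Lite attributes."""
    # Split e.g. http://schema.org/VideoObject into vocabulary and type name.
    base, _, type_name = type_url.rpartition("/")
    vocab = escape(base + "/")
    type_attr = escape(type_name)
    spans = "".join(
        f'<span property="{escape(name)}">{escape(value)}</span>'
        for name, value in properties.items()
    )
    return f'<div vocab="{vocab}" typeof="{type_attr}">{spans}</div>'


props = {"name": "Lec 7"}
microdata_html = render_microdata("http://schema.org/VideoObject", props)
rdfa_html = render_rdfa_lite("http://schema.org/VideoObject", props)
```

In both cases the property value stays as visible text in the page; only the attribute names change.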

Secondly there is a growing understanding of the complementary nature of schema.org and RDFa, described by Dan Brickley; in summary:

This new standard [RDFa1.1], in particular the RDFa Lite specification, brings together the simplicity of Microdata with improved support for using multiple schemas together… Our approach is “Microdata and more”.

So, if you want to go beyond what is in the schema.org vocabulary then RDFa is a good approach, if you’re already committed to RDFa then hopefully you can use it in a way that Google and other search engines will support (if that is important to you). However schema.org was the search engine providers’ first choice when it came to resource discovery, at least first in the chronological sense. Whether it will remain their first preference is moot but in that same blog post mentioned above they make a commitment to it that (to me at least) reads as a stronger commitment than what they say about RDFa:

We want to say clearly that we continue to support Microdata

It is interesting also to note that schema.org is the search engine companies’ own creation. It’s not that there is a shortage of other options for embedding metadata into web pages: HTML has always had meta tags for description, keywords, author, title; yet not only are these not much supported but the keywords tag especially can be considered harmful. Likewise, Dublin Core is at best ignored (see Invisible institutional repositories for an account of the effect of the use of Dublin Core in Google Scholar, but note that Google Scholar differs in its use of metadata from Google’s main search index).

So why create schema.org? The Google schema.org faq says this:

Having a single vocabulary and markup syntax that is supported by the major search engines means that webmasters don’t have to make tradeoffs based on which markup type is supported by which search engine. schema.org supports a wide collection of item types, although not all of these are yet used to create rich snippets. With schema.org, webmasters have a single place to go to learn about markup for a wide selection of item types, search engines get structured information that helps improve search result quality, and users end up with better search results and a better experience on the web.

(NB: this predates the statement quoted above about “Microdata and more” approach)

There are two other reasons I think are important: control and trust. While anyone can suggest extensions to and comment on the schema.org vocabulary through the W3C web schemas task force, the schema.org partners, i.e. Google, Microsoft Bing, Yahoo and Yandex pretty much have the final say on what gets into the spec. So the search engines have a level of control over what is in the schema.org vocabulary. In the case of microdata they have chosen to support only a subset of the full spec, and so have some control over the syntax used. (Aside: there’s an interesting parallel between schema.org and HTML5 in the way both were developed outwith the W3C by companies who had an interest in developing something that worked for them, and were then brought back to the W3C for community engagement and validation.)

Then there is trust, that icing on the old semantic web layer cake (perhaps the cake is upside down, the web needs to be based on trust?). Google, for example, will only accept metadata from a limited number of trusted providers and then often only for limited use, for example publisher metadata for use in Google Scholar. For the world in general Google won’t display content that is not visible to the user. The strength of the microdata and RDFa approach is that what is marked up for machine consumption can also be visible to the human reader; indeed if the marked-up content is hidden Google will likely ignore it.

So, is it used? By the big search engines, I mean. Information gleaned from schema.org markup is available in the XML that can be retrieved using a Google Custom Search Engine, which allows people to create their own search engines for niche applications, for example jobs for US military veterans. However, it is use on the main search site, which we know is the first stop for people wanting to find information, that would bring about significant benefits in terms of the ease and sophistication with which people can search. Well, Google and co. aren’t known for telling the world exactly how they do what they do, but we can point to a couple of developments to which schema.org markup surely contributes.

First, of course, is the embellishment of search engine result pages that the “rich snippets” approach allows: inclusion of information such as author or creator, ratings, price etc., and filtering of results based on these properties. (Rich snippets is Google’s name for the result of marking up HTML with microdata, RDFa etc., which predates and has evolved into the schema.org initiative).

Secondly, there is the Knowledge Graph, which while it is known to use Freebase, and seems to get much of its data from DBpedia, has a “things not strings” approach which resonates very well with the schema.org ontology. So perhaps it is here that we will see the semantic web approach and schema.org begin to bring benefits to the majority of web users.

See also

Webinar: Learning resource metadata for schema.org

Phil Barker — Fri, 13 Jul 2012 10:39:55 +0000

As you may know, I have been involved in the development of the Learning Resource Metadata Initiative’s extension of schema.org since about this time last year. Things are shaping up well for the inclusion of the LRMI properties in the main schema.org vocabulary, so this seems like a good time(*) to start explaining and promoting them. To that end, we will be running a webinar, hosted on JISC’s BlackBoard Collaborate service, on Fri 27 July starting at 15:00 UK time; it will run for up to 2 hours.

Update: the webinar happened, you can get the slides that were used from slideshare and you can view a full recording of the webinar (that’s a BlackBoard Collaborate recording, you need Java for it to play).

In this webinar we will explore the background, intent and output of the Learning Resource Metadata Initiative (LRMI). The LRMI has proposed extensions to the schema.org microdata vocabulary with the aim of facilitating the discovery of learning resources through major search engines and other discovery services. We will provide an introduction to schema.org and describe the specific approach taken by LRMI.

My first take at an outline programme is along the lines of:

Outline of schema.org as semantic tagging of HTML content (this isn’t intended to be a tutorial on how to add schema to a web page, but I think it will be useful to make sure everyone starts from the same understanding of schema’s place in the web)
Who is behind schema.org
Their motivation: “improve search services”–what that means
What schema.org (initial release) offers for Learning Resources and what it doesn’t.
Who is behind LRMI
How LRMI worked
Most importantly, what LRMI produced

I am delighted that helping me with this webinar will be two key players in LRMI and schema.org: Dan Brickley, who many of you will know from his years of activity on RDF and the semantic web and who is heavily involved in the outreach, standards and community work around schema.org, and Greg Grossmeier of Creative Commons, who is co-chair of the LRMI technical working group and so has steered us from the collection of user requirements through to the development of new schema.org properties.

The target audience is staff from UK Further and Higher Education with an interest in the dissemination of learning resources (for example Open Educational Resources, OERs) and building services for their discovery, especially those people involved in JISC projects and services. If demand is high, priority will be given to this audience.

(* yeah, OK, Friday afternoon at the end of July isn’t really a good time for this, but it ended up as the best time for the people involved given their other constraints….)

Will using schema.org metadata improve my Google rank?

Phil Barker — Fri, 25 May 2012 15:18:07 +0000

It’s a fair question to ask. Schema.org metadata is backed by Google, and has the aim of making it easier for people to find the right web pages, so does using it to describe the content of a page improve the ranking of that page in Google search results? The honest answer is “I don’t know”. The exact details of the algorithm used by Google for search result ranking are their secret; some people claim to have elucidated factors beyond the advice given by Google, but I’m not one of them. Besides, the algorithm appears to be ever changing, so what worked last week might not work next week. What I do know is that Google says:

Google doesn’t use markup for ranking purposes at this time—but rich snippets(*) can make your web pages appear more prominently in search results, so you may see an increase in traffic.

*Rich Snippets is Google’s name for the semantic mark up that it uses, be it microformats, microdata (schema.org) or RDFa.

I see no reason to disbelieve Google on this, so the answer to the question above would seem to be “no”. But how then does using schema.org make it easier for people to find the right web pages? (And let’s assume for now that yours are the right pages.) Well, that’s what the second part of what Google says is about: making pages appear more prominently in search result pages. As far as I can see this can happen in two ways. Try doing a search on Google for potato salad. Chances are you’ll see something a bit like this:

Selection from the results page for a Google search for potato salad showing enhanced search options (check boxes for specific ingredients, cooking times, calorific value) and highlighting these values in some of the result snippets.

You see how some of the results are embellished with things like star ratings, or information like cooking time and number of calories–that’s the use of rich snippets to make a page appear more prominent.

But there’s more: the check boxes on the side allow the search results to be refined by facets such as ingredients, cooking time and calorie content. If a searcher uses those check boxes to narrow down their search, then only pages which have the relevant information marked-up using schema.org microdata (or other rich snippet mark-up) will appear in the search results.
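To make that last point concrete, here is a minimal sketch in Python of why a page without the mark-up cannot match a facet. It is only an illustration, not how Google works: the page fragment is made up, the helper names (ItempropCollector, extract_props, matches_facet) are hypothetical, and the collector only copes with a flat, non-nested microdata item. The property names name and cookTime are taken from schema.org's Recipe type.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The property names (name, cookTime) come from
# schema.org's Recipe type; the recipe itself is invented for illustration.
PAGE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Potato salad</h1>
  <p>Cooking time: <time itemprop="cookTime" datetime="PT20M">20 minutes</time></p>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect itemprop values from a flat (non-nested) microdata item."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open = None  # itemprop whose text content is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        # Prefer a machine-readable attribute when one is present.
        for key in ("content", "datetime"):
            if key in attrs:
                self.props[name] = attrs[key]
                return
        # Otherwise fall back to the element's visible text.
        self._open = name
        self.props[name] = ""

    def handle_data(self, data):
        if self._open is not None:
            self.props[self._open] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the property being read.
        self._open = None


def extract_props(html):
    """Return a dict of itemprop name -> value found in the HTML."""
    collector = ItempropCollector()
    collector.feed(html)
    return {name: value.strip() for name, value in collector.props.items()}


def matches_facet(props, name, value):
    """A page can only match a facet if it marks up that property at all."""
    return props.get(name) == value
```

A page that carries the same visible text but no itemprop attributes yields an empty dictionary here, so it can never satisfy matches_facet, however relevant its content.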

So, while it’s a fair question to ask, the question posed here is the wrong question. It would be better to ask “will schema.org metadata help people find my pages using Google”, to which the answer is yes if Google decides to use that mark up to enhance search result pages and/or provide additional search options.

Background
I have been involved in the LRMI (Learning Resource Metadata Initiative), which has proposed extensions to schema.org for describing the educational characteristics of resources; see this post I did for Creative Commons UK for further details. I have promised a more technical briefing of the hows and whys of LRMI/Schema.org to be developed here, but given my speed of writing I wouldn’t hold my breath waiting for it. In the meantime this is one of several questions I thought might be worth answering. If you can think of any others, let me know.

Learning resource metadata initiative

Phil Barker — Thu, 08 Sep 2011 03:40:49 +0000

In the spirit of Godwin’s law, I would propose that

“As any discussion about metadata grows longer the probability of a comparison to Google approaches one.”

Of course the comparison is usually that formal metadata is insignificant for the resource discovery needs of most people when compared to Google.

On one hand this is an oversimplification: metadata is important for resource management in general, not just for resource discovery; the information contained in metadata can be exposed to Google and other search engines; and it helps resource discovery in other ways, for example in displaying relationships between resources that can be browsed and crawled. It remains, however, true that all the effort that has gone into formalising and standardising metadata schema has had little, if any, direct effect on how people find resources through the search engine of their choice. So it’s interesting that the big search engines are now taking an interest in metadata markup of web pages, first with Google’s rich snippets, and now the more extensive (in a number of ways) schema.org initiative. I guess that this approach (that is, marking up the human readable information on a web page to show its relationship to a formal metadata schema as opposed to holding it separately in a purely machine readable format) appeals to search engines because of their suspicion that any information not visible to the reader of a page (e.g. metadata elements in the HTML head element) might be there purely to spam search engine results.

Of course, my interest through CETIS is in educational metadata, and I have already dabbled in using rich snippets to mark-up a description of an educational resource. So I was extremely interested to hear about the Learning Resource Metadata Initiative headed up by Creative Commons and the Association of Educational Publishers, aiming to apply the schema.org approach to educational resources (schema.org initially, with an RDFa expression planned as a secondary output derived from it). I was extremely pleased to be accepted on to the technical working group to help draw up the details. Tomorrow is the first face to face meeting of that technical group, which is why I am writing this on a plane on the way to San Francisco.

While this will be the first face to face meeting, the technical group has made a start on its work. The previous work in educational metadata has been surveyed; use cases for lrmi have been collected, including those which were submitted for the Dublin Core Education Application Profile; and we’ve had a couple of teleconference meetings. It’s early days, so a lot is still open, but this much I can say (and I say it as an individual; I’m not claiming to be reporting any consensus of the working group). The scope of lrmi is resource discovery, and for me it stands or falls on whether it helps discovery through search engines. With respect to this there does already seem to be some uncertainty (generally) over how search engines will use schema.org and how the governance of the main schema.org vocabulary allows for community-driven additions and usage profiles (there is an upcoming schema.org meeting that might help clarify this). However, I guess that in the end it will come down to Google and others using what they find useful and ignoring what they don’t, which isn’t a bad way of establishing an industry standard in this field (I see parallels with browser developers and HTML5). The use cases gathered include the usual discovery issues; so far I haven’t seen anything unexpected, so hopefully the lrmi output will align with other efforts to meet those same scenarios. There is one slight coda to that, though: there is a lot of interest in expressing the usefulness of a resource for specific learning objectives as set out in standard curricula. This is largely with respect to showing the alignment of a resource with US state standard curricula, and the US national core K-12 curriculum.
I know very little about the US standard curriculum/a, but I do think it is important (and believe it would be useful) that any approach adopted by lrmi to showing this alignment should be usable more generally for, e.g., the English National Curriculum and possibly for wider competency frameworks as used in UK HE for some disciplines (e.g. medicine, Scottish law, engineering). I should stress that, while the level of interest in this is noteworthy, showing such alignments isn’t new: it’s achievable with the LOM (classification with purpose set to learning objective), Dublin Core has had the conformsTo term for showing alignment to an educational standard for a number of years, and it has been discussed for the conceptual model for ISO MLR part 5.

I’ll report more when I am home from the meeting and will, of course, be happy to feed forward any comments you have, but to be kept up to date on all developments and to have a more direct say, join the LRMI discussion group.

RDFa Rich snippets for educational content

Phil Barker — Wed, 30 Mar 2011 15:25:33 +0000

Prompted by a comment from Andy Powell that

It would be interesting to think about how much of required resource description for UKOER can be carried in the RDFa vocabularies currently understood by Google. Probably quite a lot.

I had a look at Google’s webmaster advice on Marking up products for rich snippets.

My straw man mapping from the UKOER description requirements to Rich Snippets was:

Mandated Metadata
Programme tag = Brand?
Project tag = Brand?
Title = name
Author / owner / contributor = seller?
Date =
URL = offerURL (but not on OER page itself)
Licence information [Use CC code] price=0

Suggested Metadata
Language =
Subject = category
Keywords = category?
Additional tags = category?
Comments = a review
Description = description

I put this into a quick example, and you can see what Google makes of it using the rich snippet testing tool. [I’m not sure I’ve got the nesting of a Person as the seller right.]
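Separately from the linked example, the straw man mapping itself can be written down as data, which makes the gaps and guesses easy to list. This is only a sketch under my own assumptions: the dictionary name and the three helper functions are hypothetical, a target of None stands for a requirement left blank in the list above, and the boolean records whether the list marked the mapping with a question mark.

```python
# Straw man mapping from UKOER description requirements to rich snippet
# Product/Offer properties, copied from the list above.
# Each value is (target property or None, True if the mapping was a guess).
UKOER_TO_RICH_SNIPPET = {
    # Mandated metadata
    "Programme tag": ("brand", True),
    "Project tag": ("brand", True),
    "Title": ("name", False),
    "Author / owner / contributor": ("seller", True),
    "Date": (None, False),
    "URL": ("offerURL", False),
    "Licence information": ("price", False),
    # Suggested metadata
    "Language": (None, False),
    "Subject": ("category", False),
    "Keywords": ("category", True),
    "Additional tags": ("category", True),
    "Comments": ("review", False),
    "Description": ("description", False),
}


def unmapped(mapping):
    """Requirements with no rich snippet property to carry them."""
    return [req for req, (target, _guess) in mapping.items() if target is None]


def guesses(mapping):
    """Requirements whose mapping was marked as uncertain."""
    return [req for req, (_target, guess) in mapping.items() if guess]


def overloaded_targets(mapping):
    """Rich snippet properties asked to carry more than one requirement."""
    by_target = {}
    for req, (target, _guess) in mapping.items():
        if target is not None:
            by_target.setdefault(target, []).append(req)
    return {t: reqs for t, reqs in by_target.items() if len(reqs) > 1}
```

Calling unmapped(UKOER_TO_RICH_SNIPPET) lists Date and Language, and overloaded_targets shows brand and category each standing in for several different UKOER requirements.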

So, interesting? I’m not sure that this example shows much that is interesting. Trying to shoe-horn information about an OER into a schema that was basically designed for adverts isn’t ideal, but they’ve already done recipes as well; once they’ve got the important stuff like that done they might have a go at educational resources. But it is kind-of interesting that Google are using RDFa; there seems to be a slow increase in the number of tools/sites that are parsing and using RDFa.