Discussions about automatic metadata generation seem to conflate a number of different processes. As I see it there are several separate but related areas of interest in ‘automatic’ metadata generation; these are:
1. The automatic population of record values based on collection-level profiles
2. The automatic population of record values based on user or object profiles
3. The automatic population of record values derived from other created metadata values
4. The automatic generation of metadata from existing structured information about the asset
5. The automatic generation of metadata from the use of the asset itself
6. The automatic generation of metadata from the content of the asset itself
The automatic population of record values based on collection-level profiles (bulk transforms)
By this I mean the addition to records of values which are fixed attributes of the collection they are part of. The collection could be the complete set of records in the repository or any meaningful or useful subset. This automatic population of values could be part of a bulk import process or part of a record export process (singly or in bulk). Such profiles are particularly important when considering the movement and interoperability of metadata – some metadata may be unnecessary in a local context but vital in a wider context.
For example, the university’s name could be added to the publisher details when a set of metadata about institutional minutes is exposed for harvest. Another example would be the automatic addition of educational level information to a set of learning objects designed by high school teachers for local use but which need to be tagged at that level as their metadata is available in aggregator services.
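As a minimal sketch of such a bulk transform, the snippet below applies a collection-level profile to a set of records on export; the record structure, field names, and profile values are assumptions for illustration, not a real repository schema.

```python
# A minimal sketch of a collection-level profile applied on export.
# The record structure and field names are illustrative assumptions.

COLLECTION_PROFILE = {
    "publisher": "Example University",
    "educational_level": "high school",
}

def apply_collection_profile(record: dict, profile: dict) -> dict:
    """Return a copy of the record with profile values added.

    Existing values are kept: the profile only fills in fields that
    are missing or empty, since these are collection-level defaults.
    """
    enriched = dict(record)
    for field, value in profile.items():
        if not enriched.get(field):
            enriched[field] = value
    return enriched

def export_for_harvest(records: list[dict], profile: dict) -> list[dict]:
    """Apply the profile to every record in the exported set."""
    return [apply_collection_profile(r, profile) for r in records]

if __name__ == "__main__":
    local_records = [
        {"title": "Senate minutes, May", "creator": "University Senate"},
        {"title": "Intro to fractions", "publisher": "Maths Dept"},
    ]
    for record in export_for_harvest(local_records, COLLECTION_PROFILE):
        print(record)
```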
The automatic population of record values based on user or object profiles
This process uses a user or object template to provide automatic or default (i.e. editable) values to a metadata record. It is most likely to be part of the record creation process.
Arguably this is very similar to process 1; I’ve separated them to try to draw out the distinction between what is a collection-based transformation and what is essentially a stored template.
For example, the system automatically adds the author’s details and institutional affiliation to a record when the user starts to create it; or the system applies a departmental profile for thesis deposit which provides standard information about the department and the university.
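A sketch of how such a stored template might pre-fill a new record at creation time, assuming an invented user profile and field names:

```python
# A sketch of pre-filling a new record from a stored user profile.
# The profile fields are illustrative assumptions; in practice they
# would come from the user's account or a departmental template.

USER_PROFILES = {
    "jsmith": {
        "creator": "Smith, Jane",
        "affiliation": "Example University, Dept. of History",
        "contact_email": "j.smith@example.ac.uk",
    },
}

def new_record(user_id: str, profiles: dict) -> dict:
    """Start a record with default (editable) values from the user's profile."""
    record = {"title": "", "date": "", "description": ""}
    record.update(profiles.get(user_id, {}))
    return record

if __name__ == "__main__":
    draft = new_record("jsmith", USER_PROFILES)
    # The user can now overwrite any of the defaults before saving.
    print(draft)
```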
The automatic population of record values derived from other created metadata values
This involves the direct derivation of metadata values from user-created values. The process may involve the use of external services. It is distinct from a profile in that a lookup or web service call is required, triggered by the data entered.
For example, creating a metadata entry for (or link to) representation information based on the supplied file type or including the author’s email address based on their name.
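A minimal sketch of this kind of derivation, with local lookup tables standing in for what would more likely be external registry or web service calls; the table contents are invented:

```python
# A sketch of deriving one metadata value from another via a lookup.
# The registries below are stand-ins for external services (e.g. a
# format registry queried over HTTP); their contents are invented.

FORMAT_REGISTRY = {
    "application/pdf": "Portable Document Format (PDF) specification",
    "text/html": "HTML specification",
    "image/tiff": "TIFF 6.0 specification",
}

AUTHOR_DIRECTORY = {
    "Smith, Jane": "j.smith@example.ac.uk",
}

def derive_values(record: dict) -> dict:
    """Fill in values that follow directly from values the user entered."""
    enriched = dict(record)
    mime = enriched.get("format")
    if mime in FORMAT_REGISTRY:
        enriched["representation_information"] = FORMAT_REGISTRY[mime]
    author = enriched.get("creator")
    if author in AUTHOR_DIRECTORY:
        enriched.setdefault("contact_email", AUTHOR_DIRECTORY[author])
    return enriched

if __name__ == "__main__":
    entered = {"creator": "Smith, Jane", "format": "application/pdf"}
    print(derive_values(entered))
```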
The automatic generation of metadata from existing structured information about the asset
This involves utilizing the supplied metadata, or other structured information supplied with the asset, to attempt to generate further metadata about the asset.
This is where I feel the process moves from population into generation (3 is still population, as it’s a direct inference; 4 involves a step beyond this).
For example, mapping a paper to a subject area based on what a system can infer from publicly available information about an author (e.g. where they work, the courses they are associated with, the research funding they’ve obtained, previous items they’ve submitted to the system). Another case would be the assignment of an audience or educational level to a learning object based on the same available information.
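To make the heuristic nature of this step concrete, the sketch below maps an author’s known department and prior submissions to candidate subject terms; the data and rules are invented for illustration.

```python
# A sketch of inferring a subject area from structured information
# held about an author. The author records and inference rules are
# invented; a real system might draw on staff directories, funder
# databases, or prior submissions.

AUTHOR_INFO = {
    "Smith, Jane": {
        "department": "Dept. of History",
        "grants": ["AHRC medieval manuscripts project"],
        "prior_subjects": ["Palaeography", "Medieval history"],
    },
}

DEPARTMENT_SUBJECTS = {
    "Dept. of History": "History",
    "Dept. of Physics": "Physics",
}

def infer_subjects(author: str) -> list[str]:
    """Suggest candidate subject terms from what we know about the author."""
    info = AUTHOR_INFO.get(author, {})
    candidates: list[str] = []
    dept = info.get("department")
    if dept in DEPARTMENT_SUBJECTS:
        candidates.append(DEPARTMENT_SUBJECTS[dept])
    # Subjects assigned to the author's previous submissions are strong hints.
    candidates.extend(info.get("prior_subjects", []))
    # De-duplicate while keeping order.
    return list(dict.fromkeys(candidates))

if __name__ == "__main__":
    print(infer_subjects("Smith, Jane"))
```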
The automatic generation of metadata from the use of the asset itself
This covers a spectrum of techniques based on the use of the asset. Common approaches range from inferring quality from usage statistics, citations, or links, to inferring metadata more directly from the analysis of user reviews, annotations, feedback, or tagging patterns.
For example, Google’s search ranking, Amazon’s recommender system, or citation metrics (though I’m not entirely sure I’d put citation metrics here).
This is arguably not formal metadata (though the LOM allows for user annotations to be formally included), but it increasingly plays a role in the management and discovery of assets and is likely to continue to do so.
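A toy sketch of how usage signals might be turned into metadata, with invented thresholds and field names (a real system would need far more care with the statistics):

```python
# A sketch of generating metadata from how an asset is used.
# The thresholds and derived fields are illustrative assumptions,
# not an established metric.

from collections import Counter

def usage_metadata(downloads: int, citations: int, tags: list[str]) -> dict:
    """Derive coarse usage-based metadata from raw signals."""
    derived = {}
    # A crude popularity indicator from usage statistics.
    if downloads > 1000 or citations > 50:
        derived["usage_level"] = "high"
    elif downloads > 100 or citations > 5:
        derived["usage_level"] = "medium"
    else:
        derived["usage_level"] = "low"
    # Promote tags applied by several users to candidate keywords.
    tag_counts = Counter(tags)
    derived["candidate_keywords"] = [t for t, n in tag_counts.items() if n >= 3]
    return derived

if __name__ == "__main__":
    tags = ["osmosis", "biology", "osmosis", "cells",
            "osmosis", "biology", "biology"]
    print(usage_metadata(downloads=1500, citations=12, tags=tags))
```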
The automatic generation of metadata from the content of the asset itself
The Holy Grail of automatic metadata generation is often regarded as the ability to remove the need for manual metadata creation. Whether or not this is always desirable, there is a greater certainty that it should be possible to derive more information from the asset itself – especially when the asset is text-based. This can involve trying to identify structured information contained within the asset, or, more basically, attempting to derive structured information from the entirety of the asset.
For example, tools which mine PDFs or scrape web pages in an attempt to identify the title, author, and references contained in a paper, or tools which count the words in a document to try to match it to a subject term.
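As a minimal sketch of the word-counting approach, the snippet below scores a document against small per-subject keyword lists; the vocabularies are toy assumptions, far simpler than a real classifier:

```python
# A toy sketch of deriving a subject term from the content of a text
# asset by keyword counting. The subject vocabularies are invented;
# real systems would use trained classifiers over richer features.

import re
from collections import Counter

SUBJECT_VOCABULARIES = {
    "Biology": {"cell", "osmosis", "membrane", "organism", "protein"},
    "History": {"medieval", "archive", "manuscript", "century", "empire"},
}

def guess_subject(text: str) -> str | None:
    """Return the subject whose vocabulary best matches the text, if any."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        subject: sum(words[w] for w in vocab)
        for subject, vocab in SUBJECT_VOCABULARIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    sample = ("Osmosis moves water across the cell membrane; "
              "the organism regulates this via the membrane.")
    print(guess_subject(sample))  # -> Biology
```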