Back to the Future – revisiting the CETIS codebashes

As a result of a request from the Cabinet Office to contribute to a paper on the use of hackdays during the procurement process, CETIS have been revisiting the “Codebash” events that we ran between 2002 and 2007. The codebashes were a series of developer events that focused on testing the practical interoperability of implementations of a wide range of content specifications current at the time, including IMS Content Packaging, Question and Test Interoperability, Simple Sequencing (I’d forgotten that even existed!), Learning Design and Learning Resource Meta-data, IEEE LOM, Dublin Core Metadata and ADL SCORM. The term “codebash” was coined to distinguish the CETIS events from the ADL Plugfests, which tested the interoperability and conformance of SCORM implementations. Over a five-year period CETIS ran four content codebashes that attracted participants from 45 companies and 8 countries. In addition to the content codebashes, CETIS also ran additional events focused on individual specifications such as IMS QTI, or on the outputs of specific JISC programmes, such as the Designbashes and Widgetbash facilitated by Sheila MacNeill.

As there was considerable interest in the codebashes and we were frequently asked for guidance on running events of this kind, I wrote and circulated a Codebash Facilitation document. It’s years since I’ve revisited this document, but I looked it out for Scott Wilson a couple of weeks ago as potential input for the Cabinet Office paper he was drafting together with a group of independent consultants. The resulting paper, Hackdays – Levelling the Playing Field, can be read and downloaded here.

The CETIS codebashes have been rather eclipsed by hackdays and connectathons in recent years; however, it appears that these very practical, focused events still have something to offer the community, so I thought it might be worth summarising the Codebash Facilitation document here.

Codebash Aims and Objectives

The primary aim of CETIS codebashes was to test the functional interoperability of systems and applications that implemented open learning technology interoperability standards, specifications and application profiles. In reality that meant bringing together the developers of systems and applications to test whether it was possible to exchange content and data between their products.

A secondary objective of the codebashes was to identify problems, inconsistencies and ambiguities in published standards and specifications. These were then fed back to the appropriate maintenance body in order that they could be rectified in subsequent releases of the standard or specification. In this way codebashes offered developers a channel through which they could contribute to the specification development process.

A tertiary aim of these events was to identify and share common practice in the implementation of standards and specifications and to foster communities of practice where developers could discuss how and why they had taken specific implementation decisions. A subsidiary benefit of the codebashes was that they acted as useful networking events for technical developers from a wide range of backgrounds.

The CETIS codebashes were promoted as closed technical interoperability testing events, though every effort was made to accommodate all developers who wished to participate. The events were aimed specifically at technical developers and we tried to discourage companies from sending marketing or sales representatives, though I should add that we were not always successful! Managers who played a strategic role in overseeing the development and implementation of systems and specifications were, however, encouraged to participate.

Capturing the Evidence

Capturing evidence of interoperability during early codebashes proved to be extremely difficult, so Wilbert Kraan developed a dedicated website built on a Zope application server to facilitate the recording process. Participants were able to register the tools and applications that they were testing and to upload content or data generated by these applications. Other participants could then take this content and test it in their own applications, allowing “daisy chains” of interoperability to be recorded. In addition, developers had the option of making their contributions openly available to the general public or visible only to other codebash participants. All participants were encouraged to register their applications prior to the event and to identify specific bugs and issues that they hoped to address. Developers who could not attend in person were able to participate remotely via the codebash website.
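The Zope site itself is long gone, but the shape of the records it kept is easy to sketch. The fragment below is purely illustrative (it is not the original implementation, and all of the names in it are invented); it simply shows one way of modelling registered applications, uploaded contributions and the resulting “daisy chains” of test results.

    # Purely illustrative sketch of a codebash record structure; NOT the
    # original Zope implementation. All names here are invented.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Application:
        name: str
        vendor: str
        specs: List[str]                      # e.g. ["IMS CP 1.1.4", "IEEE LOM"]

    @dataclass
    class Contribution:
        producer: Application                 # application that exported the content
        description: str
        derived_from: Optional["Contribution"] = None   # re-export of earlier content
        public: bool = False                  # open to all, or participants only

    @dataclass
    class TestResult:
        contribution: Contribution
        consumer: Application                 # application that imported the content
        succeeded: bool
        notes: str = ""

    def daisy_chain(c: Contribution) -> List[Application]:
        """Walk back through re-exports to list the applications a piece of
        content has passed through, oldest first."""
        chain: List[Application] = []
        while c is not None:
            chain.append(c.producer)
            c = c.derived_from
        return list(reversed(chain))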

IPR, Copyright and Dissemination

The IPR and copyright of all resources produced during the CETIS codebashes remained with the original authors, and developers were neither required nor expected to expose the source code of their tools and applications to other participants.

Although CETIS disseminated the outputs of all the codebashes, and identified all those that had taken part, the specific performance of individual participants was never revealed. Bug reports and technical issues were fed back to the relevant standards and specifications bodies and a general overview of the levels of interoperability achieved was disseminated to the developer community. All participants were free to publish their own reports on the codebashes; however, they were strongly discouraged from publicising the performance of other vendors and potential competitors. At the time, we did not require participants to sign non-disclosure agreements, and relied entirely on developers’ sense of fair play not to reveal their competitors’ performance. Thankfully no problems arose in this regard, although one or two of the bigger commercial VLE developers were very protective of their code.

Conformance and Interoperability

It’s important to note that the aim of the CETIS codebashes was to facilitate increased interoperability across the developer community, rather than to evaluate implementations or test conformance. Conformance testing can be difficult and costly to facilitate and govern and does not necessarily guarantee interoperability, particularly if applications implement different profiles of a specification or standard. Events that enable developers to establish and demonstrate practical interoperability are arguably of considerably greater value to the community.

Although the CETIS codebashes had a very technical focus, they were facilitated as social events, and this social interaction proved to be a crucial component in encouraging participants to work closely together to achieve interoperability.

Legacy

These days the value of technical developer events in the domain of education is well established, and a wide range of specialist events have emerged as a result. Some are general in focus, such as the hugely successful DevCSI hackdays; others are more specific, such as the CETIS Widgetbash, the CETIS / DevCSI OER Hackday and the EDINA Wills World Hack running this week, which aims to build a Shakespeare Registry of metadata about digital resources relating to Shakespeare, covering anything from his work and life to modern performance, interpretation, or geographical and historical contextual information. At the time, however, aside from the ADL Plugfests, the CETIS codebashes were unique in offering technical developers an informal forum to test the interoperability of their tools and applications, and I think it’s fair to say that they had a positive impact not just on developers and vendors but also on the specification development process and the education technology community more widely.

Links

Facilitating CETIS CodeBashes paper
Codebash 1-3 Reports, 2002 – 2005
Codebash 4, 2007
Codebash 4 blog post, 2007
Designbash, 2009
Designbash, 2010
Designbash, 2011
Widgetbash, 2011
OER Hackday, 2011
QTI Bash, 2012
Dev8eD Hackday, 2012

CETIS OER Visualisation Project

As part of our work in the areas of open educational resources and data analysis, CETIS are undertaking a new project to visualise the outputs of the JISC / HEA Open Educational Resource Programmes, and we are very lucky to have recruited data wrangler extraordinaire Martin Hawksey to undertake this work. Martin’s job will be firstly to develop examples and workflows for visualising OER project data stored in the JISC CETIS PROD database, and secondly to produce visualisations around OER content and collections produced by the JISC / HEA programmes. Oh, and he’s only got 40 days to do it! You can read Martin’s thoughts on the task ahead over at his own blog MASHe:

40 days to let you see the impact of the OER Programme #ukoer

PROD Data Analysis

A core aspect of CETIS support for the OER Phase 1 and 2 Programmes has been the technical analysis of tools and systems used by the projects. The primary data collection tool used for this purpose is the PROD database. An initial synthesis of this data has already been completed by R. John Robertson; however, there is scope for further analysis to uncover richer information about the technologies used to create and share OERs.
This part of the project will aim to deliver:

  • Examples of enhanced data visualisations from OER Phase 1 and 2.
  • Recommendations on use and applicability of visualisation libraries with PROD data to enhance the existing OER dataset.
  • Recommendations and example workflows, including sample database queries used to create the enhanced visualisations (a hypothetical example query is sketched below).

And we also hope this work will uncover some general issues including:

  • Issues around potential workflows for mirroring data from our PROD database and linking it to other datasets in our Kasabi triple store.
  • Identification of other datasets that would enhance PROD queries, and some exploration of how to transform and upload them.
  • General recommendations on wider data issues, and on data maintenance issues observed within PROD.
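To make the database query deliverable a little more concrete, the fragment below sketches the kind of query-and-count workflow we have in mind. It is purely illustrative: the SPARQL endpoint URL and the prod: property names are placeholders rather than the real PROD / Kasabi vocabulary, and the actual recipes will emerge from Martin’s work.

    # Purely illustrative: query a SPARQL endpoint for project/technology pairs
    # and count how often each technology appears. The endpoint URL and the
    # prod: property names are placeholders, not the real PROD vocabulary.
    from collections import Counter
    import requests

    ENDPOINT = "https://example.org/prod/sparql"   # placeholder endpoint

    QUERY = """
    PREFIX prod: <http://example.org/prod/terms#>
    SELECT ?project ?technology WHERE {
      ?project a prod:Project ;
               prod:programme "OER Phase 1" ;
               prod:usesTechnology ?technology .
    }
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]

    # Tally technologies across projects, ready to feed into whichever
    # visualisation library the project settles on.
    counts = Counter(b["technology"]["value"] for b in bindings)
    for tech, n in counts.most_common(10):
        print(f"{tech}: {n} project(s)")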

Visualising OER Content Outputs

The first two phases of the OER Programme produced a significant volume of content; however, the programme requirements were deliberately agnostic about where that content should be stored, aside from a requirement to deposit or reference it in Jorum. This has enabled a range of authentic practices to surface regarding the management and hosting of open educational content; but it also means that there is no central directory of UKOER content, and no quick way to visualise the programme outputs. For example, the content in Jorum varies from a single record for a whole collection to a record per item. Jorum is working on improved ways to surface content and JISC has funded the creation of a prototype UKOER showcase; in the meantime, though, it would be useful to be able to visualise the outputs of the Programmes in a compelling way. For example:

  • Collections mapped by geographical location of the host institution.
  • Collections mapped by subject focus.
  • Visualisations of the volume of collections.

We realise that the data that can be surfaced in such a limited period will be incomplete, and that as a result these visualisations will not be comprehensive; however, we hope that the project will be able to produce compelling, attractive images that can be used to represent the work of the programme.
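By way of illustration of the first of the mapping ideas above, the sketch below plots collection counts by the location of the host institution. Again this is purely illustrative: the data is invented, and the project may well settle on different tools and libraries.

    # Purely illustrative: plot OER collections by the location of the host
    # institution. The data below is invented; a real run would pull institution
    # locations and collection counts from the programme data.
    import matplotlib.pyplot as plt

    # (institution, longitude, latitude, number of collections) with dummy values
    collections = [
        ("Institution A", -1.25, 51.75, 12),
        ("Institution B", -2.97, 56.46, 5),
        ("Institution C", -0.13, 51.52, 20),
    ]

    lons = [c[1] for c in collections]
    lats = [c[2] for c in collections]
    sizes = [c[3] * 30 for c in collections]   # scale marker area by volume

    fig, ax = plt.subplots(figsize=(5, 7))
    ax.scatter(lons, lats, s=sizes, alpha=0.6)
    for name, lon, lat, n in collections:
        ax.annotate(f"{name} ({n})", (lon, lat), fontsize=8)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("OER collections by host institution (dummy data)")
    plt.savefig("oer_collections_map.png", dpi=150)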

The deliverables of this part of the project will be:

  • Blog posts on the experience of capturing and using the data.
  • A set of static or dynamic images that can be viewed without specialist software, with the raw data also available.
  • Documentation/recipes on the visualisations produced.
  • Recommendations to JISC and JISC CETIS on visualising content outputs.

When is Linked Data not Linked Data? – A summary of the debate

One of the activities identified during last December’s Semantic Technology Working Group meeting to be taken forward by CETIS was the production of a briefing paper that disambiguated some of the terminology for those who are less familiar with this domain. The following terms in particular were highlighted:

  • Semantic Web
  • semantic technologies
  • Linked Data
  • linked data
  • linkable data
  • Open Data

I’ve finally started drafting this briefing paper and, unsurprisingly, defining the above terms is proving to be a non-trivial task! Pinning down agreed definitions for Linked Data, linked data and linkable data is particularly problematic. And I’m not the only one having trouble: if you look up Semantic Web and Linked Data / linked data on Wikipedia you will find entries flagged as having multiple issues. It does rather feel like we’re edging close to holy war territory here. But having said that, I do enjoy a good holy war, as long as I’m watching safely from the sidelines.

So what’s it all about? As far as I can make out, much of the debate boils down to whether Linked Data must adhere to the four principles outlined in Tim Berners-Lee’s Linked Data Design Issues, and in particular whether the use of RDF and SPARQL is mandatory. Some argue that RDF is integral to Linked Data; others suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalised term Linked Data for data that is based on RDF and SPARQL, preferring lower case “linked data”, or “linkable data”, for data that uses other technologies.

The fact that the Linked Data Design Issues paper is a personal note by Tim Berners-Lee, and is not formally endorsed by the W3C, also contributes to the ambiguity. The note states:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs. so that they can discover more things.

I’ll refer to the steps above as rules, but they are expectations of behaviour. Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web. (Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html)
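For readers less familiar with what following these rules looks like in practice, the fragment below is a minimal sketch using the Python rdflib library: a thing is named with an HTTP URI, described in RDF when that URI is looked up, and linked to other URIs. The URIs and names themselves are invented for the purposes of illustration.

    # Minimal, illustrative sketch of the four rules using rdflib.
    # All URIs and names below are invented examples.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.org/id/")

    g = Graph()
    # Rules 1 and 2: name the thing with an HTTP URI.
    person = EX["people/alice"]

    # Rule 3: looking up that URI should return useful information,
    # expressed using the standards (here, RDF triples).
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal("Alice Example")))

    # Rule 4: include links to other URIs so that more can be discovered,
    # e.g. someone Alice knows whose data lives elsewhere.
    g.add((person, FOAF.knows, URIRef("http://example.net/id/people/bob")))

    print(g.serialize(format="turtle"))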

In the course of trying to untangle some of the arguments both for and against the necessity of using RDF and SPARQL, I’ve read a lot of very thoughtful blog posts which it may be useful to link to here for future reference. Clearly these are not the only, or indeed the most recent, posts that discuss this most topical of topics; they simply happen to be the ones I have read and which I believe present a balanced overview of the debate in such a way as to be of relevance to the JISC CETIS community.

Linked data vs. Web of data vs. …
– Andy Powell, Eduserv, July 2009

The first useful post I read on this particular aspect of the debate is Andy Powell’s from July 2009. This post resulted from the following question Andy raised on Twitter:

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, “using the standards (RDF, SPARQL)” ??

Andy was of the opinion that Linked Data “implies use of the RDF model – full stop”, adding:

“it’s too late to re-appropriate the “Linked Data” label to mean anything other than “use http URIs and the RDF model”.”

However, he is unable to provide a satisfactory answer to his own question (i.e. what do you call linked data that does not use the RDF model?) and, despite exploring alternative models, he concludes by professing himself to be worried about this.

Andy returned to this theme in a more recent post from January 2010, Readability and linkability, which ponders the relative emphasis given to readability and linkability by initiatives such as the JISC Information Environment. Andy’s general principles have not changed, but he presents the term machine readable data (MRD) as a potential answer to the question he originally asked in his earlier post.

Does Linked Data need RDF?
– Paul Miller, The Cloud of Data, July 2009

Paul Miller’s post is partially a response to Andy’s query. Paul begins by noting that while RDF is key to the Semantic Web and

“an obvious means of publishing — and consuming — Linked Data powerfully, flexibly, and interoperably.”

he is uneasy about conflating RDF with Linked Data and with assertions that

“‘Linked Data’ can only be Linked Data if expressed in RDF.”

Paul discusses the wording and status of Tim Berners-Lee’s Linked Data Design Issues and suggests that it can be read either way. He then goes on to argue that by elevating RDF from the best mechanism for achieving Linked Data to the only permissible approach, we risk barring a large group

“with data to share, a willingness to learn, and an enthusiasm to engage.”

Paul concludes by asking the question:

“What are we after? More Linked Data, or more RDF? I sincerely hope it’s the former.”

No data here – just Linked Concepts and Linked, open, semantic?
– Paul Walk, UKOLN, July & November 2009

Paul Walk has published two useful posts on this topic; the first summarising and commenting on the debate sparked by the two posts above, and the second following the Giant Global Graph session at the CETIS 2009 Conference. This latter post presents a very useful attempt at disambiguating the terms Open data, Linked Data and Semantic Web. Paul also tries to untangle the relationship between these three memes and helpfully notes:

  • data can be open, while not being linked
  • data can be linked, while not being open
  • data which is both open and linked is increasingly viable
  • the Semantic Web can only function with data which is both open and linked

So What Is It About Linked Data that Makes it Linked Data™?
– Tony Hirst, Open University, March 2010

Much more recently, Tony Hirst published this post, which begins with a version of the four Linked Data principles taken from Wikipedia. This particular version makes no mention of either RDF or SPARQL. Tony goes on to present a very neat example of data linked using HTTP URIs and Yahoo Pipes, and asks

“So, the starter for ten: do we have an example of Linked Data™ here?”

Tony broadly believes the answer is yes and is of a similar opinion to Paul Miller that too rigid adherence to RDF and SPARQL

“will put a lot of folk who are really excited about the idea of trying to build services across distributed (linkable) datasets off…”

Perhaps more controversially, Tony questions the necessity of universally unique URIs that resolve to content, suggesting that:

“local identifiers can fulfil the same role if you can guarantee the context as in a Yahoo Pipe or a spreadsheet”

Tony signs off with:

“My name’s Tony Hirst, I like linking things together, but RDF and SPARQL just don’t cut it for me…”

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid
– Wilbert Kraan, JISC CETIS, March 2009

Back here at CETIS Wilbert Kraan has been experimenting with linked data meshups of JISC project data held in our PROD system. In contrast to the approach taken by Tony, Wilbert goes down the RDF and SPARQL route. Wilbert confesses that he originally believed that:

“SPARQL endpoints were these magic oracles that we could ask anything about anything.”

However, his attempts to mesh up real data sets on the web highlighted the fact that SPARQL has no federated search facility:

“And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.”
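For readers unfamiliar with the FROM keyword Wilbert refers to, the hypothetical query below shows the pattern: it asks an endpoint to pull in an external RDF dataset and query it alongside whatever the endpoint already holds. All of the URIs are placeholders; the point is simply that this is the construct many public endpoints disallow or handle badly.

    # Hypothetical SPARQL query using FROM to pull in an external dataset;
    # this is the construct many public endpoints disallow. All URIs are
    # placeholders, wrapped in a Python string for consistency with the
    # other sketches on this blog.
    QUERY = """
    PREFIX dcterms: <http://purl.org/dc/terms/>

    SELECT ?project ?title
    FROM <http://example.org/datasets/jisc-projects.rdf>  # the external graph to pull in
    WHERE {
      ?project dcterms:title ?title .
    }
    """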

Wilbert concludes that:

“The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf).”

And in response to a direct query regarding the necessity of RDF and SPARQL to Linked Data, Wilbert answered:

“SPARQL and RDF are a sine qua non of Linked Data, IMHO. You can keep the label, widen the definition out, and include other things, but then I’d have to find another label for what I’m interested in here.”

Which kind of brings us right back to the question that Andy Powell asked in July 2009!

So there you have it: a fascinating but, as yet, inconclusive debate, I believe. Apologies for the length of this post. Hopefully one day this will go on to accompany our “Semantic Web and Linked Data” briefing paper.