Briefing Paper: the Semantic Web, Linked and Open Data

CETIS has published a new Briefing Paper on the Semantic Web, Linked and Open Data. This briefing paper provides a high-level overview of key concepts relating to the Semantic Web, semantic technologies, and linked and open data, along with references to relevant examples and standards. The briefing is intended as a starting point for those within the teaching and learning community who may have come across the concept of semantic technologies and the Semantic Web but who do not regard themselves as experts and wish to learn more. Examples and links are provided for further exploration. The briefing paper is supplemented by the blog post When is Linked Data not Linked Data?, which summarises the debate surrounding the definition and characteristics of Linked Data.

The briefing paper can be downloaded in PDF format here, or you can pick up a printed copy at a CETIS event near you soon!

….more what you’d call “guidelines”….

Owen Stephens has written a helpful post which makes a very useful contribution to the debate regarding the interpretation of Tim Berners-Lee’s Linked Data Design Issues. See my earlier post for a summary of the debate. With all these attempts to clarify the ambiguity, I couldn’t help being reminded of the infamous Pirate Code from Pirates of the Caribbean:

“And thirdly, the code is more what you’d call ‘guidelines’ than actual rules. Welcome aboard!”

Sorry, couldn’t resist it ;)

When is Linked Data not Linked Data? – A summary of the debate

One of the activities identified during last December’s Semantic Technology Working Group meeting to be taken forward by CETIS was the production of a briefing paper that disambiguated some of the terminology for those that are less familiar with this domain. The following terms in particular were highlighted:

  • Semantic Web
  • semantic technologies
  • Linked Data
  • linked data
  • linkable data
  • Open Data

I’ve finally started drafting this briefing paper and, unsurprisingly, defining the above terms is proving to be a non-trivial task! Pinning down agreed definitions for Linked Data, linked data and linkable data is particularly problematic. And I’m not the only one having trouble. If you look up Semantic Web and Linked Data / linked data on Wikipedia you will find entries flagged as having multiple issues. It does rather feel like we’re edging close to holy war territory here. But having said that, I do enjoy a good holy war as long as I’m watching safely from the sidelines.

So what’s it all about? As far as I can make out, much of the debate boils down to whether Linked Data must adhere to the four principles outlined in Tim Berners-Lee’s Linked Data Design Issues, and in particular whether use of RDF and SPARQL is mandatory. Some argue that RDF is integral to Linked Data; others suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case “linked data”, or “linkable data”, for data that uses other technologies.

The fact that the Linked Data Design Issues paper is a personal note by Tim Berners-Lee, and is not formally endorsed by the W3C, also contributes to the ambiguity. The note states:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs. so that they can discover more things.

I’ll refer to the steps above as rules, but they are expectations of behaviour. Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web. (Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html)
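
As a rough illustration of the four expectations above, here is a minimal Python sketch. It stands in for the web with an in-memory dictionary, so nothing is actually fetched over HTTP. The example.org URIs, the predicate names and the functions look_up and discover are all hypothetical, and plain tuples stand in for RDF triples rather than any real RDF library.

```python
# A toy "web": each HTTP URI (rules 1 and 2) maps to the description returned
# when someone looks it up (rule 3). All URIs and predicates are hypothetical.
WEB = {
    "http://example.org/id/project/1": [
        ("http://example.org/id/project/1", "name", "Example Project"),
        ("http://example.org/id/project/1", "fundedBy", "http://example.org/id/org/jisc"),
    ],
    "http://example.org/id/org/jisc": [
        ("http://example.org/id/org/jisc", "name", "JISC"),
    ],
}


def look_up(uri):
    """Return the (subject, predicate, object) statements published at a URI."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Follow links to other URIs (rule 4) and collect every statement found."""
    seen, queue, statements = set(), [start_uri], []
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for triple in look_up(uri):
            statements.append(triple)
            obj = triple[2]
            # An object that is itself an HTTP URI is a link worth following.
            if obj.startswith("http://") and obj not in seen:
                queue.append(obj)
    return statements


found = discover("http://example.org/id/project/1")
names = [o for (_, p, o) in found if p == "name"]
print(names)
```

Starting from one project URI, the sketch reaches the funder's description only because the project's description links to it, which is the kind of discovery the fourth rule is meant to enable.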

In the course of trying to untangle some of the arguments both for and against the necessity of using RDF and SPARQL, I’ve read a lot of very thoughtful blog posts which it may be useful to link to here for future reference. Clearly these are not the only, or indeed the most recent, posts that discuss this most topical of topics; these happen to be the ones I have read and which I believe present a balanced overview of the debate in such a way as to be of relevance to the JISC CETIS community.

Linked data vs. Web of data vs. …
– Andy Powell, Eduserv, July 2009

The first useful post I read on this particular aspect of the debate is Andy Powell’s from July 2009. This post resulted from the following question Andy raised on Twitter:

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, “using the standards (RDF, SPARQL)” ??

Andy was of the opinion that Linked Data “implies use of the RDF model – full stop” adding:

“it’s too late to re-appropriate the “Linked Data” label to mean anything other than “use http URIs and the RDF model”.”

However, he is unable to provide a satisfactory answer to his own question, i.e. what do you call linked data that does not use the RDF model, and despite exploring alternative models he concludes by professing himself to be worried about this.

Andy returned to this theme in a more recent post in January 2010, Readability and linkability, which ponders the relative emphasis given to readability and linkability by initiatives such as the JISC Information Environment. Andy’s general principles have not changed, but he presents the term machine readable data (MRD) as a potential answer to the question he originally asked in his earlier post.

Does Linked Data need RDF?
– Paul Miller, The Cloud of Data, July 2009

Paul Miller’s post is partially a response to Andy’s query. Paul begins by noting that while RDF is key to the Semantic Web and

“an obvious means of publishing — and consuming — Linked Data powerfully, flexibly, and interoperably.”

he is uneasy about conflating RDF with Linked Data and with assertions that

“‘Linked Data’ can only be Linked Data if expressed in RDF.”

Paul discusses the wording and status of Tim Berners-Lee’s Linked Data Design Issues and suggests that it can be read either way. He then goes on to argue that by elevating RDF from the best mechanism for achieving Linked Data to the only permissible approach we risk barring a large group

“with data to share, a willingness to learn, and an enthusiasm to engage.”

Paul concludes by asking the question:

“What are we after? More Linked Data, or more RDF? I sincerely hope it’s the former.”

No data here – just Linked Concepts and Linked, open, semantic?
– Paul Walk, UKOLN, July & November 2009

Paul Walk has published two useful posts on this topic: the first summarising and commenting on the debate sparked by the two posts above, and the second following the Giant Global Graph session at the CETIS 2009 Conference. This latter post presents a very useful attempt at disambiguating the terms Open data, Linked Data and Semantic Web. Paul also tries to untangle the relationship between these three memes and helpfully notes:

  • data can be open, while not being linked
  • data can be linked, while not being open
  • data which is both open and linked is increasingly viable
  • the Semantic Web can only function with data which is both open and linked

So What Is It About Linked Data that Makes it Linked Data™?
– Tony Hirst, Open University, March 2010

Much more recently, Tony Hirst published this post, which begins with a version of the four Linked Data principles cut from Wikipedia. This particular version makes no mention of either RDF or SPARQL. Tony goes on to present a very neat example of data linked using HTTP URIs and Yahoo Pipes and asks

“So, the starter for ten: do we have an example of Linked Data™ here?”

Tony broadly believes the answer is yes and shares Paul Miller’s opinion that too rigid an adherence to RDF and SPARQL

“will put a lot of folk who are really excited about the idea of trying to build services across distributed (linkable) datasets off…”

Perhaps more controversially, Tony questions the necessity of universal unique URIs that resolve to content, suggesting that:

“local identifiers can fulfil the same role if you can guarantee the context as in a Yahoo Pipe or a spreadsheet”
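
To make the point concrete, here is a small Python sketch of linking on local identifiers. Two hypothetical tables share a course code column; inside this one script the context is guaranteed, so the bare code is enough to join them. The data, the BASE prefix and the to_uri function are assumptions for illustration, showing what might be needed once the data leaves that context.

```python
# Two hypothetical datasets that share a local identifier ("code").
courses = [
    {"code": "T101", "title": "Introduction to Linked Data"},
    {"code": "T202", "title": "Semantic Technologies"},
]
enrolments = [
    {"code": "T101", "students": 120},
    {"code": "T202", "students": 45},
]

# Within one guaranteed context the local code is enough to link the rows.
students_by_code = {row["code"]: row["students"] for row in enrolments}
linked = [
    {**course, "students": students_by_code.get(course["code"])}
    for course in courses
]

# Outside that context "T101" could mean anything, so a globally unique
# HTTP URI is minted from an assumed base. BASE is a made-up prefix.
BASE = "http://example.org/id/course/"


def to_uri(local_code):
    """Turn a local code into a globally unique (hypothetical) HTTP URI."""
    return BASE + local_code


print(linked[0])
print(to_uri("T101"))
```

The join works without any URIs at all, which is Tony's point; the URI only starts to matter when someone else's dataset, with its own idea of what "T101" means, enters the picture.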

Tony signs off with:

“My name’s Tony Hirst, I like linking things together, but RDF and SPARQL just don’t cut it for me…”

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid
– Wilbert Kraan, JISC CETIS, March 2009

Back here at CETIS Wilbert Kraan has been experimenting with linked data meshups of JISC project data held in our PROD system. In contrast to the approach taken by Tony, Wilbert goes down the RDF and SPARQL route. Wilbert confesses that he originally believed that:

“SPARQL endpoints were these magic oracles that we could ask anything about anything.”

However, his attempts to mesh up real data sets on the web highlighted the fact that SPARQL has no federated search facility.

“And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.”
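
For readers who have not met it, SPARQL's FROM clause names the graph that a query should be run against, and repeating it asks the endpoint to merge several graphs before answering. The Python sketch below only builds the query text; it does not contact any endpoint. The dataset URLs, the dc:title pattern and the build_query helper are illustrative assumptions rather than anything taken from Wilbert's post.

```python
def build_query(graph_urls):
    """Build a SPARQL SELECT that asks the endpoint to load each listed graph."""
    from_clauses = "\n".join(f"FROM <{url}>" for url in graph_urls)
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?project ?title\n"
        f"{from_clauses}\n"
        "WHERE { ?project dc:title ?title }"
    )


# Hypothetical dataset locations; an endpoint may refuse to fetch them.
query = build_query([
    "http://example.org/data/projects.rdf",
    "http://example.org/data/programmes.rdf",
])
print(query)
```

Whether an endpoint honours such a query is up to its operator, since loading arbitrary remote graphs on request is costly, which may be one reason the approach so often fails in practice.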

Wilbert concludes that:

“The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf).”
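
One way to read this conclusion: if each dataset is served as a sensibly sized RDF document at a stable URL, a client can fetch the documents it needs and merge them itself instead of relying on an endpoint to federate. The sketch below mimics that with in-memory triples. The URLs, predicates and the fetch function are hypothetical stand-ins, and a real client would use an HTTP library and an RDF parser.

```python
# Hypothetical documents, keyed by the stable URL each one is served from.
DOCUMENTS = {
    "http://example.org/data/projects": [
        ("http://example.org/id/project/1", "title", "Example Project"),
        ("http://example.org/id/project/1", "programme", "http://example.org/id/programme/a"),
    ],
    "http://example.org/data/programmes": [
        ("http://example.org/id/programme/a", "title", "Example Programme"),
    ],
}


def fetch(url):
    """Stand-in for an HTTP GET plus RDF parse; returns a set of triples."""
    return set(DOCUMENTS[url])


# Merging graphs is treated here as a set union of triples (blank nodes ignored).
merged = fetch("http://example.org/data/projects") | fetch("http://example.org/data/programmes")

# A query across both datasets: each project's title with its programme's title.
titles = {s: o for (s, p, o) in merged if p == "title"}
pairs = sorted(
    (titles[s], titles[o])
    for (s, p, o) in merged
    if p == "programme" and s in titles and o in titles
)
print(pairs)
```

The cross-dataset answer only falls out because both documents use the same URI for the programme, so well-chosen URLs do the work that a federated query facility would otherwise have to do.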

And in response to a direct query regarding the necessity of RDF and SPARQL to Linked Data, Wilbert answered:

“SPARQL and RDF are a sine qua non of Linked Data, IMHO. You can keep the label, widen the definition out, and include other things, but then I’d have to find another label for what I’m interested in here.”

Which kind of brings us right back to the question that Andy Powell asked in July 2009!

So there you have it. A fascinating but currently inconclusive debate, I believe. Apologies for the length of this post. Hopefully one day this will go on to accompany our “Semantic Web and Linked Data” briefing paper.

Semantic technologies: which way now?

Cast your mind back to the CETIS Conference 2007 and you may remember a session on Semantic Technologies for Teaching and Learning. This session sought to introduce current developments in semantic technologies, explore their potential application to the domain of teaching and learning and facilitate discussion between these two apparently disparate communities. The case for the relevance and potential of semantic technologies was ably presented by a range of international experts through a series of short position papers which formed the basis for a wide ranging discussion. Following this discussion there seemed to be general consensus that it would be valuable for JISC to facilitate further exploration of the affordances of semantic technologies to the domain of education.

JISC responded to this requirement by issuing an ITT for a scoping study to:

“…investigate how applications which use semantic technologies can add value to learning and teaching.”

This study was awarded to the SemTech Project at the University of Southampton and at the same time CETIS established the Semantic Technology Working Group. The remit of this group was firstly to act as an expert working group for the SemTech Project, and secondly to develop recommendations for potential future work based on the outputs of the project.

The SemTech project successfully concluded in July 2009 having undertaken an extensive survey of semantic technologies relevant to learning and teaching and an investigation of the use and uptake of related tools and services by UK HE institutions. In addition to producing a comprehensive report the SemTech Project has also drafted a roadmap for semantic technology adoption by the UK F/HE community.

Semantic technologies appeared again at this year’s CETIS Conference, this time in the guise of linked data which was discussed in both the Find and Seek and Giant Global Graph sessions. The latter session has already generated a number of blog posts by Adam Cooper, Paul Walk and Andy Powell.

In order to disseminate and discuss the SemTech roadmap, the outputs of the CETIS conference and potential future activities in the area of semantic technologies for teaching and learning, CETIS are holding a public meeting of the Semantic Technologies Working Group on the 10th of December at the University of Strathclyde. This meeting will:

  • Review the outputs of the SemTech project.
  • Consider the roadmap and recommendations to JISC.
  • Respond to these recommendations and explore future directions.
  • Investigate ways that CETIS can raise awareness of the potential affordances of semantic technologies to the teaching and learning sector.
  • Discuss future activities in this area that CETIS could potentially engage in.

The meeting is open to all those with an interest in semantic technologies and their potential application to the domain of teaching and learning. We will be actively seeking comments and feedback from the community and would encourage colleagues to join the discussion.

To register for this meeting and for further information please visit the CETIS events page.