Question and Test tools demonstrate interoperability

As the QTI 2.1 specification gets ready for final release, and new communities start picking it up, conforming tools demonstrated their interoperability at the JISC – CETIS 2012 conference.

The latest version of the world’s only open computer aided assessment interoperability specification, IMS’ QTI 2.1, has been in public beta for some time. That was time well spent, because it allowed groups from at least eight nations on four continents to apply it to their assessment tools and practices, surface shortcomings in the spec, and fix them.

Nine of these groups came together at the JISC – CETIS conference in Nottingham this year to test a range of QTI packages with their tools, ranging from the very simple to the increasingly specialised. In the event, only three interoperability bugs were uncovered in the tools, and those are being vigorously stamped on right now.

Where it gets more complex is who supports what part of the specification. The simplest profile, provisionally called CC QTI, was supported by all players and some editors in the Nottingham bash. Beyond that, it’s a matter of particular communities matching their needs to particular features of the specification.

In the US, the Accessible Portable Item Profile (APIP) group brings together major test and tool vendors who are building a profile for summative testing in schools. Their major requirement is the ability to finely adjust the presentation of questions to learners with diverse needs, which is what they have accomplished by building an extension to QTI 2.1. The material also works in QTI tools that haven’t been built explicitly for APIP yet.

A similar group has sprung up in the Netherlands, where the goal is to define all computer aided high stakes school testing in the country in QTI 2.1. That means that a fairly large infrastructure of authoring tools and players is being built at the moment. Since the testing material covers so many subjects and levels, there will be a series of profiles to cover them all.

An informal effort has also sprung up to define a numerate profile for higher education, which may yet be formalised. In practice, it already works in the tools made by the French MOCAH project, and by the JISC Assessment and Feedback sponsored QTI-DI and Uniqurate projects.

For the rest of us, it’s likely that IMS will publish something very like the already proven CC QTI as the common core profile that comes with the specification.

More details about the tools that were demonstrated are available at the JISC – CETIS conference pages.

Approaches to building interoperability and their pros and cons

System A needs to talk to System B. Standards are the ideal to achieve that, but pragmatics often dictate otherwise. Let’s have a look at what approaches there are, and their pros and cons.

When I looked at the general area of interoperability a while ago, I observed that useful technology becomes ubiquitous and predictable enough over time for the interoperability problem to go away. The route to get to such commodification is largely down to which party – vendors, customers, domain representatives – is most powerful and what their interests are. Which describes the process very nicely, but doesn’t help solve the problem of connecting stuff now.

So I thought I’d try to list what the choices are, and what their main pros and cons are:

A priori, global
Also known as de jure standardisation. Experts, user representatives and possibly vendor representatives get together to codify the whole or part of a service interface between systems that are emerging or don’t exist yet; this can concern the syntax, semantics or transport of data. Intended to facilitate the building of innovative systems.
Pros:

  • Has the potential to save a lot of money and time in systems development
  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Viability depends on the business model of all relevant vendors
  • Fairly unlikely to fit either actually available data or integration needs very well

A priori, local
i.e. some type of Service Oriented Architecture (SOA). Local experts design an architecture that codifies syntax, semantics and operations into services. Usually built into agents that connect to each other via an Enterprise Service Bus (ESB).
Pros:

  • Can be tuned for locally available data and to meet local needs
  • Facilitates structured management of network over time
  • Speeds up changes in the network (relative to ad hoc, local)

Cons:

  • Requires major and continuous governance effort
  • Requires upfront investment
  • Integration of a new system still takes time and effort

Ad hoc, local
Custom integration of whatever is on an institution’s network by the institution’s experts in order to solve a pressing problem. Usually built on top of existing systems using whichever technology is to hand.
Pros:

  • Solves the problem owner’s problem fastest in the here and now
  • Results accurately reflect the data that is actually there, and the solutions that are really needed

Cons:

  • Non-transferable beyond local network
  • Needs to be redone every time something changes on the local network (considerable friction and cost for new integrations)
  • Can create hard to manage complexity

Ad hoc, global
Custom integration between two separate systems, done by one or both vendors. Usually built as a separate feature or piece of software on top of an existing system.
Pros:

  • Fast point-to-point integration
  • Reasonable to expect upgrades for future changes

Cons:

  • Depends on business relations between vendors
  • Increases vendor lock-in
  • Can create hard to manage complexity locally
  • May not meet all needs, particularly cross-system BI

Post hoc, global
Also known as standardisation, consortium style. Service provider and consumer vendors get together to codify a whole service interface between existing systems; syntax, semantics, transport. The resulting specs usually get built into systems.
Pros:

  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Takes a long time to start, and is slow to adapt
  • Depends on business model of all relevant vendors
  • Liable to fit either available data or integration needs poorly

Clearly, no approach offers instant nirvana, but it does make me wonder whether there are ways of combining approaches such that we can connect short term gain with long term goals. I suspect if we could close-couple what we learn from ad hoc, local integration solutions to the design of post-hoc, global solutions, we could improve both approaches.

Let me know if I missed anything!

PROD; a practical case for Linked Data

Georgi wanted to know what problem Linked Data solves. Mapman wanted a list of all UK universities and colleges with postcodes. David needed a map of JISC Flexible Service Delivery projects that use Archimate. David Sherlock and I got mashing.

Linked Data, because of its association with Linked Open Data, is often presented as an altruistic activity, all about opening up public data and making it re-usable for the benefit of mankind, or at least the tax-payers who facilitated its creation. Those are very valid reasons, but they tend to obscure the fact that there are some sound selfish reasons for getting into the web of data as well.

In our case, we have a database of JISC projects and their doings called PROD. It focusses on what JISC-CETIS focusses on: what technologies have been used by these projects, and what for. We also have some information on who was involved with the projects and where they worked, but it doesn’t go much beyond bare names.

In practice, many interesting questions require more information than that. David’s need to present a map of JISC Flexible Service Delivery projects that use Archimate is one of those.

This presents us with a dilemma: we can keep adding more info to PROD, make ad hoc mash-ups, or play in the Linked Data area.

The trouble with adding more data is that there is an unending amount of interesting data that we could add, if we had infinite resources to collect and maintain it. Which we don’t. Fortunately, other people make it their business to collect and publish such data, so you can usually string something together on the spot. That gets you far enough in many cases, but it is limited by having to start from scratch for virtually every mashup.

Which is where Linked Data comes in: it allows you to link into the information you want but don’t have.

For David’s question, the information we want is about the geographical position of institutions. Easily the best source for that info and much more besides is the dataset held by the JISC’s Monitoring Unit. Now this dataset is not available as Linked Data yet, but one other part of the Linked Data case is that it’s pretty easy to convert a wide variety of data into RDF. Especially when it is as nicely modelled as the JISC MU’s XML.
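As a toy illustration of how mechanical such a conversion can be, here is a minimal Python sketch that turns a made-up institution record into N-Triples-style output. The element names and URIs are invented for this example; they are not the JISC MU schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical institution record, loosely in the spirit of nicely modelled
# XML like the JISC MU's. Element names and URIs are invented for illustration.
XML = """
<institutions>
  <institution id="cov">
    <name>Coventry University</name>
    <postcode>CV1 5FB</postcode>
  </institution>
</institutions>
"""

def xml_to_triples(xml_text):
    """Turn each <institution> element into subject-predicate-object triples."""
    triples = []
    for inst in ET.fromstring(xml_text).findall("institution"):
        subject = "http://example.org/institution/" + inst.get("id")
        for field in inst:  # each child element becomes a predicate
            predicate = "http://example.org/schema/" + field.tag
            triples.append((subject, predicate, field.text))
    return triples

triples = xml_to_triples(XML)
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')  # N-Triples-style serialisation
```

A real conversion would of course be done with an XSLT or an RDF library rather than string concatenation, but the shape of the work is the same: one record in, a handful of triples out.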

All universities with JISC Flexible Delivery projects that use Archimate. Click for the google map

Having done this once, answering David’s question was trivial. Not just that, answering Mapman’s interesting question about a list of UK universities and colleges with postcodes was a piece of cake too. That answer prompted Scott and Tony’s question on mapping UCAS and HESA codes, which was another five second job. As was my idle wonder whether JISC projects led by Russell group universities used Moodle more or less than those led by post ’92 institutions (answer: yes, they use it more).

Russell group led JISC projects with and without Moodle

Post '92 led JISC projects with or without Moodle

And it doesn’t need to stop there. I know about interesting datasets from UKOLN and OSSWatch that I’d like to link into. Links from the PROD data to the goodness of Freebase.com and dbpedia.org already exist, as do links to MIMAS’ names project. And each time such a link is made, everyone else (including ourselves!) can build on top of what has already been done.

This is not to say that creating Linked Data out of PROD was free, nor that no effort is involved in linking datasets. It’s just that the effort seems less than with other technologies, and the return considerably more.

Linked Data also doesn’t make your data automatically usable in all circumstances or bug free. David, for example, expected to see institutions on the map that do use the Archimate standard, but not necessarily as a part of a JISC project. A valid point, and a potential improvement for the PROD dataset. It may also have never come to light if we hadn’t been able to slice and dice our data so readily.

An outline with recipes and ingredients is to follow.

The cloud is for the boring

Members of the Strategic Technologies Group of the JISC’s FSD programme met at King’s Anatomy Theatre to, ahem, dissect the options for shared services and the cloud in HE.

The STG’s programme included updates on members’ projects, a preview of the synthesis of the Flexible Service Delivery programme of which the STG is a part, and a preview of the University Modernisation Fund programme that will start later in the year.

The main event, though, was a series of parallel discussions on business problems where shared services or cloud solutions could make a difference. The one I was at considered a case from the CUMULUS project; how to extend rather than replace a Student Record System in a modular way.

View from the King's anatomy theatre up to the clouds

In the event, a lot of the discussion revolved around what services could profitably be shared in some fashion. When the group looked at what is already being run on shared infrastructure and what has proven very difficult, the pattern is actually very simple: the more predictable, uniform, mature, well understood and inessential to the central business of research and education, the better. The more variable, historically grown, institution specific and bound up with the real or perceived mission of the institution or parts thereof, the worse.

Going round the table to sort the soporific cloudy sheep from the exciting, disputed, in-house goats, we came up with the following lists:

Cloud:

  • Email
  • Travel expenses
  • HR
  • Finance
  • Student network services
  • Telephone services
  • File storage
  • Infrastructure as a Service

In house:

  • Course and curriculum management (including modules etc)
  • Admissions process
  • Research processes

This ought not to be a surprise, of course: the point of shared services – whether in the cloud or anywhere else – is economies of scale. That means that the service needs to be the same everywhere, doesn’t change much or at all, doesn’t give the users a competitive advantage and has well understood and predictable interfaces.

ArchiMate modelling bash outcomes

What’s more effective than taking two days out to focus on a new practice with peers and experts?

Following the JISC’s FSD programme, an increasing number of UK Universities started to use the ArchiMate Enterprise Architecture modelling language. Some people have had some introductions to the language and its uses, others even formal training in it, others still visited colleagues who were slightly further down the road. But there was a desire to take the practice further for everyone.

For that reason, Nathalie Czechowski of Coventry University took the initiative to invite anyone with an interest in ArchiMate modelling (not just UK HE), to come to Coventry for a concentrated two days together. The aims were:

1) Some agreed modelling principles

2) Some idea whether we’ll continue with an ArchiMate modeller group and have future events, and in what form

3) The models themselves

With regard to 1), work is now underway to codify some principles in a document, a metamodel and an example architecture. These principles are based on the existing Coventry University standards and the Twente University metamodel, and the primary aim of them is to facilitate good practice by enabling sharing of, and comparability between, models from different institutions.

With regard to 2), the feeling of the ‘bash participants was that it was well worth sustaining the initiative and organising another bash in about six months’ time. The means of staying in touch in the meantime have yet to be established, but one will be found.

As to 3), a total of 15 models were made or tweaked and shared over the two days. Varying from some state of the art, generally applicable samples to rapidly developed models of real life processes in universities, they demonstrate the diversity of the participants and their concerns.

All models and the emerging community guidelines are available on the FSD PBS wiki.

Jan Casteels also blogged about the event on Enterprise Architect @ Work

Enterprise Architecture throws out bath water, saves baby in the nick of time

Enterprise architecture started as a happily unreconstructed techy activity. When that didn’t always work, a certain Maoist self-criticism kicked in, with an exaltation of “the business” above all else, and taboos on even thinking about IT. Today’s Open Group sessions threatened to take that reaction to its logical extreme. Fortunately, it didn’t quite end up that way.

The trouble with realising that getting anywhere with IT involves changing the rest of the organisation as well, is that it gets you out of your assigned role. Because the rest of the organisation is guaranteed to have different perspectives on how it wants to change (or not), what the organisation’s goals are and how to think about its structure, communication is likely to be difficult. Cue frustration on both sides.

That can be addressed by going out of your way to go to “the business”, talk its language, worry about its concerns and generally go as native as you can. This is popular to the point of architects getting as far away from dirty, *dirty* IT as possible in the org chart.

So when I saw the sessions on “business architecture”, my heart sank. More geeks pretending to be suits, like a conference hall full of dogs trying to walk on their hind legs, and telling each other how it’s the future.

When we got to the various actual case reports in the plenary and business transformation track, however, it turned out that EA self-negation is not quite what’s happening in reality. Yes, speaker after speaker emphasised the need to talk to other parts of the organisation in their own language, and the need to only provide relevant information to them. Tom Coenen did a particularly good job of stressing the importance of listening while the rest of the organisation does the talking.

But, crucially, that doesn’t negate that – behind the scenes – architects still model. Yes, for their own sake, and solely in order to deliver the goals agreed with everyone else, but even so. And, yes, there are servers full of software artefacts in those models, because they are needed to keep the place running.

This shouldn’t be surprising. Enterprise architects are not hired to decide what the organisation’s goals are, what its structure should be or how it should change. Management does that. EA can merely support by applying its own expertise in its own way, and worry about the communication with the rest of the organisation both when requirements go in and a roadmap comes out (both iteratively, natch).

And ‘business architecture’? Well, there still doesn’t appear to be a consensus among the experts on what it means, or how it differs from EA. If anything, it appears to be a description of an organisation using a controlled vocabulary that looks as close as possible to non-domain-specific natural language. That could help with intra-disciplinary communication, but the required discussion about concepts and the words to refer to them makes me wonder whether having a team who can communicate as well as they can model might not be quicker and more precise.

Bare bones TOGAF

Do stakeholder analysis. Cuddle the uninterested powerful ones, forget about the enthusiasts without power. Agree goal. Deliver implementable roadmap. The rest is just nice-to-have.

That was one message from today’s slot on The Open Group’s Architecture Framework (TOGAF) at the Open Group’s quarterly meeting in Amsterdam. In one session, two self-described “evil consultants” ran a workshop on how to extract the most value from Enterprise Architecture (EA) for institutional change.

While they agreed about the undivided primacy of keeping the people with power happy when doing EA, the rest of their approach differed more markedly.

Dave Hornford zeroed in mercilessly on the do-able roadmap as the centre of the practice. But before that, find those all-powerful stakeholders and get them to agree on the organisational vision and its goal. If there is no agreement: celebrate. You’ve just saved the organisation an awful lot of money on an expensive and unimplementable EA venture.

Once past that hurdle, Dave contended that the roadmap should identify what the organisation really needs – which may not always be sensible or pretty.

Jason Uppal took a slightly wider view, by focussing on the balance between quick wins and how to make EA the norm in an organisation.

The point about ‘quick wins’ is that both ‘quick’ and ‘win’ are relative. It is possible to go after a long term value proposition with a particular change, as long as you have a series of interim solutions that provide value now. Even if you throw them away again later. And the first should preferably have no cost.

That way, EA can become part of the organisation’s practice: by providing value. This does pre-suppose that the EA practice is neither a project nor a programme – just a practice.

An outline of the talks is available on the Open Group’s website.

IMS Question and Test Interoperability 2.1 tools demonstrate interoperability

While most of Europe was on the beach, a dedicated group of QTI vendors gathered in Koblenz, Germany to demo what a standard should do: enable interoperability between a variety of software tools.

A total of twelve tools were demonstrated for the attendees of the IMS quarterly meeting that was being held at the University of Koblenz-Landau. The vendors and projects came from a variety of communities in Poland, Korea, France, Germany and the UK, and their tools included:

All other things being equal, the combination of such a diversity of purposes with the comprehensive expressiveness of QTI means that there is every chance that a set of twelve tools will implement different, non-overlapping subsets of the specification. This is why the QTI working group is currently working on the definition of two profiles: CC (Common Cartridge) QTI and what is provisionally called the Main profile.

The CC QTI profile is very simple and follows the functionality of the QTI 1.2 profile that is currently used in the IMS Common Cartridge educational content exchange format. Nine out of the twelve tools had implemented that profile, and they all happily played, edited or validated the CC QTI reference test.

With that milestone, the group is well on the way to the final, public release of the QTI 2.1 specification. Most of the remaining work is around the definition of the Main profile.

Initial discussion in Koblenz suggested an approach that encompasses most of the specification, with the possible exclusion of some parts that are of interest to some, but not all subjects or communities. To make sure the profile is adequate and implementable, more input is sought from publishers, qualification authorities and others with large collections of question and test items. Fortunately, a number of these have already come forward.

How to meshup eportfolios, learning outcomes and learning resources using Linked Data, and why

After a good session with the folks from the Achievement Standards Network (ASN), and earlier discussions with Link Affiliates, I could see the potential of linking LEAP2a portfolios with ASN curriculum information and learning resources. So I implemented a proof of concept.

Fortunately, almost all the information required is already available as RDF: the ASN makes its machine readable curricula available in that format, and Zotero (my bibliography tool of choice) happily puts out its data in RDF too. What still needed to be done was the ability to turn LEAP2a eportfolios into RDF.

That took some doing, but since LEAP2a is built around the IETF Atom newsfeed format, there were at least some existing XSL transformations to build on. I settled on the one included in the open source OpenLink Virtuoso data management server, since that’s what I used for the subsequent Linked Data meshing too. Also, the OpenLink Virtuoso Atom-to-RDF XSLT came out of their ‘sponger’ middleware layer, which allows you to treat all kinds of structured data as if they were RDF datasources. That means it ought to be possible to build a wee LEAP2a sponger cartridge around my leap2rdf.xslt, which would then allow OpenLink Virtuoso to treat any LEAP2a portfolio as RDF.

The result still has limitations: the leap2rdf.xslt only works on LEAP2a records with the new, proper namespace, and it only works well on those records that use URIs, but not those that use Compact URIs (CURIEs). Fixing these things is perfectly possible, but would take two or three more days that I didn’t have.
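For what it’s worth, the CURIE case is mostly a matter of expanding the prefix before the triples are generated. A minimal sketch of the idea (the prefix map and the ASN identifier below are invented for illustration; a real implementation would read the prefixes from the document’s namespace declarations):

```python
# Toy CURIE (Compact URI) expansion, of the kind leap2rdf.xslt would need.
# The prefix-to-URI map here is illustrative, not the real LEAP2a one.
PREFIXES = {
    "leap2": "http://terms.leapspecs.org/",
    "asn": "http://purl.org/ASN/resources/",
}

def expand_curie(value):
    """Expand 'prefix:reference' into a full URI; pass full URIs through unchanged."""
    if value.startswith(("http://", "https://")):
        return value  # already a full URI
    prefix, _, reference = value.partition(":")
    if prefix in PREFIXES:
        return PREFIXES[prefix] + reference
    return value  # unknown prefix: leave as-is

print(expand_curie("asn:S1015D9"))  # hypothetical ASN identifier
```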

So, having spotted my ponds of RDF triples and filled one up, it’s time to go fishing. But for what and why?

Nigel Ward and Nick Nicholas of Link Affiliates have done an excellent job in explaining the why of machine readable curriculum data, so I’ve taken the immediate advantages that they identified, and illustrated them with noddy proof-of-concept hows:

1. Learning resources can be easily and unambiguously tagged with relevant learning outcomes.
For this one, I made a query that looks up a work (Robinson Crusoe) in my Zotero bibliographic database and gets a download link for it, then checks whether the work supports any known learning outcomes (in my own 6-lines-of-RDF repository), and then gets a description of that learning outcome from the ASN. You can see the results in CSV.

It ought to have been possible to use a bookmarking service for the learning resource to learning outcome mapping, but hand writing the equivalent of

‘this book’ ‘aligns to’ ‘that learning outcome’

seemed easier :-)
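To make the pattern concrete, here is a toy version of that repository and the query from point 1 in plain Python. All URIs and predicates are invented stand-ins, and the nested loops mimic what the SPARQL join does:

```python
# A toy in-memory triple store standing in for the hand-written alignment
# repository plus fragments of the Zotero and ASN data. All URIs are invented.
TRIPLES = [
    ("ex:robinson-crusoe", "ex:alignsTo", "asn:outcome-42"),
    ("ex:robinson-crusoe", "ex:downloadLink", "http://example.org/crusoe.epub"),
    ("asn:outcome-42", "ex:description",
     "Analyse how a character changes over a narrative."),
]

def objects(subject, predicate):
    """All objects matching one subject/predicate pair - one SPARQL triple pattern."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

# The query from the text: get a download link for the work, find the outcomes
# it aligns to, then fetch a description of each outcome.
link = objects("ex:robinson-crusoe", "ex:downloadLink")[0]
for outcome in objects("ex:robinson-crusoe", "ex:alignsTo"):
    for description in objects(outcome, "ex:description"):
        print(link, outcome, description)
```

In the real meshup each of the three patterns is answered by a different dataset (Zotero, the alignment repository, the ASN); the join logic is the same.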

2. A student’s progress can be easily and unambiguously mapped to the curriculum.
To illustrate this one, I’ve taken Theophilus Thistledown’s LEAP2a example portfolio, and added some semi-appropriate Californian K-12 learning outcomes from the ASN against the activities Theophilus recorded in his portfolio. (Anyone can add such ASN statements very easily and legally within the scope of the LEAP2a specification, by the way.) I then RDFised the lot with my leap2rdf XSLT.

I queried the resulting RDF portfolio to see what learning outcomes were supported by one particular learning activity, and I then got descriptions of each of these learning outcomes from the ASN, and also a list of other learning outcomes that belong to the same curriculum standard. That is, related learning outcomes that Theophilus could still work on. This is what the SPARQL looks like, and the results can be seen here. Beware that a table is not the most helpful way of presenting this information – a line and a list would be better.

3. Lesson plans and learning paths can be easily and unambiguously mapped to the curriculum.
This is what I think of as the classic case: I’ve taken an RDFised, ASN enhanced LEAP2a eportfolio, and looked for the portfolio owner’s name, any relevant activities that had a learning outcome mapped against them, then fished out the identifier of that learning outcome and a description of same from the ASN. Here’s the SPARQL, and there’s the result in CSV.

Together, these give a fairly good idea of what Robinson Crusoe was up to, according to the Californian K-12 curriculum, and a springboard for further exploration of things like comparing the learning outcomes he aimed for then with later statements of the same outcome, or the links between the Californian outcomes and those of other jurisdictions.

4. The curriculum can drive content discovery: teachers and learners want to find online resources matching particular curriculum outcomes they are teaching.
While sitting behind his laptop, Robinson might be wondering whether he can get hold of some good learning resources for the learning activities he’s busy with. This query will look at his portfolio for activities with ASN learning outcomes and check those outcomes against the outcome-to-resource mapping repository I mentioned earlier. It will then look up some more information about the resources from the Zotero bibliographic database, including a download link – like so.

The nice thing is that this approach should scale up nicely all the way from my six lines of RDF to a proper repository.

5. Other e-learning applications can be configured to use the curriculum structure to share information.
A nice and simple example could be a tool that lets you discover other learners with the same learning outcome as a goal in their portfolio. This sample query looks through both Theophilus and Robinson’s eportfolios, identifies any ASN learning outcomes they have in common, and then gets some descriptions of that outcome from the ASN, with this result.
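At bottom, that discovery tool is a set intersection over the two portfolios’ outcome triples. A toy sketch, with invented URIs and an invented `ex:aimsFor` predicate standing in for whatever the real portfolios use:

```python
# Outcome triples extracted from two imaginary RDFised portfolios.
# URIs and the ex:aimsFor predicate are invented for illustration.
theophilus = {
    ("ex:theophilus", "ex:aimsFor", "asn:outcome-7"),
    ("ex:theophilus", "ex:aimsFor", "asn:outcome-42"),
}
robinson = {
    ("ex:robinson", "ex:aimsFor", "asn:outcome-42"),
}

def goals(triples):
    """The set of learning outcome URIs a portfolio owner aims for."""
    return {o for s, p, o in triples if p == "ex:aimsFor"}

# Learners with a goal in common: intersect the two goal sets.
shared = goals(theophilus) & goals(robinson)
print(shared)
```

The descriptions of each shared outcome would then be fetched from the ASN, exactly as in the earlier queries.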

Lessons learned

Of all the steps in this and other meshups, deriving decent RDF from XML is easily the hardest and most time consuming. Deriving RDF from spreadsheets or databases seems much easier, and once you have all your source data in RDF, the rest is easy.

Even using the distributed graph pattern I described in a previous post, querying across several datasets can still be a bit slow and cumbersome. As you may have noticed if you follow the sample query links, uriburner.com (the hosted version of OpenLink Virtuoso) will take its time in responding to a query if it hasn’t got a copy of all relevant datasets downloaded, parsed and stored. Using a SPARQL endpoint on your own machine clearly makes a lot of sense.

Perhaps more importantly, all the advantages of machine readable curricula that Nigel and Nick outlined are pretty easily achievable. The queries and the basic tables they produce took me one evening. The more long term advantages Nigel and Nick point out – persistence of curricula, mapping different curricula to each other, and dealing with differences in learning outcome scope – are all equally do-able using the linked data stack.

Most important, though, are the meshups that no-one has dreamed of yet.

What’s next

For other people to start coming up with those meshups, though, some further development needs to happen. For one, the leap2rdf.xslt needs to deal with a greater variety of LEAP2a eportfolios. A bookmark service that lets you assert simple triples with tags, and expose those triples as RDF with URIs (rather than just strings) would be great. The query results could look a bit nicer too.

The bigger deal is the data: we need more eportfolios to be available in either LEAP2a or LEAP2r formats as a matter of course, and more curricula need to be described using the ASN.

Beyond that, the trickier question is who will do the SPARQL querying and how. My sense is that the likeliest solution is for people to interact with the results of pre-fabbed SPARQL queries, which they can manipulate a bit using one or two parameters via some nice menus. Perhaps all that the learners, teachers, employers and others will really notice is more relevant, comprehensive and precisely tailored information in convenient websites or eportfolio systems.
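Such a pre-fabbed query could be little more than a template with one or two substitutable parameters behind a menu. A sketch of the idea (the query shape and vocabulary are illustrative, not the ASN’s actual terms):

```python
from string import Template

# A canned SPARQL query with one user-facing parameter. The predicates are
# invented for illustration; a real query would use the ASN's vocabulary.
QUERY = Template("""
SELECT ?activity ?description WHERE {
  ?activity <http://example.org/supports> <$outcome> .
  <$outcome> <http://example.org/description> ?description .
}
""")

def build_query(outcome_uri):
    """Fill the one parameter a user picked from a menu into the template."""
    return QUERY.substitute(outcome=outcome_uri)

# The hypothetical ASN outcome URI below is a made-up example.
q = build_query("http://purl.org/ASN/resources/S1015D9")
print(q)
```

The user never sees the SPARQL; they just pick an outcome and get the tailored results back.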

Resources

The leap2rdf.xslt is also available here. Please be patient with its many flaws – improvements are very welcome.

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid

Inspired by the VirtualDutch timeline, I wondered how easy it would be to create something similar with all JISC e-learning projects that I could get linked data for. It worked, and I learned some home truths about scalability and the web architecture in the process.

As Lorna pointed out, UCL’s VirtualDutch timeline is a wonderful example of using time to explore a dataset. Since I’d already done ‘place’ in my previous meshup, I thought I’d see how close I could get with £0 and a few nights’ tinkering.

The plan was to make a timeline of all JISC e-learning projects, and the developmental and affinity relations between them that are recorded in CETIS’ PROD database of JISC projects. More or less like Scott Wilson and I did by hand in a report about the toolkits and demonstrators programme. This didn’t quite work; SPARQLing up the data is trivial, but those kinds of relations don’t fit well into the widely used Simile timeline from both a technical and usability point of view. I’ll try again with some other diagram type.

What I do have is one very simple, and one not so simple timeline for e-learning projects that get their data from the intensely useful rkb explorer Linked Data knowledge base of JISC projects and my own private PROD stash:

Recipe for the simple one

  • Go to the SPARQL proxy web service
  • Tell it to ask this question
  • Tell the proxy to ask that question of the RKB by pointing it at the RKB SPARQL endpoint (http://jisc.rkbexplorer.com/sparql/), and order the results as CSV
  • Copy the URL of the CSV results page that the proxy gives you
  • Stick that URL in a ‘=ImportData(“{yourURL}”)’ function inside a fresh Google Spreadsheet
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, one timeline coming up:
Screenshot of the simple mashup- click to go to the live meshup
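For those who’d rather skip the spreadsheet, the CSV-to-timeline-rows step can also be sketched in a few lines of code. The CSV below is a stand-in for what the proxy returns; the column names and project names are invented for illustration:

```python
import csv
import io

# Stand-in for the CSV the SPARQL proxy hands back; real column and project
# names will differ.
CSV_TEXT = """\
project,start,end
Mobile Learning Toolkit,2009-04-01,2010-03-31
Assessment Demonstrator,2009-06-01,2010-05-31
"""

def timeline_rows(csv_text):
    """Parse the proxy's CSV into (label, start, end) rows for a timeline widget."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["project"], row["start"], row["end"]) for row in reader]

for row in timeline_rows(CSV_TEXT):
    print(row)
```

In the recipe above, Google Spreadsheet’s `=ImportData()` does this parsing for you; the point is only that the proxy’s CSV is trivially machine-readable either way.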

Recipe for the not so simple one

For this one, I wanted to stick a bit more in the project ‘bubbles’, and I also wanted the links in the bubbles to point to the PROD pages rather than the RKB resource pages. Trouble is, RKB doesn’t know about data in PROD and, for some very sound reasons I’ll come to in a minute, won’t allow the pulling in of external datasets via SPARQL’s ‘FROM’ operator either. All other SPARQL endpoints on the web that I know of that allow FROM couldn’t handle my query – they either hung or conked out. So I did this instead:

  • Download and install your own SPARQL endpoint (I like Leigh Dodds’ simple but powerful Twinkle)
  • Feed it this query
  • Copy and paste the results into a spreadsheet and fiddle with concatenation
  • Hoist spreadsheet into Google docs
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, a more complex timeline:
Click to go to the live meshup

It’s Linked Data on the Web, stupid

When I first started meshing up with real data sets on the web (as opposed to poking my own triple store), I had this inarticulate idea that SPARQL endpoints were these magic oracles that we could ask anything about anything. And then you notice that there is no federated search facility built into the language. None. And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.

The simple reason behind all this is that federated search doesn’t scale. The web does. Linked Data is also known as the Web of Data for that reason- it has the same architecture. SPARQL queries are computationally expensive at the best of times, and federated SPARQL queries would be exponentially so. It’s easy to come up with a SPARQL query that either takes a long time, floors a massive server (or cloud) or simply fails.

That’s a problem if you think every server needs to handle lots of concurrent queries all the time, especially if it depends on other servers on a (creaky) network to satisfy those queries. By contrast, chomping on the occasional single query is trivial for a modern PC, just like parsing and rendering big and complex html pages is perfectly possible on a poky phone these days. By the same token, serving a few big gobs of (RDF XML) text that sits at the end of a sensible URL is an art that servers have perfected over the past 15 years.

The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf). That way, servers can easily satisfy all comers without breaking a sweat. That’s important if you want every data provider to play in Linked Data space. At the same time, consumers can ask what they want, without constraint. If they ask queries that are too complex or require too much data, then they can either beef up their own machines, or live with long delays and the occasional dead SPARQL engine.