Universities and Colleges in the Giant Global Graph

Earlier this week I facilitated a session at the 2009 CETIS Conference to explore some of the opportunities that Linked Data might offer to universities and colleges, and to help us (CETIS in particular, but also JISC and our peer “innovation support centre”, UKOLN) work out what should be done to move us closer to realising benefits in the management and delivery of Higher and Further Education from the adoption of Linked Data principles, whether by universities and colleges themselves or by other public or private bodies that form part of the overall H/FE system. An approximately factual account of the session, relevant links etc. is available from the Universities and Colleges in the Giant Global Graph session page. This post contains more personal, rambling opinion.

Paul Walk seems to be the first of the session participants to blog. He captures one of the axes of discussion well and provides an attractive set of distinctions. It seems inevitable that the semantic vultures will circle around terminological discussions, and it is probably best for us to spell out what we mean rather than use sweeping terms like “Linked Data” as I do in the opening paragraph. My inclination is not to be quite as hard-line as Paul about requiring RDF before something can be called “Linked Data”. Vultures circle. In future I’ll draw clearer distinctions between the affordances of open data vs linked data vs semantic web, although I did try to put some whitespace between linked data and semantic web in my session intro (PPT). For some audiences it might be clearer to consider the affordances of particular acts, such as selecting one of the various recognised data licences, assigning persistent URIs to things, … but this is not useful for all discourse. Talking of “recognised data licence[s]” may also allow us to sidestep the “open” meme conflation: open access, open source, open process, open-for-reuse…

Actually, I’m rather inclined to insert a further distinction and use the (fantasy) term “Minty Data” for linked-data-without-requiring-RDF (see another of Paul Walk’s posts on this). Why? Well: just posting CSV, while it might be better than nothing from an open data point of view, doesn’t promise the kind of network effects that 4-rules linked data (i.e. the Berners-Lee rules) offers. On the other hand, it does seem to me that we might be able to get quite a long way without being hard-core, and are a lot less likely to frighten people away. I’m also aware that thinking and working with web architecture is likely to involve a paradigm shift for many, in spite of the ubiquity of the web.

The Minty Data rules, a kind-of mongrel of ROA (Resource Oriented Architecture) and 4-rules Linked Data (a sketch in code follows the list):

  1. Assign URIs to things people are likely to want to refer to
    • having first thought through what the domain model behind them is (draw a graph)
    • make the URIs hackable, predictable, structured
    • consider Logical Types (ref Bertrand Russell)
    • don’t change them until Hell freezes over
  2. Use HTTP URIs for the usual reasons
  3. Return something machine-readable, e.g. JSON or Atom
    • and something human-readable too (but this ISN’T what makes it Minty Data)

For Extra-strong mints:

  1. Link to other things using their URIs, especially if they were minted by you
  2. When returning information about a thing, indicate what class(es) of things it belongs to
  3. If the “thing” is also described in one or more encyclopedias or other compendia of knowledge, express that link in a well-known way
    • and if it isn’t described but should be, get it added if you can
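
To make that concrete, here is a minimal sketch in Python of what a Minty Data response for a single “thing” might look like. Everything here is invented for illustration — the institution, URIs and field names are assumptions, not a prescription — but it shows the rules at work: a stable, hackable HTTP URI; a machine-readable representation; links to other things by their URIs; a declared class; and a well-known link to an encyclopedia entry.

```python
import json

# Hypothetical base URI for a university's course data; the domain
# model (courses, modules, departments) has been drawn out first.
BASE = "http://example.ac.uk/id"

def describe_module(code):
    """Machine-readable description of a module (rule 3), identified
    by a stable, hackable HTTP URI (rules 1 and 2)."""
    return {
        # The URI names the thing itself and doesn't change.
        "uri": "%s/module/%s" % (BASE, code),
        # Extra-strong rule 2: say what class of thing this is.
        "type": "Module",
        "title": "Introduction to Web Architecture",
        # Extra-strong rule 1: link to other things by *their* URIs.
        "partOf": "%s/course/computing-bsc" % BASE,
        # Extra-strong rule 3: a well-known link to an encyclopedia
        # entry about the same topic.
        "seeAlso": "http://en.wikipedia.org/wiki/Web_science",
    }

print(json.dumps(describe_module("web101"), indent=2))
```

Note that there is no RDF anywhere in that: the mintiness is in the URIs and the links, not in the serialisation.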

There was a bit of discussion in the conference session about the perceived investment necessary to make Linked Data available. I rather felt that this shouldn’t necessarily be the case, given software such as D2R and Triplify. At least, the additional effort required to make Minty Data available, having first thought through the domain model (information architecture), shouldn’t be much. That pre-requisite is, of course, not universally trivial, but there is quite a lot of literature justifying the benefits to be accrued from getting to grips with the domain model. It would be a mistake to suggest boiling the ocean; the conclusion I draw is that a readiness criterion for anyone considering exposing Linked/Minty Data is that the domain model related to that data has already been considered, or that considering it is judged to be feasible or desirable for other reasons.
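
For a sense of the scale of effort involved, here is a toy sketch — not D2R or Triplify themselves, whose mapping languages differ — of exposing a row from an existing relational table as a URI-identified resource. The table, names and URIs are invented for the example; the point is only that, once the domain model is settled, the mapping itself is small.

```python
import json
import sqlite3

# A stand-in for an existing institutional database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE module (code TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO module VALUES ('web101', 'Web Architecture')")

def module_as_resource(code):
    """Map a relational row onto a URI-identified, machine-readable
    resource -- the kind of mapping D2R or Triplify automate for RDF."""
    (title,) = conn.execute(
        "SELECT title FROM module WHERE code = ?", (code,)).fetchone()
    return {
        "uri": "http://example.ac.uk/id/module/%s" % code,
        "type": "Module",
        "title": title,
    }

print(json.dumps(module_as_resource("web101"), indent=2))
```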

The BBC approach, described in many places but quoted from Tom Scott and Michael Smethurst in Talis Nodalities here, seems to reflect the above:

“I’d like to claim that when we set out to develop [bbc.co.uk]/programmes we had the warm embrace of the semantic web in mind. But that would be a lie. We were however building on very similar philosophical foundations.

In the work leading up to bbc.co.uk/programmes we were all too aware of the importance of persistent web identifiers, permanent URIs and the importance of links as a way to build meaning. To achieve all this we broke with BBC tradition by designing from the domain model up rather than the interface down. The domain model provided us with a set of objects (brands, series, episodes, versions, ondemands, broadcasts etc) and their sometimes tangled interrelationships.”
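
A toy sketch of what “domain model up” might look like in code, using some of the object types the BBC article names; the fields and relationships here are invented for illustration and are far simpler than the real /programmes model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    pid: str    # persistent identifier, the basis of a permanent URI
    title: str

@dataclass
class Series:
    pid: str
    title: str
    episodes: List[Episode] = field(default_factory=list)

@dataclass
class Brand:
    pid: str
    title: str
    series: List[Series] = field(default_factory=list)

    def uri(self):
        # The web identifier falls out of the domain model, not out
        # of any particular page design.
        return "http://www.bbc.co.uk/programmes/%s" % self.pid
```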

On the other hand, I do perceive a threat arising from the ready availability of software to add a sprinkle of RDF or a SPARQL endpoint to an existing web application, or to scrape HTML to RDF, especially if RDF is the focus of the meme. A sprinkle of RDF misses the point if it isn’t also based on a well-principled approach to URIs and their assignment, and on the value of links; a URI isn’t just the access point for a cranky API returning structured data. The biggest threat to the Linked Data meme may be a deluge of poor-quality RDF rather than an absence of it.