Over the last few months I’ve been exploring and detailing a provisional binding of the InLOC spec to JSON-LD (spec; site). My conclusion is that JSON is better matched to linked data than XML is, if you understand how to structure JSON in the JSON-LD way. Here are my reflections, which I hope add something to the JSON-LD official documentation.
Let’s start with XML, as it is less unfamiliar to most non-programmers, due to similarities with HTML. XML offers two kinds of structures: elements and attributes. Elements are the the pieces of XML that are bounded by start and end tags (or are simply empty tags). They may nest inside other elements. Attributes are name-value pairs that exist only within element start tags. The distinction is useful for marking up text documents, as the tags, along with their attributes, are added to the underlying text, without altering it. But for data, the distinction is less helpful. In fact, some XML specifications use almost no attributes. Generally, if you are using XML to represent data, you can change attributes into elements, with the attribute name as a contained element name, and the attribute value as text contained within the new element.
Confused? You’d be in good company. Many people have complained about this aspect of XML. It gives you more than enough “rope to hang yourself with”.
Now, if you’re writing a specification that might be even remotely relevant to the world of linked data, it is really important that you write your specification in a way that clearly distinguishes between the names of things – objects, entities, etc. – and the names of their properties, attributes, etc. It’s a bit like, in natural language, distinguishing nouns from adjectives. “Dog” is a good noun, “brown” is a good adjective, and we want to be able to express facts such as “this dog is of the colour brown”. The word “colour” is the name of the property; the word “brown” is the value of the property.
The bit of linked data that is really easy to visualise and grasp is its graphical representation. In a linked data graph, customarily, you have ovals that represent things – the nouns, objects, entities, etc. – labelled arrows to represent the property names (or “predicates”); and rectangles to represent literal values.
Given the confusion above, it’s not surprising that when you want to represent linked data using XML, it can be particularly confusing. Take a look at this bit of the RDF/XML spec. You can see the node and arc diagram, and the “striped” XML that is needed to represent it. “Striping” means that as you work your way up or down the document tree, you encounter elements that represent alternately (a) things and (b) the names of properties of these things.
Give up? So do most people.
But wait. Compared to RDF/XML, representing linked data in JSON-LD is a doddle! How so?
Basics of how JSON-LD works
Well, look at the remarkably simple JSON page to start with. There you see it: the most important JSON structure is the “object”, which is “an unordered set of name/value pairs”. Don’t worry about arrays for now. Just note that a value can also be an object, so that objects can nest inside each other.
To map this onto linked data, just look carefully at the diagram, and figure that…
- a JSON object represents a thing, object, entity, etc.
- property names are represented by the strings.
In essence, there you have it!
But in practice, there is a bit more to the formal RDF view of linked data.
- Objects in RDF have an associated unique URI, which is what allows the linking. (No need to confuse things with blank nodes right now.)
- To do this in JSON, objects must have a special name/value pair. JSON-LD uses the name “@id” as the special name, and its value must be the URI of the object.
- Predicates – the names of properties – are represented in RDF by URIs as well.
- To keep JSON-LD readable, the names stay as short and meaningful labels, but they need to be mapped to URIs.
- If a property value is a literal, it stays as a plain value, and isn’t an object in its own right.
- In RDF, literal values can have a data type. JSON-LD allows for this, too.
JSON-LD manages these tricks by introducing a section called the “context”. It is in the “context” that the JSON names are mapped to URIs. Here also, it is possible to associate data types with each property, so that values are interpreted in the way intended.
What of JSON arrays, then? In JSON-LD, the JSON array is used specifically to give multiple values of the same property. Essentially, that’s all. So each property name, for a given object, is only used once.
Applying this to InLOC
At this point, it is probably getting hard to hold in one’s head, so take a look at the InLOC JSON-LD binding, where all these issues are illustrated.
InLOC is a specification designed for the representation of structures of learning outcomes, competence definitions, and similar kinds of thing. Using InLOC, authorities owning what are often called “frameworks” or (confusingly) “standards” can express their structures in a form that is completely explicit and machine processable, without the common reliance on print-style layout to convey the relationships between the different concepts. One of the vital characteristics of such structures is that one, higher-level competence can be decomposed in terms of several, lower-level competences.
InLOC was planned to able to be linked data from the outset. Following many good examples, including the revered Dublin Core, the InLOC information model is expressed in terms of classes and properties. Thus, it is clear from the outset that there is a mapping to a linked data style model.
To be fully multilingual, InLOC also takes advantage of the “language map” feature of JSON-LD. Instead of just giving one text value to a property, the value of any human-language property is an object, within which the keys are the two-letter language codes, and the values are the property value in that language.
To take home…
If you want to use JSON-LD, ensure that:
- anything in your model that looks like a predicate is represented as a name in JSON object name/value pairs;
- anything in your model that looks like a value is represented as the value of a JSON name/value pair;
- you only use each property name once – if there are multiple values of that property, use a JSON array;
- any entities, objects, things, or whatever you call them, that have properties, are represented as JSON objects;
- and then, following the spec, carefully craft the JSON-LD context, to map the names onto URIs, and to specify any data types.
Try it and see. If you follow me, I think it will make sense – more sense than XML. It’s now (January 2014) a W3C Recommendation.