Doing XML semantically

When looking at XML specifications, first look for the resources, objects or entities. When one of these is contained in another, ask: what is their relationship? The answer will help inform a sensible version of the XML spec, if you really must have one.

Didn’t I do well, getting the core ideas into fewer than however many words? OK, now for the full version…

Yesterday we (Scott and I) were visited by Karim Derrick of TAG Learning. Karim and TAG are championing a BSI initiative, scheduled to be BS 8518, for the transfer of assessment data – particularly focused on coursework. They are being generous: they are doing the development work, based on their own and their clients’ needs, and handing it over to BSI for standardisation, so that all can benefit.

One of the things we are keen on in CETIS is doing standards and specifications in a sensible way. We have long taken a strong line in discouraging people from doing ill-advised things (perhaps a bit like Google’s supposed motto of not being evil), but I’m not very well adapted for that. So I welcome the complementary approach, which I think is gaining strength in CETIS, of actively encouraging people to do sensible things. The inherent challenge is reaching some kind of collective view on how to standardise the subject matter in hand, even if that view is to wait until something happens and only then act. Among the good practices we seem to agree on is one concerning XML specifications, which brings me back to the main thrust of this post.

Doing XML semantically is what has happened in XCRI (thanks to Scott Wilson and others) and now, with my involvement, in LEAP2A. It is easy to follow this pattern in an Atom-based specification, because Atom’s simple basic structure invites any kind of portfolio item to be an entry, and the relationships between items to be Atom links. For the same reason, Atom tends to be easy to read. But it is not too difficult to do the same in your own XML language, if you take a little care. Look at every element to see whether it is a thing, a relationship, or data: in RDF terms, a resource, a property (or predicate), or a literal.

TAG’s draft specification has pupils rather than students, as it is designed primarily for schools. Pupils are things, in these terms! It also has centres, which are often where the teaching and the coursework assessment take place. So what is the relationship between a pupil and a centre? Leaving the TAG proposal aside for a minute and thinking about other possibilities: if there were only ever one centre, and all the students belonged to it, there would be no need even to represent the students within the centre (in XML terms). If there are different groups of students within a centre, it might make sense to have, within the centre element, elements defining the relationship between the centre and each particular group of students.
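To make the thing / relationship / data test concrete, here is a minimal Python sketch. The element names (`centre`, `pupil`, `enrols`, `name`) and the `id` attributes are hypothetical, invented for illustration rather than taken from the TAG draft. Each element is given a role, and a small walker turns the XML into (subject, predicate, object) triples; a resource nested directly inside another resource, with no element naming the relationship, is reported as an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: element names and roles are invented for
# illustration and are not taken from the TAG / BS 8518 draft.
ROLES = {
    "centre": "resource",  # a thing
    "pupil": "resource",   # a thing
    "enrols": "property",  # a relationship (predicate) between things
    "name": "literal",     # plain data about a thing
}

SAMPLE = """<centre id="c1">
  <name>Example Centre</name>
  <enrols>
    <pupil id="p1"><name>Ann</name></pupil>
    <pupil id="p2"><name>Ben</name></pupil>
  </enrols>
</centre>"""


def to_triples(resource, out):
    """Append (subject, predicate, object) triples for a resource element to out."""
    subject = resource.get("id")
    # Record what kind of thing this is (a simplified stand-in for rdf:type).
    out.append((subject, "type", resource.tag))
    for child in resource:
        role = ROLES[child.tag]
        if role == "literal":
            out.append((subject, child.tag, child.text))
        elif role == "property":
            # Every resource inside a property element is an object of that predicate.
            for obj in child:
                out.append((subject, child.tag, obj.get("id")))
                to_triples(obj, out)
        else:
            # A resource nested directly in a resource: the relationship is implicit.
            raise ValueError(f"<{child.tag}> in <{resource.tag}> has no stated relationship")
    return out


for triple in to_triples(ET.fromstring(SAMPLE), []):
    print(triple)
```

Notice that `enrols` carries no data of its own: it exists only to say how the centre and the pupils are related, which is what makes the mapping to RDF mechanical.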

Then, one part of the draft has pupil elements containing marksheets. Again, what is the relationship? If only one relationship is possible, you don’t need a container element standing between the pupil and the individual marksheet elements. If more than one relationship is possible, it would make sense to have a pupil element containing a wrapper for marksheets, with that wrapper associated with the relationship (properly, the predicate in RDF terms).
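The two marksheet designs can be sketched in the same style. This is again a hypothetical illustration: `marksheet`, the wrapper names `hasDraft` and `hasFinal`, and the default predicate `hasMarksheet` are all assumptions. When marksheets sit directly inside a pupil, the single possible relationship is implied; when they sit inside a wrapper, the wrapper’s name becomes the predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical predicate used when only one pupil-marksheet relationship exists.
DEFAULT_PREDICATE = "hasMarksheet"


def marksheet_links(pupil):
    """Return (pupil id, predicate, marksheet id) triples for one <pupil> element."""
    pid = pupil.get("id")
    links = []
    for child in pupil:
        if child.tag == "marksheet":
            # No wrapper: the one possible relationship is implied.
            links.append((pid, DEFAULT_PREDICATE, child.get("id")))
        else:
            # Wrapper element: its tag names the relationship (the predicate).
            for sheet in child.findall("marksheet"):
                links.append((pid, child.tag, sheet.get("id")))
    return links


single = ET.fromstring('<pupil id="p1"><marksheet id="m1"/><marksheet id="m2"/></pupil>')
several = ET.fromstring(
    '<pupil id="p1">'
    '<hasDraft><marksheet id="m1"/></hasDraft>'
    '<hasFinal><marksheet id="m2"/></hasFinal>'
    "</pupil>"
)
print(marksheet_links(single))
print(marksheet_links(several))
```

With a single relationship the wrapper would add nothing; with several, dropping it would lose the distinction between (in this made-up example) draft and final marksheets.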

I hope that gives at least some hint of how to do XML in a way that makes sense both from the domain point of view and semantically. The payoff is this. If the mapping to RDF is clear, then someone should be able, without too much difficulty, to write an XSLT stylesheet to do the transformation. Then, if someone else wants to create a different XML spec, or has already done so, and it also transforms to RDF, there is a good basis for knowing whether similar information presented in the two XML specs is actually the same or not.
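As a rough illustration of this payoff, the sketch below stands in for two XSLT transforms with two small Python functions (a deliberate swap, to keep the example self-contained). Both formats, `SPEC_A` and `SPEC_B`, are hypothetical, and the shared predicate name `enrols` is an assumption: the comparison only works because both mappings target the same RDF vocabulary. Once each document is reduced to a set of triples, checking whether they say the same thing is a set comparison.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML formats carrying the same enrolment information.
SPEC_A = '<centre id="c1"><enrols><pupil id="p1"/><pupil id="p2"/></enrols></centre>'
SPEC_B = '<roll><entry centre="c1" pupil="p1"/><entry centre="c1" pupil="p2"/></roll>'


def map_spec_a(xml_text):
    """Mapping for format A: nested elements, relationship named by a wrapper."""
    centre = ET.fromstring(xml_text)
    return {(centre.get("id"), "enrols", pupil.get("id")) for pupil in centre.find("enrols")}


def map_spec_b(xml_text):
    """Mapping for format B: flat entries, relationship implied by each <entry>."""
    roll = ET.fromstring(xml_text)
    return {(entry.get("centre"), "enrols", entry.get("pupil")) for entry in roll.iter("entry")}


a, b = map_spec_a(SPEC_A), map_spec_b(SPEC_B)
print(a == b)  # True: the same statements, despite different XML shapes
print(a ^ b)   # symmetric difference: statements made by only one side
```

The XML shapes differ completely, but because each mapping is explicit about things and relationships, any disagreement shows up directly as triples present on one side only.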

One particularly attractive version of this is an RDFa representation, which by its very nature yields RDF. So you can present exactly the same information in XHTML, readable by anyone in a browser and formatted to be easy to read and understand, while keeping all the information just as machine-processable as in any XML spec. That’s just what I want to do for LEAP2.
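For real RDFa, a proper processor (with CURIE expansion, chaining and inheritance rules) is the right tool. Purely to show the idea, here is a sketch that handles only a tiny subset of RDFa: `about` sets the subject, `property` yields a literal from the element’s text, and `rel` with `resource` (or `href`) yields a link. The `ex:` prefix and the page content are hypothetical, and prefixes are not expanded.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; the "ex:" prefix is illustrative and not expanded.
PAGE = """<div about="#p1">
  <h2 property="ex:name">Ann</h2>
  <p>Enrolled at <a rel="ex:enrolledAt" resource="#c1" href="centre.html">Example Centre</a></p>
</div>"""


def extract(elem, subject=None, out=None):
    """Collect triples from a tiny subset of RDFa: about, property, rel, resource/href."""
    if out is None:
        out = []
    # about sets the subject for this element and its descendants.
    subject = elem.get("about", subject)
    if subject is not None and elem.get("property"):
        # property: a literal taken from the element's text content.
        out.append((subject, elem.get("property"), "".join(elem.itertext()).strip()))
    if subject is not None and elem.get("rel"):
        # rel: a link to another resource.
        out.append((subject, elem.get("rel"), elem.get("resource") or elem.get("href")))
    for child in elem:
        extract(child, subject, out)
    return out


for triple in extract(ET.fromstring(PAGE)):
    print(triple)
```

In a browser the same fragment renders as an ordinary heading and sentence; the attributes are invisible to the reader but carry the statements for the machine.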

All this is an extension of what I wrote earlier.