Doing XML semantically

When looking at XML specifications, first look for the resources, objects, or entities. When one of these is contained in another, ask what their relationship is. That will help inform a sensible version of the XML spec, if you really must have one.

Didn’t I do well getting the core ideas into less than however many words? OK, now for the full version…

Yesterday we (Scott and I) were visited by Karim Derrick of TAG Learning. Karim and TAG are championing a BSI initiative, scheduled to be BS 8518, for the transfer of assessment data – particularly focused on coursework. They are being generous: they are doing the development work, based on their own and their clients’ needs, and handing it over to BSI for standardisation, so that all can benefit.

One of the things that we are keen on in CETIS is doing standards and specifications in a sensible way. We have long had a strong line in discouraging people from doing ill-advised things (perhaps a bit like the supposed Google message of not being evil) but I’m not very well-adapted for that, so I welcome the complementary approach of positively trying to encourage people to do sensible things, which I think is gaining strength in CETIS. The inherent challenge is coming to some kind of collective view on how to standardise the subject matter in hand, even if that view is to wait until something happens, and only then do it. Within this line of doing good things, one that we seem to agree on is to do with XML specifications. And so I come back to the main thrust of this post.

Doing XML semantically is what has happened in XCRI (thanks to Scott Wilson and others) and now, with my involvement, in LEAP2A. It is easy in an Atom-based specification to follow this pattern, because Atom’s simple basic structure invites any kind of portfolio item to be an entry, and the relationships between them to be Atom links. For the same reason, Atom tends to be easy to read. But it is not too difficult to do this as well in your own XML language, if you just take a little care. You should look at every element, to see whether it is a thing, a relationship, or data – in RDF terms, a resource, a property or predicate, or a literal. TAG’s draft specification has pupils rather than students, as it is designed primarily for schools. Pupils are things, in these terms! It has centres, which are often where the teaching and the coursework assessment take place. What is the relationship between a student and a centre? Just taking leave of the TAG proposal for a minute, and thinking of other possibilities, if there were always only one centre, and all the students belonged to that centre, there would be no need even to represent the students within (in XML terms) the centre. If there are different groups of students within a centre, it might make sense to have, within the centre element, elements that define the relationship between the centre and each particular group of students.

Then, one part of the draft has pupil elements containing marksheets. Again, what is the relationship? If only one relationship is possible, you don’t need a container element standing between the pupil and individual marksheet elements. If there is more than one possible relationship, then it would make sense to have a pupil element containing a wrapper for marksheets, and that wrapper would be associated with the relationship (property; predicate in RDF terms).
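To make the thing / relationship / data reading a little more concrete, here is a minimal sketch in Python. It walks a small XML fragment in which elements alternate between things and relationships, and lists the triples that fall out. The element names (`pupil`, `hasMarksheet`, `marksheet`, `mark`, `name`) and the `id` attribute are invented for the example; they are not taken from the TAG draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are illustrative only.
DOC = """
<pupil id="p1">
  <name>Alex</name>
  <hasMarksheet>
    <marksheet id="m1"><mark>17</mark></marksheet>
    <marksheet id="m2"><mark>12</mark></marksheet>
  </hasMarksheet>
</pupil>
"""

def triples(thing):
    """Yield (subject, predicate, object) for a 'thing' element.

    Children of a thing are read as relationships (predicates). A predicate
    holding only text points at a literal; one holding child elements points
    at further things, which are walked recursively.
    """
    subject = thing.get("id")
    for predicate in thing:
        children = list(predicate)
        if not children:
            yield (subject, predicate.tag, (predicate.text or "").strip())
        for child in children:
            yield (subject, predicate.tag, child.get("id"))
            yield from triples(child)

result = list(triples(ET.fromstring(DOC)))
for t in result:
    print(t)
```

Here the `hasMarksheet` wrapper is what names the relationship. If a pupil could be related to marksheets in more than one way, a second wrapper with a different name would carry the second predicate.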

I hope that gives some kind of hint, at least, on how to do XML in a way that makes sense both from the domain point of view, and semantically. The payoff is this. If the mapping to RDF is clear, then someone should be able, without too much difficulty, to create an XSLT to do the transform. Then, if someone else wants to do a different XML spec, or has already done so, and it also transforms to RDF, there is a good basis for knowing whether similar information presented in the two XML specs is actually the same, or not.
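As a rough sketch of that payoff, the Python below stands in for the XSLT transforms (it is not XSLT, just an illustration of the same idea). Two made-up XML layouts carry the same facts in different shapes; each gets its own small mapping to triples, and the triples can then be compared directly. The layouts, the attribute names and the `attends` predicate are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical specs carrying the same facts in different XML shapes.
SPEC_A = '<centre id="c1"><pupil id="p1"/><pupil id="p2"/></centre>'
SPEC_B = ('<pupils><pupil ref="p1" centre="c1"/>'
          '<pupil ref="p2" centre="c1"/></pupils>')

def spec_a_to_triples(xml_text):
    """Spec A: containment of a pupil in a centre is read as 'attends'."""
    centre = ET.fromstring(xml_text)
    return {(p.get("id"), "attends", centre.get("id"))
            for p in centre.findall("pupil")}

def spec_b_to_triples(xml_text):
    """Spec B: the centre attribute on a pupil is read as 'attends'."""
    root = ET.fromstring(xml_text)
    return {(p.get("ref"), "attends", p.get("centre"))
            for p in root.findall("pupil")}

a = spec_a_to_triples(SPEC_A)
b = spec_b_to_triples(SPEC_B)
print(a == b)  # same information, despite the different XML structures
```

The comparison only means something because each mapping states what the containment or attribute is taken to mean. If Spec A's containment actually meant "is assessed at" rather than "attends", the triples would differ and the mismatch would be visible.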

One particularly attractive version of this is to have an RDFa representation, which by its very nature yields RDF on transformation. So you can present exactly the same information in XHTML, readable by anyone in a browser, and formatted to make it easy to read and to understand, and still have all the information just as machine-processable as any XML spec. That’s just what I want to do for LEAP2.

All this is an extension of what I wrote earlier.

Interoperability through semantics

I was on a call this afternoon, with the HR-XML people discussing that old chestnut, contact information. The really interesting comment that came up was that many people don’t get any kind of intermediate domain model – rather, they just want to see their implementation (or practice) reflected directly in the specification, and so they are disappointed when (inevitably) it doesn’t happen. The HR-XML solution may be serviceable in the end, but what interested me more was the process which is really needed to do interoperability properly. I’ve been going on about semantic web approaches to interoperability for a while, but I hadn’t really thought through the processes which are necessary to implement it. So it’s a step forward for me.

Here’s how I now see it. Lots of people start off with their own way of seeing, thinking about, or conceptualising things. The primary task of the interoperability analyst or consultant (inventing a term that I’d feel comfortable with for myself) is to create a model into which all the initial variants can be mapped, one way or another. We don’t want one single uniform model into which everyone’s concepts are forced to fit, but rather a common model containing all the differences of view. Now, as I see it, that’s one of the big advantages of the semantic web: it’s so flexible and adaptable that you really can make a model which is a superset of just about any set of models that you can pick. Just what sort of model is needed becomes clearer when we think of the detailed steps of its creation and use.

If one group of people have a particular way of seeing things, the mapping to this common model must be acceptable to them. It won’t always be so immediately, so one has to allow for an educational process, possibly a Socratic one, of leading them to that acceptance. But you don’t have to show them all the other mappings at the same time, just theirs. Relating to other models comes later.

From the mappings to the common model, it is possible, likely even, that there will be some correspondence between concepts, so that different people can recognise they are talking about the same thing. One way of confirming this is to show the various people user interfaces of their systems, dealing with that kind of information. You could easily get remarks such as “yes, we have that, too”. Though one has to look out for, and evaluate, the “except that…” riders.

On the other hand, there are bound to be several concepts which don’t match directly in the common model. To complete the road to interoperability, what is needed is to ascertain, and get agreed, the logical connections between the common model concepts into which the original people’s concepts map. This, of course, is the area of ontologies, but it has a very different feel to the normal one of formalising the logical relationships between the concepts in just one group’s model. We are aiming at a common ontology, not in the sense that everyone must understand and use all the concepts, but that everyone agrees on the way that the concepts interrelate; the way that “their” concepts touch on “foreign” concepts, all within the same ontology.

Once the implications have been agreed between the different concepts in the common model, the way is open to create transforms between information seen in one view and information seen in another view. Each different group can, if they want, keep their own XML schemas to represent their own way of conceptualising the domain, but there will be (approximate) ways of translating this to other conceptualisations, perhaps via an intermediate RDF form. But, perhaps more ambitiously, once these implications are agreed, it is likely that people will be free to migrate towards more coherent views of the domain – actually to change the way they see things.

It is potentially a long process, and supporting it is not straightforward. I could imagine a year’s full-time postgraduate study – an MSc if you like – being needed to study, understand and put together the different roots and branches of philosophy, logic, communication, consensus process, IT, and education that are needed. But if we had trained, not just the naturally gifted, practitioners in this area, perhaps we could have enough people to get beyond the pitfalls of processes that are too often bogged down in mutual misunderstanding or incomprehension, or just plain amateurishness.

TRACE project, Brussels, 2007-11-19

Monday 19th November: I was invited as an expert to the final meeting of the TRACE project, held in Brussels. TRACE stands for Transparent Competences in Europe. The project web site is meant to be at http://trace.education-observatories.net/ . I didn’t realise how many competence projects there were in Europe at the moment, as well as TEN Competence which some CETIS people are involved with.

The meeting consisted of some presentations of the project work, followed by a general discussion which particularly involved the invited experts.

TRACE has created a prototype system to illustrate the competence transparency concept. In essence, this does employment matching based on inferences using domain knowledge embedded in an ontology, as well as job offers on the one side, and CV-based personal competence profiles on the other. They didn’t try to do the full two-way matching thing as the Dutch Centre for Work and Income do. On the surface, the TRACE matching looks like a simpler version of what is done by the Belgian company Actonomy.

The meeting seemed to recognise that factors other than competences are also important in employment matching, but this has not been explored in the context of the TRACE project; nor has the idea that a system which can be used for competence-based matching in the employment domain could easily and advantageously be used for several other applications. It would be good to get a wider framework together, and this might go some way towards countering social exclusion worries.

Karsten Lundqvist, working at Reading with the project leader Prof. Keith Baker, was mainly responsible for the detailed ontology work, and he recognises that the relationships chosen for the top-level ontology are vital to what the ontology can support, and what domain ontologies can represent. They have a small number of relationships in their ontology:

  • has part
  • part of
  • more specific
  • more general
  • synonym
  • antonym
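To give a feel for how relationships like these can drive matching, here is a toy sketch in Python. It is not TRACE's algorithm, which I have not seen in detail; it uses only the "more general" relationship, and the competence terms are invented for illustration.

```python
# Hypothetical domain ontology using the "more general" relationship only.
# Each term maps to its more general term; all terms are invented.
MORE_GENERAL = {
    "python programming": "programming",
    "java programming": "programming",
    "programming": "software development",
}

def generalisations(term):
    """Return the term together with everything reachable via 'more general'."""
    found = [term]
    while term in MORE_GENERAL:
        term = MORE_GENERAL[term]
        found.append(term)
    return found

def matches(required, profile):
    """A requirement is met if a profile competence equals it or is more specific."""
    return any(required in generalisations(c) for c in profile)

cv = ["python programming", "technical writing"]
print(matches("programming", cv))       # met via a more specific competence
print(matches("java programming", cv))  # a sibling term does not satisfy it
```

Even in a toy like this, the choice of relationship decides what can be inferred: "more general" lets a specific competence satisfy a broader requirement, while "synonym" or "part of" would support quite different inferences.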

While these are reasonable first guesses at useful relationships, some of my previous work (presented at a TEN Competence meeting) proposes slightly different ones. I made the point in this meeting that it would be a good idea to check the relevance, appropriateness and meaningfulness of chosen relationships with people engaged in the domain itself. I’d say it is important in this kind of system to gain the trust of the end users by itself being transparently understandable.

But further than this, comprehensible relationships as well as terms are vital to the end of getting communities to take responsibility for ontologies. People in the community must be able to discuss the ontology. And, if the ontology is worked into a structure to support communications, by being the basis of tags, people who work in the field will have plenty of motivation to understand the ontology. Put the motivation to understand together with structures and concepts that are easily understandable, and there is nothing in the way of widespread use of ontologies by communities, for a variety of purposes.

Putting together the main points that occurred to me, most of which I was able to say at the meeting:

  • relationships chosen for a top-level ontology for competence are central, providing the building blocks for domain ontologies where the common knowledge of a community is represented;
  • we need further exploration about which relationships are most suitable and comprehensible for the community;
  • this will enable community development and maintenance of their own ontologies;
  • the UK already has some consensus-building communities, in the Sector Skills Councils;
  • SSCs produce National Occupational Standards, and it is worthwhile studying what is already produced and how, rather than reinventing the complete set of wheels (see my work for ioNW2);
  • to get practical success, we should acknowledge the human tendency for everyone to produce their own knowledge structures, including domain ontologies;
  • but we need to help people interrelate different domain ontologies, by providing in particular relationships suited to cross-link nodes in different ontologies (see my previous work on this).

All in all, an interesting and stimulating meeting.