linked portfolios? | Simon Grant

Interest in linked data has continued to grow within CETIS. Most people seem to start from the assumption that linked data is public data, and of course that isn’t going to work in e-portfolio land. (See e.g. this W3C guide, still under construction.) I see it as a creative challenge for CETIS to get to grips with linking personal data and the issues it raises, perhaps leading on to initial guidance for others implementing systems. This may well be needed to make progress with Leap2R.

Wilbert Kraan was in the Bolton office today, and a brief chat with him opened up some of these issues for me. (He is a CETIS Semantic Web authority.) We could approach linked personal data in at least two ways:

named graphs with permissions attached;
security policies for particular URIs.

The named graph approach would seem to fit well with the way that e-portfolio systems make information available. Mahara has “views” and PebblePad has “webfolios”, which are somewhat similar in structure: both are means of presenting subsets of one’s information to particular audiences. So if an e-portfolio system had a SPARQL query facility attached, by default it would give out no information at all, only information derived from the graphs specifically named in the query. It is, I am assured, quite possible to restrict access to particular named graphs in much the same way as access to any web document is restricted.
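A minimal sketch of the named-graph idea in Python, under stated assumptions: each view or webfolio is modelled as one named graph with its own list of permitted readers, and a query is allowed to range only over graphs that it names and that the querying agent may read. This shows only the dataset-selection step a SPARQL endpoint might perform before evaluating a query; the class `PortfolioStore`, its methods, and all the `ex:` URIs are hypothetical, not part of Mahara, PebblePad or any SPARQL library.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph name)


class PortfolioStore:
    """Hypothetical store in which each named graph (e.g. one Mahara view
    or PebblePad webfolio) carries its own set of permitted readers."""

    def __init__(self) -> None:
        self.quads: List[Quad] = []
        self.readers: Dict[str, Set[str]] = {}

    def add_graph(self, graph: str, triples: Iterable[Triple],
                  readers: Iterable[str]) -> None:
        # Record who may read this graph, then store its triples as quads.
        self.readers[graph] = set(readers)
        self.quads.extend((s, p, o, graph) for s, p, o in triples)

    def dataset_for(self, agent: str, named_graphs: Iterable[str]) -> Set[Triple]:
        """Triples a query may range over: nothing by default, and only
        triples from graphs that are both named in the query and readable
        by the querying agent."""
        permitted = {g for g in named_graphs
                     if agent in self.readers.get(g, set())}
        return {(s, p, o) for s, p, o, g in self.quads if g in permitted}


# Hypothetical example data: two views with different audiences.
store = PortfolioStore()
store.add_graph("ex:view/job-application",
                [("ex:alice", "ex:hasSkill", "ex:statistics")],
                readers={"ex:employer", "ex:tutor"})
store.add_graph("ex:view/reflections",
                [("ex:alice", "ex:reflectsOn", "ex:examStress")],
                readers={"ex:tutor"})

print(store.dataset_for("ex:employer", []))  # set(): no graphs named
# Only the job-application triple: the employer may not read the reflections.
print(store.dataset_for("ex:employer",
                        ["ex:view/job-application", "ex:view/reflections"]))
```

Naming a graph is necessary but not sufficient: an agent who names a graph it has no permission for simply gets nothing from it, which mirrors restricting access to an ordinary web document.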

But does that give too little to those who want to write really interesting SPARQL queries involving personal information? Or would the necessary permission processes be too cumbersome? What if individuals could set permissions, or an access regime, for individual items of their own information? That might be more in keeping with the spirit of the Semantic Web. If so, perhaps we could envisage two strengths of control:

filtering the triples output from a SPARQL query to ensure that they only contained restricted URIs if the querying agent had permission to see those URIs;
filtering the inferencing process so that triples containing restricted URIs were only used in inferencing if the querying agent had permission to use them.
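To make the difference between these two strengths of control concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the `ex:` URIs, the single forward-chaining rule, and the function names `infer`, `output_filtered` and `inference_filtered`; it is a toy, not a SPARQL engine or a real reasoner. It illustrates one reason output filtering alone may not keep sensitive data private: a triple inferred from restricted data need not itself mention the restricted URI.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain one hypothetical rule to a fixed point:
    (x memberOf g) and (g partOf o)  =>  (x memberOf o)."""
    result = set(triples)
    while True:
        derived = {
            (x, "ex:memberOf", o)
            for (x, p1, g) in result if p1 == "ex:memberOf"
            for (g2, p2, o) in result if p2 == "ex:partOf" and g2 == g
        }
        new = derived - result
        if not new:
            return result
        result |= new


def mentions(t: Triple, uris: Set[str]) -> bool:
    """True if any term of the triple is one of the given URIs."""
    return any(term in uris for term in t)


def output_filtered(store: Set[Triple], restricted: Set[str],
                    allowed: Set[str]) -> Set[Triple]:
    """Weaker control: infer over everything, then drop output triples
    that mention a restricted URI the agent is not allowed."""
    hidden = restricted - allowed
    return {t for t in infer(store) if not mentions(t, hidden)}


def inference_filtered(store: Set[Triple], restricted: Set[str],
                       allowed: Set[str]) -> Set[Triple]:
    """Stronger control: withhold triples mentioning forbidden URIs
    before inference, so nothing can be derived from them."""
    hidden = restricted - allowed
    visible = {t for t in store if not mentions(t, hidden)}
    return infer(visible)


# Hypothetical personal data; ex:supportGroup is the sensitive URI.
STORE = {
    ("ex:alice", "ex:memberOf", "ex:supportGroup"),
    ("ex:supportGroup", "ex:partOf", "ex:counsellingService"),
    ("ex:alice", "ex:studies", "ex:chemistry"),
}
RESTRICTED = {"ex:supportGroup"}

# The querying agent here has no permission for ex:supportGroup.
print(sorted(output_filtered(STORE, RESTRICTED, set())))
print(sorted(inference_filtered(STORE, RESTRICTED, set())))
```

In this toy case, output filtering hides every triple that mentions the support group, yet still returns the inferred triple linking Alice to the counselling service; inference filtering returns only the triple about her studies. Once the agent is granted the restricted URI, the two approaches give the same answer.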

We would need to look into what the effects of these might be. Perhaps we would conclude that the latter was an appropriate way of keeping sensitive data really private, while the former might be OK for personal information that was not sensitive? That is no more than a guess. If this approach proved feasible, it might provide not only a principled way of granting permission to use particular personal information, but also a really effective way of keeping data private while still allowing it to be linked where permitted.

The point here is just to open up the agenda. If we are to take the future of linked data and the Semantic Web seriously, we need in any case to think through how we link personal information. Simply assuming that no one will want to link personal data is very unlikely to work in the long run.