Open : data : co-op

A very interesting event in Manchester on Monday (2014-10-20) called “Open : Data : Cooperation” was focused around the idea of “building a data cooperative”. The central idea was the cooperative management of personal information.

Related ideas have been going round for a long time. In 1999 I first came across a formulation of the idea of managing personal informaton in the book called “Net Worth“. Ten years ago I started talking about personal information brokerage with John Harrison, who has devoted years to this cause. In 2008, Michel Bauwens was writing about “The business case for a User Data Commons“.

A simple background story emerges from following the money. People spend money, whether their own or other people’s, and influence others in their spending of money. Knowing what people are ready to spend money on is valuable, because businesses with something to sell can present their offerings at an opportune moment. Thus, information which might be relevant to anyone buying anything is valuable, and can be sold. Naturally, the more money is at stake, the higher the price of information relevant to that purchase. Some information about a person can be used in this way over and over again.

Given this, it should be possible for people themselves to profit from giving information about themselves. And in small ways, they already do: store cards give a little return for the information about your purchases. But once the information is gathered by someone else, it is open for sale to others. One worry is that, maybe in the future if not right away, that information might enable some “wrong” people to know what you are doing, when you don’t want them to know.

Can an individual manage all that information about themselves better, both to keep it out of the wrong hands, and to get a better price for it from those to whom it is entrusted? Maybe; but it looks like a daunting task. As individuals, we generally don’t bother. We give away information that looks trivial, perhaps, for very small benefits, and we lose control of it.

It’s a small step from these reflections to the idea of people grouping together, the better to control data about themselves. What they can’t practically do separately, there is a chance of doing collectively, with enough efficiencies of scale to make it worthwhile, financially as well as in terms of peace of mind. You could call such a grouping a “personal data cooperative” or a “personal information mutual”, or any of a range of similar names.

Compared with gathering and holding data about the public domain, personal information is much more challenging. There are the minefields of privacy law, such as the Data Protection Act in the UK.

In Manchester on Monday we had some interesting “lightning” talks (I gave one myself – here are the slides on Slideshare,) people wrote sticky notes on relevant topics they were concerned about, and there were six areas highlighted for discussion:

  • security
  • governance
  • participation & inclusivity
  • technical
  • business model
  • legislative

I joined the participation and the technical group discussions. Both fascinated me, in different ways.

The participation discussion led to thoughts about why people would join a cooperative to manage their personal data. They need specific motivation, which could come from the kind of close-knit networks that deal with particular interests. There are many examples of closely knit on-line groups around social or political campaigns, about specific medical issues, or other matters of shared personal concern. Groups of these kinds may well generate enough trust for people to share their personal information, but they are generally not large enough to have much commercial impact, so they might struggle to be sustainable as personal data co-ops. What if, somehow, a whole lot of these minority groups could get together in an umbrella organisation?

Curiously, this has much in common with my personal living situation in a cohousing project. Despite many people’s yearnings (if not cravings) for secure acceptance of their minority positions, to me it looks like our cohousing project is too large and diverse a group for any one “cause” to be a key part of the vision for everyone. What we realistically have is a kind of umbrella in which all these good and worthy causes may thrive. Low carbon footprints; local, organic food; veganism; renewable energy; they’re all here. All these interest groups live within a co-operative kind of structure, where the governance is as far as possible by consensus.

So, my current living situation has resonances with this “participation” – and my current work is highly relevant to the “technical” discussion. But the technical discussion proved to be hard!

If you take just one area of personal-related information, and manage to create a business model using that information, the technicalities start to be conceivable.

For instance, Cetis (particularly my colleague Scott Wilson) has been involved in the HEAR (Higher Education Achievement Report) for quite some time. Various large companies are interested in using the HEAR for recruiting graduates. Sure, that’s not a cooperative scenario, but it does illustrate a genuine business case for using personal data gathered from education. Then one can think about how that information is structured; how it is represented in some transferable format; how the APIs for fetching such information should work. There is definite progress in this direction for HEAR information in the UK – I was closely involved in the less established but wider European initiative around representing the Diploma Supplement, and more can be found under the heading European Learner Mobility.

While the HEAR is progressing towards viability, The “ecosystem” around learner information more widely is not very mature, so there are still questions about how effective our current technical formats are. I’ve been centrally involved in two efforts towards standardization: Leap2A and InLOC. Both have included discussion about the conceptual models, which has never been fully resolved.

More mature areas are more likely to have stable technical solutions. Less mature areas may not have any generally agreed conceptual, structural models for the data; there may be no established business models for generating revenues or profits; and there may be no standards specifically designed for the convenient representation of that kind of data. Generic standards like RDF can cover any linked data, but they are not necessarily convenient or elegant, and may or may not lead to workable practical applications.

Data sources mentioned at this meeting included:

quantified self data
that’s all about your physiological data, and possibly related information
energy (or other utility) usage data
coming from smart meters in the home
purchasing data
from store cards and online shops
communication data
perhaps from your mobile device
learner information
in conjunction with learning technology, as I introduced

I’m not clear how mature any of these particular areas are, but they all could play a part in a personal data co-op. And because of the diversity of this data, as well as its immaturity, there is little one can say in general about technical solutions.

What we could do would be to set out just a strategy for leading up to technical solutions. It might go something like this.

  1. Agree the scope of the data to be held.
  2. Work out a viable business model with that data.
  3. Devise models of the data that are, as far as possible, intuitively understandable to the various stakeholders.
  4. Consider feasible technical architectures within which this data would be used.
  5. Start considering APIs for services.
  6. Look at existing standards, including generic ones, to see whether any existing standard might suffice. If so, try using it, rather than inventing a new one.
  7. If there really isn’t anything else that works, get together a good, representative selection of stakeholders, with experience or skill in consensus standardization, and create your new standard.

It’s all a considerable challenge. We can’t ignore the technical issues, because ignoring them is likely to lead just to good ideas that don’t work in practice. On the other hand, solving the technical issues is far from the only challenge in personal data co-ops. Long experience with Cetis suggests that the technical issues are relatively easy, compared to the challenges of culture and habit.

Give up, then? No, to me the concept remains very attractive and worth working on. Collaboratively, of course!

Portfolios need verifiability

Having verified information included in learner-owned portfolios looks attractive to employers and others, but perhaps it would be better to think in terms of verifiable information, and processes that can arrange verification on demand.

Along with Scott Wilson and others, I was at a meeting recently with a JISC-funded project about doing electronic certificates, somewhat differently from the way that Digitary do them. Now, the best approach to certifying portfolio information is far from obvious. But Higher Education is interested in providing information to various people about activities and results of those who have attended their institution, and employers and others are keen to know what can be officially certified. When people start by imagining an electronic transcript in terms their understanding of a paper transcript, inevitably the question of how to make it “secure” will come up, echoing questions of how to prevent forgery of paper certificates.

Lately, I have been giving people my opinion that portfolio information and institutionally (“primary source”) verified information are different, and don’t need to interact too closely. Portfolio holders may write what they like, just as they do in CVs, and if certificates or verification are needed, perhaps the unverified portfolio information can provide a link to a verified electronic certificate of achievement information (like the HEAR, the UK Higher Education Achievement Report, under development). This meeting moved my understanding forward from this fairly simple view, but there are still substantial gaps, so I’ll try to set out what I do understand, and ask readers for kind suggestions about what I don’t.

As Scott could tell you much more ably than me, there are plenty of problems with providing digitally signed certificates for graduates to keep in their own storage. I won’t go into those, just to say that the problem is a little like banknotes: you can introduce a new clever technology that is harder to forge, but sooner or later the crooks will catch up with you, and you have to move on to ever more complex and sophisticated techniques. So, in what perhaps I may call “our” view, it seems normally preferable to keep original verified information at source, or at some trusted service provider delegated by the primary source. There are then several ways in which this information can be shown to the people who wish to rely on its verified authority. In principle, these are set out in Scott’s page on possible architectures for the HEAR. But in detail, again at the meeting I realised something I hadn’t figured out before.

We have already proposed in outline a way in which each component part of an achievement document could have its own URI, so that links could be made to particular parts, and differential permissions given to each part. (See e.g. the CEN EuroLMAI PDF.) If each part of an achievement document is separately referenceable, the person to whom the document refers (let’s call this person the holder again) could allow different people to view different parts, for different times, etc., providing that achievement information servers can store that permission information alongside the structured achievement information itself.

Another interesting technical approach, possible at least in PebblePad (Shane Sutherland was helpfully contributing to the meeting), is transparently to include information from other servers, to view and manage in your portfolio tool. The portfolio holder would directly see what he or she was making available for others to view. The portfolio system itself might have general permission to access any information on the achievement information server, with the onward permissions managed by the portfolio system. Two potential issues might arise.

  1. What does giving general permission to an e-portfolio system mean for security? Would this be too much like leaving an open door into the achievement information server?
  2. As the information is presented by the portfolio server, how would the viewer know that the information really comes from the issuer’s server, and is thus validated? A simple mark may not be convincing.

A potential solution to the second point might start with the generation of a permission token on the issuer’s server whenever a new view is put together on the portfolio system. Then the viewer could request a certificate that combined just the information that was presented in that view. But, surely, there must be other more general solutions?

The approach outlined above might be satisfactory just for one achievement information server, but if the verified information covering a portfolio is distributed across several such servers, the process might be rather cumbersome, confusing even, as several part certificates would have to be shown. Better to deal with such certificates only as part of a one-off verification process, perhaps as part of induction to a new opportunity. Instead, if the holder were able to point from a piece of information to the one or more parts of the primary records that backed it up, and then to set permissions within the portfolio system for the viewer to be able to follow that link, the viewer could be given the permission to see the verified information behind any particular piece of information. Stepping back a little, it might look like this. Each piece of information in a portfolio presentation or system is part of a web of evidence. Some of that evidence is provided by other items in the portfolio, but some refers to primary trustable sources. The method of verification can be provided, at the discretion of the portfolio holder, for permitted viewers to follow, for each piece of information.

One last sidestep: the nice thing about electronic information is that it is very easy to duplicate exactly. If there is a piece of information on a trusted server, belonging to a portfolio holder, it is in principle easy for the holder to reproduce that piece of information in the holder’s own personal portfolio system. Given this one-to-one correspondence, for that piece of information there is exactly one primary source of verification, which is the achievement information server’s version of just that piece of information. The information in the portfolio can be marked as “verifiable”, and associated with its means of verification. A big advantage of this is that one can query a trusted server in the least revealing way possible: simply to say, does this person have this information associated with them? The answer would be, “yes”, or “no”, or “not telling” (if the viewer is not permitted to see that information).

Stepping back again, we no longer need any emphasis on representing “verified” information within a portfolio itself, but instead the emphasis is on representing “verifiable” information. The task of looking after this information then becomes one of making sure that the the verification queries are successful just when they should be. What does this entail? These are the main things that I am unclear about in this vision, and would be grateful to know. How do we use and transform personal information while retaining its verifiability? What is required to maintain that verifiability?