Doing analytics with open source linked data tools

Like most places, the University of Bolton keeps its data in many stores. That's inevitable with multiple systems, but it makes getting a complete picture of courses and students difficult. We are testing an approach that promises to integrate all this data, and more besides, quickly and cheaply.

Integrating a load of data in a specialised tool or data warehouse is not new, and many institutions have been using them for a while. What Bolton is trying in its JISC sponsored course data project is to see whether such a warehouse can be built out of Linked Data components. Using such tools promises three major advantages over existing data warehouse technology:

It expects data to be messy, and it expects it to change. As a consequence, adding new data sources, or coping with changes in data sources, or generating new reports or queries should not be a big deal. There are no schemas to break, so no major re-engineering required.

It is built on the same technology as the emergent web of data, which means that an increasing number of datasets – particularly from the UK government – can easily be thrown into the mix to answer bigger questions, and public excerpts from Bolton's data should be easy to contribute back.

It is standards based. At every step from extracting the data, transforming it and loading it to querying, analysing and visualising it, there’s a choice of open and closed source tools. If one turns out not to be up to the job, we should be able to slot another in.

But we did spend a day kicking the tires and making some initial choices. Since the project is just to pilot a Linked Enterprise Data (LED) approach, we've limited ourselves to evaluating only open source tools. We know there are plenty of good closed source options in all of the following areas, but we want to test the whole approach before committing to licence fees.

Data sources

Before we can mash, query and visualise, we need to extract data from the sources, and we've settled on two tools for that: Google Refine and D2RQ. They do slightly different jobs.

Refine is Google's power tool for anyone who has to deal with malformed data, or who just wants to transform or excerpt it from one format to another. It takes in CSV or output from a range of APIs, and puts it in table form. In that table form, you can perform a wide range of transformations on the data, and then export it in a range of formats. The RDF extension from DERI Galway allows you to specify exactly how the RDF – the linked data format, and the heart of the approach – should look when exported.
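
To give a flavour of what that export step produces, here's a minimal sketch in Python (using rdflib rather than Refine itself) of turning a row of tabular course data into RDF. The column names and the vocabulary are invented for illustration, not our actual Bolton data.

```python
# A minimal sketch of the CSV-to-RDF step, done here with rdflib rather than
# Refine's RDF extension. Column names and vocabulary are illustrative only.
import csv
import io

from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/courses/")  # hypothetical vocabulary

sample = io.StringIO("course_id,title,credits\nBSC-COMP,Computing BSc,360\n")

g = Graph()
g.bind("ex", EX)
for row in csv.DictReader(sample):
    course = EX[row["course_id"]]
    g.add((course, RDF.type, EX.Course))
    g.add((course, EX.title, Literal(row["title"])))
    g.add((course, EX.credits, Literal(int(row["credits"]))))

print(g.serialize(format="turtle"))
```

The RDF extension lets you define essentially the same mapping from within Refine's own interface, which is why it suits the quick-start role.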

What Refine doesn’t really do (yet?) is transform data automatically, as a piece of middleware. All your operations are saved as a script that can be re-applied, but it won’t re-apply the operations entirely automagically. D2RQ does do that, and works more like middleware.

Although I've known D2RQ for a couple of years, it still looks like magic to me: you download and unzip it, then tell it where your common or garden relational database is, and what username and password it can use to get in. It'll go off, inspect the contents of the database, and come back with a mapping of the contents to RDF. Then start the server that comes with it, and the relational database can be browsed and queried like any other Linked Data source.

Since practically all the relevant data at Bolton is held in a range of relational databases, we're expecting to use D2R to create RDF data dumps that will be imported into the data warehouse via a script. For a quick start, though, we've already made some transforms with Refine. We might also use scripts such as Oxford's XCRI XML to RDF transform.
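
As a sketch of what such a script might look like: D2R Server exposes the mapped database as a SPARQL endpoint (on port 2020 by default), so a timed job can simply CONSTRUCT a dump and write it to a file for loading. The endpoint address, file name and catch-all query below are placeholders rather than our actual setup.

```python
# Sketch of a timed extraction job against a local D2R Server instance.
# The endpoint URL, output file and catch-all query are placeholders.
from SPARQLWrapper import SPARQLWrapper

D2R_ENDPOINT = "http://localhost:2020/sparql"  # D2R Server's default port

sparql = SPARQLWrapper(D2R_ENDPOINT)
sparql.setQuery("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }")  # full dump; filter in practice

# For CONSTRUCT queries, convert() hands back an rdflib Graph
graph = sparql.query().convert()
graph.serialize(destination="student_records.ttl", format="turtle")
print(f"Dumped {len(graph)} triples")
```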

Storage, querying and visualisation

We expected to pick different tools for each of these functions, but ended up choosing one that does it all – after a fashion. Callimachus is designed specifically for rapid development of LED applications, and the standard download includes a version of the Sesame triplestore (or RDF database) for storage. Other triplestores can also be used with Callimachus, but Sesame was on the list anyway, so we'll see how far that takes us.

Callimachus itself is more of a web application on top of the store that allows quick visualisations of data excerpts – be they straight records from one dataset or a collection of data about one thing from multiple sets. The queries that power the Callimachus visualisations have limitations – compared to the full power of SPARQL, the linked data query language – but are good enough to knock up some pages quickly. For the more involved visualisations, Callimachus' SPARQL 1.1 implementation allows the results of a query to be put out as common or garden JSON, for which many different tools exist.
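
As an illustration of that JSON route, here's a rough sketch of pulling SPARQL SELECT results and reshaping them for a charting tool; the endpoint path, prefix and query are invented for the example rather than taken from our Callimachus instance.

```python
# Sketch: run a SELECT query, get SPARQL 1.1 JSON results, reshape for charting.
# The endpoint, prefix and query are invented for illustration.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8080/sparql")  # placeholder endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/courses/>
    SELECT ?school (COUNT(?course) AS ?courses)
    WHERE { ?course a ex:Course ; ex:school ?school }
    GROUP BY ?school
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# SPARQL JSON results keep the rows under results -> bindings
chart_data = [
    (row["school"]["value"], int(row["courses"]["value"]))
    for row in results["results"]["bindings"]
]
print(chart_data)  # ready to hand to any charting or spreadsheet tool
```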

Next steps

We've made some templates already that pull together course information from a variety of sources, on which I'll report later. While that's going on, the other main task will be to set up the processes of extracting data from the relational databases using D2R, and then loading it into Callimachus using timed scripts.
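
For the loading half of that process, one option is the SPARQL 1.1 Graph Store HTTP Protocol. Whether our Callimachus/Sesame setup exposes it, and at which URL, is still something we have to confirm, so treat the sketch below (store URL, graph name and credentials are all placeholders) as illustrative only.

```python
# Sketch of the load step: PUT a Turtle dump into a named graph using the
# SPARQL 1.1 Graph Store HTTP Protocol. The store URL, graph name and
# credentials are placeholders; check what your triplestore actually exposes.
import requests

STORE_URL = "http://localhost:8080/rdf-graph-store"      # placeholder
GRAPH_URI = "http://example.org/graphs/student-records"  # placeholder graph name

with open("student_records.ttl", "rb") as dump:
    response = requests.put(
        STORE_URL,
        params={"graph": GRAPH_URI},           # PUT replaces the graph's contents
        data=dump,
        headers={"Content-Type": "text/turtle"},
        auth=("loader", "secret"),             # placeholder credentials
    )
response.raise_for_status()
print("Loaded dump into", GRAPH_URI)
```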

Reviewing the future for Leap2

JISC commissioned a Leap2A review report (PDF), carried out early in 2012, that has now been published. It is available along with other relevant materials from the e-Portfolio interoperability JISC page. For anyone following the fortunes of Leap2A, it is highly worthwhile reading. Naturally, not all possible questions were answered (or asked), and I’d like to take up some of these, with implications for the future direction of Leap2 more generally.

The summary recommendations were as follows — these are very welcome!

  1. JISC should continue to engage with vendors in HE who have not yet implemented Leap2A.
  2. Engagement should focus on communities of practice that are using or are likely to use e-portfolios, and situations where e-portfolio data transfer is likely to have a strong business case.
  3. JISC should continue to support small-scale tightly focused developments that are likely to show immediate impact.
  4. JISC should consider the production of case studies from PebblePad and Mahara that demonstrate the business case in favour of Leap2A.
  5. JISC should consider the best way of encouraging system vendors to provide seamless import services.
  6. JISC should consider constructing a standardisation roadmap via an appropriate BSI or CEN route.

That tallies reasonably with the outcome of the meeting back in November last year, where we reckoned that Leap2A needs: more adoption; more evidence of utility; to be taken more into the professional world; good governance; more examples; and for the practitioner community to build around it models of lifelong development that will justify its existence.

Working backwards up the list of the Leap2A review report, recommendation 6 is one for the long term. It could perhaps be read in the context of the newly formed CETIS position on the recent Government Open Standards Consultation. There we note:

Established public standards bodies (such as ISO, BSI and CEN), while doing valuable work, have some aspects that would benefit from modernisation to bring them more into line with organisations such as W3C and OASIS.

The point then elaborated is that the community really needs open standards that are freely available as well as royalty-free and unencumbered. The de jure standards bodies normally still charge for copies of their standards, as part of their business model, which we see as outdated. If we can circumvent that issue, then BSI and CEN would become more attractive options.

It is the previous recommendation, number 5 in the list above, that I will focus on more, though. Here is the fuller version of that recommendation (appearing as paragraph 81).

One of the challenges identified in this review is to increase the usability of data exchange with the Leap2A specification, by removing the current necessity for separate export and import. This report RECOMMENDS that JISC considers the best way of encouraging system vendors to provide seamless data exchange services between their products, perhaps based on converging practice in the use of interoperability and discovery technologies (for example future use of RDF). It is recognised that this type of data exchange may require co-ordinated agreement on interoperability approaches across HEIs, FECs and vendors, so that e-portfolio data can be made available through web services, stressing ease of access to the learner community. In an era of increasing quantities of open and linked data, this recommendation seems timely. The current initiatives around courses information — XCRI-CAP, Key Information Sets (KIS) and HEAR — may suggest some suitable technical approaches, even though a large scale and expensive initiative is not recommended in the current financially constrained circumstances.

As an ideal, that makes perfect sense from the point of view of an institution transferring a learner's portfolio information to another institution. However, seamless transfer is inherently limited by the compatibility (or lack of it) between the information stored in each system. There is also a different scenario that has always been in people's minds when working on Leap2A. It is that learners themselves may want to be able to download their own information, to keep for use, at an uncertain time in the future, in various ways that are not necessarily predictable by the institutions that have been hosting their information. In any case, the predominant culture in the e-portfolio community is that all the information should be learner-ownable, if not actually learner-owned. This is reflected in the report's paragraph 22, dealing with current usage from PebblePad.

The implication of the Leap2A functionality is that data transfer is a process of several steps under the learner’s control, so the learner has to be well-motivated to carry it out. In addition Leap2A is one of several different import/export possibilities, and it may be less well understood than other options. It should perhaps be stressed here that PebblePad supports extensive data transfer methods other than Leap2A, including zip archives, native PebblePad transfers of whole or partial data between accounts, and similarly full or partial export to HTML.

This is followed up in the report’s paragraph 36, part of the “Challenges and Issues” section.

There also appears to be a gap in promoting the usefulness of data transfer specifically to students. For example in the Mahara and PebblePad e-portfolios there is an option to export to a Leap2A zip file or to a website/HTML, without any explanation of what Leap2A is or why it might be valuable to export to that format. With a recognisable HTML format as the other option, it is reasonable to assume that students will pick the format that they understand. Similarly it was suggested that students are most likely to export into the default format, which in more than one case is not the Leap2A specification.

The obvious way to create a simpler interface for learners is to have just one format for export. What could that format be? It should be noted first that separate files that are attached to or included with a portfolio will always remain separate. The issue is the format of the core data, which in normal Leap2A exports is represented by a file named “leap2a.xml”.

  1. It could be plain HTML, but then the case for Leap2A would be lost, as there is no easy way for plain HTML to be imported into another portfolio system without a complex and time-consuming process of choosing where each single piece of information should be put in the new system.
  2. It could be Leap2A as it is, but the question then would be: would this satisfy users' needs? Users' own requirements for the use of exports are not spelled out in the report, and they do not appear to have been systematically investigated anywhere, but it would be reasonable to expect that one use case would be that users want to display the information so that it can be cut and pasted elsewhere. Leap2A supports the display of media files within text, and formatting of text, only through the inclusion of XHTML within the content of entries, in just the same way as Atom does. It is not unreasonable to conclude that limiting exports to plain Leap2A would not fully serve user export needs, and therefore it is, and will continue to be, unreasonable to expect portfolio systems to limit users to Leap2A export only.
  3. If there were a format that fully met the requirements both for ease of viewing and cut-and-paste, and for relatively easy and straightforward importing to another portfolio system (comparable to Leap2A currently), it might then be reasonable to expect portfolio systems to have this as their only export format. Then, users would not have to choose, would not be confused, and the files which they could view easily and fully through a browser on their own computer system would also be able to be imported to another portfolio system to save the same time and effort that is currently saved through the use of Leap2A.

So, on to the question, what could that format be? What follows explains just what the options are for this, and how it would work.

The idea for microformats apparently originated in 2000. The first sentence of the Wikipedia article summarises nicely:

A microformat (sometimes abbreviated µF) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support (X)HTML, such as RSS. This approach allows software to process information intended for end-users (such as contact information, geographic coordinates, calendar events, and the like) automatically.

In 2004, a more sophisticated approach to similar ends was proposed in RDFa. Wikipedia has "RDFa (or Resource Description Framework in Attributes) is a W3C Recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents."

In 2009 the WHATWG were developing Microdata towards its current form. The Microformats community sees Microdata as having grown out of Microformats ideas. Wikipedia writes “Microdata is a WHATWG HTML specification used to nest semantics within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users.”

Wikipedia quotes the Schema.org originators (launched on 2 June 2011 by Bing, Google and Yahoo!) as stating that it was launched to “create and support a common set of schemas for structured data markup on web pages”. It provides a hierarchical vocabulary, in some cases drawing on Microformats work, that can be used within the RDFa as well as Microdata formats.

Is it possible to represent Leap2A information in this kind of way? Initial exploratory work on Leap2R has suggested that it is indeed possible to identify a set of classes and properties that could be used more or less as they are with RDFa, or could be correlated with the schema.org hierarchy for use with Microdata. However, the detail of the solution still needs working through.

In principle, using RDFa or Microdata, any portfolio information could be output as HTML, with the extra information currently represented by Leap2A carried in HTML attributes; that extra information is not directly displayed, and so does not interfere with human reading of the HTML. Thus, this kind of representation could fully serve all the purposes currently served by HTML export of Leap2A. It seems highly likely that practical ways of doing this can be devised that convey the complete structure currently given by Leap2A. The requirements currently satisfied by Leap2A would be satisfied by this new format, which might perhaps be called "Leap2H5", for Leap2 information in HTML5, or alternatively "Leap2XR", for Leap2 information in XHTML+RDFa (in place of Leap2A, meaning Leap2 information in Atom).
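
To make that concrete, here's a rough sketch of what a single portfolio entry might look like as HTML with schema.org Microdata added (shown as a Python string for convenience). The choice of types and property names is purely illustrative, since no Leap2-to-schema.org mapping has been agreed yet.

```python
# Rough sketch of one portfolio entry as HTML with schema.org Microdata.
# The itemtype/itemprop choices are illustrative only; no Leap2-to-schema.org
# mapping has been agreed yet.
entry_html = """
<article itemscope itemtype="http://schema.org/CreativeWork">
  <h2 itemprop="name">Placement report: community radio project</h2>
  <time itemprop="dateCreated" datetime="2012-03-14">14 March 2012</time>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">A. Learner</span>
  </div>
  <div itemprop="text">
    <p>Reflection on the six-week placement, with links to evidence...</p>
  </div>
</article>
"""

# A human reader sees ordinary HTML; a machine can lift the itemprop values
# back out without needing a separate leap2a.xml file.
print(entry_html)
```

Exactly the same information could be carried as RDFa attributes instead; the choice between Microdata and RDFa is one of the practical details still to be worked through.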

Thus, in principle it appears perfectly possible to have a single format that simultaneously does the job both of HTML and Leap2A, and so could serve as a plausible principal export and import format, removing that key obstacle identified in paragraph 36 of the Leap2A review report. The practical details may be worked out in due course.

There is another clear motivation in using schema.org metadata to mark up portfolio information. If a web page uses schema.org semantics, whether publicly displayed on a portfolio system or on a user’s own site, Google and others state that the major search engines will create rich snippets to appear under the search result, explaining the content of the page. This means, potentially, that portfolio presentations would be more easily recognised by, for instance, employers looking for potential employees. In time, it might also mean that the search process itself was made more accurate. If portfolio systems were to adopt export and import using schema.org in HTML, it could also be used for all display of portfolio information through their systems. This would open the way to effective export of small amounts of portfolio information simply by saving a web page displayed through normal e-portfolio system operation; and could also serve as an even more effective and straightforward method for transferring small amounts of portfolio information between systems.

I have recently floated this idea of agreeing Leap2 semantics in schema.org with European collaborators, and it looks likely to gain substantial support. This opens up yet another very promising possibility: existing European portfolio-related formats could be harmonised through this new format, which is not biased towards any of the existing ones — as well as Leap2A, there is the Dutch NTA 2035 (derived from IMS ePortfolio), and also the Europass CV format. (There is more about this strand of unfunded work through MELOI.) All of these are currently expressed using XML, but none have yet grasped the potential of schema.org in HTML through Microdata or RDFa. To restate the main point: this means having the semantics of portfolio information embedded in machine-processable ways, without interfering with the human-readable HTML.

I don’t want to be over-optimistic, as currently money tends only to go towards initiatives with a clear business case, but I am hopeful that in the medium term, people will recognise that this is an exciting and powerful potential development. When any development of Leap2 gets funded, I’m suggesting that this is what to go for, and if anyone has spare resource to work on Leap2 in the meanwhile, this is what I recommend.

VLE commodification is complete as Blackboard starts supporting Moodle and Sakai

Unthinkable a couple of years ago, and it still feels a bit April 1st: Blackboard has taken over the Moodlerooms and NetSpot Moodle support companies in the US and Australia. Arguably as important is that they have also taken on Sakai and IMS luminary Charles Severance to head up Sakai development within Blackboard's new Open Source Services department. The life of the Angel VLE, which Blackboard acquired a while ago, has also been extended.

For those of us who saw Blackboard's aggressive acquisition of commercial competitors WebCT and Angel, and the patent litigation it unleashed against Desire2Learn, the idea of Blackboard pledging to be a good open source citizen may seem a bit … unsettling, if not 1984ish.

But it has been clear for a while that Blackboard's old strategy of 'owning the market' just wasn't going to work. Whatever the unique features are that Blackboard has over Moodle and Sakai, they aren't enough to convince every institution to pay for the licence. Choosing between VLEs was largely about price and service, not functionality. Even for those institutions where price and service were not an issue, many departments had their own, sometimes not entirely functional, reasons for sticking with one or another VLE that wasn't Blackboard.

In other words, the VLE had become a commodity. Everyone needs one, they are fairly predictable in their functionality, and there is not that much between them, much as I've outlined in the past.

So it seems Blackboard have wisely decided to switch focus from charging for IP to becoming a provider of learning tool services. As Blackboard's George Kroner noted, "It does kinda feel like @Blackboard is becoming a services company a la IBM under Gerstner".

And just as IBM has become quite a champion of Open Source Software, there is no reason to believe that Blackboard will be any different. Even if only because the projects will not go away, whatever they do to the support companies they have just taken over. Besides, ‘open’ matters to the education sector.

Interoperability

Blackboard had already abandoned extreme lock-in by investing quite a bit in open interoperability standards, mostly through the IMS specifications. That is, users of the latest versions of Blackboard can get their data, content and external tool connections out more easily than in the past – lock-in is no longer as much of a reason to stick with them.

Providing services across the vast majority of VLEs (outside of continental Europe at least) means that Blackboard has even more of an incentive to make interoperability work across them all. Dr Chuck Severance’s appointment also strongly hints at that.

This might need a bit of watching. Even though the very different codebases, and a vested interest in openness, mean that Blackboard-sponsored interoperability solutions – whether arrived at through IMS or not – are likely to be applicable to other tools, this is not guaranteed. There might be a temptation to cut corners to make things work quickly between just Blackboard Learn, Angel, Moodle 1.9/2.x and Sakai 2.x.

On the other hand, the more pressing interoperability problems are not so much between the commodified VLEs anymore, they are between VLEs and external learning tools and administrative systems. And making that work may just have become much easier.

The Blackboard press releases on Blackboard’s website.
Dr Chuck Severance’s post on his new role.

Approaches to building interoperability and their pros and cons

System A needs to talk to System B. Standards are the ideal to achieve that, but pragmatics often dictate otherwise. Let’s have a look at what approaches there are, and their pros and cons.

When I looked at the general area of interoperability a while ago, I observed that useful technology becomes ubiquitous and predictable enough over time for the interoperability problem to go away. The route to get to such commodification is largely down to which party – vendors, customers, domain representatives – is most powerful and what their interests are. Which describes the process very nicely, but doesn’t help solve the problem of connecting stuff now.

So I thought I’d try to list what the choices are, and what their main pros and cons are:

A priori, global
Also known as de jure standardisation. Experts, user representatives and possibly vendor representatives get together to codify whole or part of a service interface between systems that are emerging or don’t exist yet; it can concern either the syntax, semantics or transport of data. Intended to facilitate the building of innovative systems.
Pros:

  • Has the potential to save a lot of money and time in systems development
  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Viability depends on the business model of all relevant vendors
  • Fairly unlikely to fit either actually available data or integration needs very well

A priori, local
i.e. some type of Service Oriented Architecture (SOA). Local experts design an architecture that codifies syntax, semantics and operations into services. Usually built into agents that connect to each other via an Enterprise Service Bus (ESB).
Pros:

  • Can be tuned for locally available data and to meet local needs
  • Facilitates structured management of network over time
  • Speeds up changes in the network (relative to ad hoc, local)

Cons:

  • Requires major and continuous governance effort
  • Requires upfront investment
  • Integration of a new system still takes time and effort

Ad hoc, local
Custom integration of whatever is on an institution’s network by the institution’s experts in order to solve a pressing problem. Usually built on top of existing systems using whichever technology is to hand.
Pros:

  • Solves the problem of the problem owner fastest in the here and now.
  • Results accurately reflect the data that is actually there, and the solutions that are really needed

Cons:

  • Non-transferable beyond local network
  • Needs to be redone every time something changes on the local network (considerable friction and cost for new integrations)
  • Can create hard to manage complexity

Ad hoc, global
Custom integration between two separate systems, done by one or both vendors. Usually built as a separate feature or piece of software on top of an existing system.
Pros:

  • Fast point-to-point integration
  • Reasonable to expect upgrades for future changes

Cons:

  • Depends on business relations between vendors
  • Increases vendor lock-in
  • Can create hard to manage complexity locally
  • May not meet all needs, particularly cross-system BI

Post hoc, global
Also known as standardisation, consortium style. Service provider and consumer vendors get together to codify a whole service interface between existing systems; syntax, semantics, transport. The resulting specs usually get built into systems.
Pros:

  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Takes a long time to start, and is slow to adapt
  • Depends on business model of all relevant vendors
  • Liable to fit either available data or integration needs poorly

Clearly, no approach offers instant nirvana, but it does make me wonder whether there are ways of combining approaches such that we can connect short term gain with long term goals. I suspect if we could close-couple what we learn from ad hoc, local integration solutions to the design of post-hoc, global solutions, we could improve both approaches.

Let me know if I missed anything!

Business Adopts Archi Modelling Tool

Many technologies and tools in use in universities and colleges are not developed for educational settings. In the classroom particularly teachers have become skilled at applying new technologies such as Twitter to educational tasks. But technology also plays a crucial role behind the scenes in any educational organisation in supporting and managing learning, and like classroom tools these technologies are not always developed with education in mind. So it is refreshing to find an example of an application developed for UK Higher and Further education being adopted by the commercial sector.

Archi is an open source ArchiMate modelling tool developed as part of JISC’s Flexible Service Delivery programme to help educational institutions take their first steps in enterprise architecture modelling. ArchiMate is a modelling language hosted by the Open Group who describe it as “a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure”. Archi enforces all the rules of ArchiMate so that the only relationships that can be established are those allowed by the language.
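
Archi itself is a Java application built on the Eclipse platform, but the idea behind that rule enforcement can be sketched in a few lines: a lookup table of permitted relationships per pair of element types, consulted whenever a connection is drawn. The tiny table below is illustrative only; it is not Archi's actual code, nor a complete statement of the ArchiMate rules.

```python
# Illustrative sketch only, not Archi's actual implementation: a lookup table
# of permitted relationships per pair of element types, consulted on each
# attempted connection. The entries below are a made-up sample.
ALLOWED = {
    ("BusinessActor", "BusinessRole"): {"Assignment", "Association"},
    ("BusinessRole", "BusinessProcess"): {"Assignment", "Association"},
    ("ApplicationComponent", "ApplicationService"): {"Realization", "Association"},
}

def check_relationship(source: str, target: str, relationship: str) -> str:
    allowed = ALLOWED.get((source, target), set())
    if relationship in allowed:
        return f"OK: {source} --{relationship}--> {target}"
    return (f"Not allowed: {relationship} from {source} to {target}; "
            f"permitted here: {sorted(allowed) or 'none'}")

print(check_relationship("BusinessActor", "BusinessRole", "Assignment"))
print(check_relationship("BusinessActor", "BusinessRole", "Realization"))
```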

Since the release of version 1.0 in June 2010, Archi has built up a large user base and now gets in excess of 1,000 downloads per month. Of course, universities and colleges are not the only organisations that need a better understanding of their internal business processes, so we spoke to Phil Beauvoir, Archi developer at JISC CETIS, about the tool and why it has a growing number of users in the commercial world.

Christina Smart (CS): Can you start by giving us a bit of background about Archi and why was it developed?

Phil Beauvoir (PB): In the summer of 2009, Adam Cooper asked whether I was interested in developing an ArchiMate modelling tool. Some of the original JISC Flexible Service Delivery projects had started to look at their institutional enterprise architectures, and wanted to start modelling. Some projects had invested in proprietary tools, such as BiZZdesign's Architect, and it was felt that it would be a good idea to provide an open source alternative. Alex Hawker (the FSD Programme manager) decided to invest six months of funding to develop a proof-of-concept tool for modelling with the ArchiMate language. The tool would be aimed at the beginner, be open source and cross-platform, and would have limited functionality. I started development on Archi in earnest in January 2010 and by April had the first alpha, version 0.7, ready. Version 1.0 was released in June 2010, and it grew from there.

CS: How would you describe Archi?
PB: The web site describes Archi as: "A free, open source, cross platform, desktop application that allows you to create and draw models using the ArchiMate language". Users who can't afford proprietary software would otherwise use standard drawing tools such as OmniGraffle or Visio for modelling. Archi is positioned somewhere between those drawing tools and a tool like BiZZdesign's Architect. It doesn't have all the functionality and enterprise features of the BiZZdesign tool, but it has more than just plain drawing tools. Archi also has hints, help and user-assistance technology built into it, so when you're drawing elements there are certain ArchiMate rules about which connections you can make; if you try to make a connection that's not allowed, you get an explanation of why not. So for the beginner it is a great way to start understanding ArchiMate. We keep the explanations simple because we aim to make things easier for users who are beginners in ArchiMate. As the main developer I try to keep Archi simple, because there's always a danger that you keep adding on features and that would make it unusable. I try to steer a course between usability and features.

Archi screenshot

Another aspect of Archi is the way it supports the modelling conversation. Modelling is not done in isolation; it's about capturing a conversation between key stakeholders in an organisation. Archi allows you to sketch a model and take notes in a Sketch View before you apply the ArchiMate enterprise modelling rules. A lot of people use the Sketch View. It enables the capture of a conversation, the "soft modelling" stage before undertaking "hard modelling".

CS: How many people are using it within the Flexible Service Delivery programme?
PB: I'm not sure; I know the King's College, Staffordshire and Liverpool John Moores projects were using it. Some of the FSD projects tended to use both Architect and Archi. If they already had a licence for BiZZdesign Architect they would carry on using it for their main architect, whereas other "satellite" users in the institution would use Archi.

CS: Archi has a growing number of users outside education, who are they and how did they discover Archi?
PB: Well the first version was released in June 2010, and people in the FSD programme were using it. Then in July 2010 I got an email from a large Fortune 500 insurance company in the US, saying they really liked the tool and would consider sponsoring Archi if we implemented a new feature. I implemented the feature anyway and we’ve built up the relationship with them since then. I know that this company has in the region of 100 enterprise architects and they’ve rolled Archi out as their standard enterprise architecture modelling tool.

I am also aware of other commercial companies using it, but how did they discover it? Well, I think it's been viral. A lot of businesses spend a lot of money advertising and pushing products, but the alternative strategy is pull, where customers come to you. Archi is of the pull variety: there is a need out there, we haven't had to do very much marketing, and people seem to have found Archi on their own. Also, TOGAF (The Open Group Architecture Framework), developed by the Open Group, is becoming very popular, and I guess Archi is useful for people adopting TOGAF.

In 2010 BiZZdesign were, I think, concerned about Archi being a competitor in the modelling tool space. However, now they're even considering offering training days on Archi, because Archi has become the de facto free enterprise modelling tool. Archi will never be a competitor to BiZZdesign's Architect: they have lots of developers and there's only me working on Archi, so it would be nuts to try to compete. So we will focus on the aspects of Archi that make it unique – the learning aspects, the focus on beginners and the ease of use – and clearly forge out a path between the two sets of tools.

Many people will start with Archi and then upgrade to BiZZdesign’s Architect, so we’re working on that upgrade path now.

CS: Why do you think it is so popular with business users?
PB: I'm end-user driven; for me Archi is about the experience of the end users, ensuring that the experience is first class and that it "just works". It's popular with business users firstly because it's free, secondly because it works on all platforms, and thirdly because it's aimed at those making their first steps with ArchiMate.

CS: What is the immediate future for Archi?
PB: We’re seeking sponsorship deals and other models of sustainability because obviously JISC can’t go on supporting it forever. One of the models of sustainability is to get Archi adopted by something like the Eclipse Foundation. But you have to be careful that development continues in those foundations, because there is a risk of it becoming a software graveyard, if you don’t have the committers who are prepared to give their time. There is a vendor who has expressed an interest in collaborating with us to make sure that Archi has a future.

Lots of software companies now have service business models, so you provide the tool for free but charge for providing services on top of the free tool. The Archi tool will always be free, and anyone could package it up and sell it. I know people are doing that in China because I've had emails from them; they've translated it and are selling it, and that's OK because that's what the licence model allows.

In terms of development we're adding some new functionality. The concept of a Business Model Canvas is becoming popular, where you sketch out your new business models. The canvas is essentially a nine-box grid to which you add various key partners, stakeholders and so on. We're adding a canvas construction kit to Archi, so people can design their own canvas for new business models. The canvas construction kit is aimed at the high-level discussions that people have when they start modelling their organisations.

CS: You’ve developed a number of successful applications for the education sector over the years, including, Colloquia, Reload and ReCourse, how do you feel the long term future for Archi compares with those?
PB: Colloquia was the first tool I developed back in 1998, and I don’t really think it’s used anymore. But really Colloquia was more a proof of concept to demonstrate that you could create a learning environment around the conversational model, which supported learning in a different way from the VLEs that were emerging at the time. Its longevity has been as a forerunner to social networking and to the concept of the Personal Learning Environment.

Reload was a set of tools for doing content packaging and SCORM. They’re not meant for teachers, but they’re still being used.

The ReCourse Learning Design tool was developed for a very niche audience: people developing scripted learning designs.

I think the long term future for Archi is better than those, partly because there’s a very large active community using it, and partly because it can be used by all enterprises and isn’t just a specific tool for the education sector. I think Archi has an exciting future.

User feedback
Phil has received some very positive feedback about Archi via email from JISC projects as well as those working in the commercial world.

JISC projects
“The feeling I get from Archi is that it’s helping me to create shapes, link and position them rather than jumping around dictating how I can work with it. And the models look much nicer too… I think Archi will allow people to investigate EA modelling cost free to see whether it works for them, something that’s not possible at the moment.”

“So why is Archi significant? It is an open source tool funded by JISC based on the ArchiMate language that achieves enough of the potential of a tool like BiZZdesign Architect to make it a good choice for relatively small enterprises, like the University of Bolton, to develop their modelling capacity without a significant software outlay.” Stephen Powell from the Co-educate project (JISC Curriculum Design Programme).

Commercial
“I’m new to EA world, but Archi 1.1 makes me fill like at home! So easy to use and so exciting…”

“Version 1.3 looks great! We are rolling Archi out to all our architects next week. The ones who have tried it so far all love it.”

Find Out More
If this interview has whetted your appetite, more information about Archi, and the newly released version 2.0, is available at http://archi.cetis.org.uk. For those in the north, there will be an opportunity to see Archi demonstrated at the forthcoming 2nd ArchiMate Modelling Bash, being held in St Andrews on the 1st and 2nd of November.

PROD: a practical case for Linked Data

Georgi wanted to know what problem Linked Data solves. Mapman wanted a list of all UK universities and colleges with postcodes. David needed a map of JISC Flexible Service Delivery projects that use Archimate. David Sherlock and I got mashing.

Linked Data, because of its association with Linked Open Data, is often presented as an altruistic activity, all about opening up public data and making it re-usable for the benefit of mankind, or at least the tax-payers who facilitated its creation. Those are very valid reasons, but they tend to obscure the fact that there are some sound selfish reasons for getting into the web of data as well.

In our case, we have a database of JISC projects and their doings called PROD. It focusses on what JISC-CETIS focusses on: what technologies have been used by these projects, and what for. We also have some information on who was involved with the projects and where they worked, but it doesn't go much beyond bare names.

In practice, many interesting questions require more information than that. David’s need to present a map of JISC Flexible Service Delivery projects that use Archimate is one of those.

This presents us with a dilemma: we can either keep adding more info to PROD, make ad-hoc mash-ups, or play in the Linked Data area.

The trouble with adding more data is that there is an unending amount of interesting data that we could add, if we had infinite resources to collect and maintain it. Which we don’t. Fortunately, other people make it their business to collect and publish such data, so you can usually string something together on the spot. That gets you far enough in many cases, but it is limited by having to start from scratch for virtually every mashup.

Which is where Linked Data comes in: it allows you to link into the information you want but don’t have.

For David’s question, the information we want is about the geographical position of institutions. Easily the best source for that info and much more besides is the dataset held by the JISC’s Monitoring Unit. Now this dataset is not available as Linked Data yet, but one other part of the Linked Data case is that it’s pretty easy to convert a wide variety of data into RDF. Especially when it is as nicely modelled as the JISC MU’s XML.
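
To give an idea of how little is involved, here's a sketch of that kind of XML-to-RDF conversion with rdflib; the element names and namespace are invented stand-ins rather than the real JISC MU schema.

```python
# Sketch of converting institutional XML into RDF with rdflib. The element
# names and namespace are invented stand-ins, not the real JISC MU schema.
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace, RDF

MU = Namespace("http://example.org/jiscmu/")  # hypothetical namespace

xml_source = """
<institutions>
  <institution id="bolton">
    <name>University of Bolton</name>
    <postcode>BL3 5AB</postcode>
    <lat>53.577</lat><long>-2.428</long>
  </institution>
</institutions>
"""

g = Graph()
g.bind("mu", MU)
for inst in ET.fromstring(xml_source).findall("institution"):
    subject = MU["institution/" + inst.get("id")]
    g.add((subject, RDF.type, MU.Institution))
    for tag in ("name", "postcode", "lat", "long"):
        g.add((subject, MU[tag], Literal(inst.findtext(tag))))

print(g.serialize(format="turtle"))
```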

All universities with JISC Flexible Delivery projects that use Archimate. Click for the google map

Having done this once, answering David's question was trivial. Not just that: answering Mapman's interesting question about a list of UK universities and colleges with postcodes was a piece of cake too. That answer prompted Scott and Tony's question on mapping UCAS and HESA codes, which was another five-second job. As was my idle wondering whether JISC projects led by Russell group universities used Moodle more or less than those led by post '92 institutions (answer: yes, they use it more).
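
For the technically curious, those five-second jobs boil down to queries of roughly this shape once both datasets are in the same store; the endpoint, prefixes and property names below are placeholders rather than PROD's published vocabulary.

```python
# The shape of a typical PROD + institutional-data query; the endpoint,
# prefixes and property names are placeholders, not PROD's actual vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/prod/sparql")  # placeholder endpoint
sparql.setQuery("""
    PREFIX prod: <http://example.org/prod/terms/>
    PREFIX mu:   <http://example.org/jiscmu/>
    SELECT ?project ?institution ?lat ?long
    WHERE {
        ?project prod:programme       "Flexible Service Delivery" ;
                 prod:usesTech        "Archimate" ;
                 prod:leadInstitution ?institution .
        ?institution mu:lat ?lat ; mu:long ?long .
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["institution"]["value"], row["lat"]["value"], row["long"]["value"])
```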

Russell group led JISC projects with and without Moodle

Post '92 led JISC projects with or without Moodle

And it doesn’t need to stop there. I know about interesting datasets from UKOLN and OSSWatch that I’d like to link into. Links from the PROD data to the goodness of Freebase.com and dbpedia.org already exist, as do links to MIMAS’ names project. And each time such a link is made, everyone else (including ourselves!) can build on top of what has already been done.

This is not to say that creating Linked Data out of PROD was free, nor that no effort is involved in linking datasets. It's just that the effort seems less than with other technologies, and the return considerably more.

Linked Data also doesn’t make your data automatically usable in all circumstances or bug free. David, for example, expected to see institutions on the map that do use the Archimate standard, but not necessarily as a part of a JISC project. A valid point, and a potential improvement for the PROD dataset. It may also have never come to light if we hadn’t been able to slice and dice our data so readily.

An outline with recipes and ingredients is to follow.

W3C Opens UK & Ireland Office

Yesterday I attended the launch event of the new W3C UK & Ireland office in Oxford, hosted by Nominet (who are hosting the office, not just the launch event).

It was a relatively short event (half a day) but packed with interesting talks showcasing the work being done with the web by various parties in collaboration with the W3C. The talks gave us a look at how central the web is in fields like mobile delivery (MobileAware & Vodafone), future media (the BBC) and Internet & television (BBC R&D). Underpinning much of this was the importance and role of the web in sociological terms, with Prof. Bill Dutton, Director of the Oxford Internet Institute, rounding things off with a look at Freedom of Connection & Freedom of Expression. Prof. Dutton highlighted elements of a forthcoming UNESCO report that provides a new perspective on the social and political dynamics behind threats to freedom of expression on the Internet and the web, covering digital rights issues and how technical, legal and regulatory measures might be constraining the freedom that many of us see the Internet allowing us today. A line that stood out for me in particular was:

Freedom of expression is not an inevitable outcome of technological innovation

Sir Tim Berners-Lee kicked off proceedings with a bit of history behind his invention of the web and the subsequent creation of the W3C, whose goal, Sir Tim told us, is to "lead the web to its full potential". Around 20-25% of the globe now uses the web, but we have reached a point where we need to look at why the other 75-80% don't. The W3C Web Foundation (http://www.w3.org/2009/Talks/0318_bratt_WebFoundation/WebFoundation.pdf) is there to tackle this issue and figure out ways to accelerate the take-up of the web in the parts of the world that still don't have it.

Sir Tim Berners-Lee

Sir Tim talked about the role of the web in supporting justice and democracy too (something that the UNESCO report investigates, as I wrote previously) and asked how we can optimise the web to support wider and more efficient democracy. Science too: how do we design the web to more easily bring together part-formed ideas across people and countries, to help these ideas feed off each other and evolve? And how can the web – in this new age of social networking – help us work more effectively and communicate more widely than simply "friends of friends", breaking through traditional social barriers and forming new relationships that may not normally occur?

An interesting question from the audience was one about the temporal bubble: how do we ensure we can still view the web as we have it now in decades to come – after all, so much content from 10 years ago cannot now be viewed (without a painstaking process of content conversion). It was a timely revisiting of that question, as on the train down I had been reading about how the hundreds of thousands of photographs shared on fotopic.net have recently simply vanished due to Fotopic going into liquidation. Then the day after, I read that Google is now telling users of their Google Video service that they need to move their content off there as, while it hasn't supported new uploads for quite some time, Google will be folding the whole thing and putting up the closed sign.

So that was all just in the opening talk!

HTML5 Logo

We went on to hear about the W3C’s Open Web Platform and how HTML5 and related web standards are extending and evolving the power of the web, making it central to areas like mobile, gaming, government and social networking. On the topic of mobile, J Alan Bird of the W3C stated that,

The open web platform is the new mobile operating system

and the W3C’s work is ongoing to make it as robust as possible.

Dr. Adrian Woolard of BBC R&D talked about their work in Internet TV and how they are looking to free this from the set-top box, while focusing on the accessibility of New Broadcasting products and services. We’ve had the web on our televisions for a few years now, well, those of us with a Wii or Playstation 3 that is. But the Internet will be moving into the TV itself. On this topic the W3C recently formed the Web & Television Interest Group (January 2011) to start looking at requirements that will then form recommendations and a Working Group that will approach the standards issue in this space – see http://www.w3.org/2010/09/webTVIGcharter.html. This is something that I want to take a bit further in a future article, around the web in a Post-PC world. We’ve had the web on PCs for over a decade now, we have it, increasingly, in powerful mobile devices in our pockets, tablets, and now…that bastion of the living room…the TV!

Dan Appelquist of Vodafone outlined the company's commitment to working with the W3C in the mobile space and nicely highlighted some of the reasons why Vodafone looks to work with the W3C in contributing to web standards. Two things Dan mentioned (kind of in passing) that I didn't know about were in the social networking space. One was the OneSocialWeb project (http://onesocialweb.org/), a free, decentralised approach to the social network (in fact I've just this minute found they have an iPhone app that I'll be duly installing after writing this); the other, more grounded in the CETIS standards space, is OStatus, an open standard for distributed status updates across networks. See http://ostatus.org/about

Ralph Rivera, Director of BBC Future Media, talked to us about how the BBC is looking at the digital public space it inhabits as much as the programmes and services it creates, outlined what digital public space means to the BBC, and described how the W3C and BBC can work in partnership. Ralph said a couple of things that really stood out for me. One was that the BBC is looking at the 2012 Olympics and planning its digital products & services around it to do for online broadcasting what the Coronation did for television. I thought that was pretty cool. He also said this, and I'll round off the article with it…

There is no more important digital space than the web itself

I like that.

And the Winner Is … The UK

I have spent this week at the IMS Learning Impact Conference in Long Beach, California. I've enjoyed the conference and sensed a remarkably fresh approach, amongst delegates and IMS alike, to standards and their role in educational technology. Overall I'd suggest a strong re-affirmation that the direction of travel we have been following in CETIS is very much on course. There was lots of talk of openness, collaboration and learner-centred approaches (I'll reflect on this in my next blog post). As is customary at this event, the final activity, before workshops and working group meetings, is the annual Learning Impact Awards. It was something akin to the British (music) invasion of the early 1960s, with the UK dominating the platinum awards across all categories. Winners included the BBC for their accessibility toolkit ASK, PebblePad, and the Nottingham Xerte online toolkit: three out of the four main awards went to the UK, with two of these being accessibility tools.

XCRI Support Project wraps up

March sees the end of the JISC-funded XCRI Support Project, as it signs off leaving the development of the XCRI (eXchanging Course Related Information) specification for sharing (and advertising) course information looking very healthy indeed.

The support project picked up where the original XCRI Reference Model project left off. Having identified the marketing and syndication of course descriptions as a significant opportunity for innovation – general practice in this area involving huge efforts to re-type information to accommodate various different systems, sites and services, and then to maintain that information separately in various places – the XCRI Reference Model project mapped out the spaces of course management, curriculum development and course marketing, and provided the community with a common standard for exchanging course-related information. This would streamline approaches to the syndication of such information, bring cost savings in collecting and managing the data, and open up opportunities for a more sustainable approach to lifelong learning services that rely on course information from learning providers.

Over the course of the next three years the XCRI Support Project developed the XCRI Course Advertising Profile (XCRI-CAP), an XML specification designed to enable course offerings to be shared by providers (rather like an RSS feed) and used by services such as lifelong learning sites, course search sites and other services that support learners in trying to find the right courses for them. Through the supervision and support of several institutional implementation projects, the support project – a partnership between JISC CETIS at the University of Bolton (http://bit.ly/PZdKw), Mark Stubbs of Manchester Metropolitan University (http://bit.ly/PZdKw) and Alan Paull of APS Limited (http://bit.ly/cF6Fhd) – promoted the uptake and sustainability of XCRI through engagement with the European standards process and endorsement by the UK Information Standards Board. Through this work the value of XCRI-CAP was demonstrated so successfully as to ensure it was placed on the strategic agenda of national agencies.
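
For a sense of how lightweight consuming XCRI-CAP is – much like consuming RSS – here's a sketch that reads a feed with Python's standard library. The namespace URI and element names are from memory of the profile and should be checked against the current schema, and the feed URL is a placeholder.

```python
# Sketch of reading an XCRI-CAP feed with the standard library. The namespace
# URIs and element names are from memory of the profile; treat them as
# assumptions and check them against the current XCRI-CAP schema.
import urllib.request
import xml.etree.ElementTree as ET

NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",  # assumed namespace URI
    "dc": "http://purl.org/dc/elements/1.1/",
}

FEED_URL = "http://example.ac.uk/courses/xcri-cap.xml"  # placeholder feed

with urllib.request.urlopen(FEED_URL) as response:
    catalog = ET.parse(response).getroot()

for provider in catalog.findall("xcri:provider", NS):
    provider_name = provider.findtext("dc:title", default="", namespaces=NS)
    for course in provider.findall("xcri:course", NS):
        title = course.findtext("dc:title", default="", namespaces=NS)
        presentations = course.findall("xcri:presentation", NS)
        print(f"{provider_name}: {title} ({len(presentations)} presentation(s))")
```

The aggregator demonstrator mentioned below does essentially this at scale, adding geocoding and a search interface on top.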

Hotcourses manages the National Learning Directory under contract from the Learning and Skills Council (LSC). With over 900,000 course records and 10,000 learning providers the NLD is possibly the largest source of information about learning opportunities in the UK, which learners and advisers can access through dozens of national, regional and local web portals. Working with a number of Further Education colleges Hotcourses is now developing and piloting ‘bulk upload’ facilities using XCRI to ease the burden on learning providers supplying and maintaining their information on the NLD. UCAS also continues to make progress towards XCRI adoption. Most recently, at the ISB Portfolio Learning Opportunities and Transcripts Special Interest Group on January 27, 2010, UCAS colleagues described a major data consolidation project that should pave the way for a data transfer initiative using XCRI, and cited growing demand from UK HEIs for data transfer rather than course-by-course data entry through UCAS web-link. The project is a two-phase one, with XCRI implementation in phase II, which is due to deliver sometime in 2011.

Having ensured that the specification gained traction and uptake, the project worked extensively at developing the core information used by XCRI into a European Norm, harmonising it with other standards addressing this space elsewhere in Europe. It is this process which has seen the evolution of XCRI from a standalone specification to a UK application profile of a recognised international standard. This could now be transitioned to an actual British Standard through BSI IST 43 (the committee of the British Standards Institution which looks at technical standards for learning, education and training). At the same time, adoption of the specification continued to be supported through engagement with policymakers and suppliers, while the technical tools developed for adopters continued to be updated and maintained.

XCRI Aggregator Demo

A couple of key tools were developed by the support project to assist implementers of XCRI. An aggregator engine was set up and maintained by the project and is demonstrated at http://www.xcri.org/aggregator/. This shows how it's possible to deploy an aggregator that pulls in courses from several providers, and offers a user interface with basic features such as searching, browsing, bookmarking, tags and so on. It also demonstrates some value-added aspects such as geocoding the course venues and displaying them on Google Maps. Once you've had a look at the demonstrator you can get hold of the code for it at http://bit.ly/9eViM2
The project also developed an XCRI Validator to help implementers check their data structure and content. This goes beyond structural validation to also analyse content and provide advice on common issues such as missing information. Currently the development of this is very much at a beta stage but implementers can try out this early proof-of-concept at http://bit.ly/aeLArY. Accompanying this is a blog post describing how to use the validator at http://bit.ly/aHoJtH

Up to now there have been around 15-20 "mini-projects" funded to pilot implementation of XCRI within institutions. These looked at developing course databases using the specification, extending existing systems and methods to map to XCRI, and, more generally, at generating the information and exporting it via web services. That's not to say this was the only project activity around XCRI: various other lifelong learning projects have had an XCRI element to them along the way, and all of these have contributed to forming an active community around the development and promotion of the spec.

This community's online activity is centred around a wiki and discussion forum on the XCRI Support Project website at http://xcri.org, and while the support project is now officially at an end, the website will stay around as long as there is a community using it – currently it's maintained by CETIS. Some XCRI.org content may move to JISC Advance as XCRI moves from innovation into mainstream adoption. However, as long as people are trying out new things with XCRI – whether that's vocabularies and extensions or new exchange protocols – XCRI.org provides a place to talk about it, with the LLL & WFD project at Liverpool (CCLiP – http://www.liv.ac.uk/cll/cclip/) currently looking at how to improve the site and provide more information for non-technical audiences.

More information on the XCRI projects can be found at the JISC website, specifically at http://bit.ly/awevwQ