The cloud is for the boring

Members of the Strategic Technologies Group of the JISC’s FSD programme met at King’s Anatomy Theatre to, ahem, dissect the options for shared services and the cloud in HE.

The STG’s programme included updates on members’ projects, a preview of the synthesis of the Flexible Service Delivery programme of which the STG is a part, and a preview of the University Modernisation Fund programme that will start later in the year.

The main event, though, was a series of parallel discussions on business problems where shared services or cloud solutions could make a difference. The one I was at considered a case from the CUMULUS project: how to extend, rather than replace, a Student Record System in a modular way.

View from the King's anatomy theatre up to the clouds

In the event, a lot of the discussion revolved around what services could profitably be shared in some fashion. When the group looked at what is already being run on shared infrastructure and what has proven very difficult, the pattern is actually very simple: the more predictable, uniform, mature, well understood and inessential to the central business of research and education a service is, the better it suits sharing. The more variable, historically grown, institution-specific and bound up with the real or perceived mission of the institution or parts thereof, the worse.

Going round the table to sort the soporific cloudy sheep from the exciting, disputed, in-house goats, we came up with the following lists:

Cloud:

  • Email
  • Travel expenses
  • HR
  • Finance
  • Student network services
  • Telephone services
  • File storage
  • Infrastructure as a Service

In house:

  • Course and curriculum management (including modules etc)
  • Admissions process
  • Research processes

This ought not to be a surprise, of course: the point of shared services – whether in the cloud or anywhere else – is economies of scale. That means that the service needs to be the same everywhere, doesn’t change much or at all, doesn’t give the users a competitive advantage and has well understood and predictable interfaces.

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid

Inspired by the VirtualDutch timeline, I wondered how easy it would be to create something similar with all JISC e-learning projects that I could get linked data for. It worked, and I learned some home truths about scalability and the web architecture in the process.

As Lorna pointed out, UCL’s VirtualDutch timeline is a wonderful example of using time to explore a dataset. Since I’d already done ‘place’ in my previous meshup, I thought I’d see how close I could get with £0 and a few nights’ tinkering.

The plan was to make a timeline of all JISC e-learning projects, and of the developmental and affinity relations between them that are recorded in CETIS’ PROD database of JISC projects – more or less like Scott Wilson and I did by hand in a report about the toolkits and demonstrators programme. This didn’t quite work: SPARQLing up the data is trivial, but those kinds of relations don’t fit well into the widely used Simile timeline, from both a technical and a usability point of view. I’ll try again with some other diagram type.

What I do have is one very simple, and one not so simple, timeline for e-learning projects that get their data from the intensely useful RKB Explorer Linked Data knowledge base of JISC projects and my own private PROD stash:

Recipe for the simple one

  • Go to the SPARQL proxy web service
  • Tell it to ask this question
  • Tell the proxy to ask that question of the RKB by pointing it at the RKB SPARQL endpoint (http://jisc.rkbexplorer.com/sparql/), and order the results as CSV
  • Copy the URL of the CSV results page that the proxy gives you
  • Stick that URL in a ‘=ImportData(“{yourURL}”)’ function inside a fresh Google Spreadsheet
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, one timeline coming up:
Screenshot of the simple mashup – click to go to the live meshup
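
For the terminally curious, the middle steps of that recipe amount to something like the sketch below. The proxy address and its parameter names are assumptions for illustration, and the SPARQL is a minimal stand-in for the real question:

    // Sketch only: the proxy URL and its parameter names are assumptions,
    // and the SPARQL is a minimal stand-in for the real question.
    var sparql =
      'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> ' +
      'PREFIX doap: <http://usefulinc.com/ns/doap#> ' +
      'SELECT ?project ?label ' +
      'WHERE { ?project a doap:Project ; rdfs:label ?label . } LIMIT 100';

    var csvUrl = 'http://example.org/sparqlproxy' +   // hypothetical proxy address
      '?endpoint=' + encodeURIComponent('http://jisc.rkbexplorer.com/sparql/') +
      '&output=csv' +
      '&query=' + encodeURIComponent(sparql);

    // csvUrl is what goes inside the =ImportData("...") formula in the spreadsheet.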

Recipe for the not so simple one

For this one, I wanted to stick a bit more in the project ‘bubbles’, and I also wanted the links in the bubbles to point to the PROD pages rather than the RKB resource pages. Trouble is, RKB doesn’t know about the data in PROD and, for some very sound reasons I’ll come to in a minute, won’t allow the pulling in of external datasets via SPARQL’s ‘FROM’ clause either. All other SPARQL endpoints on the web that I know of that allow FROM couldn’t handle my query: they either hung or conked out. So I did this instead:

  • Download and install your own local SPARQL query engine (I like Leigh Dodds’ simple but powerful Twinkle)
  • Feed it this query
  • Copy and paste the results into a spreadsheet and fiddle with concatenation
  • Hoist spreadsheet into Google docs
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, a more complex timeline:
Click to go to the live meshup
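
To give a flavour of the query involved (held in a JavaScript string here, for consistency with the other sketches in this post): both FROM graph URLs and the properties below are placeholders, not the real locations or terms of the RKB and PROD data:

    // Sketch only: the FROM graph URLs and properties are placeholders,
    // not the real locations or terms of the RKB and PROD data.
    var query =
      'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> ' +
      'PREFIX doap: <http://usefulinc.com/ns/doap#> ' +
      'SELECT ?project ?label ?prodPage ' +
      'FROM <http://jisc.rkbexplorer.com/data/projects.rdf> ' +   // assumed RKB dump
      'FROM <http://example.org/prod/export.rdf> ' +              // hypothetical PROD export
      'WHERE { ?project a doap:Project ; ' +
      '        rdfs:label ?label ; ' +
      '        doap:homepage ?prodPage . }';
    // Run the query in the local engine, then export the results into the
    // spreadsheet for the concatenation fiddling.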

It’s Linked Data on the Web, stupid

When I first started meshing up with real data sets on the web (as opposed to poking my own triple store), I had this inarticulate idea that SPARQL endpoints were these magic oracles that we could ask anything about anything. And then you notice that there is no federated search facility built into the language. None. And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.

The simple reason behind all this is that federated search doesn’t scale. The web does. Linked Data is also known as the Web of Data for that reason: it has the same architecture. SPARQL queries are computationally expensive at the best of times, and federated SPARQL queries would be exponentially so. It’s easy to come up with a SPARQL query that takes a long time, floors a massive server (or cloud), or simply fails.

That’s a problem if you think every server needs to handle lots of concurrent queries all the time, especially if it depends on other servers on a (creaky) network to satisfy those queries. By contrast, chomping on the occasional single query is trivial for a modern PC, just like parsing and rendering big and complex html pages is perfectly possible on a poky phone these days. By the same token, serving a few big gobs of (RDF/XML) text that sit at the end of a sensible URL is an art that servers have perfected over the past 15 years.

The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf). That way, servers can easily satisfy all comers without breaking a sweat. That’s important if you want every data provider to play in Linked Data space. At the same time, consumers can ask what they want, without constraint. If they ask queries that are too complex or require too much data, then they can either beef up their own machines, or live with long delays and the occasional dead SPARQL engine.
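
To make that concrete, a consumer in this world does something like the hedged sketch below. The URL and the crude XML poking are purely illustrative; the point is that the server just hands over a document, and the consumer’s machine does the thinking:

    // Sketch of the 'web of documents' pattern; the URL is hypothetical.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'http://example.org/id/project/cumulus', true);
    xhr.setRequestHeader('Accept', 'application/rdf+xml');
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        // Cheap for the server: it just serves a document. Any expensive
        // querying happens here, on the consumer's machine.
        var hits = xhr.responseXML.getElementsByTagNameNS(
          'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'Description');
        alert(hits.length + ' resources in the document');
      }
    };
    xhr.send();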

SOA only really works webscale

Just sat through a few more SOA talks today and, as usual, the presentations circled round to governance pretty quickly and stayed there.

The issue is this: SOA promises to make life more pleasant by removing duplication of data and functionality. Money is saved, and information is more accurate and flows more freely, because we tap directly into the source systems via their services.

So much for the theory. The problem is that organisations in SOA exercises have a well documented tendency to re-invent their old monolithic applications as sets of isolated services that make most sense to themselves. And there goes the re-use argument: everyone uses their own set of services, with lots of data and functionality duplication.

Unless, of course, your organisation has managed to set up a Governance Police that makes everyone use the same set of centrally sanctioned services. Which is, let’s say, not always politically feasible.

Which made me think of how this stuff works on the original service oriented architecture: the web. The most obvious attribute of the web, of course, is that there is no central authority over service provision and use. People just use what is most useful to them – and that is precisely the point. Instead of governance, the web has survival of the fittest: the search engine that gives the best answers gets used by everyone.

Trying to recreate that sort of Darwinian jungle within the enterprise seems both impossible and a little misguided. No organisation has the resources to just punt twenty versions of a single service in the full knowledge that at least nineteen will fail.

Or does it? Once you think about the issue webscale, such a trial-and-error approach begins to look more do-able. For a start, an awful lot of current services are commodities that are the same across the board: email, calendars, CRM etc. These are already being sourced from the web, and there are plenty more that could be punted by entrepreneurial shared service providers with a nous for the education system (student record systems, HR etc.)

That leaves the individual HE institutions to concentrate on those services that provide data and functionality that are unique to themselves. Those services will survive, because users need them, and they’re also so crucial that institutions can afford to experiment before a version is found that does the job best.

I’ll weasel out of naming what those services will be: I don’t know. But I suspect it will be those that deal with the institution’s community (‘social network’ if you like) itself.

Prof. Zhu’s presentation on e-education in China

Initially, it’s hard to get past the eye-popping numbers (1876 universities, 17 million students and so on) but once you do, you’ll see that the higher education sector in China is facing remarkably familiar challenges with some interesting solutions.

We were very fortunate here at IEC that Prof. Zhu Zhiting and colleagues from East China Normal University and the China e-Learning Technology Standardization Committee agreed to visit our department after attending the JISC CETIS conference yesterday. He kindly agreed to let us publish his slides, which are linked below.

The two most noticeable aspects of Prof. Zhu’s presentation are the nature of planning e-education in China, and the breadth of interests of Prof. Zhu’s Distance Education College & e-Educational System Engineering Research Center.

Because the scale of education in China is so vast, any development has to be based on multiple layers of initiatives. The risks involved mean that the national ministry of education needs to plan at very high, strategic levels that set out parameters for regional and local governments to follow. This is not new per se, but it leads to a thoroughness and predictability in infrastructure that others could learn from.

The department in Shanghai, though, is another matter. Their projects range from international standardisation right down to the development of theories that integrate short term and long term individual memory with group memory. Combined with concrete projects such as the roll-out of a lifelong learning platform for the citizens of Shanghai, that leads to some serious synergies.

Learn more from Prof. Zhu’s slides

More about IEC and what it does.

If Enterprise Architecture is about the business, where are the business people?

The Open Group Enterprise Architecture conference in Munich last month saw a first meeting of Dutch and British Enterprise Architecture projects in Higher Education.

Probably the most noticeable aspect of the enterprise architecture in higher education session was the commonality of theme not just between the Dutch and British HE institutions, but also between the HE contingent and the enterprise architects of the wider conference. There are various aspects to the theme, but it really boils down to one thing: how does a bunch of architects get a grip on and then re-fashion the structure of a whole organisation?

In the early days, the answer was simply to set the scope of an architecture job to the expertise and jurisdiction of the typical enterprise architect team: IT systems. In both the notional goal of architecting work and its practice, that focus on just IT seems too limiting. Even a relatively narrow interpretation of the frequently cited goal of enterprise architecture – to better align systems to the business – presupposes a heavy involvement of all departments and the clout to change practices across the organisation.

The HE projects reported a number of strategies to overcome the conundrum. One popular method is to focus on one concrete project development at a time and evolve an architecture iteratively. Another is to involve everyone by letting them determine and agree on a set of principles that underpin the architecture work before it starts. Yet other organisations tackle the scope and authority issue head-on and sort out governance structures before tackling the structure of the organisation, much like businesses tend to do.

In all of these cases, though, architects remain mostly focussed on IT systems, while staying wholly reliant on the rest of the organisation for what the systems actually look like and for clues about what they should do.

Presentations can be seen on the JISC website

AJAX alliance to start interoperability work

Funny how, after an initial development rush, a community around a new technology will hit some interoperability issues, and then start to address them via some kind of specification initiative. AJAX, the browser-side interaction technique that brought you Google Maps, is in that phase right now.

Making Asynchronous JavaScript and XML (AJAX) work smoothly matters, not only because it can help make webpages more engaging and responsive, but also because it is one of the most rapid ways to build front ends to web services.

It may seem a bit peculiar at first that there should be any AJAX interoperability issues at all, since it is built on a whole stack of existing, mature, open standards: the W3C’s XML and DOM for data and data manipulation, XHTML for webpages, ECMAScript (JavaScript) for scripting, and much more besides. Though there are a few compliance issues with those standards in modern browsers, that’s actually not the biggest interoperability problem.

That lies more in the fact that most AJAX libraries have been written with the assumption that they’ll be the only ones on the page. That is, in a typical AJAX application, an ECMAScript library is loaded along with the webpage, and starts to control the fetching and sending of data, and the recording of user clicks, drags, drops and more, depending on how exactly the whole application is set up.

This is all nice and straightforward unless there’s another library loaded that also assumes that it’s the only game in town, and starts manipulating the state of objects before the other library can do its job, or starts manipulating completely different objects that happen to have the same name.
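
A toy example makes the failure mode obvious; to be clear, this is made up for illustration and not quoted from any real library:

    // Toy illustration of the clobbering problem; not from any real library.
    // Library A claims a global helper and the page-load hook...
    var $ = function (id) { return document.getElementById(id); };
    window.onload = function () { /* library A wires up its widgets */ };

    // ...then library B, loaded later and equally convinced it owns the
    // page, silently redefines both. Library A's setup never runs, and code
    // written against the old $ now gets quite different behaviour.
    $ = function (selector) { /* a CSS selector engine, say */ };
    window.onload = function () { /* library B's setup replaces A's */ };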

Making sure that JavaScript libraries play nice is the stated aim of the OpenAjax alliance. Formed earlier this year, the alliance now has a pretty impressive roster of all the major open source projects in the area as well as major IT vendors such as Sun, IBM, Adobe and Google (OpenAjax Alliance). Pretty much everyone but Microsoft…

The main, concrete way in which the alliance wants to make sure that AJAX JavaScript libraries play nice with each other is by building the OpenAjax hub. This is a set of standard JavaScript functions that address issues such as load order and component naming, but also provide a means of addressing each other’s functionality in a standard way.

For that to happen, the alliance first intends to build an open source reference implementation of the hub (OpenAjax Alliance). This piece of software is meant to control the load and execution order of libraries, and serve as a runtime registry of the libraries’ methods so that each can call on the other. This software is promised to appear in early 2007 (Infoworld), but the SourceForge filestore and subversion tree are still eerily empty (SourceForge).
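
Since the hub code isn’t available yet, the following is strictly a guess at the shape of the thing: a shared registry plus a publish and subscribe bus. All the names are invented for the sketch, not taken from any OpenAjax specification:

    // Hypothetical sketch of hub-style coordination; every name here is
    // invented, not taken from the OpenAjax spec or reference implementation.
    var Hub = {
      libraries: {},
      topics: {},
      registerLibrary: function (name, lib) {
        this.libraries[name] = lib;   // each library claims its own namespace
      },
      subscribe: function (topic, callback) {
        (this.topics[topic] = this.topics[topic] || []).push(callback);
      },
      publish: function (topic, data) {
        var subs = this.topics[topic] || [];
        for (var i = 0; i < subs.length; i++) { subs[i](data); }
      }
    };

    // Two libraries coexist by talking through the hub rather than grabbing
    // globals or poking each other's objects directly.
    Hub.registerLibrary('mapLib', {});
    Hub.subscribe('data.loaded', function (rows) { /* redraw the map */ });
    Hub.publish('data.loaded', [{ id: 1 }]);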

It’d be a shame if the hub remained vapourware, because it is easy to see the benefits of a way to get a number of mature and focussed JavaScript libraries to work in a single AJAX application. Done properly, it would make it much easier to string together such components rather than write all that functionality from scratch. This, in turn, could make it much easier to realise the mash-ups and composite applications made possible by the increasing availability of webservices.

Still, at least the white paper (OpenAjax Alliance) is well worth a look for a thorough non-techy introduction to AJAX.