Wilbert Kraan » SOA (Cetis blog)
http://blogs.cetis.org.uk/wilbert

The cloud is for the boring
http://blogs.cetis.org.uk/wilbert/2011/04/15/the-cloud-is-for-the-boring/
Fri, 15 Apr 2011

Members of the Strategic Technologies Group of the JISC’s FSD programme met at King’s Anatomy Theatre to, ahem, dissect the options for shared services and the cloud in HE.

The STG’s programme included updates on the members’ projects, a preview of the synthesis of the Flexible Service Delivery programme of which the STG is a part, and a preview of the University Modernisation Fund programme that will start later in the year.

The main event, though, was a series of parallel discussions on business problems where shared services or cloud solutions could make a difference. The one I was at considered a case from the CUMULUS project: how to extend, rather than replace, a Student Record System in a modular way.

View from the King's anatomy theatre up to the clouds

In the event, a lot of the discussion revolved around what services could profitably be shared in some fashion. When the group looked at what is already being run on shared infrastructure and what has proven very difficult to share, a simple pattern emerged: the more predictable, uniform, mature, well understood and inessential to the central business of research and education a service is, the better it fits. The more variable, historically grown, institution specific and bound up with the real or perceived mission of the institution or parts thereof, the worse.

Going round the table to sort the soporific cloudy sheep from the exciting, disputed, in-house goats, we came up with the following lists:

Cloud:

  • Email
  • Travel expenses
  • HR
  • Finance
  • Student network services
  • Telephone services
  • File storage
  • Infrastructure as a Service

In house:

  • Course and curriculum management (including modules etc)
  • Admissions process
  • Research processes

This ought not to be a surprise, of course: the point of shared services – whether in the cloud or anywhere else – is economies of scale. That means that the service needs to be the same everywhere, doesn’t change much or at all, doesn’t give the users a competitive advantage and has well understood and predictable interfaces.

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid
http://blogs.cetis.org.uk/wilbert/2010/03/12/meshing-up-a-jisc-e-learning-project-timeline-or-its-linked-data-on-the-web-stupid/
Fri, 12 Mar 2010

Inspired by the VirtualDutch timeline, I wondered how easy it would be to create something similar with all JISC e-learning projects that I could get linked data for. It worked, and I learned some home truths about scalability and the web architecture in the process.

As Lorna pointed out, UCL’s VirtualDutch timeline is a wonderful example of using time to explore a dataset. Since I’d already done ‘place’ in my previous meshup, I thought I’d see how close I could get with £0 and a few nights’ tinkering.

The plan was to make a timeline of all JISC e-learning projects, and the developmental and affinity relations between them that are recorded in CETIS’ PROD database of JISC projects. More or less like Scott Wilson and I did by hand in a report about the toolkits and demonstrators programme. This didn’t quite work: SPARQLing up the data is trivial, but those kinds of relations don’t fit well into the widely used Simile timeline, from either a technical or a usability point of view. I’ll try again with some other diagram type.

What I do have is one very simple and one not-so-simple timeline for e-learning projects, which get their data from the intensely useful RKB Explorer Linked Data knowledge base of JISC projects and my own private PROD stash:

Recipe for the simple one

  • Go to the SPARQL proxy web service
  • Tell it to ask this question (a sketch of the kind of query involved appears after the screenshot)
  • Tell the proxy to ask that question of the RKB by pointing it at the RKB SPARQL endpoint (http://jisc.rkbexplorer.com/sparql/), and order the results as CSV
  • Copy the URL of the CSV results page that the proxy gives you
  • Stick that URL in a ‘=ImportData(“{yourURL}”)’ function inside a fresh Google Spreadsheet
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, one timeline coming up:
Screenshot of the simple mashup (click to go to the live meshup)
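For those who want to roll their own, the question boils down to a plain SELECT over project resources. A minimal sketch, with predicate names that are placeholders of mine rather than the real RKB Explorer schema:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    # Hypothetical project vocabulary -- a stand-in for the real RKB predicates
    PREFIX proj: <http://example.org/project-schema/>

    SELECT ?project ?label ?start ?end
    WHERE {
      ?project a proj:Project ;
               rdfs:label ?label ;
               proj:startDate ?start ;
               proj:endDate ?end .
    }
    ORDER BY ?start

Anything of this shape, flattened to CSV with one row per project, is exactly what the timeline gadget wants: a label plus start and end dates.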

Recipe for the not so simple one

For this one, I wanted to stick a bit more in the project ‘bubbles’, and I also wanted the links in the bubbles to point to the PROD pages rather than the RKB resource pages. Trouble is, RKB doesn’t know about data in PROD and, for some very sound reasons I’ll come to in a minute, won’t allow the pulling in of external datasets via SPARQL’s ‘FROM’ operator either. All the other SPARQL endpoints on the web that I know of that do allow FROM couldn’t handle my query: they either hung or conked out. So I did this instead:

  • Download and install your own SPARQL endpoint (I like Leigh Dodds’ simple but powerful Twinkle)
  • Feed it this query (its general shape is sketched after the screenshot)
  • Copy and paste the results into a spreadsheet and fiddle with concatenation
  • Hoist spreadsheet into Google docs
  • Insert timeline gadget into Google spreadsheet, and give it the right spreadsheet range
  • Hey presto, a more complex timeline:
Click to go to the live meshup
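The crucial difference from the simple recipe is that a local endpoint lets you pull in more than one dataset yourself via FROM. The general shape, again with placeholder predicates and stand-in dataset URLs rather than the real RKB or PROD ones:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    # Placeholder vocabulary -- not the real RKB or PROD schema
    PREFIX proj: <http://example.org/project-schema/>

    SELECT ?project ?label ?start ?end ?prodPage
    FROM <http://example.org/rkb-export.rdf>   # stand-in for an RKB data export
    FROM <file:prod-export.rdf>                # a local extract from PROD
    WHERE {
      ?project a proj:Project ;
               rdfs:label ?label ;
               proj:startDate ?start ;
               proj:endDate ?end .
      # PROD knows things RKB doesn't, such as the project's PROD page
      OPTIONAL { ?project proj:prodPage ?prodPage . }
    }

The OPTIONAL block is where the PROD links come from; the fiddling with concatenation then splices them into the bubble text.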

It’s Linked Data on the Web, stupid

When I first started meshing up with real data sets on the web (as opposed to poking my own triple store), I had this inarticulate idea that SPARQL endpoints were these magic oracles that we could ask anything about anything. And then you notice that there is no federated search facility built into the language. None. And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.

The simple reason behind all this is that federated search doesn’t scale. The web does. Linked Data is also known as the Web of Data for that reason- it has the same architecture. SPARQL queries are computationally expensive at the best of times, and federated SPARQL queries would be exponentially so. It’s easy to come up with a SPARQL query that either takes a long time, floors a massive server (or cloud) or simply fails.
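It doesn’t even take a malicious query. A sketch of an innocent-looking worst case, which asks the store to pair up every two resources that share any property value anywhere:

    # Innocent to write, ruinous to run: an unconstrained join
    # across the entire store.
    SELECT ?a ?b
    WHERE {
      ?a ?p ?x .
      ?b ?q ?x .
      FILTER (?a != ?b)
    }

Run that against a store with a few million triples, and the intermediate result set alone will floor most servers.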

That’s a problem if you think every server needs to handle lots of concurrent queries all the time, especially if it depends on other servers on a (creaky) network to satisfy those queries. By contrast, chomping on the occasional single query is trivial for a modern PC, just as parsing and rendering big and complex HTML pages is perfectly possible on a poky phone these days. By the same token, serving a few big gobs of (RDF XML) text that sit at the end of a sensible URL is an art that servers have perfected over the past 15 years.

The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf). That way, servers can easily satisfy all comers without breaking a sweat. That’s important if you want every data provider to play in Linked Data space. At the same time, consumers can ask what they want, without constraint. If they ask queries that are too complex or require too much data, then they can either beef up their own machines, or live with long delays and the occasional dead SPARQL engine.
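To make ‘sensibly factored’ concrete: it just means that a cool URI serves one small, self-contained description. A sketch of the kind of document that might sit at the end of a hypothetical project URI, in Turtle for brevity:

    # What http://example.org/id/project/123 (a made-up URI) might serve
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix proj: <http://example.org/project-schema/> .

    <http://example.org/id/project/123>
        a proj:Project ;
        rdfs:label "An e-learning project" ;
        proj:startDate "2009-04-01" ;
        proj:endDate "2010-03-31" .

Serving files like that is plain old web hosting: cacheable, crawlable and cheap.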

SOA only really works webscale
http://blogs.cetis.org.uk/wilbert/2008/12/11/soa-only-really-works-webscale/
Thu, 11 Dec 2008

Just sat through a few more SOA talks today and, as usual, the presentations circled ’round to governance pretty quickly and stayed there.

The issue is this: SOA promises to make life more pleasant by removing duplication of data and functionality. Money is saved, and information is more accurate and flows more freely, because we tap directly into the source systems via their services.

So much for the theory. The problem is that organisations in SOA exercises have a well documented tendency to re-invent their old monolithic applications as sets of isolated services that make most sense to themselves. And there goes the re-use argument: everyone uses their own set of services, with lots of data and functionality duplication.

Unless, of course, your organisation has managed to set up a Governance Police that makes everyone use the same set of centrally sanctioned services. Which is, let’s say, not always politically feasible.

Which made me think of how this stuff works on the original service oriented architecture: the web. The most obvious attribute of the web, of course, is that there is no central authority over service provision and use. People just use what is most useful to them, and that is precisely the point. Instead of governance, the web has survival of the fittest: the search engine that gives the best answers gets used by everyone.

Trying to recreate that sort of Darwinian jungle within the enterprise seems both impossible and a little misguided. No organisation has the resources to just punt twenty versions of a single service in the full knowledge that at least nineteen will fail.

Or does it? Once you think about the issue webscale, such a trial-and-error approach begins to look more do-able. For a start, an awful lot of current services are commodities that are the same across the board: email, calendars, CRM etc. These are already being sourced from the web, and there are plenty more that could be punted by entrepreneurial (shared) service providers with a nous for the education system: student record systems, HR etc.

That leaves the individual HE institutions to concentrate on those services that provide data and functionality that are unique to themselves. Those services will survive, because users need them, and they’re also so crucial that institutions can afford to experiment before a version is found that does the job best.

I’ll weasel out of naming what those services will be: I don’t know. But I suspect it will be those that deal with the institution’s community (‘social network’ if you like) itself.

Prof. Zhu’s presentation on e-education in China
http://blogs.cetis.org.uk/wilbert/2008/11/27/prof-zhus-presentation-on-e-education-in-china/
Thu, 27 Nov 2008

Initially, it’s hard to get past the eye-popping numbers (1,876 universities, 17 million students and so on), but once you do, you’ll see that the higher education sector in China is facing remarkably familiar challenges with some interesting solutions.

We were very fortunate here at IEC that Prof. Zhu Zhiting and colleagues from East China Normal University and the China e-Learning Technology Standardization Committee agreed to visit our department after attending the JISC CETIS conference yesterday. He kindly agreed to let us publish his slides, which are linked below.

The two most noticeable aspects of Prof. Zhu’s presentation are the nature of planning e-education in China, and the breadth of interests of Prof. Zhu’s Distance Education College & e-Educational System Engineering Research Center.

Because the scale of education in China is so vast, any development has to be based on multiple layers of initiatives. The risks involved mean that the national ministry of education needs to plan at very high, strategic levels that set out parameters for regional and local governments to follow. This is not new per se, but it leads to a thoroughness and predictability in infrastructure that others could learn from.

The department in Shanghai, though, is another matter. Their projects range from international standardisation right down to the development of theories that integrate short term and long term individual memory with group memory. Combined with concrete projects such as the roll-out of a lifelong learning platform for the citizens of Shanghai, that leads to some serious synergies.

Learn more from Prof. Zhu’s slides

More about IEC and what it does.

If Enterprise Architecture is about the business, where are the business people?
http://blogs.cetis.org.uk/wilbert/2008/11/18/if-enterprise-architecture-is-about-the-business-where-are-the-business-people/
Tue, 18 Nov 2008

The Open Group Enterprise Architecture conference in Munich last month saw a first meeting of Dutch and British Enterprise Architecture projects in Higher Education.

Probably the most noticeable aspect of the session on enterprise architecture in higher education was the commonality of theme, not just between the Dutch and British HE institutions, but also between the HE contingent and the enterprise architects of the wider conference. There are various aspects to the theme, but it really boils down to one thing: how does a bunch of architects get a grip on, and then re-fashion, the structure of a whole organisation?

In the early days, the answer was simply to set the scope of an architecture job to the expertise and jurisdiction of the typical enterprise architect team: IT systems. In both the notional goal of architecting work and its practice, though, that focus on just IT seems too limiting. Even a relatively narrow interpretation of the frequently cited goal of enterprise architecture – to better align systems to the business – presupposes a heavy involvement of all departments and the clout to change practices across the organisation.

The HE projects reported a number of strategies to overcome the conundrum. One popular method is to focus on one concrete development project at a time, and evolve an architecture iteratively. Another is to involve everyone by letting them determine and agree on a set of principles that underpin the architecture work before it starts. Yet other organisations tackle the scope and authority issue head-on and sort out governance structures before tackling the structure of the organisation, much like businesses tend to do.

In all of these cases, though, architects remain mostly focussed on IT systems, while staying wholly reliant on the rest of the organisation for what the systems actually look like and for clues about what they should do.

Presentations can be seen on the JISC website

Semantic tech finds its niches and gets productive
http://blogs.cetis.org.uk/wilbert/2008/06/03/semantic-tech-finds-its-niches-and-gets-productive/
Tue, 03 Jun 2008

Rather than the computer science foundations, the annual semantic technology conference in San Jose focusses on practical applications of the approach. Visitor numbers are growing at a ‘double digit’ clip, and vendors are starting to include big names such as Oracle. We take a look.

It seems that we’re through the trough of disillusionment about the fact that the semantic web, as outlined by Tim Berners-Lee in 1999, has not exactly materialised (yet). It’s not news that we do not all have intelligent agents that can seek out all kinds of data on the ’net and integrate it to satisfy our specific queries and desires. What we do have is a couple of interesting and productive approaches to the mixing and matching of disparate information that hint at a slope of enlightenment, heading for a plateau of productivity.

Easily the clearest example from the conference is the case of Blue Shield of California, a sizeable health care insurer in the US. They faced the familiar issue of a pile of legacy applications with custom connections that were required to do things they were never designed to do. As a matter of fact, customer and policy data (core data, clearly) were spread over two systems of different vintages, making a unified view very difficult.

In contrast to what you might expect, the solution they built leaves the data in the existing databases: nothing is replicated in a separate store. Instead, the integration takes place in a ‘semantic layer’. In essence, that means that users can ask pretty complex and detailed questions of any information that is held in any system, in terms of a set of models of the business. These queries end up at the same old systems, where they get mapped from semantic web query form into good old SQL database queries.

This kind of approach doesn’t look all that different from the Enterprise Service Bus (ESB) in principle, but it takes the insulation of service consumers from the details of service providers rather further. Service consumers in a semantic layer have just one API for any kind of data (the W3C’s SPARQL query language) and one datamodel (RDF, though output in XML or JSON is common). Most importantly, the meaning of the data is modelled in a set of ontologies, not in the documentation of the service providers or the heads of their developers.
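To make that concrete, here is a sketch of what such a question could look like. The ontology is a made-up stand-in, since the actual Blue Shield models aren’t public:

    # Hypothetical insurance ontology -- a stand-in for the real business models
    PREFIX ins: <http://example.org/insurance#>

    # One question in business terms. Behind the scenes, the semantic layer
    # rewrites it into SQL against two separate legacy systems.
    SELECT ?policyNumber ?customerName
    WHERE {
      ?policy a ins:Policy ;
              ins:policyNumber ?policyNumber ;
              ins:holder ?customer .
      ?customer ins:name ?customerName .
    }

The consumer never needs to know that policies and customers live in databases of different vintages.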

While the Blue Shield of California case was done by BBN, other vendors that exhibited in San Jose have similar approaches, often built on open source components. The most eye-catching of those components (and also used by BBN) is NetKernel: the overachieving offspring of the web and Unix. It’s not necessarily semantic tech per se, but more of a language-agnostic application environment that competes with J2EE.

Away from the enterprise, and more towards the webby (and educational!) end of things, the state of semantic technology becomes less clear. There are big web apps such as the Twine social network, where the triples work very much in the background, or Powerset, where they are much more in your face, but to little apparent effect.

Much less polished, but much, much more interesting, is dbpedia.org: an attempt to put all public domain knowledge in a triple store. Yes, that includes Wikipedia, but also the US census and much more. DBpedia is accessible via a number of interfaces, including SPARQL. The result is the closest thing yet to a live instance of the semantic web as originally conceived, where it really is possible to ask questions like “give me all football players with number 11 shirts from clubs with stadiums with more than 40,000 seats, who were born in a country with more than 10M inhabitants”. Because of the inherent flexibility of a triple store and the power of the SPARQL interface, DBpedia could end up powering all kinds of other web applications and user interfaces.
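That question translates into SPARQL roughly as follows; the property names are approximations, since DBpedia’s vocabulary shifts between releases:

    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT ?player
    WHERE {
      ?player a dbo:SoccerPlayer ;
              dbo:number 11 ;              # shirt number
              dbo:team ?club ;
              dbo:birthPlace ?country .
      ?club dbo:ground ?stadium .
      ?stadium dbo:seatingCapacity ?capacity .
      ?country a dbo:Country ;
               dbo:populationTotal ?population .
      FILTER (?capacity > 40000 && ?population > 10000000)
    }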

Nearer to a real semantic web, though, is Yahoo’s well publicised move to start supporting the relevant standards. While the effect isn’t yet as obvious as semantic integration in the enterprise or DBpedia, it could end up being significant, for the simple reason that it focusses on the organisational model. It does that by processing data in various ‘semantic web light’ formats that are embedded in webpages, and using that data in the structuring and presentation of search results. If you want to present a nice set of handles on your site’s content in a Yahoo search results page (links to maps, contact info, schedules etc.), it’s time to start using RDFa or microformats.

Beyond the specifics of semantic standards or technologies at this point in time, though, lies the increasing demand for such tools. The volume and heterogeneity of data is increasing rapidly, not least because the means of fishing structured data out of unstructured data are improving. At the same time, the format of structured data (its syntax) is much less of an issue than it once was, as are the means of shifting that data around. What remains is making sense of that data, and that requires semantic technology.

Resources

The semantic technology conference site gives an overview, but no presentations, alas.

The California Blue Shield architecture was built with BBN’s Asio tool suite

More about the NetKernel resource oriented computing platform can be found on the 1060 website.

Twine is still in private beta, but will open up in overhauled form in the autumn.

Powerset is Wikipedia with added semantic sauce.

DBpedia is the community effort to gather all public domain knowledge in a triple store. There’s a page that outlines all ways of accessing it over the web.

Yahoo’s SearchMonkey is the project to utilise semweb light specs in search results.

The e-framework, social software and mashups
http://blogs.cetis.org.uk/wilbert/2007/10/10/the-e-framework-social-software-and-mashups/
Wed, 10 Oct 2007

The e-framework has just opened a wiki, and will happily accommodate Web APIs and mashups. We show how the former works by submitting an example of the latter.

The e-framework is all about sharing service oriented solutions in a way that can be easily replicated by other people in a similar situation. The situation of the Design for Learning programme is simple, and probably quite familiar: a group of very diverse JISC projects want to be able to share resources between each other, but also with a number of other communities. Some of these communities have pretty sophisticated sharing infrastructures built around institutional, national or even global repositories that do not interoperate with each other out of the box.

There are any number of sophisticated solutions to that issue, several of which are also being documented as e-framework Service Usage Models (SUMs). Sometimes, though, a simple, low-cost solution (preferably one that doesn’t require agreement between all the relevant infrastructures beforehand) is good enough.

Simplicity and low cost are the essence of the Design for Learning sharing SUM. It achieves that by concentrating on a collection of collaboratively authored bookmarks that point to the Design for Learning resources. That way, the collection can happily sit alongside any number of more sophisticated infrastructures that need to store the actual learning resources. It also makes the solution flexible, future-proof and applicable to any resource that can be addressed with a URL.

The JISC Design for Learning Sharing SUM diagram

There is, of course, slightly more to it than that, and that’s what the SUM is designed to draw out. The various headings that make up a SUM capture all the information that’s needed either to find the SUM’s parts (i.e. services), or to replicate, adapt or incorporate the SUM itself.

For example, the Design for Learning SUM shows how to link a bookmark store such as del.icio.us to a community website that can render a newsfeed. That is: it calls for a Syndicate service. This SUM doesn’t say which kind of Syndicate service exactly, but it does show where you have the choice and roughly what the choices are.

By the same token, the SUM can be taken up and tweaked to meet different needs with a minimum of effort. Accommodating different bookmark stores at the same time, for example, is a matter of adding one of several mash-up builders between the Syndicate service providers and the community website. Or you could simply refer to the whole SUM as one part of a much wider and more ambitious resource sharing strategy.

Fortunately, completing a SUM is a bit easier now that there’s a specialised wiki. Bits and bobs can be added gradually and collaboratively until the SUM can be submitted as the official version. Once that’s done, feedback from the wider e-framework community will follow, and the SUM will be hosted in the e-framework registry. You should be able to see how that works by watching the Design for Learning SUM page in the coming weeks.

The Design for Learning sharing SUM wiki page.
The Design for Learning support pages, which will have an implementation of the sharing SUM soon.
How you can start your own e-framework page is outlined on the wiki itself.

AJAX alliance to start interoperability work
http://blogs.cetis.org.uk/wilbert/2006/10/09/ajax-alliance-to-start-interoperability-work/
Sun, 08 Oct 2006

Funny how, after an initial development rush, a community around a new technology will hit some interoperability issues, and then start to address them via some kind of specification initiative. AJAX, the browser-side interaction technique that brought you Google Maps, is in that phase right now.

Making Asynchronous JavaScript and XML (AJAX) work smoothly matters, not only because it can help make webpages more engaging and responsive, but also because it is one of the most rapid ways to build front ends to web services.

It may seem a bit peculiar at first that there should be any AJAX interoperability issues at all, since it is built on a whole stack of existing, mature, open standards: the W3C’s XML and DOM for data and data manipulation, XHTML for webpages, ECMAScript (JavaScript) for scripting, and much more besides. Though there are a few compliance issues with those standards in modern browsers, that’s actually not the biggest interoperability problem.

That lies more in the fact that most AJAX libraries have been written with the assumption that they’ll be the only ones on the page. That is, in a typical AJAX application, an ECMAScript library is loaded along with the webpage, and starts to control the fetching and sending of data, and the recording of user clicks, drags, drops and more, depending on how exactly the whole application is set up.

This is all nice and straightforward unless there’s another library loaded that also assumes that it’s the only game in town, and starts manipulating the state of objects before the other library can do its job, or starts manipulating completely different objects that happen to have the same name.

Making sure that JavaScript libraries play nice is the stated aim of the OpenAjax Alliance. Formed earlier this year, the alliance now has a pretty impressive roster of all the major open source projects in the area, as well as major IT vendors such as Sun, IBM, Adobe and Google (OpenAjax Alliance). Pretty much everyone but Microsoft…

The main, concrete way in which the alliance wants to make sure that AJAX JavaScript libraries play nice with each other is by building the OpenAjax Hub. This is a set of standard JavaScript functions that address issues such as load order and component naming, but also provide a means for libraries to address each other’s functionality in a standard way.

For that to happen, the alliance first intends to build an open source reference implementation of the hub (OpenAjax Alliance). This piece of software is meant to control the load and execution order of libraries, and serve as a runtime registry of the libraries’ methods so that each can call on the other. This software is promised to appear in early 2007 (Infoworld), but the SourceForge filestore and subversion tree are still eerily empty (SourceForge).

It’d be a shame if the hub were to remain vapourware, because it is easy to see the benefits of a way to get a number of mature and focussed JavaScript libraries to work together in a single AJAX application. Done properly, it would make it much easier to string together such components rather than write all that functionality from scratch. This, in turn, could make it much easier to realise the mash-ups and composite applications made possible by the increasing availability of webservices.

Still, at least the white paper (OpenAjax Alliance) is well worth a look for a thorough non-techy introduction to AJAX.
