SOA only really works webscale

Just sat through a few more SOA talks today, and, as usual, the presentations circled ’round to governance pretty quick and stayed there.

The issue is this: SOA promises to make life more pleasant by removing duplication of data and functionality. Money is saved, and information is more accurate and flows more freely, because we tap directly into the source systems via their services.

So much for the theory. The problem is that organisations in SOA exercises have a well-documented tendency to re-invent their old monolithic applications as sets of isolated services that make most sense to themselves. And there goes the re-use argument: everyone uses their own set of services, with lots of data and functionality duplication.

Unless, of course, your organisation has managed to set up a Governance Police that makes everyone use the same set of centrally sanctioned services. Which is, let’s say, not always politically feasible.

Which made me think of how this stuff works on the original service-oriented architecture: the web. The most obvious attribute of the web, of course, is that there is no central authority over service provision and use. People just use what is most useful to them, and that is precisely the point. Instead of governance, the web has survival of the fittest: the search engine that gives the best answers gets used by everyone.

Trying to recreate that sort of Darwinian jungle within the enterprise seems both impossible and a little misguided. No organisation has the resources to just punt twenty versions of a single service in the full knowledge that at least nineteen will fail.

Or does it? Once you think about the issue webscale, such a trial-and-error approach begins to look more doable. For a start, an awful lot of current services are commodities that are the same across the board: email, calendars, CRM and so on. These are already being sourced from the web, and there are plenty more that could be punted by entrepreneurial shared-service providers with a nous for the education system (student record systems, HR and the like).

That leaves the individual HE institutions to concentrate on those services that provide data and functionality that are unique to themselves. Those services will survive, because users need them, and they’re also so crucial that institutions can afford to experiment before a version is found that does the job best.

I’ll weasel out of naming what those services will be: I don’t know. But I suspect it will be those that deal with the institution’s community (‘social network’ if you like) itself.

If Enterprise Architecture is about the business, where are the business people?

The Open Group Enterprise Architecture conference in Munich last month saw a first meeting of Dutch and British Enterprise Architecture projects in Higher Education.

Probably the most noticeable aspect of the session on enterprise architecture in higher education was the commonality of theme, not just between the Dutch and British HE institutions, but also between the HE contingent and the enterprise architects of the wider conference. There are various aspects to the theme, but it really boils down to one thing: how does a bunch of architects get a grip on, and then re-fashion, the structure of a whole organisation?

In the early days, the answer was simply to set the scope of an architecture job to the expertise and jurisdiction of the typical enterprise architect team: IT systems. Both in the notional goal of the architecting work and in its practice, though, that focus on IT alone seems too limiting. Even a relatively narrow interpretation of the frequently cited goal of enterprise architecture – to better align systems to the business – presupposes a heavy involvement of all departments and the clout to change practices across the organisation.

The HE projects reported a number of strategies for overcoming the conundrum. One popular method is to focus on one concrete development project at a time, and evolve an architecture iteratively. Another is to involve everyone by letting them determine and agree on a set of principles that underpin the architecture work before it starts. Yet other organisations tackle the scope and authority issue head-on and sort out governance structures before tackling the structure of the organisation, much as businesses tend to do.

In all of these cases, though, architects remain mostly focussed on IT systems, while staying wholly reliant on the rest of the organisation both for what the systems actually look like and for clues about what they should do.

Presentations can be seen on the JISC website.

Why compete with .doc?

Given the sheer ubiquity of Microsoft Office documents, it may seem a bit quixotic to invent a competing set of document formats, and drag it through the standards bodies, all the way to ISO. The Open Document Format people have just accomplished that, and are now being hotly pursued by … Microsoft and its preferred Office Open XML specification.

If the creation of interoperability between similar but different programs is the sole purpose of a standard, office documents don’t look like much of a priority. In so far as there is any competition to Microsoft’s Office at all, the first requirement of such programs is to read from, and write to, Office’s file formats as if they were Office itself. Most of these competitors have succeeded to such an extent that it has entrenched the formats even further. For example, if you want to use the same spreadsheet on a Palm and Google Spreadsheet, or send it to an OpenOffice-using colleague as well as a complete stranger, Excel’s .xls is practically your only choice.

Yet the Open Document Format (ODF) has slowly wound its way from its origins in the OpenOffice file format through the OASIS specification body, and is now a full ISO standard, the first of its kind. But not necessarily the last: the confusingly named Office Open XML (OOXML) is already an Ecma specification, and the intention is to shift it to ISO too.

To understand why the ODF and OOXML standards steeplechase is happening at all, it is necessary to look beyond the basic interoperability scenario of lecturer A writing a handout that’s then downloaded and looked at by student B, and perhaps printed by lecturer C next year. That sort of user-to-user sharing was solved a long time ago, once Corel’s suite ceased to be a major factor. But there are some longer-running issues with data exchange at enterprise level, and with document preservation.

Enterprise data exchange is perhaps the more pressing of the two. For an awful lot of information management purposes, it would be handy to have the same information in a predictable, structured format. Recent JISC project calls, for example, ask for stores of course validation and course description data, preferably in XML. Such info needs to be written by practitioners, which usually means trying to extract that structured data from a pile of opaque, binary Word files. It’d be much easier to extract it from a set of predictable XML files.

Provided you use a template or form, that’s precisely what the new XML office formats offer. Both ODF and OOXML try to separate information from presentation and metadata. Both store data as XML, keep other media such as images in separate directories, and stick the lot in a Zip-compressed archive. Yes, that’s very similar indeed to IMS Content Packaging, so transforming either ODF or OOXML slides into that format for use in a VLE shouldn’t be that difficult either. It’d also be much easier to automatically put both enterprise data and course content back into office documents.
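As a quick way of seeing that structure for yourself, here is a minimal sketch, assuming Node.js with the JSZip package (any zip reader would do), that lists the parts of a document package and pulls out the main XML; the file name handout.odt is invented for the example.

    import { promises as fs } from "fs";
    import JSZip from "jszip"; // any zip library works; JSZip is just one option

    // List the parts of an ODF or OOXML package and print the start of the main XML.
    async function dumpPackage(path: string): Promise<void> {
      const zip = await JSZip.loadAsync(await fs.readFile(path));

      // Both formats are plain zip archives, so listing the entries shows the layout:
      // ODF:   content.xml, styles.xml, meta.xml, Pictures/...
      // OOXML: word/document.xml, word/styles.xml, word/media/..., docProps/core.xml
      zip.forEach((name) => console.log(name));

      // The document body itself is ordinary XML, ready for an XSLT or a script.
      const body =
        (await zip.file("content.xml")?.async("string")) ?? // ODF
        (await zip.file("word/document.xml")?.async("string")); // OOXML
      console.log(body?.slice(0, 400));
    }

    dumpPackage("handout.odt").catch(console.error);

From there, turning the XML into course validation records, IMS packages or anything else is a matter of ordinary transformation work rather than reverse engineering.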

The fact that the new formats are based on XML explains their suitability as a source for any number of data manipulation workflows, but it is the preservation angle that explains the standards bodies aspect. Even if access to current Office documents is not an issue for most users, that’s only because a number of companies are willing to do Microsoft’s bidding, and at least one set of open source developers has been prepared to spend countless tedious hours trying to reverse engineer the formats. One move by Microsoft and that whole expensive license-or-reverse-engineer business could start again.

For most governments, that’s not good enough. Access must be guaranteed over decades or more, to anyone who comes along, without let or hindrance. It is this aspect that has driven most of the development of ODF in particular, and OOXML by extension. The Massachusetts state policy especially, with its insistence on the storage of public information in non-proprietary formats, led to Microsoft first giving a much more complete description of its XML format, and later to assurances that it wouldn’t assert discriminatory or royalty-bearing patent licenses. The state is still going to use ODF, though, not OOXML.

On technical merit, you can see why that would be: OOXML is a pretty hideous concoction that looks like it closely encodes layers of Microsoft Office legacy in the most obtuse XML possible. The spec is a 47 Mb whopper running to 6,039 pages. On the upside, its single- or double-character terseness can make it more compact in certain cases, and Microsoft touts its support for legacy documents. That appears to be true mainly if the OOXML implementation is Microsoft Office, and if you believe legacy formats should be dealt with in a new format rather than simply in a new converter.

ODF is much simpler, far easier to read for anyone who has perused XHTML or other common XML applications, and much more generalisable. It could be less efficient in some circumstances, though, because of its verbosity and because it allows mixed content (both character data and tags inside an element), and it doesn’t support some of the more esoteric programming extensions that, for example, Excel does.
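To make the contrast concrete, the fragments below show roughly how each format marks up the same short sentence, together with the kind of crude text extraction that works on either. The snippets are simplified illustrations, without the namespace declarations and most of the attributes that real files carry.

    // Illustrative, simplified fragments; real files carry namespace declarations
    // and far more attributes than shown here.
    const odfParagraph =
      '<text:p text:style-name="Standard">Hello <text:span text:style-name="T1">world</text:span></text:p>';
    const ooxmlParagraph =
      '<w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>';

    // A crude tag-stripping pass recovers the text from either; a real pipeline
    // would use a proper XML parser, but the point is that the data is reachable
    // with ordinary tools rather than a reverse-engineered binary reader.
    const stripTags = (xml: string): string => xml.replace(/<[^>]+>/g, "");
    console.log(stripTags(odfParagraph));   // "Hello world"
    console.log(stripTags(ooxmlParagraph)); // "Hello world"

Even in a toy example, ODF reads like the XHTML-ish markup it is, while OOXML’s runs and two-letter element names take some decoding.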

All other things being equal, the simpler, more comprehensible option should win every time. Alas for ODF, things aren’t equal, because OOXML is going to be the default file format in Microsoft Office 2007. That simple fact alone will probably ensure it succeeds. Whether it matters is probably going to depend on whether you need to work with the formats on a deeply technical level.

For most of us, it is probably good enough that the work we all create throughout our lives is that little bit more open and future-proofed, and that little bit less tied to what one vendor chooses to sell us at the moment.

AJAX alliance to start interoperability work

Funny how, after an initial development rush, a community around a new technology will hit some interoperability issues, and then start to address them via some kind of specification initiative. AJAX, the browser-side interaction technique that brought you Google Maps, is in that phase right now.

Making Asynchronous JavaScript and XML (AJAX) work smoothly matters, not only because it can help make webpages more engaging and responsive, but also because it is one of the most rapid ways to build front ends to web services.
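For anyone who hasn’t peered under the bonnet, the pattern itself is small. The sketch below fetches some XML in the background and updates part of the page without a reload; the service URL and element id are invented for the example.

    // Fetch XML from a web service asynchronously and update part of the page.
    const request = new XMLHttpRequest();
    request.open("GET", "/services/timetable.xml", true); // true = asynchronous
    request.onreadystatechange = () => {
      if (request.readyState === 4 && request.status === 200) {
        const slots = request.responseXML?.getElementsByTagName("slot");
        const target = document.getElementById("timetable");
        if (slots && target) {
          target.textContent = slots.length + " teaching slots this week";
        }
      }
    };
    request.send();

Everything else, from libraries to drag-and-drop widgets, is layered on top of that one asynchronous round trip.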

It may seem a bit peculiar at first that there should be any AJAX interoperability issues at all, since it is built on a whole stack of existing, mature, open standards: the W3C’s XML and DOM for data and data manipulation, XHTML for webpages, ECMAScript (JavaScript) for scripting, and much more besides. Though there are a few compliance issues with those standards in modern browsers, that’s not actually the biggest interoperability problem.

That lies more in the fact that most AJAX libraries have been written with the assumption that they’ll be the only ones on the page. That is, in a typical AJAX application, an ECMAScript library is loaded along with the webpage, and starts to control the fetching and sending of data, and the recording of user clicks, drags, drops and more, depending on how exactly the whole application is set up.

This is all nice and straightforward unless there’s another library loaded that also assumes that it’s the only game in town, and starts manipulating the state of objects before the other library can do its job, or starts manipulating completely different objects that happen to have the same name.
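A minimal sketch of the clash, with the two ‘libraries’ and their behaviour invented purely to show the mechanism:

    // "Library A" wires up its behaviour when the page loads.
    window.onload = () => {
      console.log("Library A: fetching data and decorating the page");
    };

    // "Library B", loaded afterwards, assigns to the same property and silently
    // throws Library A's handler away; only B's setup code ever runs.
    window.onload = () => {
      console.log("Library B: registering drag-and-drop handlers");
    };

    // The same goes for identically named globals: whichever script defines
    // $ last wins, and code written against the other library quietly breaks.
    (window as any).$ = (id: string) => document.getElementById(id);

Defensive tricks such as chaining the previous onload handler or using addEventListener help, but only if every library on the page plays along.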

Making sure that JavaScript libraries play nice is the stated aim of the OpenAjax alliance. Formed earlier this year, the alliance now has a pretty impressive roster of all the major open source projects in the area as well as major IT vendors such as Sun, IBM, Adobe and Google (OpenAjax Alliance). Pretty much everyone but Microsoft…

The main, concrete way in which the alliance wants to make sure that AJAX JavaScript libraries play nice with each other is by building the OpenAjax hub. This is a set of standard JavaScript functions that address issues such as load order and component naming, but also give libraries a standard way of addressing each other’s functionality.

For that to happen, the alliance first intends to build an open source reference implementation of the hub (OpenAjax Alliance). This piece of software is meant to control the load and execution order of libraries, and serve as a runtime registry of the libraries’ methods so that each can call on the other. This software is promised to appear in early 2007 (Infoworld), but the SourceForge filestore and subversion tree are still eerily empty (SourceForge).
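Since that code isn’t out yet, any example has to be speculative. The sketch below is a hypothetical illustration of the kind of runtime registry the hub is described as providing; names such as AjaxHub and registerLibrary are invented here, not taken from the alliance’s actual API.

    // Hypothetical registry: libraries claim a unique prefix instead of grabbing
    // globals, and call each other by name through the hub.
    type LibraryInfo = {
      version: string;
      api: Record<string, (...args: any[]) => unknown>;
    };

    const registry = new Map<string, LibraryInfo>();

    const AjaxHub = {
      // Refuse duplicate prefixes, so name clashes surface at load time.
      registerLibrary(prefix: string, info: LibraryInfo): void {
        if (registry.has(prefix)) {
          throw new Error('prefix "' + prefix + '" is already registered');
        }
        registry.set(prefix, info);
      },

      // Look a library up by prefix rather than assuming a global name.
      call(prefix: string, method: string, ...args: unknown[]): unknown {
        return registry.get(prefix)?.api[method]?.(...args);
      },
    };

    // Usage: a widget library registers itself, and other code calls it by name.
    AjaxHub.registerLibrary("dragdrop", {
      version: "0.1",
      api: { enable: (id: string) => console.log("drag enabled on " + id) },
    });
    AjaxHub.call("dragdrop", "enable", "sidebar");

Whether the real hub ends up looking anything like this remains to be seen, of course.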

It’d be a shame if the hub remained vapourware, because it is easy to see the benefits of a way to get a number of mature and focussed JavaScript libraries to work together in a single AJAX application. Done properly, it would make it much easier to string such components together rather than write all that functionality from scratch. This, in turn, could make it much easier to realise the mash-ups and composite applications made possible by the increasing availability of web services.

Still, at least the white paper (OpenAjax Alliance) is well worth a look for a thorough non-techy introduction to AJAX.