Wilbert Kraan » enterprise | Cetis blog | http://blogs.cetis.org.uk/wilbert

Subject coding is changing from JACS3 to HECoS; here’s what’s different
http://blogs.cetis.org.uk/wilbert/2015/04/22/subject-coding-is-changing-from-jacs3-to-hecos-heres-whats-different/
Wed, 22 Apr 2015

From UCAS applications to HESA returns, and from league tables to the academic technology approval scheme, degree programmes and modules are classified by subject. JACS3 does that job now, but HECoS will do it in the future. Here are the main differences.

After many years of use, the Joint Academic Coding System (JACS) that’s pervasive in UK Higher Education data sets ran into some limits: it was running out of codes in some subject areas, and it was being used for many more purposes than it was originally designed to support.

That’s why the Higher Education Data and Information Improvement Programme (HEDIIP) commissioned CETIS, in collaboration with APS and Aspire, to consult with the sector on a replacement of the vocabulary. The result of that work is the Higher Education Coding of Subjects (HECoS) vocabulary. HECoS has now reached the penultimate stage in that a release candidate is out for consultation, as are proposals for the governance and adoption of the scheme.

The whole vocabulary can be seen on our tematres development site, and reports on the development of HECoS, as well as the proposals for governance and adoption, are available from the consultation site.

Here are the main differences between JACS3 and HECoS in a nutshell, though:

One flat list, no hierarchies, and no memorable codes

This is easily the biggest and most noticeable change. HECoS itself is just a list of terms without any implied or given groupings. That doesn’t mean groupings and hierarchies aren’t important, quite the contrary: different organisations have different uses for subject information, and that means they can group subjects differently.

In a way, that follows on from what’s already happening with JACS3 in practice. The definition of which subjects constitute biological sciences, for example, already differs between JACS3, HEFCE and what a typical university is likely to be able to offer. Different drivers and different contexts lead these organisations to group subjects differently, and HECoS is designed to enable different groupings to exist side by side, whilst still sharing the same subject terms.

[Diagram: HECoS with many hierarchies]

A consequence of the approach is that the familiar JACS3 codes (“L3xx” is anything sociological etc.) are no longer valid. From the perspective of HECoS, “sociolinguistics” therefore has no defined link with “sociology”: the code for the former is simply “101016” (or a URI that encodes that number, such as http://hecos.hediip.ac.uk/terms/101016), while the code for the latter is “100505”.

For ease of navigation, however, HECoS will come with some common groupings. There is a “sociology group” that has both “sociolinguistics” and “sociology” in it. This is just to help people find terms, and nodes like “sociology group” cannot be used to classify a degree programme or module.
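
To make the flat-list-plus-groupings idea concrete, here is a minimal sketch (in Python, purely for illustration) of how an organisation could layer its own grouping over the shared terms. The two codes and the URI pattern come from the examples above; everything else is assumed.

    HECOS_BASE = "http://hecos.hediip.ac.uk/terms/"

    # The flat HECoS list: opaque codes mapped to term labels.
    # These two codes are the examples given in the text above.
    terms = {
        "100505": "sociology",
        "101016": "sociolinguistics",
    }

    # One possible local grouping; another organisation could group the very
    # same codes differently without touching the shared term list.
    groups = {
        "sociology group": ["100505", "101016"],
    }

    def term_uri(code):
        """Return the URI form of a HECoS code."""
        return HECOS_BASE + code

    for group_name, codes in groups.items():
        print(group_name)
        for code in codes:
            print("  ", code, terms[code], term_uri(code))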

Terms are based on demonstrated use, need and distinguishability

While JACS was reviewed periodically, it hasn’t always had formal acceptance criteria, either for the terms that were already in there or for newly proposed ones. HECoS does have a proposal for such criteria, which has already been applied in the development of the current draft.

The criteria for the first cut were, in short:

  1. is the term in JACS3?
  2. is there evidence of use of the term in HESA data returns?
  3. is the term’s definition and scope sufficiently clear and comprehensive to allow classification?
  4. is the term reliably distinguishable from other terms?

The first criterion comes out of a recognition that JACS has imposed a structure and created its own reality over the years. That’s a good thing, and worth preserving for time series analysis reasons alone. The second criterion addresses an issue that has bedevilled JACS for a while: many terms were sound in theory, but barely or never used in practice. This creates confusion and often makes coding unreliable: what good is a term if it groups one degree programme in one institution? For that reason, we looked at whether a term has at least two degree programmes in at least two institutions in HESA student data returns.
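
As an illustration of how that usage criterion could be checked, here is a rough sketch against a HESA-style table of student returns. The column names and the sample rows are made up; the real returns use HESA’s own field names.

    import pandas as pd

    # Hypothetical, simplified student return: one row per programme/subject pairing.
    returns = pd.DataFrame([
        {"subject_code": "100505", "programme": "BA Sociology",        "institution": "A"},
        {"subject_code": "100505", "programme": "BSc Sociology",       "institution": "B"},
        {"subject_code": "101016", "programme": "MA Sociolinguistics", "institution": "A"},
    ])

    usage = returns.groupby("subject_code").agg(
        programmes=("programme", "nunique"),
        institutions=("institution", "nunique"),
    )

    # Keep terms used by at least two degree programmes in at least two institutions.
    in_use = usage[(usage["programmes"] >= 2) & (usage["institutions"] >= 2)]
    print(in_use)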

The third criterion has to do with the way some JACS terms were defined: some were incomplete –e.g. “history by topic” without specifying what that topic was– or were not defined tightly enough to determine what was in or out. The final criterion of distinguishability is related to that: we examined the HESA returns for consistency of coding. Where the spread of similar degree programmes over several terms indicated that people were struggling to distinguish between terms, we rearranged terms so that they follow the groupings that were obvious in the data as closely as possible. We’ve also started to test any such changes with sorting exercises to ensure that people can indeed distinguish between four related terms.
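
The consistency check can be sketched in the same vein: if near-identical programme titles are spread over several different codes, coders are probably struggling to tell those terms apart. The titles, codes and the crude normalisation below are all invented for illustration.

    import pandas as pd

    returns = pd.DataFrame([
        {"programme": "BSc Sports Science", "subject_code": "term_A"},
        {"programme": "BSc Sport Science",  "subject_code": "term_B"},
        {"programme": "BSc Sports Science", "subject_code": "term_A"},
    ])

    # Very crude title normalisation; a real exercise would involve manual
    # review and the sorting exercises mentioned above.
    returns["title_key"] = (returns["programme"]
                            .str.lower()
                            .str.replace("sports", "sport", regex=False))

    spread = returns.groupby("title_key")["subject_code"].nunique()
    print(spread[spread > 1])  # titles whose coding looks inconsistent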

A commonly administered change process

Just like JACS evolved over the years, so will HECoS. The difference is that we are proposing to regularise the change and allow it to follow a predictable path. The main mechanism for that would be a registry for new terms. The diagram below outlines how a new subject term can be discovered, entered for consideration for inclusion, and then discovered by others.

[Diagram: the proposed process for registering a new term]

The proposed criteria for accepting a new term into HECoS proper are similar to the ones used for the first draft: a term has to be demonstrably in use, or fill a need, and be distinguishable by non-specialists. In each case, though, the HECoS governance body, which is designed to represent the whole sector, will have the ultimate say on which terms will be accepted or retired, and how often these changes will happen.

A simpler sourcing maturity assessment approach
http://blogs.cetis.org.uk/wilbert/2013/11/29/a-simpler-sourcing-maturity-assessment-approach/
Fri, 29 Nov 2013

Knowing how to procure your IT services, software and hardware is a vital function in any organisation. Assessing one’s maturity in this aspect can be complex, which is why SURF developed a simpler approach.

There are a number of perspectives to take on IT and its place in an organisation, but for further and higher education institutions, the procurement or sourcing of services – in the widest sense of the word ‘services’ – may be among the most important ones. With the ongoing move to cloud provisioning, determining where a particular service is going to come from and how it is managed is crucial.

A number of approaches to measure and improve an organisation’s maturity in this area exist, but, as Bert van Zomeren points out in the EUNIS paper that presents the SURF Sourcing Maturity Assessment Approach, these are quite complex. They can be so sophisticated that organisations hire consultancies to do the assessment for them. The SURF method doesn’t go quite as deep as those exercises, but it is a much easier first step.

The heart of the approach is simple: a champion identifies the key stakeholders in the organisation with regard to the sourcing process, each of the stakeholders fills out the questionnaire, the results are analysed, the stakeholders meet, and appropriate adjustments to the process are agreed upon.

As in many of these approaches, the questions in the questionnaire describe an ideal situation, and respondents are asked to rate, on a scale, how closely they think their organisation resembles that ideal. Some of these ideals may be uncontroversial, but others may well provoke debate – adapting processes to suit services, rather than the other way round, for example. Still, such a debate can be a valuable input into the wider maturation process.

I’ve just translated the questionnaire into English, and it has been made available as a combination Google form and spreadsheet. To test it yourself, you need to sign into Google drive, put the form and spreadsheet into your drive, then make copies. The spreadsheet has two sheets: one that gathers the data and another that turns the data into a crude, but extensible report.
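
As a rough idea of what such a crude report might look like, the sketch below averages the stakeholders’ scores per question and flags where they disagree most. The question labels and the 1–5 scale are placeholders, not the actual SURF questionnaire items.

    import statistics

    # Hypothetical responses: stakeholder -> question -> score on a 1-5 scale.
    responses = {
        "Registrar":   {"Q1 processes adapted to services": 4, "Q2 sourcing strategy agreed": 2},
        "IT director": {"Q1 processes adapted to services": 2, "Q2 sourcing strategy agreed": 3},
    }

    questions = next(iter(responses.values())).keys()
    for q in questions:
        scores = [r[q] for r in responses.values()]
        print(q,
              "mean:", round(statistics.mean(scores), 1),
              "spread:", max(scores) - min(scores))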

It’d probably be a good idea to read van Zomeren and Levinson’s short EUNIS paper before you start. There is a much more extensive guide to the approach in Dutch as well, but we thought we’d gather some feedback first before translating that as well. A guide of that sort will almost certainly be necessary in order to use the simpler sourcing maturity assessment approach in anger at an institution.

Doing analytics with open source linked data tools
http://blogs.cetis.org.uk/wilbert/2012/05/28/doing-analytics-with-open-source-linked-data-tools/
Sun, 27 May 2012

Like most places, the University of Bolton keeps its data in many stores. That’s inevitable with multiple systems, but it makes getting a complete picture of courses and students difficult. We test an approach that promises to integrate all this data, and some more, quickly and cheaply.

Integrating a load of data in a specialised tool or data warehouse is not new, and many institutions have been using them for a while. What Bolton is trying in its JISC sponsored course data project is to see whether such a warehouse can be built out of Linked Data components. Using such tools promises three major advantages over existing data warehouse technology:

It expects data to be messy, and it expects it to change. As a consequence, adding new data sources, or coping with changes in data sources, or generating new reports or queries should not be a big deal. There are no schemas to break, so no major re-engineering required.

It is built on the same technology as the emergent web of data. Which means that increasing numbers of datasets – particularly from the UK government – should be easily thrown into the mix to answer bigger questions, and public excerpts from Bolton’s data should be easy to contribute back.

It is standards based. At every step from extracting the data, transforming it and loading it to querying, analysing and visualising it, there’s a choice of open and closed source tools. If one turns out not to be up to the job, we should be able to slot another in.

But we did spend a day kicking the tires and making some initial choices. Since the project is just to pilot a Linked Enterprise Data (LED) approach, we’ve limited ourselves to evaluating only open source tools. We know there are plenty of good closed source options in any of the following areas, but we’re going to test the whole approach before committing to license fees.

Data sources


Before we can mash, query and visualise, we need to do some data extraction from the sources, and we’ve settled on two tools for that: Google Refine and D2RQ. They do slightly different jobs.

Refine is Google’s power tool for anyone who has to deal with malformed data, or who just wants to transform or excerpt data from one format to another. It takes in CSV or output from a range of APIs, and puts it in table form. In that table form, you can perform a wide range of transformations on the data, and then export it in a range of formats. The plug-in from DERI Galway allows you to specify exactly how the RDF – the linked data format, and heart of the approach – should look when exported.

What Refine doesn’t really do (yet?) is transform data automatically, as a piece of middleware. All your operations are saved as a script that can be re-applied, but it won’t re-apply the operations entirely automagically. D2RQ does do that, and works more like middleware.

Although I’ve known D2RQ for a couple of years, it still looks like magic to me: you download it, unzip it, and tell it where your common or garden relational database is, and what username and password it can use to get in. It’ll go off, inspect the contents of the database, and come back with a mapping of the contents to RDF. Then start the server that comes with it, and the relational database can be browsed and queried like any other Linked Data source.

Since practically all relevant data in Bolton are in a range of relational databases, we’re expecting to use D2R to create RDF data dumps that will be imported into the data warehouse via a script. For a quick start, though, we’ve already made some transforms with Refine. We might also use scripts such as Oxford’s XCRI XML to RDF transform.
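
As a sketch of what that timed extract script might look like, the snippet below pulls everything out of the D2R server’s SPARQL endpoint as RDF/XML and writes it to a file for loading. The endpoint URL and the catch-all CONSTRUCT query are assumptions; a production script would dump the data in more manageable chunks.

    import urllib.parse
    import urllib.request

    # D2R Server's default location is assumed here; adjust to the real deployment.
    ENDPOINT = "http://localhost:2020/sparql"

    # Catch-all query: copy every triple the mapping exposes.
    query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"

    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})

    with urllib.request.urlopen(req) as resp, open("prod_dump.rdf", "wb") as out:
        out.write(resp.read())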

Storage, querying and visualisation


We expected to pick different tools for each of these functions, but ended up choosing one that does it all, after a fashion. Callimachus is designed specifically for rapid development of LED applications, and the standard download includes a version of the Sesame triplestore (or RDF database) for storage. Other triple stores can also be used with Callimachus, but Sesame was on the list anyway, so we’ll see how far that takes us.

Callimachus itself is more of a web application on top that allows quick visualisations of data excerpts, be they straight records of one dataset or a collection of data about one thing from multiple sets. The queries that power the Callimachus visualisations have limitations – compared to the full power of SPARQL, the linked data query language – but are good enough to knock up some pages quickly. For the more involved visualisations, Callimachus’ SPARQL 1.1 implementation allows the results of a query to be put out as common or garden JSON, for which many different tools exist.
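
To give a flavour of that JSON route, here is a minimal sketch that runs a SELECT query against the store and prints the bindings, ready to be fed to whatever charting library is to hand. The endpoint URL and the example class and property URIs are assumptions, not the project’s actual data model, and it uses the SPARQLWrapper Python library rather than anything Callimachus-specific.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Assumed endpoint location; the class and property URIs below are placeholders.
    sparql = SPARQLWrapper("http://localhost:8080/sparql")
    sparql.setQuery("""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?course ?title WHERE {
            ?course a <http://example.org/vocab/Course> ;
                    dct:title ?title .
        } LIMIT 20
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["course"]["value"], row["title"]["value"])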

Next steps

We’ve made some templates already that pull together course information from a variety of sources, on which I’ll report later. While that’s going on, the main other task will be to set up the processes of extracting data from the relational databases using D2R, and then loading it into Callimachus using timed scripts.

Approaches to building interoperability and their pros and cons
http://blogs.cetis.org.uk/wilbert/2012/01/28/approaches-to-building-interoperability-and-their-pros-and-cons/
Fri, 27 Jan 2012

System A needs to talk to System B. Standards are the ideal to achieve that, but pragmatics often dictate otherwise. Let’s have a look at what approaches there are, and their pros and cons.

When I looked at the general area of interoperability a while ago, I observed that useful technology becomes ubiquitous and predictable enough over time for the interoperability problem to go away. The route to get to such commodification is largely down to which party – vendors, customers, domain representatives – is most powerful and what their interests are. Which describes the process very nicely, but doesn’t help solve the problem of connecting stuff now.

So I thought I’d try to list what the choices are, and what their main pros and cons are:

A priori, global
Also known as de jure standardisation. Experts, user representatives and possibly vendor representatives get together to codify whole or part of a service interface between systems that are emerging or don’t exist yet; it can concern either the syntax, semantics or transport of data. Intended to facilitate the building of innovative systems.
Pros:

  • Has the potential to save a lot of money and time in systems development
  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Viability depends on the business model of all relevant vendors
  • Fairly unlikely to fit either actually available data or integration needs very well

A priori, local
i.e. some type of Service Oriented Architecture (SOA). Local experts design an architecture that codifies syntax, semantics and operations into services. Usually built into agents that connect to each other via an enterprise service bus (ESB).
Pros:

  • Can be tuned for locally available data and to meet local needs
  • Facilitates structured management of network over time
  • Speeds up changes in the network (relative to ad hoc, local)

Cons:

  • Requires major and continuous governance effort
  • Requires upfront investment
  • Integration of a new system still takes time and effort

Ad hoc, local
Custom integration of whatever is on an institution’s network by the institution’s experts in order to solve a pressing problem. Usually built on top of existing systems using whichever technology is to hand.
Pros:

  • Solves the problem of the problem owner fastest in the here and now.
  • Results accurately reflect the data that is actually there, and the solutions that are really needed

Cons:

  • Non-transferable beyond local network
  • Needs to be redone every time something changes on the local network (considerable friction and cost for new integrations)
  • Can create hard to manage complexity

Ad hoc, global
Custom integration between two separate systems, done by one or both vendors. Usually built as a separate feature or piece of software on top of an existing system.
Pros:

  • Fast point-to-point integration
  • Reasonable to expect upgrades for future changes

Cons:

  • Depends on business relations between vendors
  • Increases vendor lock-in
  • Can create hard to manage complexity locally
  • May not meet all needs, particularly cross-system BI

Post hoc, global
Also known as standardisation, consortium style. Service provider and consumer vendors get together to codify a whole service interface between existing systems; syntax, semantics, transport. The resulting specs usually get built into systems.
Pros:

  • Facilitates easy, cheap integration
  • Facilitates structured management of network over time

Cons:

  • Takes a long time to start, and is slow to adapt
  • Depends on business model of all relevant vendors
  • Liable to fit either available data or integration needs poorly

Clearly, no approach offers instant nirvana, but it does make me wonder whether there are ways of combining approaches such that we can connect short term gain with long term goals. I suspect if we could close-couple what we learn from ad hoc, local integration solutions to the design of post-hoc, global solutions, we could improve both approaches.

Let me know if I missed anything!

The cloud is for the boring
http://blogs.cetis.org.uk/wilbert/2011/04/15/the-cloud-is-for-the-boring/
Fri, 15 Apr 2011

Members of the Strategic Technologies Group of the JISC’s FSD programme met at King’s Anatomy Theatre to, ahem, dissect the options for shared services and the cloud in HE.

The STG’s programme included updates on projects of the members as well as previews of the synthesis of the Flexible Service Delivery programme of which the STG is a part, and a preview of the University Modernisation Fund programme that will start later in the year.

The main event, though, was a series of parallel discussions on business problems where shared services or cloud solutions could make a difference. The one I was at considered a case from the CUMULUS project: how to extend rather than replace a Student Record System in a modular way.

[Photo: view from the King's anatomy theatre up to the clouds]

In the event, a lot of the discussion revolved around what services could profitably be shared in some fashion. When the group looked at what is already being run on shared infrastructure and what has proven very difficult, the pattern is actually very simple: the more predictable, uniform, mature, well understood and inessential to the central business of research and education, the better. The more variable, historically grown, institution specific and bound up with the real or perceived mission of the institution or parts thereof, the worse.

Going round the table to sort the soporific cloudy sheep from the exciting, disputed, in-house goats, we came up with the following lists:

Cloud:

  • email
  • Travel expenses
  • HR
  • Finance
  • Student network services
  • Telephone services
  • File storage
  • Infrastructure as a Service

In house:

  • Course and curriculum management (including modules etc)
  • Admissions process
  • Research processes

This ought not to be a surprise, of course: the point of shared services – whether in the cloud or anywhere else – is economies of scale. That means that the service needs to be the same everywhere, doesn’t change much or at all, doesn’t give the users a competitive advantage and has well understood and predictable interfaces.

Enterprise Architecture throws out bath water, saves baby in the nick of time
http://blogs.cetis.org.uk/wilbert/2010/10/19/enterprise-architecture-throws-out-bath-water-saves-baby-in-the-nick-of-time/
Tue, 19 Oct 2010

Enterprise architecture started as a happily unreconstituted techy activity. When that didn’t always work, a certain Maoist self-criticism kicked in, with an exaltation of “the business” above all else, and taboos on even thinking about IT. Today’s Open Group sessions threatened to take that reaction to its logical extreme. Fortunately, it didn’t quite end up that way.

The trouble with realising that getting anywhere with IT involves changing the rest of the organisation as well, is that it gets you out of your assigned role. Because the rest of the organisation is guaranteed to have different perspectives on how it wants to change (or not), what the organisation’s goals are and how to think about its structure, communication is likely to be difficult. Cue frustration on both sides.

That can be addressed by going out of your way to go to “the business”, talk its language, worry about its concerns and generally go as native as you can. This is popular to the point of architects getting as far away from dirty, *dirty* IT as possible in the org chart.

So when I saw the sessions on “business architecture”, my heart sank. More geeks pretending to be suits, like a conference hall full of dogs trying to walk on their hind legs, and telling each other how it’s the future.

When we got to the various actual case reports in the plenary and business transformation track, however, it turned out that EA self-negation is not quite what’s happening in reality. Yes, speaker after speaker emphasised the need to talk to other parts of the organisation in their own language, and the need to only provide relevant information to them. Tom Coenen did a particularly good job of stressing the importance of listening while the rest of the organisation do the talking.

But, crucially, that doesn’t negate that – behind the scenes – architects still model. Yes, for their own sake, and solely in order to deliver the goals agreed with everyone else, but even so. And, yes, there are servers full of software artefacts in those models, because they are needed to keep the place running.

This shouldn’t be surprising. Enterprise architects are not hired to decide what the organisation’s goals are, what its structure should be or how it should change. Management does that. EA can merely support by applying its own expertise in its own way, and worry about the communication with the rest of the organisation both when requirements go in and a roadmap comes out (both iteratively, natch).

And ‘business architecture’? Well, there still doesn’t appear to be a consensus among the experts on what it means, or how it differs from EA. If anything, it appears to be a description of an organisation using a controlled vocabulary that looks as close as possible to non-domain-specific natural language. That could help with intra-disciplinary communication, but the required discussion about concepts and the words to refer to them makes me wonder whether having a team who can communicate as well as they can model might not be quicker and more precise.

Bare bones TOGAF
http://blogs.cetis.org.uk/wilbert/2010/10/18/bare-bones-togaf/
Mon, 18 Oct 2010

Do stakeholder analysis. Cuddle the uninterested powerful ones, forget about the enthusiasts without power. Agree goal. Deliver implementable roadmap. The rest is just nice-to-have.

That was one message from today’s slot on The Open Group’s Architecture Framework (TOGAF) at the Open Group’s quarterly meeting in Amsterdam. In one session, two self-described “evil consultants” ran a workshop on how to extract the most value from an Enterprise Architecture (EA) approach to institutional change.

While they agreed about the undivided primacy of keeping the people with power happy when doing EA, the rest of their approach differed more markedly.

Dave Hornford zeroed in mercilessly on the do-able roadmap as the centre of the practice. But before that, find those all-powerful stakeholders and get them to agree on the organisational vision and its goal. If there is no agreement: celebrate. You’ve just saved the organisation an awful lot of money on an expensive and unimplementable EA venture.

Once past that hurdle, Dave contended that the roadmap should identify what the organisation really needs – which may not always be sensible or pretty.

Jason Uppal took a slightly wider view, focussing on the balance between quick wins and how to make EA the norm in an organisation.

The point about ‘quick wins’ is that both ‘quick’ and ‘win’ are relative. It is possible to go after a long term value proposition with a particular change, as long as you have a series of interim solutions that provide value now. Even if you throw them away again later. And the first should preferably have no cost.

That way, EA can become part of the organisation’s practice: by providing value. This does pre-suppose that the EA practice is neither a project nor a programme – just a practice.

An outline of the talks is available on the Open Group’s website.

Linked Data meshup on a string
http://blogs.cetis.org.uk/wilbert/2010/02/25/linked-data-meshup-on-a-string/
Thu, 25 Feb 2010

I wanted to demo my meshup of a triplised version of CETIS’ PROD database with the impressive Linked Data Research Funding Explorer at the Linked Data meetup yesterday. I couldn’t find a good slot that would still let me make my train home, so here’s a broad outline:

The data

The Department for Business Innovation and Skills (BIS) asked Talis if they could use the Linked Data Principles and practice demonstrated in their work with data.gov.uk to produce an application that would visualise some grant data. What popped out was a nice app with visuals by Iconomical, based on a couple of newly available data sets that sit on Talis’ own store for now.

The data concerns research investment in three disciplines, which are illustrated per project, by grant level and number of patents, as they changed over time and plotted on a map.

CETIS have PROD; a database of JISC projects, with a varying amount of information about the technologies they use, the programmes they were part of, and any cross links between them.

The goal

Simple: it just ought to be possible to plot the JISC projects alongside the advanced tech of the Research Funding Explorer. If not, then at least the data in PROD should be augmentable with the data that drives the Research Funding Explorer.

Tools

Anything I could get my hands on, chiefly the D2R toolkit, a SPARQL endpoint or two, and the OpenLink RDF browser.

The recipe

For one, though PROD pushes out Description Of A Project (DOAP, an RDF vocabulary) files per project, it doesn’t quite make all of its contents available as linked data right now. The D2R toolkit was used to map (part of) the contents to known vocabs, and then make the contents of a copy of PROD available through a SPARQL interface. Bang, we’re on the linked data web. That was easy.

Since I don’t have access to the slick visualisation of the Research Funding Explorer, I’d have to settle for augmenting PROD’s data. This is useful for two reasons: 1) PROD has rather, erm, variable institutional names. Synching these with canonical names from a set that will go into data.gov.uk is very handy. 2) PROD doesn’t know much about geography, but Talis’ data set does.

To make this work, I made a SPARQL query that grabs basic project data from PROD, and institutional names and locations from the Talis data set, and visualises the results.
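
The gist of that query can be sketched as two SELECTs joined on the institution name, shown below with the SPARQLWrapper Python library. The endpoint URLs and most property URIs are placeholders; only the DOAP vocabulary is something PROD is described above as actually publishing, and the geo positions stand in for whatever the Talis set really uses.

    from SPARQLWrapper import SPARQLWrapper, JSON

    def select(endpoint, query):
        sw = SPARQLWrapper(endpoint)
        sw.setQuery(query)
        sw.setReturnFormat(JSON)
        return sw.query().convert()["results"]["bindings"]

    # Basic project data from the PROD endpoint (placeholder URL and property).
    prod_rows = select("http://example.org/prod/sparql", """
        PREFIX doap: <http://usefulinc.com/ns/doap#>
        SELECT ?project ?inst WHERE {
            ?p doap:name ?project ;
               <http://example.org/prod/institution> ?inst .
        }""")

    # Canonical institution names and coordinates from the Talis store (placeholders).
    talis_rows = select("http://example.org/talis/sparql", """
        PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?inst ?lat ?long WHERE {
            ?i rdfs:label ?inst ;
               geo:lat ?lat ;
               geo:long ?long .
        }""")

    # Join the two result sets on the institution name.
    coords = {r["inst"]["value"]: (r["lat"]["value"], r["long"]["value"]) for r in talis_rows}
    for r in prod_rows:
        inst = r["inst"]["value"]
        if inst in coords:
            print(r["project"]["value"], inst, coords[inst])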

Results

[Screenshot: a partial map of England, Wales and southern Scotland with markers indicating where projects took place. An excerpt of PROD project data, augmented with proper institutional names and geographic positions from Talis’ Research Grant Explorer, visualised in the OpenLink RDF browser.]

[Screenshot: a star-shaped overview of various attributes of a project, with the name property highlighted. Zooming in on a project, this time to show the attributes of a single project. Still in the OpenLink RDF browser.]

[Screenshot: a two-column list of one project's attributes and their values. A project in D2R’s web interface; not shiny, but very useful.]

From blagging a copy of the SQL tables from the live PROD database to the screen shots above took about two days. Opening up the live server straight to the web would have cut that time by more than half. If I’d have waited for the Research Grant Explorer data to be published at data.gov.uk, it’d have been a matter of about 45 minutes.

Lessons learned

Opening up any old database as linked data is incredibly easy.

Cross-searching multiple independent linked data stores can be surprisingly difficult. This is why a single SPARQL endpoint across them all, such as the one presented by uberblic‘s Georgi Kobilarov yesterday, is interesting. There are many other good ways to tackle the problem too, but whichever approach you use, making your linked data available as simple big graphs per major class of thing (entity) in your dataset helps a lot. I was stymied somewhat by the fact that I wanted to make use of data that either wasn’t published properly yet (Talis’ research grant set), or wasn’t published at all (our own PROD triples).

A bit of judicious SPARQLing can alleviate a lot of inconsistent data problems. This is salient to a recent discussion on twitter around Brian Kelly’s Linked Data challenge. One conclusion was that it was difficult, because the data was ‘bad’. IMHO, this is the web, so data isn’t really bad, just permanently inconsistent and incomplete. If you’re willing to put in some effort when querying, a lot can be rectified. We, however, clearly need to clean up PROD’s data to make it easier on everyone.

SPARQL-panning for gold in multiple datastores (or even feeds or webpages) is way too much fun to seem like work. To me, anyway.

What’s next

What needs to happen is to make all the contents of PROD and related JISC project information available as proper linked data. I can see three stages for this:

  1. We clean up the PROD data a little more at source, and load it into the Data Incubator to polish and debate the database-to-triples mapping. Other meshups would also be much easier at that point.
  2. We properly publish PROD as linked data either on a cloud platform such as Talis’, or else directly from our own server via D2R or OpenLink Virtuoso. Simal would be another great possibility for an outright replacement of PROD, if it’s far enough along at that point.
  3. JISC publishes the public part of its project information as Linked Data, and PROD just augments (rather than replicates) it.
Pinning enterprise architecture to the org chart
http://blogs.cetis.org.uk/wilbert/2010/02/06/pinning-enterprise-architecture-to-the-org-chart/
Sat, 06 Feb 2010

Recent discussion during the Open Group’s Seattle conference shows that we’re still not done debating the place of Enterprise Architecture (EA) in an organisation.

For one thing, EA is still a bit of a minority sport, as Tim Westbrock reminded everyone: 99+% of organisations don’t do EA, or, at least, not consciously. Nonetheless, impressive, linear, multi-digit growth in downloads and training in The Open Group’s Architectural Framework (TOGAF) indicates that an increasing number of organisations want to surface their structure.

Question is: where does that activity sit?

Traditionally, most EA practice comes out of the IT department, because the people in it recognise that an adequate IT infrastructure requires a holistic view of the organisation and its mission. As a result, extraordinary amounts of time and energy are spent on thinking about, engaging with, thinking as or generally fretting about “the business” in EA circles. To the point that IT systems or infrastructure are considered unmentionable.

While morally laudable, I fear that this anxiety is a tad futile if “the business” is unwilling or unable to understand anything about IT – as it frequently seems to –, but that’s just my humble opinion.

Mike Rollins of the Burton Group seems to be thinking along similar lines, in his provocative notion that EA is not something that you are, but something that you do. That is, in order for an architectural approach to be effective, you shouldn’t have architects (in the IT department or elsewhere), but you should integrate doing EA into the general running of the organisation.

Henry Peyret of Forrester wasn’t quite so willing to tell an audience of a few hundred people to quit their jobs, but he also emphasised the necessity of embedding EA in the general work of the organisation. In practical terms, he suggested that the EA team should split its time evenly between strategic work and regular project work.

Tim Westbrock did provide a sharper contrast with the notion of letting EA become an integral part of the whole organisation inasmuch as he argued that, in a transformative scenario, the business and IT domains become separate. The context, though, was his plea for ‘business architecture’, which, simplifying somewhat, looks like EA done by non-IT people using business concepts and language. In such a situation, the scope of the IT domain is pretty much limited to running the infrastructure and coaching ‘the business’ in the early phases of the deployment of a new application that they own.

Stuart MacGregor of realIRM was one of the few who didn’t agonise so much about who’d do EA and where, but he did make a strong case for two things: building and deploying EA capacity for the long term, and spending a lot of time on the soft, even emotional, side of engaging with other people in the organisation. A consequence of the commitment to the long term is to wean EA practices off their addiction to ‘quick wins’ and searches for ‘burning platforms’. Short term fixes nearly always have unintended consequences, and don’t necessarily do anything to fix the underlying issues.

Much further beyond concerns of who and where is the very deep consideration, by Len Fehskens of the Open Group, of the concepts and history of ‘architecture’ as applied to the enterprise. For cyberneticians and soft systems adepts, Len’s powerpoint treatise is probably the place to start. Just expect your hackles to be raised.

Resources

Tim Westbrock’s slides on Architecting the Business is Different than Architecting IT

Mike Rollins’ slides on Enterprise Architecture: Disappearing into the Business

Henry Peyret’s slides on the Next generation of Enterprise Architects

Stuart MacGregor’s slides on Business transformation Powered by EA

Len Fehskens’ slides on Rethinking Architecture

Poking data.gov.uk for a day
http://blogs.cetis.org.uk/wilbert/2010/01/23/poking-datagovuk-for-a-day/
Sat, 23 Jan 2010

data.gov.uk – a portal for UK governmental open data – is clearly a fantastic idea, and has the makings of a real treasure trove. But estimating how far along it is in practice means picking up a stick and poking the beast.

Which data

What it’s all about. To my unscientific eyeballing, the existence of almost every dataset of interest is flagged. You can find them too, with some patience and use of the tags. Clicking through to the source quickly shows that most aren’t available to us yet in any shape or form, though. That’s fine; I’m sure something like that just takes time.

What would be good, though, is some indication which ones are available in the navigation or searching. Where available means: queriable through the SPARQL interface, or downloadable as an RDF dump.

Getting at the data

The main means through which you get at the data is via the SPARQL query service provided. That works, but has some quirks:

From experience, the convention for such services would be that you’d provide one at http://mysite.org/sparql, and that you’d get a human SPARQL query form if you go there with a browser, and you get an answer if you fire a SPARQL protocol query at it from the comfort of your own SPARQL client.

There is a human interface at https://www.data.gov.uk/sparql, but machine queries need to go someplace else: http://services.data.gov.uk/{optional dataset}/sparql. Confusingly, the latter has a basic but very nice human interface too…

Of more concern is that both spit out either JSON (in theory) or XML, but not RDF. Including XML and JSON is very sensible indeed, because the greatest number of people can suck those formats up and stick them in the mashups they know and love. But for the promises of linked data to work, there is an absolute need for some kind of RDF output.
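
For reference, firing a SPARQL protocol query at the machine endpoint looks roughly like the sketch below, which asks for the XML results format. The dataset name is a placeholder, not a real one from the catalogue.

    import urllib.parse
    import urllib.request

    dataset = "some-dataset"  # placeholder; substitute a real dataset name
    endpoint = "http://services.data.gov.uk/%s/sparql" % dataset

    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+xml"})

    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))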

Exploring the data

In order to formulate the whizzy linked data queries that this stuff is all about, you have to get a feel for what an open data set and its vocabulary looks like. That is a bit lacking as well at the moment: you can ask the SPARQL endpoint for types, and then keep poking, but making the whole thing browsable on something like the OpenLink RDF browser would be even better. I stumbled on some ways of exploring vocabularies, but that didn’t seem to allow navigation between concepts just yet.
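
Written out, that first poke for types is just a query like the one below, reusing the same placeholder endpoint as the previous sketch.

    import urllib.parse
    import urllib.request

    endpoint = "http://services.data.gov.uk/some-dataset/sparql"  # placeholder
    query = "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 100"

    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))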

There is a forum and wiki where people can cooperate on how to work on these issues, but I couldn’t see how to join them- hence the post here.

So?

If you’re getting rather disappointed by now: don’t. As far as I can see, the underlying platform is easily capable of addressing each of the points I stumbled across. More importantly, all the pieces are there for something truly compelling: freely mixable open data. It’s just that not all the pieces have been put together yet. The roof has been pitched, but the rest still needs doing.
