SNA session at CETIS 12

I attended the SNA session at the CETIS conference hosted by Lorna, Sheila, Tony and Martin. Before the session I had blogged about some of my questions on SNA, and although I think I have more new questions than answers, things feel much clearer now. My mind is still going over the conversations from the session, but these are the main themes and some early thoughts that I came away with.

What are the quick wins?
At the start of the session Sheila asked the question ‘What are the quick wins?’. While Tony and Martin’s presentations were excellent, I think it is hard for people who don’t have their head regularly in this space to replicate the techniques quickly. Lorna said that although she understood what was happening in the SNA examples, there was some ‘secret magic’ that she couldn’t replicate when doing it for herself, and Tony agreed that when you work in this area for a while you start to develop a workflow and understand some of the quirks of the software. I could relate to Lorna’s dilemma, as it took me a few hours of using Gephi just to know exactly when I needed to force quit the program and start all over again.

So, for people who want to find out useful information about social networks but don’t have the time to get into the secret magic of SNA, can we develop quick and simple tools that answer quick and simple questions?
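As a flavour of what ‘quick and simple’ could mean, here is a sketch in Python using only the standard library (the edge list is made up for illustration): counting connections answers ‘who is best connected?’ without any secret magic.

```python
# A "quick win" sketch: answering a quick, simple question ("who is the
# most connected person?") with nothing more than the standard library.
# The edge list is invented for illustration.
from collections import Counter

edges = [("sheila", "tony"), ("sheila", "lorna"),
         ("tony", "martin"), ("lorna", "martin"), ("tony", "lorna")]

degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree is the simplest centrality measure: a plain count of connections.
print(degree.most_common())
```

A dedicated SNA package would give you fancier measures (betweenness, clustering), but a surprising number of quick questions reduce to counts like this.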

The crossover between data driven visualisations and SNA
The session helped me make a clear distinction between Data Driven Journalism and SNA. While there is a crossover between the two, the reasons for doing them are quite different: SNA is a way to study social networks, while data driven visualisations are a way to convey a story to an audience. Making that distinction helped me get to grips with the ‘why is it worth doing this?’ question.

Data Validation
Martin made the point that when he was playing with PROD data to create visualisations he found it was a great way of validating the data itself, as he managed to spot errors and feed them back to Wilbert and me.

Lies, Damned Lies and Pretty Pictures
Amber Thomas did a fantastic presentation, if you missed the session it is available here. I felt Amber had really thought about the ‘How is this useful?’ question and I felt lots of pieces of the puzzle click into place during the presentation. I really recommend spending the time to go through the slides.

Thanks to Sheila, Lorna, Amber, Tony and Martin for an interesting session.

Standards used in JISC programmes and projects over time

Today I took part in an introduction to R workshop held at The University of Manchester. R is a software environment for statistics, and while it does all sorts of interesting things that are beyond my ability, one thing I can grasp and enjoy is exploring the packages available for R; these packages extend R’s capabilities and let you do all sorts of cool things in a couple of lines of code.

The target I set myself was to use JISC CETIS Project Directory data and find a way of visualising standards used in JISC funded projects and programmes over time. I found a Google Visualisation package and, using this, I was surprised at how easy it was to generate an output, the hardest bits being manipulating the data (and thinking about how to structure it). Although my output from the day is incomplete, I thought I’d write up my experience while it is fresh in my mind.

First I needed a dataset of projects, start dates, standards and programmes. I got the results in CSV format by using the sparqlproxy web service that I use in this tutorial, and stole and edited a query from Martin.


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jisc:  # namespace IRI stripped when the post was published
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX prod:  # namespace IRI stripped when the post was published
SELECT DISTINCT ?projectID ?Project ?Programme ?Strand ?Standards ?Comments ?StartDate ?EndDate
WHERE {
  ?projectID a doap:Project .
  ?projectID prod:programme ?Programme .
  ?projectID jisc:start-date ?StartDate .
  ?projectID jisc:end-date ?EndDate .
  OPTIONAL { ?projectID prod:strand ?Strand } .
  # FILTER regex(?Strand, "^open education", "i") .
  ?projectID jisc:short-name ?Project .
  ?techRelation doap:Project ?projectID .
  ?techRelation prod:technology ?TechnologyID .
  FILTER regex(str(?TechnologyID), "^") .
  ?TechnologyID rdfs:label ?Standards .
  OPTIONAL { ?techRelation prod:comment ?Comments } .
}
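If you’d rather script the download than paste the query into a web form, it can be sent to a sparqlproxy-style service programmatically. Here is a sketch in Python; the proxy URL and parameter names are assumptions (these services differ), so substitute the ones from the tutorial.

```python
# Sketch: turning the SPARQL query above into a CSV download URL.
# The proxy base URL and parameter names below are assumptions,
# not the real sparqlproxy endpoint.
from urllib.parse import urlencode

SPARQL_QUERY = """
SELECT DISTINCT ?projectID ?Project ?Programme ?StartDate
WHERE { ?projectID a doap:Project . }
"""  # abbreviated here; paste the full query in practice

params = urlencode({
    "query": SPARQL_QUERY,  # the SPARQL text itself
    "output": "csv",        # ask the proxy for CSV rather than XML
})
url = "http://example.org/sparqlproxy?" + params

# Fetching is one more line with urllib.request.urlopen(url); the CSV is
# then ready for a spreadsheet or for R's read.csv().
print(url)
```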

From this I created a pivot table of all standards and how often each appeared in each project and programme for each year (using the project start date). After importing this into R, it took two lines to grab the googleVis package and plot it as a Google Visualisation chart.
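The pivot itself is simple enough to sketch outside a spreadsheet too; here it is in Python, with invented rows standing in for the query results.

```python
# Sketch of the pivot step: counting, for each standard and year, how many
# projects used it. The rows are invented; real rows would come from the
# CSV returned by the SPARQL query.
from collections import defaultdict

rows = [  # (project, standard, start_year)
    ("proj-a", "XCRI", 2008),
    ("proj-b", "XCRI", 2008),
    ("proj-c", "RSS", 2008),
    ("proj-a", "RSS", 2009),
]

pivot = defaultdict(int)
for project, standard, year in rows:
    pivot[(standard, year)] += 1

# pivot is now ready to be written out and read into R for plotting.
print(pivot[("XCRI", 2008)])
```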

library(googleVis)
M = gvisMotionChart(data=prod_csv, idvar="Standards", timevar="Year", chartid="Standards")

This gives you the ‘Hans Rosling’ style motion chart. I can’t get this to embed in my WordPress blog, but you can click the diagram to view the interactive version. The higher up a standard is, the more projects it is in; the further across it goes, the more programmes it spans.

Google Visualisation Chart

Some things it made me think about:

  1. Data from PROD is inconsistent. Standards can be spelt differently, and some programmes/projects might have had more time spent on inputting related standards than others.

  2. How useful is it? This was extremely easy to do, but is it worth doing? I feel it has value for me because it’s made me think about the way JISC CETIS staff use PROD and the sort of data we input. Would this be of value to anybody else? It was certainly interesting to see the high number of projects across three programmes that involved XCRI in 2008.

  3. Do we need all that data? There are a lot of standards represented in the visualisation. Do we need them all? Could we concentrate on subsets of this data?

Getting data out of PROD and its triplestore

For a while I have been wondering about the best way of creating a how-to guide around getting data out of the JISC CETIS project directory, and in particular out of its linked data triple store. A few weeks ago Martin Hawksey posted some great examples of work he’s been doing, including maps built from data generated by PROD. I think these examples are great and thought they would be a good starting point for a how-to guide.

Don’t be put off by scary terms, as I think these things are relatively easy to do and I’ve left out as much technobabble as possible. The difficulty really lies in knowing the location of various resources and some useful tricks. I’ve split the instructions into three steps.

  1. Getting data out of PROD into a Google Spreadsheet
  2. Getting Institution, Long and Lat data out of PROD
  3. Mapping with Google maps.

The steps currently live in a Google Doc while I update them. I’ve also created short screencasts of me following the instructions in case anybody gets lost. Hopefully from here you will have built up enough confidence to edit the queries for different results, use the Google Spreadsheet Mapper to change the look and feel of your map, or explore some of the technologies behind the techniques.
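The mapping step can also be scripted outside the spreadsheet. As a sketch (in Python, with invented rows and assumed field names, not the guide’s actual workflow), institution locations can be written out as KML, which Google Maps can display directly.

```python
# Sketch: writing a minimal KML file from institution/longitude/latitude
# rows so Google Maps can plot them. The rows are invented examples.
rows = [
    ("University of Bolton", -2.4282, 53.5769),
    ("University of Manchester", -2.2339, 53.4668),
]

placemarks = "".join(
    f"<Placemark><name>{name}</name>"
    f"<Point><coordinates>{lng},{lat}</coordinates></Point></Placemark>"
    for name, lng, lat in rows  # KML wants longitude first
)
kml = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
       f"{placemarks}</Document></kml>")

with open("prod_projects.kml", "w") as f:
    f.write(kml)
```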

You’ll want the Step By Step Guide and you can see Sheila’s example in her post here.

Obtain Data from PROD (via Talis store) to populate a Google Doc

Getting Prod Project Data w/ Long Lat Data into Google Spreadsheet

Mapping PROD Data with Google Maps

Visualisation session at the CETIS conference. Thoughts and resources.

We are 34 days away from the CETIS conference. On day two I have signed up to a session on Social Network Analysis and Data Visualisation being run by Sheila and Lorna. I’m really looking forward to it: recently I have been thinking about visualisations, what they mean and how they can be used most effectively, and I have found understanding them quite difficult. I am only just getting my head around the area and hope that the session might be a hub for the experienced to share some of their protips. I thought that airing some of my questions and sharing some favourite resources might be a good way to get the tips rolling in and a conversation going before the event. I guess that everybody at the session will have their own interests and questions, and I would be interested to know what these are.

Tips please

Some of the questions I have:

  • When are visualisations useful, when are they not and what makes a good visualisation?
  • When is a visualisation more than a bunch of lines connecting things?
  • What models sit behind the visualisation?
  • How do you find and validate good data, particularly data about social networks?
  • What are the most effective ways of visualising, and are there any tips on development environments?


I’d also be grateful for any resources you think might be useful. I’ll start with two I’m working with at the moment.

  • A github repository belonging to Adam Cooper with examples to find emergent trends and “weak signals” in paper abstracts.
  • A handy book that aims to “introduce the principles of statistics and modern statistical analysis for a non-mathematical audience”. It does this well and introduces R at the same time.

Looking forward to the session and a protip from me.

Notes on badges

If you haven’t heard about Mozilla’s Open Badge Initiative, a great explanation and round-up lives on Rowin’s blog. As Rowin points out, badges ‘draw upon widespread use of badges and achievements in gaming‘, and as somebody who has many badges and achievements in various game systems I can’t help but wonder if some of the problems that have cropped up in games might cross over into the Open Badge Initiative. Some early thoughts:

  • Nobody wants to complete a level using only a hyperblaster. Badly designed meta-goals can ruin an experience: some players will attempt to do all the tasks asked of them to get as many badges, achievements or points as possible. Is it fun completing levels in Quake 4 with only a hyperblaster? No, but it gets you a badge. Would learners do pointless tasks just to get badges, and should we worry about a loss of intrinsic motivation?

  • Bribery. Early in the life of a Microsoft console, a game publisher realised it had a bad game on its hands. Its answer, to get gamers to part with their cash, was to give them a full set of achievements in 3 minutes. Would users go for a product because it’s the quickest way to get a badge?

  • Badge inflation. Achievements just aren’t enough anymore; as soon as games started giving out easy achievements, gamers wanted more. How about a virtual hat? Now gamers are checking that their new game has extra avatar awards as well as achievements.

  • Bypassing the rewards system or creating a new one. What happens when developers read on a blog that bribery and badge inflation are a problem on the host platform’s badge system? Some developers just create their own.

  • As punishment. Although now I like the term “useful indicator for characterizing an unknown” (see comments).

I think that badges are a really interesting idea, but maybe it’s worth thinking about other reward systems and the effects badges and achievements have had after implementation.

Developing a web analytics strategy for a distributed organisation

For as long as I have been a web developer with CETIS we have relied on analysing server logs to give an indication of traffic sources and visitor trends. This approach existed long before I joined CETIS and seemed like a logical way of doing things. CETIS has had many web servers and many different developers have installed different tools and resources, and since they were all using the same servers and producing the same style of logs, it has been a reasonable method of producing comparable stats.

While this method of collecting stats has stayed the same over the life of CETIS, the direction of CETIS and the environment it finds itself in have changed over time, and the need for a new strategy has become apparent.

Challenges from JISC CETIS and the environment

  • JISC CETIS is more distributed from a technical point of view

Historically CETIS has had access to physical servers that sat in a server room somewhere in a University. A recession later, shifts in University policies mean that abundance of resource is no longer available. While lots of external providers are happy to help you produce a flexible service and tie you into their hosting packages, it does raise issues. Do we have access to server logs? Are the logs the same? If not, are the stats produced similar to those from the stats package we use? Can we even produce stats?

Similarly, JISC CETIS is moving away from bespoke code when there are popular services that do the same thing, and this raises similar questions. What stats do the services produce, are they comparable with other services, is there an API, and will we have to pay to access what we’ve collected down the line?

  • JISC CETIS is more distributed from a people point of view

Staff in JISC CETIS are technologically savvy, and we all have our opinions on the services and techniques we like. While I think it is a good thing to have such a technically diverse organisation trying new and exciting things, it is also a problem from a stats analysis point of view. Are staff hosting their blogs, events and resources on cloud services, and if so, how do we measure the use of those resources?

  • A call for more sophisticated analytics

In JISC CETIS there is an increasing call to know more about the things we do and how they are used. It is important for any organisation to respond to its environment, and the questions we are asking ourselves about our resources are becoming more and more complex. Log files can only give you so much information, and it seems that JavaScript solutions are needed to answer these questions. Recent improvements in solutions such as Google Analytics offer real in-depth analysis of your web traffic and resource usage.

Implementation Woes

A simple step that we have taken is to start to roll out JavaScript tracking with Google Analytics over the CETIS services, but even that simple act starts to highlight issues. The first thing we noticed was that visitor numbers were hardly comparable. Some early thoughts on why this might be:

  • Google Analytics is more intelligent when it comes to what is and isn’t a visitor or a bot
  • Google Analytics is JavaScript based and will not count anything if the tracking code is not executed for some reason
  • The hacks for Google Analytics to track binary files and RSS are not very good.
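The first point is easy to see in miniature. Here is a sketch in Python (the log lines and bot pattern are invented examples, not our actual filtering rules) of why a naive log count runs ahead of a JavaScript-based count: bots and feed readers appear in the logs but never execute tracking code.

```python
# Sketch: a naive server-log counter versus one that filters obvious bots.
# Log lines and the bot pattern are invented for illustration.
import re

log_lines = [
    '1.2.3.4 - - [10/Oct/2011] "GET / HTTP/1.1" 200 "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2011] "GET /feed HTTP/1.1" 200 "FeedFetcher"',
    '9.9.9.9 - - [10/Oct/2011] "GET / HTTP/1.1" 200 "Googlebot/2.1"',
]
BOT_PATTERN = re.compile(r"bot|fetcher|crawler|spider", re.IGNORECASE)

raw_hits = len(log_lines)  # what a plain log analyser would report
human_hits = sum(1 for line in log_lines if not BOT_PATTERN.search(line))

# The gap between the two numbers is one reason log stats and
# Google Analytics figures refuse to line up.
print(raw_hits, human_hits)
```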

A hybrid solution

Despite the changing environment and early hiccups, I feel positive about working towards a new web analytics strategy. I wrote this in an attempt to get my head around the issue, and I think I now have some key starting ideas. A hybrid solution is required: JavaScript solutions are more portable and answer more complex questions, but they are difficult to implement in such a distributed organisation and are held back by some of the limitations of JavaScript. We also have to become more intelligent about how we analyse the data. My view is that analytics should be taken with a pinch of salt; it is not about how high the figures are but about trends in those figures, and a good strategy for CETIS would be to identify places in its online resources where stats can be compared and trends identified.

Finally, I think that as organisations become more distributed and stats become more personal, a web analytics strategy becomes more of an individual responsibility. I’m not quite sure what an effective strategy would look like, where analysis of trends in individuals’ resources helps to steer the organisation as a whole.

More to come…

Validating XCRI 1.2 with Schematron

I’ve started writing a Schematron ruleset that can be used to validate your own XCRI-CAP document so that you can get something that looks like this.


You can grab my work so far from Google Code. So far the ruleset only checks the core elements, but I intend to work on it further soon. If you’re new to Schematron, I recommend a scan of the Schematron site; you should be able to work out what to do with the ruleset by following the instructions in an old post of mine.
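Schematron rules are essentially XPath assertions over a document. As a toy illustration of the kind of check such a ruleset performs (not my actual ruleset, and with simplified stand-in element names rather than the real XCRI-CAP vocabulary), here is the same idea expressed in plain Python:

```python
# Toy illustration of a Schematron-style assertion ("every course must
# have a title") using the standard library. Element names are simplified
# stand-ins, not real XCRI-CAP vocabulary.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<catalog>
  <course><title>Intro to R</title></course>
  <course></course>
</catalog>
""")

failures = [
    "course is missing a title"
    for course in doc.iter("course")
    if course.find("title") is None
]
print(failures)
```

Schematron lets you phrase exactly this kind of rule declaratively, with a human-readable message per failed assertion, which is what produces the friendly validation reports.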

If anybody feels the urge to update or improve the ruleset, please feel free :)

Playing with canvas and webgl

I finally got around to playing with the HTML 5 canvas element and attempted to build a quick game to compare the process to my past experiences of Adobe Flash.

I was quite surprised by how much I was able to do using only canvas, WebGL, the gamma JS library and some example code. I was able to create:

  • Platforms
  • A controllable entity that the ‘camera’ follows
  • Enemy objects with collision detection
  • Dynamic Lighting
  • Multiple Levels
  • Basic Textures
  • Entities from COLLADA files

Although I was amazed that I could do these things, the process wasn’t easy; it was a lot of work to do many of the things that I would take for granted in Flash.

Fortunately there seems to be an explosion of libraries and game engines built on these standards that will make the process much easier, and if you do want to create a game using canvas I would recommend not reinventing the wheel and sticking to one of them. With so many engines popping up, though, it’s quite hard to tell which will gain the most popularity.

While there might be some catching up to do for canvas/WebGL games, the quality is increasing at an incredible rate, and free libraries and game engines are lowering the barrier to entry. Flash might be the weapon of choice for web-based game developers now, but I feel Adobe will have to do something special to keep up.

The game is very basic and was just an attempt to see what was possible. Still, if you want to play it you can find it here; you’ll need the latest Chrome/Firefox/Safari.

Thoughts on Agent Based Models and Institutional Systems

Over the past few months I have developed an interest in agent-based modelling using tools such as NetLogo, RePast or Swarm. These tools, combined with increases in processing power, make it incredibly easy to get started, and they soon sucked me in.

Agent-based models are computational models used to make predictions about the interactions of agents in a system and how those interactions may affect the system as a whole. Quite often the systems being modelled are ones where simple, small interactions at a low level have a huge effect on the overall system at a higher level, such as how greenhouse gases blocking infrared light might affect global temperature (have a play with the model by Lisa Schultz here!)
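As a flavour of how little code such a model needs, here is a minimal sketch in Python rather than NetLogo (the model and its parameters are invented for illustration): agents repeatedly nudge their opinions towards a randomly chosen partner, and consensus emerges from purely local interactions.

```python
# Toy agent-based model: each agent holds an opinion in [0, 1] and
# repeatedly moves a little towards a randomly chosen partner.
# Simple local interactions produce a global effect (consensus).
import random

random.seed(1)
opinions = [random.random() for _ in range(50)]

def step(opinions, rate=0.3):
    # Pick two agents at random; each shifts towards their midpoint.
    a, b = random.sample(range(len(opinions)), 2)
    mid = (opinions[a] + opinions[b]) / 2
    opinions[a] += rate * (mid - opinions[a])
    opinions[b] += rate * (mid - opinions[b])

before = max(opinions) - min(opinions)
for _ in range(5000):
    step(opinions)
after = max(opinions) - min(opinions)

print(f"spread before: {before:.3f}, after: {after:.3f}")
```

Tools like NetLogo add the visual patch grid, plotting and interactive sliders on top of exactly this kind of loop.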

Playing with these models got me wondering about the possibilities of modelling the interactions of agents within educational institutions and how we could use these techniques to explain the emergence of behaviour, but at the same time I have worried about how we would validate such models without ‘hard data’.

Here at the University of Bolton we have recently switched VLE to Moodle and it appeared to me that what could seem like a simple process of ‘changing the VLE’ was actually made up of very complex communications and interactions between the staff based here. Using this as a starting point I got together with a colleague and started to create a model that explained how we thought the communications within the University might look and how these communications could be disrupted or improved using technology.

At the 2011 CAL Conference in Manchester my colleague Mark Johnson presented the model as a way of explaining how we thought technological interventions could be used to change communications, and how this might affect how the institution works as a whole.

The model showed the different types of communications between certain groups of people and how these communications could change when the people were placed in different social situations or when technological interventions were made.

Agent Based Model Netlogo
Screenshot of Netlogo’s Patch while the model is running

I thought the response from the audience was great; they did not worry about the validity of the model itself but seemed to find the visual representation of how we thought technical intervention might change communication useful. The reaction at the session made me realise that a powerful aspect of agent-based modelling might simply be the ability to demonstrate what your view on a problem is.

Playing with PROD v0.01

PROD is a project directory and monitoring tool for JISC funded projects, used within JISC CETIS to aid programme support. Although the tool holds a great deal of interesting information on projects, I feel it is sometimes hard to convey that information to people who do not use the tool on a daily basis. I have been wondering how it might be possible to help disseminate some of the information in PROD to a wider audience. I thought I would start by taking a programme that has a rich array of information in PROD, such as Curriculum Design, and see how it might be possible to make the information more interesting.

Wordle of Standards and Technologies used in Curriculum Design according to PROD

One of my experiments was to try to get PROD to generate something visual for users to click and explore. Below is one of my first attempts at getting PROD to generate a mind map for the programme entries in PROD. You should be able to click and drag around the map to get a richer picture of the programme, and embed this Google gadget in your own web pages. I think the interface is a little hard to use, so you may want to download the XML, which can be imported into your own copy of FreeMind, or view the mind map in a separate window. I hope to tweak the map to include items such as hyperlinks out to relevant projects and information.

Finally I tried to use the Graphviz set of visual rendering tools to show relationships between the different projects in the programme. I haven’t attempted to tidy these up or make much sense of them yet. You can click to get the larger version.
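For anyone wanting to reproduce the Graphviz step, here is a sketch of emitting DOT text from project/standard pairs (the pairs are invented; real ones would come from the PROD query results):

```python
# Sketch: projects linked to the standards they use, emitted as DOT text
# for Graphviz's `dot` or `neato` tools. The pairs are invented examples.
pairs = [
    ("proj-a", "XCRI"),
    ("proj-b", "XCRI"),
    ("proj-b", "RSS"),
]

lines = ["graph prod {"]
for project, standard in pairs:
    # An undirected edge per project/standard relationship.
    lines.append(f'  "{project}" -- "{standard}";')
lines.append("}")
dot = "\n".join(lines)

print(dot)  # save as prod.dot, then render with: dot -Tpng prod.dot -o prod.png
```

Shared standards show up naturally as nodes with many edges, which is exactly the kind of relationship between projects the rendering is meant to surface.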