Educational Data Mining Conference – Impressions and Personal Favourites

The 2011 Educational Data Mining (EDM) Conference (in Eindhoven) was the fourth so far and provided an interesting, if sometimes quite esoteric, experience. A measure of comfort with statistical and data mining methods and vocabulary was necessary to really follow most of the papers, although they were by no means of a narrow technical nature.

The Shock of Tutoring Systems

The majority of the papers were devoted to tutoring systems, particularly “intelligent tutoring systems” (ITS), and to a model of learning that contrasts rather sharply with the centre of gravity of the dominant models within the educational technology (e-Learning, technology enhanced learning, etc) domain. I will comment on this first, before mentioning some of my personal favourite posters/papers.

The dominance of papers on ITS and tutoring systems generally seems to be a consequence of the origins of the EDM Conference, arising particularly from a workshop of the Artificial Intelligence in Education (AIED) conference. The dominant model of learning in this work is based on “knowledge components” and the idea of “knowledge tracing”. There is a strong bond between this and AI concepts.

At times I got the feeling that there is quite a lot of work going on within this paradigm that is almost blinkered to challenging its assumptions. It is certainly the case that some ITS are very widely used in the US (e.g. Cognitive Tutor) and that a VERY large amount of log data is available in the Pittsburgh Science of Learning Center DataShop. Given the practicability of working in this area (data, an established model and large-scale real use of some pieces of tutoring software) it is not surprising that career academics publish in it. As the conference wore on, however, it became clear that the EDM community is not blinkered; several conversations, moments of excitement in post-presentation questions and some clear signposting from a few thought leaders suggested to me that this is a community that is looking well beyond the point of current focus and is interested in reaching out to people whose pedagogic home is somewhat closer to social constructivism.

One thing remains absolutely clear: there is still a great deal of difference between North American and European educational practice and culture. I wonder, though, whether “we” should recover from our recoil against ITS and give a bit more thought to whether there is something in it that we could make our own for some applications.

Some Personal Favourites

The full text of the papers and abstracts of posters is available online; the links are to these, which are PDF files.

The poster that won the “best poster” prize and a poster that was never shown both interested me for the same reason: they are essentially about mining and visualising data in the service of reflection. This seems to be a bit of a neglected area in general; Business Intelligence and the kind of tracking and performance features in VLEs are all very well but they seem rather dry and oriented to outcome rather than process. My feeling is that this is an area where some good innovative ideas could go a long way: in helping students to understand and manage their own learning and learning practices; and in helping teachers understand how students are really using the VLE, maybe challenging some assumptions, whether tacit or openly held forth on. While neither of these two pieces of software is finished and ready for use, both stimulate further ideas and are stepping stones across this particular stream:
* eLAT: An Exploratory Learning Analytics Tool for Reflection and Iterative Improvement of Technology Enhanced Learning (Anna Lea Dyckhoff, Dennis Zielke, Mohamed Amine Chatti and Ulrik Schroeder), “best poster” prize
* Brick: Mining Pedagogically Interesting Sequential Patterns (Anjo Anjewierden, Hannie Gijlers, Nadira Saab and Robert De Hoog)

My top three papers are (comments follow):

* What’s an Expert? Using learning analytics to identify emergent markers of expertise through automated speech, sentiment and sketch analysis (Marcelo Worsley and Paulo Blikstein)
* Student Translations of Natural Language into Logic: The Grade Grinder Translation Corpus Release 1.0 (Dave Barker-Plummer, Richard Cox and Robert Dale)
* A Dynamical System Model of Microgenetic Changes in Performance, Efficacy, Strategy Use and Value during Vocabulary Learning (Philip I. Pavlik Jr. and Sue-Mei Wu)

“What’s an Expert?” reports on work following the constructionist tradition of Seymour Papert. Its primary research question – “How can we use informal student speech and drawings to decipher meaningful ‘markers of expertise’ in an automated and natural fashion?” – made it the first of the papers presented to be grounded in such an educational theme. Whether for summative or formative purposes, this approach looks really interesting for anyone interested in making the connection between “authentic” behaviour and the more obscure growth of ability.

The “Grade Grinder” paper is ostensibly rather esoteric, being based on the release of data from an online self-test tool for the translation of logical statements from visual representations and natural language to formal logic. The interesting aspect is what they have been doing with this data, which is to try to uncover and better understand the causes of error and misconception. This is not a new kind of research question – science education research has considered misconception for years – but the methods employed to date have generally been interview-based rather than data-centric. I chose this paper because of my curiosity about questions that explore the characterisation of misconception and its consequences, not to mention their relevance to educators.

It isn’t so easy to justify choosing the last of my top three; the exploratory nature of the work is clear in the paper and was noted by the presenter. What made it stand out is that the researchers have been trying to look at the question of motivation. I believe this is an important question in its own right and one where intelligent use of data mining could help to make more visible some of the invisible causes and consequences of changes in motivation. On the whole, however, the discussion and results tended to say more about the use of strategies. They had also investigated simulations, which I find an appealing, if problematical, means of investigating the dynamical effects of proposed models of affect.

Curiously, all but one of the things that I found most interesting were not full papers and it seems that at least two of these had been rejected as such by the reviewers.

Two of the three invited speakers were from outside the EDM community. The conference opener was Barry Smyth from UC Dublin, a clearly entrepreneurial chap. His topic was social search and in particular his start-up, HeyStaks. Barry made much of statistics on the amount of time people spend on web search and the percentage of time the search is for something previously found by the searcher or a friend/colleague; they are indeed most sobering. HeyStaks is intended to address this and falls under the category of “social search”. Social search has been lurking just off the hype radar for a while but has the feel of a term that we’ll be hearing a lot of pretty soon, not least due to the release of Google+. HeyStaks takes a significantly different approach to Google+, being based on a browser plugin that sits over Google, Bing or Yahoo use and communicates with the HeyStaks server to store the “staks”. HeyStaks also lets you re-find things from your own search-and-select history. A peer-to-peer version would be great but not so appealing to an entrepreneur. I can see HeyStaks being more attractive for teaching and learning or research group use and it would be interesting to see an analysis of the different perspectives on, and realisations of, “social search” more generally and the relative affordances for educational use.

The invited speaker for day 2 was Erik-Jan van der Linden, the CEO of Dutch visualisation software company MagnaView and a man with a history of research on data visualisation at Amsterdam and Tilburg universities. MagnaView pitches variations of its software at the legal profession, education, etc; the education use is so far limited to secondary education in the Netherlands. This seems to have made a number of “what if” questions more answerable but has also stimulated more questions among the users. They are also experimenting with using user-intuitive actions to drive parameters for data mining, for example by allowing users to re-assign clustered items by drag/drop.

In conclusion: for all that this conference has some history and, to some extent, some baggage in AI and ITS, and in spite of the quite technically challenging nature of a lot of the work, I see this as a community ready to embrace those from other backgrounds with an interest in “learning science” and one where some more people whose mindset is education-first will have a challenging but exciting time over the next few years.

Full proceedings are available for download from the EDM2011 site.