Recently I’ve been playing with an R wrapper for Mallet, a machine learning library, to generate lists of topics from a series of text documents. The technique is called Topic Modelling, and I got to grips with it through Ben Marwick’s readings of archaeology papers, which come with some excellent reusable code. A topic in my model is simply a collection of words. Mallet can do all sorts of fancy things with the words and topics: it can tell me how likely a word is to appear in a topic, and it can analyse a text and tell me how much of that text belongs to each topic.
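To make those two outputs a little more concrete, here is a toy sketch in Python. It is not Mallet's API and not the R wrapper. The topic names, words and probabilities are all made up for illustration. It assumes each topic is a mapping from words to probabilities, and that each word in a document has already been assigned to a topic, which is roughly what an LDA sampler produces. The document's topic proportions are then just the share of its words assigned to each topic. Real implementations such as Mallet typically also apply smoothing, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical toy topics: each topic maps words to probabilities.
# These numbers are invented for illustration, not Mallet output.
topics = {
    "excavation": {"trench": 0.4, "soil": 0.3, "layer": 0.3},
    "dating": {"radiocarbon": 0.5, "sample": 0.3, "layer": 0.2},
}


def word_probability(topic, word):
    """How likely `word` is to appear in `topic` (0.0 if absent)."""
    return topics[topic].get(word, 0.0)


def document_topic_proportions(assignments):
    """Share of a document's words assigned to each topic.

    `assignments` is a list of (word, topic) pairs, standing in for the
    per-word topic assignments a topic model produces. No smoothing is
    applied here, unlike real LDA implementations.
    """
    counts = Counter(topic for _, topic in assignments)
    total = sum(counts.values())
    return {topic: counts[topic] / total for topic in topics}


# A tiny made-up "document" whose words already carry topic labels.
doc = [
    ("trench", "excavation"),
    ("soil", "excavation"),
    ("layer", "dating"),
    ("sample", "dating"),
    ("layer", "excavation"),
]

print(word_probability("dating", "radiocarbon"))  # 0.5
print(document_topic_proportions(doc))  # {'excavation': 0.6, 'dating': 0.4}
```

Three of the five words belong to the excavation topic, so the document comes out as 60% excavation and 40% dating.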