Recently I’ve been playing with an R wrapper for Mallet, a machine learning toolkit for language, to generate lists of topics from a series of text documents. The technique is called topic modelling, and I got to grips with it through Ben Marwick’s readings of archaeology papers, which include some excellent reusable code. A topic in my model is simply a collection of words that make up the topic. Mallet can do all sorts of fancy things with the words and topics: it can tell me how likely a word is to appear in a topic, and it can analyse a text and tell me how much of that text belongs to each topic.
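To make those two outputs concrete, here is a minimal base R sketch that does not call Mallet at all. It assumes we already have word-to-topic and document-to-topic assignment counts (the toy counts, vocabulary, document names and the `top_words()` helper below are all hypothetical). Normalising the rows gives the probability of each word within a topic and the share of each document belonging to each topic. In the mallet R package the corresponding matrices come from `mallet.topic.words()` and `mallet.doc.topics()`, which also apply the model's smoothing, so treat this only as an illustration of what those numbers mean.

```r
# Hypothetical toy counts: how often each word was assigned to each topic
# (rows = topics, columns = vocabulary). A real model produces these by sampling.
topic_word_counts <- matrix(
  c(8, 5, 2, 0, 1,
    0, 1, 7, 5, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"),
                  c("pottery", "sherd", "radiocarbon", "date", "sample"))
)

# Divide each row by its total so it sums to 1: P(word | topic)
topic_word_probs <- topic_word_counts / rowSums(topic_word_counts)

# Hypothetical helper: the n most probable words in a topic
top_words <- function(probs, topic, n = 3) {
  names(sort(probs[topic, ], decreasing = TRUE))[seq_len(n)]
}

# Hypothetical counts of each document's words assigned to each topic
doc_topic_counts <- matrix(
  c(9, 1,
    2, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("paper_a", "paper_b"), c("topic1", "topic2"))
)

# Share of each document belonging to each topic
doc_topic_props <- doc_topic_counts / rowSums(doc_topic_counts)

print(round(topic_word_probs, 3))
print(top_words(topic_word_probs, "topic1"))
print(doc_topic_props)
```

Reading the output row by row: a topic is "simply a collection of words", ranked by probability, and each document is a mixture of topics whose proportions sum to one.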
Knitr support in RStudio is nice, but the default styling of the HTML output, in particular the treatment of tables, is not to my taste. It is possible to override the default Markdown handler, as described on the RStudio site, but as several posts to Stack Overflow and elsewhere testify, this doesn’t immediately work when using knitr in RStudio (some of them propose interesting workarounds involving post-processing the output). This blog post by Neil Saunders explains how to make it work, but it requires sourcing a file manually.[..]
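For reference, the kind of override described on the RStudio site looks roughly like the sketch below: an R option holding a function that takes the input Markdown file and the output HTML file and renders them with a stylesheet of your choice. This is a sketch under assumptions: the option name `rstudio.markdownToHTML` is as I recall it from the RStudio documentation, `custom.css` is a hypothetical stylesheet, and, as noted above, RStudio's knit button may not pick the option up unless it has been set in the session first (for example by sourcing a file containing it).

```r
# Sketch (assumption): RStudio checks this option for a custom Markdown-to-HTML step.
options(rstudio.markdownToHTML = function(inputFile, outputFile) {
  # markdownToHTML() is from the 'markdown' package; "custom.css" is a
  # hypothetical stylesheet that replaces the default styling (e.g. for tables)
  markdown::markdownToHTML(inputFile, outputFile, stylesheet = "custom.css")
})
```

Keeping the table styling in a separate CSS file means the same look can be reused across reports by pointing `stylesheet` at it.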