Visualization Workshop Hackathon Challenge

Posted on March 20, 2016 by Kenneth Benoit

Announcing the Hackathon Challenge for the Visual Text Analytics and Social Science Workshop, to be held at Imperial College London and the London School of Economics, 24-25 March 2015.

Text analysis of the 10th Republican Presidential candidate debate using R and the quanteda package

Posted on February 26, 2016 by Kenneth Benoit

Tags: quanteda Data Science

On 25 February 2016, the tenth debate among the Republican candidates for the 2016 Presidential election took place in Houston, Texas, moderated by CNN. In this demonstration of the quanteda package, I will show how to download, import, clean, and parse the debate transcript, and then analyze it by speaker.
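
The "parse by speaker" step can be illustrated with a small, language-neutral sketch. This is written in Python rather than the post's R and quanteda code, and it rests on assumptions: the transcript text, the speaker labels (MODERATOR, SPEAKER_A, SPEAKER_B), and the helper names split_by_speaker and word_counts are all hypothetical, and each speaker turn is assumed to begin with an upper-case tag followed by a colon.

```python
import re
from collections import defaultdict

# Assumption: a speaker turn starts with an upper-case tag and a colon,
# e.g. "MODERATOR: ...". Real transcripts may need a different pattern.
SPEAKER_TAG = re.compile(r"^([A-Z][A-Z_]*):\s*(.*)$")


def split_by_speaker(transcript):
    """Group transcript lines under the most recent speaker tag."""
    turns = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_TAG.match(line)
        if match:
            # A new tag switches the current speaker; keep the rest of the line.
            current = match.group(1)
            line = match.group(2)
        if current is not None and line:
            turns[current].append(line)
    return {speaker: " ".join(parts) for speaker, parts in turns.items()}


def word_counts(texts):
    """Count word tokens per speaker with a simple \\w+ tokenizer."""
    return {speaker: len(re.findall(r"\w+", text)) for speaker, text in texts.items()}


# Hypothetical transcript, not taken from the actual debate.
transcript = """MODERATOR: Welcome to the debate.
SPEAKER_A: Thank you.
It is good to be here.
SPEAKER_B: Thank you as well.
SPEAKER_A: Let me respond."""

by_speaker = split_by_speaker(transcript)
counts = word_counts(by_speaker)
```

Once the text is grouped this way, each speaker's combined text can be treated as one document for any per-speaker analysis.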

What text analysis software is available for Stata?

Posted on July 14, 2015 by Kenneth Benoit

Tags: General quanteda R

A lot of text analysis packages exist for R, such as quanteda, tm, qdap, and koRpus. But these are only useful if you are proficient in R programming. What about users of alternative statistical packages, such as Stata?

Encoding headaches, emoticons, and R’s handling of UTF-8/16

Posted on February 5, 2015 by Kenneth Benoit

Tags: quanteda R

A colleague (@kmmunger) recently asked me for help with code that was choking while cleaning tokenized texts from Twitter data. The tweets were in the JSON format that comes from the Twitter API, in what we thought was UTF-8 encoding. It turns out that these tweets used some emoticons from the nosebleed section of the Unicode maps, and these were not being read properly into R, where quanteda was being used to process the text.
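
To see why such characters cause trouble, here is a minimal sketch in Python (not the R code from the post). The tweet payload is hypothetical. It assumes the emoji arrives as a JSON \u escape, which for any code point above U+FFFF has to be written as a UTF-16 surrogate pair. The same character then takes four bytes in UTF-8 and two code units in UTF-16, so a reader that treats the halves of the pair separately, or assumes one code unit per character, will mangle it.

```python
import json

# Hypothetical tweet payload: U+1F602 escaped as the surrogate pair D83D DE02.
raw = '{"text": "so funny \\ud83d\\ude02"}'

tweet = json.loads(raw)   # the JSON parser recombines the pair into one character
text = tweet["text"]
emoji = text[-1]

code_point = ord(emoji)                             # Unicode code point of the emoji
utf8_len = len(emoji.encode("utf-8"))               # number of bytes in UTF-8
utf16_units = len(emoji.encode("utf-16-le")) // 2   # number of 16-bit units in UTF-16
is_astral = code_point > 0xFFFF                     # outside the Basic Multilingual Plane

print(f"U+{code_point:X}: {utf8_len} UTF-8 bytes, {utf16_units} UTF-16 units")
```

Checking for code points above U+FFFF in this way is one quick method for finding the tweets most likely to trip up an import step.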

How to install the R package topicmodels on OS X

Posted on February 4, 2015 by Kenneth Benoit

Tags: R

Many people have reported problems when attempting to install the R package topicmodels on OS X Mavericks or Yosemite. The problem is that the binaries are not yet built for these versions of OS X, and you need additional software installed in order to build the package from source. Once you have built it, however, it seems to work fine.