Announcing the Hackathon Challenge for the Visual Text Analytics and Social Science Workshop, to be held at Imperial College London and the London School of Economics, 24-25 March 2015.
On 25 February 2016, CNN moderated the tenth debate among the Republican candidates for the 2016 presidential election, held in Houston, Texas. In this demonstration of the quanteda package, I show how to download the debate transcript, import and clean it, split it into speaker turns, and analyze the text by speaker.
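The step of splitting a transcript into speaker turns can be illustrated outside R. The following is a minimal Python sketch, not the quanteda code from the post. It assumes a hypothetical transcript layout in which each turn begins on a new line with the speaker's surname in capitals followed by a colon (e.g. `SMITH: ...`). The function names `split_by_speaker` and `word_counts` are illustrative only.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) layout: each turn starts at the beginning of a line
# with an upper-case speaker name followed by a colon, e.g. "SMITH: ...".
SPEAKER_TAG = re.compile(r"^([A-Z][A-Z'\-]+):\s*", re.MULTILINE)


def split_by_speaker(transcript: str) -> dict[str, str]:
    """Concatenate every turn of each speaker into a single text."""
    turns = defaultdict(list)
    matches = list(SPEAKER_TAG.finditer(transcript))
    for i, m in enumerate(matches):
        # A turn runs from the end of its tag to the start of the next tag.
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        turns[m.group(1)].append(transcript[start:end].strip())
    return {speaker: " ".join(parts) for speaker, parts in turns.items()}


def word_counts(texts: dict[str, str]) -> dict[str, int]:
    """Count word tokens per speaker as a first, simple by-speaker analysis."""
    return {speaker: len(re.findall(r"\b\w+\b", text)) for speaker, text in texts.items()}


if __name__ == "__main__":
    # Toy example with placeholder speakers, not text from the actual debate.
    sample = "MODERATOR: Welcome.\nSMITH: First point here.\nJONES: Reply.\nSMITH: Second point."
    by_speaker = split_by_speaker(sample)
    print(by_speaker)
    print(word_counts(by_speaker))
```

Grouping all turns per speaker into one document mirrors the idea of analyzing "by speaker"; a real transcript would also need rules for stage directions such as applause or crosstalk, which this sketch ignores.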
Many text analysis packages exist for R, such as quanteda, tm, qdap, and koRpus. But these are only useful if you are proficient in R programming. What about users of other statistical packages, such as Stata?
A colleague (@kmmunger) recently asked me for help because cleaning the tokenized texts from some Twitter data kept choking. The tweets were in the JSON format returned by the Twitter API, in what we thought was UTF-8 encoding. It turned out that these tweets used some emoticons from the nosebleed section of the Unicode map, and these were not being read properly into R, where quanteda was being used to process the text.
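To make the "nosebleed section" concrete: many emoji sit above U+FFFF, outside the Basic Multilingual Plane. They take four bytes in UTF-8, and in JSON they are often escaped as a surrogate pair such as `\ud83d\ude00`. The Python sketch below (not the R fix from the post) decodes such JSON and then lists or strips those high code points. It assumes these characters are the troublesome ones, which may not match the exact cause in this case. `astral_chars` and `strip_astral` are hypothetical helper names.

```python
import json


def astral_chars(text: str) -> list[str]:
    """List code points above U+FFFF (outside the Basic Multilingual Plane)."""
    return [f"U+{ord(c):04X}" for c in text if ord(c) > 0xFFFF]


def strip_astral(text: str, replacement: str = "") -> str:
    """Replace every code point above U+FFFF with `replacement`."""
    return "".join(c if ord(c) <= 0xFFFF else replacement for c in text)


if __name__ == "__main__":
    # A tweet-like JSON record whose emoji is escaped as a UTF-16 surrogate pair.
    raw = '{"text": "hi \\ud83d\\ude00"}'
    tweet = json.loads(raw)["text"]      # json joins the pair into one character
    print(astral_chars(tweet))           # the high code points found in the text
    print(repr(strip_astral(tweet)))     # the text with them removed
    print(len("\U0001F600".encode("utf-8")))  # bytes needed in UTF-8
```

Stripping is the bluntest option; replacing each character with a placeholder token (via `replacement`) keeps a trace of where emoji occurred, which can matter for sentiment-style analyses of tweets.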
Many people have reported problems when attempting to install the R package topicmodels on OS X Mavericks or Yosemite. The problem is that binaries have not yet been built for these versions of OS X, so you need additional software installed in order to build the package from source. Once you have built it from source, however, it seems to work fine.