This visualization examines Twitter messages sent during the 2012 presidential debates between Mitt Romney and Barack Obama.
Tweets were gathered with the Twitter search API for the first two debates (University of Denver and Hofstra University) and with the Twitter streaming API for the third. The code is available on GitHub:
Filters were applied to the corpus of tweets to surface significant words. "Stop words" (common English words with low information density) and hyperlinks were removed. @name mentions, #hashtags, and mentions of the candidates' names were also removed from the word list and analyzed separately. The remaining words were processed by the Python NLTK lemmatizer, which returns each word's lemma so that related words can be grouped for analysis.
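The filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expressions, the tiny `STOP_WORDS` set, the `CANDIDATE_NAMES` set, and the `split_tweet` function are all assumptions made for the example. The lemmatizer is passed in as a callable (for instance, the `lemmatize` method of NLTK's `WordNetLemmatizer`) and defaults to a no-op so the sketch runs without NLTK installed.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "on", "for", "that", "this", "it"}
# Hypothetical candidate-name tokens, kept apart from ordinary words.
CANDIDATE_NAMES = {"obama", "barack", "romney", "mitt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z']+")


def split_tweet(text, lemmatize=lambda word: word):
    """Split one tweet into words, @mentions, #hashtags, and candidate names."""
    # Drop hyperlinks first so URL fragments are not counted as words.
    text = URL_RE.sub(" ", text)

    # Collect mentions and hashtags, then remove them from the running text.
    mentions = [m.lower() for m in MENTION_RE.findall(text)]
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)

    tokens = WORD_RE.findall(text.lower())

    # Candidate names are tracked separately from the general word list.
    names = [t for t in tokens if t in CANDIDATE_NAMES]
    words = [
        lemmatize(t)
        for t in tokens
        if t not in STOP_WORDS and t not in CANDIDATE_NAMES
    ]
    return {"words": words, "mentions": mentions, "hashtags": hashtags, "names": names}


result = split_tweet("Obama wins the debate! #debates @MittRomney http://t.co/abc")
print(result)
```

Each of the four lists can then be fed into its own counter per time period, which keeps the word, hashtag, and mention views independent of one another.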
The chart takes the 100 most common words, hashtags, or @name mentions for the given time period and sizes them on a log scale for comparison.
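One way to picture the log-scale sizing is the sketch below. The source does not say how log values are mapped to display sizes, so the pixel range (`min_px`, `max_px`), the linear interpolation between the smallest and largest log count, and the `font_sizes` function name are assumptions for illustration only.

```python
import math
from collections import Counter


def font_sizes(counts, top_n=100, min_px=10.0, max_px=60.0):
    """Map the top_n most frequent terms to font sizes on a log scale."""
    top = Counter(counts).most_common(top_n)
    if not top:
        return {}

    # Take the natural log of each count (counts are assumed to be >= 1).
    logs = {term: math.log(count) for term, count in top}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo

    sizes = {}
    for term, value in logs.items():
        # If every term has the same count, give them all the maximum size.
        frac = (value - lo) / span if span else 1.0
        sizes[term] = min_px + frac * (max_px - min_px)
    return sizes


sizes = font_sizes({"economy": 100, "jobs": 10, "binders": 1})
print(sizes)
```

With a log scale, a term used 100 times is drawn larger than one used 10 times by the same step that separates 10 from 1, so a few dominant terms do not crowd everything else down to the minimum size.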
Possible further improvements include weighting the word lists with TF-IDF against a reference corpus of tweets gathered by searching for random English words, or pairing this visualization with video from the debates to give the word cloud context.
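As a rough sketch of the TF-IDF idea, the example below scores the terms in one time window against a background set of tweets. It is not part of the existing project: the `tfidf_scores` function, the treatment of each background tweet as one document, and the smoothed IDF formula (a common variant, chosen here so that terms absent from the background do not cause a division by zero) are all assumptions.

```python
import math
from collections import Counter


def tfidf_scores(window_tokens, background_docs):
    """Score terms in one time window against a background corpus.

    window_tokens: list of tokens from the debate tweets in the window.
    background_docs: list of token lists, one per background tweet.
    """
    tf = Counter(window_tokens)
    total = sum(tf.values())
    n_docs = len(background_docs)
    doc_sets = [set(doc) for doc in background_docs]

    scores = {}
    for term, count in tf.items():
        # Document frequency: how many background tweets contain the term.
        df = sum(1 for doc in doc_sets if term in doc)
        # Smoothed inverse document frequency (assumed variant).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    return scores


background = [["good", "morning"], ["good", "night"], ["lunch"]]
scores = tfidf_scores(["debate", "debate", "good"], background)
print(sorted(scores, key=scores.get, reverse=True))
```

Terms that are frequent in the debate window but rare in everyday tweets score highest, which is the effect the proposed weighting is after.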