Wells Fargo Campus Analytic Challenge

Just wrapped up a couple of weeks of text mining, topic modeling, and sentiment analysis in R! The Wells Fargo Campus Analytic Challenge asked students from 13 universities to submit code along with written analyses identifying the topics and substance of a data set of around 250,000 social media posts.

Having never done anything like this before, I took a while to settle on an approach and find the R packages that would be most helpful. Luckily, R has a couple of very rich packages (and package vignettes!) for this sort of work. Unfortunately, the vignettes were among the few sources I could find that showed implementation details and examples. One good web resource I found for the ‘lda’ package was Carson Sievert’s page here. He walks through a straightforward implementation and shows off some nifty visuals as well.

I did most of the clean-up and text pre-processing with ‘tm’ and ‘dplyr’. My favorite package, though, is ‘stm’ (Structural Topic Model), which I used to fit the topic model. It’s got lots of functionality and is relatively intuitive. I originally used the ‘lda’ package, but the topics were much more coherent in the STM model. Both packages fit generative topic models, but ‘stm’ estimates its model differently and lets you initialize it with a spectral method (init.type = "Spectral"), and that seemed to make the difference. STM also supports some nice interactive D3 visuals that were converted into htmlwidgets by Kent Russell, who is very talented and a very nice guy (he takes requests!).
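For anyone starting out, here is a minimal, hypothetical sketch of that kind of pipeline: ‘dplyr’ for basic filtering, ‘tm’ for normalization and a document-term matrix, then ‘stm’ with spectral initialization. It is not the competition code. The toy corpus, the two word lists, and K = 2 are made up purely for illustration, and readCorpus(type = "slam") is used to hand tm’s sparse matrix to stm.

```r
# Minimal sketch: dplyr/tm clean-up feeding an STM fit with spectral init.
# The toy corpus below is hypothetical; swap in your own posts data frame.
library(dplyr)
library(tm)
library(stm)

set.seed(42)

# Two made-up vocabularies so the model has two themes to find.
banking <- c("account", "deposit", "mortgage", "branch", "loan", "savings")
sports  <- c("football", "stadium", "ticket", "season", "coach", "playoff")
all_words <- c(banking, sports)

# Each fake post: six words from its theme plus two words from either theme.
make_post <- function(pool) {
  paste(c(sample(pool, 6, replace = TRUE),
          sample(all_words, 2, replace = TRUE)), collapse = " ")
}

posts <- data.frame(
  text = c(replicate(60, make_post(banking)),
           replicate(60, make_post(sports))),
  stringsAsFactors = FALSE
)

# dplyr: drop exact duplicate and empty posts before text processing.
posts <- posts %>%
  distinct(text, .keep_all = TRUE) %>%
  filter(nchar(text) > 0)

# tm: standard normalization steps, then a document-term matrix.
corpus <- VCorpus(VectorSource(posts$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
dtm <- DocumentTermMatrix(corpus)

# stm: convert tm's sparse (slam) matrix into stm's document list format.
docs <- readCorpus(dtm, type = "slam")
out  <- prepDocuments(docs$documents, docs$vocab, verbose = FALSE)

# Fit a 2-topic STM, initialized with the spectral method.
fit <- stm(out$documents, out$vocab, K = 2,
           init.type = "Spectral", max.em.its = 50,
           seed = 42, verbose = FALSE)

# Top words per topic.
labelTopics(fit, n = 5)
```

With real posts, the number of topics is a judgment call; stm’s searchK() can help compare a few candidate values of K before settling on one.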

A static example of the D3 visualization of my STM model for the competition. Interact with the dynamic version in my submission report below.

A great package for sentiment analysis (and more) is ‘qdap’. I had a lot of fun going through the qdap vignette, which is extremely comprehensive. It took a long time to work through, but it was just awesome.
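As a taste of what qdap does, here is a minimal sketch of its dictionary-based polarity() scoring. The three example posts are made up for illustration, and the code relies on qdap’s default positive/negative word dictionary rather than anything tuned for social media text.

```r
# Minimal sketch of qdap's dictionary-based polarity scoring.
# The three example posts are hypothetical.
library(qdap)

posts <- c("I love this bank and the staff were great!",
           "Terrible service and an awful experience.",
           "The branch opens at nine on Monday.")

# polarity() scores each text element against qdap's default positive and
# negative word lists, adjusting for nearby negators and amplifiers.
pol <- polarity(posts)

# Per-post results: word count and polarity score.
pol$all[, c("wc", "polarity")]

# Summary across all posts (average and spread of polarity).
pol$group
```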

You can see my submission (best D3 experience on Firefox or Safari) here.

And view the code here.

Hope this helps someone else get started with topic modeling and sentiment analysis in R!

 
