Recreating the Oakland Budget Sankey Network Diagram using networkD3

Last weekend I was working on a project for school and came across this beautiful thing. I was so smitten that I sought to recreate it entirely using R.

I learned a couple of things:

  1. Socrata API – The data was available via data.oaklandnet.com, and there were several ways to obtain it. I saw the Socrata API option and thought I’d give it a whirl. I found this link helpful in getting up and running with Socrata. It was very easy.
  2. This was the first time I had to do so much unguided data wrangling, and it was very challenging. I’m certain that my code is not the most efficient way to wrangle the data, but I was happy just to have achieved it. The biggest challenge was figuring out a way to give ‘source’ and ‘target’ node IDs to the correct entries. I ultimately ended up splitting the data into General Fund and Non-Discretionary Fund (the inner nodes of the Sankey), and then splitting each of those into revenue and expenses. I gave every revenue line item a ‘target’ of either the General Fund or the Non-Discretionary Fund, and similarly gave every expense a ‘source’ of one of the two. I created a lookup table of node IDs and left joined it to the data with dplyr.
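
The lookup-and-join step described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the line items and amounts are made up, the column names (fund, type, item, amount) are assumptions, and the zero-based IDs are there because networkD3 expects zero-based node indices.

```r
library(dplyr)

# Hypothetical budget rows; the real data came from data.oaklandnet.com
budget <- data.frame(
  fund   = c("General Fund", "General Fund",
             "Non-Discretionary Fund", "Non-Discretionary Fund"),
  type   = c("revenue", "expense", "revenue", "expense"),
  item   = c("Property Tax", "Police", "Grants", "Public Works"),
  amount = c(100, 100, 60, 60),
  stringsAsFactors = FALSE
)

# Revenue flows from the line item into a fund;
# expenses flow from a fund out to the line item
links <- budget %>%
  mutate(
    source_name = if_else(type == "revenue", item, fund),
    target_name = if_else(type == "revenue", fund, item)
  )

# Lookup table: one row per node, with a zero-based id
nodes <- data.frame(
  name = unique(c(links$source_name, links$target_name)),
  stringsAsFactors = FALSE
) %>%
  mutate(id = row_number() - 1L)

# Left join the lookup table twice, once per end of each link
links <- links %>%
  left_join(nodes, by = c("source_name" = "name")) %>%
  rename(source = id) %>%
  left_join(nodes, by = c("target_name" = "name")) %>%
  rename(target = id) %>%
  select(source, target, value = amount)
```

Building both ends of every link first and then deriving the node list from them keeps the IDs consistent, since a node can only get an ID if it actually appears in a link.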

Once the data was organized correctly, the Sankey diagram was a single function call, thanks to the networkD3 package. You can take a look at the code here.
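
For a sense of what that call looks like, here is a small self-contained sketch using networkD3's sankeyNetwork(). The three nodes, two links, and the styling values (units, fontSize, nodeWidth) are illustrative assumptions, not the settings used for the Oakland diagram.

```r
library(networkD3)

# Toy node and link tables; link ids are zero-based row positions in `nodes`
nodes <- data.frame(
  name = c("Property Tax", "General Fund", "Police"),
  stringsAsFactors = FALSE
)
links <- data.frame(
  source = c(0, 1),      # Property Tax -> General Fund -> Police
  target = c(1, 2),
  value  = c(100, 100)   # made-up amounts
)

# Column names are passed as strings; NodeID is the label column in `nodes`
sankey <- sankeyNetwork(
  Links = links, Nodes = nodes,
  Source = "source", Target = "target",
  Value = "value", NodeID = "name",
  units = "USD", fontSize = 12, nodeWidth = 30
)

sankey  # renders in the RStudio viewer or an R Markdown document
```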
