
Wrangling Data for Good

April 12, 2015

You wouldn’t think spending a weekend at the office would be the start of an adventure to use data to change the world, but exciting things were afoot here at Trifacta’s headquarters the weekend of March 27–28. We had the pleasure of hosting DataKind San Francisco’s first-ever DataDive, with three volunteer teams working with some amazing Bay Area nonprofits – TechSoup Global, Mission Economic Development Agency (MEDA), and San Francisco Health Improvement Partnership (SFHIP).

A sample of what was uncovered during the event:

“Religious organizations are really hot in Europe right now.” — TechSoup Global

“After enrolling in our financial capability program, participants are able to establish a credit history, which may mean taking on and managing some debt. Although increasing debt runs counter to our programmatic measures of success, it is actually a way for individuals to participate in mainstream financial institutions. And that may be exactly what we want.” — MEDA

“San Francisco residents can now have at their fingertips a new, more dynamic picture of what’s going on in their neighborhoods when it comes to safety.” — SFHIP

Doesn’t sound like your typical weekend at the office? Welcome to the world of data science volunteerism.

It’s no secret that the companies of San Francisco boast some of the smartest data scientists in the world, but what is less publicized is that the neighborhoods where these very companies are headquartered face crime, homelessness, and income disparity. And yet how often do we as data professionals get the opportunity to use our skills to help address social issues in our own backyard?

DataKind harnesses the power of data science in the service of humanity by bringing together data scientists and nonprofits in a symbiotic way. The nonprofit benefits from improved quality of, access to, and understanding of the data it collects, and the data scientist exercises their analytical muscles in a way they never have before.

Scratch that. The analogy that comes to mind is that of diving for pearls: the data scientist is thrown into unfamiliar waters of data they have little context for, riding the bumpy waves of ill-formatted columns and missing data, and plunging to ice-cold depths of the data’s provenance to emerge with the glistening gems that are new insights and models. In just two days. The adventure is well worth it when you know your work is going to help your community thrive.


Work actually began the week before, during a DataJam session where a few of the volunteers wrangled the beast that is the raw data of these organizations by profiling, cleaning, standardizing, joining, and even coming up with definitions to clarify metadata. Now, the weekend’s volunteers could immediately dive into the various data sets with relative ease.
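
For the curious, here is a rough sketch of what that kind of wrangling can look like in pandas – every column name and value below is made up for illustration, not taken from the actual nonprofit datasets:

```python
import pandas as pd

# Hypothetical donation records with the usual raw-data problems:
# inconsistent casing, stray whitespace, and unparseable or missing values.
donations = pd.DataFrame({
    "org_name": [" Acme Shelter", "acme shelter", "Food Bank SF", "Mission Kitchen"],
    "amount": ["100", "250", "n/a", "75"],
    "region": ["EU", "eu", "US", "US"],
})

# Standardize the text columns.
donations["org_name"] = donations["org_name"].str.strip().str.title()
donations["region"] = donations["region"].str.upper()

# Coerce amounts to numbers; unparseable values become NaN and are dropped.
donations["amount"] = pd.to_numeric(donations["amount"], errors="coerce")
clean = donations.dropna(subset=["org_name", "amount"])

# Join with an external lookup table for added context.
regions = pd.DataFrame({"region": ["EU", "US"],
                        "region_name": ["Europe", "United States"]})
enriched = clean.merge(regions, on="region", how="left")
print(enriched)
```

The real work, of course, is in deciding what "standardized" even means for each dataset – which is exactly why the DataJam volunteers also wrote metadata definitions.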


As a new grad, I had never really seen this kind of unstandardized data before. I lived in the academic utopia of round numbers and well-defined columns. This was the first time I used Trifacta, the product I work on eight hours a day, in this context – to prep data for nonprofits having real-world impact. If you have never heard of Trifacta, the product is designed to help individuals wrangle data of all shapes and sizes in preparation for analysis. Since most of us had never seen this data before, we used Trifacta to perform some high-level profiling so that “our eyes wouldn’t bleed” at the sight of raw numbers. I’d never known how powerful the agile exploration of data in native formats could be until I experienced it for myself preparing data for this DataDive.
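
If you’ve never profiled a dataset, here is roughly what that first look involves – sketched in pandas rather than Trifacta, on a made-up slice of data:

```python
import pandas as pd

# A made-up slice of messy survey data, for illustration only.
df = pd.DataFrame({
    "zip": ["94103", "94110", None, "94103"],
    "enrolled": ["yes", "YES", "no", None],
    "income": [42000, None, 38000, 51000],
})

# Quick profile: types, missing values, and value distributions,
# before staring at a single raw row.
print(df.dtypes)
print(df.isna().sum())            # missing values per column
print(df["enrolled"].str.lower().value_counts())
print(df["income"].describe())    # min/max/mean flag outliers early
```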


So who are these organizations that we were working with? TechSoup Global is a 27-year-old organization in San Francisco that delivers technology into the hands of other nonprofits. Mission Economic Development Agency, MEDA, seeks economic justice for San Francisco’s low- and moderate-income Latino families through asset development. SF Health Improvement Partnership, SFHIP, is a new multi-sector initiative designed to improve the health and wellness of all San Franciscans.

The technical maturity, age, and size of the organizations were vastly different, but their representatives had two basic questions:

  1. Can we make sense of the existing data? i.e. What have we learned?
  2. Can we make sense of future data? i.e. How can we apply it to anticipate future needs?

As Saturday rolled on, each nonprofit team self-organized into smaller slices roughly centered around:

  1. Improving access to and quality of data (toolkit: R, Python Pandas, Trifacta, Excel)
  2. Semantically understanding sub-sections of the datasets, even joining with external data for context (toolkit: R, Python Pandas)
  3. Visualizing along the feature axes that seemed most relevant to the variables being measured, or mathematically modeling and clustering outcomes based on previously identified feature vectors (toolkit: Tableau, R (ggplot2), D3, Shiny)
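
To give a flavor of that third slice, here is a toy clustering example – a bare-bones k-means in NumPy on synthetic two-dimensional feature vectors (the teams themselves used R and the other tools listed above; nothing here comes from the real datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D feature vectors (say, debt change vs. credit-score change),
# drawn as two well-separated synthetic blobs for illustration.
features = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

def kmeans(X, k, iters=20, seed=0):
    """Bare-bones k-means: assign points to the nearest centroid, recompute."""
    init = np.random.default_rng(seed)
    centroids = X[init.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid: shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

labels, centroids = kmeans(features, k=2)
print(np.round(centroids, 1))
```

With blobs this well separated, the two recovered clusters line up with the two groups the data was generated from; real outcome data is rarely so tidy, which is where the weekend’s arguing over feature vectors came in.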


Throughout the weekend, the volunteer teams worked furiously to dive into the data and search for those pearls of insight. Before we knew it, it was already Sunday and time to present our key findings.


TechSoup’s ~2 million row dataset came from their software donation management program, and they were looking to understand their NGO members’ budgets, inter-temporal and inter-regional effects, and overall nonprofit donation program behavior patterns.

  • “Religious Organizations are really hot in Europe right now.” — TechSoup Global

In 2013, several technology donors made product donations available to 501(c)(3) religious organizations, who have since discovered TechSoup as a partner. Coupled with TechSoup’s continued global expansion, this caused an unprecedented increase in European religious organizations requesting large amounts of donated software to manage their geographically dispersed members and activities electronically. TechSoup Global is now using this information to better serve these customers and is evaluating whether to implement a “customer success” model.


MEDA had around 10,000 rows in a very wide dataset that collated information for each individual across their gamut of financial asset building programs. They wanted to better understand the effectiveness of each program, the impact of one-on-one coaching, and an inter-temporal analysis of overall individual success.

  • “After enrolling in our financial capability program, participants are able to establish a credit history, which may mean taking on and managing some debt. Although increasing debt runs counter to our programmatic measures of success, it is actually a way for individuals to participate in mainstream financial institutions. And that may be exactly what we want.” — MEDA

MEDA’s FinCap program provides underserved communities access to the mainstream economic system. FinCap participants learn how to manage healthy amounts of debt while building credit, leading to an increase in their total debt. Analysis performed during the DataDive showed that participants who came in with large debt were able to reduce it significantly. The team left MEDA with recommendations on next steps, including different ways of looking at client success.


SFHIP worked with publicly available GIS files and datasets, including demographic information, locations of key institutions like schools, and crime data. Their team was seeking to create an automated, interactive data visualization tool that San Francisco residents can use to glean insights into the factors that impact public safety.

  • “Now we can focus on our neighborhoods in a new way, see important differences between them, and work with residents to help them make sure policies that are supposed to ensure that their communities are healthy and safe are working for them.” — SFHIP

The SFHIP team left on Sunday with a prototype they can test and refine with community and civic partners. They have workshops planned not far down the road to put the product developed during the DataDive – and all the insights built into it – into practical use for the improvement of communities who need it most.

What a Weekend!

We came out of the weekend having learned a lot. As volunteers, we learned to bounce ideas off each other, work under pressure, make unfamiliar R functions work, and truly attempt to understand the question before rushing to the answer. As nonprofit representatives, we learned how to improve our data collection in the future and gained a new appreciation for the power our data has to transform our work. And as organizers, we learned that there is no such thing as too much pizza.

Armed with valuable new insights, these organizations are now moving forward in their data science journeys while the volunteers are already looking forward to the next DataDive. If you’re a data scientist or nonprofit in the Bay Area interested in learning more, I encourage you to attend one of DataKind San Francisco’s upcoming Meetups!