This past Sunday, the data scientist Nate Silver spoke at the University of California, Berkeley. A group of Trifacta employees went to the talk, an exciting event for a group of data science aficionados. For anyone who hasn’t been following Silver, he’s a statistician who makes predictions about topics like sports and politics. Most famously, he correctly predicted the outcome in every state in the 2012 presidential election.
Silver gave an engaging presentation on Sunday that interwove anecdotes with his thoughts on data analysis in the Big Data era. A key theme was the role that human judgment plays in data analysis. One of the most poignant anecdotes he shared was about the Deep Blue chess match in May 1997. The computer player’s artificial intelligence encountered an error and made a random (and suboptimal) move. The human player, Garry Kasparov, couldn’t understand why Deep Blue would make such an odd move, and assumed the computer had a superior understanding of the game. Devaluing his own learned expertise and common sense proved costly for Kasparov, who performed worse in the subsequent games.
There has been some notable criticism lately about the perceived over-hyping of Big Data and its capacity to solve tough problems. Many critics argue that proponents of Big Data discount the importance of human expertise and knowledge, which can prove perilous in cases like the Deep Blue chess match. They have a point, especially amid grandiose predictions that the exponential growth in data and compute power will render the scientific method obsolete.
Both in his talk at Berkeley and in his book, The Signal and the Noise, Silver articulates a more balanced approach. As he states in the book: “technology is beneficial as a labor-saving device, but we should not expect machines to do our thinking for us.” We cannot decouple human judgment from data science. As he notes, “the numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.”
Silver also commented during his talk on Sunday that “most of the work” in analyzing large data sets lies in data preparation and cleaning. At Trifacta, we consider his point about the role of human judgment to be especially important during this part of the process. Computers are great at generating and processing large amounts of data. It takes a human to pull out the noteworthy features, examine anomalies, and transform the raw data into a format ready for analysis and visualization.
We can’t rely on our human hunches alone in the face of contradictory data, but let’s not throw out the baby with the bathwater. By bringing together technology and the unique power of the human brain at each step of data analysis, we can draw more meaningful (and accurate) conclusions from an exponentially growing amount of data. That, at least, is our perspective at Trifacta. It was nice to see Nate Silver take a stand for the human side of the Big Data equation as well.