The current “big data” era has led to a proliferation of new technologies that enable a broader base of users to consume and analyze data. In large part, this shift has been driven by the advancement and popularity of data visualization. We’ve moved beyond traditional dashboards and now incorporate data visualizations into nearly every stage of the analysis process. With advances in data visualization technology, designers and developers have more flexibility in how visualizations are leveraged within data products. But without a well-defined strategy for the end purpose of a visualization, these products may prove ineffective.
In our recent talk at Strata NYC, Jock Mackinlay (VP of Research and Design at Tableau) and I discussed the process of designing data visualizations for distinct stages of the analysis process, and compared how visualizations are used in Trifacta and Tableau. Both tools aim to support the iterative process of visual analysis, including data preparation, quality assessment, exploratory analysis, and reporting. However, the tools differ in which parts of this process they primarily target, and in how they leverage visualization to support users’ goals.
Consider the case of campaign finance data from the Federal Election Commission (FEC). By examining the spending records of candidates for political office, we might gain some insight into their strategies. However, the actual data tables are published as multiple text files, use inconsistent date formats, and must be combined (joined, in database terms) to connect spending records back to the candidates. Significant cleaning and transformation is required prior to exploratory analysis.
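To make this concrete, here is a minimal sketch of that preparation work in pandas. The file contents, column names (`cand_id`, `cand_name`, `amount`, `date`), and date formats below are hypothetical stand-ins for the FEC files, which are larger and messier; the point is the pattern of normalizing inconsistent dates and joining spending records back to candidates.

```python
import io

import pandas as pd

# Hypothetical excerpts of the FEC data: one table of candidates,
# one table of itemized spending with inconsistent date formats.
candidates_csv = io.StringIO(
    "cand_id,cand_name\n"
    "C001,Candidate A\n"
    "C002,Candidate B\n"
)
spending_csv = io.StringIO(
    "cand_id,amount,date\n"
    "C001,1000,2015-07-01\n"
    "C002,2500,07/15/2015\n"
)

candidates = pd.read_csv(candidates_csv)
spending = pd.read_csv(spending_csv)

def parse_date(value):
    # Try each known format until one succeeds.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return pd.to_datetime(value, format=fmt)
        except ValueError:
            continue
    return pd.NaT  # flag unparseable dates for later inspection

# Normalize the mixed date formats into a single datetime column.
spending["date"] = spending["date"].map(parse_date)

# Join spending records back to candidates on the shared ID.
merged = spending.merge(candidates, on="cand_id", how="left")
print(merged)
```

A left join is the safe default here: spending rows with no matching candidate survive with a null name, surfacing referential-integrity problems rather than silently dropping records.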
We can initially parse and examine the data using Trifacta, and perform the joins needed to link the relevant records together. Rather than forcing users to manually specify dozens of charts, Trifacta also profiles the data set and automatically generates a set of interactive summary visualizations. These visualizations provide insight into the shape and structure of the data, leveraging both perceptual principles and the underlying data distributions to drive design decisions. In this way, Trifacta promotes a “breadth-first” tour of the data, forming an initial overview to aid understanding and to identify and correct data quality issues.
Once the data has been suitably prepared, we can export it from Trifacta and load it directly into Tableau. We can then explore more targeted questions. For example, we might examine how and where the 2016 presidential candidates are spending their money. Through a series of visualizations, we discover that payroll and advertising buys are common expenses, and that Donald Trump spends more on private jets than anyone else. More interestingly, we can see very different geographic patterns. For example, Hillary Clinton’s spending is spread much more broadly across the country, suggesting an early focus on the local ground game. By enabling users to interactively create and refine these visualizations, Tableau supports a “depth-first” style of question answering: users might start with a vague question, refine it over a series of charts, and, once satisfied, begin again to explore a different set of questions.
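Behind a targeted chart like “how much does each candidate spend, and where?” sits a grouped aggregation. The sketch below expresses that question directly in pandas, using hypothetical toy records (the `candidate`, `state`, `purpose`, and `amount` columns and their values are illustrative, not actual FEC figures).

```python
import io

import pandas as pd

# Hypothetical, simplified spending records; the real FEC
# disbursement files have many more columns and rows.
rows = io.StringIO(
    "candidate,state,purpose,amount\n"
    "Candidate A,NY,Payroll,5000\n"
    "Candidate A,IA,Advertising,3000\n"
    "Candidate B,NY,Advertising,7000\n"
    "Candidate B,NY,Travel,2000\n"
)
spending = pd.read_csv(rows)

# "How much does each candidate spend, by state?" as a pivot:
# one row per candidate, one column per state, summed amounts.
by_state = spending.pivot_table(
    index="candidate",
    columns="state",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)
print(by_state)
```

Each refinement of the question (swap `state` for `purpose`, filter to one candidate) is a small change to this query, which is exactly the iterative loop a tool like Tableau makes interactive.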
Across examples like these, we have found that initial explorations benefit from taking a wide view across the data set, which Trifacta accelerates via automatic visual profiling. There is then a natural transition to more question-focused explorations, for which Tableau is well-suited. These different tasks — initial overview versus targeted exploration — lead to different visualization workflows. However, we’ve also seen that analysts need to move back and forth between these styles. How might “breadth-first” and “depth-first” explorations be better integrated?
To learn more about these topics, and also get a peek at ongoing research on new visualization tools, you can watch the recording of our Strata presentation below. You can also find additional videos and resources in the learning section of our website.