Excel launched in 1985, when companies dealt in megabytes and regularly crunching terabytes, let alone petabytes, of data was unheard of. Today, big data is the norm. A 2015 Gartner survey found that more than 75% of companies are investing or planning to invest in big data within the next two years. Similarly, a recent Tech Pro Research survey shows that 49% of large companies are implementing big data solutions. At Trifacta, we love Excel and use it daily. But for modern businesses with big data needs, Excel is not the right tool for the job.
Excel and Big Data Don’t Mix
Excel has some major limitations that become more pronounced as you add more data and calculations. Once you add VLOOKUP formulas or PivotTables, or even just apply a large number of formulas within a workbook, Excel's performance slows to an intolerable level. We are talking 10 to 30 minutes to open a workbook or refresh a table. Even worse? Excel often crashes when burdened with large data sets.
Many businesses have been forced to reckon with these shortcomings. Users have devised workarounds that let them keep working with big data in Excel, but while functional, each one is flawed.
Here are a few specific examples of where Excel breaks down when working with data at scale:
Manual Calculation Mode
Excel defaults to recalculating all formulas in a workbook as soon as anything changes, which can cause a major slowdown. Switching to manual calculation mode, where formulas are refreshed only when you press F9, gives you some control over when Excel recalculates. It's important to note, however, that manual calculation does not make the calculations themselves any faster; it only changes when they run.
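The trade-off is easy to see in a toy model. The sketch below is plain Python, not Excel: a hypothetical `Sheet` class that either recalculates on every edit (Excel's default) or only on demand (the analogue of pressing F9). Deferring saves the redundant intermediate recalculations, but the final recalculation costs the same either way.

```python
class Sheet:
    """Toy spreadsheet model with an automatic/manual recalculation switch."""

    def __init__(self, manual=False):
        self.cells = {}        # cell name -> raw value
        self.formulas = {}     # cell name -> function of the cell values
        self.cache = {}        # last computed formula results
        self.manual = manual
        self.recalc_count = 0  # how many full recalculations have run

    def set_value(self, name, value):
        self.cells[name] = value
        if not self.manual:    # automatic mode: recalculate on every edit
            self.recalculate()

    def set_formula(self, name, fn):
        self.formulas[name] = fn
        if not self.manual:
            self.recalculate()

    def recalculate(self):     # the "press F9" step in manual mode
        self.recalc_count += 1
        self.cache = {name: fn(self.cells) for name, fn in self.formulas.items()}


# Automatic sheet recalculates after every one of 1,000 edits...
auto = Sheet(manual=False)
auto.set_formula("total", lambda cells: sum(cells.values()))
for i in range(1000):
    auto.set_value(f"A{i}", i)

# ...while the manual sheet recalculates exactly once, on demand.
manual = Sheet(manual=True)
manual.set_formula("total", lambda cells: sum(cells.values()))
for i in range(1000):
    manual.set_value(f"A{i}", i)
manual.recalculate()

print(auto.recalc_count, manual.recalc_count)  # 1001 vs. 1
```

Both sheets end up with the same `total`; the manual one just avoids 1,000 throwaway recalculations along the way, which is exactly the relief manual mode offers on a sluggish workbook.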
Conversion of Formulas into Static Values
On large datasets, applying a formula in Excel can take hours. To shorten time to insight, you can copy the column containing the formula, paste it back as static values (Paste Special > Values), and delete the original formula column. But this method is hazardous: you lose the rules that created those newly pasted values, determining how you got to your final numbers will be next to impossible for outside auditors, and all future maintenance will have to be done manually.
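The same compute-once, keep-the-values idea looks like this in pandas (a hypothetical example with made-up column names, shown only to illustrate the trade-off). The derived column is evaluated once over the whole table, and from then on it is static:

```python
import pandas as pd

# Hypothetical sales table standing in for a large workbook.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [2, 3, 4]})

# The "formula": revenue = price * qty, evaluated once for every row.
df["revenue"] = df["price"] * df["qty"]

# The result is now static values. Unlike a live Excel formula column,
# editing "qty" does NOT automatically update "revenue".
df.loc[0, "qty"] = 100
print(df["revenue"].tolist())  # still [20.0, 60.0, 120.0]
```

One advantage of doing this in a script rather than by paste-as-values: the rule itself survives as a line of code that an auditor can read and re-run, instead of vanishing when the formula column is deleted.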
The VLOOKUP or PIVOT Function
These extremely time-consuming operations will always slow down your workbooks when dealing with large data sets. Is it possible to simply avoid them? Given that VLOOKUP and PivotTables are among the most powerful techniques Excel offers for manipulating data, most businesses do not consider this an option.
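For context on why this pattern need not be slow, here is the exact-match VLOOKUP pattern expressed as a join in pandas (hypothetical tables and column names, not part of Trifacta or Excel). A merge resolves every lookup in one vectorized pass instead of scanning the lookup table once per row:

```python
import pandas as pd

# Hypothetical fact table and lookup table.
orders = pd.DataFrame({"sku": ["A1", "B2", "A1"], "qty": [5, 2, 7]})
catalog = pd.DataFrame({"sku": ["A1", "B2"], "price": [3.0, 4.5]})

# Equivalent of =VLOOKUP(sku, catalog, 2, FALSE) applied to all rows at once.
joined = orders.merge(catalog, on="sku", how="left")
joined["total"] = joined["qty"] * joined["price"]
print(joined["total"].tolist())  # [15.0, 9.0, 21.0]
```

A left join keeps every order row and fills in the matching price, which is the same semantics analysts reach for VLOOKUP to get.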
These workarounds, while helpful, are not sustainable in the long-term. That’s why Trifacta has employed a better approach that leverages a best-in-class data processing engine while also offering interactive feedback on all data transformations.
Trifacta Was Made for Big Data
Trifacta’s data wrangling solution was created to handle large data volumes, where manipulating gigabytes or terabytes of data is commonplace. How do we do it? Trifacta’s high-performance data wrangling engine, Photon, enables faster feedback on greater volumes of data, which leads to huge productivity gains for all of our users. In addition, Trifacta’s Intelligent Execution Architecture maintains support for a growing list of modern data processing engines, such as Spark, Google Cloud Dataflow, and MapReduce.
Here’s how Trifacta works: when you open a large dataset, the application automatically presents visual representations of your data. When you brush over or click on certain elements, Trifacta suggests logical transforms that you can select, edit, or build from scratch with real-time feedback. The process is seamless for users, who can also leverage Trifacta’s at-scale profiling to check the validity of the resulting output dataset and search for any remaining data inconsistencies.
Leveraging an intelligent visual interface and powerful compute framework, Trifacta has been able to provide the best user experience for analysts—keeping interactions agile and reactive—while delegating the heavy processing to the right engine.