Agile data transformation requires fast iteration: Assess the current state of your data, transform it to move closer to your desired end state, repeat. Speeding up that interactive loop dramatically shortens the overall time it takes to build data pipelines. Executing transformations is a major factor in the latency of this loop. Transforming data requires computation, […]
Sean Kandel • February 9, 2021
Much has been written about the shift from ETL to ELT and how ELT enables superior speed and agility for modern analytics. One important move to support this speed and agility is creating a workflow that enables data transformation to be exploratory and iterative. Preparing data for analysis requires an iterative loop of forming and […]
Sean Kandel • January 15, 2021
For those who work with data regularly, the problem of “Data Wrangling” can be one of the most frustrating aspects of performing analysis. My first exposure to the real pain associated with data wrangling was my work in quantitative research at Citadel Investment Group. My work revolved around data but much of the data relevant […]
Sean Kandel • August 28, 2014
Blog content originally posted at hortonworks.com. The most commonly reported use of Hadoop today is data transformation. After standing up a cluster and scaling it for raw data collection, organizations begin the hard work of preparing various types of data for new analytic use cases. In many Hadoop implementations, this process has slowed wider enterprise adoption […]
Sean Kandel • June 3, 2014
Life science research breakthroughs are driven by access to data. So it is no surprise that open access to data has often been a discussion in the scientific community. For many years PLoS, a publisher of seven scientific journals including the largest scientific journal PLoS One, has encouraged the sharing of data among researchers. But […]
Sean Kandel • March 15, 2014