Celebrating the Small World of Big Data
Computing is a small world; data is an even smaller one. It’s small enough that a core group of technology and business leaders can gather in one room to throw a birthday party for a founding figure. This weekend, we’re celebrating Mike Stonebraker’s 70th birthday in a daylong event at MIT. Everyone is welcome, and it will be livestreamed as well.
Stonebraker is, of course, famous as a founder of the relational database industry, a serial entrepreneur, and the foremost academic researcher in database systems. I’ve known him as a research advisor, an academic colleague, a co-founder, an academic rival, a business competitor, and a friend. The man is a force.
What’s hard to keep in perspective is how long Stonebraker has been a force. He led the relational database revolution as a young professor, via the Ingres project he started at Berkeley in the early 1970s. The very idea that you could build an efficient database system with a high-level query interface and make it work on 1970s hardware seems almost ridiculous in retrospect. Stonebraker was a believer, though, and he pulled it off with a handful of grad students and colleagues. He also released the source code under a highly permissive license (“open source” wasn’t a phrase back then). Only IBM, which was absolutely titanic at the time, had equivalent technical ambitions. This was still some years before a young Larry Ellison took on both Stonebraker and IBM in a storied battle for market dominance.
My own experience with Stonebraker began on his second big project, Postgres. I was a grad student on the project, along with a host of other talented students including Mike Olson (Cloudera, Sleepycat) and Wei Hong (Arch Rock, Illustra). Postgres was conceptually the most ambitious data management system ever designed: an integrated system with six or seven major technological shifts baked into it, each of which could easily have been a research project or startup company on its own. Notably, Postgres pioneered the idea of pushing user code into the data processing infrastructure, with rich query optimization that took account of both the relational and the user-defined pieces of a query. The impact of Postgres on research and open source is profound. Its influence on commercial databases has been substantial as well: the Postgres codebase has been adapted to develop a number of influential industrial systems including Illustra, Netezza, Greenplum, Aster Data, and Redshift.
Projects like Ingres and Postgres have had enormous impact on research and the computing industry. But beyond any individual project, Stonebraker’s biggest contribution is in demonstrating how to drive computing technology from both the university and the field. Stonebraker is a dedicated academic and a serial entrepreneur, and his biggest wins were influential in both spheres: Ingres -> Ingres Corp, Postgres -> Illustra/Informix, C-Store -> Vertica, and more. But the full accounting is even bigger when you look at his students. Bob Epstein was co-founder of Sybase; Dale Skeen was co-founder and CTO of Vitria; Diane Greene was co-founder and CEO at VMware; Mike Olson was co-founder and CEO at Cloudera. Meanwhile, Stonebraker’s original home base of Berkeley is once again the center of the whirlwind, this time for Big Data: the last Strata Big Data conference was awash in talks from companies that grew out of recent Berkeley research projects, including Trifacta, Databricks, Captricity and Wise.io.
Like Stonebraker, I started a company, Trifacta, as a Berkeley professor, but this time in collaboration with folks at Stanford: Jeff Heer and Sean Kandel. Our Berkeley/Stanford collaboration is a major point of difference from the rivalries of earlier days, and an indicator of what makes Trifacta unique: the integration of diverse technologies, approaches, and schools of thought. We have ties not only to the Berkeley data community, but also to another set of academic/entrepreneurial leaders, including senior members of Stanford’s data visualization group like Pat Hanrahan (Tableau, Pixar). We believe that today’s biggest challenges require that mindset of collaboration across diverse areas like data and human-computer interaction design.
Today’s data industry and technology are vastly different from the world of the early 1970s. But two themes remain the same across the decades: the centrality of data to both business and computer science, and the ongoing need for innovations that let users work with huge volumes of data in ever-simpler and more powerful ways. Mike Stonebraker saw the importance of these issues early, and he’s pursued them with creativity and vigor for over 40 years. The trails that technologists blaze today lead off from the long path he is still actively following.
So… to Mike Stonebraker on his 70th: Happy Birthday, with admiration and gratitude!