I sat down with Adam, Trifacta’s first engineer, to discuss his experience after starting in September 2012. Prior to his role at Trifacta, he worked at LinkedIn and Yahoo! Research, and graduated from Duke with a PhD in Computer Science.
How did you find out about Trifacta?
I’ve known Joe Hellerstein since grad school, so about 8 years. We were working in the same research area. I ran into him as he was starting the company in June of 2012.
What made you decide to join the company as Engineer #1?
One lesson I’ve learned in my career is, no matter what your interest, go work with the smartest people you can find. With that lesson in hand, it made going to Trifacta pretty easy. It was clear I was joining some top people that I could learn a lot from.
How did your work at Yahoo! Research and LinkedIn prepare you for your role here?
I was in the Database and Systems Group at Yahoo! Research, which at that time had some of the top people in my field — we had gathered from academia and industry at an exciting new research lab. In the 4 years there, I learned what I might have learned in 10 or 15 somewhere else. LinkedIn was great for filling in a lot of engineering experience I didn’t have. I was really impressed by their attention to detail and ability to manage large projects.
What technology did you have to build upon at the early stages?
Could you talk about some of the critical design decisions you made as Engineer #1?
We knew from the beginning that we wanted to transform your data no matter where it sits, whether Hadoop, a SQL warehouse, or other places. If we had started out by compiling to only one of those backend environments, we would probably have made a lot of decisions we’d have to unwind later. So, early on we spent a lot of time figuring out how to compile to two different backends: one is a native platform we built, and the other is Hadoop.
One decision I made in the early days was our strategy for running Hadoop jobs. You can compile to Hadoop natively by writing raw jobs, or you can try to rely on one of the higher-level languages invented to run on Hadoop, such as Pig or Hive. I chose Pig for a couple of reasons. First, it looks much more like our Trifacta language than Hive does. Second, I’m really familiar with it: it was invented in my lab at Yahoo!. I think we were able to prototype a lot more quickly as a result, while staying honest with ourselves about our strategy for running against data no matter where it was sitting.
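To make the idea of compiling one transformation to two backends concrete, here is a minimal sketch in Python. It is not Trifacta's compiler or language: the `KeepRowsGreaterThan` step and both `compile_to_*` functions are hypothetical names invented for illustration. The only real syntax it relies on is Pig Latin's `FILTER ... BY` statement.

```python
from dataclasses import dataclass


# Hypothetical one-step transformation: keep rows where column > threshold.
# This stands in for a real transformation language, which the interview
# does not describe in detail.
@dataclass(frozen=True)
class KeepRowsGreaterThan:
    column: str
    threshold: int


def compile_to_native(step):
    """Compile the step to a Python function that runs on a list of dict rows."""
    def run(rows):
        return [row for row in rows if row[step.column] > step.threshold]
    return run


def compile_to_pig(step, input_alias="data", output_alias="result"):
    """Compile the same step to a Pig Latin FILTER statement (as text)."""
    return (
        f"{output_alias} = FILTER {input_alias} "
        f"BY {step.column} > {step.threshold};"
    )


step = KeepRowsGreaterThan(column="age", threshold=30)

# Backend 1: run in process on a small in-memory sample.
rows = [{"age": 25}, {"age": 41}]
native_result = compile_to_native(step)(rows)

# Backend 2: emit a script that a Hadoop cluster running Pig could execute.
pig_script = compile_to_pig(step)
```

Keeping the step definition independent of either backend is the point of the sketch: adding a third target would mean writing one more `compile_to_*` function rather than rewriting the step.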
What stands out to you about Trifacta’s company culture?
There are a lot of people working really hard. We are happy to have whiteboard discussions, but we cut them off when the time is right and don’t argue things down to minutiae. We are a “move quickly, methodically, and thoughtfully” type of company.
What are some of the most interesting technical challenges Trifacta faces, from your perspective as a backend engineer?
Well, I can talk about something that’s on our minds lately, which is the wide table challenge. When you read papers about cluster computing or Hadoop, you’re used to hearing a lot about what horizontal partitioning can do for you. If you break up your table into pieces, you can process an unlimited amount of data.
We actually have some interesting problems when you have a huge number of columns and a relatively small number of rows. Despite my familiarity with column stores, I didn’t see this one coming, and we certainly have use cases where it appears. We face these problems all over our stack. How do you make sense of a lot of columns when you’re the user staring at your data? How do you process very long rows and execute expensive operations like regexes on them?
We’re always in a contest to see how many columns we can process, from both the UX and infrastructure side, and we continue to push that number up and up as we go. We’ve been successful so far. We’ve increased the number of columns we can process by a significant amount since the early stages.
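A small Python sketch can show why horizontal partitioning alone does not help with a wide table. It makes no claim about how Trifacta handles the problem: the function names and the toy table are assumptions made for illustration, and splitting by column is shown only as a contrast to splitting by row.

```python
def partition_rows(table, chunk_size):
    """Horizontal partitioning: split the table into groups of rows."""
    return [table[i:i + chunk_size] for i in range(0, len(table), chunk_size)]


def partition_columns(table, chunk_size):
    """Vertical partitioning: split every row into groups of columns."""
    width = len(table[0]) if table else 0
    return [
        [row[i:i + chunk_size] for row in table]
        for i in range(0, width, chunk_size)
    ]


# A toy "wide" table: only 3 rows, but 10,000 columns each.
wide_table = [list(range(10_000)) for _ in range(3)]

# Splitting by rows yields a single partition, so no parallelism is gained
# and each worker still has to handle 10,000-column rows.
row_parts = partition_rows(wide_table, 1_000)

# Splitting by columns yields ten partitions of 1,000 columns each.
col_parts = partition_columns(wide_table, 1_000)
```

With so few rows, all of the data lands in one row partition, while the column split produces ten pieces of manageable width. The trade-off is that any operation spanning columns from different pieces would then need them brought back together.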
As someone who has worked both in research and production-oriented engineering, where does the work at Trifacta fit into that spectrum?
Having now spent a bunch of years as a researcher and a few years as an engineer, I think I still feel more like a researcher than an engineer. At Trifacta I’m taking on some pretty big, maybe unsolved, problems, and I have the chance to formulate some big solutions to these problems. I miss writing papers, which is a big part of being a research scientist, and an area where I’ve had a lot of success. But we may eventually do some publishing of our work at Trifacta, and I guarantee those papers will be impactful and they will write themselves; those are the best kind.
Can you tell us a bit about yourself personally?
I’m from New Jersey originally. There are a number of foods I miss, and the top two are the same for everyone from New Jersey: bagels and pizza. There’s some okay pizza out here, but for some reason it costs twice as much as in New Jersey and is not as good. I’ll also add Friendly’s, which most people in New Jersey don’t actually like, but feel some nostalgia for.
In terms of hobbies, at Duke, I played Ultimate Frisbee. My freshman year, we were average at best and losing to all of the awesome teams in our area. By my senior year, we made the national championships, so that was a lot of fun. A few years ago I switched over to distance running. I’ve run 5 marathons and a bunch of other races. My current goal is to break 3 hours.
When not at work I spend a lot of time with my wife and 2 kids, ages 1 and 3. I also like quoting obscure ’80s and ’90s movies.
What other advice do you have for anyone interested in joining Trifacta?
The key thing to know is, even though we’re now about 30 people, that’s still very small in the grand scheme of things. We’ve laid the groundwork to do a lot of exciting things here, like figuring out what a user wants to do before they even want to do it. We have enough of the infrastructure now to innovate at a really fast pace.
So even if there’s code in place when you join us and it’s not Day 1, it’s still extremely early. Most of the awesome problems that got me to come here are still barely solved.