Data Engineering – The Linchpin of Big Data Part I: The Role of the Data Engineer
As a child, I spent most of my school breaks and weekends working construction alongside my father and learning the art of ceramic tile and marble. I loved being around the heavy equipment, walking through the mud, and hearing the sweet honk of the food truck. I also learned quite a bit about the construction business. My greatest takeaway is that it’s not just grunt work – the majority of those involved are artisans. In fact, one area of expertise receives a level of respect of almost mythical proportions. Who garners such respect? Plumbers! They bring warmth, remove waste, and most importantly, install the pipes that bring life-sustaining water into the home.
Big data has similar expert artisans – data engineers. These skilled individuals are the plumbers of big data. They design and implement the infrastructure that delivers data, the life-sustaining resource of any business.
In this blog series I introduce, describe and detail the role of the data engineer, otherwise known as big data’s plumber.
Big data analytics is a team sport involving multiple players, but we tend to mostly hear about data scientists. In 2012, Tom Davenport and DJ Patil published a Harvard Business Review article lauding the role of Data Scientist as “The Sexiest Job of the 21st Century.”
As reported by Google Trends, the term “data scientist” became much more popular right around the time of the HBR article.
There is no question that the data scientist is absolutely critical in helping organizations address their most difficult challenges. However, every successful rock star is also backed by an ensemble of highly talented musicians. The data scientist does not go it alone. They are part of a team that includes data engineers, who provide a critical function within it.
A strong argument can be made that if an organization can justify a data scientist, it can also justify a data engineer. What does the job market say about this?
While data gathered thus far indicates a strong demand for both data scientist and data engineer roles, it’s interesting to note that the numbers are greater for the data engineer.
Big data requires a lot of plumbing (data engineering) and a lot of hands-on time with many technologies that are precisely in the wheelhouse of the data engineer.
By looking at various job descriptions, we can summarize data engineering as a collection of activities that set the stage for analytics.
Data Engineering activities and skills include:
- Advising on and managing big data infrastructure
- Architecting and developing data ingestion pipelines
- Developing proofs of concept (POCs) with emerging technologies
- Assisting with data preparation
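To make the ingestion and data preparation activities a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample data, the column names, and the `ingest` and `prepare` helpers are hypothetical and do not come from any particular tool or job description. Real pipelines typically run on dedicated big data platforms, but the shape of the work is similar.

```python
import csv
import io

# Hypothetical raw export (illustrative only): inconsistent casing,
# stray whitespace, a missing amount, and a duplicate record.
RAW_CSV = """customer,region,amount
 Alice ,east,100.50
BOB,West,
alice,EAST,100.50
Carol,north,42
"""


def ingest(text):
    """Ingestion step: parse raw CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(records):
    """Preparation step: normalize fields, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in records:
        customer = row["customer"].strip().title()
        region = row["region"].strip().lower()
        amount_text = (row["amount"] or "").strip()
        if not amount_text:
            continue  # skip rows that have no amount
        key = (customer, region, float(amount_text))
        if key in seen:
            continue  # skip rows identical to one already kept
        seen.add(key)
        cleaned.append({"customer": customer, "region": region, "amount": key[2]})
    return cleaned


clean_rows = prepare(ingest(RAW_CSV))
for row in clean_rows:
    print(row)
```

Of the four raw rows in this made-up sample, only two survive preparation: one is dropped for a missing amount and one as a duplicate once casing and whitespace are normalized.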
Clearly, the work of data engineers and the work of data scientists are not mutually exclusive. The depiction below shows that they are, in fact, complementary within the analytics team.
A significant portion of the overlap occurs when data engineers assist the data scientist with data wrangling, a task that often accounts for the majority of time spent on an analytics project. The United States Chief Data Scientist, DJ Patil, recently tweeted, “I’ve found in my experience that cleaning the data is 80% of the hard work.”
By assisting with this aspect of an analytics lifecycle, data engineers free up data scientists to focus on what they do best. EMC’s Bill Schmarzo elegantly summarizes “… what they do best” as, “Identify, brainstorm and/or uncover new variables that are better predictors of business performance.” That is, the performance of key business initiatives.
This post began with an analogy to give a high-level description of the role data engineers play in big data. The term “data engineer” itself doesn’t have a rigid definition, as a quick look at the job boards makes clear.
As such, I am interested to hear your thoughts on data engineering and how you believe data engineers fit into big data teams.
In Part II of this series I will discuss the tools, technology, and work streams of the data engineer and the role they play in helping organizations achieve key business initiatives.