Data engineering has emerged as an important, if sometimes overlooked, relative of the hugely hyped field of data science. It’s important to keep them distinct: while data science is really about uncovering insights and asking questions using the data at your disposal, data engineering is all about putting the infrastructure and tools in place to actually make that happen.
Except it’s more than that. By seeing data engineering as merely an enabler of data science we fail to see the huge impact it can have on just about every function within an organisation. As the field evolves it will become business critical, bound up not just with analytics teams or even IT teams, but with everything from finance to product to software engineering.
In this post we’ll look at some possible future trends in data engineering. Taken together, they’ll help us form a view of exactly what the field will look like - individually, each raises important questions about the way we use data and the way we manage our systems.
A greater distinction between analytics engineering and data engineering
This is already happening; it’s just not always that visible - it’s still too easy to see your data capabilities as a single coherent thing, staffed by a handful of related roles.
That won’t necessarily be the case as the field evolves. Instead, what we’ll see is data engineering become a discipline that’s more focused on building, maintaining and optimising the systems that enable data to run through an organisation, while data science and analytics teams will be much more focused on tackling key questions (indeed, we might continue to see data science becoming a capability that sits within individual teams, rather than something that sits across an organisation - but that’s for a different blog post).
Product and agile mindset
Much has been made in recent years of self-service analytics and business intelligence. Often these terms were used by vendors looking to position their products as something that could be used by anyone - even those without engineering or programming experience.
That probably won’t disappear, but what we will see is data engineers adopting the sort of product mindset required to build and embed self-service analytics within a business. This might seem strange, but given the nuances of every organisation, having a specific capability that can evolve and adapt your data workflows is far more effective than simply spending money on a very expensive platform.
In turn, this will require a level of agility within data engineering teams. The close link between data engineering and software engineering has been noted by a number of people - in the context of regular releases and consistent delivery, this is particularly true. For some software engineers, a move to data engineering might even be a smart career move.
Data mesh and data fabric: portability and integration
Data warehouses and data lakes have been two critical elements of data engineering ever since it first emerged. They’re not likely to disappear, but we are likely to see an evolution in how we think about data architecture. Although there are potentially many facets that will evolve, two of the most crucial elements will be data meshes and data fabrics.
Both are ultimately about better coordinating and unifying the way data is used within an organisation, but they do so in slightly different ways. As EY’s Data Platform Architect Lead James Serra puts it on his blog:
“A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organisational change.”
Another way of viewing it is by seeing data fabric as the evolution of the data warehouse - centralising your infrastructure, making everything available via APIs - while a data mesh is the evolution of a data lake - distributed and decentralised, rather than a single heavy monolith.
The data mesh in particular will help push forward the point above. As Serra says, “its goal is to treat data as a product.”
Delta Lakes: Improved reliability and integrity
Two of the main aims of data engineers are to keep data pipelines fast and to manage increasingly large amounts of data. That’s probably not going to change - but the continued need for speed will inevitably introduce new risks and a certain level of instability in data architectures.
This is where the concept of a Delta Lake comes into play. It’s an open source storage layer that sits on top of your data lake and ensures the integrity and reliability of your data by enforcing ACID compliance (atomicity, consistency, isolation, durability). In short, it means that data can be processed and moved at scale and speed without compromising the integrity of the data, or the performance of the system itself.
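To make the ACID point concrete: Delta Lake gets its guarantees from a transaction log (a directory of numbered JSON commit files) sitting alongside the data files. The sketch below is a toy illustration of that mechanism in plain Python, not the real Delta Lake API - the function names and file layout are simplified assumptions, but the core idea is faithful: data files are written first, and a commit only becomes visible when its numbered log entry appears atomically.

```python
import json
import os
import tempfile

def commit(table_dir: str, version: int, data_files: list) -> bool:
    """Atomically publish a commit; returns False if this version
    was already claimed by another writer (optimistic concurrency)."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    entry = json.dumps({"version": version, "add": data_files})
    # Write the entry to a temp file, then hard-link it into place:
    # the numbered log file appears all at once or not at all
    # (atomicity), and two concurrent writers cannot both claim the
    # same version number (isolation).
    fd, tmp = tempfile.mkstemp(dir=log_dir)
    with os.fdopen(fd, "w") as f:
        f.write(entry)
    target = os.path.join(log_dir, f"{version:020d}.json")
    try:
        os.link(tmp, target)  # raises FileExistsError if version taken
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp)

def committed_files(table_dir: str) -> list:
    """A reader only ever sees data files referenced by committed
    log entries - half-finished writes are simply invisible."""
    log_dir = os.path.join(table_dir, "_delta_log")
    files = []
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".json"):
            with open(os.path.join(log_dir, name)) as f:
                files.extend(json.load(f)["add"])
    return files
```

The practical upshot is the same one Delta Lake delivers at scale: a writer that crashes halfway through leaves no partial state behind, because readers reconstruct the table purely from the committed log.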
Delta Lake won’t be right for every organisation - but for those handling huge amounts of data and using complex combinations of processing approaches and tools, it can add some much-needed stability.
Emphasis on strategic, ethical and financial issues
It might not be completely accurate to say that speed and scale will remain the mainstays of data engineering: important questions are emerging today that concern ethics, compliance, finance, and overall strategy.
In other words, it’s going to become more and more important for businesses to seriously engage with these issues. This will affect the way data engineers work because it may shape how they plan and build infrastructure, what they choose to optimise, and where they may spend more time and resources.
This means data engineers will have to be more strategic and commercially aware (as if they didn’t have to be already), and it’s likely that roles like Chief Data Officer will become more and more popular. For many data engineering teams it will be vital to have this person sitting in the C-suite fighting their corner and educating senior leadership teams about what data engineering can accomplish - and, of course, what it can’t.
Conclusion: A data engineering capability gives organisations agency over data
Data engineering is an exciting field, and one that’s going to be critical to business performance. For software developers it’s an opportunity to expand their skill set and evolve their career, while for businesses data engineering can bring a sense of agency to how data is managed and used.
That’s significant - the future of data requires action, care, and sensitivity. The big data gold rush needs to give way to something more sophisticated.