ETL vs. ELT pipelines: the differences and why they matter
20 October 2021 • 5 min read
Understanding the differences between ETL (extract, transform, load) and ELT (extract, load, transform) pipelines is important for anyone working on a data engineering or data infrastructure project. The shift in letters offers a very different set of capabilities and changes the way your data projects work in subtle yet significant ways.
But before we get into a comparison of ETL and ELT pipelines, let's first explore exactly what we mean when we talk about data pipelines.
What is a data pipeline?
A data pipeline is a series of steps, processes or actions, often automated, that take new data from a source to a destination, ready for analysis or reporting. Most organisations have important business information spread across a number of systems with different formats and structures, for example, a CRM, finance system, and a manufacturing or ERP solution. The right data pipeline can help you integrate all this data, in a standard format and central location, so you can provide key business analytics where they’re needed.
ETL (extract, transform, load) and ELT (extract, load, transform) are two of the most common types of pipelines. Let's now explore the key differences between them and why they matter.
What is an ETL pipeline?
ETL is the 'traditional' way of managing a data pipeline. As the acronym implies, it involves three main stages:
- Extract. Data is taken from the all the source systems and moved to a staging location, ready for transformation.
- Transform. Data analysts use programs, models and processes to convert the data into the right format for their intended use. This prepares the data for analysis, whether it’s standardising on a date format or identifying and removing duplicate records. It could also involve more complicated transformations, such as filtering rows, or identifying and linking related data from different datasets.
- Load. The transformed dataset is then moved into a location or database where it can be accessed as needed. The location is usually either a reporting application, or if data volumes are larger, a data warehouse or lake.
Typically with ETL, only the data required for the specific analysis is transferred to the destination system – not all of the transformed data. As a result, the destination is only a snapshot, and the whole pipeline will need to run again (extract, transform, load) any time you want an updated view.
ETL pipelines are still common today, and are usually implemented for business intelligence projects with smaller datasets with complex transformation needs or those that don’t need real-time updates.
What are ELT pipelines?
The cloud has transformed the way we manage data projects. Previously, organisations had to buy, run, and maintain expensive databases, disks, and processing power – it’s why only transformed data was loaded into the analytics system. Now, cloud gives organisations more cost-effective access to almost unlimited resources.
As a result, many data workloads today run on an ELT pipeline model: extract, load, transform. The key difference? Data is loaded into the destination (usually a cloud-based database or data warehouse) before it’s transformed.
In ETL, data has to be extensively structured and prepared, usually by data analysts with programming experience, before it’s ready to be loaded. However, with ELT, all of your source data is usually replicated straight into the data warehouse. This makes it available to query in real-time by almost anyone.
With the rise of no-code or low-code analytics tools, this includes business users with no previous experience of programming.
Which should you choose? ETL or ELT?
As with just about every technical decision, deciding between ETL or ELT depends on your project.
For example, where you have projects that primarily involve repetitive analysis and lack the need for real-time updates - so can be done through batch processing - ETL will suffice and do the job perfectly.
However, ELT offers a number of big benefits in the right context. This is because loading data into a data warehouse before it’s transformed reduces the number of copy or replication steps you need to manage. This reduces workload and risk. In a more traditional ETL pipeline, data is transferred at least twice: once for transformation, and then again when the transformed data is loaded to the destination.
By creating a pool of data that anyone in your business can access, business users can do more of the analysis themselves, often using - as already mentioned - low-code or no-code tools. It means you can more quickly deliver greater value from your data.
So, when used effectively, ELT can help organisations deliver deeper and more wide-ranging data analysis. For example, a data warehouse that supports high data volumes and cloud processing opens up the possibilities of using artificial intelligence.
Talk to us about how we can help you better leverage data. Learn more.
Upside: a practical example of the value of data
We recently worked with Upside, a financial services startup, to launch their highly innovative savings app. The app searches a user’s Open Banking data to identify transactions from Upside’s retail partners, who offer cashback on purchases. Cashback is transferred to the user’s savings pot.
As part of the project, we implemented an ELT pipeline. Upside extracts millions of rows of financial data every day, so we used Amazon Web Services to build a secure, scalable data warehouse to load them into. Choosing AWS also gave us access to a wide variety of data processing tools, helping us to quickly and easily analyse and transform the data as required.
As well as calculating cashback, the data warehouse powers benchmarking for Upside’s partners, helping them understand customer spend, frequency, and value against their competitors.
Read more about the Upside project here.
Choose the right type of pipeline for the job
Ultimately, it's not a case of either/or when it comes to comparing ETL and ELT pipelines - it's about selecting the right one for the task at hand. While ELT is decidedly more modern and powerful, that doesn't mean there's no use for ETL - using ELT pipelines in a scenario that doesn't require it will only add additional complexity that you don't need.
So, consider how and why you are using data. Pipelines and infrastructure will follow from your strategy and goals.
Talk to us about building faster, more effective data pipelines to power insight. Get in touch.