Databricks on Wednesday announced the general availability of Delta Live Tables, an ETL (Extract, Transform, Load) framework that offers a simple declarative approach to building data pipelines and managing data infrastructure at scale.
ETL is concerned with moving and transforming data from multiple sources and loading it into targets such as a Hadoop cluster or a visualization platform. In this way, ETL tools break down inherent data silos to make it easier for data scientists and business analysts to access and analyze data.
ETL simplified
According to Databricks, turning initial SQL queries into production ETL pipelines often requires a significant amount of complicated operational work. Moreover, reliable data pipelines require constant operational rigor to keep them up and running.
In practical terms, even smaller-scale data pipelines can take up the bulk of a data practitioner’s time, which goes into managing them and ensuring that data flows don’t inadvertently break.
Delta Live Tables is the first ETL framework to solve this problem by combining both modern engineering practices and automatic management of infrastructure, says Databricks.
Data engineers simply need to describe the outcomes of data transformation, and Delta Live Tables will work out the dependencies and automate away the bulk of the manual complexity. Crucially, data engineers can treat their data as code, applying modern software engineering best practices such as testing, error handling, monitoring, and documentation to deploy data pipelines at scale.
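To make the declarative idea concrete, here is a minimal sketch in plain Python. It is not the Delta Live Tables API: the `table` decorator, the `run_pipeline` function, and the three example tables are hypothetical names invented for illustration. The sketch only shows the general pattern of declaring what each table should contain and letting a small runner infer the build order from the declared inputs.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry (not part of any Databricks API):
# maps a table name to the function that defines its contents.
_TABLES = {}


def table(func):
    """Register a function as the declarative definition of a table."""
    _TABLES[func.__name__] = func
    return func


# Tables are deliberately declared out of order; the runner sorts them.
@table
def order_total(clean_orders):
    # Depends on clean_orders simply by naming it as a parameter.
    return sum(row["amount"] for row in clean_orders)


@table
def clean_orders(raw_orders):
    # A simple data quality rule: keep only rows with a positive amount.
    return [row for row in raw_orders if row["amount"] > 0]


@table
def raw_orders():
    # Source table with no upstream dependencies (made-up sample data).
    return [
        {"id": 1, "amount": 20.0},
        {"id": 2, "amount": -5.0},  # invalid row, filtered out downstream
        {"id": 3, "amount": 12.5},
    ]


def run_pipeline():
    """Infer dependencies from parameter names, then build tables in order."""
    graph = {
        name: set(inspect.signature(func).parameters)
        for name, func in _TABLES.items()
    }
    # static_order() yields each table only after all of its inputs.
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func = _TABLES[name]
        args = [results[dep] for dep in inspect.signature(func).parameters]
        results[name] = func(*args)
    return order, results


if __name__ == "__main__":
    build_order, tables = run_pipeline()
    print(build_order)
    print(tables["order_total"])
```

Because each table is an ordinary function, it can be unit tested, documented, and version controlled like any other code, which is the sense in which data can be treated as code. A production framework would add scheduling, incremental processing, and failure recovery on top of this pattern.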
By automating the most time-consuming aspects of data engineering, Delta Live Tables lets data engineers focus their efforts on delivering data instead of operating and maintaining the pipelines.
Delta Live Tables supports both Python and SQL and works with both streaming and batch workloads.
Databricks says Delta Live Tables is already deployed for production use at leading firms such as JLL, Shell, Jumbo, Bread Finance, and ADP.
“The power of [Delta Live Tables] comes from something no one else can do – combine modern software engineering practices and automatically manage infrastructure. It’s game-changing technology that will allow data engineers and analysts to be more productive than ever,” said Ali Ghodsi, the chief executive officer and co-founder at Databricks.
“It also broadens Databricks’ reach; [Delta Live Tables] supports any type of data workload with a single API, eliminating the need for advanced data engineering skills,” he said.
Image credit: iStockphoto/tadamichi