The Challenge

A leading sales automation software platform was struggling to get the most performance out of its existing Databricks pipelines. The client wanted to improve data processing efficiency and evaluate the potential for migrating its EMR streaming jobs. Entrada stepped in to help optimize the client's data infrastructure and support its growth objectives.

The Solution

Entrada collaborated closely with the client to deliver a comprehensive assessment of their current data infrastructure and provide actionable recommendations for enhancing performance and return on investment (ROI). The engagement focused on the following key areas:

  • Performance Optimization of Databricks Pipelines: Entrada used the latest Databricks capabilities, including Liquid Clustering, Z-Ordering, and advanced partitioning techniques, to significantly improve the performance of the client's lakehouse pipelines. The work focused on the jobs that manage more than 300 data pipelines, making the data processing environment more efficient and responsive.
  • Migration to Unity Catalog: As a value-added service, Entrada migrated the client’s data pipelines to Unity Catalog. This strategic move not only improved data governance and security, but also laid the foundation for future AI and machine learning initiatives, enhancing the client’s ability to innovate and scale.
  • Optimization for DLT Compatibility: To further streamline operations, Entrada optimized the pipelines to be compatible with Delta Live Tables (DLT). This change reduced operational complexity and provided a more robust and reliable data processing framework.

The Results

Through its collaboration with Entrada, the client improved both the performance and the cost-efficiency of its data infrastructure. The optimizations delivered immediate cost savings and performance gains while positioning the client for future growth, enabling the company to better leverage its data assets for innovation and competitive advantage:

  • 72% Cost Savings: The largest data pipeline saw a 72% reduction in run costs, translating into significant financial savings.
  • 82% Reduction in Initial Load Time: The initial load time for data processing was reduced by more than 82%, accelerating data availability and enhancing operational efficiency.
  • 50% Reduction in Run Time: The runtime for the largest tables was cut by over 50%, leading to faster data processing and improved overall system performance.

About Entrada
Entrada is a Databricks-focused consulting and implementation partner backed by Databricks Ventures. Entrada harnesses the power of Databricks to help customers accelerate their AI + data initiatives. Our expertise in AI/ML, Databricks, and analytics is focused on industry-specific solutions. Our mission is to simplify complex data + AI challenges and support end-to-end transformations, delivering future-ready solutions fast.
