The Challenge
A flagship streaming platform was launched as part of a global company’s push into the direct-to-consumer digital space. The media company used Snowflake for mission-critical subscriber data ingestion and Airflow for orchestration, and was experiencing inconsistent performance, lengthy execution times, frequent SLA violations, and high run costs. These issues kept business users from receiving timely insights into customer data changes, analyzing revenue performance, and performing subscriber analysis.
The Solution
Entrada collaborated closely with the client to deliver a multistep program, starting with a proof of concept (PoC).
- We rewrote Snowflake SQL into Spark SQL and leveraged the Databricks Photon query engine.
- We optimized their pipelines and tuned clusters with Delta Lake and Adaptive Query Execution.
- Using Databricks Workflows, we simplified scheduling, orchestration, monitoring, and alerting.
- We applied predicate pushdown and file pruning techniques to reduce the amount of data scanned, which improved processing performance and consistency.
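
As a rough illustration of why file pruning reduces the amount of data scanned, the sketch below models the idea in plain Python. It is a toy model under stated assumptions, not Databricks or Delta Lake code and not the client's pipeline: the `FileStats` class, the `prune_files` helper, the `event_date` column, the file names, and the row counts are all hypothetical. The idea it shows is that when each data file carries min/max statistics for a column, a filter on that column can skip any file whose range cannot contain matching rows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max statistics for one column (hypothetical layout)."""
    path: str
    min_event_date: date
    max_event_date: date
    num_rows: int


def prune_files(files, lo, hi):
    """Keep only files whose [min, max] range can overlap the filter [lo, hi]."""
    return [f for f in files if f.max_event_date >= lo and f.min_event_date <= hi]


# Hypothetical files for a subscriber-events table (made-up names and counts).
files = [
    FileStats("part-000.parquet", date(2024, 1, 1), date(2024, 1, 10), 1_000_000),
    FileStats("part-001.parquet", date(2024, 1, 11), date(2024, 1, 20), 1_200_000),
    FileStats("part-002.parquet", date(2024, 1, 21), date(2024, 1, 31), 900_000),
]

# Filter equivalent to: WHERE event_date BETWEEN '2024-01-18' AND '2024-01-22'
selected = prune_files(files, date(2024, 1, 18), date(2024, 1, 22))
rows_total = sum(f.num_rows for f in files)
rows_scanned = sum(f.num_rows for f in selected)

print([f.path for f in selected])
print(f"rows scanned: {rows_scanned} of {rows_total}")
```

In this toy case the first file is skipped entirely because its date range ends before the filter window begins. The benefit depends on how well the data layout clusters the filtered column: if every file spanned the full date range, nothing could be skipped.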
The Results
- Daily delta pipelines for ingestion and transformation were streamlined and accelerated.
- Data quality and reliability were improved by ensuring SLA compliance through better handling of data complexity and variability.
- Business agility and efficiency increased through timely and accurate insights for business stakeholders.
- Operational overhead and costs of the data pipelines were reduced.
About Entrada
Entrada is a Databricks-focused consulting and implementation partner backed by Databricks Ventures. Entrada harnesses the power of Databricks to help customers accelerate their AI + data initiatives. Our expertise in AI/ML, Databricks, and analytics is centered around industry-centric solutions. Our mission is to simplify complex data + AI challenges and support end-to-end transformations, delivering future-ready solutions fast.