As a Principal Data Architect at Entrada, I often see teams struggle with the “last mile” of data delivery. While Delta Sharing is excellent for bulk data, what if you need granular control over the output format? What if your consumer is an AI Agent that needs a JSON response, or a mobile app requiring specific business logic?
The solution lies in a “Low-Risk” philosophy: decoupling the compute layer from the consumption layer. By wrapping data logic in a Custom API using Databricks Model Serving, we create strict contracts, ensure security, and prepare our architecture for the next generation of Compound AI Systems.
This guide explores how to treat arbitrary Python code as a “model,” turning your Lakehouse into an efficient engine of data distribution.
The Foundation: Why Model Serving is Your New Data Backend
Many developers associate "Model Serving" solely with machine learning models built in scikit-learn or PyTorch. However, the pragmatic architect knows that Databricks Model Serving is not exclusive to ML: you can deploy almost arbitrary Python code behind a serving endpoint.
Decoupling Compute from Consumption
Direct database connections (JDBC/ODBC) often become bottlenecks and lack flexibility. By shifting to an API-first approach, we achieve true decoupling:
- Universal Access: Any system, whether a web app, a CI/CD pipeline, or a third-party tool, can consume a REST API, unlike the narrower reach of JDBC drivers.
- Stricter Contracts: Using mlflow.pyfunc enforces an input/output schema as a contract. If the schema breaks, the deployment fails, protecting downstream users from silent failures (see the sketch after this list).
- Safe Migrations: You can serve “v1” and “v2” simultaneously using Model Serving endpoints, allowing for zero-downtime migrations.
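MLflow can make that contract concrete: give log_model an input example (or an explicit signature) and every request is validated against the inferred schema. A minimal sketch, with hypothetical field values mirroring the CustomersAPI inputs shown later; the same input_example is reused when the model is logged below:
Python:
import pandas as pd
from mlflow.models import infer_signature

# Hypothetical example request matching the fields the API expects;
# an empty cursor means "first page"
input_example = pd.DataFrame([{
    "select_csv": "customer_id,name,email",
    "filters_json": '{"country": "US"}',
    "limit": 50,
    "cursor": "",
}])

# Optional: infer an explicit schema to pass as signature=... at logging time
# (MLflow will also infer one automatically from input_example)
signature = infer_signature(input_example)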
Unity Catalog as the Governance Layer for APIs
Adoption depends on trust. Opening APIs to the broader organization requires rigorous governance.
- Service Principals: For production APIs, we move away from Personal Access Tokens (PATs) and authenticate with Service Principals (machine identities); see the sketch after this list.
- Lineage Tracking: Even though we are running Python code, Unity Catalog tracks which tables are being accessed by the endpoint, maintaining full lineage visibility.
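As an illustration, the Databricks SDK can authenticate as a service principal via OAuth machine-to-machine credentials instead of a PAT. A minimal sketch; the host, client ID, and secret are placeholders you would source from a secret store, not hardcode:
Python:
from databricks.sdk import WorkspaceClient

# OAuth M2M: the service principal's client ID/secret replace a personal token.
# In practice, read these from environment variables or a secret scope.
w = WorkspaceClient(
    host="https://<workspace-host>",
    client_id="<service-principal-client-id>",
    client_secret="<service-principal-secret>",
)

# The client now acts as the machine identity for all subsequent API calls
print(w.current_user.me().user_name)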
Technical Implementation: Mastering Custom PyFunc
To implement this, we use the “Wrapper” Pattern: encapsulating SQL execution or Pandas transformations inside a Python class that MLflow treats as a “model”.
Serving Logic, Not Just Models (The Wrapper Pattern)
The core of this implementation is a class that inherits from mlflow.pyfunc.PythonModel. You must override the predict method, which determines how to handle inputs and query the Unity Catalog tables.
Instead of exposing raw tables, we encapsulate the logic. Here is a pragmatic implementation of a CustomersAPI class that handles filtering, selection, and even pagination “under the hood”:
Python:
import mlflow
import pandas as pd
from databricks import sql

# Helper functions (_parse_select, _parse_filters, _build_sql, _connect_kwargs,
# _jsonable, _b64) and the KEYSET / MAX_LIMIT constants are defined elsewhere
# in the same module.

class CustomersAPI(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Normalize input: Model Serving hands predict() a pandas DataFrame,
        # so collapse the single request row into a plain dict
        req = (
            model_input.iloc[0].to_dict()
            if isinstance(model_input, pd.DataFrame)
            else model_input
        )
        select_cols = _parse_select(req.get("select_csv"))
        filters = _parse_filters(req.get("filters_json"))
        limit = req.get("limit", 50)
        cursor = req.get("cursor")

        # Build the SQL statement dynamically based on inputs
        # (the internal/public select lists feed response-shaping helpers not shown here)
        stmt, params, internal_select, public_select = _build_sql(
            select_cols, filters, limit, cursor
        )

        # Execute the query against Unity Catalog via the Databricks SQL Connector
        with sql.connect(**_connect_kwargs()) as conn, conn.cursor() as cur:
            cur.execute(stmt, params)
            rows = cur.fetchall()
            cols = [d[0] for d in cur.description] if cur.description else []

        # Map each row to a column-name -> value dictionary
        items = [dict(zip(cols, row)) for row in rows]

        # Pagination logic: the query fetched one extra row as a lookahead
        page_size = max(1, min(int(limit or 50), MAX_LIMIT))
        has_more = len(items) > page_size
        next_cursor = None
        if has_more:
            last = items[page_size - 1]
            # Serialize the keyset values of the last visible row into the next cursor
            ks_vals = [_jsonable(last[k]) for (k, _) in KEYSET]
            next_cursor = _b64({"after": ks_vals})
            items = items[:page_size]  # Trim the lookahead row

        # Return a structured, JSON-serializable response
        return pd.DataFrame([{
            "count": len(items),
            "items": items,
            "next_cursor": next_cursor,
            "has_more": has_more,
        }])
This code ensures that every data object is JSON serializable and manages connection logic securely.
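Once deployed, any HTTP client can call the endpoint. A sketch of an invocation with the requests library; the workspace URL, endpoint name, and token are placeholders:
Python:
import requests

# Placeholders: substitute your workspace URL, endpoint name, and a
# service-principal OAuth token or PAT
WORKSPACE_URL = "https://<workspace-host>"
ENDPOINT = "customers-api"
TOKEN = "<token>"

payload = {
    "dataframe_records": [{
        "select_csv": "customer_id,name,email",
        "filters_json": '{"country": "US"}',
        "limit": 5,
        "cursor": "",
    }]
}

resp = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # {"predictions": [{"count": ..., "items": [...], ...}]}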
Optimizing “Under the Hood”: Cold Starts & Dependencies
To ensure reliability in production, we must look beyond the code.
1. Dependency Management
We must leverage conda.yaml or pip_requirements to ensure the serving container has the exact libraries needed. When logging the model to Unity Catalog, use pip_requirements to lock specific versions:
Python:
# Register to Unity Catalog (three-level namespace) rather than the workspace registry
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run(run_name="customers_api_from_code"):
    mlflow.pyfunc.log_model(
        name="customers_api_model",
        python_model=CustomersAPI(),  # Your class instance
        registered_model_name="main.default.customers_api",
        input_example=input_example,
        pip_requirements=[
            "pandas>=2.1",
            "mlflow>=2.8.0",
            "pydantic>=2",
            "databricks-sql-connector[pyarrow]>=3.0.0",
            "databricks-sdk>=0.33.0",
        ],
    )
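Before standing up an endpoint, it is worth sanity-checking the logged model locally; loading it back through the generic pyfunc flavor exercises the same code path the serving container will use. A sketch, assuming the model was registered as above (the version number is hypothetical):
Python:
import mlflow

# Load the registered model back through the generic pyfunc interface
model = mlflow.pyfunc.load_model("models:/main.default.customers_api/1")

# Reuse the input example from logging; the result should be the same
# structured, JSON-ready DataFrame the endpoint will return
print(model.predict(input_example))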
2. Deployment Strategy: Move to Code
While you can create endpoints via the UI, pragmatic architects use Databricks Asset Bundles (DABs). DABs provide full support for defining model serving endpoints as code, ensuring repeatable deployments.
- Resource: Check out the Databricks Asset Bundles Resource Guide for the YAML configuration.
3. Performance
Address the "cold start" problem when sizing the endpoint: enabling scale-to-zero saves cost but makes the first request after an idle period pay a startup penalty, while keeping capacity provisioned keeps latency flat. Choose based on your specific traffic patterns; the sketch below shows where these settings live.
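If you want to script the endpoint outside of a bundle, the Databricks SDK exposes the same configuration surface the DAB YAML does. A sketch, with hypothetical endpoint and model names:
Python:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()

# Create the serving endpoint for the registered Unity Catalog model
w.serving_endpoints.create(
    name="customers-api",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.customers_api",
                entity_version="1",          # hypothetical version
                workload_size="Small",       # provisioned concurrency tier
                scale_to_zero_enabled=True,  # trade cold starts for cost
            )
        ]
    ),
)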
AI Integration: Powering Compound Systems
One of the hottest topics in the MVP program is Compound AI Systems. How does a custom data API fit into an architecture dominated by LLMs?
The API as a “Tool” for AI Agents
A Custom Data API acts as a bridge between raw data and an AI Agent.
- Tool Calling: The API becomes a specific "Tool" that an agent (such as an OpenAI-based assistant or a custom RAG agent) can autonomously call to retrieve real-time facts; a sketch of a tool definition follows this list.
- Structured Output: Unlike Genie, which might return text or tables, an API returns JSON, the "native language" of AI Agents. This allows an LLM to parse the response deterministically without hallucinating structure.
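As an illustration, here is how the endpoint might be described as a tool in the OpenAI function-calling format; the name, description, and parameter set are hypothetical and simply mirror the CustomersAPI inputs:
Python:
# Hypothetical tool definition an agent framework registers; when the model
# "calls" it, your code forwards the arguments to the serving endpoint.
customers_tool = {
    "type": "function",
    "function": {
        "name": "query_customers",
        "description": "Retrieve customer records from the governed Lakehouse API.",
        "parameters": {
            "type": "object",
            "properties": {
                "select_csv": {"type": "string", "description": "Comma-separated columns to return"},
                "filters_json": {"type": "string", "description": "JSON object of column filters"},
                "limit": {"type": "integer", "description": "Max rows per page"},
                "cursor": {"type": "string", "description": "Opaque pagination cursor"},
            },
            "required": [],
        },
    },
}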
Here is an example of the clean payload an Agent receives:
JSON:
{
  "count": 5,
  "has_more": true,
  "items": [
    {
      "customer_id": 1999978,
      "name": "dolore aliquip cillum",
      "email": "eiusmod@fugiat.co.uk"
    },
    {
      "customer_id": 1999966,
      "name": "aliquip esse",
      "email": "ipsum.fugiat@pariatur.com"
    }
  ],
  "next_cursor": "eyJ..."
}
“Trusted Assets” in Real-Time
For developers building RAG (Retrieval-Augmented Generation) applications, relying solely on vector embeddings is often insufficient.
- Freshness: Vector indexes are refreshed on a schedule, so they lag the source data; an API query hits the live Delta Table.
- Deterministic Results: For financial or operational metrics, you need 100% accuracy. You cannot rely on the probabilistic nature of a vector search.
- Single Source of Truth: By encoding complex KPIs inside the API logic, we ensure the LLM doesn't have to "guess" the math, reducing hallucinations (a brief sketch follows).
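As a brief illustration, a KPI endpoint can pin the metric definition in one reviewed SQL statement, so every caller, human or agent, gets the same number. Table and column names here are hypothetical, and _connect_kwargs is the same connection helper used by CustomersAPI:
Python:
from databricks import sql

# The KPI definition lives in one audited query, not in the LLM's prompt
NET_REVENUE_SQL = """
    SELECT date_trunc('month', order_date) AS month,
           SUM(amount) - SUM(refund_amount) AS net_revenue
    FROM main.finance.orders
    WHERE order_date >= :start_date
    GROUP BY 1
    ORDER BY 1
"""

def net_revenue(start_date: str) -> list[dict]:
    """Deterministic KPI lookup against the live Delta table."""
    with sql.connect(**_connect_kwargs()) as conn, conn.cursor() as cur:
        cur.execute(NET_REVENUE_SQL, {"start_date": start_date})
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]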
Future Outlook: The Rise of the “Data API Product”
As Databricks releases features like the Genie API and Function Serving, where do we draw the line?
Custom PyFunc vs. Genie API
A pragmatic data strategy likely involves a hybrid approach:
- Genie API: Best for ad-hoc, natural language questions where flexibility is key.
- Custom API: Best for high-volume, low-latency, deterministic workflows—such as powering a customer-facing mobile app or a specific Agent tool.
Conclusion
As Databricks solidifies its position as a unified data hub, the demand to serve data in versatile ways increases. By building Custom APIs using Model Serving, you provide a secure, governed, and highly scalable consumption layer. You aren’t just serving tables; you are serving Data Products.