As a Principal Data Architect at Entrada, I often see teams struggle with the “last mile” of data delivery. While Delta Sharing is excellent for bulk data, what if you need granular control over the output format? What if your consumer is an AI Agent that needs a JSON response, or a mobile app requiring specific business logic?
The solution lies in a “Low-Risk” philosophy: decoupling the compute layer from the consumption layer. By wrapping data logic in a Custom API using Databricks Model Serving, we create strict contracts, ensure security, and prepare our architecture for the next generation of Compound AI Systems.
This guide explores how to treat arbitrary Python code as a “model,” turning your Lakehouse into an efficient engine of data distribution.
The Foundation: Why Model Serving is Your New Data Backend
Many developers associate "Model Serving" solely with machine learning models built in scikit-learn or PyTorch. However, the pragmatic architect knows that Databricks Model Serving is not exclusive to ML: you can deploy almost arbitrary Python code behind a serving endpoint.
Decoupling Compute from Consumption
Direct database connections (JDBC/ODBC) often become bottlenecks and lack flexibility. By shifting to an API-first approach, we achieve true decoupling:
- Universal Access: Any system, whether a web app, a CI/CD pipeline, or a third-party tool, can consume a REST API, unlike the narrower reach of JDBC drivers.
- Stricter Contracts: Using mlflow.pyfunc enforces an input/output schema as a contract. If the schema breaks, the deployment fails, protecting downstream users from silent failures (see the sketch after this list).
- Safe Migrations: You can serve “v1” and “v2” simultaneously using Model Serving endpoints, allowing for zero-downtime migrations.
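MLflow can make that contract concrete: give log_model an input example (or an explicit signature) and every request is validated against the inferred schema. A minimal sketch, with hypothetical field values mirroring the CustomersAPI inputs shown later; the same input_example is reused when the model is logged below:
Python:
import pandas as pd
from mlflow.models import infer_signature

# Hypothetical example request matching the fields the API expects;
# an empty cursor means "first page"
input_example = pd.DataFrame([{
    "select_csv": "customer_id,name,email",
    "filters_json": '{"country": "US"}',
    "limit": 50,
    "cursor": "",
}])

# Optional: infer an explicit schema to pass as signature=... at logging time
# (MLflow will also infer one automatically from input_example)
signature = infer_signature(input_example)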
Unity Catalog as the Governance Layer for APIs
Adoption depends on trust. Opening APIs to the broader organization requires rigorous governance.
- Service Principals: For production APIs, we move away from Personal Access Tokens (PATs) and authenticate with Service Principals (machine identities); see the sketch after this list.
- Lineage Tracking: Even though we are running Python code, Unity Catalog tracks which tables are being accessed by the endpoint, maintaining full lineage visibility.
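As an illustration, the Databricks SDK can authenticate as a service principal via OAuth machine-to-machine credentials instead of a PAT. A minimal sketch; the host, client ID, and secret are placeholders you would source from a secret store, not hardcode:
Python:
from databricks.sdk import WorkspaceClient

# OAuth M2M: the service principal's client ID/secret replace a personal token.
# In practice, read these from environment variables or a secret scope.
w = WorkspaceClient(
    host="https://<workspace-host>",
    client_id="<service-principal-client-id>",
    client_secret="<service-principal-secret>",
)

# The client now acts as the machine identity for all subsequent API calls
print(w.current_user.me().user_name)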
Technical Implementation: Mastering Custom PyFunc
To implement this, we use the “Wrapper” Pattern: encapsulating SQL execution or Pandas transformations inside a Python class that MLflow treats as a “model”.
Serving Logic, Not Just Models (The Wrapper Pattern)
The core of this implementation is a class that inherits from mlflow.pyfunc.PythonModel. You must override the predict method, which determines how to handle inputs and query the Unity Catalog tables.
Instead of exposing raw tables, we encapsulate the logic. Here is a pragmatic implementation of a CustomersAPI class that handles filtering, selection, and even pagination “under the hood”:
Python:
import mlflow
import pandas as pd
from databricks import sql

# Helper functions (_parse_select, _parse_filters, _build_sql, _connect_kwargs,
# _jsonable, _b64) and the KEYSET / MAX_LIMIT constants are defined elsewhere
# in the same module.

class CustomersAPI(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Normalize input: Model Serving hands predict() a pandas DataFrame,
        # so collapse the single request row into a plain dict
        req = (
            model_input.iloc[0].to_dict()
            if isinstance(model_input, pd.DataFrame)
            else model_input
        )
        select_cols = _parse_select(req.get("select_csv"))
        filters = _parse_filters(req.get("filters_json"))
        limit = req.get("limit", 50)
        cursor = req.get("cursor")

        # Build the SQL statement dynamically based on inputs
        # (the internal/public select lists feed response-shaping helpers not shown here)
        stmt, params, internal_select, public_select = _build_sql(
            select_cols, filters, limit, cursor
        )

        # Execute the query against Unity Catalog via the Databricks SQL Connector
        with sql.connect(**_connect_kwargs()) as conn, conn.cursor() as cur:
            cur.execute(stmt, params)
            rows = cur.fetchall()
            cols = [d[0] for d in cur.description] if cur.description else []

        # Map each row to a column-name -> value dictionary
        items = [dict(zip(cols, row)) for row in rows]

        # Pagination logic: the query fetched one extra row as a lookahead
        page_size = max(1, min(int(limit or 50), MAX_LIMIT))
        has_more = len(items) > page_size
        next_cursor = None
        if has_more:
            last = items[page_size - 1]
            # Serialize the keyset values of the last visible row into the next cursor
            ks_vals = [_jsonable(last[k]) for (k, _) in KEYSET]
            next_cursor = _b64({"after": ks_vals})
            items = items[:page_size]  # Trim the lookahead row

        # Return a structured, JSON-serializable response
        return pd.DataFrame([{
            "count": len(items),
            "items": items,
            "next_cursor": next_cursor,
            "has_more": has_more,
        }])
This code ensures that every data object is JSON serializable and manages connection logic securely.
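Once deployed, any HTTP client can call the endpoint. A sketch of an invocation with the requests library; the workspace URL, endpoint name, and token are placeholders:
Python:
import requests

# Placeholders: substitute your workspace URL, endpoint name, and a
# service-principal OAuth token or PAT
WORKSPACE_URL = "https://<workspace-host>"
ENDPOINT = "customers-api"
TOKEN = "<token>"

payload = {
    "dataframe_records": [{
        "select_csv": "customer_id,name,email",
        "filters_json": '{"country": "US"}',
        "limit": 5,
        "cursor": "",
    }]
}

resp = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # {"predictions": [{"count": ..., "items": [...], ...}]}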
Optimizing “Under the Hood”: Cold Starts & Dependencies
To ensure reliability in production, we must look beyond the code.
1. Dependency Management
We must leverage conda.yaml or pip_requirements to ensure the serving container has the exact libraries needed. When logging the model to Unity Catalog, use pip_requirements to lock specific versions:
Python:
# Register to Unity Catalog (three-level namespace) rather than the workspace registry
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run(run_name="customers_api_from_code"):
    mlflow.pyfunc.log_model(
        name="customers_api_model",
        python_model=CustomersAPI(),  # Your class instance
        registered_model_name="main.default.customers_api",
        input_example=input_example,
        pip_requirements=[
            "pandas>=2.1",
            "mlflow>=2.8.0",
            "pydantic>=2",
            "databricks-sql-connector[pyarrow]>=3.0.0",
            "databricks-sdk>=0.33.0",
        ],
    )
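Before standing up an endpoint, it is worth sanity-checking the logged model locally; loading it back through the generic pyfunc flavor exercises the same code path the serving container will use. A sketch, assuming the model was registered as above (the version number is hypothetical):
Python:
import mlflow

# Load the registered model back through the generic pyfunc interface
model = mlflow.pyfunc.load_model("models:/main.default.customers_api/1")

# Reuse the input example from logging; the result should be the same
# structured, JSON-ready DataFrame the endpoint will return
print(model.predict(input_example))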
2. Deployment Strategy: Move to Code
While you can create endpoints via the UI, pragmatic architects use Databricks Asset Bundles (DABs). DABs provide full support for defining model serving endpoints as code, ensuring repeatable deployments.
- Resource: Check out the Databricks Asset Bundles Resource Guide for the YAML configuration.
3. Performance
Address the "cold start" problem when sizing the endpoint: enabling scale-to-zero saves cost but makes the first request after an idle period pay a startup penalty, while keeping capacity provisioned keeps latency flat. Choose based on your specific traffic patterns; the sketch below shows where these settings live.
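If you want to script the endpoint outside of a bundle, the Databricks SDK exposes the same configuration surface the DAB YAML does. A sketch, with hypothetical endpoint and model names:
Python:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()

# Create the serving endpoint for the registered Unity Catalog model
w.serving_endpoints.create(
    name="customers-api",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.customers_api",
                entity_version="1",          # hypothetical version
                workload_size="Small",       # provisioned concurrency tier
                scale_to_zero_enabled=True,  # trade cold starts for cost
            )
        ]
    ),
)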
AI Integration: Powering Compound Systems
One of the hottest topics in the MVP program is Compound AI Systems. How does a custom data API fit into an architecture dominated by LLMs?
The API as a “Tool” for AI Agents
A Custom Data API acts as a bridge between raw data and an AI Agent.
- Tool Calling: The API becomes a specific "Tool" that an agent (such as an OpenAI-based assistant or a custom RAG agent) can autonomously call to retrieve real-time facts; a sketch of a tool definition follows this list.
- Structured Output: Unlike Genie, which might return text or tables, an API returns JSON, the "native language" of AI Agents. This allows an LLM to parse the response deterministically without hallucinating structure.
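As an illustration, here is how the endpoint might be described as a tool in the OpenAI function-calling format; the name, description, and parameter set are hypothetical and simply mirror the CustomersAPI inputs:
Python:
# Hypothetical tool definition an agent framework registers; when the model
# "calls" it, your code forwards the arguments to the serving endpoint.
customers_tool = {
    "type": "function",
    "function": {
        "name": "query_customers",
        "description": "Retrieve customer records from the governed Lakehouse API.",
        "parameters": {
            "type": "object",
            "properties": {
                "select_csv": {"type": "string", "description": "Comma-separated columns to return"},
                "filters_json": {"type": "string", "description": "JSON object of column filters"},
                "limit": {"type": "integer", "description": "Max rows per page"},
                "cursor": {"type": "string", "description": "Opaque pagination cursor"},
            },
            "required": [],
        },
    },
}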
Here is an example of the clean payload an Agent receives:
JSON:
{
  "count": 5,
  "has_more": true,
  "items": [
    {
      "customer_id": 1999978,
      "name": "dolore aliquip cillum",
      "email": "eiusmod@fugiat.co.uk"
    },
    {
      "customer_id": 1999966,
      "name": "aliquip esse",
      "email": "ipsum.fugiat@pariatur.com"
    }
  ],
  "next_cursor": "eyJ..."
}
“Trusted Assets” in Real-Time
For developers building RAG (Retrieval-Augmented Generation) applications, relying solely on vector embeddings is often insufficient.
- Freshness: Vector indexes are refreshed on a schedule, so they lag the source data; an API query hits the live Delta Table.
- Deterministic Results: For financial or operational metrics, you need 100% accuracy. You cannot rely on the probabilistic nature of a vector search.
- Single Source of Truth: By encoding complex KPIs inside the API logic, we ensure the LLM doesn't have to "guess" the math, reducing hallucinations (a brief sketch follows).
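As a brief illustration, a KPI endpoint can pin the metric definition in one reviewed SQL statement, so every caller, human or agent, gets the same number. Table and column names here are hypothetical, and _connect_kwargs is the same connection helper used by CustomersAPI:
Python:
from databricks import sql

# The KPI definition lives in one audited query, not in the LLM's prompt
NET_REVENUE_SQL = """
    SELECT date_trunc('month', order_date) AS month,
           SUM(amount) - SUM(refund_amount) AS net_revenue
    FROM main.finance.orders
    WHERE order_date >= :start_date
    GROUP BY 1
    ORDER BY 1
"""

def net_revenue(start_date: str) -> list[dict]:
    """Deterministic KPI lookup against the live Delta table."""
    with sql.connect(**_connect_kwargs()) as conn, conn.cursor() as cur:
        cur.execute(NET_REVENUE_SQL, {"start_date": start_date})
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]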
Future Outlook: The Rise of the “Data API Product”
As Databricks releases features like the Genie API and Function Serving, where do we draw the line?
Custom PyFunc vs. Genie API
A pragmatic data strategy likely involves a hybrid approach:
- Genie API: Best for ad-hoc, natural language questions where flexibility is key.
- Custom API: Best for high-volume, low-latency, deterministic workflows—such as powering a customer-facing mobile app or a specific Agent tool.
Conclusion
As Databricks solidifies its position as a unified data hub, the demand to serve data in versatile ways increases. By building Custom APIs using Model Serving, you provide a secure, governed, and highly scalable consumption layer. You aren’t just serving tables; you are serving Data Products.