Databricks Tutorials – Part 2 – SQL & AI Foundations Lab using Serverless Starter Warehouse

One important clarification before the lab: a Serverless Starter Warehouse alone cannot cover the full AI/ML lifecycle. A notebook attached to a SQL warehouse can run only SQL and Markdown, not Python. So the right end-to-end design is:

  • Serverless Starter Warehouse for SQL exploration and validation
  • Serverless notebook compute for Python, MLflow, training, and experiments
  • Model Serving for deployment
  • AI Playground / Agent tooling for agent prototyping

That split matches the current Databricks product model. (Databricks Documentation)

The end-to-end flow, as a Mermaid diagram:

flowchart LR
    A[Starter SQL Warehouse<br/>SQL exploration] --> B[Unity Catalog schema]
    B --> C[Serverless Notebook<br/>load built-in sample data]
    C --> D[MLflow experiment<br/>train + compare runs]
    D --> E[Register model<br/>Unity Catalog]
    E --> F[Model Serving<br/>deploy endpoint]
    F --> G[Test predictions<br/>UI / SDK]
    C --> H[AI Playground<br/>LLM + tool agent prototype]
    H --> I[Export / code-first agent path]

What this lab will show

By the end, a student will have done all of this in one workspace:

  1. Used Starter Warehouse to query data.
  2. Created a schema in Unity Catalog.
  3. Loaded Databricks-provided sample data into managed tables.
  4. Written Python in a serverless notebook.
  5. Trained a model and tracked runs in MLflow experiments.
  6. Registered the best model in Unity Catalog Model Registry.
  7. Deployed it with Mosaic AI Model Serving.
  8. Tested the endpoint.
  9. Prototyped an AI agent in AI Playground.
  10. Run a code-first agent demo notebook that needs no data setup. Databricks provides sample datasets in both the samples catalog and /databricks-datasets, and it also ships a standalone agent demo notebook that is ready to run with no setup or data required. (Databricks Documentation)

Before you start

This tutorial assumes:

  • your workspace is Unity Catalog-enabled
  • Serverless Starter Warehouse is already working
  • serverless notebooks are available
  • Serving and Playground are visible in the left menu

Serverless notebooks require Unity Catalog, and in most Unity Catalog-enabled workspaces they are available without extra per-user setup. AI Playground requires a workspace/region that supports Foundation Models. Model Serving and Foundation Model features are region-dependent. (Databricks Documentation)

If Playground or Serving is missing, stop there and verify workspace feature availability before continuing. (Databricks Documentation)


Part 1 — Use the Starter Warehouse first

Step 1. Open SQL Editor

In Databricks:

  1. Click SQL Editor
  2. Attach Serverless Starter Warehouse
  3. Run this warm-up query
SELECT * 
FROM samples.tpch.customer
LIMIT 10;

Databricks exposes built-in sample data through the samples catalog, so this is the fastest way to prove your warehouse is working before you build anything. (Databricks Documentation)

Step 2. Create a schema for the lab

Use this SQL in the same editor:

CREATE SCHEMA IF NOT EXISTS main.ai_ml_starter;

If main is not writable in your workspace, use any catalog where you have USE CATALOG, USE SCHEMA, CREATE TABLE, and CREATE MODEL. Databricks’ ML quickstart explicitly requires those Unity Catalog permissions for the schema where you will write tables and register models. (Databricks Documentation)
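If you need to hand those privileges out to a class, the statements can be scripted. A minimal sketch, stdlib only, that just builds the GRANT strings; the principal name, catalog, and schema are placeholders you would swap for your own workspace values:

```python
# Hypothetical helper: build the Unity Catalog GRANT statements the ML
# quickstart requires. Principal/catalog/schema names are assumptions.
def grant_statements(principal, catalog="main", schema="ai_ml_starter"):
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT CREATE MODEL ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]

# Print them so they can be pasted into the SQL Editor
for stmt in grant_statements("students@example.com"):
    print(stmt)
```

Run the generated statements in the SQL Editor (or via spark.sql) as a user with grant rights on the catalog.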

At this point you have:

  • warehouse working
  • schema ready
  • no custom data yet

Part 2 — Create the notebook and ingest built-in sample data

Step 3. Create a new notebook

Create a notebook called:

01_ml_lifecycle_serverless

Then attach it to Serverless from the compute dropdown. Databricks documents that new notebooks can default to serverless on execution when no other compute is selected. (Databricks Documentation)

Step 4. Install the Python libraries

In the first cell, run:

%pip install -U mlflow scikit-learn pandas matplotlib hyperopt

# Restart the Python process so the freshly installed packages are importable
dbutils.library.restartPython()

Serverless notebooks support dependency management through the notebook/serverless environment, so installing these packages in the notebook is the simplest fresh-workspace path. (Databricks Documentation)

Step 5. Add the setup cell

import mlflow
import pandas as pd
import sklearn.metrics
import sklearn.model_selection
import sklearn.ensemble
import matplotlib.pyplot as plt

from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from hyperopt.pyll import scope

mlflow.set_registry_uri("databricks-uc")

CATALOG_NAME = "main"
SCHEMA_NAME = "ai_ml_starter"
MODEL_NAME = f"{CATALOG_NAME}.{SCHEMA_NAME}.wine_quality_model"

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG_NAME}.{SCHEMA_NAME}")

print("Registry URI set to Unity Catalog")
print("Model name:", MODEL_NAME)

Databricks’ current ML quickstart uses Unity Catalog as the model registry and follows this same pattern: set the registry URI, choose catalog/schema, then train and register the model there. (Databricks Documentation)

Step 6. Load the built-in wine dataset and save it as Unity Catalog tables

Run this cell:

white_wine = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-white.csv",
    sep=";",
    header=True,
    inferSchema=True
)

red_wine = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-red.csv",
    sep=";",
    header=True,
    inferSchema=True
)

for c in white_wine.columns:
    white_wine = white_wine.withColumnRenamed(c, c.replace(" ", "_"))

for c in red_wine.columns:
    red_wine = red_wine.withColumnRenamed(c, c.replace(" ", "_"))

white_wine.write.mode("overwrite").saveAsTable(f"{CATALOG_NAME}.{SCHEMA_NAME}.white_wine")
red_wine.write.mode("overwrite").saveAsTable(f"{CATALOG_NAME}.{SCHEMA_NAME}.red_wine")

print("Tables created:")
print(f"{CATALOG_NAME}.{SCHEMA_NAME}.white_wine")
print(f"{CATALOG_NAME}.{SCHEMA_NAME}.red_wine")

This dataset is already available in Databricks under /databricks-datasets, and Databricks’ official ML getting-started tutorial uses these exact wine-quality files for a fresh model-building workflow. (Databricks Documentation)
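The renaming loop above matters later: Step 9's pandas code references columns like fixed_acidity, not the raw CSV header "fixed acidity". A quick stdlib-only check of that mapping, using a few representative column names:

```python
# The rename rule used above: spaces in Spark column names become underscores,
# so "fixed acidity" -> "fixed_acidity". Sample names taken from the dataset.
raw_cols = ["fixed acidity", "volatile acidity", "free sulfur dioxide", "quality"]
renamed = [c.replace(" ", "_") for c in raw_cols]
print(renamed)  # → ['fixed_acidity', 'volatile_acidity', 'free_sulfur_dioxide', 'quality']
```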

Step 7. Quick validation in notebook

Run a SQL cell:

SELECT quality, COUNT(*) AS cnt
FROM main.ai_ml_starter.red_wine
GROUP BY quality
ORDER BY quality;

That confirms:

  • your notebook can read the tables
  • the tables landed in Unity Catalog
  • the sample data is ready for training

Part 3 — Go back to the Starter Warehouse for SQL exploration

Step 8. Query the same tables from SQL Editor

Open SQL Editor again on the Starter Warehouse and run:

SELECT COUNT(*) AS rows_red
FROM main.ai_ml_starter.red_wine;

SELECT COUNT(*) AS rows_white
FROM main.ai_ml_starter.white_wine;

SELECT quality, COUNT(*) AS cnt
FROM main.ai_ml_starter.red_wine
GROUP BY quality
ORDER BY quality;

This is the point where students clearly see the platform concept:

  • notebook wrote governed tables
  • warehouse can query them immediately
  • same governed data supports both engineering and analytics

That is one of the core Databricks mental models: workspace assets and SQL assets sit on top of the same governed Unity Catalog objects. (Databricks Documentation)


Part 4 — Train the first model in the notebook

Step 9. Prepare the training data

Back in the notebook, run:

white_pdf = spark.read.table(f"{CATALOG_NAME}.{SCHEMA_NAME}.white_wine").toPandas()
red_pdf = spark.read.table(f"{CATALOG_NAME}.{SCHEMA_NAME}.red_wine").toPandas()

white_pdf["is_red"] = 0.0
red_pdf["is_red"] = 1.0

data_df = pd.concat([white_pdf, red_pdf], axis=0)

# High quality = quality >= 7
labels = data_df["quality"].astype("int") >= 7
features = data_df.drop(["quality"], axis=1)

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    features,
    labels,
    test_size=0.2,
    random_state=1
)

print("Train shape:", X_train.shape)
print("Test shape :", X_test.shape)

This follows the same official Databricks starter flow: combine red and white wine, create a boolean is_red, convert quality to a classification label, and split into train/test. (Databricks Documentation)

Step 10. Train a baseline model and log it with MLflow

mlflow.autolog()

with mlflow.start_run(run_name="gradient_boost_baseline"):
    model = sklearn.ensemble.GradientBoostingClassifier(random_state=0)
    model.fit(X_train, y_train)

    predicted_probs = model.predict_proba(X_test)
    roc_auc = sklearn.metrics.roc_auc_score(y_test, predicted_probs[:, 1])

    roc_curve = sklearn.metrics.RocCurveDisplay.from_estimator(model, X_test, y_test)
    roc_curve.figure_.savefig("/tmp/roc_curve.png")

    mlflow.log_metric("test_auc", roc_auc)
    mlflow.log_artifact("/tmp/roc_curve.png")

    print("Baseline test AUC:", roc_auc)

This is the cleanest first model because it is simple, fast, and aligned with Databricks’ official “build your first ML model” tutorial. Databricks also documents that MLflow experiments organize training runs, and if no active experiment is set in a notebook, Databricks automatically creates a notebook experiment for you. (Databricks Documentation)

Step 11. Tell the student where to look

In the notebook, click the Experiment icon on the right or the run link in the output.

The student should look for:

  • parameters
  • metrics
  • artifacts
  • the logged model

That is the first “experimenting with code” moment in the tutorial. Databricks’ MLflow UI is designed exactly for this compare-and-review workflow. (Databricks Documentation)


Part 5 — Tune the model and compare runs

Step 12. Run a light hyperparameter search

Use a smaller search than the full docs example so the lab stays practical on a fresh workspace.

search_space = {
    "n_estimators": scope.int(hp.quniform("n_estimators", 20, 300, 1)),
    "learning_rate": hp.loguniform("learning_rate", -3, 0),
    "max_depth": scope.int(hp.quniform("max_depth", 2, 6, 1)),
}

def train_model(params):
    mlflow.autolog()
    with mlflow.start_run(nested=True):
        model_hp = sklearn.ensemble.GradientBoostingClassifier(
            random_state=0,
            **params
        )
        model_hp.fit(X_train, y_train)

        predicted_probs = model_hp.predict_proba(X_test)
        auc = sklearn.metrics.roc_auc_score(y_test, predicted_probs[:, 1])

        mlflow.log_metric("test_auc", auc)
        return {"loss": -auc, "status": STATUS_OK}

spark_trials = SparkTrials(parallelism=4)

with mlflow.start_run(run_name="gb_hyperopt"):
    best_params = fmin(
        fn=train_model,
        space=search_space,
        algo=tpe.suggest,
        max_evals=8,
        trials=spark_trials
    )

print(best_params)

Databricks’ official quickstart uses Hyperopt with MLflow integration so multiple runs are logged automatically, and it uses SparkTrials for parallel sweeps. (Databricks Documentation)
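If the search space syntax looks opaque, here is a hedged, stdlib-only illustration of what those hp.* expressions mean. It samples uniformly at random instead of using TPE, and the objective is a dummy stand-in for model training, so it is a teaching sketch rather than a replacement for the Hyperopt cell:

```python
import math
import random

# What the search space above means:
#   scope.int(hp.quniform("n_estimators", 20, 300, 1)) -> integers in [20, 300]
#   hp.loguniform("learning_rate", -3, 0)              -> floats in [e^-3, 1.0]
#   scope.int(hp.quniform("max_depth", 2, 6, 1))       -> integers in [2, 6]
random.seed(0)

def sample_params():
    return {
        "n_estimators": random.randint(20, 300),
        "learning_rate": math.exp(random.uniform(-3, 0)),
        "max_depth": random.randint(2, 6),
    }

def dummy_objective(params):
    # Placeholder score, NOT a trained model: prefers a moderate depth
    # and a learning rate near 0.1, just to make the search meaningful.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

# 8 evaluations, mirroring max_evals=8 in the Hyperopt cell
best = max((sample_params() for _ in range(8)), key=dummy_objective)
print("best sampled params:", best)
```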

Step 13. Pick the best run and register the model

best_run = mlflow.search_runs(
    order_by=["metrics.test_auc DESC", "start_time DESC"],
    max_results=1
).iloc[0]

print("Best run_id:", best_run.run_id)
print("Best test_auc:", best_run["metrics.test_auc"])

model_uri = f"runs:/{best_run.run_id}/model"
registered_model = mlflow.register_model(model_uri, MODEL_NAME)

print("Registered model:", registered_model.name)
print("Version:", registered_model.version)

This is the exact lifecycle step that turns an experiment output into a governed, reusable model artifact in Unity Catalog. Databricks’ quickstart uses this same pattern with mlflow.register_model(...). (Databricks Documentation)
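To close the loop for students, you can show how that governed model would be loaded back for batch scoring. A hedged sketch: the version number here is hypothetical (in practice use registered_model.version from the cell above), and the workspace-only calls are behind a flag so the cell is safe to show anywhere:

```python
# Hedged sketch: load the registered model back for batch scoring.
# MODEL_NAME matches the setup cell; VERSION = 1 is an assumption -- use
# registered_model.version returned by mlflow.register_model in practice.
MODEL_NAME = "main.ai_ml_starter.wine_quality_model"
VERSION = 1

model_uri = f"models:/{MODEL_NAME}/{VERSION}"
print("Model URI:", model_uri)

RUN_IN_DATABRICKS = False  # flip to True inside a Databricks notebook
if RUN_IN_DATABRICKS:
    import mlflow
    mlflow.set_registry_uri("databricks-uc")
    loaded = mlflow.pyfunc.load_model(model_uri)
    # X_test comes from Step 9
    print(loaded.predict(X_test)[:5])
```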

At this point the student has already seen:

  • code
  • experiments
  • comparison
  • registration

That is the classic ML lifecycle in Databricks.


Part 6 — Serve the model

Step 14. Create the serving endpoint in the UI

Now move to the Serving page:

  1. Click Serving
  2. Click Create serving endpoint
  3. Name it: wine-quality-endpoint
  4. In Served entities, choose your Unity Catalog model
    main.ai_ml_starter.wine_quality_model
  5. Pick the latest version
  6. Save

Databricks Model Serving is the managed serverless layer for deploying custom ML models and exposing them as REST endpoints. (Databricks Documentation)

Step 15. Test it from the Serving UI

Once the endpoint is ready, click Query endpoint and use this payload:

{
  "dataframe_records": [
    {
      "fixed_acidity": 7.4,
      "volatile_acidity": 0.70,
      "citric_acid": 0.00,
      "residual_sugar": 1.9,
      "chlorides": 0.076,
      "free_sulfur_dioxide": 11.0,
      "total_sulfur_dioxide": 34.0,
      "density": 0.9978,
      "pH": 3.51,
      "sulphates": 0.56,
      "alcohol": 9.4,
      "is_red": 1.0
    }
  ]
}

Databricks documents that custom model endpoints accept dataframe_split or dataframe_records, and the response comes back under the predictions key. The Serving UI is the simplest test path. (Databricks Documentation)
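The same payload can be built programmatically, which is handy once students move to the REST API or SDK paths. A stdlib-only sketch that constructs and serializes the example row above:

```python
import json

# Build the dataframe_records payload shown above programmatically.
# Feature names must match the training columns from Step 9.
record = {
    "fixed_acidity": 7.4, "volatile_acidity": 0.70, "citric_acid": 0.00,
    "residual_sugar": 1.9, "chlorides": 0.076, "free_sulfur_dioxide": 11.0,
    "total_sulfur_dioxide": 34.0, "density": 0.9978, "pH": 3.51,
    "sulphates": 0.56, "alcohol": 9.4, "is_red": 1.0,
}
payload = {"dataframe_records": [record]}

# The JSON body you would POST to the endpoint's /invocations URL
body = json.dumps(payload)
print(body[:60], "...")
```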

Step 16. Optional: query it from code later

If you want the student to see programmatic scoring later, Databricks supports:

  • Serving UI
  • REST API
  • MLflow Deployments SDK
  • SQL via ai_query for supported cases

Those are the official supported query paths for served models. (Databricks Documentation)
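For the Deployments SDK path specifically, a hedged sketch is below. The endpoint name matches Step 14, the example row is abbreviated (a real call needs every feature column from training), and the workspace call sits behind a flag so the cell runs anywhere:

```python
# Hedged sketch of the MLflow Deployments SDK query path.
# ENDPOINT_NAME matches Step 14; availability depends on your mlflow version.
ENDPOINT_NAME = "wine-quality-endpoint"
inputs = {"dataframe_records": [{"alcohol": 9.4, "is_red": 1.0}]}  # abbreviated row

RUN_IN_DATABRICKS = False  # flip to True once the endpoint is deployed
if RUN_IN_DATABRICKS:
    from mlflow.deployments import get_deploy_client
    client = get_deploy_client("databricks")
    # A real call requires all feature columns, not just the two shown here
    response = client.predict(endpoint=ENDPOINT_NAME, inputs=inputs)
    print(response["predictions"])
else:
    print("Would send", inputs, "to endpoint", ENDPOINT_NAME)
```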


Part 7 — Show the LLM / agent side of Databricks

This is a separate capability track from the sklearn model above. Do not force the two into a single storyline. Teach them as two distinct platform capabilities:

  • classic ML model lifecycle
  • GenAI / agent lifecycle

That is the honest and clean way to teach Databricks today. (Databricks Documentation)

Option A — Fastest path: AI Playground

Step 17. Open Playground

  1. Click Playground
  2. Choose a hosted model
  3. Ask a simple prompt:
    • “Explain in simple terms what a classifier does.”
  4. Add a second model to compare responses side by side

Databricks AI Playground is the low-code place to test prompts, compare models, and prototype agents. (Databricks Documentation)

Step 18. Prototype a tool-calling agent

In Playground:

  1. Choose a Tools enabled model
  2. Click Tools
  3. Add system.ai.python_exec
  4. Ask something like:
    “Use Python to calculate the average of 9.4, 9.8, 10.0, and 10.2.”

Databricks documents system.ai.python_exec as a built-in tool students can use when prototyping tool-calling agents. (Databricks Documentation)

Step 19. Export the agent to code

In Playground:

  1. Click Get code
  2. Choose Create agent notebook

Databricks says this creates a notebook that defines the agent and deploys it to a serving endpoint. It is the fastest way to show students how a no-code prototype becomes code. Databricks also notes that this exported path currently uses a legacy Model Serving workflow, while Databricks now recommends Databricks Apps for long-term agent authoring. (Databricks Documentation)


Option B — Best code-first path for class: official ready-run agent notebook

This is the path I would actually use in a classroom because it is more deterministic.

Step 20. Import the official agent demo notebook

Use the official Databricks tutorial notebook:

Tutorial: Build, evaluate, and deploy a retrieval agent

Databricks says this notebook:

  • is standalone
  • uses a sample document corpus
  • is ready to run with no setup or data required

That makes it perfect for a fresh environment where no sample data needs to be created. (Databricks Documentation)

Step 21. Run the notebook top to bottom

That notebook teaches the student to:

  • create an agent
  • define tools
  • use a sample corpus
  • evaluate quality
  • deploy the agent

It is officially maintained by Databricks, so for this part I would prefer the official notebook over rewriting a fragile custom agent lab from scratch. (Databricks Documentation)


Part 8 — What the student has learned

By the end of this single lab, the student will have seen:

  • SQL analytics on Starter Warehouse
  • Unity Catalog schema and tables
  • built-in sample data ingestion
  • serverless notebook development
  • MLflow experiment tracking
  • hyperparameter experiments
  • model registration
  • model serving
  • LLM prompting
  • tool-calling agents
  • code-first agent deployment

That is already a very strong “Databricks AI/ML end-to-end” tutorial for a fresh workspace.


Part 9 — Optional bonus with the Starter Warehouse

If your workspace supports AI Functions and pay-per-token Foundation Models, you can also show AI directly from SQL with the warehouse.

Example:

SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  'Explain in 40 words what our wine quality classifier does.'
) AS explanation;

ai_query is Databricks’ general AI function for invoking supported model serving endpoints directly from SQL or Python. It works on Databricks SQL, but feature availability depends on workspace/region and model support. (Databricks Documentation)


Part 10 — Clean up after the lab

To avoid surprise spend, tell students to clean up:

  1. Stop or leave auto-stop enabled on the Starter Warehouse
  2. Delete the model serving endpoint when finished
  3. Delete the schema objects if they no longer need them
  4. If they created agent endpoints or apps, remove those too

Model Serving runs on serverless infrastructure, and Databricks Apps are billed while running. (Databricks Documentation)


My recommendation for the tutorial format

Use this as three classroom chapters:

Chapter 1 — Data + SQL

Starter Warehouse, schema creation, built-in samples, SQL queries.

Chapter 2 — Classic ML

Serverless notebook, wine dataset, MLflow, experiments, registration, serving.

Chapter 3 — GenAI / Agents

Playground, tool-calling, export to code, official code-first agent demo notebook.

That is the most honest and technically correct way to teach Databricks in 2026.


Reference URL 1 – https://gist.github.com/devops-school/6528d3f7935300e6d3537a4933a26727
Reference URL 2 – https://gist.github.com/devops-school/3e74e74f525b22bb6ce3b65b11a14b7f
