{"id":2887,"date":"2026-04-05T03:48:07","date_gmt":"2026-04-05T03:48:07","guid":{"rendered":"https:\/\/aiopsschool.com\/blog\/?p=2887"},"modified":"2026-04-05T03:53:10","modified_gmt":"2026-04-05T03:53:10","slug":"databricks-tutorials-part-1-defintion-and-terminology","status":"publish","type":"post","link":"https:\/\/aiopsschool.com\/blog\/databricks-tutorials-part-1-defintion-and-terminology\/","title":{"rendered":"Databricks Tutorials &#8211; Part 1 &#8211; Defintion and Terminology"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Databricks: latest high-level overview<\/h2>\n\n\n\n<p>Databricks is now best understood as a <strong>unified data, analytics, and AI platform<\/strong> rather than only a Spark notebook tool. Its current platform brings together <strong>data ingestion, data engineering, streaming, SQL warehousing, BI, governance, machine learning, generative AI, model serving, and app building<\/strong> in one environment. The core idea is that teams should not need one tool for ETL, another for warehouse, another for ML, another for governance, and yet another for AI apps; Databricks tries to keep those workflows on a single governed foundation. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/lakehouse-architecture\/scope?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>At the center of that platform is <strong>Unity Catalog<\/strong>, which is Databricks\u2019 built-in governance layer. Unity Catalog gives one place to manage access, organization, lineage, discovery, and governance for <strong>tables, volumes, models, features, functions, and more<\/strong> across workspaces. In practice, this means a data engineer, analyst, and ML engineer can work on the same governed assets without each team inventing a separate access model. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/data-governance\/unity-catalog\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>For <strong>data engineering<\/strong>, Databricks covers both batch and streaming. Its current ingestion and pipeline story is centered on <strong>Lakeflow<\/strong>: <strong>Lakeflow Connect<\/strong> for ingesting from files, databases, SaaS apps, cloud storage, and message buses, <strong>Lakeflow Spark Declarative Pipelines<\/strong> for building batch and streaming pipelines in SQL or Python, and <strong>Lakeflow Jobs<\/strong> for orchestration and scheduling. This makes Databricks suitable for building end-to-end pipelines from source ingestion to curated data products. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/ingestion\/overview?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>For <strong>analytics and BI<\/strong>, Databricks includes <strong>Databricks SQL<\/strong>, <strong>SQL warehouses<\/strong>, <strong>queries<\/strong>, <strong>dashboards<\/strong>, <strong>alerts<\/strong>, <strong>query history<\/strong>, and natural-language analytics through <strong>Genie<\/strong>. Databricks currently recommends <strong>serverless SQL warehouses<\/strong> for most SQL workloads when available, and Genie lets business users ask questions about data in plain language instead of writing SQL. <strong>Metric Views<\/strong> add a governed semantic layer so teams can define business metrics once and reuse them consistently. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/compute\/sql-warehouse\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>For <strong>AI and machine learning<\/strong>, the platform now groups capabilities under <strong>Mosaic AI<\/strong>. That includes experiment tracking, model lifecycle, feature management, vector search, agent tooling, and model serving. 
Databricks also has <strong>AI Gateway<\/strong>, which acts as a governance and monitoring control plane for LLM endpoints, coding agents, and serving endpoints. In other words, Databricks is not only for training classic ML models anymore; it is also designed for production GenAI and agentic workloads. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/machine-learning\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>A newer and important part of the story is that Databricks is increasingly becoming a place to <strong>build end-user data and AI applications directly on the platform<\/strong>. <strong>Databricks Apps<\/strong> lets teams deploy secure data and AI apps on Databricks\u2019 serverless platform, with native integration to Unity Catalog, Databricks SQL, and OAuth. That is useful for internal dashboards, RAG chat apps, forms, and operational tools without managing separate app infrastructure. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/dev-tools\/databricks-apps\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What stands out compared with many other platforms<\/h2>\n\n\n\n<p>What often feels <strong>missing on other platforms<\/strong> is not one individual feature, but the <strong>combination<\/strong> of features in one governed plane. Databricks is especially strong where you want <strong>one control model across data engineering, analytics, AI, and apps<\/strong>. Unity Catalog governing data and AI assets together, <strong>Delta Sharing<\/strong> as an open sharing protocol, <strong>Clean Rooms<\/strong> for privacy-preserving collaboration, <strong>Lakehouse Federation<\/strong> for querying external systems through governed foreign catalogs, and <strong>system tables<\/strong> for platform-level billing, access, lineage, and operational analytics are examples of capabilities that many teams otherwise assemble from multiple products. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/delta-sharing\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>So, in plain words, <strong>Databricks can ingest data, transform it, govern it, query it, visualize it, share it, use it for ML and GenAI, deploy models and agents, and even host the apps built on top of it<\/strong>. That is the clearest overall picture today. Recent platform signals also show where Databricks is heading: <strong>Lakeflow<\/strong> has become the umbrella for ingestion\/jobs\/pipelines, <strong>Genie Code<\/strong> has expanded agentic capabilities for multi-step data work, and <strong>governed tags<\/strong> became generally available in April 2026, which further strengthens governance and discoverability. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/ingestion\/overview?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p><strong>Databricks is a unified data and AI platform where teams can ingest, engineer, govern, analyze, share, and operationalize data and AI products from one place.<\/strong> (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/lakehouse-architecture\/scope?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<p>Below is a <strong>simple, human, easy-to-understand Phase 1 terminology guide<\/strong> you can use as your foundation. Databricks\u2019 current sidebar includes areas like <strong>Workspace, Catalog, Jobs &amp; Pipelines, Compute, Marketplace, SQL Editor, Queries, Dashboards, Genie, Alerts, Query History, SQL Warehouses, Playground, Agents, Experiments, Features, Models, and Serving<\/strong>. 
(<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Phase 1: Databricks terminology<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">First, the big picture<\/h2>\n\n\n\n<p>Think of Databricks like this:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong> = your working office<\/li>\n\n\n\n<li><strong>Metastore<\/strong> = the master registry that knows what data exists<\/li>\n\n\n\n<li><strong>Catalog<\/strong> = a top-level business container<\/li>\n\n\n\n<li><strong>Schema<\/strong> = a sub-folder inside a catalog<\/li>\n\n\n\n<li><strong>Table<\/strong> = the actual stored data<\/li>\n\n\n\n<li><strong>Notebook<\/strong> = your working document where you write code and analysis<\/li>\n<\/ul>\n\n\n\n<p>That is very close to how Unity Catalog is structured: <strong>catalog \u2192 schema \u2192 table \/ view \/ volume \/ model \/ function<\/strong>, while the <strong>workspace<\/strong> is where people create and organize working assets like notebooks, queries, dashboards, and files. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Core terms<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Catalog<\/h3>\n\n\n\n<p>A <strong>Catalog<\/strong> is the top business container in Unity Catalog.<br>It is usually used to separate data by business area, environment, or ownership.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You may create catalogs like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>sales<\/code><\/li>\n\n\n\n<li><code>finance<\/code><\/li>\n\n\n\n<li><code>marketing<\/code><\/li>\n\n\n\n<li><code>dev<\/code><\/li>\n\n\n\n<li><code>prod<\/code><\/li>\n<\/ul>\n\n\n\n<p>Inside a catalog, you create schemas. 
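<\/p>\n\n\n\n<p>As a quick mental model (plain Python, purely illustrative, not a Databricks API), the container relationship can be sketched like this:<\/p>

```python
# Toy model of the Unity Catalog hierarchy (illustration only, not a real API):
# a catalog contains schemas, and each schema contains tables and other objects.
catalog_tree = {
    "sales": {                              # catalog
        "raw": ["orders_raw"],              # schema -> tables
        "reporting": ["monthly_revenue"],
    },
    "finance": {
        "reporting": ["monthly_pnl"],
    },
}

def list_tables(tree: dict, catalog: str, schema: str) -> list:
    """List the tables registered under catalog.schema in the toy model."""
    return tree[catalog][schema]

print(list_tables(catalog_tree, "sales", "reporting"))  # ['monthly_revenue']
```

<p>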
Databricks defines a catalog as the first level of the Unity Catalog namespace. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Metastore<\/h3>\n\n\n\n<p>A <strong>Metastore<\/strong> is the central registry behind Unity Catalog.<br>It stores the structure and metadata of your data objects, such as tables, columns, data types, and where data lives.<\/p>\n\n\n\n<p><strong>Simple understanding:<\/strong><br>If your company has many workspaces, the metastore is the central brain that keeps governance consistent across them. Unity Catalog uses this centralized model for access control, auditing, lineage, and discovery across workspaces. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Workspaces<\/h3>\n\n\n\n<p><strong>Workspaces<\/strong> (plural) refers to the fact that an organization may run multiple Databricks environments, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dev workspace<\/li>\n\n\n\n<li>UAT workspace<\/li>\n\n\n\n<li>Prod workspace<\/li>\n<\/ul>\n\n\n\n<p>A user can switch between workspaces if they have access. Databricks documents workspace switching in the UI and describes workspaces as organizational environments for developing and sharing objects. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Workspace<\/h3>\n\n\n\n<p>A <strong>Workspace<\/strong> is one Databricks environment where users actually work.<br>This is where you create notebooks, files, queries, dashboards, experiments, jobs, and other working assets.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If Unity Catalog is the governed data layer, the Workspace is the place where engineers, analysts, and data scientists do their daily work. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Schema<\/h3>\n\n\n\n<p>A <strong>Schema<\/strong> is the second layer inside a catalog.<br>It is a container inside a catalog that holds tables, views, volumes, models, and functions.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If <code>sales<\/code> is a catalog, then schemas inside it might be:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>raw<\/code><\/li>\n\n\n\n<li><code>curated<\/code><\/li>\n\n\n\n<li><code>reporting<\/code><\/li>\n<\/ul>\n\n\n\n<p>So a table might look like:<br><code>sales.reporting.monthly_revenue<\/code> (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Table<\/h3>\n\n\n\n<p>A <strong>Table<\/strong> is where your actual rows of data live.<br>In Databricks, tables created there use <strong>Delta Lake by default<\/strong>.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>A customer table may contain:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>customer_id<\/li>\n\n\n\n<li>customer_name<\/li>\n\n\n\n<li>city<\/li>\n\n\n\n<li>signup_date<\/li>\n<\/ul>\n\n\n\n<p>So in simple words, a table is your real business data stored in a structured form. 
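<\/p>\n\n\n\n<p>The three-level naming used above can be sketched with a small helper (plain Python; the helper is hypothetical and only illustrates the naming convention):<\/p>

```python
def parse_table_name(full_name: str) -> dict:
    """Split a fully qualified Unity Catalog table name (catalog.schema.table).
    Hypothetical helper for illustration; not part of any Databricks API."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError("expected catalog.schema.table, got: " + full_name)
    return dict(zip(("catalog", "schema", "table"), parts))

print(parse_table_name("sales.reporting.monthly_revenue"))
# {'catalog': 'sales', 'schema': 'reporting', 'table': 'monthly_revenue'}
```

<p>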
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Notebook<\/h3>\n\n\n\n<p>A <strong>Notebook<\/strong> is an interactive document where you write and run code.<br>You can use Python, SQL, Scala, or R in notebooks.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You may use one notebook to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>read raw data<\/li>\n\n\n\n<li>clean it<\/li>\n\n\n\n<li>join tables<\/li>\n\n\n\n<li>create features<\/li>\n\n\n\n<li>train a model<\/li>\n\n\n\n<li>build charts<\/li>\n<\/ul>\n\n\n\n<p>It is one of the most common working assets in Databricks. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Workspace-level create options<\/h1>\n\n\n\n<p>These are the things users usually create in the <strong>Workspace<\/strong> area.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Folder<\/h3>\n\n\n\n<p>A <strong>Folder<\/strong> is simply a place to organize workspace objects.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You may keep:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>\/Shared\/Finance<\/code><\/li>\n\n\n\n<li><code>\/Users\/rajesh\/tutorials<\/code><\/li>\n\n\n\n<li><code>\/Projects\/customer360<\/code><\/li>\n<\/ul>\n\n\n\n<p>Use it to keep notebooks, files, queries, and dashboards neatly arranged. Workspace objects can be organized in the workspace browser. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/workspace\/workspace-assets\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Git Folder<\/h3>\n\n\n\n<p>A <strong>Git Folder<\/strong> is Databricks\u2019 integrated Git repository experience.<br>It was formerly called <strong>Repos<\/strong>.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If your team stores notebooks and Python files in GitHub, GitLab, or Azure DevOps, a Git folder lets you clone that repo into Databricks and work with branches, commits, and CI\/CD more cleanly. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Notebook<\/h3>\n\n\n\n<p>A <strong>Notebook<\/strong> in workspace-level creation means you are creating a new coding or analysis document inside your workspace.<br>This is where most hands-on tutorials begin. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">File \/ Query<\/h3>\n\n\n\n<p><strong>File<\/strong> and <strong>Query<\/strong> are two different workspace objects that are easy to confuse.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>File<\/strong> is a normal workspace file, such as <code>.py<\/code>, <code>.sql<\/code>, <code>.yml<\/code>, or config files.<\/li>\n\n\n\n<li>A <strong>Query<\/strong> is a saved SQL statement you use to analyze data.<\/li>\n<\/ul>\n\n\n\n<p>Databricks now supports <strong>queries, dashboards, and alerts as workspace files<\/strong>, and the SQL editor is used to author and manage queries. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/workspace\/workspace-assets\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ETL Pipeline<\/h3>\n\n\n\n<p>An <strong>ETL Pipeline<\/strong> is the flow that moves and transforms data:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extract data from source<\/li>\n\n\n\n<li>Transform it<\/li>\n\n\n\n<li>Load it into target tables<\/li>\n<\/ul>\n\n\n\n<p>In modern Databricks, this is commonly done with <strong>Lakeflow Spark Declarative Pipelines<\/strong> or <strong>Lakeflow Connect<\/strong>, depending on the use case. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dashboard<\/h3>\n\n\n\n<p>A <strong>Dashboard<\/strong> is a visual reporting layer built on top of your queries and metrics.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Instead of showing raw SQL results, you show charts like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>revenue by month<\/li>\n\n\n\n<li>orders by country<\/li>\n\n\n\n<li>top 10 customers<\/li>\n<\/ul>\n\n\n\n<p>Dashboards are first-class objects in the current workspace UI. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Genie Space<\/h3>\n\n\n\n<p>A <strong>Genie Space<\/strong> is a no-code chat space where business users can ask questions in natural language about their data.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>A business user types:<br>\u201cShow me last quarter revenue by region\u201d<br>Genie tries to understand the business language and generate the right answer based on curated data and instructions.<\/p>\n\n\n\n<p>Databricks describes Genie as a natural-language interface for business teams, with space-level curation and organizational terminology. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/genie\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Vector Search Index<\/h3>\n\n\n\n<p>A <strong>Vector Search Index<\/strong> is a searchable AI index built from a Delta table so you can find semantically similar content instead of only exact matches.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If you have thousands of product descriptions or support documents, vector search helps you find the most relevant ones even when the words are different but the meaning is similar.<\/p>\n\n\n\n<p>Databricks says the index is created from a Delta table and can be set to sync automatically when the source table changes. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/vector-search\/vector-search?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Synced Table<\/h3>\n\n\n\n<p>A <strong>Synced Table<\/strong> is a read-only table that automatically synchronizes data from Unity Catalog into a database instance.<\/p>\n\n\n\n<p><strong>Simple understanding:<\/strong><br>Use it when you want governed lakehouse data to be made available in another serving-style database system without manually copying data every time. Databricks defines synced tables as read-only synchronized Postgres tables sourced from Unity Catalog tables. 
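<\/p>\n\n\n\n<p>The read-only behavior can be pictured with a tiny sketch (a plain-Python stand-in, not how synced tables are actually implemented):<\/p>

```python
from types import MappingProxyType

# Toy picture of a synced table: a read-only view that reflects changes in the
# governed source while rejecting writes. (Illustration only -- real synced
# tables are Postgres tables kept in sync by Databricks, not Python objects.)
source = {"customer_1": "Berlin", "customer_2": "Tokyo"}  # stands in for a UC table
synced = MappingProxyType(source)                         # read-only view

source["customer_3"] = "Lima"        # a change in the source...
print(synced["customer_3"])          # ...shows up through the view

try:
    synced["customer_4"] = "Oslo"    # direct writes to the view are rejected
except TypeError:
    print("synced view is read-only")
```

<p>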
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Metric View<\/h3>\n\n\n\n<p>A <strong>Metric View<\/strong> is a governed way to define business metrics once and reuse them everywhere.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Instead of every dashboard author writing a different formula for \u201cnet revenue\u201d or \u201cactive customer,\u201d you define it once in a metric view and use the same meaning everywhere.<\/p>\n\n\n\n<p>Databricks describes metric views as centralized, reusable, governed business metrics that can be used across dashboards, Genie spaces, and alerts. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Catalog-level create options<\/h1>\n\n\n\n<p>These are typically objects you create under the <strong>Catalog<\/strong> or Unity Catalog area.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Catalog<\/h3>\n\n\n\n<p>At catalog level, creating a <strong>Catalog<\/strong> means creating a top-level governed container for data and AI assets.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>Create <code>finance<\/code> for finance data, <code>hr<\/code> for HR data, or <code>prod<\/code> for production assets. Databricks supports standard, foreign, and shared catalogs. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/catalogs\/create-catalog\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">External Location<\/h3>\n\n\n\n<p>An <strong>External Location<\/strong> links a cloud storage path with a credential so Databricks can govern access to that location.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Your raw files already live in S3.<br>You create an external location so Databricks can safely use that path under Unity Catalog governance.<\/p>\n\n\n\n<p>Databricks defines an external location as a securable object that combines a storage path with a storage credential. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/sql\/language-manual\/sql-ref-external-locations?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Volume<\/h3>\n\n\n\n<p>A <strong>Volume<\/strong> is for non-tabular files under Unity Catalog governance.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Use a volume for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CSV files<\/li>\n\n\n\n<li>JSON files<\/li>\n\n\n\n<li>images<\/li>\n\n\n\n<li>PDFs<\/li>\n\n\n\n<li>model artifacts<\/li>\n<\/ul>\n\n\n\n<p>If tables are for rows and columns, volumes are for files and folders. Volumes live in the Unity Catalog namespace as <code>catalog.schema.volume<\/code>. 
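<\/p>\n\n\n\n<p>In practice you address files in a volume through a <code>\/Volumes\/...<\/code> path. A small sketch of how such a path is composed (plain Python; the catalog, schema, and file names here are invented):<\/p>

```python
from pathlib import PurePosixPath

def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Compose the /Volumes/<catalog>/<schema>/<volume>/... path style that
    Databricks uses to address files in a volume. Sketch for illustration."""
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))

print(volume_path("finance", "raw", "inbound", "2026", "invoices.csv"))
# /Volumes/finance/raw/inbound/2026/invoices.csv
```

<p>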
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/volumes\/\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Credential<\/h3>\n\n\n\n<p>Here, the term usually means <strong>Storage Credential<\/strong> in Unity Catalog, and sometimes <strong>Service Credential<\/strong> depending on the feature.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>A storage credential is the secure identity Databricks uses to access cloud storage.<br>Instead of giving each user direct S3 access, Databricks uses the credential in a governed way.<\/p>\n\n\n\n<p>Databricks documents storage credentials as the credential object used by external locations to access cloud storage. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/connect\/unity-catalog\/cloud-storage\/\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Connection<\/h3>\n\n\n\n<p>A <strong>Connection<\/strong> is used mainly for <strong>Lakehouse Federation<\/strong>.<br>It stores the connection details and credentials for an external database or external service.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If you want Databricks to query MySQL, PostgreSQL, or another external system without moving all data first, you create a connection and then use it to create a foreign catalog. Databricks defines a connection as a Unity Catalog securable object for accessing external database systems. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/query-federation\/connections?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Create Schema<\/h3>\n\n\n\n<p>At the catalog level, <strong>Create Schema<\/strong> means creating a sub-container inside the catalog so you can organize assets better.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Inside catalog <code>finance<\/code>, create schemas like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>raw<\/code><\/li>\n\n\n\n<li><code>silver<\/code><\/li>\n\n\n\n<li><code>gold<\/code><\/li>\n\n\n\n<li><code>reporting<\/code> (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Schema-level create options<\/h1>\n\n\n\n<p>Inside a schema, you usually create the actual useful assets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Volume<\/h3>\n\n\n\n<p>At schema level, creating a <strong>Volume<\/strong> means adding a governed file area under that schema.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>Store files like inbound CSVs, PDFs, JSON payloads, image files, or model support files. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/volumes\/\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Table<\/h3>\n\n\n\n<p>Creating a <strong>Table<\/strong> means creating the structured data object where your rows live.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br><code>finance.reporting.monthly_pnl<\/code> or <code>sales.curated.customers<\/code> (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Model<\/h3>\n\n\n\n<p>A <strong>Model<\/strong> is a governed ML model managed in Databricks, commonly through MLflow and Unity Catalog.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You train a churn prediction model, register it, version it, and later deploy it.<\/p>\n\n\n\n<p>Databricks says Models in Unity Catalog extend centralized access control, auditing, lineage, and cross-workspace discovery to ML models. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/machine-learning\/manage-model-lifecycle\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Metric View<\/h3>\n\n\n\n<p>At schema level, a <strong>Metric View<\/strong> is where you define reusable business measures in a governed way.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br><code>profit_margin<\/code>, <code>avg_order_value<\/code>, or <code>monthly_active_users<\/code><\/p>\n\n\n\n<p>This helps dashboards, alerts, and Genie all use the same business definitions. 
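<\/p>\n\n\n\n<p>The \u201cdefine once, reuse everywhere\u201d idea can be sketched in plain Python (a toy registry with invented numbers, not how metric views are really implemented):<\/p>

```python
# Toy sketch of a shared metric registry: the formulas live in one place,
# and every consumer (dashboard, alert, chat) resolves the same definition.
METRICS = {
    "avg_order_value": lambda rows: sum(r["amount"] for r in rows) / len(rows),
    "order_count": lambda rows: len(rows),
}

orders = [{"amount": 120.0}, {"amount": 80.0}]

def evaluate(metric_name: str, rows: list) -> float:
    """Look up the governed definition instead of re-deriving the formula."""
    return METRICS[metric_name](rows)

print(evaluate("avg_order_value", orders))  # 100.0
```

<p>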
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/metric-views\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Other features on the left side of the workspace<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Jobs &amp; Pipelines<\/h3>\n\n\n\n<p>This area is for running and orchestrating repeatable workflows.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Lakeflow Jobs<\/strong> = task orchestration and scheduling<\/li>\n\n\n\n<li><strong>Pipelines<\/strong> = declarative data pipeline definitions<\/li>\n<\/ul>\n\n\n\n<p>Databricks documents Lakeflow Jobs as workflow automation for coordinating multiple tasks, and Lakeflow Spark Declarative Pipelines as a framework for batch and streaming pipelines in SQL and Python. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/jobs\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Compute<\/h3>\n\n\n\n<p><strong>Compute<\/strong> is the engine that runs your work.<\/p>\n\n\n\n<p>This includes things like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>clusters<\/li>\n\n\n\n<li>SQL warehouses<\/li>\n\n\n\n<li>serverless compute<\/li>\n\n\n\n<li>serving endpoints in some workflows<\/li>\n<\/ul>\n\n\n\n<p>Simple meaning: without compute, your notebook or query does not actually run. Databricks\u2019 UI lets you create compute resources like clusters and SQL warehouses from the main create flow. 
(<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Discover<\/h3>\n\n\n\n<p><strong>Discover<\/strong> is a curated browsing experience for finding data assets and insights more easily.<\/p>\n\n\n\n<p><strong>Simple understanding:<\/strong><br>Instead of needing to know exact catalog and schema paths, business users can browse assets in a more business-friendly way. Databricks says Discover is a curated experience and that it is currently in <strong>Beta<\/strong>. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/discover\/discover-page?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Databricks Marketplace<\/h3>\n\n\n\n<p><strong>Marketplace<\/strong> is where you can discover and consume shared data products.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You may bring in public datasets, free sample datasets, or commercial data offerings from providers without building all that data yourself. Databricks says Marketplace uses Delta Sharing for secure sharing. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/marketplace\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">SQL area<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">SQL Editor<\/h3>\n\n\n\n<p>The <strong>SQL Editor<\/strong> is where you write, run, save, and visualize SQL queries.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You open SQL Editor, choose your SQL warehouse, write a query, run it, and maybe turn the result into a chart. 
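<\/p>\n\n\n\n<p>The write-a-query, run-it, read-the-result loop looks roughly like this (sketched with <code>sqlite3<\/code> as a local stand-in, not a real Databricks SQL warehouse; the sample data is invented):<\/p>

```python
import sqlite3

# Toy version of the SQL editor workflow: create data, run a saved-style
# aggregate query, fetch the result rows you might then turn into a chart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("DE", 120.0), ("JP", 80.0), ("DE", 50.0)])

rows = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 170.0), ('JP', 80.0)]
```

<p>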
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/sql\/user\/sql-editor\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Queries<\/h3>\n\n\n\n<p><strong>Queries<\/strong> are saved SQL statements.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>Instead of rewriting the same SQL every day, save it once and reuse it in dashboards, alerts, or jobs. Queries are part of the SQL area in the current UI. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dashboards<\/h3>\n\n\n\n<p><strong>Dashboards<\/strong> are visual reports built from queries and metrics.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>Show business KPIs to leaders without making them read SQL. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Genie<\/h3>\n\n\n\n<p><strong>Genie<\/strong> lets users ask data questions in natural language through a curated Genie space.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>Business teams can ask questions in plain English instead of learning SQL. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/genie\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Alerts<\/h3>\n\n\n\n<p><strong>Alerts<\/strong> automatically run a query on a schedule and notify you if a condition is met.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Send an alert if:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>daily sales drop below target<\/li>\n\n\n\n<li>error count goes above threshold<\/li>\n\n\n\n<li>revenue spikes unusually high<\/li>\n<\/ul>\n\n\n\n<p>Databricks says alerts periodically run queries, evaluate conditions, and send notifications. 
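<\/p>\n\n\n\n<p>Conceptually, an alert is just \u201crun the query, check the condition, notify\u201d, as in this sketch (plain Python; the query stub and numbers are invented):<\/p>

```python
# Toy sketch of the alert loop. Real alerts are configured in the Databricks
# UI against a saved query; this only illustrates the condition check.
def run_query():
    """Stand-in for the saved query an alert would run on a schedule."""
    return 820.0  # e.g. today's sales total

def check_alert(value, threshold):
    """Return a notification message if the condition is met, else None."""
    if value < threshold:
        return f"ALERT: daily sales {value} dropped below target {threshold}"
    return None

message = check_alert(run_query(), threshold=1000.0)
print(message)
```

<p>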
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/sql\/user\/alerts\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Query History<\/h3>\n\n\n\n<p><strong>Query History<\/strong> shows what queries were run, how long they took, and execution details.<\/p>\n\n\n\n<p><strong>Use case:<\/strong><br>This is useful for troubleshooting slow queries, checking usage, or understanding what was executed. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/sql\/user\/queries\/query-history?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">SQL Warehouse<\/h3>\n\n\n\n<p>A <strong>SQL Warehouse<\/strong> is the compute resource used for SQL analytics in Databricks SQL.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Your query does not run \u201cby itself.\u201d It runs on a SQL warehouse. Databricks recommends serverless SQL warehouses when available. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/compute\/sql-warehouse\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Data engineering terms<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Data Engineering Run<\/h3>\n\n\n\n<p>In plain terms, this means the execution of your data workflow.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>A scheduled job runs every morning:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>read source data<\/li>\n\n\n\n<li>clean it<\/li>\n\n\n\n<li>join tables<\/li>\n\n\n\n<li>write final reporting tables<\/li>\n<\/ol>\n\n\n\n<p>In Databricks, that run may happen through a job, a pipeline, or an ingestion flow. 
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/jobs\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Ingestion<\/h3>\n\n\n\n<p><strong>Data Ingestion<\/strong> means bringing data from outside systems into Databricks.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You ingest data from:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>files<\/li>\n\n\n\n<li>databases<\/li>\n\n\n\n<li>SaaS tools<\/li>\n\n\n\n<li>cloud storage<\/li>\n\n\n\n<li>Kafka or event streams<\/li>\n<\/ul>\n\n\n\n<p>Databricks\u2019 current ingestion framework is <strong>Lakeflow Connect<\/strong>, which supports connectors for local files, enterprise applications, databases, cloud storage, and message buses. (<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">AI \/ ML area<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Playground<\/h3>\n\n\n\n<p><strong>AI Playground<\/strong> is a chat-like place to test and compare LLMs.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>You can try prompts, compare model responses side by side, and prototype simple agents without starting with heavy code. 
(<a href=\"https:\/\/docs.databricks.com\/gcp\/en\/workspace\/navigate-workspace\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Agents<\/h3>\n\n\n\n<p><strong>Agents<\/strong> are AI applications that can reason, plan, and use tools to complete tasks.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Instead of only answering one prompt, an agent may:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>search documents<\/li>\n\n\n\n<li>call tools<\/li>\n\n\n\n<li>retrieve context<\/li>\n\n\n\n<li>produce a final answer<\/li>\n<\/ul>\n\n\n\n<p>Databricks supports agent prototyping in AI Playground and custom agent development through its agent framework. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/generative-ai\/agent-framework\/create-agent\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AI Gateway<\/h3>\n\n\n\n<p><strong>AI Gateway<\/strong> is the governance and monitoring layer for LLM endpoints, coding agents, and serving endpoints.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If many teams use different AI models, AI Gateway helps centralize:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>access control<\/li>\n\n\n\n<li>usage monitoring<\/li>\n\n\n\n<li>provider management<\/li>\n\n\n\n<li>traffic control<\/li>\n<\/ul>\n\n\n\n<p>Databricks describes it as the solution for governing and monitoring LLM endpoints, coding agents, and model serving endpoints. 
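<\/p>\n\n\n\n<p>What a gateway centralizes can be sketched as one entry point that enforces access, counts usage, and maps requests to a provider. This is a toy illustration, not the Mosaic AI Gateway API; the real gateway does this through managed endpoint configuration, and every team, model, and provider name below is made up:<\/p>\n\n\n\n

```python
# Hypothetical sketch of gateway-style centralization:
# access control + usage monitoring + provider management in one place.

from collections import Counter

ALLOWED_MODELS = {'analytics-team': {'llama-chat', 'gpt-small'}}
PROVIDERS = {'llama-chat': 'provider-a', 'gpt-small': 'provider-b'}
usage = Counter()

def route_request(team, model):
    # access control: is this team allowed to call this model?
    if model not in ALLOWED_MODELS.get(team, set()):
        raise PermissionError(f'{team} may not call {model}')
    usage[(team, model)] += 1          # usage monitoring
    return PROVIDERS[model]            # provider management

print(route_request('analytics-team', 'llama-chat'))  # provider-a
print(usage[('analytics-team', 'llama-chat')])        # 1
```

\n\n\n\n<p>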
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Experiments<\/h3>\n\n\n\n<p><strong>Experiments<\/strong> are organized containers for MLflow runs.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>If you train a model 20 times with different parameters, experiments help you keep track of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>parameters<\/li>\n\n\n\n<li>metrics<\/li>\n\n\n\n<li>artifacts<\/li>\n\n\n\n<li>results<\/li>\n<\/ul>\n\n\n\n<p>Experiments organize MLflow runs of all kinds, including training runs, agent traces, and LLM evaluations. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/mlflow\/experiments?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Features<\/h3>\n\n\n\n<p><strong>Features<\/strong> are the input signals used by ML models.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>For churn prediction, features might be:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>last login date<\/li>\n\n\n\n<li>number of purchases<\/li>\n\n\n\n<li>support ticket count<\/li>\n<\/ul>\n\n\n\n<p>Databricks Feature Store helps manage feature engineering and serving more consistently. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/machine-learning\/feature-store\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Models<\/h3>\n\n\n\n<p><strong>Models<\/strong> are trained ML artifacts that you want to track, govern, version, and possibly deploy.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>A fraud model version 3 may perform better than version 2, so you register and manage those versions in Databricks. Unity Catalog models bring governance, lineage, and discovery to ML models. 
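<\/p>\n\n\n\n<p>Unity Catalog addresses a registered model by a three-level name (catalog.schema.model) and tracks numbered versions underneath it. The sketch below only illustrates that naming and versioning idea with a toy dictionary; the registry contents, model name, and metric values are all hypothetical, not the real registry API:<\/p>\n\n\n\n

```python
# Sketch only: three-level model names and version comparison.
# 'registry' is a stand-in for the real model registry.

def parse_model_name(full_name):
    catalog, schema, model = full_name.split('.')
    return {'catalog': catalog, 'schema': schema, 'model': model}

registry = {
    ('prod.fraud.fraud_model', 2): {'auc': 0.91},
    ('prod.fraud.fraud_model', 3): {'auc': 0.94},
}

def best_version(name):
    """Pick the version with the highest tracked metric."""
    candidates = {v: m['auc'] for (n, v), m in registry.items() if n == name}
    return max(candidates, key=candidates.get)

print(parse_model_name('prod.fraud.fraud_model')['schema'])  # fraud
print(best_version('prod.fraud.fraud_model'))                # 3
```

\n\n\n\n<p>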
(<a href=\"https:\/\/docs.databricks.com\/aws\/en\/machine-learning\/manage-model-lifecycle\/?utm_source=chatgpt.com\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Serving<\/h3>\n\n\n\n<p><strong>Serving<\/strong> means making a model or AI application available for real-time use through an endpoint or API.<\/p>\n\n\n\n<p><strong>Simple example:<\/strong><br>Your application sends customer data to a serving endpoint and gets back a prediction or an AI response.<\/p>\n\n\n\n<p>Databricks\u2019 serving layer is <strong>Mosaic AI Model Serving<\/strong>, which provides managed real-time and batch inference. (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/machine-learning\/model-serving\/\">Databricks Documentation<\/a>)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Very simple summary for beginners<\/h1>\n\n\n\n<p>If you want one beginner-friendly memory trick, use this:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong> = where people work<\/li>\n\n\n\n<li><strong>Metastore<\/strong> = central metadata and governance brain<\/li>\n\n\n\n<li><strong>Catalog<\/strong> = top container<\/li>\n\n\n\n<li><strong>Schema<\/strong> = sub-container<\/li>\n\n\n\n<li><strong>Table<\/strong> = structured data<\/li>\n\n\n\n<li><strong>Volume<\/strong> = files<\/li>\n\n\n\n<li><strong>Notebook<\/strong> = working code document<\/li>\n\n\n\n<li><strong>Query<\/strong> = saved SQL<\/li>\n\n\n\n<li><strong>Dashboard<\/strong> = charts and business view<\/li>\n\n\n\n<li><strong>SQL Warehouse<\/strong> = compute for SQL<\/li>\n\n\n\n<li><strong>Job \/ Pipeline<\/strong> = automation<\/li>\n\n\n\n<li><strong>Genie<\/strong> = ask data questions in natural language<\/li>\n\n\n\n<li><strong>Vector Search Index<\/strong> = AI search over embeddings<\/li>\n\n\n\n<li><strong>Metric View<\/strong> = define KPIs once, reuse everywhere<\/li>\n\n\n\n<li><strong>Model 
Serving<\/strong> = deploy model as an endpoint (<a href=\"https:\/\/docs.databricks.com\/aws\/en\/resources\/glossary\">Databricks Documentation<\/a>)<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-scaled.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"370\" src=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-1024x370.png\" alt=\"\" class=\"wp-image-2888\" srcset=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-1024x370.png 1024w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-300x108.png 300w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-768x277.png 768w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-1536x555.png 1536w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-3-2048x740.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"994\" height=\"941\" src=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-4.png\" alt=\"\" class=\"wp-image-2889\" srcset=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-4.png 994w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-4-300x284.png 300w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-4-768x727.png 768w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"231\" 
src=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5-1024x231.png\" alt=\"\" class=\"wp-image-2890\" srcset=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5-1024x231.png 1024w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5-300x68.png 300w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5-768x174.png 768w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5-1536x347.png 1536w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-5.png 2031w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Simple walkthrough<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-scaled.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"537\" src=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-1024x537.png\" alt=\"\" class=\"wp-image-2893\" srcset=\"https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-1024x537.png 1024w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-300x157.png 300w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-768x402.png 768w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-1536x805.png 1536w, https:\/\/aiopsschool.com\/blog\/wp-content\/uploads\/2026\/04\/mermaid-diagram-6-2048x1073.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">1. Metastore<\/h3>\n\n\n\n<p>This is the <strong>top governance layer<\/strong>. It does not hold the business report itself; it governs and tracks the objects underneath, such as catalogs, schemas, tables, volumes, and other governed assets. 
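<\/p>\n\n\n\n<p>The top-down structure the metastore governs can be pictured as a nested namespace: metastore, then catalog, then schema, then tables and volumes. The sketch below is a toy stand-in using the hospital names from this walkthrough; <code>landing_files<\/code> is a hypothetical volume name:<\/p>\n\n\n\n

```python
# Illustrative sketch of the Unity Catalog namespace the metastore
# governs. All names are the hospital example from this walkthrough.

metastore = {
    'hospital': {                       # catalog (top business container)
        'appointments': {               # schema (subject area)
            'tables':  ['raw_appointments', 'clean_appointments'],
            'volumes': ['landing_files'],  # hypothetical volume name
        },
    },
}

def full_name(catalog, schema, obj):
    """Three-level name used to address a governed object."""
    return f'{catalog}.{schema}.{obj}'

print(full_name('hospital', 'appointments', 'clean_appointments'))
# hospital.appointments.clean_appointments
```

\n\n\n\n<p>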
In Unity Catalog, the namespace is organized top-down, and the metastore is the central control point behind that governed structure.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Catalog<\/h3>\n\n\n\n<p>You create a <strong>Catalog<\/strong> for a major business domain or environment.<br>In this example, the catalog is:<\/p>\n\n\n\n<p><code>hospital<\/code><\/p>\n\n\n\n<p>Think of catalog as the <strong>top business container<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Schema<\/h3>\n\n\n\n<p>Inside that catalog, you create a <strong>Schema<\/strong> to organize a specific subject area.<br>Example:<\/p>\n\n\n\n<p><code>hospital.appointments<\/code><\/p>\n\n\n\n<p>Think of schema as a <strong>sub-container<\/strong> under the catalog. Unity Catalog organizes data assets using catalog and schema levels before you reach tables, volumes, and other objects.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. Volume<\/h3>\n\n\n\n<p>A <strong>Volume<\/strong> is where raw files can live under governance.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>daily CSV files from hospital booking systems<\/li>\n\n\n\n<li>JSON exports<\/li>\n\n\n\n<li>PDF instructions<\/li>\n\n\n\n<li>AI reference files<\/li>\n<\/ul>\n\n\n\n<p>So if a new appointment file arrives every day, it can first land in a <strong>volume<\/strong> as raw input. Volumes are the governed file-storage object in Unity Catalog for non-tabular files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. Table<\/h3>\n\n\n\n<p>Then Databricks turns those raw files into <strong>tables<\/strong>.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>hospital.appointments.raw_appointments<\/code><\/li>\n\n\n\n<li><code>hospital.appointments.clean_appointments<\/code><\/li>\n<\/ul>\n\n\n\n<p>A <strong>table<\/strong> is the structured form of the data.<br>This is what analysts and downstream dashboards usually query.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6. 
Notebook<\/h3>\n\n\n\n<p>A <strong>Notebook<\/strong> is where a data engineer or analyst writes code to process the raw files.<\/p>\n\n\n\n<p>Example tasks inside notebook:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>read CSV from the volume<\/li>\n\n\n\n<li>clean null values<\/li>\n\n\n\n<li>standardize doctor names<\/li>\n\n\n\n<li>remove duplicate bookings<\/li>\n\n\n\n<li>write the final cleaned table<\/li>\n<\/ul>\n\n\n\n<p>So the notebook is the <strong>working document<\/strong> where transformation logic is written.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7. Job \/ Pipeline<\/h3>\n\n\n\n<p>Once the notebook logic is ready, you usually do not want to run it manually every day.<br>So you put it into a <strong>Job \/ Pipeline<\/strong>.<\/p>\n\n\n\n<p>That means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>every day at 1 AM<\/li>\n\n\n\n<li>read new files<\/li>\n\n\n\n<li>update raw table<\/li>\n\n\n\n<li>transform into clean table<\/li>\n\n\n\n<li>refresh business outputs<\/li>\n<\/ul>\n\n\n\n<p>Lakeflow Jobs is Databricks\u2019 workflow orchestration layer, and Lakeflow Spark Declarative Pipelines are used for managed batch and streaming data pipelines.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8. Query<\/h3>\n\n\n\n<p>Now business analysts want answers.<br>They write a <strong>saved SQL query<\/strong> on the clean table.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>total appointments by day<\/li>\n\n\n\n<li>cancellations by hospital<\/li>\n\n\n\n<li>top doctors by booking count<\/li>\n<\/ul>\n\n\n\n<p>A query is basically <strong>saved SQL logic<\/strong> that can be reused later.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9. 
SQL Warehouse<\/h3>\n\n\n\n<p>The query needs compute to run.<br>That compute is the <strong>SQL Warehouse<\/strong>.<\/p>\n\n\n\n<p>So in simple words:<\/p>\n\n\n\n<p><strong>Query = the question<\/strong><br><strong>SQL Warehouse = the engine that runs the question<\/strong><\/p>\n\n\n\n<p>Databricks SQL runs on SQL warehouses and powers querying, visualization, and other SQL experiences.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10. Dashboard<\/h3>\n\n\n\n<p>Once queries are ready, they can be shown visually in a <strong>Dashboard<\/strong>.<\/p>\n\n\n\n<p>Example dashboard charts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>appointments trend by month<\/li>\n\n\n\n<li>cancellation percentage<\/li>\n\n\n\n<li>busiest hospitals<\/li>\n\n\n\n<li>top-performing departments<\/li>\n<\/ul>\n\n\n\n<p>So the dashboard is the <strong>business view<\/strong> built from queries.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11. Metric View<\/h3>\n\n\n\n<p>Now imagine three different teams all calculate \u201cbooking success rate\u201d differently.<br>That creates confusion.<\/p>\n\n\n\n<p>A <strong>Metric View<\/strong> solves this by defining the KPI once in a governed, reusable way.<\/p>\n\n\n\n<p>Example metrics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>total bookings<\/li>\n\n\n\n<li>completed appointments<\/li>\n\n\n\n<li>cancellation rate<\/li>\n\n\n\n<li>average appointments per doctor<\/li>\n<\/ul>\n\n\n\n<p>Metric Views are specifically designed in Unity Catalog to define governed, reusable business metrics, and Databricks notes they can be used consistently across tools such as dashboards, Genie spaces, and alerts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12. 
Genie<\/h3>\n\n\n\n<p>Now a hospital operations manager does not know SQL but wants an answer.<\/p>\n\n\n\n<p>They ask in <strong>Genie<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u201cWhich hospital had the most cancellations last week?\u201d<\/li>\n\n\n\n<li>\u201cShow appointment growth month by month.\u201d<\/li>\n\n\n\n<li>\u201cWhich departments had the highest no-show rate?\u201d<\/li>\n<\/ul>\n\n\n\n<p>Genie is Databricks\u2019 natural-language analytics experience for business users, tailored to company terminology and data context.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">13. Vector Search Index<\/h3>\n\n\n\n<p>Now suppose you also have:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>doctor profile text<\/li>\n\n\n\n<li>hospital policy documents<\/li>\n\n\n\n<li>patient FAQ documents<\/li>\n\n\n\n<li>support notes<\/li>\n<\/ul>\n\n\n\n<p>You can convert that content into embeddings and build a <strong>Vector Search Index<\/strong>.<\/p>\n\n\n\n<p>That allows semantic AI search like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u201cfind hospitals with strong cardiology services\u201d<\/li>\n\n\n\n<li>\u201cfind documents related to ICU booking rules\u201d<\/li>\n\n\n\n<li>\u201cfind similar patient support cases\u201d<\/li>\n<\/ul>\n\n\n\n<p>Databricks\u2019 vector search indexes are created from Delta tables and support approximate nearest neighbor search for semantic similarity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">14. 
Model Serving<\/h3>\n\n\n\n<p>Finally, if you want an application or API to use AI or predictions in real time, you expose it through <strong>Model Serving<\/strong>.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>an endpoint that predicts cancellation risk<\/li>\n\n\n\n<li>an endpoint that answers hospital FAQ using retrieved documents<\/li>\n\n\n\n<li>an endpoint used by your website or internal portal<\/li>\n<\/ul>\n\n\n\n<p>So model serving is the <strong>production endpoint layer<\/strong> on top of your governed data and AI assets.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The full story in one line<\/h2>\n\n\n\n<p><strong>Raw files land in a volume, get transformed into tables through notebooks and jobs\/pipelines, are queried through SQL warehouses, visualized in dashboards, explained through Genie, standardized through metric views, enriched with vector search, and exposed to applications through model serving.<\/strong><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Databricks: latest high-level overview Databricks is now best understood as a unified data, analytics, and AI platform rather than only 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2887","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2887","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=2887"}],"version-history":[{"count":4,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2887\/revisions"}],"predecessor-version":[{"id":2895,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2887\/revisions\/2895"}],"wp:attachment":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=2887"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=2887"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=2887"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}