{"id":2905,"date":"2026-04-05T07:41:50","date_gmt":"2026-04-05T07:41:50","guid":{"rendered":"https:\/\/aiopsschool.com\/blog\/?p=2905"},"modified":"2026-04-05T07:41:50","modified_gmt":"2026-04-05T07:41:50","slug":"databricks-the-master-guide-to-databricks-workspaces-unity-catalog-metastores-storage-compute-and-cost","status":"publish","type":"post","link":"https:\/\/aiopsschool.com\/blog\/databricks-the-master-guide-to-databricks-workspaces-unity-catalog-metastores-storage-compute-and-cost\/","title":{"rendered":"Databricks &#8211; The Master Guide to Databricks Workspaces, Unity Catalog, Metastores, Storage, Compute, and Cost"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Who this guide is for<\/h2>\n\n\n\n<p>This guide is for people who are asking questions like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is a Databricks workspace?<\/li>\n\n\n\n<li>What is Unity Catalog?<\/li>\n\n\n\n<li>What is a metastore?<\/li>\n\n\n\n<li>Why do I need both a workspace and a metastore?<\/li>\n\n\n\n<li>Where is my data actually stored?<\/li>\n\n\n\n<li>Which parts live in Databricks, and which parts live in AWS or Azure?<\/li>\n\n\n\n<li>Which things cost money, and how?<\/li>\n\n\n\n<li>When should my organization choose serverless or classic compute?<\/li>\n<\/ul>\n\n\n\n<p>This guide explains the concepts in simple language first, then goes deeper with architecture, storage, cost, ownership, and adoption patterns.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 1: The simple mental model<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">The six most important terms<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
Workspace<\/h3>\n\n\n\n<p>A <strong>Databricks workspace<\/strong> is the place where users log in and work.<\/p>\n\n\n\n<p>It is the UI and working environment where people:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>open notebooks<\/li>\n\n\n\n<li>run queries<\/li>\n\n\n\n<li>create dashboards<\/li>\n\n\n\n<li>attach compute<\/li>\n\n\n\n<li>collaborate with teammates<\/li>\n\n\n\n<li>manage files and code<\/li>\n<\/ul>\n\n\n\n<p>Think of it as your <strong>office<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Unity Catalog<\/h3>\n\n\n\n<p><strong>Unity Catalog<\/strong> is Databricks&#8217; centralized system for organizing and governing data and AI assets.<\/p>\n\n\n\n<p>It gives you:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one shared data namespace<\/li>\n\n\n\n<li>permissions and access control<\/li>\n\n\n\n<li>governance across workspaces<\/li>\n\n\n\n<li>auditing and data sharing support<\/li>\n<\/ul>\n\n\n\n<p>Think of it as the <strong>company rulebook plus directory for data<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Metastore<\/h3>\n\n\n\n<p>A <strong>metastore<\/strong> is the top-level root container inside Unity Catalog.<\/p>\n\n\n\n<p>It is the root of the data organization hierarchy.<\/p>\n\n\n\n<p>Think of it as the <strong>main library index<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. Catalog<\/h3>\n\n\n\n<p>A <strong>catalog<\/strong> is a top-level business area inside the metastore.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>finance<\/code><\/li>\n\n\n\n<li><code>sales<\/code><\/li>\n\n\n\n<li><code>marketing<\/code><\/li>\n\n\n\n<li><code>sandbox<\/code><\/li>\n<\/ul>\n\n\n\n<p>Think of a catalog like a <strong>major department folder<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. 
Schema<\/h3>\n\n\n\n<p>A <strong>schema<\/strong> is a subfolder inside a catalog.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>raw<\/code><\/li>\n\n\n\n<li><code>staging<\/code><\/li>\n\n\n\n<li><code>analytics<\/code><\/li>\n\n\n\n<li><code>gold<\/code><\/li>\n<\/ul>\n\n\n\n<p>Think of a schema like a <strong>subfolder inside a department<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6. Table<\/h3>\n\n\n\n<p>A <strong>table<\/strong> is an actual data object.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>sales.analytics.orders<\/code><\/li>\n<\/ul>\n\n\n\n<p>Think of a table as the <strong>actual dataset people query<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">One sentence summary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong> = where people work<\/li>\n\n\n\n<li><strong>Unity Catalog<\/strong> = the governance system<\/li>\n\n\n\n<li><strong>Metastore<\/strong> = the root container in Unity Catalog<\/li>\n\n\n\n<li><strong>Catalog<\/strong> = major data area<\/li>\n\n\n\n<li><strong>Schema<\/strong> = subfolder in that area<\/li>\n\n\n\n<li><strong>Table<\/strong> = actual dataset<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 2: The most important relationship<\/h1>\n\n\n\n<p>A lot of confusion disappears once you separate these two ideas:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace = user working environment<\/strong><\/li>\n\n\n\n<li><strong>Metastore = data governance root<\/strong><\/li>\n<\/ul>\n\n\n\n<p>A workspace is <strong>not<\/strong> the same as a metastore.<\/p>\n\n\n\n<p>A workspace gets <strong>attached to<\/strong> a metastore.<\/p>\n\n\n\n<p>That means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>users work in the workspace<\/li>\n\n\n\n<li>the workspace uses the 
metastore to know which catalogs, schemas, tables, and permissions exist<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Diagram: workspace and metastore relationship<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    U&#91;Users] --&gt; W&#91;Databricks Workspace]\n    W --&gt; C&#91;Compute]\n    W --&gt; M&#91;Unity Catalog Metastore]\n    M --&gt; CAT&#91;Catalog]\n    CAT --&gt; SCH&#91;Schema]\n    SCH --&gt; TBL&#91;Table]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Key idea<\/h3>\n\n\n\n<p>The workspace is where users run things.<br>The metastore is what gives structure and governance to the data those users see.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 3: Why both are needed<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Why a workspace is needed<\/h2>\n\n\n\n<p>You need a workspace because users need a place to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>log in<\/li>\n\n\n\n<li>write notebooks<\/li>\n\n\n\n<li>run compute<\/li>\n\n\n\n<li>create jobs<\/li>\n\n\n\n<li>view dashboards<\/li>\n\n\n\n<li>collaborate<\/li>\n<\/ul>\n\n\n\n<p>Without a workspace, there is no user-facing working environment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why a metastore is needed<\/h2>\n\n\n\n<p>You need a metastore because data needs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>a standard namespace<\/li>\n\n\n\n<li>permissions<\/li>\n\n\n\n<li>centralized control<\/li>\n\n\n\n<li>shared definitions across teams and workspaces<\/li>\n<\/ul>\n\n\n\n<p>Without a metastore, users may still have compute and notebooks, but data governance becomes fragmented.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Very simple analogy<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Workspace is the office<\/h3>\n\n\n\n<p>It has:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>desks<\/li>\n\n\n\n<li>screens<\/li>\n\n\n\n<li>tools<\/li>\n\n\n\n<li>people working<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Metastore is the central company library catalog<\/h3>\n\n\n\n<p>It tells you:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>what books exist<\/li>\n\n\n\n<li>what sections exist<\/li>\n\n\n\n<li>who can read them<\/li>\n\n\n\n<li>how everything is organized<\/li>\n<\/ul>\n\n\n\n<p>So:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>office != library catalog<\/li>\n\n\n\n<li>but the office uses the library catalog<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 4: The full hierarchy<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Data hierarchy in Unity Catalog<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart TD\n    MS&#91;Metastore]\n    MS --&gt; C1&#91;Catalog: finance]\n    MS --&gt; C2&#91;Catalog: sales]\n    C1 --&gt; S1&#91;Schema: raw]\n    C1 --&gt; S2&#91;Schema: analytics]\n    C2 --&gt; S3&#91;Schema: raw]\n    C2 --&gt; S4&#91;Schema: analytics]\n    S2 --&gt; T1&#91;Table: invoices]\n    S2 --&gt; T2&#91;View: monthly_summary]\n    S4 --&gt; T3&#91;Table: orders]\n    S4 --&gt; T4&#91;Volume: documents]\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Example names<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>metastore: <code>company-us-east-1<\/code><\/li>\n\n\n\n<li>catalog: <code>sales<\/code><\/li>\n\n\n\n<li>schema: <code>analytics<\/code><\/li>\n\n\n\n<li>table: <code>orders<\/code><\/li>\n<\/ul>\n\n\n\n<p>Full table name:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>sales.analytics.orders<\/code><\/li>\n<\/ul>\n\n\n\n<p>That full name does <strong>not<\/strong> include the metastore name in everyday SQL usage.<br>The metastore sits above the catalog as the governance root.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 5: Where things are actually stored<\/h1>\n\n\n\n<p>This is where many people get lost.<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\">Important distinction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">The metastore stores metadata and governance<\/h3>\n\n\n\n<p>It stores things like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>object definitions<\/li>\n\n\n\n<li>permissions<\/li>\n\n\n\n<li>governance relationships<\/li>\n\n\n\n<li>references to managed or external data<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">The metastore does not store the data files themselves<\/h3>\n\n\n\n<p>The actual data files for tables and volumes are stored in <strong>object storage<\/strong>.<\/p>\n\n\n\n<p>Depending on your setup, that storage can be:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Databricks default storage<\/li>\n\n\n\n<li>AWS S3<\/li>\n\n\n\n<li>Azure Data Lake Storage<\/li>\n\n\n\n<li>Google Cloud Storage<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Diagram: metadata versus actual data<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    W&#91;Workspace] --&gt; M&#91;Metastore]\n    M --&gt; META&#91;Metadata and Permissions]\n    M --&gt; CAT&#91;Catalog\/Schema\/Table Definitions]\n    M --&gt; ST&#91;Storage Location References]\n    ST --&gt; S3&#91;AWS S3 or ADLS or GCS or Databricks Default Storage]\n    S3 --&gt; FILES&#91;Actual Table and Volume Files]\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">What is metadata?<\/h2>\n\n\n\n<p>Metadata means information <strong>about<\/strong> the data, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>table name<\/li>\n\n\n\n<li>column names and data types<\/li>\n\n\n\n<li>owner<\/li>\n\n\n\n<li>permissions<\/li>\n\n\n\n<li>table location<\/li>\n\n\n\n<li>whether it is managed or external<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What is actual data?<\/h2>\n\n\n\n<p>Actual data means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Parquet files<\/li>\n\n\n\n<li>Delta files<\/li>\n\n\n\n<li>Iceberg files<\/li>\n\n\n\n<li>documents in volumes<\/li>\n\n\n\n<li>data files in 
cloud object storage<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 6: Managed versus external data<\/h1>\n\n\n\n<p>This is one of the most useful ideas in Databricks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Managed tables<\/h2>\n\n\n\n<p>A <strong>managed table<\/strong> means Databricks manages the table&#8217;s storage location based on the managed storage location configured at the metastore, catalog, or schema level.<\/p>\n\n\n\n<p>Use managed tables when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you want simpler lifecycle management<\/li>\n\n\n\n<li>you want stronger governance<\/li>\n\n\n\n<li>you want the easiest Databricks-native experience<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">External tables<\/h2>\n\n\n\n<p>An <strong>external table<\/strong> means the data already exists in your cloud storage, and you register it in Unity Catalog.<\/p>\n\n\n\n<p>Use external tables when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you already have data in S3 or ADLS<\/li>\n\n\n\n<li>multiple systems share the same files<\/li>\n\n\n\n<li>you want storage independence<\/li>\n\n\n\n<li>your data lake existed before Databricks<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Diagram: managed versus external<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart TD\n    UC&#91;Unity Catalog]\n    UC --&gt; MT&#91;Managed Table]\n    UC --&gt; ET&#91;External Table]\n\n    MT --&gt; ML&#91;Managed Storage Location]\n    ML --&gt; OBJ1&#91;Cloud Object Storage Managed by Databricks Rules]\n\n    ET --&gt; EX&#91;External Location]\n    EX --&gt; OBJ2&#91;Existing S3 or ADLS or GCS Path]\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Rule of thumb<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For greenfield adoption, managed tables are often simpler<\/li>\n\n\n\n<li>For existing enterprise lake storage, external tables are common<\/li>\n<\/ul>\n\n\n\n<hr 
class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 7: What is created where<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Workspace: where it is created<\/h2>\n\n\n\n<p>A workspace is created in the Databricks account layer.<\/p>\n\n\n\n<p>Depending on cloud and workspace type:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>classic workspace<\/li>\n\n\n\n<li>serverless workspace<\/li>\n<\/ul>\n\n\n\n<p>A workspace is associated with a region and cloud platform.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Metastore: where it is created<\/h2>\n\n\n\n<p>A metastore is created at the Databricks account level and linked to one or more workspaces in the same region.<\/p>\n\n\n\n<p>You typically create:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one metastore per region<\/li>\n<\/ul>\n\n\n\n<p>not:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one metastore per user<\/li>\n\n\n\n<li>one metastore per notebook<\/li>\n\n\n\n<li>one metastore per team unless there is a strong isolation reason<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Catalog: where it is created<\/h2>\n\n\n\n<p>Catalogs are created inside the metastore.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>finance<\/code><\/li>\n\n\n\n<li><code>sales<\/code><\/li>\n\n\n\n<li><code>sandbox<\/code><\/li>\n\n\n\n<li><code>ml<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Schema: where it is created<\/h2>\n\n\n\n<p>Schemas are created inside a catalog.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>raw<\/code><\/li>\n\n\n\n<li><code>curated<\/code><\/li>\n\n\n\n<li><code>analytics<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Table: where it is created<\/h2>\n\n\n\n<p>Tables are created inside a schema.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><code>finance.analytics.invoices<\/code><\/li>\n\n\n\n<li><code>sales.raw.orders<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 8: Why workspace creation asks about an AWS account or serverless, but metastore creation does not<\/h1>\n\n\n\n<p>This is one of the most important points.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Workspace creation asks because a workspace needs infrastructure and a compute model<\/h2>\n\n\n\n<p>When creating a workspace, Databricks needs to know:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>which cloud and region the workspace runs in<\/li>\n\n\n\n<li>whether the environment is classic or serverless<\/li>\n\n\n\n<li>how much of the infrastructure lives in your cloud account versus in Databricks<\/li>\n\n\n\n<li>how root\/workspace\/default storage should behave<\/li>\n<\/ul>\n\n\n\n<p>That is why the choice between your own AWS account and serverless matters at workspace creation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Metastore creation does not ask because a metastore is not compute<\/h2>\n\n\n\n<p>Metastore creation is about:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>governance<\/li>\n\n\n\n<li>metadata<\/li>\n\n\n\n<li>namespace<\/li>\n\n\n\n<li>storage references for managed objects<\/li>\n<\/ul>\n\n\n\n<p>So the metastore is about <strong>how data is governed<\/strong>, not <strong>how compute is provisioned<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 9: Which parts live in Databricks versus AWS or Azure<\/h1>\n\n\n\n<p>This depends on whether you use serverless or classic patterns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Simplified ownership model<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Usually in Databricks-managed space<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless compute<\/li>\n\n\n\n<li>serverless workspace runtime 
environment<\/li>\n\n\n\n<li>Databricks control plane<\/li>\n\n\n\n<li>default storage features<\/li>\n\n\n\n<li>metadata services and governance services<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Usually in customer cloud account<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>classic compute VMs<\/li>\n\n\n\n<li>customer-managed S3 or ADLS storage<\/li>\n\n\n\n<li>existing data lake storage<\/li>\n\n\n\n<li>networking setup for classic workspace deployments<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Diagram: ownership split<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    subgraph Databricks_Managed&#91;Databricks Managed]\n        DP&#91;Control Plane]\n        SC&#91;Serverless Compute]\n        DS&#91;Default Storage]\n        UC&#91;Unity Catalog Services]\n    end\n\n    subgraph Customer_Cloud&#91;Customer Cloud Account]\n        CC&#91;Classic Compute Resources]\n        CS&#91;Customer Object Storage]\n        NW&#91;Customer Networking]\n    end\n\n    W&#91;Workspace] --&gt; DP\n    W --&gt; SC\n    W --&gt; CC\n    M&#91;Metastore] --&gt; UC\n    M --&gt; DS\n    M --&gt; CS\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">AWS examples<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Can live in Databricks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless notebook compute<\/li>\n\n\n\n<li>serverless jobs compute<\/li>\n\n\n\n<li>serverless SQL warehouse infrastructure<\/li>\n\n\n\n<li>default storage for serverless workspace and default catalog<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Can live in AWS account<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>classic all-purpose clusters<\/li>\n\n\n\n<li>classic job clusters<\/li>\n\n\n\n<li>S3 buckets for external or managed storage<\/li>\n\n\n\n<li>IAM roles and networking for customer-managed storage access<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Azure examples<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Can live in 
Databricks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless compute<\/li>\n\n\n\n<li>Databricks-managed services<\/li>\n\n\n\n<li>default storage scenarios<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Can live in Azure subscription<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>classic compute resources<\/li>\n\n\n\n<li>ADLS storage for managed or external data<\/li>\n\n\n\n<li>managed identities or service principals for storage access<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 10: Cost model &#8211; what costs money and how<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">First principle<\/h2>\n\n\n\n<p>Different parts of Databricks cost money in different ways.<\/p>\n\n\n\n<p>Not everything costs the same way.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Workspace<\/h2>\n\n\n\n<p>Creating a workspace by itself is not usually the main cost driver.<br>The main costs come from what you use inside it.<\/p>\n\n\n\n<p>Typical cost drivers connected to workspaces:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>compute usage<\/li>\n\n\n\n<li>storage usage<\/li>\n\n\n\n<li>network and cloud services<\/li>\n\n\n\n<li>premium features depending on plan<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2. Metastore<\/h2>\n\n\n\n<p>A metastore itself is generally not the big cost driver.<\/p>\n\n\n\n<p>A metastore mainly represents governance and metadata organization.<\/p>\n\n\n\n<p>Costs appear when you use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>managed storage<\/li>\n\n\n\n<li>cloud storage<\/li>\n\n\n\n<li>compute that reads and writes data<\/li>\n\n\n\n<li>serverless or classic workloads against that data<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Compute<\/h2>\n\n\n\n<p>Compute is usually the biggest cost driver.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Serverless compute<\/h3>\n\n\n\n<p>You pay Databricks for usage, measured in DBUs (Databricks Units). Databricks runs the underlying machines, so the serverless DBU rate already includes the compute infrastructure. You do not pay your cloud provider separately for those VMs.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless notebooks<\/li>\n\n\n\n<li>serverless jobs<\/li>\n\n\n\n<li>serverless SQL warehouses<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Classic compute<\/h3>\n\n\n\n<p>You pay for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Databricks DBUs<\/li>\n\n\n\n<li>cloud infrastructure such as EC2 or Azure VMs<\/li>\n\n\n\n<li>storage and networking around those compute resources<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. Storage<\/h2>\n\n\n\n<p>Storage costs depend on where the data lives.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>S3 charges in AWS<\/li>\n\n\n\n<li>ADLS charges in Azure<\/li>\n\n\n\n<li>usage of Databricks default storage, for example the default catalog in a serverless workspace<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Data transfer and networking<\/h2>\n\n\n\n<p>Depending on setup, cloud networking and data transfer can also cost money.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 11: Cost by component<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Cost table in plain English<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Component<\/th><th>Main purpose<\/th><th>Usually costs by itself?<\/th><th>Main cost type<\/th><\/tr><\/thead><tbody><tr><td>Workspace<\/td><td>User environment<\/td><td>Usually not the main cost<\/td><td>Platform usage around workspace<\/td><\/tr><tr><td>Unity Catalog<\/td><td>Governance system<\/td><td>Not usually the main cost<\/td><td>Indirect through storage and usage<\/td><\/tr><tr><td>Metastore<\/td><td>Top-level governance root<\/td><td>Usually low direct cost concern<\/td><td>Indirect through data and compute<\/td><\/tr><tr><td>Catalog<\/td><td>Organizing data<\/td><td>No meaningful direct cost alone<\/td><td>None by itself<\/td><\/tr><tr><td>Schema<\/td><td>Organizing data<\/td><td>No meaningful direct cost alone<\/td><td>None by itself<\/td><\/tr><tr><td>Table<\/td><td>Actual data object<\/td><td>Yes, indirectly<\/td><td>Storage + compute to read\/write<\/td><\/tr><tr><td>Serverless notebook compute<\/td><td>Interactive notebook execution<\/td><td>Yes<\/td><td>Serverless DBU usage<\/td><\/tr><tr><td>SQL warehouse<\/td><td>SQL analytics compute<\/td><td>Yes<\/td><td>SQL\/serverless usage<\/td><\/tr><tr><td>Classic cluster<\/td><td>Customer-cloud compute<\/td><td>Yes<\/td><td>DBU + cloud VM\/infrastructure<\/td><\/tr><tr><td>Storage location<\/td><td>Holds data files<\/td><td>Yes<\/td><td>S3\/ADLS\/GCS\/default storage<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 12: The three deployment patterns every organization should 
understand<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Pattern A: Serverless-first organization<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What it means<\/h3>\n\n\n\n<p>The organization prefers Databricks-managed serverless compute for notebooks, jobs, and SQL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Good for<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>fast setup<\/li>\n\n\n\n<li>low infrastructure burden<\/li>\n\n\n\n<li>internal training<\/li>\n\n\n\n<li>analytics teams that want simplicity<\/li>\n\n\n\n<li>organizations with limited cloud-infra admin support<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical characteristics<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless workspace<\/li>\n\n\n\n<li>Unity Catalog enabled<\/li>\n\n\n\n<li>default storage for default catalog<\/li>\n\n\n\n<li>optional customer cloud storage for additional catalogs<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pros<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>fastest time to value<\/li>\n\n\n\n<li>minimal infra management<\/li>\n\n\n\n<li>simpler operations<\/li>\n\n\n\n<li>easier for many users<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>less low-level infrastructure control<\/li>\n\n\n\n<li>some organizations prefer more customer-owned storage and networking<\/li>\n\n\n\n<li>advanced customization may push teams toward classic resources<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Pattern B: Classic customer-cloud-heavy organization<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What it means<\/h3>\n\n\n\n<p>The organization uses classic compute in its own cloud account and stores data in customer-managed storage.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Good for<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>large enterprise data lake already exists<\/li>\n\n\n\n<li>strong cloud platform team<\/li>\n\n\n\n<li>strict networking requirements<\/li>\n\n\n\n<li>high customization 
needs<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical characteristics<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>classic workspace<\/li>\n\n\n\n<li>Unity Catalog enabled<\/li>\n\n\n\n<li>S3 or ADLS used heavily<\/li>\n\n\n\n<li>external locations and customer-owned storage patterns<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pros<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>more control<\/li>\n\n\n\n<li>easier alignment with existing cloud architecture<\/li>\n\n\n\n<li>fits established enterprise landing zones<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>more setup complexity<\/li>\n\n\n\n<li>more operations burden<\/li>\n\n\n\n<li>slower onboarding for new teams<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Pattern C: Hybrid organization<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What it means<\/h3>\n\n\n\n<p>The organization uses both:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless for many interactive workloads<\/li>\n\n\n\n<li>classic compute for special workloads<\/li>\n\n\n\n<li>customer cloud storage for durable enterprise data<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Good for<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>most medium and large organizations<\/li>\n\n\n\n<li>teams adopting gradually<\/li>\n\n\n\n<li>mixed analytics and engineering workloads<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pros<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>balance of simplicity and control<\/li>\n\n\n\n<li>practical migration path<\/li>\n\n\n\n<li>good for phased modernization<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>governance and FinOps must be clear<\/li>\n\n\n\n<li>architecture becomes more complex if standards are weak<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 13: A very practical \u201cwho does 
what\u201d view<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Account admin<\/h2>\n\n\n\n<p>Usually responsible for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>creating workspaces<\/li>\n\n\n\n<li>creating or assigning metastores<\/li>\n\n\n\n<li>enabling Unity Catalog<\/li>\n\n\n\n<li>setting broad governance patterns<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Workspace admin<\/h2>\n\n\n\n<p>Usually responsible for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>workspace-level permissions<\/li>\n\n\n\n<li>compute access and policies<\/li>\n\n\n\n<li>serverless usage policies<\/li>\n\n\n\n<li>user onboarding inside the workspace<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data platform team<\/h2>\n\n\n\n<p>Usually responsible for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>storage design<\/li>\n\n\n\n<li>external locations<\/li>\n\n\n\n<li>catalog strategy<\/li>\n\n\n\n<li>naming conventions<\/li>\n\n\n\n<li>security and permissions<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data users<\/h2>\n\n\n\n<p>Usually responsible for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>creating notebooks<\/li>\n\n\n\n<li>running queries<\/li>\n\n\n\n<li>using approved catalogs\/schemas<\/li>\n\n\n\n<li>building dashboards and pipelines<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 14: What gets shared across workspaces<\/h1>\n\n\n\n<p>This is a very important reason Unity Catalog exists.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">If multiple workspaces use the same metastore<\/h2>\n\n\n\n<p>Then they can share the same:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>catalogs<\/li>\n\n\n\n<li>schemas<\/li>\n\n\n\n<li>tables<\/li>\n\n\n\n<li>volumes<\/li>\n\n\n\n<li>permissions model<\/li>\n<\/ul>\n\n\n\n<p>This means you might have:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dev workspace<\/li>\n\n\n\n<li>Test workspace<\/li>\n\n\n\n<li>Prod 
workspace<\/li>\n<\/ul>\n\n\n\n<p>all attached to the same metastore, or to separate metastores depending on your governance design.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Diagram: multiple workspaces sharing one metastore<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart TD\n    W1&#91;Dev Workspace] --&gt; M&#91;Regional Metastore]\n    W2&#91;Test Workspace] --&gt; M\n    W3&#91;Prod Workspace] --&gt; M\n\n    M --&gt; CAT1&#91;Catalog: finance]\n    M --&gt; CAT2&#91;Catalog: sales]\n    M --&gt; CAT3&#91;Catalog: sandbox]\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">When to share one metastore<\/h2>\n\n\n\n<p>Use one metastore across multiple workspaces when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>they are in the same region<\/li>\n\n\n\n<li>you want shared governance<\/li>\n\n\n\n<li>teams need a common namespace<\/li>\n\n\n\n<li>access control can be handled through permissions<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">When separate metastores may make sense<\/h2>\n\n\n\n<p>Use separate metastores when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>regions differ<\/li>\n\n\n\n<li>legal or residency requirements differ<\/li>\n\n\n\n<li>business units require stronger separation<\/li>\n\n\n\n<li>platform governance intentionally isolates environments<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 15: Where to create what in real life<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Workspace creation checklist<\/h2>\n\n\n\n<p>Create a workspace when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>a new team needs its own working environment<\/li>\n\n\n\n<li>you want separate admin boundaries<\/li>\n\n\n\n<li>you want separate dev\/test\/prod environments<\/li>\n\n\n\n<li>you need a new regional deployment<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Metastore creation checklist<\/h2>\n\n\n\n<p>Create a metastore when:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>a region does not yet have one<\/li>\n\n\n\n<li>you intentionally need a separate governance root<\/li>\n\n\n\n<li>legal or organization boundaries require isolation<\/li>\n<\/ul>\n\n\n\n<p>Do <strong>not<\/strong> create a new metastore just because:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>a new analyst joined<\/li>\n\n\n\n<li>a new notebook is created<\/li>\n\n\n\n<li>a new project starts<\/li>\n\n\n\n<li>a new schema is needed<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Catalog creation checklist<\/h2>\n\n\n\n<p>Create a catalog when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you need a major logical data domain<\/li>\n\n\n\n<li>you want department-level ownership<\/li>\n\n\n\n<li>you need separate storage or governance boundaries<\/li>\n<\/ul>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>finance<\/code><\/li>\n\n\n\n<li><code>sales<\/code><\/li>\n\n\n\n<li><code>marketing<\/code><\/li>\n\n\n\n<li><code>sandbox<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Schema creation checklist<\/h2>\n\n\n\n<p>Create a schema when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you want a sub-domain inside a catalog<\/li>\n\n\n\n<li>you want lifecycle stages like raw\/staging\/analytics<\/li>\n\n\n\n<li>you want a team-specific area under a catalog<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Table creation checklist<\/h2>\n\n\n\n<p>Create a table when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you have actual structured data to manage and query<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 16: Where the data is stored in serverless versus classic setups<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Serverless-first example<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    U&#91;User] --&gt; W&#91;Serverless Workspace]\n    W --&gt; SN&#91;Serverless Notebook Compute]\n    W --&gt; 
M&#91;Metastore]\n    M --&gt; C&#91;Default Catalog]\n    C --&gt; T&#91;Managed Table]\n    T --&gt; DS&#91;Databricks Default Storage or Customer Cloud Storage]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Typical behavior<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>workspace is serverless<\/li>\n\n\n\n<li>compute is Databricks-managed<\/li>\n\n\n\n<li>default catalog may use default storage<\/li>\n\n\n\n<li>additional catalogs may use customer cloud storage<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Classic AWS example<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    U&#91;User] --&gt; W&#91;Classic Workspace]\n    W --&gt; CL&#91;Classic Compute in AWS Account]\n    W --&gt; M&#91;Metastore]\n    M --&gt; C&#91;Catalog]\n    C --&gt; T&#91;Managed or External Table]\n    T --&gt; S3&#91;S3 in Customer AWS Account]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Typical behavior<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the workspace itself is hosted in the Databricks control plane<\/li>\n\n\n\n<li>classic compute runs in the customer's AWS account<\/li>\n\n\n\n<li>data commonly lives in S3<\/li>\n\n\n\n<li>the metastore governs access to that data<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Azure example<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n    U&#91;User] --&gt; W&#91;Azure Databricks Workspace]\n    W --&gt; CL&#91;Classic or Serverless Compute]\n    W --&gt; M&#91;Metastore]\n    M --&gt; C&#91;Catalog]\n    C --&gt; T&#91;Managed or External Table]\n    T --&gt; ADLS&#91;Azure Data Lake Storage]\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 17: Common misunderstandings and the correct version<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Misunderstanding 1<\/h2>\n\n\n\n<p>&#8220;The workspace stores everything, so why do I need a metastore?&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correct 
version<\/h3>\n\n\n\n<p>The workspace holds the working environment and workspace assets such as notebooks and dashboards.<br>The Unity Catalog metastore governs the shared data namespace and its permissions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Misunderstanding 2<\/h2>\n\n\n\n<p>&#8220;If I create a workspace, a separate metastore must be created for it.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correct version<\/h3>\n\n\n\n<p>Not necessarily.<br>A workspace is often assigned to the metastore that already exists for its region.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Misunderstanding 3<\/h2>\n\n\n\n<p>&#8220;If I use serverless, then Unity Catalog is not needed.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correct version<\/h3>\n\n\n\n<p>Serverless is a compute model.<br>Unity Catalog is the governance model.<br>They solve different problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Misunderstanding 4<\/h2>\n\n\n\n<p>&#8220;The metastore contains all the actual data files.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correct version<\/h3>\n\n\n\n<p>The metastore contains metadata and governance information.<br>The actual data files live in cloud object storage.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Misunderstanding 5<\/h2>\n\n\n\n<p>&#8220;If I name a usage policy 5usd, Databricks will stop at 5 USD.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Correct version<\/h3>\n\n\n\n<p>A serverless usage policy is mainly for attribution and tagging. By default it is not an automatic hard budget cap, whatever its name says.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 18: Cost-conscious learning guide<\/h1>\n\n\n\n<p>If you are learning Databricks and want to keep costs low, this is the safest path.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">For cheapest learning<\/h2>\n\n\n\n<p>Use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one workspace already provided by your org<\/li>\n\n\n\n<li>the metastore already assigned to that workspace<\/li>\n\n\n\n<li>a sandbox catalog or schema if 
allowed<\/li>\n\n\n\n<li>tiny sample datasets<\/li>\n\n\n\n<li>minimal notebook runtime<\/li>\n<\/ul>\n\n\n\n<p>Avoid creating:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>unnecessary workspaces<\/li>\n\n\n\n<li>unnecessary metastores<\/li>\n\n\n\n<li>extra SQL warehouses<\/li>\n\n\n\n<li>large classic clusters<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Prefer for learning<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless notebook with tiny sample data<\/li>\n\n\n\n<li>very short sessions<\/li>\n\n\n\n<li>notebook SQL instead of starting many tools<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Avoid for learning<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>large data scans<\/li>\n\n\n\n<li>long-running notebooks<\/li>\n\n\n\n<li>large warehouses<\/li>\n\n\n\n<li>repeated run-all sessions<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 19: Adoption guide for organizations<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Stage 1: Small team adoption<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical pattern<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one workspace<\/li>\n\n\n\n<li>one regional metastore<\/li>\n\n\n\n<li>one or two catalogs<\/li>\n\n\n\n<li>mostly serverless or personal compute<\/li>\n\n\n\n<li>small governance model<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Good catalog design<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>sandbox<\/code><\/li>\n\n\n\n<li><code>shared<\/code><\/li>\n\n\n\n<li><code>analytics<\/code><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Why this works<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>simple<\/li>\n\n\n\n<li>low friction<\/li>\n\n\n\n<li>fast onboarding<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Stage 2: Department adoption<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical pattern<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>dev and prod workspaces<\/li>\n\n\n\n<li>shared 
regional metastore<\/li>\n\n\n\n<li>business-domain catalogs<\/li>\n\n\n\n<li>stronger permissions<\/li>\n\n\n\n<li>serverless plus some classic jobs<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Good catalog design<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>finance<\/code><\/li>\n\n\n\n<li><code>sales<\/code><\/li>\n\n\n\n<li><code>marketing<\/code><\/li>\n\n\n\n<li><code>ml<\/code><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Why this works<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>separates ownership by business area<\/li>\n\n\n\n<li>still keeps centralized governance<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Stage 3: Enterprise platform adoption<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical pattern<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>multiple workspaces by environment and team<\/li>\n\n\n\n<li>one metastore per region<\/li>\n\n\n\n<li>clear storage architecture<\/li>\n\n\n\n<li>managed tables for some workloads, external tables for others<\/li>\n\n\n\n<li>standardized compute policies<\/li>\n\n\n\n<li>strong FinOps and security practices<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Good catalog design<\/h3>\n\n\n\n<p>Based on data domain, environment, regulatory boundary, or platform standards.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why this works<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>scalable governance<\/li>\n\n\n\n<li>supports multiple teams<\/li>\n\n\n\n<li>avoids chaos from workspace-by-workspace data silos<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 20: Recommended decision guide<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Choose serverless workspace when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you want fast setup<\/li>\n\n\n\n<li>you want less infrastructure work<\/li>\n\n\n\n<li>your workloads fit serverless-supported patterns<\/li>\n\n\n\n<li>you want easy training or sandbox 
environments<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Choose classic-heavy setup when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you need more infrastructure control<\/li>\n\n\n\n<li>your org already has a strong cloud landing zone<\/li>\n\n\n\n<li>you require customer-owned networking and storage patterns<\/li>\n\n\n\n<li>special workloads need custom compute behavior<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Choose one metastore per region when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you want standard Databricks governance practice<\/li>\n\n\n\n<li>multiple workspaces in that region should share the same namespace<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Create a new catalog when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>a major data domain needs separation<\/li>\n\n\n\n<li>ownership needs to be clear<\/li>\n\n\n\n<li>storage or governance boundaries differ<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Create a new schema when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you need a logical sub-area under a catalog<\/li>\n\n\n\n<li>raw\/curated\/analytics separation is needed<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Create a new table when<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>you have actual data to store or register<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 21: Beginner tutorial path<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Tutorial 1: Understand the layers<\/h2>\n\n\n\n<p>Answer these five questions in your own environment:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>What workspace am I using?<\/li>\n\n\n\n<li>Is Unity Catalog enabled?<\/li>\n\n\n\n<li>Which metastore is attached?<\/li>\n\n\n\n<li>Which catalogs exist?<\/li>\n\n\n\n<li>Which compute types can I use?<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Tutorial 2: Find the data hierarchy<\/h2>\n\n\n\n<p>Try to 
identify:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one catalog<\/li>\n\n\n\n<li>one schema<\/li>\n\n\n\n<li>one table<\/li>\n<\/ul>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>catalog = <code>samples<\/code><\/li>\n\n\n\n<li>schema = <code>nyctaxi<\/code><\/li>\n\n\n\n<li>table = <code>trips<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Tutorial 3: Create a simple sandbox structure<\/h2>\n\n\n\n<p>If your permissions allow it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>create catalog <code>sandbox<\/code><\/li>\n\n\n\n<li>create schema <code>rajesh<\/code><\/li>\n\n\n\n<li>create a small test table<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Tutorial 4: Learn managed versus external<\/h2>\n\n\n\n<p>Practice with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one managed table<\/li>\n\n\n\n<li>one external table<\/li>\n<\/ul>\n\n\n\n<p>Observe:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>where each one points<\/li>\n\n\n\n<li>who controls the storage path<\/li>\n\n\n\n<li>how permissions are applied<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Tutorial 5: Understand cost practically<\/h2>\n\n\n\n<p>Run:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>one tiny serverless notebook query<\/li>\n\n\n\n<li>one tiny SQL warehouse query<\/li>\n<\/ul>\n\n\n\n<p>Then compare:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>which compute was used<\/li>\n\n\n\n<li>which billing view records it<\/li>\n\n\n\n<li>what tags or usage policies appeared<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 22: A complete end-to-end example<\/h1>\n\n\n\n<p>Imagine a company called Acme.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Their setup<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Region: <code>us-east-1<\/code><\/li>\n\n\n\n<li>Workspaces: <code>dev<\/code>, <code>prod<\/code><\/li>\n\n\n\n<li>One regional metastore<\/li>\n\n\n\n<li>Catalogs: 
<code>finance<\/code>, <code>sales<\/code>, <code>sandbox<\/code><\/li>\n\n\n\n<li>Schemas under <code>sales<\/code>: <code>raw<\/code>, <code>analytics<\/code><\/li>\n\n\n\n<li>Table: <code>sales.analytics.orders<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">How it works<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Users log into the <code>dev<\/code> workspace.<\/li>\n\n\n\n<li>They attach a notebook to serverless compute.<\/li>\n\n\n\n<li>The workspace is already linked to the regional metastore.<\/li>\n\n\n\n<li>The metastore exposes the <code>sales<\/code> catalog.<\/li>\n\n\n\n<li>Inside that catalog, they query <code>sales.analytics.orders<\/code>.<\/li>\n\n\n\n<li>Unity Catalog checks permissions.<\/li>\n\n\n\n<li>The actual table files are read from object storage.<\/li>\n\n\n\n<li>Compute cost is generated by the serverless notebook run.<\/li>\n\n\n\n<li>Storage cost is generated by the cloud storage used for the table files.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Why this is powerful<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>workspace gives user experience<\/li>\n\n\n\n<li>compute gives execution<\/li>\n\n\n\n<li>metastore gives governance<\/li>\n\n\n\n<li>storage gives persistence<\/li>\n<\/ul>\n\n\n\n<p>All four layers work together.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 23: The shortest version possible<\/h1>\n\n\n\n<p>If you remember only this, remember this:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The core model<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong> = where users work<\/li>\n\n\n\n<li><strong>Compute<\/strong> = where code runs<\/li>\n\n\n\n<li><strong>Unity Catalog<\/strong> = the governance system<\/li>\n\n\n\n<li><strong>Metastore<\/strong> = the root of that governance system<\/li>\n\n\n\n<li><strong>Catalog \/ Schema \/ Table<\/strong> = the data organization 
hierarchy<\/li>\n\n\n\n<li><strong>Storage<\/strong> = where the actual files live<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">The cost model<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>governance objects themselves are usually not the main cost<\/li>\n\n\n\n<li>compute and storage are the real cost drivers<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">The cloud model<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>serverless = more Databricks-managed<\/li>\n\n\n\n<li>classic = more customer-cloud-managed<\/li>\n\n\n\n<li>data can live in Databricks default storage or in customer cloud storage depending on design<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Part 24: Final recommended best practices<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">For individuals learning Databricks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>do not create new metastores unless required<\/li>\n\n\n\n<li>use the existing workspace and metastore<\/li>\n\n\n\n<li>work in a sandbox catalog\/schema<\/li>\n\n\n\n<li>use small datasets and short notebook sessions<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">For small teams<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>keep one regional metastore<\/li>\n\n\n\n<li>define a simple catalog strategy early<\/li>\n\n\n\n<li>use serverless first where possible<\/li>\n\n\n\n<li>avoid over-engineering storage on day one<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">For enterprises<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>standardize catalog naming<\/li>\n\n\n\n<li>design storage intentionally<\/li>\n\n\n\n<li>document when to use managed versus external<\/li>\n\n\n\n<li>treat one metastore per region as the default starting point<\/li>\n\n\n\n<li>use multiple workspaces where admin boundaries or lifecycle differences are needed<\/li>\n\n\n\n<li>use FinOps tagging and usage monitoring early<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Final summary<\/h1>\n\n\n\n<p>Databricks has several layers that work together:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Workspace<\/strong> gives users a place to work.<\/li>\n\n\n\n<li><strong>Compute<\/strong> runs notebooks, jobs, and queries.<\/li>\n\n\n\n<li><strong>Unity Catalog<\/strong> governs data and AI assets.<\/li>\n\n\n\n<li><strong>Metastore<\/strong> is the top-level root of Unity Catalog.<\/li>\n\n\n\n<li><strong>Catalogs, schemas, and tables<\/strong> organize data.<\/li>\n\n\n\n<li><strong>Storage<\/strong> holds the actual data files.<\/li>\n<\/ol>\n\n\n\n<p>When people get confused, it is usually because they mix:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>working environment<\/li>\n\n\n\n<li>governance<\/li>\n\n\n\n<li>storage<\/li>\n\n\n\n<li>compute<\/li>\n\n\n\n<li>cloud ownership<\/li>\n<\/ul>\n\n\n\n<p>Once you keep those separate, the whole Databricks architecture becomes much easier to understand.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">One-page cheat sheet<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">What it is<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Workspace = office<\/li>\n\n\n\n<li>Compute = engine<\/li>\n\n\n\n<li>Unity Catalog = rulebook<\/li>\n\n\n\n<li>Metastore = root directory<\/li>\n\n\n\n<li>Catalog = department folder<\/li>\n\n\n\n<li>Schema = subfolder<\/li>\n\n\n\n<li>Table = dataset<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Where it lives<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Workspace = Databricks environment<\/li>\n\n\n\n<li>Metastore = Databricks account-level governance object<\/li>\n\n\n\n<li>Table files = object storage<\/li>\n\n\n\n<li>Compute = Databricks-managed serverless or customer-cloud classic<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What costs money<\/h2>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>compute<\/li>\n\n\n\n<li>storage<\/li>\n\n\n\n<li>networking<\/li>\n\n\n\n<li>some platform usage<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What usually has no direct cost by itself<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>catalog<\/li>\n\n\n\n<li>schema<\/li>\n\n\n\n<li>metastore object itself<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Who this guide is for This guide is for people who are asking questions like: This guide explains the concepts [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2905","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=2905"}],"version-history":[{"count":1,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2905\/revisions"}],"predecessor-version":[{"id":2906,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2905\/revisions\/2906"}],"wp:attachment":[{"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=2905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=2905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aiopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=2905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","
templated":true}]}}