Harbinger Explorer


Databricks Cluster Policies for Cost Control: A Practical Guide

10 min read · Tags: databricks, cost-optimization, cluster-policies, finops, governance

Databricks is powerful. It's also remarkably easy to accidentally spin up a 32-node Standard_E64s_v3 cluster and forget to terminate it. Anyone who's managed a Databricks workspace for more than a month has a story like this.

Cluster policies are Databricks' mechanism for preventing these expensive mistakes while still giving your team the flexibility they need. Done right, they're invisible guardrails — your engineers can work freely within a safe boundary.


What Are Cluster Policies?

A cluster policy is a JSON document that constrains the values users can set when creating or editing clusters. Policies can:

  • Fix a value (e.g., always enable auto-termination)
  • Limit a value to a range or list (e.g., max 8 workers)
  • Set defaults (e.g., default to spot instances)
  • Hide fields from the UI (simplify the creation experience)
  • Require specific values (e.g., must tag clusters with a cost center)

Policies are assigned to users, groups, or service principals. A user can only create clusters using policies they have access to (unless they're a workspace admin).


Why Cluster Policies Matter (With Numbers)

Consider a team of 10 data engineers. Without policies:

Scenario | Config | Cost
Overpowered dev cluster | 8x Standard_E16s_v3 (on-demand) | ~$18/hr
Forgotten overnight cluster | 4x Standard_DS3_v2 (on-demand) | ~$30 total
Production job over-provisioned | 16x Standard_E32s_v3 | ~$60/hr

A single forgotten cluster running for a weekend = ~$150 in waste. Multiply by 10 engineers over a year, and you're looking at thousands in avoidable spend.

With cluster policies enforcing auto-termination and spot instances:

  • Auto-terminates after 30 min = ~$1.50 wasted instead of $30
  • Uses spot instances = ~60% cheaper baseline
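These numbers are easy to sanity-check. A quick back-of-the-envelope calculation in Python, using the article's approximate rates (the $3/hr figure is back-derived from the ~$30 overnight number, not a quoted Azure price):

```python
# Back-of-the-envelope savings estimate using the article's approximate rates.

FORGOTTEN_CLUSTER_RATE = 3.0   # ~$/hr for 4x Standard_DS3_v2 (approximation)
OVERNIGHT_HOURS = 10           # cluster idles from evening to morning

# Without policies: the cluster idles all night.
waste_no_policy = FORGOTTEN_CLUSTER_RATE * OVERNIGHT_HOURS   # ~$30

# With a 30-minute auto-termination policy: it idles half an hour.
waste_with_policy = FORGOTTEN_CLUSTER_RATE * 0.5             # ~$1.50

# Spot instances cut the baseline hourly rate itself by ~60%.
spot_rate = FORGOTTEN_CLUSTER_RATE * (1 - 0.60)

print(f"wasted without policy: ${waste_no_policy:.2f}")
print(f"wasted with policy:    ${waste_with_policy:.2f}")
print(f"spot hourly rate:      ${spot_rate:.2f}/hr")
```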

Policy Definition Language

Policies are JSON documents. Each attribute maps to a constraint; which fields apply depends on the constraint type (value for fixed, minValue/maxValue for range, values for allowlist and blocklist, pattern for regex):

{
  "attribute_name": {
    "type": "fixed | range | allowlist | blocklist | regex | unlimited | forbidden",
    "value": "...",
    "minValue": 0,
    "maxValue": 10,
    "values": [],
    "pattern": "...",
    "defaultValue": "...",
    "hidden": true
  }
}
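To make the constraint semantics concrete, here is a toy validator in Python. This is a hypothetical sketch for illustration only; the real enforcement happens server-side in Databricks:

```python
import re


def check_attribute(policy_rule: dict, value) -> bool:
    """Return True if `value` satisfies a single policy rule.

    Toy re-implementation of the main constraint types, for illustration.
    """
    kind = policy_rule["type"]
    if kind == "fixed":
        return value == policy_rule["value"]
    if kind == "range":
        lo = policy_rule.get("minValue", float("-inf"))
        hi = policy_rule.get("maxValue", float("inf"))
        return lo <= value <= hi
    if kind == "allowlist":
        return value in policy_rule["values"]
    if kind == "blocklist":
        return value not in policy_rule["values"]
    if kind == "regex":
        return re.fullmatch(policy_rule["pattern"], value) is not None
    if kind == "unlimited":
        return True
    if kind == "forbidden":
        return False  # the attribute may not be set at all
    raise ValueError(f"unknown constraint type: {kind}")


# Example: the worker-count rule from the engineer policy below
rule = {"type": "range", "minValue": 1, "maxValue": 8}
print(check_attribute(rule, 4))   # within range
print(check_attribute(rule, 32))  # rejected
```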

Policy Examples

1. Basic Cost Control Policy (for all data engineers)

{
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120,
    "defaultValue": 30
  },
  "num_workers": {
    "type": "range",
    "minValue": 1,
    "maxValue": 8
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "Standard_DS3_v2",
      "Standard_DS4_v2",
      "Standard_DS5_v2",
      "Standard_E4s_v3",
      "Standard_E8s_v3"
    ],
    "defaultValue": "Standard_DS3_v2"
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  },
  "spark_version": {
    "type": "regex",
    "pattern": "^(14|15)\\.[0-9]+\\.x-scala2\\.12$",
    "defaultValue": "15.4.x-scala2.12"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "data-engineering"
  },
  "custom_tags.cost_center": {
    "type": "allowlist",
    "values": ["harbinger", "research", "infra"]
  }
}

This policy:

  • Forces auto-termination between 10-120 minutes (default 30)
  • Limits cluster size to 8 workers max
  • Restricts to cost-effective instance types
  • Forces spot instances (transparent to user)
  • Ensures LTS Spark versions
  • Requires cost center tagging for chargeback

2. Single-Node Policy (for interactive development)

For lightweight exploration and testing, a single-node policy keeps costs minimal:

{
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "SingleNode",
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
    "defaultValue": "Standard_DS3_v2"
  }
}

3. Production Job Policy (for automated workflows)

Production jobs prioritize reliability over raw cost savings. This policy allows larger clusters while keeping spot with on-demand fallback: a spot_bid_max_price of -1 caps bids at the on-demand price, and auto-termination is disabled (fixed at 0) because job clusters shut down automatically when the run finishes:

{
  "num_workers": {
    "type": "range",
    "minValue": 2,
    "maxValue": 32
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  },
  "azure_attributes.spot_bid_max_price": {
    "type": "fixed",
    "value": -1,
    "hidden": true
  },
  "custom_tags.environment": {
    "type": "fixed",
    "value": "production"
  }
}

4. Unrestricted Policy (for workspace admins only)

Give your platform team full flexibility while still tracking usage:

{
  "custom_tags.team": {
    "type": "fixed",
    "value": "platform"
  }
}

Creating Policies via CLI

The Databricks CLI passes create and edit requests as JSON via the --json flag. Note that the definition field inside the request body is itself a JSON-escaped string:

# Create a policy from a JSON request file
# (the file contains {"name": "...", "definition": "<escaped policy JSON>"})
databricks cluster-policies create --json @policies/engineer_policy.json

# Update an existing policy (include policy_id and name in the request body)
databricks cluster-policies edit --json @policies/engineer_policy.json

# List all policies
databricks cluster-policies list

Creating Policies via Terraform

If you manage Databricks infrastructure with Terraform:

resource "databricks_cluster_policy" "engineer_policy" {
  name = "Data Engineers - Cost Controlled"
  definition = jsonencode({
    "autotermination_minutes" = {
      type         = "range"
      minValue     = 10
      maxValue     = 120
      defaultValue = 30
    }
    "num_workers" = {
      type     = "range"
      minValue = 1
      maxValue = 8
    }
    "azure_attributes.availability" = {
      type   = "fixed"
      value  = "SPOT_WITH_FALLBACK_AZURE"
      hidden = true
    }
    "custom_tags.cost_center" = {
      type   = "allowlist"
      values = ["harbinger", "research", "infra"]
    }
  })
}

resource "databricks_permissions" "engineer_policy_access" {
  cluster_policy_id = databricks_cluster_policy.engineer_policy.id

  access_control {
    group_name       = "data-engineers"
    permission_level = "CAN_USE"
  }
}

Enforcing Policies at Scale

Remove Unrestricted Cluster Creation

By default, workspace users may hold the "unrestricted cluster creation" entitlement, which lets them bypass policies entirely. Revoke it from the workspace users group so that every new cluster must go through a policy (workspace admins are unaffected). With the Terraform provider used above:

data "databricks_group" "users" {
  display_name = "users"
}

resource "databricks_entitlements" "workspace_users" {
  group_id             = data.databricks_group.users.id
  allow_cluster_create = false
}

Monitor Policy Compliance

Use Databricks system tables to track cluster creation and policy adherence:

-- Find clusters created without a policy (ungoverned spend).
-- Note: system.compute.clusters keeps one row per configuration change;
-- dedupe on cluster_id if you only want the latest state.
SELECT
  cluster_id,
  cluster_name,
  owned_by,
  create_time,
  worker_node_type,
  worker_count,
  auto_termination_minutes
FROM system.compute.clusters
WHERE policy_id IS NULL
  AND create_time >= current_timestamp() - INTERVAL 30 DAYS
ORDER BY create_time DESC;

-- Total DBU consumption by policy over the last 30 days.
-- Policy names are not in system tables; resolve policy_id via the
-- Cluster Policies API.
SELECT
  c.policy_id,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage u
JOIN (
  SELECT DISTINCT cluster_id, policy_id
  FROM system.compute.clusters
) c
  ON u.usage_metadata.cluster_id = c.cluster_id
WHERE u.usage_date >= current_date() - INTERVAL 30 DAYS
  AND u.usage_unit = 'DBU'
GROUP BY c.policy_id
ORDER BY total_dbus DESC;

Implementing a FinOps Review Process

Cluster policies are a technical control. They work best paired with a lightweight process:

  1. Weekly cost review — query system.billing.usage and share with team leads
  2. Tag enforcement — require cost_center and owner tags; use these for chargeback reports
  3. Policy review cadence — review policy limits quarterly; adjust as team needs change
  4. Alert on spend anomalies — set Databricks SQL alerts on system.billing.usage for unexpected spikes

-- Alert query: yesterday's spend over $50 (adjust threshold for your team)
SELECT
  u.usage_date,
  SUM(u.usage_quantity * lp.pricing.default) AS daily_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
  AND lp.currency_code = 'USD'
  AND u.usage_start_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE u.usage_date = current_date() - INTERVAL 1 DAY
GROUP BY u.usage_date
HAVING SUM(u.usage_quantity * lp.pricing.default) > 50;

Policy Hierarchy: Job vs Interactive Clusters

One common confusion: cluster policies apply differently to interactive clusters (created manually) vs job clusters (created by Databricks Workflows).

  • Interactive clusters — fully governed by policies; users must select a policy they have access to
  • Job clusters — the job definition includes a cluster spec; attaching a policy_id is optional but recommended
  • Existing all-purpose clusters — a job can attach to an already-running cluster, so any policy was enforced when that cluster was created, not when the job runs

For job clusters, enforce policies in your Job definitions and CI/CD pipeline:

{
  "job_clusters": [
    {
      "job_cluster_key": "default",
      "new_cluster": {
        "policy_id": "ABCD1234",
        "spark_version": "15.4.x-scala2.12",
        "num_workers": 4
      }
    }
  ]
}
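To cover the CI/CD side, a lint step can reject job specs whose new clusters omit a policy_id. A minimal sketch, assuming job definitions live as JSON in your repo (the helper name is hypothetical):

```python
def missing_policy_ids(job_spec: dict) -> list:
    """Return the job_cluster_keys whose new_cluster has no policy_id.

    Hypothetical CI helper: load each job definition with json.load and
    fail the build if this returns a non-empty list.
    """
    offenders = []
    for jc in job_spec.get("job_clusters", []):
        if "policy_id" not in jc.get("new_cluster", {}):
            offenders.append(jc.get("job_cluster_key", "<unnamed>"))
    return offenders


# One governed and one ungoverned job cluster
spec = {
    "job_clusters": [
        {"job_cluster_key": "governed",
         "new_cluster": {"policy_id": "ABCD1234", "num_workers": 4}},
        {"job_cluster_key": "ungoverned",
         "new_cluster": {"num_workers": 4}},
    ]
}
print(missing_policy_ids(spec))  # → ['ungoverned']
```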

Measuring the Impact

After implementing policies at a mid-sized team (15 engineers), here's what typically changes:

Metric | Before Policies | After Policies
Avg cluster termination | 4.2 hours after last use | 28 minutes
Clusters using spot instances | 22% | 94%
Monthly compute spend | Baseline | 35-50% reduction
Ungoverned clusters per month | Uncounted | Less than 5 (admins only)

Wrapping Up

Cluster policies are one of the highest-ROI governance investments you can make in a Databricks workspace. A few hours of setup translates to continuous cost savings, better standardization, and fewer 2am alerts about unexpected cloud bills.

Start with the basics: mandatory auto-termination, spot instances, and size limits. Then layer in tagging requirements and monitoring once the foundation is solid.

At Harbinger Explorer, our Databricks workspace runs fully policy-governed. Every cluster our team spins up is within guardrails — keeping infrastructure costs lean so we can focus on building intelligence, not managing bills.


Try Harbinger Explorer free for 7 days — we practice what we preach. Start your free trial at harbingerexplorer.com.

