Secrets Management in Databricks Workspaces: Best Practices and Patterns

9 min read · Tags: databricks, security, secrets, azure-key-vault, devops


Ask any data engineer about their biggest security regret, and the answer is almost always the same: credentials hardcoded in a notebook, committed to Git, and never rotated. It happens everywhere, and it's entirely avoidable.

Databricks provides a secrets management system that keeps credentials out of your code. This guide covers everything from basic secret scopes to Azure Key Vault integration and access control patterns.


Why You Need a Secrets Strategy

The most common credential exposures in data platforms happen because:

  1. Notebook-based development — it's easy to just spark.conf.set("fs.azure.account.key...", "ACTUAL_KEY") in a notebook cell
  2. No code review for notebooks — diff visibility is poor, reviewers miss hardcoded values
  3. Shared workspaces — one team's notebook accidentally leaks credentials visible to another team
  4. Logging — stack traces and print statements inadvertently capture credentials

Databricks Secret Scopes address all of these by providing a centralized, ACL-controlled credential store whose values are redacted in notebook output rather than displayed, even for workspace admins.


Understanding Secret Scopes

A Secret Scope is a named collection of key-value pairs. Two types exist:

  • Databricks-managed scopes — secrets stored in Databricks' encrypted backend
  • Azure Key Vault-backed scopes — Databricks acts as a proxy to an Azure Key Vault; secrets are stored and managed in Key Vault

When to Use Each

| Feature | Databricks-managed | AKV-backed |
| --- | --- | --- |
| Setup complexity | Low | Medium |
| Rotation support | Manual via API/CLI | Rotate in AKV; new value visible immediately |
| Cross-service access | Databricks only | Any Azure service |
| Audit logging | Databricks audit logs | AKV + Databricks audit logs |
| RBAC granularity | Scope-level ACLs | AKV access policies + Databricks ACLs |
| Cost | Free | AKV transactions (~$0.03/10k ops) |

Recommendation: Use AKV-backed scopes for production. Use Databricks-managed scopes for local development or when AKV isn't available.


Setting Up Databricks-Managed Scopes

Create a Scope

# Using the Databricks CLI (unified CLI syntax; the scope name is positional)
databricks secrets create-scope harbinger --initial-manage-principal users

# Verify
databricks secrets list-scopes

Add Secrets

# Add a secret
databricks secrets put-secret harbinger db_password --string-value "your-password-here"

# From a file (useful for certificates, private keys);
# --bytes-value expects base64-encoded content, so encode the file first
databricks secrets put-secret harbinger ssl_cert --bytes-value "$(base64 < /path/to/cert.pem)"

# List secrets in a scope (shows keys, NOT values)
databricks secrets list-secrets harbinger

Access in Notebooks and Code

# In a Databricks notebook or job
# Access a secret — the value is redacted if it appears in notebook output
password = dbutils.secrets.get(scope="harbinger", key="db_password")

# Build the JDBC URL; the secrets are supplied as reader options below
jdbc_url = "jdbc:postgresql://mydb.postgres.database.azure.com:5432/harbinger"

df = (
    spark.read
        .format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "events")
        .option("user", dbutils.secrets.get(scope="harbinger", key="db_user"))
        .option("password", dbutils.secrets.get(scope="harbinger", key="db_password"))
        .load()
)

Important: If you try to print(password), Databricks displays [REDACTED] in the notebook output. This redaction is a best-effort, exact-match guardrail against accidental exposure, not a hard security boundary: anyone with READ permission on the scope can still use the value in code.
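Because notebook redaction only matches literal secret values, it is worth scrubbing known secrets from anything you log yourself. A minimal sketch of such a helper (a supplementary utility of our own, not a Databricks API):

```python
def scrub(message: str, secret_values: list) -> str:
    """Best-effort removal of known secret values from a log message.
    Supplements, but does not replace, Databricks' own notebook redaction."""
    for value in secret_values:
        if value:  # skip empty strings, which would corrupt the message
            message = message.replace(value, "[REDACTED]")
    return message
```

Call it on messages before they reach a logger, e.g. scrub(f"connecting as {user}", [password]).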


Azure Key Vault-Backed Scopes (Recommended for Production)

Prerequisites

  1. An Azure Key Vault instance
  2. Key Vault access for Databricks — grant the AzureDatabricks first-party service principal the Key Vault Secrets User role (RBAC vaults) or Get/List secret permissions (access-policy vaults)
  3. The AKV DNS name and Resource ID

Create the Scope via REST API

# Note: creating an AKV-backed scope requires a Microsoft Entra ID (Azure AD)
# token for this endpoint — a Databricks PAT will not work here
curl -X POST \
  -H "Authorization: Bearer $AAD_TOKEN" \
  -H "Content-Type: application/json" \
  https://<workspace-url>/api/2.0/secrets/scopes/create \
  -d '{
    "scope": "harbinger-kv",
    "scope_backend_type": "AZURE_KEYVAULT",
    "backend_azure_keyvault": {
      "resource_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>",
      "dns_name": "https://<vault-name>.vault.azure.net/"
    },
    "initial_manage_principal": "users"
  }'

Managing Secrets in Key Vault

# Add secrets in AKV (they automatically appear in Databricks;
# AKV-backed scopes are read-only from the Databricks side)
az keyvault secret set \
  --vault-name harbinger-kv \
  --name "cosmos-db-key" \
  --value "your-cosmos-key-here"

# Set expiration for automatic rotation awareness
az keyvault secret set \
  --vault-name harbinger-kv \
  --name "api-key-gdelt" \
  --value "your-api-key" \
  --expires "2025-01-01T00:00:00Z"
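With an expiry set, a scheduled job can warn before a secret lapses. A stdlib-only sketch (the helper name is ours) that parses the same ISO-8601 expiry format used above:

```python
from datetime import datetime, timezone

def days_until_expiry(expires_iso: str, now: datetime = None) -> int:
    """Whole days remaining before a secret expires, given an ISO-8601
    expiry string such as '2025-01-01T00:00:00Z'."""
    # fromisoformat() on older Pythons doesn't accept a trailing 'Z'
    expires = datetime.fromisoformat(expires_iso.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (expires - now).days
```

A monitoring job can then alert when the result drops below some threshold, say 14 days.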

Accessing AKV-backed Secrets

The access pattern is identical from the Databricks side:

# Same API, Databricks transparently fetches from AKV
api_key = dbutils.secrets.get(scope="harbinger-kv", key="api-key-gdelt")
cosmos_key = dbutils.secrets.get(scope="harbinger-kv", key="cosmos-db-key")

Access Control (ACLs)

Secret scopes support three permission levels: READ, WRITE, and MANAGE.

# Grant read-only access to a group
databricks secrets put-acl harbinger data-engineers READ

# Grant write access to a service principal (for CI/CD)
databricks secrets put-acl harbinger harbinger-cicd-sp WRITE

# Grant manage access to admins
databricks secrets put-acl harbinger admins MANAGE

# View current ACLs
databricks secrets list-acls harbinger

Principle of Least Privilege

Structure your scopes by access pattern:

harbinger-shared/     READ for all data-engineers
  - storage-account-key
  - shared-api-keys

harbinger-prod/       READ for prod-jobs-sp only
  - prod-db-password
  - prod-api-keys

harbinger-dev/        READ+WRITE for data-engineers
  - dev-db-password
  - dev-api-keys
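To keep that dev/prod split out of application logic, jobs can resolve the scope name from their deployment environment. A minimal sketch (the mapping and function name are illustrative, not an official pattern):

```python
# Illustrative mapping from deployment environment to secret scope
SCOPE_BY_ENV = {
    "shared": "harbinger-shared",
    "prod": "harbinger-prod",
    "dev": "harbinger-dev",
}

def resolve_scope(env: str) -> str:
    """Return the secret scope for an environment, failing loudly on typos."""
    try:
        return SCOPE_BY_ENV[env]
    except KeyError:
        raise ValueError(
            f"Unknown environment {env!r}; expected one of {sorted(SCOPE_BY_ENV)}"
        )
```

Passing the environment as a job parameter then selects the right scope without any hardcoded production names.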

Using Secrets in Databricks Workflows

In Job Clusters via Spark Config

Instead of accessing secrets in code, you can inject them as Spark config at cluster startup:

{
  "spark_conf": {
    "fs.azure.account.key.harbingerprod.dfs.core.windows.net": "{{secrets/harbinger/storage-account-key}}",
    "spark.databricks.delta.preview.enabled": "true"
  }
}

This pattern means your application code never calls dbutils.secrets — credentials are injected at the infrastructure level.

In Python Packages (Non-Notebook Code)

When running Python code as a wheel (not a notebook), dbutils isn't automatically available. Import it explicitly:

from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

def get_secret(scope: str, key: str) -> str:
    # Fall back to getOrCreate in case no session is active yet
    spark = SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()
    dbutils = DBUtils(spark)
    return dbutils.secrets.get(scope=scope, key=key)
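For local unit tests, where neither a cluster nor dbutils exists, one workable pattern is an environment-variable fallback. This is a sketch under our own naming convention, not an official API:

```python
import os

def get_secret_or_env(scope: str, key: str) -> str:
    """Resolve a secret from an env var such as HARBINGER_DEV_DB_PASSWORD
    when running locally, deferring to dbutils.secrets on a cluster."""
    env_name = f"{scope}_{key}".upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is not None:
        return value
    # These imports only succeed/work on a Databricks cluster
    from pyspark.dbutils import DBUtils
    from pyspark.sql import SparkSession
    return DBUtils(SparkSession.getActiveSession()).secrets.get(scope=scope, key=key)
```

In CI you export the variables from your secret store of choice; on the cluster the fallback is never reached.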

Common Anti-Patterns

Never Hardcode Credentials

# NEVER DO THIS
storage_key = "2XaAbcd1234abcdefghijklmnoqrstuvwxyz=="
spark.conf.set("fs.azure.account.key.mystorage.dfs.core.windows.net", storage_key)

Never Use Widgets for Credentials

Widget values are visible in notebook UI history and run parameters. Never put credentials in widgets.

The Correct Pattern

def build_jdbc_options(scope: str) -> dict:
    return {
        "url": "jdbc:postgresql://host:5432/db",
        "user": dbutils.secrets.get(scope, "db-user"),
        "password": dbutils.secrets.get(scope, "db-password"),
        "driver": "org.postgresql.Driver"
    }

jdbc_opts = build_jdbc_options("harbinger-prod")
df = spark.read.format("jdbc").options(**jdbc_opts).option("dbtable", "events").load()

Rotation and Auditing

Secret Rotation

For Databricks-managed scopes, rotation is manual:

databricks secrets put-secret harbinger db_password --string-value "new-password-after-rotation"
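A rotation script usually generates the new value before pushing it with the CLI; Python's stdlib secrets module is the right tool for that. A minimal sketch (the helper name and alphabet choice are ours):

```python
import secrets
import string

def generate_password(length: int = 32) -> str:
    """Generate a high-entropy password with a CSPRNG, suitable as the
    --string-value in a rotation script."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The script would then shell out to (or call the API behind) databricks secrets put-secret with the generated value, and update the consuming system in the same run.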

For AKV-backed scopes, rotate the secret in Key Vault and Databricks picks up the new value on the next read. Note that native AKV rotation policies apply to keys, not secrets. For secrets, set an expiration date (as shown earlier) and subscribe to Event Grid SecretNearExpiry events to drive rotation automation. If you also keep cryptographic keys in the vault, their rotation can be automated with the Azure CLI:

az keyvault key rotation-policy update \
  --vault-name harbinger-kv \
  --name harbinger-encryption-key \
  --value @rotation-policy.json

Audit Logging

All secret access is logged in Databricks audit logs. You can query them via system tables:

SELECT
  event_time,
  user_identity.email AS user,
  request_params.scopeName AS scope,
  request_params.key AS secret_key,
  response.status_code
FROM system.access.audit
WHERE action_name = 'getSecret'
  AND event_time >= CURRENT_TIMESTAMP - INTERVAL 7 DAYS
ORDER BY event_time DESC;

Secrets in Databricks Asset Bundles

When using DAB for CI/CD, reference secrets in your bundle config without exposing values:

resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          new_cluster:
            spark_conf:
              fs.azure.account.key.${var.storage_account}.dfs.core.windows.net: "{{secrets/${var.scope}/storage-key}}"

The {{secrets/scope/key}} interpolation is resolved at runtime by Databricks, never stored in your YAML.
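Because a malformed reference only fails at deploy or run time, a small CI lint that validates the syntax up front can save a cycle. A sketch (the regex and helper are illustrative, not part of DAB):

```python
import re

# Matches {{secrets/<scope>/<key>}} references used in spark_conf values
SECRET_REF = re.compile(r"^\{\{secrets/([A-Za-z0-9._-]+)/([A-Za-z0-9._-]+)\}\}$")

def parse_secret_ref(value: str):
    """Return (scope, key) for a well-formed reference, or None otherwise."""
    match = SECRET_REF.match(value)
    return match.groups() if match else None
```

Run it over every spark_conf value in the rendered bundle (after variable substitution, since ${var...} placeholders would not match) and fail the pipeline on None for values that should be secret references.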


Wrapping Up

Good secrets management is non-negotiable for production Databricks workspaces. The combination of AKV-backed scopes, scope-level ACLs, and dbutils.secrets gives you a secure, auditable, rotation-friendly credential management system.

The investment is modest — an hour to set up properly — and the payoff is never having to send a panicked "rotate all credentials" message at 2am.

At Harbinger Explorer, every external API key, database password, and service token lives in Azure Key Vault, accessed via Databricks secret scopes. Zero credentials in code, zero surprises in production.


Try Harbinger Explorer free for 7 days — our platform follows all the security best practices described here, keeping your API credentials and workspace tokens safe. Start your free trial at harbingerexplorer.com.

