Secrets Management in Databricks Workspaces: Best Practices and Patterns
Ask any data engineer about their biggest security regret, and the answer is almost always the same: credentials hardcoded in a notebook, committed to Git, and never rotated. It happens everywhere, and it's entirely avoidable.
Databricks provides a secrets management system that keeps credentials out of your code. This guide covers everything from basic secret scopes to Azure Key Vault integration and access control patterns.
Why You Need a Secrets Strategy
The most common credential exposures in data platforms happen because:
- Notebook-based development — it's easy to just call spark.conf.set("fs.azure.account.key...", "ACTUAL_KEY") in a notebook cell
- No code review for notebooks — diff visibility is poor, and reviewers miss hardcoded values
- Shared workspaces — one team's notebook accidentally leaks credentials visible to another team
- Logging — stack traces and print statements inadvertently capture credentials
Databricks Secret Scopes address all of these by providing a centralized, ACL-controlled credential store in which secret values are redacted from notebook output rather than displayed — even to workspace admins.
Understanding Secret Scopes
A Secret Scope is a named collection of key-value pairs. Two types exist:
- Databricks-managed scopes — secrets stored in Databricks' encrypted backend
- Azure Key Vault-backed scopes — Databricks acts as a proxy to an Azure Key Vault; secrets are stored and managed in Key Vault
When to Use Each
| Feature | Databricks-managed | AKV-backed |
|---|---|---|
| Setup complexity | Low | Medium |
| Rotation support | Manual via API/CLI | Native AKV rotation policies |
| Cross-service access | Databricks only | Any Azure service |
| Audit logging | Databricks audit logs | AKV + Databricks audit logs |
| RBAC granularity | Scope-level ACLs | AKV access policies + Databricks ACLs |
| Cost | Free | AKV transactions (~$0.03/10k ops) |
Recommendation: Use AKV-backed scopes for production. Use Databricks-managed scopes for local development or when AKV isn't available.
Setting Up Databricks-Managed Scopes
Create a Scope
# Using Databricks CLI
databricks secrets create-scope harbinger \
--initial-manage-principal users
# Verify
databricks secrets list-scopes
Add Secrets
# Add a secret (avoid typing real values directly on the command line — they land in shell history)
databricks secrets put-secret harbinger db_password --string-value "your-password-here"
# From a file (useful for certificates, private keys)
databricks secrets put-secret harbinger ssl_cert --string-value "$(cat /path/to/cert.pem)"
# List secrets in a scope (shows keys, NOT values)
databricks secrets list-secrets harbinger
Access in Notebooks and Code
# In a Databricks notebook or job
# Access a secret — the value is redacted in notebook output
password = dbutils.secrets.get(scope="harbinger", key="db_password")
# Use it directly in connection strings
jdbc_url = "jdbc:postgresql://mydb.postgres.database.azure.com:5432/harbinger"
df = (
spark.read
.format("jdbc")
.option("url", jdbc_url)
.option("dbtable", "events")
.option("user", dbutils.secrets.get(scope="harbinger", key="db_user"))
.option("password", dbutils.secrets.get(scope="harbinger", key="db_password"))
.load()
)
Important: If you try to print(password), Databricks displays [REDACTED] in the notebook output. Note that this redaction is best-effort — it catches the literal value, not transformations of it — so restrict READ access via ACLs rather than relying on redaction alone.
Azure Key Vault-Backed Scopes (Recommended for Production)
Prerequisites
- An Azure Key Vault instance
- A Databricks service principal with the Key Vault Secrets User role on the vault
- The AKV DNS name and Resource ID
Create the Scope via REST API
curl -X POST \
-H "Authorization: Bearer $DATABRICKS_TOKEN" \
-H "Content-Type: application/json" \
https://<workspace-url>/api/2.0/secrets/scopes/create \
-d '{
"scope": "harbinger-kv",
"scope_backend_type": "AZURE_KEYVAULT",
"backend_azure_keyvault": {
"resource_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>",
"dns_name": "https://<vault-name>.vault.azure.net/"
},
"initial_manage_principal": "users"
}'
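If you script workspace setup in Python, the same request body can be built programmatically. A minimal sketch — the function name is hypothetical, and the resource ID and DNS name placeholders match the curl example above:

```python
import json

def akv_scope_payload(scope: str, resource_id: str, dns_name: str,
                      manage_principal: str = "users") -> str:
    """Build the JSON body for POST /api/2.0/secrets/scopes/create
    with an Azure Key Vault backend."""
    return json.dumps({
        "scope": scope,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": resource_id,
            "dns_name": dns_name,
        },
        "initial_manage_principal": manage_principal,
    })

body = akv_scope_payload(
    "harbinger-kv",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>",
    "https://<vault-name>.vault.azure.net/",
)
```

Send `body` with any HTTP client, using the same `Authorization: Bearer` header as the curl call.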
Managing Secrets in Key Vault
# Add secrets in AKV (they automatically appear in Databricks)
az keyvault secret set \
--vault-name harbinger-kv \
--name "cosmos-db-key" \
--value "your-cosmos-key-here"
# Set expiration for automatic rotation awareness
az keyvault secret set \
--vault-name harbinger-kv \
--name "api-key-gdelt" \
--value "your-api-key" \
--expires "2025-01-01T00:00:00Z"
Accessing AKV-backed Secrets
The access pattern is identical from the Databricks side:
# Same API, Databricks transparently fetches from AKV
api_key = dbutils.secrets.get(scope="harbinger-kv", key="api-key-gdelt")
cosmos_key = dbutils.secrets.get(scope="harbinger-kv", key="cosmos-db-key")
Access Control (ACLs)
Secret scopes support three permission levels: READ, WRITE, and MANAGE. The levels are cumulative — WRITE includes READ, and MANAGE includes both.
# Grant read-only access to a group
databricks secrets put-acl harbinger data-engineers READ
# Grant write access to a service principal (for CI/CD)
databricks secrets put-acl harbinger harbinger-cicd-sp WRITE
# Grant manage access to admins
databricks secrets put-acl harbinger admins MANAGE
# View current ACLs
databricks secrets list-acls harbinger
Principle of Least Privilege
Structure your scopes by access pattern:
harbinger-shared/ READ for all data-engineers
- storage-account-key
- shared-api-keys
harbinger-prod/ READ for prod-jobs-sp only
- prod-db-password
- prod-api-keys
harbinger-dev/ READ+WRITE for data-engineers
- dev-db-password
- dev-api-keys
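To keep a layout like this reproducible, one option is to declare it as data and generate the ACL commands from it. A sketch — the plan structure and helper are illustrative, and the group/scope names follow the example above (WRITE already implies READ, so dev needs only WRITE):

```python
# Declarative ACL plan: scope -> [(principal, permission), ...]
ACL_PLAN = {
    "harbinger-shared": [("data-engineers", "READ")],
    "harbinger-prod":   [("prod-jobs-sp", "READ")],
    "harbinger-dev":    [("data-engineers", "WRITE")],
}

def acl_commands(plan: dict) -> list[str]:
    """Render one `databricks secrets put-acl` CLI call per ACL entry."""
    return [
        f"databricks secrets put-acl {scope} {principal} {permission}"
        for scope, entries in plan.items()
        for principal, permission in entries
    ]

for cmd in acl_commands(ACL_PLAN):
    print(cmd)
```

Running the generated commands from CI makes drift in scope permissions visible in code review.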
Using Secrets in Databricks Workflows
In Job Clusters via Spark Config
Instead of accessing secrets in code, you can inject them as Spark config at cluster startup:
{
"spark_conf": {
"fs.azure.account.key.harbingerprod.dfs.core.windows.net": "{{secrets/harbinger/storage-account-key}}",
"spark.databricks.delta.preview.enabled": "true"
}
}
This pattern means your application code never calls dbutils.secrets — credentials are injected at the infrastructure level.
In Python Packages (Non-Notebook Code)
When running Python code as a wheel (not a notebook), dbutils isn't automatically available. Import it explicitly:
from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession
def get_secret(scope: str, key: str) -> str:
spark = SparkSession.getActiveSession()
dbutils = DBUtils(spark)
return dbutils.secrets.get(scope=scope, key=key)
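A common convenience on top of this is a wrapper that falls back to environment variables when the code runs outside Databricks, e.g. in local unit tests. A sketch under that assumption — the SECRET_ prefix convention is invented for illustration:

```python
import os

def get_secret(scope: str, key: str, env_prefix: str = "SECRET_") -> str:
    """Fetch a secret via dbutils on Databricks; outside Databricks,
    fall back to an environment variable such as SECRET_DB_PASSWORD."""
    try:
        from pyspark.dbutils import DBUtils  # only resolvable where pyspark is installed
        from pyspark.sql import SparkSession
        spark = SparkSession.getActiveSession()
        if spark is not None:  # on a cluster with an active session
            return DBUtils(spark).secrets.get(scope=scope, key=key)
    except ImportError:
        pass  # pyspark not installed -> local environment
    env_var = env_prefix + key.upper().replace("-", "_")
    value = os.environ.get(env_var)
    if value is None:
        raise KeyError(f"secret {scope}/{key} not found (tried ${env_var})")
    return value
```

Local tests then just export SECRET_DB_PASSWORD instead of mocking dbutils.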
Common Anti-Patterns
Never Hardcode Credentials
# NEVER DO THIS
storage_key = "2XaAbcd1234abcdefghijklmnoqrstuvwxyz=="
spark.conf.set("fs.azure.account.key.mystorage.dfs.core.windows.net", storage_key)
Never Use Widgets for Credentials
Widget values are visible in notebook UI history and run parameters. Never put credentials in widgets.
The Correct Pattern
def build_jdbc_options(scope: str) -> dict:
return {
"url": "jdbc:postgresql://host:5432/db",
"user": dbutils.secrets.get(scope, "db-user"),
"password": dbutils.secrets.get(scope, "db-password"),
"driver": "org.postgresql.Driver"
}
jdbc_opts = build_jdbc_options("harbinger-prod")
df = spark.read.format("jdbc").options(**jdbc_opts).option("dbtable", "events").load()
Rotation and Auditing
Secret Rotation
For Databricks-managed scopes, rotation is manual:
databricks secrets put-secret harbinger db_password --string-value "new-password-after-rotation"
For AKV-backed scopes, rotation happens on the Key Vault side. Note that AKV rotation policies apply to cryptographic keys, not secrets — for secrets such as passwords and API keys, set an expiry date (as shown earlier) and subscribe to the SecretNearExpiry Event Grid event to trigger rotation. For keys, a rotation policy can be set via the Azure CLI:
az keyvault key rotation-policy update \
--vault-name harbinger-kv \
--name my-encryption-key \
--value @rotation-policy.json
Audit Logging
All secret access is logged in Databricks audit logs. You can query them via system tables:
SELECT
timestamp,
user_identity.email AS user,
request_params.scopeName AS scope,
request_params.key AS secret_key,
response.status_code
FROM system.access.audit
WHERE action_name = 'getSecret'
AND timestamp >= CURRENT_TIMESTAMP - INTERVAL 7 DAYS
ORDER BY timestamp DESC;
Secrets in Databricks Asset Bundles
When using DAB for CI/CD, reference secrets in your bundle config without exposing values:
resources:
jobs:
my_job:
tasks:
- task_key: main
new_cluster:
spark_conf:
fs.azure.account.key.${var.storage_account}.dfs.core.windows.net: "{{secrets/${var.scope}/storage-key}}"
The {{secrets/scope/key}} interpolation is resolved at runtime by Databricks, never stored in your YAML.
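If you generate cluster or bundle configs from Python, a tiny helper (a sketch; the function name is invented) keeps the interpolation syntax in one place and catches malformed names early:

```python
def secret_ref(scope: str, key: str) -> str:
    """Render a Databricks secret reference for spark_conf or environment
    variables; Databricks resolves the value at runtime."""
    for part in (scope, key):
        if not part or any(c in part for c in "/{}"):
            raise ValueError(f"invalid scope/key component: {part!r}")
    return f"{{{{secrets/{scope}/{key}}}}}"

print(secret_ref("harbinger", "storage-key"))
# -> {{secrets/harbinger/storage-key}}
```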
Wrapping Up
Good secrets management is non-negotiable for production Databricks workspaces. The combination of AKV-backed scopes, scope-level ACLs, and dbutils.secrets gives you a secure, auditable, rotation-friendly credential management system.
The investment is modest — an hour to set up properly — and the payoff is never having to send a panicked "rotate all credentials" message at 2am.
At Harbinger Explorer, every external API key, database password, and service token lives in Azure Key Vault, accessed via Databricks secret scopes. Zero credentials in code, zero surprises in production.
Try Harbinger Explorer free for 7 days — our platform follows all the security best practices described here, keeping your API credentials and workspace tokens safe. Start your free trial at harbingerexplorer.com.