Databricks Legacy Sunset: DBFS, Hive Metastore & What Replaces Them

7 min read·Tags: databricks, unity-catalog, migration, delta-lake, best-practices, dbfs, hive-metastore, breaking-changes

If you created a new Databricks account after December 18, 2025, you already know: DBFS root, mounts, the legacy Hive Metastore, and no-isolation shared clusters are gone. For everyone else running existing workspaces, the clock is ticking — Databricks has shipped the "Disable legacy features" account setting and the direction is clear. This guide covers every deprecated feature, its replacement, and the exact code changes your team needs to make.

What Changed on December 18, 2025

Databricks announced in the December 2025 release notes that all new accounts created after December 18, 2025 no longer have access to:

Legacy Feature               | Status for New Accounts | Status for Existing Accounts
-----------------------------|-------------------------|------------------------------
DBFS root storage            | Removed                 | Available (opt-out via setting)
DBFS mounts                  | Removed                 | Available (opt-out via setting)
Hive Metastore (HMS)         | Removed                 | Available (opt-out via setting)
No-isolation shared clusters | Removed                 | Available (opt-out via setting)

For existing accounts, workspace admins can flip the "Disable legacy features" toggle in Account Settings → Feature Enablement to proactively remove these features. This is the recommended path — don't wait for Databricks to flip it for you.

Why this matters now: Every pipeline still referencing /mnt/, dbfs:/, or hive_metastore. is on borrowed time. When Databricks eventually enforces this across all accounts, anything not migrated breaks.

DBFS Root → Unity Catalog Volumes

DBFS root was the shared, unmanaged blob storage that every workspace could write to at dbfs:/. It had no access control beyond workspace-level permissions, no lineage, and no governance. Unity Catalog Volumes replace it with governed, namespace-aware file storage.

❌ Deprecated: DBFS Root Access

# Writing files to DBFS root — NO LONGER WORKS on new accounts
dbutils.fs.put("/FileStore/config/pipeline_params.json", json.dumps(params))
df.write.format("parquet").save("dbfs:/output/daily_report/")

# Reading from DBFS root
raw = spark.read.parquet("dbfs:/raw_data/events/")

✅ New Best Practice: Unity Catalog Volumes

# Unity Catalog Volumes — governed, namespace-aware file storage
# Path pattern: /Volumes/<catalog>/<schema>/<volume>/<path>

# Writing files to a managed volume
dbutils.fs.put(
    "/Volumes/prod_catalog/analytics/config/pipeline_params.json",
    json.dumps(params)
)
df.write.format("parquet").save(
    "/Volumes/prod_catalog/analytics/output/daily_report/"
)

# Reading from a volume
raw = spark.read.parquet("/Volumes/prod_catalog/raw/events/")

-- Spark SQL dialect
-- Create a managed volume
CREATE VOLUME IF NOT EXISTS prod_catalog.analytics.config
COMMENT 'Pipeline configuration files';

-- Create an external volume pointing to cloud storage
CREATE EXTERNAL VOLUME prod_catalog.raw.landing
LOCATION 's3://my-bucket/landing/'
COMMENT 'Raw landing zone from upstream systems';

Key differences: Volumes live under the three-level namespace (catalog.schema.volume), support fine-grained ACLs via Unity Catalog, and are tracked in lineage. External volumes point to existing cloud storage; managed volumes are fully Databricks-managed.
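Because Volumes are Unity Catalog securables, file access is granted with ordinary GRANT statements rather than workspace-level permissions. A sketch using the volume created above; the `analysts` and `pipeline_service` principals are hypothetical placeholders for your own groups or service principals:

```sql
-- Spark SQL dialect
-- Read-only file access on the volume for a group
GRANT READ VOLUME ON VOLUME prod_catalog.analytics.config TO `analysts`;

-- Write access for the pipeline's service principal
GRANT WRITE VOLUME ON VOLUME prod_catalog.analytics.config TO `pipeline_service`;

-- Principals also need USE privileges on the parent catalog and schema
GRANT USE CATALOG ON CATALOG prod_catalog TO `analysts`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.analytics TO `analysts`;
```

This is the governance DBFS root never had: access is scoped to a single volume, inherited through the namespace, and visible in audit logs.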

Hive Metastore → Unity Catalog

The workspace-local Hive Metastore had no cross-workspace visibility, no row/column-level security, and no data lineage. Unity Catalog is the replacement, and it has been generally available on all major clouds since 2023, so there's no excuse left.

❌ Deprecated: Hive Metastore Tables

-- Spark SQL dialect
-- Tables registered in the legacy Hive Metastore
CREATE TABLE hive_metastore.default.events (
    event_id STRING,
    event_type STRING,
    event_ts TIMESTAMP,
    payload STRING
)
USING DELTA
LOCATION 's3://my-bucket/hive-tables/events/';

-- Querying without catalog prefix defaults to hive_metastore
SELECT * FROM default.events WHERE event_ts > '2025-01-01';

✅ New Best Practice: Unity Catalog Tables

-- Spark SQL dialect
-- Tables registered in Unity Catalog — three-level namespace
CREATE TABLE IF NOT EXISTS prod_catalog.events.raw_events (
    event_id STRING,
    event_type STRING,
    event_ts TIMESTAMP,
    payload STRING
)
USING DELTA
COMMENT 'Raw events from upstream ingestion'
TBLPROPERTIES ('quality' = 'bronze');

-- Always use the full three-level name
SELECT * FROM prod_catalog.events.raw_events
WHERE event_ts > '2025-01-01';

# PySpark: Migrating an existing Hive table to Unity Catalog
# Step 1: Read from old location
old_df = spark.table("hive_metastore.default.events")

# Step 2: Write to Unity Catalog as a managed table
old_df.write.mode("overwrite").saveAsTable("prod_catalog.events.raw_events")

# Step 3: Verify row counts match
old_count = spark.table("hive_metastore.default.events").count()
new_count = spark.table("prod_catalog.events.raw_events").count()
assert old_count == new_count, f"Row count mismatch: {old_count} vs {new_count}"

For large tables, use DEEP CLONE instead of a full rewrite:

-- Spark SQL dialect
-- DEEP CLONE preserves Delta history and is more efficient for large tables
CREATE TABLE prod_catalog.events.raw_events
DEEP CLONE hive_metastore.default.events;

Refer to the Unity Catalog best practices guide for namespace design patterns — a common mistake is creating too many catalogs instead of using schemas for logical separation.
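When migrating dozens of tables, it helps to generate the DEEP CLONE statements mechanically rather than writing them by hand. A minimal sketch in plain Python (no Spark required); the catalog name and the database-to-schema mapping are illustrative and should reflect your own namespace design:

```python
# Generate DEEP CLONE statements for a batch Hive-to-UC migration.
# The target catalog and db_to_schema mapping are illustrative assumptions.

def clone_statements(tables, catalog, db_to_schema):
    """tables: iterable of 'db.table' names from hive_metastore.
    Returns one CREATE TABLE ... DEEP CLONE statement per table."""
    stmts = []
    for fq in tables:
        db, table = fq.split(".", 1)
        # Default: reuse the Hive database name as the UC schema name
        schema = db_to_schema.get(db, db)
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} "
            f"DEEP CLONE hive_metastore.{db}.{table};"
        )
    return stmts

stmts = clone_statements(
    ["default.events", "default.users"],
    catalog="prod_catalog",
    db_to_schema={"default": "core"},
)
for s in stmts:
    print(s)
```

In a notebook, you would feed the table list from `SHOW TABLES IN hive_metastore.<db>` and run each generated statement with `spark.sql(stmt)`, then verify counts as shown earlier.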

Mounts → External Locations + Storage Credentials

dbutils.fs.mount() was the go-to for connecting cloud storage to Databricks. It stored credentials at the workspace level with no auditability. External Locations and Storage Credentials replace this with centralized, auditable cloud access.

❌ Deprecated: DBFS Mounts

# Mounting an S3 bucket — credentials stored in workspace scope
dbutils.fs.mount(
    source="s3a://my-data-lake/raw/",
    mount_point="/mnt/raw_data",
    extra_configs={
        "fs.s3a.access.key": dbutils.secrets.get("aws", "access_key"),
        "fs.s3a.secret.key": dbutils.secrets.get("aws", "secret_key")
    }
)

# Reading from mount
df = spark.read.parquet("/mnt/raw_data/events/2025/")

✅ New Best Practice: External Locations

-- Spark SQL dialect
-- Step 1: Create a storage credential (done once by admin)
CREATE STORAGE CREDENTIAL IF NOT EXISTS aws_prod_cred
WITH (
    AWS_IAM_ROLE = 'arn:aws:iam::123456789:role/databricks-external-access'
)
COMMENT 'Production S3 access via IAM role';

-- Step 2: Create an external location using that credential
CREATE EXTERNAL LOCATION IF NOT EXISTS raw_landing
URL 's3://my-data-lake/raw/'
WITH (STORAGE CREDENTIAL aws_prod_cred)
COMMENT 'Raw data landing zone';

# Reading from external location — no mount needed
df = spark.read.parquet("s3://my-data-lake/raw/events/2025/")

# Or use an external volume for file-level access
df = spark.read.parquet("/Volumes/prod_catalog/raw/landing/events/2025/")

The advantage: Storage Credentials use IAM roles (AWS) or managed identities (Azure) instead of raw keys. Access is auditable through Unity Catalog audit logs, and permissions can be granted at the catalog/schema/table level.
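Access to the location itself is also governed with GRANT statements. A sketch against the `raw_landing` location created above; the `data_engineers` group and `ingest_sp` service principal are hypothetical:

```sql
-- Spark SQL dialect
-- Allow a group to read files directly from the external location
GRANT READ FILES ON EXTERNAL LOCATION raw_landing TO `data_engineers`;

-- Allow an ingestion service principal to write to it
GRANT WRITE FILES ON EXTERNAL LOCATION raw_landing TO `ingest_sp`;
```

Compare this with mounts, where anyone in the workspace could read through the mount point using the embedded credentials.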

No-Isolation Shared → Shared Access Mode with Unity Catalog

No-isolation shared clusters ran all users' code in the same process with no credential isolation. A notebook from one user could access another user's secrets. Shared access mode clusters with Unity Catalog fix this with per-user credential isolation.

Feature               | No-Isolation Shared (Legacy) | Shared Access Mode (UC)
----------------------|------------------------------|------------------------
Credential isolation  | ❌ None                      | ✅ Per-user
Unity Catalog support | ❌ No                        | ✅ Full
Init scripts          | ✅ Unrestricted              | ⚠️ Restricted (allowlisted)
Arbitrary JARs        | ✅ Allowed                   | ❌ Blocked
ML runtime libs       | ✅ Full                      | ⚠️ Limited (use single-user for ML)

Migration note: If your workflows depend on custom JARs or unrestricted init scripts, you'll need to move those to single-user clusters. Shared access mode intentionally restricts these for security — that's the point, not a bug.

Time Travel + VACUUM Changes in Runtime 18.0

Runtime 18.0 introduced a subtle but important behavioral change: the default retention period for VACUUM is now enforced more strictly, and Time Travel queries on tables with aggressive vacuum schedules may fail where they previously succeeded.

-- Spark SQL dialect
-- Check your current table retention settings
DESCRIBE DETAIL prod_catalog.events.raw_events;

-- Set explicit retention to avoid surprises
ALTER TABLE prod_catalog.events.raw_events
SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 30 days',
    'delta.deletedFileRetentionDuration' = 'interval 7 days'
);

-- VACUUM with explicit retention — don't rely on defaults
VACUUM prod_catalog.events.raw_events RETAIN 168 HOURS;

This change extends to serverless compute starting January 2026. If you run VACUUM on serverless SQL warehouses, test your Time Travel queries after upgrading — anything relying on the previous default behavior may break silently.
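Whether a Time Travel query can still succeed roughly comes down to whether the version's files were still inside `delta.deletedFileRetentionDuration` the last time VACUUM ran. A back-of-the-envelope check in plain Python; the 7-day retention mirrors the table property set above, and the dates are illustrative (actual enforcement also depends on log retention and checkpointing):

```python
from datetime import datetime, timedelta

def time_travel_safe(query_ts, last_vacuum_ts, retention):
    """Rough heuristic: a version at query_ts is only safe if its files
    were still inside the retention window when VACUUM last ran."""
    return query_ts >= last_vacuum_ts - retention

retention = timedelta(days=7)  # delta.deletedFileRetentionDuration above
last_vacuum = datetime(2026, 1, 15)

# Querying 3 days back: files still within the retention window
print(time_travel_safe(datetime(2026, 1, 12), last_vacuum, retention))  # True

# Querying 10 days back: files likely already removed by VACUUM
print(time_travel_safe(datetime(2026, 1, 5), last_vacuum, retention))   # False
```

If your SLAs require longer Time Travel windows, raise `delta.deletedFileRetentionDuration` explicitly rather than relying on runtime defaults.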

Migration Checklist

Here's the step-by-step for teams migrating off legacy features. Work through this in order — later steps depend on earlier ones.

Phase 1: Audit (Week 1)

  • Run dbutils.fs.ls("dbfs:/") and catalog everything stored on DBFS root
  • Search all notebooks and repos for /mnt/, dbfs:/, hive_metastore. references
  • List all existing mounts: dbutils.fs.mounts()
  • Inventory all Hive Metastore databases: SHOW DATABASES IN hive_metastore
  • Identify clusters running in no-isolation shared mode
  • Document all init scripts and custom JARs on shared clusters
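The repo search in Phase 1 is easy to script. A minimal sketch using only the standard library; the file extensions and patterns are illustrative starting points, so extend them to match your codebase:

```python
import re
from pathlib import Path

# Legacy references worth flagging during the audit
LEGACY_PATTERNS = re.compile(r"(/mnt/|dbfs:/|hive_metastore\.)")

def audit_repo(root):
    """Return (path, line_number, line) for every legacy reference found."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".sql", ".scala", ".ipynb"}:
            continue
        for i, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if LEGACY_PATTERNS.search(line):
                hits.append((str(path), i, line.strip()))
    return hits

hits = audit_repo(".")
print(f"{len(hits)} legacy reference(s) found")
```

Run it from the root of each cloned repo; the hit list is your migration scope for Phases 3 and 4.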

Phase 2: Set Up Unity Catalog (Week 2)

  • Ensure Unity Catalog is enabled at the account level
  • Design your catalog/schema namespace (see UC best practices)
  • Create Storage Credentials for each cloud storage account
  • Create External Locations for all mount targets
  • Create Volumes for file-based workflows (configs, uploads, exports)

Phase 3: Migrate Data (Week 3-4)

  • DEEP CLONE all Hive Metastore tables to Unity Catalog
  • Verify row counts and schema parity for every migrated table
  • Move DBFS root files to Volumes
  • Update all notebook paths from /mnt/ and dbfs:/ to Volume paths or direct cloud URLs
  • Update all SQL references from hive_metastore.db.table to catalog.schema.table

Phase 4: Cut Over (Week 5)

  • Switch shared clusters from no-isolation to shared access mode
  • Move ML workloads requiring custom JARs to single-user clusters
  • Run integration tests on all migrated pipelines
  • Flip the "Disable legacy features" toggle in account settings
  • Monitor for one week, then unmount the legacy mount points and archive DBFS root data

What's Coming in 2026

The Databricks roadmap signals several upcoming changes worth tracking:

  • Governed Tags GA — Tag-based policies for column-level security across catalogs, moving from Public Preview to General Availability
  • Sample Data Explorer GA — Built-in sample datasets for Unity Catalog, replacing the old databricks-datasets DBFS path
  • Delta Sharing URL changes — Share URLs are getting a new format; existing integrations will need updates
  • Serverless VACUUM enforcement — The Runtime 18.0 VACUUM behavior changes fully apply to serverless compute as of January 2026
  • Continued deprecation pressure — Expect Databricks to eventually force-disable legacy features on all accounts, similar to how they handled the DBR 13.x end-of-support

The pattern is clear: Unity Catalog is the only path forward. Every quarter, the gap between legacy and UC-native widens, and the cost of migration goes up. If you're still running Hive Metastore tables in production, the best time to migrate was six months ago. The second best time is now.

If you're looking to explore and validate your migrated datasets quickly, Harbinger Explorer lets you run natural-language queries against CSV exports and uploaded files directly in the browser — useful for spot-checking row counts and schema changes during migration without spinning up a cluster.

Start with the audit. Run SHOW DATABASES IN hive_metastore, count your mounts, and grep your repos for dbfs:/. That's your migration scope — and from there, it's just execution.

