Data Strategy for Cloud Migrations: A Platform Engineer's Playbook
Cloud migration projects fail more often at the data layer than anywhere else. Networking, compute, and IAM get thorough attention — but data is often treated as an afterthought, moved in bulk the night before cutover, and prayed over. This guide exists to change that pattern.
Whether you're lifting a 50TB data warehouse from on-prem Oracle to BigQuery, re-platforming a Kafka estate from bare metal to Amazon MSK, or migrating a fleet of Spark ETL jobs to Databricks on Azure, the underlying data strategy questions remain the same: When do you move what? How do you validate it? And what's your rollback plan?
The Four Phases of a Data Migration
Before writing a single line of Terraform, map your migration to four discrete phases. Skipping phases is how projects end up with phantom data loss at 2 AM.
Phase 1 — Inventory & Classification
Every byte of data your systems produce falls into one of four categories:
| Classification | Description | Migration Risk | Example |
|---|---|---|---|
| Hot | Actively read/written, latency-sensitive | High | OLTP tables, event streams |
| Warm | Read frequently, written in batch | Medium | Aggregated reports, feature stores |
| Cold | Archived, rarely read | Low | Compliance archives, raw event logs |
| Transient | Cache, temp tables, in-flight state | N/A (rebuild) | Redis caches, Kafka consumer offsets |
Classify before you move anything. Hot data needs a live replication strategy. Cold data can be bulk-copied off-hours. Transient data is rebuilt on the target.
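As a sketch, the classification rules above can be encoded as a small heuristic. The thresholds and the `rebuildable` flag here are illustrative assumptions, not fixed rules; in practice the inputs come from query logs or a catalog API:

```python
def classify(last_read_days: int, last_write_days: int, rebuildable: bool) -> str:
    """Rough heuristic mapping access recency to the four classes above.
    Thresholds are illustrative; tune them per estate."""
    if rebuildable:
        return "transient"   # caches, offsets: rebuild on the target
    if last_read_days <= 1 and last_write_days <= 1:
        return "hot"         # needs a live replication strategy
    if last_read_days <= 7:
        return "warm"        # batch copy plus delta sync
    return "cold"            # bulk copy off-hours
```

Feeding this from the inventory turns classification into a repeatable script instead of a one-off spreadsheet exercise.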
Use a combination of query logs, column-level lineage tools, and manual interviews with data consumers to produce this inventory. Harbinger Explorer can accelerate this by scanning metadata across multi-cloud estates and surfacing dependency graphs automatically.
Phase 2 — Dual-Write & Shadow Mode
For hot data, never hard-cutover. Instead, enter a dual-write phase where writes land on both source and target systems simultaneously, and reads continue from the source.
```yaml
# Example: Debezium CDC connector for dual-write shadow replication
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: orders-cdc-shadow
  labels:
    strimzi.io/cluster: migration-connect
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  tasksMax: 4
  config:
    database.hostname: source-postgres.internal
    database.port: "5432"
    database.user: debezium_reader
    database.password: ${env:DEBEZIUM_PASSWORD}
    database.dbname: orders
    table.include.list: public.orders,public.order_items,public.customers
    slot.name: debezium_shadow_slot
    publication.autocreate.mode: filtered
    # Write to shadow topics for target ingestion
    topic.prefix: shadow.migration
    transforms: Reroute
    transforms.Reroute.type: io.debezium.transforms.ByLogicalTableRouter
    transforms.Reroute.topic.regex: 'shadow\.migration\.public\.(.*)'
    transforms.Reroute.topic.replacement: 'target.ingest.$1'
```
During shadow mode, run reconciliation jobs on a schedule (hourly at minimum) that compare row counts, checksums, and sampled records between source and target.
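A minimal order-insensitive checksum for such a reconciliation job might look like this in plain Python. This is a sketch; production jobs would typically push the hashing down into the source and target engines as SQL:

```python
import hashlib

def table_checksum(rows) -> str:
    """Order-insensitive table checksum: hash each row, XOR the digests.
    `rows` is an iterable of (pk, payload) tuples fetched from either side."""
    acc = 0
    for pk, payload in rows:
        digest = hashlib.sha256(f"{pk}|{payload}".encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return f"{acc:016x}"

def reconcile(source_rows, target_rows) -> bool:
    """True when both sides hash to the same value."""
    return table_checksum(source_rows) == table_checksum(target_rows)
```

Because XOR is commutative, the checksum is stable regardless of fetch order, so source and target can be scanned with different parallelism.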
Phase 3 — Cutover & Validation
Cutover is not a moment — it's a window. Define it explicitly in your runbook:
```bash
#!/bin/bash
# migration-cutover.sh — execute in a tmux session with logging
set -euo pipefail

LOG_FILE="/var/log/migration/cutover-$(date +%Y%m%d-%H%M%S).log"
mkdir -p "$(dirname "$LOG_FILE")"
echo "=== CUTOVER START: $(date -u) ===" | tee -a "$LOG_FILE"

# 1. Drain write traffic to source
echo "Step 1: Enabling write-drain flag in feature flag service..." | tee -a "$LOG_FILE"
curl -fsS -X PATCH https://flags.internal/v1/flags/db_write_drain \
  -H "Content-Type: application/json" -d '{"enabled": true}' | tee -a "$LOG_FILE"

# 2. Wait for in-flight transactions to settle
echo "Step 2: Waiting 30s for in-flight writes..." | tee -a "$LOG_FILE"
sleep 30

# 3. Final reconciliation check — set -e aborts the script on divergence
echo "Step 3: Running final reconciliation..." | tee -a "$LOG_FILE"
python3 /opt/migration/reconcile.py --source postgres://source-db \
  --target bigquery://project/dataset --fail-on-diff

# 4. Switch DNS / connection strings
echo "Step 4: Updating connection string secret in Vault..." | tee -a "$LOG_FILE"
vault kv put secret/db/orders \
  connection_string="postgresql://target-db.internal:5432/orders" \
  migrated_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# 5. Enable reads from target
echo "Step 5: Flipping read flag..." | tee -a "$LOG_FILE"
curl -fsS -X PATCH https://flags.internal/v1/flags/db_read_source \
  -H "Content-Type: application/json" -d '{"enabled": false}' | tee -a "$LOG_FILE"

echo "=== CUTOVER COMPLETE: $(date -u) ===" | tee -a "$LOG_FILE"
```
Phase 4 — Decommission & Observability
Don't decommission source systems until you have 30 days of clean production data flowing through the target. Set up cross-system observability:
```hcl
# Terraform: CloudWatch metric alarm for post-migration data quality
resource "aws_cloudwatch_metric_alarm" "data_freshness" {
  alarm_name          = "migration-data-freshness-breach"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MaxAgeMinutes"
  namespace           = "DataPlatform/Migration"
  period              = 300
  statistic           = "Maximum"
  threshold           = 15
  alarm_description   = "Data freshness degraded post-migration — possible pipeline stall"
  alarm_actions       = [aws_sns_topic.oncall.arn]

  dimensions = {
    Dataset = "orders"
    Stage   = "production"
  }
}
```
Schema Evolution Strategy
Schema changes during migration are a complexity multiplier. Every change becomes three problems at once: the source schema, the migration mapping, and the target schema.
Use a Schema Registry
Whether you're on Avro, Protobuf, or JSON Schema, run a schema registry on both sides of the migration and enforce backward compatibility:
```properties
# confluent schema-registry config snippet (schema-registry.properties)
schema.compatibility.level=BACKWARD_TRANSITIVE
```
BACKWARD_TRANSITIVE means consumers using the newest schema version can read data written with every earlier version, which is critical when source and target consumers coexist during shadow mode.
Column Mapping Patterns
| Source Pattern | Target Pattern | Migration Tool |
|---|---|---|
| Camel case columns | Snake case | dbt rename macro |
| Implicit nullability | Explicit NOT NULL | Schema migration script |
| NUMERIC(18,4) | DECIMAL(18,4) | Type casting in Spark |
| Timestamp with TZ | UTC-normalized TIMESTAMP | Spark to_utc_timestamp |
| Composite PK | Surrogate key + composite index | dbt snapshot |
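The first mapping in the table doesn't require dbt; a small sketch of a camelCase-to-snake_case rename for Spark DataFrames (the helper names are illustrative):

```python
import re

def to_snake_case(name: str) -> str:
    """camelCase / PascalCase -> snake_case,
    e.g. orderTotalAmount -> order_total_amount."""
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s).lower()

def rename_columns(df):
    """Apply the mapping to a Spark DataFrame via toDF."""
    return df.toDF(*[to_snake_case(c) for c in df.columns])
```

Keeping the rename in one shared helper means the migration mapping and the target schema can never drift apart on naming.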
Data Validation Framework
The gold standard is a three-tier validation approach:
- Structural — Schema matches, no missing columns, types compatible
- Statistical — Row counts, null rates, value distributions within tolerance
- Semantic — Business rules hold (e.g., order total = sum of line items)
```python
# Lightweight reconciliation using PySpark (a Great Expectations alternative)
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, sum as spark_sum

spark = SparkSession.builder.appName("MigrationReconcile").getOrCreate()

source_df = spark.read.format("jdbc").options(
    url="jdbc:postgresql://source-db:5432/orders",
    dbtable="public.orders",
    user="reader",
    # dbutils is Databricks-specific; swap in your own secret store elsewhere
    password=dbutils.secrets.get("migration", "source-db-password"),
).load()

target_df = spark.read.format("bigquery").option("table", "project.dataset.orders").load()

# Structural check: identical column sets
assert set(source_df.columns) == set(target_df.columns), "Column mismatch!"

# Statistical check: row counts and a key aggregate within tolerance
source_stats = source_df.agg(
    count("*").alias("row_count"),
    spark_sum("total_amount").alias("total_amount_sum"),
).collect()[0]
target_stats = target_df.agg(
    count("*").alias("row_count"),
    spark_sum("total_amount").alias("total_amount_sum"),
).collect()[0]

tolerance = 0.001  # 0.1% tolerance
row_diff_pct = abs(source_stats["row_count"] - target_stats["row_count"]) / source_stats["row_count"]
sum_diff_pct = abs(source_stats["total_amount_sum"] - target_stats["total_amount_sum"]) / source_stats["total_amount_sum"]

assert row_diff_pct < tolerance, f"Row count divergence: {row_diff_pct:.4%}"
assert sum_diff_pct < tolerance, f"Sum divergence: {sum_diff_pct:.4%}"
print("✅ Reconciliation passed")
```
Rollback Planning
Every migration phase needs a rollback procedure documented before cutover begins. A rollback that hasn't been rehearsed in a staging environment is not a rollback plan — it's a wish.
| Phase | Rollback Trigger | Rollback Action | RTO |
|---|---|---|---|
| Shadow mode | Replication lag > 5min | Disable CDC, fix connector | 10min |
| Cutover | Error rate > 1% | Revert feature flags | 2min |
| Post-cutover | Data quality breach | Re-enable source, re-open shadow | 15min |
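The trigger column lends itself to automation. A sketch of encoding those thresholds so a script, rather than a stressed human, makes the rollback call (phase names and thresholds mirror the table above; wiring to real metrics is left out):

```python
def should_roll_back(phase: str, *, lag_seconds: float = 0.0,
                     error_rate: float = 0.0, dq_breach: bool = False) -> bool:
    """Encode the rollback triggers from the runbook table.
    Inputs would come from monitoring; here they are passed in directly."""
    if phase == "shadow" and lag_seconds > 300:      # replication lag > 5 min
        return True
    if phase == "cutover" and error_rate > 0.01:     # error rate > 1%
        return True
    if phase == "post-cutover" and dq_breach:        # data quality breach
        return True
    return False
```

Evaluating this on every monitoring tick removes the "is it bad enough yet?" debate from the incident call.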
Observability Stack for Migration Projects
Post-migration, your observability should answer: "Is the new platform delivering data with the same quality and freshness as the old one?"
Instrument three signal types:
- Pipeline latency — p50/p95/p99 end-to-end job duration
- Data freshness — max age of the latest record in critical tables
- Error rate — failed job runs as a percentage of total runs
If you're managing multiple migrated workloads across teams, a platform-level view becomes essential. Tools like Harbinger Explorer give you a unified operational view across cloud data assets without requiring per-team instrumentation overhead.
Conclusion
A cloud migration data strategy isn't a one-time document — it's a living operational practice spanning months of careful, phased execution. The teams that succeed treat data migration as a product delivery: they define acceptance criteria, run automated validation, and plan for failure.
The key takeaways:
- Classify data before moving any of it
- Use dual-write shadow mode for hot data; never hard-cutover
- Automate reconciliation — manual spot checks don't scale
- Define rollback procedures and rehearse them
- Stay in observability mode for 30 days post-cutover before decommissioning
Try Harbinger Explorer free for 7 days — get unified visibility across your cloud data estate, track migration progress across teams, and catch data quality issues before they reach production.