How to Choose the Right Cloud Database: A Decision Framework for Architects
Choosing a cloud database is one of the highest-leverage architectural decisions a platform team makes. Get it right and you have a foundation that scales, performs, and stays maintainable for years. Get it wrong and you're facing a painful, expensive migration at the worst possible time — when your user base is growing.
This guide provides a structured decision framework and deep-dive comparisons across the major database categories.
The Decision Framework
Before evaluating specific products, answer these five questions:
(Diagram: the five-question decision framework)
Category 1: Relational Databases (OLTP)
Best for: Transactional workloads, complex queries, referential integrity, financial data, user accounts.
Managed Options Comparison
| Database | Best Cloud | Max Storage | Connection Limits | Managed? |
|---|---|---|---|---|
| Amazon Aurora PostgreSQL | AWS | 128 TiB | 5,000 | Full |
| Cloud SQL PostgreSQL | GCP | 64 TB | 10,000 | Full |
| Azure Database for PostgreSQL | Azure | 64 TB | ~1,000 | Full |
| Neon | Multi-cloud | Unlimited | Unlimited (serverless) | Full |
| PlanetScale | Multi-cloud | Unlimited | Unlimited | Full |
When to Use PostgreSQL
PostgreSQL is the default choice for most new applications. It handles:
- JSONB: semi-structured data without schema migrations
- Full-text search: `tsvector` for lightweight search without Elasticsearch
- Time-series: with the `TimescaleDB` extension
- Vector search: with the `pgvector` extension
- Geospatial: with the `PostGIS` extension
The ability to extend PostgreSQL often delays the need for specialised stores by years.
Terraform: Aurora Serverless v2
```hcl
resource "aws_rds_cluster" "harbinger" {
  cluster_identifier          = "harbinger-prod"
  engine                      = "aurora-postgresql"
  engine_version              = "15.4"
  database_name               = "harbinger"
  master_username             = "admin"
  manage_master_user_password = true # Secrets Manager integration

  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 64
  }

  backup_retention_period         = 14
  preferred_backup_window         = "02:00-03:00"
  enabled_cloudwatch_logs_exports = ["postgresql"]

  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "harbinger-prod-final"

  tags = {
    Environment = "production"
  }
}

# Note: Serverless v2 also requires at least one aws_rds_cluster_instance
# with instance_class = "db.serverless" attached to this cluster.
```
Category 2: NoSQL Document / Key-Value
Best for: High-throughput key-value access, flexible schemas, global distribution, event sourcing.
DynamoDB vs Firestore vs MongoDB Atlas
| Feature | DynamoDB | Firestore | MongoDB Atlas |
|---|---|---|---|
| Cloud | AWS only | GCP / multi | Multi-cloud |
| Pricing model | On-demand / provisioned | Per operation | Instance-based |
| Query flexibility | Single-table design required | Rich queries | Full MongoDB query API |
| Real-time subscriptions | Via Streams + Lambda | Native | Change Streams |
| Global distribution | Global Tables | Native multi-region | Atlas Global Clusters |
| Secondary indexes | GSI/LSI | Composite indexes | Any field |
| Max item size | 400 KB | 1 MB | 16 MB |
DynamoDB Single-Table Design Pattern
```python
# Storing multiple entity types in one DynamoDB table
# using composite keys and GSIs
table_schema = {
    "TableName": "harbinger-events",
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # e.g. COUNTRY#US
        {"AttributeName": "SK", "KeyType": "RANGE"},  # e.g. EVENT#2024-01-15#uuid
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "EventTypeIndex",
            "KeySchema": [
                {"AttributeName": "GSI1PK", "KeyType": "HASH"},   # EVENT_TYPE#CONFLICT
                {"AttributeName": "GSI1SK", "KeyType": "RANGE"},  # DATE#2024-01-15
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
    # A real CreateTable call also needs AttributeDefinitions
    # for PK, SK, GSI1PK, and GSI1SK.
}

# Access patterns this enables:
# 1. Get all events for a country: PK=COUNTRY#US, SK begins_with EVENT#
# 2. Get events by type: GSI1PK=EVENT_TYPE#CONFLICT, GSI1SK between dates
# 3. Get single event: PK=COUNTRY#US, SK=EVENT#2024-01-15#abc123
```
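The access patterns above map one-to-one onto DynamoDB Query requests. Here is a minimal sketch of pattern 1, assuming the table schema above; the helper name is hypothetical, and a real boto3 client would execute the returned request:

```python
# Sketch: building the DynamoDB Query request for access pattern 1
# ("all events for a country"). The table name matches the schema above.

def events_for_country_query(country: str, year: str) -> dict:
    """Build a Query request: PK = COUNTRY#<code>, SK begins_with EVENT#<year>."""
    return {
        "TableName": "harbinger-events",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk_prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"COUNTRY#{country}"},
            ":sk_prefix": {"S": f"EVENT#{year}"},
        },
    }

request = events_for_country_query("US", "2024")
# client = boto3.client("dynamodb"); client.query(**request)
```

Because every access pattern is expressed as a key condition (never a scan), cost and latency stay flat as the table grows.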
Category 3: Time-Series Databases
Best for: Metrics, IoT sensor data, financial tick data, monitoring telemetry.
Time-Series Store Comparison
| Store | Write Throughput | Query Language | Retention Policies | Cloud Native |
|---|---|---|---|---|
| Amazon Timestream | Millions/sec | SQL-like | Automatic tiering | AWS |
| InfluxDB Cloud | High | Flux / InfluxQL | Configurable | Multi |
| TimescaleDB | High | Full SQL | SQL-based | Multi |
| Prometheus | Moderate | PromQL | Configurable | Any K8s |
| OpenTSDB | Very high | HTTP API | Configurable | Self-hosted |
TimescaleDB Hypertable
```sql
-- Create hypertable for geopolitical risk scores
CREATE TABLE risk_scores (
    time        TIMESTAMPTZ NOT NULL,
    country     TEXT NOT NULL,
    risk_score  FLOAT NOT NULL,
    event_count INTEGER,
    source      TEXT
);

-- Convert to hypertable partitioned by time
SELECT create_hypertable('risk_scores', 'time', chunk_time_interval => INTERVAL '1 day');

-- Add compression policy (compress chunks older than 7 days)
ALTER TABLE risk_scores SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'country'
);
SELECT add_compression_policy('risk_scores', INTERVAL '7 days');

-- Add retention policy (drop data older than 2 years)
SELECT add_retention_policy('risk_scores', INTERVAL '2 years');

-- Query: risk score trend with gap-filling
SELECT
    time_bucket_gapfill('1 hour', time) AS bucket,
    country,
    AVG(risk_score) AS avg_risk,
    locf(AVG(risk_score)) AS filled_risk  -- last observation carried forward
FROM risk_scores
WHERE country = 'UA'
  -- gapfill needs the time range bounded on both sides
  AND time > NOW() - INTERVAL '30 days'
  AND time < NOW()
GROUP BY bucket, country
ORDER BY bucket;
```
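To make the gap-filling semantics concrete, here is a small stdlib-only Python sketch of last-observation-carried-forward, the behaviour TimescaleDB's `locf()` applies to empty buckets (the helper is illustrative, not TimescaleDB code):

```python
# Sketch: last-observation-carried-forward, as applied by locf()
# to hourly buckets that contain no readings.

def locf(values: list) -> list:
    """Replace None gaps with the most recent non-None value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

hourly_avg = [0.71, None, None, 0.74, None, 0.69]  # None = empty bucket
print(locf(hourly_avg))  # [0.71, 0.71, 0.71, 0.74, 0.74, 0.69]
```

Note that leading gaps stay empty: there is no earlier observation to carry forward until the first reading arrives.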
Category 4: Analytical Databases (OLAP)
Best for: Business intelligence, ad-hoc analysis, aggregations over billions of rows.
OLAP Comparison: BigQuery vs Redshift vs Snowflake vs Databricks
| Feature | BigQuery | Redshift | Snowflake | Databricks SQL |
|---|---|---|---|---|
| Pricing | Per query (TB scanned) | Per cluster-hour | Per credit | Per DBU |
| Serverless | Yes (fully) | Serverless option | Yes (serverless) | Serverless option |
| Storage format | Capacitor (proprietary) | Parquet/ORC | FDN (proprietary) | Delta Lake |
| ML integration | BigQuery ML | Redshift ML | Snowpark ML | MLflow native |
| Streaming ingest | BigQuery Storage Write | Kinesis Firehose | Snowpipe | Autoloader |
| Git integration | Limited | Limited | Limited | Native (DBX) |
| Max query result | 10 GB | Unlimited | Unlimited | Unlimited |
| Time travel | 7 days | 5 days | 90 days | 30 days (Delta) |
Cost Optimisation Strategies
BigQuery:
```sql
-- Partition tables to reduce bytes scanned
CREATE TABLE harbinger_prod.events.geopolitical
PARTITION BY DATE(event_date)
CLUSTER BY country_code, event_type
AS SELECT * FROM staging.raw_events;

-- Use partition filters in queries (avoid full table scans)
SELECT country_code, COUNT(*) AS event_count
FROM harbinger_prod.events.geopolitical
WHERE DATE(event_date) BETWEEN '2024-01-01' AND '2024-03-31'  -- partition pruning
GROUP BY country_code;
```
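Partition pruning translates directly into cost on BigQuery's on-demand model, which bills per byte scanned. A rough estimator, assuming the published on-demand rate of $6.25 per TiB at the time of writing (verify against current pricing before budgeting):

```python
# Sketch: estimating BigQuery on-demand query cost from bytes scanned.
# Rate assumed: $6.25 per TiB (check current pricing for your region).

TIB = 1024**4

def query_cost_usd(bytes_scanned: int, rate_per_tib: float = 6.25) -> float:
    return bytes_scanned / TIB * rate_per_tib

full_scan = query_cost_usd(2 * TIB)        # unpartitioned: scan all 2 TiB
pruned    = query_cost_usd(50 * 1024**3)   # one quarter's partitions: 50 GiB
print(f"${full_scan:.2f} vs ${pruned:.2f}")  # $12.50 vs $0.31
```

The figures are illustrative, but the ratio is the point: pruning a year-sized table down to one quarter's partitions cuts the bill by the same factor as the bytes scanned.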
Category 5: Vector Databases
Best for: Semantic search, RAG (retrieval-augmented generation), recommendation systems, similarity matching.
Vector Database Comparison
| Database | Indexing | Max Vectors | Filtering | Cloud Managed |
|---|---|---|---|---|
| Pinecone | HNSW | Billions | Yes (metadata) | Yes |
| Weaviate | HNSW + BM25 | Billions | Yes | Yes (Cloud) |
| Qdrant | HNSW | Billions | Yes | Yes (Cloud) |
| pgvector | HNSW / IVFFlat | ~10M (practical) | Full SQL | Via PostgreSQL |
| Chroma | HNSW | Millions | Yes | Self-hosted |
| Milvus | Multiple | Billions | Yes | Zilliz Cloud |
pgvector for Intelligence Platforms
```sql
-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Store article embeddings alongside metadata
CREATE TABLE article_embeddings (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    article_id TEXT NOT NULL,
    title      TEXT NOT NULL,
    country    TEXT,
    event_date DATE,
    embedding  vector(1536),  -- OpenAI text-embedding-3-small dimensions
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast approximate nearest-neighbour search
CREATE INDEX ON article_embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Semantic search: find similar geopolitical events
SELECT
    article_id,
    title,
    country,
    event_date,
    1 - (embedding <=> $1::vector) AS similarity
FROM article_embeddings
WHERE country = 'UA'            -- pre-filter by country
  AND event_date > '2024-01-01'
ORDER BY embedding <=> $1::vector
LIMIT 20;
```
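The `<=>` operator used with `vector_cosine_ops` computes cosine distance, so the query derives similarity as 1 minus the distance. A stdlib-only sketch of the same metric:

```python
# Sketch: cosine distance, the metric behind pgvector's <=> operator
# when the index uses vector_cosine_ops. Similarity = 1 - distance.
import math

def cosine_distance(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

q   = [1.0, 0.0, 0.0]
doc = [1.0, 1.0, 0.0]
print(round(1 - cosine_distance(q, doc), 4))  # similarity: 0.7071
```

Identical vectors have distance 0 (similarity 1); orthogonal vectors have distance 1 (similarity 0), which is why `ORDER BY embedding <=> $1` returns the closest matches first.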
The Polyglot Persistence Pattern
Most production platforms use multiple database types — the key is matching each store to its strengths:
(Diagram: polyglot persistence architecture, one store per access pattern)
Decision Checklist
Use this checklist before committing to a database:
Workload characteristics
- Primary access pattern documented (point lookup / range scan / aggregation)
- Expected read/write ratio
- Peak QPS estimates (p50, p99)
- Data volume at launch and 2-year projection
Consistency requirements
- Do you need ACID transactions? (choose RDBMS)
- Can you tolerate eventual consistency? (enables NoSQL/distributed)
- Do you need multi-region writes? (CockroachDB, DynamoDB Global Tables)
Operational
- Managed service vs. self-hosted?
- Team familiarity with query language?
- Migration path if requirements change?
- Cost model understood (per query vs. per hour vs. per row)?
Conclusion
The right cloud database decision is always contextual. Resist the temptation to standardise on one database for everything — the performance gap between an appropriately chosen store and a general-purpose one can be 10–100x.
Platforms processing diverse data types — like Harbinger Explorer, which correlates news events, risk scores, time-series indicators, and semantic signals — often run 3–5 different database types in production, each optimised for its specific access pattern.
Try Harbinger Explorer free for 7 days — built on a polyglot persistence architecture designed for intelligence-grade performance.