Engineering · Apr 3, 2026

Data Platform Team Structure: Centralized vs Embedded vs Hub-and-Spoke

8 min read · Tags: data platform, data team structure, centralized data team, embedded analytics, hub-and-spoke, data engineering, analytics engineering, data governance

Most data teams grow by accident. You hire a data engineer, then a BI developer, then an analyst — and suddenly you have a team with no clear structure, unclear ownership, and everyone wondering who's responsible when the pipeline breaks. Intentional data platform team structure is the difference between a team that scales and one that becomes a bottleneck.

TL;DR

Three dominant models exist: Centralized (one team serves everyone), Embedded (data people sit inside business units), and Hub-and-Spoke (a central platform team plus embedded practitioners). Each model has distinct failure modes. Your org size, data maturity, and business structure determine which fits — not industry trends.

Why Team Structure Matters More Than Technology

Companies spend months choosing between Databricks and Snowflake but rarely give the same attention to who owns which data, who is on call for pipelines, and who gets to say "no" to a data request. The result: duplicated pipelines, inconsistent metric definitions, and a central data team that's permanently overwhelmed.

Team structure determines:

Who builds and maintains data infrastructure
Who has authority over data quality and governance
How quickly business teams get answers to data questions
Whether data work is a bottleneck or a multiplier

Getting this wrong is expensive. Reorganizing a data team mid-flight — while pipelines are running and dashboards are live — is one of the most disruptive things you can do to a data organization.

The Three Models

Model 1: Centralized Data Team

One data team serves the entire organization. All data engineers, analysts, and data scientists report into a single department, which often rolls up to a CDO, CTO, or VP of Data.

[Diagram: centralized data team]

① A single leadership chain gives the data team unified prioritization authority.
② Shared infrastructure means no duplication: one pipeline, one warehouse, one governance model.
③ Business teams submit requests and wait for the central team to deliver.

Strengths:

Strong data governance — one team enforces standards
No duplication of infrastructure or tooling
Easier to build deep platform expertise (fewer generalists required)
Clear accountability — one team owns data quality

Failure modes:

Becomes a bottleneck as business demand grows — every team wants data, one team provides it
Distance from the business leads to slow feedback loops ("this dashboard doesn't reflect how we actually work")
Prioritization becomes political — whose request gets done first?
Data engineers become ticket-fillers rather than platform builders

When centralized works: Early-stage companies (fewer than ~100 employees), regulated industries where governance must be airtight, organizations with homogeneous data needs across business units.

When it breaks down: When request queues routinely run 2+ weeks, when business units start building shadow analytics in spreadsheets, when the data team can't keep up with business growth.
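
The "2+ weeks" signal is easy to measure if requests are tracked anywhere at all. The following is a minimal sketch, assuming a hypothetical list of open requests exported from a ticketing tool; the team names, dates, and the `queue_health` helper are all illustrative, not part of any real system.

```python
from datetime import date
from statistics import median

# Hypothetical open requests: (requesting team, date the request was filed).
open_requests = [
    ("marketing", date(2026, 3, 2)),
    ("finance", date(2026, 3, 20)),
    ("product", date(2026, 3, 10)),
    ("sales", date(2026, 3, 30)),
]

BOTTLENECK_DAYS = 14  # the "2+ weeks" signal described above


def queue_health(requests, today, threshold_days=BOTTLENECK_DAYS):
    """Summarize how long open requests have been waiting."""
    ages = [(today - filed).days for _, filed in requests]
    stale = [team for team, filed in requests if (today - filed).days >= threshold_days]
    return {
        "median_age_days": median(ages),
        "stale_teams": sorted(stale),
        "bottleneck": median(ages) >= threshold_days,
    }


report = queue_health(open_requests, today=date(2026, 4, 3))
print(report)
```

Using the median rather than the oldest ticket is a deliberate choice here: one forgotten request should not trigger a reorg conversation, but a queue where the typical request waits two weeks probably should.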

Model 2: Embedded Analytics

Data practitioners (engineers, analysts, data scientists) sit directly inside business units. The marketing team has its own data analyst; the product team has its own data engineer. There is either no central data team or only a very thin one.

[Diagram: embedded analytics]

① Each business unit owns its own data practitioners: full control, immediate access.
② Data work is prioritized by the business unit itself, not a central team.
③ A thin central infrastructure team may handle shared tooling (warehouse, Airflow), or not exist at all.

Strengths:

No bottleneck — business units move at their own speed
Data practitioners build deep domain knowledge in their area
Tight feedback loops — analysts understand the business context intimately
Business unit ownership creates clear accountability

Failure modes:

Metric inconsistency — three different teams define "revenue" three different ways
Infrastructure duplication — every team rebuilds the same pipeline patterns
Lonely data people — embedded analysts often lack peers to learn from or review their work
Governance gaps — nobody's watching cross-team data quality

When embedded works: Companies where business units operate very independently, organizations with strong data culture and high data literacy among business leaders, situations where speed-to-insight is the primary concern.

When it breaks down: When the organization tries to answer a cross-functional question and nobody agrees on the numbers, when embedded analysts become isolated and struggle without a technical community, when infrastructure costs balloon from duplication.
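
Metric inconsistency can be surfaced before it turns into an argument in a cross-functional meeting. Here is a rough sketch, assuming each embedded team can list its metric definitions as SQL-like expressions; the `team_metrics` data and both helper functions are hypothetical, and the normalization is deliberately crude.

```python
from collections import defaultdict

# Hypothetical metric definitions collected from each embedded team.
team_metrics = {
    "marketing": {"revenue": "SUM(order_total)"},
    "finance": {"revenue": "SUM(order_total) - SUM(refunds)"},
    "product": {"revenue": "sum(order_total)", "active_users": "COUNT(DISTINCT user_id)"},
}


def normalize(expression):
    """Crude normalization: ignore case and whitespace differences only."""
    return "".join(expression.lower().split())


def find_conflicts(metrics_by_team):
    """Return metrics that have more than one distinct definition."""
    definitions = defaultdict(lambda: defaultdict(list))
    for team, metrics in metrics_by_team.items():
        for name, expression in metrics.items():
            definitions[name][normalize(expression)].append(team)
    return {
        name: {expr: sorted(teams) for expr, teams in variants.items()}
        for name, variants in definitions.items()
        if len(variants) > 1
    }


conflicts = find_conflicts(team_metrics)
for metric, variants in conflicts.items():
    print(metric, variants)
```

In this example marketing and product agree on "revenue" once casing is ignored, while finance subtracts refunds. A check like this only finds the disagreement; deciding which definition wins is still an organizational decision.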

Model 3: Hub-and-Spoke

The hybrid model: a central platform team (the hub) builds and maintains shared infrastructure, governance, and standards. Embedded practitioners (the spokes) sit inside business units and use that infrastructure to serve their domain.

[Diagram: hub-and-spoke]

① The hub team is responsible for the platform, the infrastructure everyone else runs on. They don't take business requests; they build the tools and systems that make embedded teams effective.
② Embedded practitioners in each spoke use the shared infrastructure and follow shared standards, but are managed and prioritized by their business unit.
③ The hub sets the rules (metric definitions, data contracts, access policies). Spokes work within those rules.

Strengths:

Combines centralized governance with embedded speed
No duplication of core infrastructure
Business units retain ownership of their data work
Shared standards mean cross-functional analysis is actually possible
Platform team can focus on building leverage, not handling tickets

Failure modes:

Coordination overhead — hub and spokes must communicate regularly or standards drift
Tension between hub authority and spoke autonomy ("we don't care about your data contract, we need this shipped")
Hub team can still become a bottleneck if they try to review and approve everything
Requires management support on both sides to work

When hub-and-spoke works: Mid-to-large organizations (typically 200+ employees with dedicated data headcount), companies with distinct business units that generate their own data, organizations that have already experienced the pain of both centralized and embedded models.

When it struggles: If leadership won't enforce platform standards, the hub becomes toothless. If the hub team is understaffed, embedded practitioners work around them. The model requires genuine organizational commitment, not just a reorg diagram.
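
One way to keep hub standards from becoming advisory is to check them automatically rather than by review. The sketch below assumes a hypothetical contract format (column name mapped to an expected Python type) and a hypothetical `contract_violations` helper; real data contracts usually cover more, such as nullability, freshness, and ownership.

```python
# Hypothetical contract published by the hub for a spoke-owned table.
ORDERS_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "order_total": float,
}


def contract_violations(rows, contract):
    """Return human-readable violations; an empty list means the batch passes."""
    violations = []
    for index, row in enumerate(rows):
        for column, expected_type in contract.items():
            if column not in row:
                violations.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                actual = type(row[column]).__name__
                violations.append(
                    f"row {index}: '{column}' is {actual}, expected {expected_type.__name__}"
                )
    return violations


batch = [
    {"order_id": "A-1", "customer_id": "C-9", "order_total": 42.5},
    {"order_id": "A-2", "order_total": "17.00"},  # missing customer_id, total is a string
]

problems = contract_violations(batch, ORDERS_CONTRACT)
for problem in problems:
    print(problem)
```

Running a check like this in the spoke's own pipeline also addresses the bottleneck risk noted above: the hub defines the contract once, and spokes get feedback without waiting for a hub review.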

Model Comparison

Dimension	Centralized	Embedded	Hub-and-Spoke
Speed to deliver	Slow (queue-based)	Fast	Medium (depends on hub)
Governance quality	High	Low	High (if enforced)
Infrastructure duplication	None	High	Low
Domain knowledge	Low	High	Medium
Metric consistency	High	Low	High
Coordination overhead	Low	Medium	High
Scales to org growth	Poorly	Moderately	Well
Minimum viable team size	Small	Medium	Large

Key Roles in a Modern Data Platform Team

Regardless of structure, these are the roles that matter most — and what they actually do:

Data Engineer: Builds and maintains pipelines, transforms raw data into usable structures, manages orchestration and infrastructure. Owns the reliability of data movement.

Analytics Engineer: Works at the intersection of data engineering and analytics. Builds modular, tested dbt models. Translates raw data into business-ready tables. Increasingly the most important hire for data-mature orgs.

Data Analyst: Answers business questions using available data. Owns dashboards and reports. In embedded models, becomes the primary data point of contact for a business unit.

Data Platform Engineer (or Data Infrastructure Engineer): Focuses on the platform itself, meaning the warehouse, orchestration framework, monitoring, and access control. In hub-and-spoke, this is the hub team. In centralized, this role may not exist separately.

Analytics Engineer vs. Data Engineer: The Blurry Line

This distinction matters. Data engineers typically own ingestion and raw data reliability. Analytics engineers own the transformation layer — the models that business logic is built on. Many teams blur this and suffer for it: engineers who should be building pipelines get pulled into modeling, and analysts who should be asking questions get stuck writing SQL transformations that break in production.

The Org Design Decision Nobody Wants to Make

The structural model you choose is a function of three things: org size, data maturity, and political will to enforce standards.

Org Size	Data Maturity	Recommended Model
< 50 people	Low	Centralized (small team, one or two people)
50–200 people	Medium	Centralized with embedded analysts
200+ people	High	Hub-and-Spoke
Any size	Low (shadow IT problem)	Centralized + governance investment
Decentralized org	Any	Hub-and-Spoke or Embedded with strict standards
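
The table can be read as a small lookup. Here is a sketch of that reading, with several assumptions labeled in the code: the function name and parameters are hypothetical, the special-case rows are checked first, an org of exactly 200 people is treated as part of the 50–200 band, and combinations the table does not list return `None` rather than a guess.

```python
def recommend_model(headcount, maturity, decentralized=False, shadow_it=False):
    """Look up the recommendation table above.

    Returns None for size/maturity combinations the table does not cover.
    The precedence of the special-case rows (decentralized, then shadow IT)
    is an assumption of this sketch, not something the table specifies.
    """
    if decentralized:
        return "Hub-and-Spoke or Embedded with strict standards"
    if maturity == "low" and shadow_it:
        return "Centralized + governance investment"
    if headcount < 50 and maturity == "low":
        return "Centralized (small team, one or two people)"
    # Assumption: exactly 200 people falls in the 50-200 band.
    if 50 <= headcount <= 200 and maturity == "medium":
        return "Centralized with embedded analysts"
    if headcount > 200 and maturity == "high":
        return "Hub-and-Spoke"
    return None


print(recommend_model(30, "low"))
print(recommend_model(120, "medium"))
print(recommend_model(500, "high"))
print(recommend_model(500, "medium"))  # not covered by the table
```

The `None` case is the interesting one. A 500-person company with medium data maturity is not in the table, which is a reminder that the third input, political will to enforce standards, is what decides the gaps.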

One consistently underrated factor is the analytics engineer role. Teams that invest in analytics engineers, people who can bridge the gap between raw data and business-ready metrics, tend to outperform teams that try to split the work between senior data engineers (who are expensive) and junior analysts (who lack the SQL depth).

Practical Recommendation

Start centralized. Most early data teams should be one or two people building shared infrastructure and answering the highest-priority questions. As business unit data needs diverge and request queues grow, move toward hub-and-spoke. Embedded-only works only if your business units are genuinely independent and you don't need cross-functional metrics.

Don't restructure reactively. Every time your data team is described as "a bottleneck," the instinct is to add headcount or restructure. Usually the better fix is better tooling, clearer prioritization, or a more explicit data contract between the data team and stakeholders. Restructuring is expensive and disruptive — do it when the model is genuinely wrong, not when the team is just understaffed.

For teams exploring how to structure self-serve data access — so business units can query data without filing a ticket — tools like Harbinger Explorer let analysts query their own data directly in the browser using natural-language prompts. It won't replace your platform architecture, but it reduces the request volume that lands on your central team while embedded analysts build up SQL skills over time.

The Right Structure Is the One That Actually Works

Hub-and-spoke is the right answer for most mature data organizations. But "right answer" only matters if leadership supports it, the platform team is funded, and the business units are willing to follow the standards. A well-run centralized team beats a dysfunctional hub-and-spoke every time.

Pick the simplest model that solves your current bottleneck. Then invest in the tooling and practices that make self-service possible — because the best team structure is one that doesn't require the data team to be in the room for every answer.
