Data Platform Team Structure: Centralized vs Embedded vs Hub-and-Spoke
Most data teams grow by accident. You hire a data engineer, then a BI developer, then an analyst — and suddenly you have a team with no clear structure, unclear ownership, and everyone wondering who's responsible when the pipeline breaks. Intentional data platform team structure is the difference between a team that scales and one that becomes a bottleneck.
TL;DR
Three dominant models exist: Centralized (one team serves everyone), Embedded (data people sit inside business units), and Hub-and-Spoke (a central platform team plus embedded practitioners). Each model has distinct failure modes. Your org size, data maturity, and business structure determine which fits — not industry trends.
Why Team Structure Matters More Than Technology
Companies spend months choosing between Databricks and Snowflake, but often don't think carefully about who owns what data, who's on-call for pipelines, and who gets to say "no" to a data request. The result: duplicated pipelines, inconsistent metric definitions, and a central data team that's permanently overwhelmed.
Team structure determines:
- Who builds and maintains data infrastructure
- Who has authority over data quality and governance
- How quickly business teams get answers to data questions
- Whether data work is a bottleneck or a multiplier
Getting this wrong is expensive. Reorganizing a data team mid-flight — while pipelines are running and dashboards are live — is one of the most disruptive things you can do to a data organization.
The Three Models
Model 1: Centralized Data Team
One data team serves the entire organization. All data engineers, analysts, and data scientists report into a single department, usually led by a CDO, CTO, or VP of Data.
① A single leadership chain gives the data team unified prioritization authority. ② Shared infrastructure means no duplication — one pipeline, one warehouse, one governance model. ③ Business teams submit requests and wait for the central team to deliver.
Strengths:
- Strong data governance — one team enforces standards
- No duplication of infrastructure or tooling
- Easier to build deep platform expertise (fewer generalists required)
- Clear accountability — one team owns data quality
Failure modes:
- Becomes a bottleneck as business demand grows — every team wants data, one team provides it
- Distance from the business leads to slow feedback loops ("this dashboard doesn't reflect how we actually work")
- Prioritization becomes political — whose request gets done first?
- Data engineers become ticket-takers rather than platform builders
When centralized works: Early-stage companies (fewer than ~100 employees), regulated industries where governance must be airtight, organizations with homogeneous data needs across business units.
When it breaks down: When request queues routinely run 2+ weeks, when business units start building shadow analytics in spreadsheets, when the data team can't keep up with business growth.
Model 2: Embedded Analytics
Data practitioners — engineers, analysts, data scientists — sit directly inside business units. The marketing team has its own data analyst; the product team has its own data engineer. There's no central data team, or a very thin one.
① Each business unit owns its own data practitioners — full control, immediate access. ② Data work is prioritized by the business unit itself, not a central team. ③ A thin central infrastructure team may handle shared tooling (warehouse, Airflow), or not exist at all.
Strengths:
- No bottleneck — business units move at their own speed
- Data practitioners build deep domain knowledge in their area
- Tight feedback loops — analysts understand the business context intimately
- Business unit ownership creates clear accountability
Failure modes:
- Metric inconsistency — three different teams define "revenue" three different ways
- Infrastructure duplication — every team rebuilds the same pipeline patterns
- Lonely data people — embedded analysts often lack peers to learn from or review their work
- Governance gaps — nobody's watching cross-team data quality
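To make the metric-inconsistency failure mode concrete, here is a minimal Python sketch. The order records and the three team definitions of "revenue" are hypothetical and invented for illustration. Each definition is defensible on its own, yet run against the same data they produce three different numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    amount: float   # gross order value, tax included
    tax: float      # tax portion of amount
    refunded: bool  # whether the order was later refunded


# Hypothetical sample data that every team reads from the same warehouse.
ORDERS = [
    Order(amount=120.0, tax=20.0, refunded=False),
    Order(amount=60.0, tax=10.0, refunded=True),
    Order(amount=240.0, tax=40.0, refunded=False),
]


def finance_revenue(orders):
    # Finance (hypothetical): net of refunds and tax.
    return sum(o.amount - o.tax for o in orders if not o.refunded)


def marketing_revenue(orders):
    # Marketing (hypothetical): gross bookings, refunds and tax included.
    return sum(o.amount for o in orders)


def product_revenue(orders):
    # Product (hypothetical): excludes refunds but keeps tax.
    return sum(o.amount for o in orders if not o.refunded)


if __name__ == "__main__":
    for team, metric in [("finance", finance_revenue),
                         ("marketing", marketing_revenue),
                         ("product", product_revenue)]:
        print(f"{team}: {metric(ORDERS):.2f}")
```

Finance reports 300, product 360, and marketing 420 for the same three orders. Nothing in the code is buggy; the disagreement lives entirely in the definitions.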
When embedded works: Companies where business units operate very independently, organizations with strong data culture and high data literacy among business leaders, situations where speed-to-insight is the primary concern.
When it breaks down: When the organization tries to answer a cross-functional question and nobody agrees on the numbers, when embedded analysts become isolated and struggle without a technical community, when infrastructure costs balloon from duplication.
Model 3: Hub-and-Spoke
The hybrid model: a central platform team (the hub) builds and maintains shared infrastructure, governance, and standards. Embedded practitioners (the spokes) sit inside business units and use that infrastructure to serve their domain.
① The hub team is responsible for the platform — the infrastructure everyone else runs on. They don't take business requests; they build the tools and systems that make embedded teams effective. ② Embedded practitioners in each spoke use the shared infrastructure and follow shared standards, but are managed and prioritized by their business unit. ③ The hub sets the rules (metric definitions, data contracts, access policies). Spokes work within those rules.
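A data contract can start as little more than a schema the hub publishes and a check every spoke runs before exposing a table. The sketch below is a hypothetical, minimal version in plain Python (3.9+). The `Contract` class, the `validate` function, and the `orders` columns are illustrative assumptions, not the API of any particular contract or testing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical hub-owned contract: required columns, their types, and non-null rules."""
    name: str
    columns: dict[str, type]
    non_nullable: frozenset = frozenset()


def validate(rows: list[dict], contract: Contract) -> list[str]:
    """Return every violation found; an empty list means the rows honor the contract."""
    errors: list[str] = []
    for i, row in enumerate(rows):
        for col, expected in contract.columns.items():
            if col not in row:
                errors.append(f"row {i}: missing column {col!r}")
            elif row[col] is None:
                # Nulls are allowed unless the hub marked the column non-nullable.
                if col in contract.non_nullable:
                    errors.append(f"row {i}: {col!r} must not be null")
            elif not isinstance(row[col], expected):
                errors.append(
                    f"row {i}: {col!r} expected {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return errors


# Published by the hub (hypothetical example).
ORDERS_CONTRACT = Contract(
    name="orders",
    columns={"order_id": str, "amount": float, "currency": str},
    non_nullable=frozenset({"order_id", "amount"}),
)

# Produced by a spoke (hypothetical example).
good = [{"order_id": "A1", "amount": 10.0, "currency": "EUR"}]
bad = [{"order_id": "A2", "amount": "10"}]  # wrong type for amount, no currency

print(validate(good, ORDERS_CONTRACT))  # []
print(validate(bad, ORDERS_CONTRACT))
```

The split of ownership is the point: the hub owns `ORDERS_CONTRACT`, while each spoke owns the code that produces the rows. The check returns a list of violations instead of raising, so a spoke sees every problem at once rather than one failure at a time.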
Strengths:
- Combines centralized governance with embedded speed
- No duplication of core infrastructure
- Business units retain ownership of their data work
- Shared standards mean cross-functional analysis is actually possible
- Platform team can focus on building leverage, not handling tickets
Failure modes:
- Coordination overhead — hub and spokes must communicate regularly or standards drift
- Tension between hub authority and spoke autonomy ("we don't care about your data contract, we need this shipped")
- Hub team can still become a bottleneck if they try to review and approve everything
- Requires management support on both sides to work
When hub-and-spoke works: Mid-to-large organizations (typically 200+ employees with dedicated data headcount), companies with distinct business units that generate their own data, organizations that have already experienced the pain of both centralized and embedded models.
When it struggles: If leadership won't enforce platform standards, the hub becomes toothless. If the hub team is understaffed, embedded practitioners work around them. The model requires genuine organizational commitment, not just a reorg diagram.
Model Comparison
| Dimension | Centralized | Embedded | Hub-and-Spoke |
|---|---|---|---|
| Speed to deliver | Slow (queue-based) | Fast | Medium (depends on hub) |
| Governance quality | High | Low | High (if enforced) |
| Infrastructure duplication | None | High | Low |
| Domain knowledge | Low | High | Medium |
| Metric consistency | High | Low | High |
| Coordination overhead | Low | Medium | High |
| Scales to org growth | Poorly | Moderately | Well |
| Minimum viable team size | Small | Medium | Large |
Key Roles in a Modern Data Platform Team
Regardless of structure, these are the roles that matter most — and what they actually do:
Data Engineer: Builds and maintains pipelines, transforms raw data into usable structures, and manages orchestration and infrastructure. Owns the reliability of data movement.
Analytics Engineer: Works at the intersection of data engineering and analytics. Builds modular, tested dbt models that translate raw data into business-ready tables. Increasingly treated as a priority hire in data-mature orgs.
Data Analyst: Answers business questions using available data and owns dashboards and reports. In embedded models, becomes the primary data point of contact for a business unit.
Data Platform Engineer (or Data Infrastructure Engineer): Focuses on the platform itself: the warehouse, orchestration framework, monitoring, and access control. In hub-and-spoke, this is the hub team. In centralized, this role may not exist separately.
Analytics Engineer vs. Data Engineer: The Blurry Line
This distinction matters. Data engineers typically own ingestion and raw data reliability. Analytics engineers own the transformation layer — the models that business logic is built on. Many teams blur this and suffer for it: engineers who should be building pipelines get pulled into modeling, and analysts who should be asking questions get stuck writing SQL transformations that break in production.
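One way to picture that boundary is as two layers of code with different owners. In this hypothetical Python sketch, `load_raw_orders` stands in for ingestion (the data engineer's job: land records reliably and quarantine malformed ones) and `fct_net_revenue` stands in for a transformation model (the analytics engineer's job: encode the business definition once). The function names, fields, and CSV input are illustrative assumptions.

```python
import csv
import io

REQUIRED = ("order_id", "amount", "tax", "refunded")


def load_raw_orders(csv_text: str) -> tuple[list[dict], list[dict]]:
    """Ingestion layer (data engineer): parse records, quarantining malformed rows."""
    landed, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing or empty required field is quarantined rather than guessed at.
        if any(not row.get(col) for col in REQUIRED):
            rejected.append(row)
            continue
        try:
            landed.append({
                "order_id": row["order_id"],
                "amount": float(row["amount"]),
                "tax": float(row["tax"]),
                "refunded": row["refunded"].strip().lower() == "true",
            })
        except ValueError:
            rejected.append(row)  # unparseable number
    return landed, rejected


def fct_net_revenue(orders: list[dict]) -> float:
    """Transformation layer (analytics engineer): the single agreed revenue definition."""
    return sum(o["amount"] - o["tax"] for o in orders if not o["refunded"])


RAW = (
    "order_id,amount,tax,refunded\n"
    "A1,120,20,false\n"
    "A2,60,10,true\n"
    "A3,abc,0,false\n"
    "A4,240,40,false\n"
)

landed, rejected = load_raw_orders(RAW)
print(len(landed), len(rejected), fct_net_revenue(landed))
```

Row `A3` is quarantined at ingestion instead of being silently coerced, so the revenue model never has to guess what a malformed amount meant.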
The Org Design Decision Nobody Wants to Make
The structural model you choose is a function of three things: org size, data maturity, and political will to enforce standards.
| Org Size or Shape | Data Maturity | Recommended Model |
|---|---|---|
| < 50 people | Low | Centralized (small team, one or two people) |
| 50–200 people | Medium | Centralized with embedded analysts |
| 200+ people | High | Hub-and-Spoke |
| Any size | Low (shadow IT problem) | Centralized + governance investment |
| Decentralized org | Any | Hub-and-Spoke or Embedded with strict standards |
One thing that's consistently underrated: the analytics engineer role. Teams that invest in analytics engineers, who bridge the gap between raw data and business-ready metrics, tend to fare better than teams that split that work between senior data engineers (who are expensive) and junior analysts (who often lack the SQL depth).
Practical Recommendation
Start centralized. Most early data teams should be one or two people building shared infrastructure and answering the highest-priority questions. As business unit data needs diverge and request queues grow, move toward hub-and-spoke. Embedded-only works only if your business units are genuinely independent and you don't need cross-functional metrics.
Don't restructure reactively. Every time your data team is described as "a bottleneck," the instinct is to add headcount or restructure. Usually the better fix is better tooling, clearer prioritization, or a more explicit data contract between the data team and stakeholders. Restructuring is expensive and disruptive — do it when the model is genuinely wrong, not when the team is just understaffed.
For teams exploring how to structure self-serve data access — so business units can query data without filing a ticket — tools like Harbinger Explorer let analysts query their own data directly in the browser using natural-language prompts. It won't replace your platform architecture, but it reduces the request volume that lands on your central team while embedded analysts build up SQL skills over time.
The Right Structure Is the One That Actually Works
Hub-and-spoke is the right answer for most mature data organizations. But "right answer" only matters if leadership supports it, the platform team is funded, and the business units are willing to follow the standards. A well-run centralized team beats a dysfunctional hub-and-spoke every time.
Pick the simplest model that solves your current bottleneck. Then invest in the tooling and practices that make self-service possible — because the best team structure is one that doesn't require the data team to be in the room for every answer.