Data Strategy

Data Governance Framework: A Practical Guide for Data Teams

10 min read · Tags: data governance, data quality, data strategy, data catalog, data ownership, compliance, metadata management


You've been there: a stakeholder asks where a number comes from, and three people give three different answers. Or someone discovers that the "customer" table in the warehouse has four different definitions across teams. Or a GDPR request comes in and nobody knows which systems hold PII.

A data governance framework solves these problems — not with bureaucracy, but with clear ownership, shared definitions, and tooling that enforces the rules. This guide shows you how to build one that actually works.

What Data Governance Really Means

Data governance is the system of decision rights, policies, and processes that determines how data is collected, stored, used, and retired across an organization. Strip away the consultant-speak and it boils down to three questions:

  1. Who owns this data? (Accountability)
  2. What are the rules? (Policies)
  3. How do we enforce them? (Processes + Tooling)

If you can answer those three questions for every dataset in your organization, you have governance. Everything else is implementation detail.

What Governance Is NOT

| Misconception | Reality |
| --- | --- |
| A one-time project | An ongoing operating model |
| An IT-only responsibility | A cross-functional discipline |
| A tool you can buy | A system of people, processes, and tools |
| Locking data down | Enabling safe, fast access to data |
| Writing 200-page policy docs nobody reads | Lightweight, enforceable rules embedded in workflows |

The Four Pillars of a Data Governance Framework

Every effective governance framework rests on four pillars. Skip one and the whole thing wobbles.

1. Data Ownership & Stewardship

Someone must be accountable for every dataset. Not "the data team" — a specific person.

Roles you actually need:

| Role | Responsibility | Who Fills It |
| --- | --- | --- |
| Data Owner | Business accountability — defines what the data means, who can access it, retention rules | Domain/business lead (e.g., Head of Finance owns financial data) |
| Data Steward | Day-to-day governance — maintains metadata, resolves quality issues, enforces policies | Senior analyst or engineer within the domain |
| Data Engineer | Technical implementation — pipelines, access controls, quality checks | Engineering team |
| Data Governance Lead | Cross-cutting coordination — resolves conflicts, maintains standards | Dedicated role or part of a data platform team |

The key insight: ownership lives with the business, not with IT. The finance team owns financial data. The marketing team owns campaign data. Engineers build the infrastructure; they don't define what "active customer" means.

2. Policies & Standards

Policies are the rules. Standards are how you implement them. Keep both short and enforceable.

Core policies every team needs:

  • Data Classification Policy — What sensitivity levels exist (public, internal, confidential, restricted) and how each is handled
  • Access Policy — Who can access what, how access is requested and revoked
  • Retention Policy — How long data is kept and when it's deleted
  • Quality Policy — What quality thresholds exist and what happens when they're breached
  • Lineage Policy — How upstream/downstream dependencies are tracked

Here's a practical example of a data classification standard implemented as a SQL comment convention:

-- PostgreSQL: Column-level classification using COMMENT
COMMENT ON COLUMN customers.email IS 'classification:confidential;pii:true;retention:3y';
COMMENT ON COLUMN customers.country IS 'classification:internal;pii:false;retention:indefinite';
COMMENT ON COLUMN orders.total_amount IS 'classification:internal;pii:false;retention:7y';

This is lightweight but machine-readable. A downstream scanner can parse these comments and enforce access rules automatically.
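As a sketch of what such a scanner might look like, here is a minimal Python parser for the semicolon-delimited convention above (the function name and normalization rules are illustrative assumptions, not any particular tool's API):

```python
# Sketch of a comment-convention scanner: parses 'key:value' pairs
# separated by ';' into a tag dictionary. Hypothetical helper, not a
# real library API.

def parse_classification(comment: str) -> dict:
    """Parse a comment like
    'classification:confidential;pii:true;retention:3y' into a dict."""
    tags = {}
    for pair in comment.split(";"):
        key, _, value = pair.partition(":")
        if key:
            tags[key.strip()] = value.strip()
    # Normalize the pii flag to a real boolean for downstream checks
    if "pii" in tags:
        tags["pii"] = tags["pii"].lower() == "true"
    return tags

tags = parse_classification("classification:confidential;pii:true;retention:3y")
print(tags)  # {'classification': 'confidential', 'pii': True, 'retention': '3y'}
```

A nightly job could read these tags via `pg_catalog` and flag any confidential column that lacks a matching access rule.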

3. Data Quality

Governance without quality enforcement is just documentation. You need automated checks that run in your pipelines.

The five dimensions of data quality:

| Dimension | Question It Answers | Example Check |
| --- | --- | --- |
| Completeness | Is all expected data present? | NOT NULL rate > 99% on required fields |
| Accuracy | Does the data reflect reality? | Revenue totals match source system within 0.1% |
| Consistency | Do related datasets agree? | Customer count in CRM = customer count in warehouse |
| Timeliness | Is the data fresh enough? | Pipeline completes within 2 hours of source update |
| Uniqueness | Are there duplicates? | Primary key uniqueness = 100% |
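The completeness and uniqueness dimensions need nothing more than a few lines of plain Python, which is worth seeing before any framework enters the picture (the sample values below are made up for illustration):

```python
# Illustrative checks for two of the five dimensions:
# completeness = fraction of non-null values,
# uniqueness = no duplicate non-null values.

def completeness(values: list) -> float:
    """Fraction of non-null values in a column."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def is_unique(values: list) -> bool:
    """True if every non-null value appears exactly once."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))

emails = ["a@x.com", "b@x.com", None, "a@x.com"]
print(completeness(emails))  # 0.75
print(is_unique(emails))     # False
```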

Here's how you'd implement basic quality checks in a dbt project:

# dbt schema.yml — Data quality tests
version: 2

models:
  - name: dim_customers
    description: "Customer dimension — owned by Sales team"
    meta:
      owner: "sales-team"
      classification: "confidential"
      contains_pii: true
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        tests:
          - unique
          - not_null
      - name: email
        description: "Customer email — PII, confidential"
        tests:
          - not_null
          - unique
      - name: created_at
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "created_at <= current_timestamp"
      - name: country_code
        tests:
          - not_null
          - accepted_values:
              values: ['DE', 'US', 'GB', 'FR', 'NL', 'AT', 'CH']
              config:
                severity: warn

This embeds governance directly into your transformation layer. When an error-severity test fails, the run stops — no bad data reaches dashboards. Warn-level tests (like the country_code check above) surface issues without blocking the pipeline.

4. Metadata & Data Catalog

A data catalog is the single pane of glass where people find, understand, and trust data. Without one, governance lives in wikis nobody reads.

What your catalog must capture:

  • Technical metadata — table names, column types, row counts, freshness
  • Business metadata — plain-English descriptions, ownership, classification
  • Lineage metadata — where data comes from, what transformations it went through, what depends on it
  • Usage metadata — who queries it, how often, for what purpose
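Concretely, one catalog entry spanning all four metadata types might be modeled like this — a hypothetical schema for illustration, not any specific tool's format (all field and table names are made up):

```python
from dataclasses import dataclass, field

# Hypothetical catalog entry combining the four metadata types above.
@dataclass
class CatalogEntry:
    # Technical metadata
    table_name: str
    column_types: dict
    row_count: int
    # Business metadata
    description: str
    owner: str
    classification: str
    # Lineage metadata
    upstream: list = field(default_factory=list)
    downstream: list = field(default_factory=list)
    # Usage metadata
    queries_last_30d: int = 0

entry = CatalogEntry(
    table_name="dim_customers",
    column_types={"customer_id": "bigint", "email": "text"},
    row_count=48_210,
    description="Customer dimension, owned by the Sales team",
    owner="sales-team",
    classification="confidential",
    upstream=["raw.crm_customers"],
    downstream=["mart.revenue_by_customer"],
    queries_last_30d=412,
)
print(entry.owner)  # sales-team
```

Whatever tool you pick, check that it can hold all four categories; most capture technical metadata easily and fall down on usage and lineage.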

Popular catalog tools:

| Tool | Best For | Pricing Model |
| --- | --- | --- |
| DataHub (LinkedIn OSS) | Teams comfortable with self-hosting | Free (open source) |
| OpenMetadata | Modern, API-first approach | Free (open source) |
| Atlan | Enterprise teams wanting a managed solution | Per-seat, starts ~$30k/yr [PRICING-CHECK] |
| Alation | Large enterprises with complex governance needs | Enterprise pricing [PRICING-CHECK] |
| dbt Docs + dbt Explorer | Teams already using dbt | Free (OSS) / included in dbt Cloud |

Implementation Roadmap: From Zero to Governed

Don't try to govern everything at once. Start small, prove value, expand.

Phase 1: Foundation (Weeks 1–4)

Goal: Establish ownership for your most critical datasets.

  1. Identify your top 10 most-used datasets (check query logs)
  2. Assign an owner and steward for each
  3. Write one-paragraph descriptions for each dataset
  4. Document known quality issues — don't fix them yet, just acknowledge them

Deliverable: A simple spreadsheet or catalog entries for 10 datasets with owners.

Phase 2: Policies & Quality (Weeks 5–8)

Goal: Define the rules and start enforcing them.

  1. Write your data classification policy (keep it to one page)
  2. Classify the top 10 datasets
  3. Add automated quality tests to critical pipelines (start with dbt tests or Great Expectations)
  4. Set up a weekly 30-minute "data quality standup" with stewards

Deliverable: Classification policy, quality tests running in CI/CD, first quality metrics dashboard.

Phase 3: Scale & Automate (Weeks 9–16)

Goal: Extend governance to all production datasets.

  1. Roll out ownership to all production datasets
  2. Deploy a data catalog (or enhance your existing one)
  3. Implement automated lineage tracking
  4. Set up access request workflows
  5. Create onboarding docs for new team members

Deliverable: Full catalog coverage, automated lineage, self-service access requests.

Phase 4: Continuous Improvement (Ongoing)

Goal: Make governance a habit, not a project.

  1. Monthly governance review — are policies being followed?
  2. Quarterly ownership audit — have responsibilities shifted?
  3. Track governance KPIs (catalog coverage, quality score trends, time-to-access)
  4. Iterate on policies based on what's actually causing friction

Governance KPIs: Measuring What Matters

You can't manage what you don't measure. Track these metrics monthly:

| KPI | Target | How to Measure |
| --- | --- | --- |
| Catalog coverage | >90% of production tables documented | Automated scan of warehouse vs catalog |
| Ownership assignment | 100% of production tables have an owner | Catalog metadata check |
| Quality test coverage | >80% of critical tables have automated tests | dbt/GE test count vs table count |
| Data freshness SLA | >95% of pipelines meet freshness SLA | Pipeline monitoring tool |
| Access request turnaround | <24 hours for standard requests | Ticketing system metrics |
| PII classification | 100% of PII columns tagged | Automated PII scanner + catalog |
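The catalog-coverage KPI, for example, reduces to a set comparison between the tables your warehouse reports and the tables your catalog documents. A sketch, with made-up table names:

```python
# Sketch: catalog coverage = documented production tables / all production tables.
# Table names are placeholders for whatever your warehouse scan returns.

def catalog_coverage(warehouse_tables: set, catalog_tables: set) -> float:
    """Fraction of warehouse tables that have a catalog entry."""
    if not warehouse_tables:
        return 1.0  # nothing to document counts as fully covered
    return len(warehouse_tables & catalog_tables) / len(warehouse_tables)

warehouse = {"dim_customers", "fct_orders", "dim_products", "fct_payments"}
catalog = {"dim_customers", "fct_orders", "dim_products"}

coverage = catalog_coverage(warehouse, catalog)
print(f"{coverage:.0%}")  # 75%
```

Run it on a schedule, push the number to a dashboard, and the >90% target becomes something you can actually track.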

When Governance Fails: Common Anti-Patterns

The Committee Trap: Creating a "Data Governance Council" that meets monthly, produces slide decks, but never ships anything. Governance must be embedded in daily workflows, not delegated to a committee.

The Tool-First Trap: Buying an expensive catalog tool before defining ownership or policies. The tool will sit empty. People first, processes second, tools third.

The Boil-the-Ocean Trap: Trying to govern every dataset from day one. You'll burn out and give up. Start with the 10 tables that matter most and expand from there.

The Compliance-Only Trap: Treating governance purely as a GDPR/SOX checkbox exercise. Compliance is a byproduct of good governance, not its purpose. The purpose is making data trustworthy and accessible.

Governance for Small Teams

You don't need a dedicated governance team to start. In my experience, teams of 3–10 data professionals can implement effective governance with these adjustments:

  • Combine roles: The data engineer who builds the pipeline is also the steward. The analytics lead is also the owner.
  • Use dbt as your catalog: schema.yml with descriptions, tests, and meta tags covers 80% of catalog needs for free.
  • Automate aggressively: Every manual governance step is a step that won't happen consistently. CI/CD for quality tests, automated freshness monitoring, git-based policy versioning.
  • Skip the heavyweight tools: A well-maintained dbt project + a Notion page with ownership mappings beats an empty Atlan instance every time.

If you're running a small team exploring data from multiple sources — APIs, CSVs, databases — tools like Harbinger Explorer let you query and catalog those sources directly in the browser with DuckDB WASM, which can serve as a lightweight data exploration layer while you build out governance around your core warehouse.

Getting Started Tomorrow

Here's what you can do right now, before any formal initiative:

  1. Pick your three most important tables. The ones that show up in every dashboard and every stakeholder question.
  2. Write a one-sentence description for each. Post it wherever your team communicates.
  3. Add one quality test per table. A NOT NULL check on the primary key counts. Ship it to production.
  4. Name an owner for each. Send them a message: "You own this table. If something breaks, you're the first call."
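If you're on dbt, step 3 is a few lines in a schema.yml file — the table and column names below are placeholders for your own:

```yaml
# schema.yml — the minimal quality test: primary key is never null
version: 2

models:
  - name: fct_orders        # placeholder: your most important table
    columns:
      - name: order_id      # placeholder: its primary key
        tests:
          - not_null
```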

That's governance. Everything else is scaling it up.



[PRICING-CHECK] Atlan and Alation pricing figures are estimates based on public information — verify with vendors for current rates.

