Skip to content

Governance Gaps

The missing enforcement layer between architectural intent and implementation reality.


The Governance Problem

Organizations have plenty of "governance" — documents, meetings, approvals — but lack computable governance that actively maintains alignment.

The Governance Stack

Layer Current State Gap
Intent Architecture docs, ADRs, policies Static, ignored
Planning Roadmaps, designs, estimates Decoupled from reality
Implementation Code, infrastructure Unvalidated
Observation Monitoring, logs Symptom-only
Response Post-mortems, tickets Reactive

The Gap: No layer actively validates implementation against intent in real-time.


Types of Governance Gaps

1. Intent-to-Code Gap

The Problem: - Architecture says: "All services use the API gateway" - Code says: Direct HTTP calls between services - Result: Violation invisibly persists

Substrate Solution:

Policy: api-gateway-first
Enforcement: HARD_MANDATORY
Detection: Static analysis of imports + runtime verification
Action: Block PR + Explain WHY

2. Code-to-Runtime Gap

The Problem: - Code declares: "Service runs on port 8080" - Runtime shows: Service on port 8081 - Result: Drift undetected

Substrate Solution:

SSH Runtime Connector
Frequency: Every 15 minutes
Detection: systemctl + ss + docker inspect
Action: Alert on divergence

3. Decision-to-Policy Gap

The Problem: - Post-mortem says: "We should prevent this" - Policy never created - Result: Same incident recurs

Substrate Solution:

Memory Decay Detection
Check: FailurePattern nodes without linked Policy
Action: Alert: "POST-019 has no preventive policy"
Recommendation: Create POLICY-013

4. Policy-to-Enforcement Gap

The Problem: - Policy documented: "No GPL in commercial code" - No automated check - Result: GPL library merged

Substrate Solution:

OPA/Rego Policies
Evaluation: Every PR
Enforcement: Block merge on violation
Audit: Complete evaluation log

5. Knowledge-to-Action Gap

The Problem: - Knowledge exists: "Alice knows why this constraint exists" - Alice leaves - Result: Constraint violated unknowingly

Substrate Solution:

WHY Layer
Capture: ADR + Post-mortem + Design rationale
Query: Natural language → Provenance
Result: Knowledge survives turnover


The Substrate Governance Model

Continuous Validation Loop

graph LR
    INTENT[Intent<br/>ADRs, Policies] -->|Define| RULE[Rules<br/>Rego Policies]
    RULE -->|Evaluate| CODE[Code<br/>PRs, Commits]
    CODE -->|Deploy| RUNTIME[Runtime<br/>K8s, VMs]
    RUNTIME -->|Verify| OBSERVE[Observation<br/>SSH, Metrics]
    OBSERVE -->|Detect Drift| INTENT

    RULE -.->|Block| CODE
    OBSERVE -.->|Alert| INTENT

Governance as Code

Traditional:

Architecture Review Board meets monthly
Reviews 5% of changes
Approves based on presentation quality
No enforcement mechanism

Substrate:

package substrate.governance

# Architecture policy as executable code
deny[msg] {
    # Must have approved ADR for new services
    new_service := input.changes.new_services[_]
    not has_approved_adr(new_service)
    msg := sprintf("New service %s requires approved ADR", [new_service.name])
}

deny[msg] {
    # Must not increase coupling beyond threshold
    service := input.services[_]
    new_deps := count(input.changes.added_dependencies[service.name])
    current_deps := count(service.dependencies)
    new_deps + current_deps > 10
    msg := sprintf("Service %s would have %d dependencies (max 10)", 
                   [service.name, new_deps + current_deps])
}

Evaluation: Every PR, <5ms, deterministic, explainable


Governance Layers

1. Preventive Governance

When: Before code is written
How: Simulation, what-if analysis

Example:

> "What if I split OrderService?"

Simulation:
- Affected services: 12
- Policy violations introduced: 4
- Blast radius: 3 hops
- Drift delta: +0.15

Recommendation: REVIEW REQUIRED
Consider: Creating ADR first, gradual migration

2. Detective Governance

When: During PR review
How: Static analysis, policy evaluation

Example:

❌ BLOCKED: architectural-boundary

Direct import from payment domain to auth domain.
Must route via api-gateway-prod.

See: ADR-047, POLICY-012
Exception: Request with justification

3. Corrective Governance

When: Post-deployment
How: Runtime verification, drift detection

Example:

⚠️ RUNTIME VIOLATION

Host: prod-worker-03
Undeclared service: unknown-payment-worker
Declared: payment-worker on prod-worker-01,02

Action: Investigate or add to graph
Risk: Shadow processing, audit failure

4. Adaptive Governance

When: Continuous
How: Verification queue, memory curation

Example:

Memory Gap Detected:
Service: inventory-service
Missing: ADR for database choice
Staleness: 180 days since last review

Action: Assign to architect for documentation


Governance Dashboards

Health Scorecard

Domain Drift Score Violations Memory Coverage Status
Payments 0.12 2 85% 🟢
Auth 0.08 0 92% 🟢
Orders 0.34 5 60% 🟡
Legacy 0.67 12 40% 🔴
Weekly Violations:
- Prevented at PR: 45
- Detected in runtime: 3
- Escaped to prod: 0

Trend: ↓ 20% from last month
Interpretation: Team learning, policies effective

Policy Compliance

Policy Scope Pass Rate Trend
api-gateway-first All services 98%
no-circular-deps Core domain 95%
service-ownership New services 100%
license-compliance All deps 99.5%

Governance Roles

Automated (System)

Role Actions
Enforcer Block PRs violating hard-mandatory policies
Monitor Detect runtime drift every 15 minutes
Curator Verify memory accuracy, flag staleness
Educator Suggest fixes, explain WHY

Human-in-the-Loop

Role Actions
Architect Define policies, approve exceptions
Tech Lead Review violations, guide team
Developer Understand constraints, request exceptions
Security Define compliance policies

Confidence-Based Routing

Confidence System Action Human Role
>90% Auto-accept None
60-90% Queue for review Team member (7 days)
<60% Escalate Team lead / Architect

Compliance and Audit

Automated Attestation

Auditor asks: "How do you ensure architectural standards are enforced?"

Traditional response: - Manual evidence collection: 80 hours - Spot checks of 5% of changes - Subjective compliance

Substrate response:

Evidence Package:
- Policy evaluation log: 100% of PRs, 12 months
- Violation detection: 2,847 caught, 3 escaped
- Remediation time: avg 4.2 hours
- Drift scores: Monthly trend, all domains
- Policy coverage: 12 active, 98% compliance

Format: JSON export, machine-readable
Time to generate: 5 minutes

Regulatory Compliance

Regulation Substrate Support
SOC 2 Automated evidence, audit trails
PCI-DSS Network boundary validation
GDPR Data flow mapping, retention policies
DORA Operational resilience metrics
FDA 21 CFR Part 11 Electronic records, audit trails

Measuring Governance Maturity

Level 1: Documented

  • Architecture exists in documents
  • Policies written but not enforced
  • Manual reviews spotty

Level 2: Detected

  • Automated detection of violations
  • Alerts for drift
  • Reactive response

Level 3: Prevented

  • PR blocking for violations
  • Simulation before implementation
  • Proactive governance

Level 4: Optimized

  • Self-healing systems
  • Predictive drift detection
  • Continuous improvement

Substrate target: Level 3+ for all customers


Success Metrics

Governance Effectiveness

Metric Baseline Target Best
Violations prevented 0% 90% 99%
Time to detect drift Months Minutes Real-time
Policy compliance Unknown 95% 99%
Memory coverage 20% 80% 95%
Audit prep time 80 hrs 4 hrs 1 hr

Business Outcomes

Outcome Measurement
Reduced incidents -50% architecture-related
Faster audits -90% preparation time
Knowledge retention +80% after turnover
Developer productivity +30% (less review overhead)