Governance Gaps¶

The missing enforcement layer between architectural intent and implementation reality.

The Governance Problem¶

Organizations have plenty of "governance" — documents, meetings, approvals — but lack computable governance that actively maintains alignment.

The Governance Stack¶

Layer	Current State	Gap
Intent	Architecture docs, ADRs, policies	Static, ignored
Planning	Roadmaps, designs, estimates	Decoupled from reality
Implementation	Code, infrastructure	Unvalidated
Observation	Monitoring, logs	Symptom-only
Response	Post-mortems, tickets	Reactive

The Gap: No layer actively validates implementation against intent in real-time.

Types of Governance Gaps¶

1. Intent-to-Code Gap¶

The Problem: - Architecture says: "All services use the API gateway" - Code says: Direct HTTP calls between services - Result: Violation invisibly persists

Substrate Solution:

Policy: api-gateway-first
Enforcement: HARD_MANDATORY
Detection: Static analysis of imports + runtime verification
Action: Block PR + Explain WHY

2. Code-to-Runtime Gap¶

The Problem: - Code declares: "Service runs on port 8080" - Runtime shows: Service on port 8081 - Result: Drift undetected

Substrate Solution:

SSH Runtime Connector
Frequency: Every 15 minutes
Detection: systemctl + ss + docker inspect
Action: Alert on divergence

3. Decision-to-Policy Gap¶

The Problem: - Post-mortem says: "We should prevent this" - Policy never created - Result: Same incident recurs

Substrate Solution:

Memory Decay Detection
Check: FailurePattern nodes without linked Policy
Action: Alert: "POST-019 has no preventive policy"
Recommendation: Create POLICY-013

4. Policy-to-Enforcement Gap¶

The Problem: - Policy documented: "No GPL in commercial code" - No automated check - Result: GPL library merged

Substrate Solution:

OPA/Rego Policies
Evaluation: Every PR
Enforcement: Block merge on violation
Audit: Complete evaluation log

5. Knowledge-to-Action Gap¶

The Problem: - Knowledge exists: "Alice knows why this constraint exists" - Alice leaves - Result: Constraint violated unknowingly

Substrate Solution:

WHY Layer
Capture: ADR + Post-mortem + Design rationale
Query: Natural language → Provenance
Result: Knowledge survives turnover

The Substrate Governance Model¶

Continuous Validation Loop¶

graph LR
    INTENT[Intent<br/>ADRs, Policies] -->|Define| RULE[Rules<br/>Rego Policies]
    RULE -->|Evaluate| CODE[Code<br/>PRs, Commits]
    CODE -->|Deploy| RUNTIME[Runtime<br/>K8s, VMs]
    RUNTIME -->|Verify| OBSERVE[Observation<br/>SSH, Metrics]
    OBSERVE -->|Detect Drift| INTENT

    RULE -.->|Block| CODE
    OBSERVE -.->|Alert| INTENT

Governance as Code¶

Traditional:

Architecture Review Board meets monthly
Reviews 5% of changes
Approves based on presentation quality
No enforcement mechanism

Substrate:

package substrate.governance

# Architecture policy as executable code
deny[msg] {
    # Must have approved ADR for new services
    new_service := input.changes.new_services[_]
    not has_approved_adr(new_service)
    msg := sprintf("New service %s requires approved ADR", [new_service.name])
}

deny[msg] {
    # Must not increase coupling beyond threshold
    service := input.services[_]
    new_deps := count(input.changes.added_dependencies[service.name])
    current_deps := count(service.dependencies)
    new_deps + current_deps > 10
    msg := sprintf("Service %s would have %d dependencies (max 10)", 
                   [service.name, new_deps + current_deps])
}

Evaluation: Every PR, <5ms, deterministic, explainable

Governance Layers¶

1. Preventive Governance¶

When: Before code is written
How: Simulation, what-if analysis

Example:

> "What if I split OrderService?"

Simulation:
- Affected services: 12
- Policy violations introduced: 4
- Blast radius: 3 hops
- Drift delta: +0.15

Recommendation: REVIEW REQUIRED
Consider: Creating ADR first, gradual migration

2. Detective Governance¶

When: During PR review
How: Static analysis, policy evaluation

Example:

❌ BLOCKED: architectural-boundary

Direct import from payment domain to auth domain.
Must route via api-gateway-prod.

See: ADR-047, POLICY-012
Exception: Request with justification

3. Corrective Governance¶

When: Post-deployment
How: Runtime verification, drift detection

Example:

⚠️ RUNTIME VIOLATION

Host: prod-worker-03
Undeclared service: unknown-payment-worker
Declared: payment-worker on prod-worker-01,02

Action: Investigate or add to graph
Risk: Shadow processing, audit failure

4. Adaptive Governance¶

When: Continuous
How: Verification queue, memory curation

Example:

Memory Gap Detected:
Service: inventory-service
Missing: ADR for database choice
Staleness: 180 days since last review

Action: Assign to architect for documentation

Governance Dashboards¶

Health Scorecard¶

Domain	Drift Score	Violations	Memory Coverage	Status
Payments	0.12	2	85%	🟢
Auth	0.08	0	92%	🟢
Orders	0.34	5	60%	🟡
Legacy	0.67	12	40%	🔴

Violation Trends¶

Weekly Violations:
- Prevented at PR: 45
- Detected in runtime: 3
- Escaped to prod: 0

Trend: ↓ 20% from last month
Interpretation: Team learning, policies effective

Policy Compliance¶

Policy	Scope	Pass Rate	Trend
api-gateway-first	All services	98%	↑
no-circular-deps	Core domain	95%	→
service-ownership	New services	100%	✓
license-compliance	All deps	99.5%	↑

Governance Roles¶

Automated (System)¶

Role	Actions
Enforcer	Block PRs violating hard-mandatory policies
Monitor	Detect runtime drift every 15 minutes
Curator	Verify memory accuracy, flag staleness
Educator	Suggest fixes, explain WHY

Human-in-the-Loop¶

Role	Actions
Architect	Define policies, approve exceptions
Tech Lead	Review violations, guide team
Developer	Understand constraints, request exceptions
Security	Define compliance policies

Confidence-Based Routing¶

Confidence	System Action	Human Role
>90%	Auto-accept	None
60-90%	Queue for review	Team member (7 days)
<60%	Escalate	Team lead / Architect

Compliance and Audit¶

Automated Attestation¶

Auditor asks: "How do you ensure architectural standards are enforced?"

Traditional response: - Manual evidence collection: 80 hours - Spot checks of 5% of changes - Subjective compliance

Substrate response:

Evidence Package:
- Policy evaluation log: 100% of PRs, 12 months
- Violation detection: 2,847 caught, 3 escaped
- Remediation time: avg 4.2 hours
- Drift scores: Monthly trend, all domains
- Policy coverage: 12 active, 98% compliance

Format: JSON export, machine-readable
Time to generate: 5 minutes

Regulatory Compliance¶

Regulation	Substrate Support
SOC 2	Automated evidence, audit trails
PCI-DSS	Network boundary validation
GDPR	Data flow mapping, retention policies
DORA	Operational resilience metrics
FDA 21 CFR Part 11	Electronic records, audit trails

Measuring Governance Maturity¶

Level 1: Documented¶

Architecture exists in documents
Policies written but not enforced
Manual reviews spotty

Level 2: Detected¶

Automated detection of violations
Alerts for drift
Reactive response

Level 3: Prevented¶

PR blocking for violations
Simulation before implementation
Proactive governance

Level 4: Optimized¶

Self-healing systems
Predictive drift detection
Continuous improvement

Substrate target: Level 3+ for all customers

Success Metrics¶

Governance Effectiveness¶

Metric	Baseline	Target	Best
Violations prevented	0%	90%	99%
Time to detect drift	Months	Minutes	Real-time
Policy compliance	Unknown	95%	99%
Memory coverage	20%	80%	95%
Audit prep time	80 hrs	4 hrs	1 hr

Business Outcomes¶

Outcome	Measurement
Reduced incidents	-50% architecture-related
Faster audits	-90% preparation time
Knowledge retention	+80% after turnover
Developer productivity	+30% (less review overhead)