Substrate Platform Documentation
Structural Integrity Platform for Software Teams
Substrate is a self-hosted governance workbench where software teams connect their code repositories, visualize architecture as a live graph, and query it through natural language.
What is Substrate?
Modern software teams struggle with two invisible problems:
- Structural Drift — The widening gap between architectural intent and production reality
- Memory Loss — The silent erosion of why the system was built the way it was
Substrate ingests source code from connected repositories, parses static dependencies, generates embeddings, and builds a unified knowledge graph that stays synchronized with your actual codebase.
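The static-dependency step can be pictured with a small sketch. This is not Substrate's actual parser, which this page does not describe. It assumes Python source files only, and the helper names `extract_imports` and `build_edges` are hypothetical. It shows how import statements can be turned into graph edges of the form (importing file, imported module).

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Return the module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b.c` yields one alias per imported module
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from a.b import c` records `a.b`; bare relative imports
            # such as `from . import x` have no module name and are skipped
            modules.add(node.module)
    return modules


def build_edges(files: dict[str, str]) -> set[tuple[str, str]]:
    """Build (importer_path, imported_module) edges from a path -> source map."""
    return {
        (path, module)
        for path, source in files.items()
        for module in extract_imports(source)
    }
```

A real connector would also need to resolve module names back to file paths and handle languages other than Python. Both are left out here.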
Current Capabilities
| Capability | What It Means Today |
|---|---|
| Live Graph | Interactive Cytoscape visualization of code dependencies across one or more repository snapshots |
| Source Connectors | GitHub repository ingestion with automatic file discovery, import parsing, and dependency extraction |
| Semantic Search | Vector similarity search over file and chunk embeddings using local models |
| LLM Summaries | On-demand natural language summaries of any file in the graph |
| Sync Scheduling | Automatic periodic re-sync with configurable intervals |
| Multi-Snapshot Merge | Load and compare multiple sync snapshots simultaneously with divergence detection |
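
To make the Semantic Search row concrete, here is a minimal sketch of vector similarity ranking in plain Python. The page does not say which distance metric Substrate uses, so cosine similarity is an assumption. The function names and the toy two-dimensional vectors are hypothetical, and a real deployment would run this query inside the database rather than in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3):
    """Rank indexed items by similarity to the query and keep the best k."""
    scored = [(path, cosine_similarity(query, vec)) for path, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: file path -> embedding (hypothetical values)
index = {
    "auth.py": [1.0, 0.0],
    "db.py": [0.0, 1.0],
    "login.py": [0.9, 0.1],
}
results = top_k([1.0, 0.0], index, k=2)
```

With these toy vectors, `results` lists `auth.py` first and `login.py` second, because their embeddings point most nearly in the same direction as the query.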
Quick Navigation
- :material-hexagon-multiple:{ .lg .middle } Architecture
  System architecture, data models, and deployment patterns
- :material-cogs:{ .lg .middle } System Design
  Detailed design of each service and component
- :material-code-braces:{ .lg .middle } Developer Guide
  API reference, environment variables, and frontend component docs
- :material-presentation:{ .lg .middle } Business & Product
  Pitch decks, target personas, and the problems Substrate solves
- :material-map-marker-path:{ .lg .middle } Roadmap
  Planned features and the transition to the Intended Graph (G_I)
Technology Highlights
- Fully Self-Hosted: All AI inference runs locally via lazy-lamacpp — zero data leaves your infrastructure
- PostgreSQL + Apache AGE: Single database for relational data, embeddings (pgvector), and graph queries (Cypher)
- No Mock Data: Every node, edge, and embedding comes from real repository analysis
- Fast Sync Cycle: Shallow clone → parse → embed → persist, typically completing in seconds to minutes depending on repo size
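
The sync cycle in the last bullet can be sketched as an ordered list of stages. This is an illustration under stated assumptions, not Substrate's implementation. `shallow_clone_command` and `run_sync` are hypothetical helpers, and each stage is modeled as a plain function from one state dictionary to the next. The only real external interface used is `git clone --depth 1`, which fetches just the most recent commit.

```python
from typing import Callable, Iterable

Stage = Callable[[dict], dict]


def shallow_clone_command(repo_url: str, dest: str) -> list[str]:
    """Build the argument list for a shallow clone (history depth of 1)."""
    return ["git", "clone", "--depth", "1", repo_url, dest]


def run_sync(state: dict, stages: Iterable[tuple[str, Stage]]) -> dict:
    """Run each named stage in order, recording which stages completed."""
    completed: list[str] = []
    for name, stage in stages:
        state = stage(state)
        completed.append(name)
    return {**state, "completed": completed}


# Stub stages standing in for clone -> parse -> embed -> persist (hypothetical)
stages = [
    ("clone", lambda s: {**s, "files": ["a.py"]}),
    ("parse", lambda s: {**s, "edges": [("a.py", "os")]}),
    ("embed", lambda s: {**s, "embeddings": {"a.py": [0.1, 0.2]}}),
    ("persist", lambda s: {**s, "persisted": True}),
]
result = run_sync({"repo": "https://example.com/org/repo.git"}, stages)
```

Keeping each stage as a separate function makes it possible to time, retry, or replace one stage without touching the others. That matters when total sync time depends on repository size.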
Getting Started
See the Architecture Overview to understand the system, or dive into System Design for detailed service documentation.