Skip to content

Local LLM & Embedding Setup

Substrate relies on lazy-lamacpp for local, privacy-first AI inference. This ensures that your source code never leaves your infrastructure.


Required Models

Model Role Service Name Default Port Model Recommendation
Embeddings llamacpp-model@embeddings 8101 Qwen3-Embedding-0.6B
Summaries llamacpp-model@dense 8102 Qwen2.5-7B-Instruct

Verification

You can verify the status of the local LLM stack using the make commands in the lazy-lamacpp directory:

cd ~/github/lazy-lamacpp
make status MODEL=embeddings
make status MODEL=dense

If a service is inactive, start it:

make start MODEL=embeddings


Common LLM Issues

1. "400 Bad Request" (Embedding Service)

Symptom: Ingestion logs show embed_batch_failed status=400. Reason: This usually occurs when a single file's content exceeds the model's context window or the server's input buffer. Substrate Handling: The ingestion service implements a Bisection Logic for embeddings: - If a batch of 32 files fails with a 400, it splits the batch into two groups of 16. - It recursively narrows down until it identifies the specific "poison-pill" file. - The offending file is recorded as having None for its embedding, allowing the rest of the sync to continue.

2. Slow Summary Generation

Symptom: "Generating summary..." in the UI takes >10 seconds. Reason: Dense models (7B+) are compute-intensive. If running on CPU, generation will be slow. Solution: Ensure the model is being offloaded to a GPU. Check the lazy-lamacpp/config/<model>.env for -ngl (number of GPU layers) settings.

3. Model Memory Contention

Symptom: One model crashes when another starts. Reason: Insufficient VRAM to hold multiple models simultaneously. Solution: Substrate uses an on-demand loading strategy. You can stop unused models using make stop MODEL=<name>.