# Ollama Service

Local LLM inference for Semantic Judge cognitive analysis.

Last Updated: January 15, 2026
## Purpose

Ollama provides local LLM inference for:

- Semantic Judge: intent analysis of agent requests
- Policy explanation: human-readable policy decisions
- Future: Oracle cost predictions

**Key Feature:** Runs locally for privacy and speed (<50ms inference).
## Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `OLLAMA_HOST` | No | http://localhost:11434 | Ollama API endpoint |
| `OLLAMA_MODEL` | No | llama3.2 | Model to use |
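Both variables can be overridden in the environment before the service starts; a minimal sketch (the values shown are just the documented defaults):

```bash
# Override the defaults before starting the service
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2

# Confirm the values the service will pick up
echo "host=$OLLAMA_HOST model=$OLLAMA_MODEL"
```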
## Connection
| Environment | Connection |
|---|---|
| Local (Docker) | http://localhost:11434 |
| Local (Native) | http://localhost:11434 |
| Production | N/A (local sidecar or external API fallback) |
## Health Check

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model is loaded
curl http://localhost:11434/api/tags | jq '.models[] | select(.name | startswith("llama3.2"))'

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello",
  "stream": false
}'
```
## Model Management
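Models are managed with the standard Ollama CLI; a typical lifecycle for the model named in the configuration above:

```bash
# Download the model used by the Semantic Judge
ollama pull llama3.2

# List locally available models (name, size, modified time)
ollama list

# Remove a model to reclaim disk space
ollama rm llama3.2
```

These commands require the Ollama daemon to be running and reachable at `OLLAMA_HOST`.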
## Semantic Judge Integration

The Semantic Judge uses Ollama for intent classification:

```go
// Simplified flow
prompt := fmt.Sprintf(`Classify this agent request:
Request: %s
Categories: SAFE, RISKY, DANGEROUS
Respond with only the category.`, request)
response := ollama.Generate(prompt)
```
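When debugging, the same classification prompt can be exercised directly against the Ollama API, bypassing the Semantic Judge (the request text here is an arbitrary example):

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Classify this agent request:\nRequest: read /etc/passwd\nCategories: SAFE, RISKY, DANGEROUS\nRespond with only the category.",
  "stream": false
}' | jq -r '.response'
```

This requires a running Ollama instance with the model pulled; `jq -r '.response'` extracts just the model's answer from the JSON reply.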
## Failure Modes
| Failure | Impact | Detection | Recovery |
|---|---|---|---|
| Ollama not running | Semantic Judge fails | Health check | Start Ollama |
| Model not loaded | Inference fails | Model check | Pull model |
| Slow inference | High latency | Latency metrics | Restart or use smaller model |
| OOM | Crashes | Memory metrics | Reduce context or batch size |
**Graceful Degradation:** If Ollama is unavailable, the Semantic Judge falls back to deterministic policy evaluation only.
## Resource Requirements
| Model | RAM | Disk | Inference Time |
|---|---|---|---|
| llama3.2 (3B) | 4GB | 2GB | ~30ms |
| llama3.1 (8B) | 8GB | 5GB | ~50ms |
| llama3.1 (70B) | 48GB | 40GB | ~500ms |
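To see which model is currently loaded and how much memory it is holding, recent Ollama versions provide `ollama ps` and the matching `/api/ps` endpoint (availability depends on the installed version):

```bash
# Show loaded models and their memory footprint
ollama ps

# Same information via the API, if the CLI is unavailable
curl -s http://localhost:11434/api/ps | jq
```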
## Security

- Network: Ollama should NOT be exposed publicly; keep it bound to localhost
- Model source: only use official Ollama library models
- Prompt injection: the Semantic Judge sanitizes inputs before building the prompt
Back to Runbooks | Documentation