# Ollama Service

Local LLM inference for Semantic Judge cognitive analysis.

Last Updated: January 15, 2026
## Purpose

Ollama provides local LLM inference for:

- Semantic Judge: intent analysis of agent requests
- Policy explanation: human-readable policy decisions
- Future: Oracle cost predictions

**Key Feature:** Runs locally for privacy and speed (<50ms inference).
## Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `OLLAMA_HOST` | No | http://localhost:11434 | Ollama API endpoint |
| `OLLAMA_MODEL` | No | llama3.2 | Model to use |
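Both variables can be overridden in the environment before the service starts; a minimal sketch (the values shown are just the documented defaults):

```bash
# Override the defaults before starting the service
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2

# Confirm the values the service will pick up
echo "host=$OLLAMA_HOST model=$OLLAMA_MODEL"
```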
## Connection
| Environment | Connection |
|---|---|
| Local (Docker) | http://localhost:11434 |
| Local (Native) | http://localhost:11434 |
| Production | N/A (local sidecar or external API fallback) |
## Health Check

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model is loaded
curl http://localhost:11434/api/tags | jq '.models[] | select(.name | startswith("llama3.2"))'

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello",
  "stream": false
}'
```
## Model Management
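Models are managed with the standard Ollama CLI; a typical lifecycle for the model named in the configuration above:

```bash
# Download the model used by the Semantic Judge
ollama pull llama3.2

# List locally available models (name, size, modified time)
ollama list

# Remove a model to reclaim disk space
ollama rm llama3.2
```

These commands require the Ollama daemon to be running and reachable at `OLLAMA_HOST`.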
## Semantic Judge Integration

The Semantic Judge uses Ollama for intent classification:

```go
// Simplified flow
prompt := fmt.Sprintf(`Classify this agent request:
Request: %s
Categories: SAFE, RISKY, DANGEROUS
Respond with only the category.`, request)
response := ollama.Generate(prompt)
```
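When debugging, the same classification prompt can be exercised directly against the Ollama API, bypassing the Semantic Judge (the request text here is an arbitrary example):

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Classify this agent request:\nRequest: read /etc/passwd\nCategories: SAFE, RISKY, DANGEROUS\nRespond with only the category.",
  "stream": false
}' | jq -r '.response'
```

This requires a running Ollama instance with the model pulled; `jq -r '.response'` extracts just the model's answer from the JSON reply.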
## Failure Modes
| Failure | Impact | Detection | Recovery |
|---|---|---|---|
| Ollama not running | Semantic Judge fails | Health check | Start Ollama |
| Model not loaded | Inference fails | Model check | Pull model |
| Slow inference | High latency | Latency metrics | Restart or use smaller model |
| OOM | Crashes | Memory metrics | Reduce context or batch size |
**Graceful Degradation:** If Ollama is unavailable, the Semantic Judge falls back to deterministic policy evaluation only.
## Resource Requirements
| Model | RAM | Disk | Inference Time |
|---|---|---|---|
| llama3.2 (3B) | 4GB | 2GB | ~30ms |
| llama3.1 (8B) | 8GB | 5GB | ~50ms |
| llama3.1 (70B) | 48GB | 40GB | ~500ms |
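To see which model is currently loaded and how much memory it is holding, recent Ollama versions provide `ollama ps` and the matching `/api/ps` endpoint (availability depends on the installed version):

```bash
# Show loaded models and their memory footprint
ollama ps

# Same information via the API, if the CLI is unavailable
curl -s http://localhost:11434/api/ps | jq
```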
## Security

- Network: Ollama should NOT be exposed publicly; keep it bound to localhost
- Model source: only use official Ollama library models
- Prompt injection: the Semantic Judge sanitizes inputs before building the prompt
Back to Runbooks | Documentation