Ollama Service

Local LLM inference for Semantic Judge cognitive analysis

Last Updated: January 15, 2026


Purpose

Ollama provides local LLM inference for:

  • Semantic Judge: Intent analysis of agent requests
  • Policy explanation: Human-readable policy decisions
  • Future: Oracle cost predictions

Key Feature: Runs locally for privacy and speed (<50ms inference with the default llama3.2 model).


Configuration

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| OLLAMA_HOST | No | http://localhost:11434 | Ollama API endpoint |
| OLLAMA_MODEL | No | llama3.2 | Model to use |

Connection

| Environment | Connection |
|-------------|------------|
| Local (Docker) | http://localhost:11434 |
| Local (Native) | http://localhost:11434 |
| Production | N/A (local sidecar or external API fallback) |

Health Check

# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model is loaded
curl http://localhost:11434/api/tags | jq '.models[] | select(.name | startswith("llama3.2"))'

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello",
  "stream": false
}'

Model Management

# Pull model
ollama pull llama3.2

# List models
ollama list

# Remove model
ollama rm model-name

Semantic Judge Integration

The Semantic Judge uses Ollama for intent classification:

// Simplified flow
prompt := fmt.Sprintf(`Classify this agent request:
Request: %s
Categories: SAFE, RISKY, DANGEROUS
Respond with only the category.`, request)

response := ollama.Generate(prompt)

Failure Modes

| Failure | Impact | Detection | Recovery |
|---------|--------|-----------|----------|
| Ollama not running | Semantic Judge fails | Health check | Start Ollama |
| Model not loaded | Inference fails | Model check | Pull model |
| Slow inference | High latency | Latency metrics | Restart or use a smaller model |
| OOM | Crashes | Memory metrics | Reduce context or batch size |

Graceful Degradation: If Ollama is unavailable, Semantic Judge falls back to deterministic policy evaluation only.


Resource Requirements

| Model | RAM | Disk | Inference Time |
|-------|-----|------|----------------|
| llama3.2 (3B) | 4GB | 2GB | ~30ms |
| llama3.1 (8B) | 8GB | 4GB | ~50ms |
| llama3.1 (70B) | 48GB | 40GB | ~500ms |

Security

  • Network: Ollama should NOT be exposed publicly
  • Model source: Only use official Ollama library models
  • Prompt injection: Semantic Judge sanitizes inputs
