# Gilgamesh
Gilgamesh is an interactive CLI agent that connects to a local llama.cpp server (or any OpenAI-compatible endpoint) and provides tool-calling capabilities for software engineering tasks. It promotes a test-driven development approach and is designed for CPU inference with small models by keeping total prompt overhead under ~1,600 tokens.
## Features
- 7 built-in tools: read, write, edit, bash, grep, glob, test
- Multi-language testing: auto-detects Go, Python, Rust, Zig, Node.js projects
- Configurable tool permissions: whitelist/blacklist tools per project
- Custom tool registration: define project-specific tools in `.gilgamesh/tools.json`
- Streaming SSE: tokens stream to the terminal as they arrive
- Multi-model profiles: switch between fast/default/heavy models mid-session
- Skills system: 7 built-in skills + reusable prompt templates (`.gilgamesh/skills/*.md`)
- Hook system: pre/post tool execution hooks (`.gilgamesh/hooks.json`)
- Session logging: JSONL session logs with distill summaries
- Memory persistence: project-scoped facts that persist across sessions
- Conversation history: save and resume previous sessions (`/resume`, `/sessions`)
- Loop detection: detects and breaks out of repeated tool calls
- Context compaction: automatically trims old tool results to stay within context limits
- Shell completion: bash, zsh, fish (`gilgamesh completion bash`)
- Graceful Ctrl+C: cancel in-progress requests; double-Ctrl+C force quits
- Markdown rendering: headers, bold/italic, lists, blockquotes, code blocks with syntax highlighting
- Error classification: network, auth, timeout, and LLM errors with recovery hints
- Context gauge: visual progress bar showing context pressure (`/status`)
- Config validation: startup warnings for invalid endpoints or missing models
- Environment variable overrides: `GILGAMESH_ACTIVE_MODEL`, `GILGAMESH_ENDPOINT`, etc.
- Accessible NoColor: text fallbacks for all Unicode icons when color is disabled
- TDD-first: system prompt promotes writing tests before implementation
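Context compaction is worth a closer look. A minimal sketch of the general technique (illustrative only, not Gilgamesh's actual implementation) is to stub out all but the most recent tool results while preserving the turn structure:

```go
package main

import "fmt"

// message is a simplified transcript entry.
type message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// compact replaces the content of all but the keepRecent most recent tool
// results with a stub, shedding context pressure without dropping turns.
func compact(msgs []message, keepRecent int) []message {
	toolSeen := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "tool" {
			continue
		}
		toolSeen++
		if toolSeen > keepRecent {
			msgs[i].Content = "[trimmed]"
		}
	}
	return msgs
}

func main() {
	msgs := []message{
		{Role: "tool", Content: "long output A"},
		{Role: "tool", Content: "long output B"},
		{Role: "tool", Content: "long output C"},
	}
	for _, m := range compact(msgs, 1) {
		fmt.Println(m.Content) // the two oldest tool results become "[trimmed]"
	}
}
```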
## The CLI / MCP / API Duality
Gilgamesh embodies the core principle of Gods from the Machine: every capability is exposed through three interfaces.
### CLI — For Humans
Interactive terminal REPL with slash commands, streaming output, and model switching.
```sh
./gilgamesh                               # interactive mode
./gilgamesh run "refactor this function"  # one-shot mode
./gilgamesh -m heavy run "complex task"   # use heavy model
```
### MCP Server — For Agents
JSON-RPC 2.0 over stdio, compatible with Claude Desktop and any MCP client.
```sh
./gilgamesh mcp
```

Example client registration (e.g. for Claude Desktop):

```json
{
  "mcpServers": {
    "gilgamesh": {
      "command": "/path/to/gilgamesh",
      "args": ["mcp"]
    }
  }
}
```
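The wire format is plain JSON-RPC 2.0, one JSON object per line over stdin/stdout. A minimal sketch of the request a client writes (the `tools/list` method comes from the MCP specification; the struct here is illustrative, not one of Gilgamesh's own types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is the JSON-RPC 2.0 envelope an MCP client writes to
// the server's stdin.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// toolsListRequest builds the JSON-RPC line for MCP's tools/list call.
func toolsListRequest(id int) string {
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: "tools/list"})
	if err != nil {
		panic(err) // cannot happen for this fixed struct
	}
	return string(b)
}

func main() {
	// This line would be piped to `gilgamesh mcp`; the response comes
	// back on the server's stdout, one JSON object per line.
	fmt.Println(toolsListRequest(1))
	// prints {"jsonrpc":"2.0","id":1,"method":"tools/list"}
}
```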
### HTTP API — For Programs
REST endpoints with SSE streaming for the chat interface.
```sh
./gilgamesh serve -p 7777
```
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/tools | List all tools with schemas |
| POST | /api/tools/{name} | Execute a tool (JSON body = args) |
| POST | /api/chat | Agent conversation (SSE streaming) |
## Architecture
```
gilgamesh/
├── main.go      # CLI entry, REPL, subcommand dispatch
├── agent/       # Core agent loop + event-based variant
├── llm/         # OpenAI-compatible streaming SSE client
├── tools/       # Tool registration, dispatch, 7 built-in tools
├── ui/          # Terminal UI (color, markdown, tables, gauges, errors, commands)
├── mcp/         # JSON-RPC 2.0 MCP server
├── server/      # HTTP API server
├── config/      # JSON config loader, validation, env var overrides
├── context/     # Project context + skills loader (7 built-in via go:embed)
├── memory/      # Project-scoped persistent memory
├── hooks/       # Pre/post tool execution hooks
├── session/     # JSONL session logging + conversation history
└── cmd/bench/   # Go model benchmark tool (6-stage pipeline)
```
## Token Budget
The critical constraint for CPU inference:
| Component | Tokens |
|---|---|
| System prompt | ~300 |
| 7 tool definitions | ~800 |
| Project context | ~500 (capped) |
| Total overhead | ~1,600 |
At ~160 tok/s prompt processing (Qwen3.5-2B Q4_K_M, 12 threads), the first response arrives in ~10 seconds on CPU.
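The ~10-second figure falls directly out of the budget, since time to first token on CPU is dominated by prompt processing:

```go
package main

import "fmt"

// firstTokenSeconds estimates prompt-processing latency as
// prompt tokens divided by the prompt-processing rate.
func firstTokenSeconds(promptTokens, ppTokPerSec float64) float64 {
	return promptTokens / ppTokPerSec
}

func main() {
	// ~1,600 tokens of overhead at ~160 tok/s prompt processing.
	fmt.Printf("%.1fs\n", firstTokenSeconds(1600, 160)) // prints 10.0s
}
```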
## Benchmarking & Model Trials
Gilgamesh includes a pure Go benchmark suite for trialing local models. It loads profiles from config, integrates with llama-bench, and supports JSON output for historical tracking:
```sh
go run ./cmd/bench              # benchmark active profile from config
go run ./cmd/bench -all         # benchmark all profiles + summary table
go run ./cmd/bench -raw         # include raw llama-bench pp/tg metrics
go run ./cmd/bench -json        # JSON output for scripting
go run ./cmd/bench -save r.json # append to JSON log for tracking
```
The suite measures six dimensions: health, raw inference (pp/tg tok/s), minimal prompt, tool-call parsing, one-shot agent, and a full edit task. Results are tracked in TRIALS.md.
### Key Findings
| Model | PP (tok/s) | TG (tok/s) | First Response | Verdict |
|---|---|---|---|---|
| Qwen3.5-2B Q4_K_M | 172 | 19 | ~7s | Sweet spot — default |
| Qwen3.5-4B Q4_K_M | 66 | 9.6 | ~20s | Quality ceiling — heavy |
| Qwen3.5-0.8B | — | — | — | Rejected — too unreliable |
## Quick Start
```sh
# Build
git clone https://github.com/godsfromthemachine/gilgamesh
cd gilgamesh && go build -o gilgamesh .

# Configure (create gilgamesh.json)
cat > gilgamesh.json << 'EOF'
{
  "models": {
    "default": {
      "name": "qwen3.5-2b",
      "endpoint": "http://127.0.0.1:8081/v1",
      "api_key": "sk-local"
    }
  },
  "active_model": "default"
}
EOF

# Run
./gilgamesh
```