
Gilgamesh

Go · ACTIVE · v0.6.0 · GitHub

Gilgamesh is an interactive CLI agent that connects to a local llama.cpp server (or any OpenAI-compatible endpoint) and provides tool-calling capabilities for software engineering tasks. It promotes a test-driven development approach and is designed for CPU inference with small models by keeping total prompt overhead under ~1,600 tokens.

Features

  • 7 built-in tools: read, write, edit, bash, grep, glob, test
  • Multi-language testing: auto-detects Go, Python, Rust, Zig, Node.js projects
  • Configurable tool permissions: whitelist/blacklist tools per project
  • Custom tool registration: define project-specific tools in .gilgamesh/tools.json
  • Streaming SSE: tokens stream to terminal as they arrive
  • Multi-model profiles: switch between fast/default/heavy models mid-session
  • Skills system: 7 built-in skills + reusable prompt templates (.gilgamesh/skills/*.md)
  • Hook system: pre/post tool execution hooks (.gilgamesh/hooks.json)
  • Session logging: JSONL session logs with distill summaries
  • Memory persistence: project-scoped facts that persist across sessions
  • Conversation history: save and resume previous sessions (/resume, /sessions)
  • Loop detection: detects and breaks out of repeated tool calls
  • Context compaction: automatically trims old tool results to stay within context limits
  • Shell completion: bash, zsh, fish (gilgamesh completion bash)
  • Graceful Ctrl+C: cancel in-progress requests, double-Ctrl+C force quits
  • Markdown rendering: headers, bold/italic, lists, blockquotes, code blocks with syntax highlighting
  • Error classification: network, auth, timeout, LLM errors with recovery hints
  • Context gauge: visual progress bar showing context pressure (/status)
  • Config validation: startup warnings for invalid endpoints or missing models
  • Environment variable overrides: GILGAMESH_ACTIVE_MODEL, GILGAMESH_ENDPOINT, etc.
  • Accessible NoColor: text fallbacks for all Unicode icons when color is disabled
  • TDD-first: system prompt promotes writing tests before implementation
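
As one illustration of the safety machinery above, loop detection can be sketched as tracking the most recent tool call and flagging when the identical call repeats. This is an illustrative sketch of the idea, not gilgamesh's actual implementation:

```go
package main

import "fmt"

// loopDetector flags when the identical tool call (name + serialized args)
// has been issued `limit` times in a row. A sketch of the technique, not
// gilgamesh's real code.
type loopDetector struct {
	last  string
	count int
	limit int
}

// observe records a tool call and reports whether the loop limit was hit.
func (d *loopDetector) observe(name, args string) bool {
	key := name + "\x00" + args
	if key == d.last {
		d.count++
	} else {
		d.last, d.count = key, 1
	}
	return d.count >= d.limit
}

func main() {
	d := &loopDetector{limit: 3}
	fmt.Println(d.observe("read", `{"path":"a.go"}`)) // false
	fmt.Println(d.observe("read", `{"path":"a.go"}`)) // false
	fmt.Println(d.observe("read", `{"path":"a.go"}`)) // true: break the loop
}
```

A real implementation would likely look at a window of recent calls rather than only the last one, but the reset-on-new-call counter captures the core idea.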

The CLI / MCP / API Duality

Gilgamesh embodies the core principle of Gods from the Machine: every capability is exposed through three interfaces.

CLI / MCP / API Architecture

CLI — For Humans

Interactive terminal REPL with slash commands, streaming output, and model switching.

./gilgamesh                              # interactive mode
./gilgamesh run "refactor this function" # one-shot mode
./gilgamesh -m heavy run "complex task"  # use heavy model

MCP Server — For Agents

JSON-RPC 2.0 over stdio, compatible with Claude Desktop and any MCP client.

./gilgamesh mcp

Example Claude Desktop configuration:

{
  "mcpServers": {
    "gilgamesh": {
      "command": "/path/to/gilgamesh",
      "args": ["mcp"]
    }
  }
}
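
A minimal request/response exchange over stdio, using the standard MCP tools/list method (the tool entry shown is illustrative, not verbatim gilgamesh output; the real response lists all seven tools with their JSON schemas):

```
→ {"jsonrpc":"2.0","id":1,"method":"tools/list"}
← {"jsonrpc":"2.0","id":1,"result":{"tools":[{"name":"read","description":"Read a file","inputSchema":{"type":"object"}}]}}
```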

HTTP API — For Programs

REST endpoints with SSE streaming for the chat interface.

./gilgamesh serve -p 7777

Method  Path               Description
GET     /api/health        Health check
GET     /api/tools         List all tools with schemas
POST    /api/tools/{name}  Execute a tool (JSON body = args)
POST    /api/chat          Agent conversation (SSE streaming)

Architecture

gilgamesh/
├── main.go           # CLI entry, REPL, subcommand dispatch
├── agent/            # Core agent loop + event-based variant
├── llm/              # OpenAI-compatible streaming SSE client
├── tools/            # Tool registration, dispatch, 7 built-in tools
├── ui/               # Terminal UI (color, markdown, tables, gauges, errors, commands)
├── mcp/              # JSON-RPC 2.0 MCP server
├── server/           # HTTP API server
├── config/           # JSON config loader, validation, env var overrides
├── context/          # Project context + skills loader (7 built-in via go:embed)
├── memory/           # Project-scoped persistent memory
├── hooks/            # Pre/post tool execution hooks
├── session/          # JSONL session logging + conversation history
└── cmd/bench/        # Go model benchmark tool (6-stage pipeline)

Token Budget

The critical constraint for CPU inference:

Component           Tokens
System prompt       ~300
7 tool definitions  ~800
Project context     ~500 (capped)
Total overhead      ~1,600

At ~160 tok/s prompt processing (Qwen3.5-2B Q4_K_M, 12 threads), the first response arrives in ~10 seconds on CPU.
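
The back-of-the-envelope math behind that number:

```go
package main

import "fmt"

func main() {
	const overheadTokens = 1600.0 // system prompt + tool defs + project context
	const promptSpeed = 160.0     // tok/s prompt processing on CPU
	fmt.Printf("%.0f seconds to first token\n", overheadTokens/promptSpeed)
}
```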

Benchmarking & Model Trials

Gilgamesh includes a pure Go benchmark suite for trialing local models. It loads profiles from config, integrates with llama-bench, and supports JSON output for historical tracking:

go run ./cmd/bench              # benchmark active profile from config
go run ./cmd/bench -all         # benchmark all profiles + summary table
go run ./cmd/bench -raw         # include raw llama-bench pp/tg metrics
go run ./cmd/bench -json        # JSON output for scripting
go run ./cmd/bench -save r.json # append to JSON log for tracking

The suite measures six dimensions: health check, raw inference (pp/tg tok/s), minimal prompt, tool-call parsing, one-shot agent, and a full edit task. Results are tracked in TRIALS.md.
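
The tool-call-parsing stage boils down to: did the model emit a well-formed call naming a known tool? A rough sketch, assuming an OpenAI-style function-call payload (field names follow the common chat-completions shape, not necessarily gilgamesh's internals):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors an OpenAI-style function-call payload.
type toolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

// parseToolCall reports whether raw model output is valid JSON naming a
// known tool.
func parseToolCall(raw string, known map[string]bool) (toolCall, bool) {
	var tc toolCall
	if err := json.Unmarshal([]byte(raw), &tc); err != nil || !known[tc.Name] {
		return toolCall{}, false
	}
	return tc, true
}

func main() {
	known := map[string]bool{"read": true, "write": true, "edit": true,
		"bash": true, "grep": true, "glob": true, "test": true}

	tc, ok := parseToolCall(`{"name":"read","arguments":{"path":"main.go"}}`, known)
	fmt.Println(ok, tc.Name) // true read

	_, ok = parseToolCall(`{"name":"rm_rf"}`, known)
	fmt.Println(ok) // false
}
```

Small models fail this stage by emitting malformed JSON or hallucinated tool names, which is what the benchmark is designed to catch.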

Key Findings

Model              PP (tok/s)  TG (tok/s)  First Response  Verdict
Qwen3.5-2B Q4_K_M  172         19          ~7s             Sweet spot — default
Qwen3.5-4B Q4_K_M  66          9.6         ~20s            Quality ceiling — heavy
Qwen3.5-0.8B       —           —           —               Rejected — too unreliable

Quick Start

# Build
git clone https://github.com/godsfromthemachine/gilgamesh
cd gilgamesh && go build -o gilgamesh .

# Configure (create gilgamesh.json)
cat > gilgamesh.json << 'EOF'
{
  "models": {
    "default": {
      "name": "qwen3.5-2b",
      "endpoint": "http://127.0.0.1:8081/v1",
      "api_key": "sk-local"
    }
  },
  "active_model": "default"
}
EOF

# Run
./gilgamesh