Roadmap
Phase 1: Foundation (COMPLETE)
- Create godsfromthemachine GitHub organization
- Write vision document and project origins
- Create org profile repo with landing page
- Set up initial repos: gilgamesh, zeus, raijin, garuda
- Generate branding assets and visual identity
Phase 2: Gilgamesh v0.1 — Core Agent (COMPLETE)
- 7 built-in tools: read, write, edit, bash, grep, glob, test
- Streaming SSE client for OpenAI-compatible endpoints
- Interactive CLI REPL with slash commands
- One-shot mode (gilgamesh run "prompt")
- Multi-model profiles (fast/default/heavy)
- Skills system with argument substitution
- Hook system for pre/post tool execution
- Session logging (JSONL with distill)
- Loop detection and context compaction
- TDD-first system prompt
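A minimal sketch of how the built-in tools might hang together: a common interface plus a name-keyed registry the agent loop dispatches through. The interface and names here are illustrative assumptions, not gilgamesh's actual API.

```go
package main

import "fmt"

// Tool is a hypothetical shape for a built-in tool: a name plus a Run
// method taking string arguments (the real gilgamesh API may differ).
type Tool interface {
	Name() string
	Run(args map[string]string) (string, error)
}

// globTool is an illustrative stand-in for the "glob" built-in.
type globTool struct{}

func (globTool) Name() string { return "glob" }
func (globTool) Run(args map[string]string) (string, error) {
	return "matched: " + args["pattern"], nil
}

// registry maps tool names to implementations, as an agent loop might.
var registry = map[string]Tool{"glob": globTool{}}

// Dispatch looks up a tool by name and runs it, erroring on unknowns.
func Dispatch(name string, args map[string]string) (string, error) {
	t, ok := registry[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return t.Run(args)
}

func main() {
	out, _ := Dispatch("glob", map[string]string{"pattern": "*.go"})
	fmt.Println(out)
}
```

Keeping tools behind one interface is what lets the same seven implementations back the CLI, MCP, and HTTP surfaces in later phases.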
Phase 3: Gilgamesh v0.2 — CLI/MCP/API Duality (COMPLETE)
- MCP server mode (gilgamesh mcp) — JSON-RPC 2.0 over stdio
- HTTP API server mode (gilgamesh serve)
- REST endpoints: /api/health, /api/tools, /api/tools/{name}, /api/chat
- SSE streaming for /api/chat endpoint
- RunWithEvents() agent variant for non-terminal consumers
Phase 4: Gilgamesh v0.3 — Quality & Polish (COMPLETE)
- Unit tests for all 9 packages (agent, llm, tools, mcp, server, config, context, hooks, session)
- CI/CD with GitHub Actions (build + test on push/PR)
- Go benchmark tool (cmd/bench/) for model trialing
- Model trials documentation (TRIALS.md)
- Table-driven tests with edge cases for tools
- Graceful shutdown for HTTP server with request timeouts
- MCP protocol version negotiation (2025-03-26 + 2024-11-05)
- Version bump to v0.3.0
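The table-driven pattern mentioned above looks roughly like this; `trimTool` is a toy stand-in for a tool helper under test (in gilgamesh the cases would live in `_test.go` files run by `go test`).

```go
package main

import (
	"fmt"
	"strings"
)

// trimTool stands in for a tool helper under test; the table-driven
// pattern, not the function itself, is the point.
func trimTool(s string) string { return strings.TrimSpace(s) }

func main() {
	// A table of named cases, including edge cases, in the shape
	// Go's testing package encourages.
	cases := []struct{ name, in, want string }{
		{"plain", " hi ", "hi"},
		{"empty", "", ""},
		{"only spaces", "   ", ""},
		{"tabs and newlines", "\t hi \n", "hi"},
	}
	failed := 0
	for _, tc := range cases {
		if got := trimTool(tc.in); got != tc.want {
			failed++
			fmt.Printf("%s: got %q, want %q\n", tc.name, got, tc.want)
		}
	}
	fmt.Printf("%d/%d cases passed\n", len(cases)-failed, len(cases))
}
```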
Phase 5: Gilgamesh v0.5 — Skills, Permissions & Multi-language (COMPLETE)
- Configurable tool permissions (allowed_tools/denied_tools)
- Multi-language test support (Go, Python, Rust, Zig, Node.js)
- Shell completion (bash, zsh, fish)
- 7 built-in skills embedded in binary (commit, review, explain, fix, refactor, doc, tdd)
- 3-tier skill precedence: built-in → global → project-local
- Version bump to v0.5.0
- Memory/context persistence across sessions (.gilgamesh/memory.json)
- Project-scoped conversation history (save/resume)
- Custom tool registration (.gilgamesh/tools.json)
- TDD workflow automation (/tdd built-in skill)
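The permission keys (`allowed_tools`/`denied_tools`) come from the milestone above; the surrounding file shape is an assumption. A config might plausibly look like:

```json
{
  "allowed_tools": ["read", "grep", "glob", "test"],
  "denied_tools": ["bash", "write"]
}
```

A deny list alongside an allow list lets a project opt a sandboxed profile out of shell and filesystem writes without touching the binary.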
Phase 6: Gilgamesh v0.6 — UI & Developer Experience (COMPLETE)
- Command registry pattern replacing switch statement
- Graceful Ctrl+C via context.Context threading
- Streaming markdown renderer (headers, bold/italic, lists, blockquotes, code blocks)
- Accessible NoColor icon fallbacks
- Auto-sized table renderer and context pressure gauge
- Error classification with recovery hints
- Config validation and environment variable overrides
- Spinner with elapsed time display
- Terminal color profile detection (NO_COLOR, CLICOLOR, COLORTERM)
- 243 tests across 12 packages
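The command-registry pattern that replaced the switch statement can be sketched as a map of named handlers; the field names and commands here are illustrative, not gilgamesh's.

```go
package main

import "fmt"

// command is one REPL slash command. Registering commands in a map
// replaces a switch statement that grows with every new command.
type command struct {
	name, help string
	run        func(args []string) string
}

var commands = map[string]command{}

func register(c command) { commands[c.name] = c }

// dispatch finds and runs a command, with a uniform unknown-command path.
func dispatch(name string, args []string) string {
	c, ok := commands[name]
	if !ok {
		return fmt.Sprintf("unknown command /%s", name)
	}
	return c.run(args)
}

func main() {
	register(command{
		name: "help",
		help: "list commands",
		run: func([]string) string {
			return fmt.Sprintf("%d command(s) registered", len(commands))
		},
	})
	fmt.Println(dispatch("help", nil))
}
```

New commands become a `register` call instead of another switch arm, and the `help` metadata can be enumerated from the map for free.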
Phase 7: Gilgamesh v1.0 — Intelligence (FUTURE)
- Multi-model routing — automatic 2B/4B selection based on task complexity
- Distill workflow — session → skill extraction
- Trials methodology documentation (docs/TRIAL_METHODOLOGY.md)
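One plausible starting point for 2B/4B routing is a cheap heuristic over the prompt; the keywords and threshold below are invented for illustration, and a real router would likely score tasks more carefully.

```go
package main

import (
	"fmt"
	"strings"
)

// routeModel picks a model size from a rough task-complexity guess.
// Keywords and the length threshold are illustrative assumptions.
func routeModel(prompt string) string {
	complex := []string{"refactor", "design", "debug", "architecture"}
	lower := strings.ToLower(prompt)
	for _, kw := range complex {
		if strings.Contains(lower, kw) {
			return "4B" // heavy profile for harder tasks
		}
	}
	if len(prompt) > 500 {
		return "4B" // long prompts suggest more context to reason over
	}
	return "2B" // fast profile for simple edits and lookups
}

func main() {
	fmt.Println(routeModel("fix the typo in README"))
	fmt.Println(routeModel("refactor the agent loop"))
}
```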
Phase 8: Broader Gods (FUTURE)
- Zeus: complete Zig build system for llama.cpp
- New god projects: security tools, agentic browsers
- Cross-god integration via MCP/HTTP
- Shared tool library across gods
- Publishing and packaging (GitHub releases, Homebrew)
Running: Model Trialing & Benchmarking (ONGOING)
- Go benchmark tool (cmd/bench/main.go)
- Baseline benchmarks: Qwen3.5 2B Q4_K_M, 4B Q8_0
- Key findings documented (2B sweet spot, 0.8B rejected)
- Enhanced bench suite: config loading, raw llama-bench, JSON output, result persistence
- 4B Q4_K_M raw trial — same speed as Q8_0, saves 1.6GB disk
- Full -all comparison run: 2B vs 4B with summary table
- 4B Q4_K_M agent bench — outperforms Q8_0, recommend as heavy profile
- Thread tuning — 12 threads optimal on 16-core EPYC, 30% better token generation (TG)
- Context length impact — 16K ctx saves 672MB, sufficient for agent
- Batch size tuning — b=256 optimal, b=512 regresses
- 9B Q8_0 agent benchmarks — not worth it, same efficiency as 4B but slower
- KV cache quantization — q4_0 saves 5-7% RAM, no quality loss
- Trial IQ4_XS / IQ3_M quants — smaller memory footprint
- New model families — Phi-4, Gemma 3, others
- Speculative decoding — draft model (0.8B) + verify (4B)
- Multi-model routing — simple tasks → 2B, complex → 4B
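Pulling the tuning findings together, a raw trial run might be invoked roughly as below; the model filename is a placeholder, and flag spellings should be checked against the installed llama.cpp's `llama-bench --help`.

```shell
# Hypothetical llama-bench invocation reflecting the findings above:
# 12 threads, batch size 256, q4_0 KV cache quantization.
llama-bench -m qwen-4b-q4_k_m.gguf -t 12 -b 256 -ctk q4_0 -ctv q4_0
```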
Principles
Every milestone must satisfy:
- Local-first — all intelligence from local AI models
- Zero external deps — Go stdlib only for gilgamesh
- CLI + MCP + API — every capability through all three interfaces
- Lean — minimal token overhead for CPU inference
- TDD — gilgamesh eats its own dogfood with comprehensive tests