Roadmap

Phase 1: Foundation COMPLETE
  • Create godsfromthemachine GitHub organization
  • Write vision document and project origins
  • Create org profile repo with landing page
  • Set up initial repos: gilgamesh, zeus, raijin, garuda
  • Generate branding assets and visual identity
Phase 2: Gilgamesh v0.1 — Core Agent COMPLETE
  • 7 built-in tools: read, write, edit, bash, grep, glob, test
  • Streaming SSE client for OpenAI-compatible endpoints
  • Interactive CLI REPL with slash commands
  • One-shot mode (gilgamesh run "prompt")
  • Multi-model profiles (fast/default/heavy)
  • Skills system with argument substitution
  • Hook system for pre/post tool execution
  • Session logging (JSONL with distill)
  • Loop detection and context compaction
  • TDD-first system prompt
Phase 3: Gilgamesh v0.2 — CLI/MCP/API Duality COMPLETE
  • MCP server mode (gilgamesh mcp) — JSON-RPC 2.0 over stdio
  • HTTP API server mode (gilgamesh serve)
  • REST endpoints: /api/health, /api/tools, /api/tools/{name}, /api/chat
  • SSE streaming for /api/chat endpoint
  • RunWithEvents() agent variant for non-terminal consumers
Phase 4: Gilgamesh v0.3 — Quality & Polish COMPLETE
  • Unit tests for all 9 packages (agent, llm, tools, mcp, server, config, context, hooks, session)
  • CI/CD with GitHub Actions (build + test on push/PR)
  • Go benchmark tool (cmd/bench/) for model trialing
  • Model trials documentation (TRIALS.md)
  • Table-driven tests with edge cases for tools
  • Graceful shutdown for HTTP server with request timeouts
  • MCP protocol version negotiation (2025-03-26 + 2024-11-05)
  • Version bump to v0.3.0
Phase 5: Gilgamesh v0.5 — Skills, Permissions & Multi-language COMPLETE
  • Configurable tool permissions (allowed_tools/denied_tools)
  • Multi-language test support (Go, Python, Rust, Zig, Node.js)
  • Shell completion (bash, zsh, fish)
  • 7 built-in skills embedded in binary (commit, review, explain, fix, refactor, doc, tdd)
  • 3-tier skill precedence: built-in → global → project-local (project-local overrides global, which overrides built-in)
  • Version bump to v0.5.0
  • Memory/context persistence across sessions (.gilgamesh/memory.json)
  • Project-scoped conversation history (save/resume)
  • Custom tool registration (.gilgamesh/tools.json)
  • TDD workflow automation (/tdd built-in skill)
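The `allowed_tools`/`denied_tools` keys come from the permissions milestone above; the deny-over-allow semantics and the Go types in this sketch are assumptions for illustration, not gilgamesh's actual config schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Permissions mirrors the allowed_tools/denied_tools config keys.
type Permissions struct {
	AllowedTools []string `json:"allowed_tools"`
	DeniedTools  []string `json:"denied_tools"`
}

// Permitted applies deny-over-allow: an explicit deny always wins,
// and an empty allow list means "everything not denied".
func (p Permissions) Permitted(tool string) bool {
	for _, d := range p.DeniedTools {
		if d == tool {
			return false
		}
	}
	if len(p.AllowedTools) == 0 {
		return true
	}
	for _, a := range p.AllowedTools {
		if a == tool {
			return true
		}
	}
	return false
}

func main() {
	raw := `{"allowed_tools":["read","grep","glob"],"denied_tools":["bash"]}`
	var p Permissions
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	fmt.Println(p.Permitted("read"), p.Permitted("bash"), p.Permitted("write"))
	// true false false
}
```

Deny-over-allow is the conservative choice for an agent that can run `bash`: a misconfigured allow list fails closed rather than open.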
Phase 6: Gilgamesh v0.6 — UI & Developer Experience COMPLETE
  • Command registry pattern replacing switch statement
  • Graceful Ctrl+C via context.Context threading
  • Streaming markdown renderer (headers, bold/italic, lists, blockquotes, code blocks)
  • Accessible NoColor icon fallbacks
  • Auto-sized table renderer and context pressure gauge
  • Error classification with recovery hints
  • Config validation and environment variable overrides
  • Spinner with elapsed time display
  • Terminal color profile detection (NO_COLOR, CLICOLOR, COLORTERM)
  • 243 tests across 12 packages
Phase 7: Gilgamesh v1.0 — Intelligence FUTURE
  • Multi-model routing — automatic 2B/4B selection based on task complexity
  • Distill workflow — session → skill extraction
  • Trials methodology documentation (docs/TRIAL_METHODOLOGY.md)
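Multi-model routing is still future work; one cheap shape it could take is keyword and length heuristics that pick a profile before any inference runs. The keywords, threshold, and profile names below are entirely hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// route is a hypothetical complexity heuristic for the planned
// 2B/4B routing: cheap lexical signals choose a model profile.
func route(prompt string) string {
	// Invented signal words; real routing would be tuned from trials.
	complex := []string{"refactor", "architecture", "design", "debug"}
	lower := strings.ToLower(prompt)
	for _, k := range complex {
		if strings.Contains(lower, k) {
			return "heavy" // 4B profile
		}
	}
	if len(strings.Fields(prompt)) > 100 {
		return "heavy" // long prompts get the bigger model
	}
	return "fast" // 2B profile
}

func main() {
	fmt.Println(route("rename this variable"))         // fast
	fmt.Println(route("refactor the session package")) // heavy
}
```

The appeal of a pre-inference heuristic is zero token cost; the risk is misrouting, which the trials methodology would need to measure.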
Phase 8: Broader Gods FUTURE
  • Zeus: complete Zig build system for llama.cpp
  • New god projects: security tools, agentic browsers
  • Cross-god integration via MCP/HTTP
  • Shared tool library across gods
  • Publishing and packaging (GitHub releases, homebrew)
Running: Model Trialing & Benchmarking ONGOING
  • Go benchmark tool (cmd/bench/main.go)
  • Baseline benchmarks: Qwen3.5 2B Q4_K_M, 4B Q8_0
  • Key findings documented (2B sweet spot, 0.8B rejected)
  • Enhanced bench suite: config loading, raw llama-bench, JSON output, result persistence
  • 4B Q4_K_M raw trial — same speed as Q8_0, saves 1.6GB disk
  • Full -all comparison run: 2B vs 4B with summary table
  • 4B Q4_K_M agent bench — outperforms Q8_0, recommend as heavy profile
  • Thread tuning — 12 threads optimal on 16-core EPYC, 30% better token generation (TG)
  • Context length impact — 16K ctx saves 672MB, sufficient for agent
  • Batch size tuning — b=256 optimal, b=512 regresses
  • 9B Q8_0 agent benchmarks — not worth it, same efficiency as 4B but slower
  • KV cache quantization — q4_0 saves 5-7% RAM, no quality loss
  • Trial IQ4_XS / IQ3_M quants — smaller memory footprint
  • New model families — Phi-4, Gemma 3, others
  • Speculative decoding — draft model (0.8B) + verify (4B)
  • Multi-model routing — simple tasks → 2B, complex → 4B
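Sweeps like the thread-tuning run above reduce to picking the configuration with the highest generation throughput. A small sketch of that comparison step, as cmd/bench might do it; the `result` fields and the sample numbers are made up for the demo, not actual trial data:

```go
package main

import "fmt"

// result is one trial row; field names are illustrative, not the
// actual bench schema.
type result struct {
	threads int
	tokens  int
	seconds float64
}

func tokPerSec(r result) float64 { return float64(r.tokens) / r.seconds }

// best returns the config with the highest generation throughput.
func best(rs []result) result {
	top := rs[0]
	for _, r := range rs[1:] {
		if tokPerSec(r) > tokPerSec(top) {
			top = r
		}
	}
	return top
}

func main() {
	// Fabricated demo numbers; real values come from llama-bench runs.
	sweep := []result{
		{threads: 8, tokens: 512, seconds: 40.0},
		{threads: 12, tokens: 512, seconds: 29.5},
		{threads: 16, tokens: 512, seconds: 33.1},
	}
	b := best(sweep)
	fmt.Printf("best: %d threads at %.1f tok/s\n", b.threads, tokPerSec(b))
}
```

The same reduction works for batch-size and quantization sweeps: normalize every trial to tokens per second, then compare.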

Principles

Every milestone must satisfy:

  1. Local-first — all intelligence from local AI models
  2. Zero external deps — Go stdlib only for gilgamesh
  3. CLI + MCP + API — every capability through all three interfaces
  4. Lean — minimal token overhead for CPU inference
  5. TDD — gilgamesh eats its own dogfood with comprehensive tests