kunal12203/codex-cli-compact
Summary
Dual-Graph (codex-cli-compact) is a context pre-loading layer for Claude Code and Codex CLI. It scans your codebase into a semantic graph, then automatically injects the most relevant files into each prompt before Claude sees it. It runs as a local MCP server exposing tools such as graph_read, graph_retrieve, and graph_continue, plus a token-tracking dashboard at localhost:8899. The stated goal is a 30-45% reduction in token costs, achieved by eliminating the exploratory tool calls Claude would otherwise make on its own.
Great for
Great for people interested in building semantic context retrieval systems for AI coding assistants — specifically the problem of efficiently mapping codebases into graphs and deciding what to pre-load into LLM prompts to minimize token burn.
Easy wins
- +Remove the hardcoded absolute path '/Users/krishnakant/.dual-graph/venv/bin/python3' from the benchmark/run_preinjection_benchmark.py shebang and replace it with env-based Python resolution
- +Fix the obvious bug in dashboard/server.py record_token_event(), where 'total' is referenced on the last line but never defined in that scope (it will always raise NameError)
- +Add a requirements-dev.txt or pyproject.toml with pinned versions — currently requirements.txt has no version pins at all (mcp>=1.3.0 etc.), making reproducible installs impossible
- +Write even basic smoke tests for graph_builder.py and context_packer.py — the project has zero tests despite having a /bench and /benchmark directory full of evaluation scripts
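The pinning fix could be as simple as freezing versions from a known-good environment. A minimal sketch of what the pinned file might look like; mcp is the only dependency named in the repo, and the exact version below is illustrative, not verified against the project:

```
# requirements.txt (illustrative pins, not verified against the project)
mcp==1.3.0      # replace with the version from a working install
# remaining dependencies pinned via: pip freeze > requirements.txt
```

Freezing from a working virtualenv, then trimming to direct dependencies, is usually enough to make installs reproducible.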
Red flags
- !dashboard/server.py record_token_event() has a definite NameError: 'total' is used in write_json({'ok': True, 'cost_usd': event['cost_usd'], 'total_tokens': total}) but 'total' is never defined in that method — this endpoint will always crash at runtime
- !benchmark/run_preinjection_benchmark.py has a hardcoded shebang '#!/Users/krishnakant/.dual-graph/venv/bin/python3' — indicates code was committed directly from a personal dev machine without sanitization
- !bin/identity.json is checked into the repo — unclear what it contains but named suggestively; warrants inspection before running
- !README claims '30-45% cheaper' but the benchmark methodology uses a regex-based quality scorer (not human evaluation), and the benchmark only covers one specific 'restaurant CRM' test project — external validity is unproven
- !Auto-update on every launch (confirmed in README) combined with no lockfiles means any update could silently change behavior; the heartbeat sending machine_id + platform is opt-out only
- !No license file — unclear terms for contributing or using the code commercially
- !Single-commit history despite the README claiming version 3.8.55 — the full development history is absent, making it impossible to see how the codebase evolved
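The hardcoded-shebang issue above has a one-line fix. A sketch, demonstrated on a throwaway copy rather than the real benchmark/run_preinjection_benchmark.py, using GNU sed to rewrite line 1 in place:

```shell
# Create a throwaway script with the problematic hardcoded shebang
printf '#!/Users/krishnakant/.dual-graph/venv/bin/python3\nprint("hi")\n' > demo_script.py
# Rewrite line 1 to use portable env-based Python resolution (GNU sed syntax)
sed -i '1s|.*|#!/usr/bin/env python3|' demo_script.py
head -1 demo_script.py
```

On macOS's BSD sed the in-place flag needs a backup suffix (`sed -i ''`), which is worth remembering since the offending path suggests the code was written on a Mac.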
Code quality
dashboard/server.py has an obvious NameError bug: record_token_event() references 'total' in the final write_json call, but 'total' is never assigned in that function scope. mcp_graph_server.py is roughly 1,000+ lines of densely packed logic with minimal abstraction; TURN_STATE is a mutable global dict used as process-local state, which will break under any concurrent usage. The benchmark scripts (run_preinjection_benchmark.py, run_challenge_v3835.py) contain hardcoded absolute paths to a specific developer's home directory, making them non-portable. Error handling is inconsistent: some paths use broad 'except Exception' clauses that silently swallow all errors, while others raise RuntimeError. The memory/context-store logic in mcp_graph_server.py is genuinely sophisticated (upsert with identity matching, staleness tracking, pruning) but is buried and undocumented.
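The record_token_event() fix is mechanical: define 'total' before the response references it. A hedged sketch, since the real handler's shape is unknown; TOKEN_EVENTS, the token-field names, and write_json are stand-ins for whatever the dashboard actually uses:

```python
# Illustrative fix for the NameError; names below are assumptions,
# not the project's real identifiers.
TOKEN_EVENTS = []  # stand-in for the dashboard's event store

def write_json(payload):
    # Stand-in for the dashboard's JSON response helper.
    return payload

def record_token_event(event):
    TOKEN_EVENTS.append(event)
    # The fix: compute the running total BEFORE it is referenced
    # in the response, instead of using an undefined name.
    total = sum(e.get("input_tokens", 0) + e.get("output_tokens", 0)
                for e in TOKEN_EVENTS)
    return write_json({"ok": True,
                       "cost_usd": event["cost_usd"],
                       "total_tokens": total})
```

Whatever the real aggregation is (per-session, per-model, lifetime), the point is the same: the endpoint cannot return a name it never binds.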
What makes it unique
The core idea — building a semantic codebase graph and using it to pre-inject relevant context before LLM prompts, rather than letting the LLM explore lazily — is a legitimate and relatively novel approach compared to naive RAG or just giving Claude all tools. The 'compounding context' angle (prioritizing files from recent turns) is also interesting. However, the quality of the open-source release is far below the sophistication of the underlying ideas: it reads like a personal tool dump rather than a designed-for-contribution project.
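To make the 'compounding context' idea concrete, here is a minimal sketch of one way such a ranker could work — this is not the project's actual algorithm, just an illustration of recency-boosted file prioritization; rank_files, base_scores, and the decay factor are all hypothetical:

```python
# Hypothetical recency-boosted ranker: files touched in recent turns get
# an exponentially decaying boost on top of their base relevance score.
def rank_files(base_scores, turn_history, decay=0.5):
    """base_scores: {path: relevance}.
    turn_history: list of sets of paths, most recent turn LAST."""
    boosted = dict(base_scores)
    # age 0 = most recent turn, so its boost is decay**0 == 1.0
    for age, paths in enumerate(reversed(turn_history)):
        for path in paths:
            boosted[path] = boosted.get(path, 0.0) + decay ** age
    # Highest combined score first: these are the files to pre-inject.
    return sorted(boosted, key=boosted.get, reverse=True)
```

With decay=0.5, a file from the most recent turn gains a full point while one from three turns ago gains only 0.125, so recently discussed files float to the top of the pre-injection list even when their static relevance is middling.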
Scores
Barrier to entry
high
Single contributor with 1 commit, no CONTRIBUTING guide, no tests at all, no good-first-issues, and benchmark scripts with hardcoded paths to a specific developer's machine (/Users/krishnakant/.dual-graph/venv/bin/python3) — a new contributor would need to reverse-engineer the architecture entirely from source.