chopratejas/headroom
Summary
Headroom is a transparent LLM context-compression proxy that sits between your application and providers such as OpenAI and Anthropic. It intercepts prompt messages and compresses tool outputs, JSON arrays, logs, code, and RAG results before they reach the model, using statistical analysis, AST-aware compression, and ML models, with the aim of cutting token costs by 50-90% at minimal accuracy loss. It can run as a drop-in HTTP proxy (a single env-var change) or as a Python library with integrations for LangChain, LiteLLM, Agno, and AWS Strands.
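The "single env-var change" claim can be illustrated with a minimal sketch. The proxy URL and port below are hypothetical, and `OPENAI_BASE_URL` is the variable the official OpenAI Python SDK reads, not necessarily the one Headroom documents; check its README for the actual endpoint:

```python
import os

# Minimal sketch of the drop-in usage described above. The proxy address is
# hypothetical; OPENAI_BASE_URL is honored by the official OpenAI Python SDK,
# but the endpoint Headroom actually exposes may differ.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8787/v1"
# From here, an unmodified application using the OpenAI SDK would route its
# requests through the compression proxy instead of api.openai.com.
```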
Great for
People interested in LLM cost-optimization infrastructure, specifically the engineering problem of compressing heterogeneous agent context (JSON tool outputs, logs, code, RAG results) without degrading answer quality. The work touches NLP compression algorithms, statistical anomaly detection, and LLM proxy architecture.
Easy wins
- +Two issues labeled 'good first issue' exist; given the modular handler pattern in headroom/compression/handlers/, they are likely scoped integration or compressor tasks.
- +Add a new content type handler: the ContentRouter in content_router.py has a clean plugin pattern (a CompressionStrategy enum plus routing logic), so adding support for a new format such as YAML or CSV would follow the existing SearchCompressor/LogCompressor pattern.
- +Improve the LangChain integration (flagged as experimental in the README; it lives in headroom/integrations/langchain/). The integration pattern is already established in the Agno integration, which can serve as a reference.
- +Benchmark coverage: headroom/evals/ has a full eval framework, but some runners appear thin. Adding a new scenario to benchmarks/scenarios/ would be self-contained and immediately testable with the existing CLI (python -m headroom.evals).
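The "add a new content type handler" item can be sketched in miniature. CompressionStrategy, ContentRouter, SearchCompressor, and LogCompressor are names taken from the review above, but the method signatures, the registration API, and the YamlCompressor itself are hypothetical, not Headroom's real interface:

```python
from enum import Enum, auto

class CompressionStrategy(Enum):
    # LOG and SEARCH mirror the existing handlers named in the review;
    # YAML is the hypothetical new content type.
    LOG = auto()
    SEARCH = auto()
    YAML = auto()

class YamlCompressor:
    def compress(self, text: str) -> str:
        # A real handler would parse and deduplicate structure; this stand-in
        # just drops blank lines to keep the sketch self-contained.
        return "\n".join(line for line in text.splitlines() if line.strip())

class ContentRouter:
    """Toy router dispatching on detected content type, per the plugin shape."""
    def __init__(self):
        self._handlers = {}

    def register(self, strategy, handler):
        self._handlers[strategy] = handler

    def route(self, strategy, text):
        handler = self._handlers.get(strategy)
        return handler.compress(text) if handler else text  # pass through if unhandled

router = ContentRouter()
router.register(CompressionStrategy.YAML, YamlCompressor())
```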
Red flags
- !Single-contributor reality vs. the 7-contributor claim: contributor_count=7, but commit_frequency_30d=1 and commit_count=1. This appears to be a solo project; the '7 contributors' figure likely includes bots or one-off PRs, which matters when assessing bus factor.
- !proxy/server.py shells out to an external binary, 'rtk' (headroom/bin/rtk), via subprocess with a user-controlled PATH lookup. There is no validation of the binary path or its output before JSON parsing, which is a minor but real supply-chain/injection surface.
- !The docker-compose.yml spins up Neo4j with hardcoded credentials (NEO4J_AUTH=neo4j/password). That is fine for dev, but this pattern often gets copy-pasted into staging.
- !The README accuracy benchmarks (GSM8K: 0.870 baseline and 0.870 with Headroom, N=100) use a suspiciously small sample, and a delta of exactly 0.000 on a math benchmark warrants skepticism. The eval framework exists to reproduce the numbers, which is good, but the headline figures should be treated as directional.
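For the Neo4j credential finding, the usual compose-level mitigation is to source the value from the environment with a dev-only default. The `${VAR:-default}` interpolation is standard Compose syntax; the service layout below is a guess at the file's shape, not a copy of it:

```yaml
# Sketch only: keeps `docker compose up` working in dev while letting a
# .env file or CI secret override the credential outside dev.
services:
  neo4j:
    environment:
      - NEO4J_AUTH=${NEO4J_AUTH:-neo4j/password}
```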
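For the rtk finding, a hedged sketch of the safer pattern: prefer the vendored binary path over a PATH lookup, bound the call with a timeout, and fail closed on unparseable output. The `--stats` flag, the function shape, and the fallback logic are illustrative assumptions, not rtk's real interface:

```python
import json
import shutil
import subprocess
from pathlib import Path

# The vendored location comes from the review above; everything else is a sketch.
VENDORED_RTK = Path("headroom") / "bin" / "rtk"

def get_rtk_stats():
    # Prefer the pinned, vendored binary; only then fall back to PATH.
    binary = str(VENDORED_RTK) if VENDORED_RTK.is_file() else shutil.which("rtk")
    if binary is None:
        return None  # fail closed rather than guessing
    try:
        result = subprocess.run(
            [binary, "--stats"],  # hypothetical flag
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    try:
        return json.loads(result.stdout)  # validate before trusting the output
    except json.JSONDecodeError:
        return None
```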
Code quality
The code is noticeably above average for a project this size. smart_crusher.py shows real algorithmic care: statistical ID-vs-score field detection using entropy, UUID structural checks, and sequential-pattern analysis rather than name-matching heuristics, with explicit deprecation notices on the legacy regex methods. content_router.py has a clean two-tier cache with TTL eviction, proper metrics, and honest comments about tradeoffs ('No max-entries cap — TTL is the natural bound'). The test file for IntelligentContextManager explicitly calls out 'NO MOCKS for core logic' and tests the atomicity of tool call/response pairs. Minor issues: proxy/server.py is very long (900+ visible lines) and mixes concerns (cost tracking, cache stats, metrics) that could be decomposed further, and `_get_rtk_stats()` shells out to an external binary with no documented security consideration.
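The entropy-based ID-vs-score detection credited to smart_crusher.py can be illustrated in miniature. This is a from-scratch sketch of the general technique, not Headroom's actual code; the function names and the 3-bit threshold are invented:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Bits of entropy over the distinct values in a list."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_id_field(values, threshold_bits=3.0):
    """Heuristic sketch: ID-like fields are near-unique (high entropy),
    while score-like fields repeat and cluster (low entropy). The threshold
    is arbitrary here; a real implementation would calibrate it."""
    return shannon_entropy(values) >= threshold_bits

ids = [f"user-{i:04d}" for i in range(16)]   # 16 distinct values -> 4.0 bits
scores = [0.9, 0.9, 0.8, 0.9] * 4            # 2 distinct values -> ~0.81 bits
```

The appeal of this approach over name-matching heuristics is that a field called `score` holding unique tokens still classifies as ID-like, and vice versa.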
What makes it unique
The core idea of a transparent compression proxy isn't new (LLMLingua, ContextCipher, and various token-pruning papers exist), but Headroom's differentiation is pragmatic: it is one of the few projects that combines statistical JSON compression (SmartCrusher's variance-based anomaly preservation), AST-aware code compression via tree-sitter, a proper CCR (Compress-Cache-Retrieve) lossless layer, and provider-aware prefix-cache optimization, all behind a single proxy endpoint that requires zero code changes. The TOIN (Tool Intelligence Network) telemetry, which learns per-tool compression patterns across sessions, is a genuinely novel angle. It is not academic; it is engineered specifically for the Claude Code / agentic-workflow use case.
Scores
Barrier to entry
medium
The codebase is well-structured, with good docstrings and a CONTRIBUTING guide, but the compression logic in smart_crusher.py and content_router.py is genuinely complex (statistical field detection, multi-tier caching, TOIN telemetry integration), and the optional dependency tree ([proxy], [code], [ml], [memory], etc.) means getting a full dev environment running requires deliberate effort.
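The optional-extras point translates into install commands like the following. The extras names are quoted from the text above; whether they can all be combined in one invocation, and whether a dev extra exists, is assumed:

```shell
# Sketch: install with the extras relevant to the area you're hacking on.
# Extras names ([proxy], [code], [ml], [memory]) come from the review;
# combining them like this is standard pip extras syntax.
pip install "headroom[proxy,code,ml,memory]"
```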