samuelfaj/distill

TypeScript · 299 stars · 162 issues · 3 contributors

Summary

distill is a CLI pipe tool that compresses verbose command output (test logs, git diffs, terraform plans, etc.) before it is consumed by an LLM agent, saving tokens by using a local or remote LLM to summarize the raw output. It auto-detects whether the input is a batch (summarize once), a watch loop (recurring diff summaries), or an interactive prompt (pass through unchanged). It ships as platform-specific native binaries via npm optional dependencies.
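The three-way mode detection described above might look roughly like this. This is a minimal sketch under stated assumptions: the function name `detectMode`, the heuristics, and the thresholds are all hypothetical, not distill's actual implementation.

```typescript
// Hypothetical sketch of batch / watch / interactive auto-detection.
// Names and thresholds are illustrative, not distill's real code.
type Mode = "batch" | "watch" | "interactive";

function detectMode(opts: {
  stdinIsTTY: boolean; // stdin is a terminal, i.e. an interactive prompt
  chunkCount: number;  // how many separate writes arrived on stdin
  idleGapMs: number;   // longest pause observed between chunks
}): Mode {
  // A terminal on stdin means a human is typing: pass through unchanged.
  if (opts.stdinIsTTY) return "interactive";
  // Recurring output separated by pauses looks like a watch loop,
  // so summarize diffs on a recurring basis.
  if (opts.chunkCount > 1 && opts.idleGapMs > 1000) return "watch";
  // Otherwise it is a one-shot pipe: summarize once.
  return "batch";
}
```

A real implementation would also need timers to decide when an "undecided" stream has settled, which is exactly the state machine the code-quality notes below attribute to stream-distiller.ts.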

Great for

Building token-efficient LLM agent workflows, especially for anyone integrating tools like Claude Code, Codex, or OpenCode into CI or heavy-output shell pipelines.

Easy wins

  • Add a Windows pipe-passthrough smoke test: the e2e suite skips several Unix-only tests on win32, and there is no Windows-specific coverage for the interactive prompt path.
  • Implement streaming summarization output so the user sees the distilled answer as it is generated instead of waiting for the full summary; the summarizer interface (summarizeBatch/summarizeWatch) currently buffers everything before writing.
  • Add a '--dry-run' or '--count-tokens' flag that reports how many tokens were in the raw input versus the summary, surfacing the token-savings claim from the README in actual usage.
  • Write a CONTRIBUTING.md and issue templates: the project has neither, so these are concrete, mergeable contributions that would genuinely help, especially since no issues carry a good-first-issue label.
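The streaming suggestion above could be sketched as an AsyncIterable-based extension of the summarizer interface. The names here (`Summarizer`, `streamBatch`, `writeStreaming`) are hypothetical; the real summarizeBatch/summarizeWatch signatures may differ.

```typescript
// Hypothetical streaming variant of a buffering summarizer interface.
interface Summarizer {
  summarizeBatch(input: string): Promise<string>;     // current behavior: buffers everything
  streamBatch?(input: string): AsyncIterable<string>; // proposed: yield chunks as they arrive
}

// Write chunks to the output sink as they arrive, falling back to the
// buffered path when the provider does not support streaming.
async function writeStreaming(
  s: Summarizer,
  input: string,
  write: (chunk: string) => void,
): Promise<void> {
  if (s.streamBatch) {
    for await (const chunk of s.streamBatch(input)) write(chunk);
  } else {
    write(await s.summarizeBatch(input)); // buffered fallback
  }
}
```

Both Ollama and OpenAI-compatible endpoints support streamed responses, so the main work is plumbing the chunks through rather than changing providers.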

Red flags

  • No license: the repo has no LICENSE file, which means contributors technically cannot legally use or redistribute the code. This is a significant blocker for any serious collaboration or downstream use.
  • The metadata reports commit_count: 1 and contributor_count: 1 despite the page showing 3 contributors and 299 stars. The history was likely force-pushed or squashed recently, making git history useless for understanding the project's evolution.
  • Both bun.lock and package-lock.json are present in the repo root. Dual lockfiles suggest an inconsistent toolchain: build scripts use bun while package.json specifies 'packageManager: npm@11.7.0', which could cause environment-specific breakage for contributors who run 'npm test' versus 'bun test'.
  • The 'thinking' feature (likely chain-of-thought mode for Ollama's /api/generate) is passed as a top-level 'think' field in the request body. This is Ollama-specific and silently does nothing for OpenAI-compatible providers, with no warning to the user.
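The last red flag could be addressed with a small provider-capability guard. This is a sketch with hypothetical names (`buildRequestBody`, the `Provider` union), not distill's actual code; the only sourced fact is that Ollama's /api/generate accepts a top-level 'think' field.

```typescript
// Hypothetical guard: warn when an Ollama-only option is set for a provider
// that will silently ignore it (e.g. an OpenAI-compatible endpoint).
type Provider = "ollama" | "openai-compatible";

function buildRequestBody(
  provider: Provider,
  prompt: string,
  opts: { think?: boolean },
  warn: (msg: string) => void,
): Record<string, unknown> {
  const body: Record<string, unknown> = { prompt };
  if (opts.think) {
    if (provider === "ollama") {
      body.think = true; // Ollama's /api/generate accepts a top-level 'think' field
    } else {
      // Instead of silently dropping the option, tell the user it has no effect.
      warn("'think' is Ollama-specific and has no effect for this provider");
    }
  }
  return body;
}
```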

Code quality

good

stream-distiller.ts is the architectural core and is impressively well-structured: a clean state machine (undecided/watch/interactive), explicit timer management, a serialized promise queue for watch renders, and good fallback logic when distillation produces empty or bad output. config.ts is thorough: provider aliases, env var precedence, runtime validation with typed errors, and no magic strings. The e2e tests use a real fake HTTP server (Bun.serve) rather than mocks, which is a strong signal.

The one rough edge: structuralSimilarity in text.ts (used to decide watch promotion) is called but not visible in the samples, so its quality is unknown, and the 0.55 threshold is a magic number with no comment justifying it.
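The serialized promise queue praised above is a small, self-contained pattern worth illustrating. This is a generic sketch of the technique, not the actual stream-distiller.ts code:

```typescript
// Minimal serialized promise queue: each enqueued task runs only after the
// previous one settles, so watch renders never interleave. Illustrative only.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Enqueue a task and return a promise for that specific task's completion.
  push(task: () => Promise<void>): Promise<void> {
    // Run the task whether the previous one fulfilled or rejected.
    const next = this.tail.then(task, task);
    // Swallow errors on the chain itself so one failure does not wedge the queue.
    this.tail = next.then(() => {}, () => {});
    return next;
  }
}
```

The key design choice is keeping the chain alive after a rejection: the caller still sees the individual task's failure via the returned promise, but subsequent renders are not blocked.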

What makes it unique

The idea of piping shell output through an LLM summarizer is not new (various shell plugins and agent frameworks do this), but distill's specific niche is a genuinely differentiated angle: auto-detecting watch vs batch vs interactive modes, falling back gracefully when the LLM is unavailable, and targeting agent instruction files (AGENTS.md etc.) rather than human users. The platform binary distribution pattern via npm optional dependencies is well-executed and not commonly seen in projects this small.
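For readers unfamiliar with that distribution pattern (popularized by esbuild): the wrapper package lists one platform package per target under optionalDependencies, and each platform package restricts itself with npm's "os"/"cpu" fields so only the matching binary installs. The package names and version below are a hedged illustration, not distill's actual manifest:

```json
{
  "name": "distill",
  "optionalDependencies": {
    "distill-darwin-arm64": "1.0.0",
    "distill-linux-x64": "1.0.0",
    "distill-win32-x64": "1.0.0"
  }
}
```

At runtime the wrapper resolves whichever platform package installed successfully and executes its bundled native binary.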

Scores

Collab: 4
Activity: 5

Barrier to entry

low

The codebase is small (~10 focused source files in src/), well-tested with Bun's test runner, has CI, and the architecture is straightforward: a new contributor can read all of src/ in under an hour, and the test suite clearly illustrates every behavioral contract.

Skills needed

  • TypeScript (the entire codebase is TS)
  • Bun runtime (test runner, build scripts, binary bundling)
  • Node.js streams and process I/O (stdin/stdout piping, TTY detection)
  • LLM API familiarity (OpenAI-compatible and Ollama HTTP APIs)
  • npm workspaces and platform-specific optional package patterns