garrytan/gstack
Summary
gstack is a collection of Claude Code slash-command 'skills' (Markdown prompt templates) plus a compiled headless browser CLI built on Playwright. The browser binary (browse) is a TypeScript/Bun CLI that Claude Code uses as a tool to navigate pages, take screenshots, extract DOM content, and run QA passes. The workflow skills (/plan-ceo-review, /review, /ship, /qa, etc.) are opinionated prompt templates that instruct Claude to adopt specific roles during different phases of software development.
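The "structured command protocol" pattern can be sketched as a single registry that maps command names to handlers, in the spirit of the repo's commands.ts. All names and shapes below are invented for illustration; this is not the repo's actual code.

```typescript
// Hypothetical miniature of a single-source-of-truth command registry.
// A real CLI like browse would derive help text, parsing, and dispatch
// from one table so commands can't drift out of sync with their docs.
type Handler = (args: string[]) => string;

const commands: Record<string, { help: string; run: Handler }> = {
  goto: {
    help: "goto <url> — navigate the page",
    run: (a) => `navigating to ${a[0]}`,
  },
  snapshot: {
    help: "snapshot — dump an annotated DOM snapshot",
    run: () => "snapshot taken",
  },
};

function dispatch(line: string): string {
  const [name, ...args] = line.trim().split(/\s+/);
  const cmd = commands[name];
  if (!cmd) return `unknown command: ${name}`;
  return cmd.run(args);
}
```

An agent loop then only ever emits lines like `goto https://example.com` and reads back structured text, which is what makes the tool legible to an LLM.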
Great for
building and extending opinionated AI-driven dev workflows on top of Claude Code, particularly the browser automation layer that connects headless Playwright to an LLM agent loop
Easy wins
- Add test fixtures in browse/test/fixtures/ — the existing fixture HTML files (basic.html, forms.html, spa.html) are referenced but not shown in the tree; adding edge-case fixtures (infinite scroll, shadow DOM, iframes) would directly improve QA coverage
- Two TODO files (TODO.md and TODOS.md) sit at the repo root; consolidating them, or triaging them into GitHub issues, would help new contributors find work
- The eval baseline pinning test (test/skill-llm-eval.test.ts, lines ~200-240) relies on a manual UPDATE_BASELINES env var workflow; this could be automated as a CI step on main
- cookie-import-browser.ts imports cookies from Chrome/Arc/Brave/Edge; adding Firefox support is a well-scoped, self-contained addition with an existing test file at browse/test/cookie-import-browser.test.ts
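The Firefox cookie-import win above mostly reduces to a row-mapping step: Firefox stores cookies in the profile's cookies.sqlite (table moz_cookies), and Playwright accepts cookie objects with `domain`/`expires`/`secure` fields. The sqlite query itself is elided here; the row type and mapper below are an illustrative sketch, not the repo's code.

```typescript
// Row shape from Firefox's moz_cookies table (the sqlite read is elided;
// a real importer would query <profile>/cookies.sqlite).
interface MozCookieRow {
  name: string;
  value: string;
  host: string;      // e.g. ".example.com"
  path: string;
  expiry: number;    // Unix seconds
  isSecure: number;  // sqlite booleans are 0/1
  isHttpOnly: number;
}

// Playwright-style cookie object (subset of fields).
interface PlaywrightCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number;
  secure: boolean;
  httpOnly: boolean;
}

function toPlaywrightCookie(row: MozCookieRow): PlaywrightCookie {
  return {
    name: row.name,
    value: row.value,
    domain: row.host,
    path: row.path,
    expires: row.expiry,
    secure: row.isSecure === 1,
    httpOnly: row.isHttpOnly === 1,
  };
}
```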
Red flags
- Only 1 commit in history (commit_count: 1, contributor_count: 1 per API data); despite 10k stars and 1.2k forks, the repo appears to have been initialized as a single squashed commit, which makes contribution history opaque and `git blame` useless
- The install instructions have users git-clone the repo directly into ~/.claude/skills/ and run its `./setup` script, with no checksum verification and no review step; that is a supply-chain risk for a repo with 10k stars
- SKILL.md files are generated from .tmpl files, but BOTH are committed, so the two can easily drift out of sync; gen-skill-docs.test.ts catches the drift, but the CLAUDE.md development guide buries the "don't edit the .md directly" rule
- The eval E2E tests use `spawnSync('sh', ['-c', 'echo ping | claude -p ...'])` to check API connectivity, which requires the `claude` CLI to be on PATH, an undocumented prerequisite for running the test suite
- No package-lock.json or bun.lockb is visible in the file tree, so dependency versions like playwright ^1.58.2 will float on fresh installs
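The supply-chain concern above could be mitigated by pinning a digest for the setup script. A minimal sketch, assuming the project published a SHA-256 for `./setup` (it currently does not; the digest constant here is hypothetical):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Verify a fetched setup script against a published digest before running it.
// EXPECTED_SHA256 is a placeholder; nothing like it exists in the repo today.
function verifySetupScript(path: string, expectedSha256: string): boolean {
  return sha256Hex(readFileSync(path)) === expectedSha256;
}
```

Gating the install on `verifySetupScript("./setup", EXPECTED_SHA256)` would turn "run whatever HEAD contains" into "run exactly the reviewed bytes".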
Code quality
The browse source is well-structured: commands.ts as the single source of truth, a clean split into read-commands/write-commands/meta-commands, and snapshot.ts centralizing the SNAPSHOT_FLAGS metadata. The test suite is unusually mature for a single-contributor project: three tiers (static validation, E2E via real Claude sessions, LLM-as-judge quality scores) with incremental eval persistence, auto-comparison against previous runs, and cost tracking. The screenshot path validation in commands.test.ts shows security-conscious thinking (it verifies that paths stay within allowed directories). One weak spot: afterAll in commands.test.ts uses a `setTimeout(() => process.exit(0), 500)` hack to avoid a browser hang, a known fragility the comments acknowledge.
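The path-validation pattern praised above can be sketched generically (this is the standard resolve-and-prefix check, not the repo's exact code):

```typescript
import { resolve, sep } from "node:path";

// Reject output paths that escape an allowed base directory, including
// traversal via "..". A generic sketch of the technique, not browse's code.
function isPathAllowed(requested: string, baseDir: string): boolean {
  const base = resolve(baseDir);
  const target = resolve(base, requested);
  // Require the separator so "/tmp/out-evil" can't pass for base "/tmp/out".
  return target === base || target.startsWith(base + sep);
}
```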
What makes it unique
The prompt-template skill approach is not novel (many Claude Code snippet repos exist), but the browse binary is the genuinely interesting piece: it's a compiled Playwright CLI designed specifically as an AI agent tool, with ref-based element addressing (@e1, @c3), snapshot diffing, annotated screenshots, and a structured command protocol. The three-tier eval infrastructure (static → E2E → LLM judge with cost tracking and auto-regression comparison) is more sophisticated than most AI tooling repos of this size. The 'Conductor' parallel-sessions angle in the README is marketing for an external product, not part of this repo.
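The ref-based addressing idea — handing the model short stable handles like @e1 instead of brittle selectors — can be sketched as a numbering pass over snapshot nodes. The shapes below are invented for illustration; the repo's actual snapshot format may differ.

```typescript
// Assign short refs to interactable elements so an LLM can say
// "click @e2" and the tool resolves it back to a concrete selector.
// Invented shapes; not the repo's actual protocol.
interface SnapshotNode {
  tag: string;
  text: string;
  selector: string;
}

function assignRefs(nodes: SnapshotNode[]): Map<string, SnapshotNode> {
  const refs = new Map<string, SnapshotNode>();
  nodes.forEach((node, i) => refs.set(`@e${i + 1}`, node));
  return refs;
}

// Render the ref table the way it might be shown to the model.
function renderRefs(refs: Map<string, SnapshotNode>): string {
  return [...refs.entries()]
    .map(([ref, n]) => `${ref} <${n.tag}> "${n.text}"`)
    .join("\n");
}
```

Because refs are reassigned on every snapshot, diffing two ref tables also gives the agent a cheap "what changed since my last action" signal.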
Scores
Barrier to entry
medium
The browse binary and test suite require Bun, Playwright, and an ANTHROPIC_API_KEY for the paid evals (~$4/run); the free unit tests work fine, but meaningful contributions to the QA/review skills require running E2E tests against real Claude sessions, which carries real cost.