garrytan/gstack

TypeScript · 10,282 stars · 1,227 forks · 34 issues · 2 contributors · MIT

Summary

gstack is a collection of Claude Code slash-command 'skills' (Markdown prompt templates) plus a compiled headless browser CLI built on Playwright. The browser binary (browse) is a TypeScript/Bun CLI that Claude Code uses as a tool to navigate pages, take screenshots, extract DOM content, and run QA passes. The workflow skills (/plan-ceo-review, /review, /ship, /qa, etc.) are opinionated prompt templates that instruct Claude to adopt specific roles during different phases of software development.

Great for

Building and extending opinionated AI-driven dev workflows on top of Claude Code, particularly the browser automation layer that connects headless Playwright to an LLM agent loop.

Easy wins

  • Add test fixtures in browse/test/fixtures/. The existing fixture HTML files (basic.html, forms.html, spa.html) are referenced but not shown in the tree; edge-case fixtures (infinite scroll, shadow DOM, iframes) would directly improve QA coverage.
  • Two TODO files (TODO.md and TODOS.md) sit at the repo root. Consolidating them, or triaging them into GitHub issues, would help new contributors find work.
  • The eval baseline pinning test (test/skill-llm-eval.test.ts, lines ~200-240) relies on a manual UPDATE_BASELINES env var workflow; this could be automated as a CI step on main.
  • cookie-import-browser.ts imports cookies from Chrome/Arc/Brave/Edge. Adding Firefox support is a well-scoped, self-contained change, and a test file already exists at browse/test/cookie-import-browser.test.ts.

Red flags

  • Only 1 commit in history (commit_count: 1, contributor_count: 1 per API data, although the repo header lists 2 contributors). Despite 10k stars and 1.2k forks, the repo appears to have been initialized as a single squashed commit, which makes contribution history opaque and git blame useless.
  • The install instructions tell users to git clone the repo directly into ~/.claude/skills/ and run its `./setup` script, with no checksum verification and no review step. For a repo with 10k stars, this is a supply chain risk.
  • SKILL.md files are generated from .tmpl files, but both are committed, so they can easily drift out of sync. A gen-skill-docs.test.ts exists to catch this, but the CLAUDE.md development guide buries the 'don't edit .md directly' rule.
  • The eval E2E tests use `spawnSync('sh', ['-c', 'echo ping | claude -p ...'])` to check API connectivity. This requires the `claude` CLI to be in PATH, which is an undocumented prerequisite for running the test suite.
  • No package-lock.json or bun.lockb is visible in the file tree, so dependency ranges such as playwright ^1.58.2 will float on fresh installs.
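
The undocumented `claude` prerequisite noted above could be surfaced with an explicit check before the eval suite runs. The following is a minimal sketch, not code from the repo: `findOnPath` and `describeMissingPrereqs` are hypothetical helper names, and the check simply scans the directories in PATH for an executable file.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not from gstack): returns the full path of `command`
// if an executable file with that name exists in one of the PATH directories.
function findOnPath(
  command: string,
  envPath: string = process.env.PATH ?? "",
): string | null {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

// Hypothetical helper: one readable message per missing prerequisite.
function describeMissingPrereqs(commands: string[], envPath?: string): string[] {
  return commands
    .filter((command) => findOnPath(command, envPath) === null)
    .map((command) => `required CLI "${command}" was not found on PATH`);
}
```

A test setup file could call `describeMissingPrereqs(["claude"])` and skip the paid tiers with a clear message when the list is non-empty. The sketch ignores Windows PATHEXT handling.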

Code quality

good

The browse source is well-structured: commands.ts as single source of truth, clean separation into read-commands/write-commands/meta-commands, and snapshot.ts centralizing the SNAPSHOT_FLAGS metadata. The test suite is unusually mature for a 2-contributor project — three tiers (static validation, E2E via real Claude sessions, LLM-as-judge quality scores) with incremental eval persistence, auto-comparison against previous runs, and cost tracking. The screenshot path validation in commands.test.ts shows security-conscious thinking (validates paths are within allowed directories). One weak spot: afterAll in commands.test.ts uses a `setTimeout(() => process.exit(0), 500)` hack to avoid browser hang, which is a known fragility the comments acknowledge.
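
To illustrate the kind of check the screenshot path validation implies, here is a minimal sketch of a directory-containment test. It is an assumption about the general technique, not the repo's implementation: `isWithin` and `validateOutputPath` are hypothetical names, and the allowed directories are supplied by the caller.

```typescript
import * as path from "node:path";

// Hypothetical helper: true if `candidate` resolves to a location inside
// `baseDir` (or to `baseDir` itself). Relative candidates are resolved
// against `baseDir`.
function isWithin(baseDir: string, candidate: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, candidate);
  const rel = path.relative(base, target);
  // A relative path that climbs out of `base` starts with a ".." segment.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

// Hypothetical helper: returns the resolved path, or throws if it falls
// outside every allowed directory.
function validateOutputPath(candidate: string, allowedDirs: string[]): string {
  const resolved = path.resolve(candidate);
  if (!allowedDirs.some((dir) => isWithin(dir, resolved))) {
    throw new Error(`path is outside the allowed directories: ${resolved}`);
  }
  return resolved;
}
```

Comparing via `path.relative` rather than a string prefix avoids treating `/tmpfoo` as if it were inside `/tmp`. The sketch does not resolve symlinks; a stricter version would compare real paths.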

What makes it unique

The prompt-template skill approach is not novel (many Claude Code snippet repos exist), but the browse binary is the genuinely interesting piece: it's a compiled Playwright CLI designed specifically as an AI agent tool, with ref-based element addressing (@e1, @c3), snapshot diffing, annotated screenshots, and a structured command protocol. The three-tier eval infrastructure (static → E2E → LLM judge with cost tracking and auto-regression comparison) is more sophisticated than most AI tooling repos of this size. The 'Conductor' parallel-sessions angle in the README is marketing for an external product, not part of this repo.
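
Ref-based addressing of this kind can be pictured as a small registry that hands out short tokens during a snapshot and resolves them later. The sketch below is illustrative only: the source shows the token shape (@e1, @c3) but nothing about how browse implements it, so `parseRef`, `RefRegistry`, and the meaning of the letter prefix are all assumptions.

```typescript
// Hypothetical shape of a parsed ref such as "@e1" or "@c3".
interface ElementRef {
  kind: string; // single-letter prefix; its meaning is not stated in the source
  index: number;
}

// Parses a token like "@e1"; returns null if the token is malformed.
function parseRef(token: string): ElementRef | null {
  const match = /^@([a-z])(\d+)$/.exec(token);
  if (!match) return null;
  return { kind: match[1], index: Number(match[2]) };
}

// Hypothetical registry: issues refs while a snapshot is built and maps
// them back to selectors when the agent sends a command.
class RefRegistry {
  private readonly selectors = new Map<string, string>();
  private counter = 0;

  register(selector: string, kind: string = "e"): string {
    this.counter += 1;
    const ref = `@${kind}${this.counter}`;
    this.selectors.set(ref, selector);
    return ref;
  }

  resolve(token: string): string {
    if (parseRef(token) === null) {
      throw new Error(`malformed ref: ${token}`);
    }
    const selector = this.selectors.get(token);
    if (selector === undefined) {
      throw new Error(`stale or unknown ref: ${token}`);
    }
    return selector;
  }
}
```

The appeal of this design for an agent loop is that the model only has to repeat a short token from the latest snapshot instead of composing a CSS selector, and a stale token fails loudly rather than matching the wrong element.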

Scores

Collab: 7
Activity: 8

Barrier to entry

medium

The browse binary and test suite require Bun, Playwright, and an ANTHROPIC_API_KEY for paid evals (~$4/run); free unit tests work fine, but meaningful contributions to the QA/review skills require running E2E tests against real Claude sessions, which has real cost.

Skills needed

  • TypeScript (Bun runtime, not Node)
  • Playwright browser automation
  • CLI tool design (stdin/stdout protocols, binary compilation with bun build --compile)
  • Prompt engineering for LLM agent tools
  • Testing with LLM-as-judge patterns (eval infrastructure is non-trivial)
  • Git workflow automation (the /ship skill orchestrates git operations)