Nubaeon/empirica
Summary
Empirica is a Python CLI tool and MCP server that wraps AI coding sessions (primarily Claude Code) with an 'epistemic measurement' layer — tracking confidence vectors, forcing investigation before code edits via a Sentinel gate, and persisting findings/unknowns/dead-ends across sessions in SQLite. It injects structured self-assessment prompts into AI workflows and surfaces a real-time statusline showing confidence scores. Think of it as a structured metacognition harness for AI agents.
Great for
People interested in AI reliability tooling: specifically, instrumenting LLM coding agents with structured confidence tracking, persistent cross-session memory, and gating mechanisms that prevent premature code edits
Easy wins
- +Add a GitHub Actions CI workflow — the Makefile already has `make ci` defined, pyproject.toml has full pytest/ruff/pyright config, but there's no .github/workflows/*.yml file at all
- +Write tests for the MCP server tool definitions in empirica-mcp/empirica_mcp/server.py — the 100+ tool definitions appear completely untested; the file tree shows a tests/ directory but no MCP-specific test files
- +Fix the duplicate/inconsistent module structure: there are both empirica/integration/ and empirica/integrations/ directories, suggesting a refactor was started but not completed
- +Narrow and log the bare except clauses throughout workflow_commands.py — there are dozens of `except Exception: pass` blocks that swallow errors with no logging or error context
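The except-clause fix is mechanical. A minimal sketch of the pattern — the function and the plugin-loading scenario are illustrative stand-ins, not Empirica's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def load_optional_plugin(name: str):
    """Illustrative stand-in for a swallow-everything block in
    workflow_commands.py: narrow the exception type, log the context,
    and keep the graceful fallback."""
    try:
        return __import__(name)
    except ImportError as exc:  # was: except Exception: pass
        logger.warning("optional plugin %r unavailable: %s", name, exc)
        return None
```

The key change is observability: the failure path still degrades gracefully, but it now leaves a trace and no longer masks unrelated bugs (a `TypeError` inside the try block would propagate instead of vanishing).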
Red flags
- !Single-commit history despite the repo claiming v1.6.7 and extensive prior development — this is almost certainly a squashed or re-initialized repo, making it impossible to reconstruct the actual development history or evaluate commit hygiene
- !docker-compose.yml hardcodes a personal home directory path: `/home/yogapad/.empirica:/root/.empirica:ro` — this is a developer's machine-specific path committed to the repo
- !pyproject.toml references CVE fixes with future dates: 'CVE-2026-27205', 'CVE-2026-27199', 'CVE-2026-24049' — CVEs with 2026 dates in a 2025/2026 repo are suspicious and suggest fabricated security justifications for dependency versions
- !No CI/CD pipeline despite having full test infrastructure configured — the tests may not actually pass
- !Metadata reports a contributor_count of 6, but the actual analysis shows 1 — the GitHub API data may be inflated, or forks are being counted
- !The README claims 'emerged from 600+ real working sessions' but there is 1 commit — no way to verify this claim
- !empirica/config/mcp_security.yaml exists, but the docker-compose.yml setup undercuts it: credential paths are hardcoded and the entire project directory is mounted read-only into containers — a questionable security boundary
Code quality
The architecture is genuinely thoughtful: repository pattern in session_database.py, dialect-aware schema adaptation, lazy loading to avoid circular imports, and a proper migration runner. However, workflow_commands.py has ~15 silent `except Exception: pass` blocks that swallow errors with no observability; the `_auto_bootstrap` function spawns a subprocess of `empirica` itself, a self-referential spawn that can fail silently in many environments; and deprecated methods (log_preflight_assessment, log_check_phase_assessment) retain full implementations rather than just raising DeprecationWarning. docs_commands.py shows good separation but likewise wraps its AST parsing in bare `except Exception: pass` with no logging.
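The deprecated-method issue has a standard remedy: keep the old signature, warn, and delegate, instead of maintaining two live implementations. A hedged sketch — the method names match those cited above, but the class, the bodies, and the replacement name `log_assessment` are assumptions for illustration:

```python
import warnings

class SessionLogger:
    def log_assessment(self, data: dict) -> None:
        """Current API (name assumed for illustration)."""

    def log_preflight_assessment(self, data: dict) -> None:
        """Deprecated shim: emit a DeprecationWarning and delegate,
        so callers migrate and only one code path stays maintained."""
        warnings.warn(
            "log_preflight_assessment is deprecated; use log_assessment",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this shim
        )
        self.log_assessment(data)
```

With `stacklevel=2` the warning is attributed to the caller's line, which is what makes these shims actionable during migration.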
What makes it unique
The core concept — adding structured epistemic state tracking (confidence vectors, Sentinel gates, noetic/praxic phase separation) as a middleware layer over AI coding agents — is genuinely novel and not a clone of anything obvious. Most AI reliability tooling focuses on output validation rather than mid-session confidence gating. However, the practical value depends entirely on whether LLMs actually self-assess accurately when prompted by this system, which is an open research question the README doesn't address. The 13-vector system is interesting but appears to be empirically derived by one developer rather than grounded in published research.
Scores
Barrier to entry
high
The commit history shows exactly 1 commit (despite claiming v1.6.7 with 600+ sessions of development), no CI pipeline, 0 good-first-issues, and the codebase requires understanding a multi-layered proprietary framework (CASCADE, Sentinel, noetic/praxic split, 13 epistemic vectors) before any contribution makes sense — there's no onboarding path for contributors beyond the end-user docs.