kayba-ai/agentic-context-engine

Python · 1,996 stars · 255 forks · 8 issues · 27 contributors · MIT

Summary

ACE (Agentic Context Engine) is a Python framework that enables AI agents to learn from their own execution traces without fine-tuning. It maintains a 'Skillbook' — a living prompt document of learned strategies — updated after each task by a Reflector component that programmatically analyzes traces via a sandboxed Python REPL, and a SkillManager that curates the resulting strategies. It wraps popular agent frameworks (LiteLLM, LangChain, browser-use, Claude Code) and injects the evolving Skillbook into the agent's system prompt.
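
The loop described above can be sketched in a few lines. This is a toy illustration, not ACE's actual API: `Skillbook`, `reflect`, `curate`, `run_task`, and `fake_agent` are all hypothetical names, and the "reflector" here is a plain string filter rather than an LLM driving a sandboxed REPL.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skillbook:
    """Hypothetical stand-in for the Skillbook: an ordered list of strategies."""
    skills: list[str] = field(default_factory=list)

    def as_prompt(self) -> str:
        # Render the learned strategies as a block of system-prompt text.
        if not self.skills:
            return ""
        bullets = "\n".join(f"- {s}" for s in self.skills)
        return f"Learned strategies:\n{bullets}"


def reflect(trace: list[str]) -> list[str]:
    """Toy reflector: turn each failed step into a candidate strategy."""
    return [
        f"Avoid: {step.removeprefix('FAIL: ')}"
        for step in trace
        if step.startswith("FAIL: ")
    ]


def curate(book: Skillbook, candidates: list[str]) -> None:
    """Toy skill manager: keep only strategies not already in the book."""
    for candidate in candidates:
        if candidate not in book.skills:
            book.skills.append(candidate)


def run_task(
    agent: Callable[[str, str], list[str]],
    book: Skillbook,
    base_prompt: str,
    task: str,
) -> list[str]:
    # Inject the current Skillbook, run the agent, then learn from its trace.
    system_prompt = f"{base_prompt}\n\n{book.as_prompt()}".strip()
    trace = agent(system_prompt, task)
    curate(book, reflect(trace))
    return trace


def fake_agent(system_prompt: str, task: str) -> list[str]:
    # Hypothetical agent: it fails until the prompt warns about the mistake.
    if "clicking before page load" in system_prompt:
        return ["OK: waited for load, then clicked"]
    return ["FAIL: clicking before page load"]


book = Skillbook()
first = run_task(fake_agent, book, "You are a browser agent.", "submit form")
second = run_task(fake_agent, book, "You are a browser agent.", "submit form")
```

In this sketch the first run fails and adds a strategy to the book; the second run sees that strategy in its system prompt and succeeds, with no model weights changed.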

Great for

people interested in in-context learning for AI agents — specifically the problem of making agents accumulate and reuse procedural knowledge across task executions without any model training

Easy wins

+Add good-first-issue and help-wanted labels: there are 8 open issues with only 'future improvement' labels, making it hard to find entry points
+Write integration tests for the ace_next/ pipeline steps (test coverage floor is only 25%; ace_next/steps/ has ~15 step files with no obvious corresponding tests in the samples provided)
+Add type stubs or improve mypy coverage for ace_next/ — pyproject.toml explicitly excludes ace_next from mypy, leaving the newer codebase untyped
+Consolidate the versioned prompt files (prompts_v2.py, prompts_v2_1.py, prompts_v3.py, plus reflector prompt versions v3/v4/v5) into a single versioned registry with a clear deprecation path
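
For the prompt consolidation item above, one possible shape for a versioned registry is sketched below. `PromptRegistry`, its methods, and the template strings are hypothetical placeholders, not the contents of the repository's prompt files; the deprecation path is modeled with Python's standard `DeprecationWarning`.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    template: str
    deprecated: bool = False


class PromptRegistry:
    """Hypothetical registry mapping (name, version) to a prompt template."""

    def __init__(self) -> None:
        self._prompts: dict[tuple[str, str], PromptVersion] = {}
        self._active: dict[str, str] = {}

    def register(
        self,
        name: str,
        version: str,
        template: str,
        *,
        active: bool = False,
        deprecated: bool = False,
    ) -> None:
        self._prompts[(name, version)] = PromptVersion(template, deprecated)
        if active:
            # Exactly one version per prompt name is marked as the default.
            self._active[name] = version

    def get(self, name: str, version: Optional[str] = None) -> str:
        # Fall back to the active version when none is requested.
        version = version or self._active[name]
        entry = self._prompts[(name, version)]
        if entry.deprecated:
            warnings.warn(
                f"prompt {name!r} {version} is deprecated; "
                f"active version is {self._active.get(name)}",
                DeprecationWarning,
                stacklevel=2,
            )
        return entry.template


registry = PromptRegistry()
# Placeholder templates for illustration only.
registry.register("reflector", "v4", "Reflect on: {trace}", deprecated=True)
registry.register("reflector", "v5", "Analyze step by step: {trace}", active=True)
```

Callers that ask for `registry.get("reflector")` always receive the active version, while pinned callers of an old version keep working but see a warning, which makes it visible which version is live.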

Red flags

!ace/ and ace_next/ are parallel reimplementations of the same concepts with no clear migration path documented in the committed code — new contributors will struggle to know which to extend
!The sandboxed REPL in ace/reflector/sandbox.py (and ace_next/rr/sandbox.py) executes LLM-generated Python code — the security model of this sandbox is not visible in the file tree samples and deserves scrutiny before production use
!CLAUDE.md references docs/ACE_DESIGN.md and docs/PIPELINE_DESIGN.md as mandatory reading, but neither file appears in the committed file tree (only docs/old_docs/ and docs/concepts/ are present) — onboarding docs are missing
!commit_count: 1 and contributor_count: 1 in the API data despite 27 listed contributors and active history — likely a data collection artifact, but worth verifying the contributor diversity claim
!The 'pipeline/' package is imported in pyproject.toml but absent from the file tree, suggesting it exists locally but may not be fully committed or is being developed privately
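
On the sandbox concern above, a reviewer might start by checking whether LLM-generated code at least runs in a separate process with a hard timeout. The sketch below shows that baseline using only the standard library. It is not the repository's sandbox, `run_untrusted` is a hypothetical name, and process isolation plus a timeout is not a complete security boundary (it does nothing about filesystem or network access).

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run code in a separate interpreter; return (succeeded, output or error)."""
    try:
        proc = subprocess.run(
            # -I runs Python in isolated mode: no user site-packages and
            # no PYTHON* environment variables are honored.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


ok_result = run_untrusted("print(1 + 1)")
hung_result = run_untrusted("while True: pass", timeout_s=1.0)
```

A production sandbox would typically add OS-level controls (containers, seccomp, resource limits) on top of this, which is the part worth auditing in the files named above.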

Code quality

decent

The pipeline engine, as seen through tests/pipeline_engine/test_branch.py, shows disciplined design: an immutable StepContext with .replace(), explicit requires/provides contracts, well-structured merge strategies, and thorough unit tests. The ace/integrations/claude_code/learner.py module is production-quality, with graceful fallbacks for missing dependencies (tenacity, toon), robust transcript filtering, and good docstrings. However, prompts_v2_1.py is a sprawling 500+ line file that mixes prompt templates with business logic, and the reflector has five distinct prompt versions (prompts_rr_v3 through v5 still live in the tree) with no clear indication of which one is active. The RecursiveReflector test suite is solid, but some tests mock the LLM at the wrong layer and assert on prompt string contents rather than on behavior.
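
The immutable-context and requires/provides pattern praised above can be illustrated with a minimal sketch. The names `StepContext` and `.replace()` come from the review; everything else (`Step`, `ContractError`, `run_pipeline`, the `data` field, and the two example steps) is a hypothetical reconstruction, not the repository's code.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


class ContractError(Exception):
    """Raised when a step's requires/provides contract is violated."""


@dataclass(frozen=True)
class StepContext:
    # Read-only mapping, so steps cannot mutate shared state in place.
    data: Mapping[str, Any] = field(default_factory=lambda: MappingProxyType({}))

    def replace(self, **updates: Any) -> "StepContext":
        # Return a new context; the original is left untouched.
        return StepContext(MappingProxyType({**self.data, **updates}))


class Step:
    requires: frozenset = frozenset()
    provides: frozenset = frozenset()

    def run(self, ctx: StepContext) -> StepContext:
        raise NotImplementedError


class Tokenize(Step):
    requires = frozenset({"text"})
    provides = frozenset({"tokens"})

    def run(self, ctx: StepContext) -> StepContext:
        return ctx.replace(tokens=ctx.data["text"].split())


class Count(Step):
    requires = frozenset({"tokens"})
    provides = frozenset({"count"})

    def run(self, ctx: StepContext) -> StepContext:
        return ctx.replace(count=len(ctx.data["tokens"]))


def run_pipeline(steps: list, ctx: StepContext) -> StepContext:
    for step in steps:
        # Check the input contract before running the step.
        missing = step.requires - set(ctx.data)
        if missing:
            raise ContractError(f"{type(step).__name__} missing {sorted(missing)}")
        ctx = step.run(ctx)
        # Check the output contract after running the step.
        unmet = step.provides - set(ctx.data)
        if unmet:
            raise ContractError(f"{type(step).__name__} did not provide {sorted(unmet)}")
    return ctx


initial = StepContext().replace(text="learn from traces")
final = run_pipeline([Tokenize(), Count()], initial)
```

Because each step returns a fresh context, a misordered pipeline fails fast with a contract error instead of reading stale or half-written state.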

What makes it unique

The core idea — using a sandboxed code-execution loop (Recursive Reflector) rather than a single-pass summarizer to analyze agent traces — is a genuine differentiator from naive few-shot memory systems like MemGPT or simple RAG-over-history approaches. The Stanford/SambaNova research citation is legitimate (arxiv 2510.04618). However, the hosted kayba.ai upsell is prominent throughout the README, and the framework is clearly also a lead-gen vehicle for that product, which may affect prioritization of open-source contributions.

Scores

Barrier to entry

medium

The codebase has two parallel implementations (ace/ and ace_next/) with a third in-progress rewrite. Versioned prompt files proliferate (prompts.py, prompts_v2.py, prompts_v2_1.py, prompts_v3.py, plus reflector-specific versions). CLAUDE.md mandates reading two design docs before touching any core code, neither of which appears in the committed file tree.

Skills needed

Python 3.12+ (heavily uses modern typing, dataclasses, async)
LLM API familiarity (LiteLLM, OpenAI, Anthropic)
Agent framework experience (LangChain, browser-use, or Claude Code)
Understanding of prompt engineering and system prompt design
pytest for tests (async fixtures, custom markers)
Familiarity with sandboxed code execution concepts (the Recursive Reflector runs LLM-generated Python in a REPL)