
agulaya24/baselayer

Python · 1 contributor · Apache-2.0

Summary

Base Layer is a CLI tool and MCP server that processes personal text corpora (ChatGPT exports, journals, books) through a 4-step LLM pipeline (extract facts → author identity layers → compose brief) to produce a ~2,500-token 'identity brief' capturing behavioral patterns. The brief can be injected into any AI system prompt or served via MCP to tools like Claude Desktop and Cursor. It uses Anthropic's API (Haiku/Sonnet/Opus) for processing and stores everything locally in SQLite + ChromaDB.
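As an illustration of the injection step, a locally stored brief can simply be prepended to whatever system prompt a tool already uses. This is a hypothetical sketch: the file path, function name, and tag wrapper are illustrative assumptions, not Base Layer's actual interface.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, brief_path: str) -> str:
    """Prepend a locally stored identity brief to an existing system prompt.

    Hypothetical helper: the brief file location and the <identity_brief>
    wrapper are assumptions for illustration, not the repo's real API.
    """
    brief = Path(brief_path).read_text(encoding="utf-8").strip()
    return f"<identity_brief>\n{brief}\n</identity_brief>\n\n{base_prompt}"
```

Because the brief is plain text, the same composition works for any provider's system-prompt field, not just Anthropic's.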

Great for

People interested in personal AI memory systems, behavioral modeling from text corpora, or LLM-as-judge evaluation frameworks; specifically, the problem of compressing identity signal from unstructured personal data into dense, injectable context.

Easy wins

  • Add a CONTRIBUTING.md: the repo has 76 design decisions in DECISIONS.md but zero guidance on how to run tests, set up a dev environment, or submit PRs.
  • Add CI (GitHub Actions): there are 414 tests but no automated runner; a simple pytest workflow would validate contributions immediately.
  • Fix the `pyproject.toml` package layout: `packages = ['baselayer']` with `package-dir = {baselayer = 'scripts'}` is non-standard and will break IDE navigation and imports for contributors; the scripts/ directory should be renamed to src/baselayer/ or similar.
  • Implement persistent updating (an active roadmap item): extract_facts.py already has per-conversation processing logic; the gap is incremental re-authoring of layers when new facts are added.
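The CI easy win can be as small as a single workflow file. A minimal sketch, assuming the package installs from pyproject.toml and tests are discoverable by pytest (adjust paths and Python version to the repo's actual layout):

```yaml
# .github/workflows/tests.yml — minimal pytest runner (illustrative)
name: tests
on: [push, pull_request]
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -e . pytest
      - run: pytest
```

With 414 existing tests, even this bare-bones workflow would give contributors immediate feedback on PRs.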

Red flags

  • Privacy/security: the code sends personal text (conversations, journals) to Anthropic's API by design. This is documented, but the README's privacy section claims 'nothing persists remotely' while simultaneously relying on Anthropic's API retention policies; the two claims are in tension.
  • The eval script scripts/archive/eval_scripts/run_validation_study.py imports `from marks_bcb_prompts import MARKS_PROMPTS, MARKS_DRS_SCENARIOS, MARKS_VRI_MAPPING`, but marks_bcb_prompts.py does not appear in the file tree; this is broken dead code in the archive.
  • The repository history contains a single commit (commit_count: 1). The entire project was pushed at once, so there is no way to review changes, follow the development history, or trace why decisions were made.
  • No lockfile (no poetry.lock, no requirements.txt with pinned versions, no pip-tools output): chromadb~=0.6 and sentence-transformers~=3.0 have had breaking changes, so contributors may get different behavior on different installs.
  • The pyproject.toml declares `baselayer = 'scripts'` as the package dir, which means `import baselayer.config` resolves to `scripts/config.py`; scripts/ also contains one-off experiment files, archive code, and utilities that all become part of the installed package.
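The packaging red flag has a conventional fix: a src/ layout, so that only real package modules ship. A sketch of the relevant pyproject.toml section, assuming the library modules under scripts/ are relocated to src/baselayer/ (experiment and archive files stay outside src/):

```toml
# pyproject.toml (sketch) — standard src/ layout replacing
# packages = ['baselayer'] + package-dir = {baselayer = 'scripts'}
[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.packages.find]
where = ["src"]  # picks up src/baselayer/ automatically, nothing else
```

This keeps `import baselayer.config` working while excluding one-off scripts from the installed distribution.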

Code quality

decent

The core pipeline files (extract_facts.py, author_layers.py, verify_provenance.py) show careful design: there is real thought in the predicate normalization, entity resolution, domain capping, and tiered fact retrieval. The provenance verification system (NLI + vector audit + claim verification) is more rigorous than most hobby projects. However, session-number comments like 'S65', 'D-056 Tier 2', and 'Plan 1' appear throughout without explanation, so understanding the code requires archaeology. The `_get_entity_map` and `_get_user_names` functions cache state via `hasattr` checks on the function object rather than module globals or `functools.lru_cache`, which is fragile. The eval scripts directory also mixes production code with experiments (e.g., run_validation_study.py's broken import of marks_bcb_prompts.py, which is not in the file tree).
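To make the caching critique concrete, here is a generic sketch (the real `_get_entity_map` in the repo likely has a different body and signature) contrasting the fragile `hasattr` pattern with the idiomatic `functools.lru_cache` replacement:

```python
from functools import lru_cache


# Fragile pattern the review describes (sketch): state stashed on the
# function object itself, invisible to type checkers and easy to clobber.
def get_entity_map_fragile():
    if not hasattr(get_entity_map_fragile, "_cache"):
        # stands in for an expensive load (e.g., a SQLite query)
        get_entity_map_fragile._cache = {"alias": "canonical"}
    return get_entity_map_fragile._cache


# Idiomatic replacement: memoize the first call; cache_clear() provides
# an explicit invalidation hook for tests.
@lru_cache(maxsize=1)
def get_entity_map():
    return {"alias": "canonical"}  # stands in for the expensive load
```

Both return the same cached object on repeat calls, but only the `lru_cache` version has a documented, testable lifecycle.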

What makes it unique

The behavioral compression angle is genuinely differentiated from most 'AI memory' tools which do naive RAG over conversation history. The three-layer identity architecture (ANCHORS/CORE/PREDICTIONS) with explicit provenance tracing back to source facts is more structured than Mem0 or similar. The ablation study infrastructure and BCB (Behavioral Compression Benchmark) framework show real evaluation rigor. However, this is currently a single-person research project that happens to have a pip package, not a collaborative open-source project — the gap between the polished README and the actual contributor experience is significant.

Scores

Collab
3
Activity
3

Barrier to entry

high

There is a single contributor and zero onboarding documentation (no CONTRIBUTING.md, no CI). More than 76 design decisions are referenced throughout the code by session numbers (e.g., 'Session 55 Plan 2', 'S65') that are meaningless without access to the author's session logs, and the pyproject.toml maps the entire `scripts/` directory as the `baselayer` package, an unusual layout that will confuse contributors.

Skills needed

  • Python 3.10+, including async patterns
  • SQLite schema design and query optimization
  • Anthropic API (Claude models, prompt engineering)
  • ChromaDB / vector embeddings (sentence-transformers)
  • LLM evaluation methodology (the codebase has serious eval infrastructure)
  • MCP protocol (Model Context Protocol) for the server component
  • Understanding of NLI models (DeBERTa used in verify_provenance.py)