whitepaper · v2.4 · 2026 · brainctl 2.7.0

brainctl: a persistent memory layer for autonomous agents

Terrence Schonleber & contributors · github.com/TSchonleber/brainctl
abstract

Autonomous agents built on large language models have no durable memory. Every session begins from zero, every context window is a goldfish bowl, and every handoff loses most of what the previous agent learned. brainctl is an open-source memory layer that gives agents a persistent brain: a single SQLite database with full-text and vector search, a typed knowledge graph, three first-class memory types (episodic, semantic, procedural) backed by their own tables and gates, provenance-tracked decisions, and sleep-inspired consolidation. It ships as a Python library, a CLI, a 209-tool Model Context Protocol server (stdio and streamable HTTP, with a tool-allowlist for clients that cap the surface), and nineteen first-party plugins spanning agent frameworks (Claude Code, Codex CLI, Cursor, Eliza, Gemini CLI, Goose, Hermes, OpenClaw, OpenCode, Pi, Rig, Virtuals Game, Zerebro) and trading bots (Freqtrade, Jesse, Hummingbot, NautilusTrader, OctoBot, Coinbase AgentKit). On top of the memory substrate brainctl ships an on-chain economy: Ed25519-signed memory bundles, optional Solana memo pinning, Light Protocol compressed-token minting, and a chain-canonical agent-to-agent marketplace at brainctl.org with full negotiation primitives (offer / counter / accept / reject / withdraw), just-in-time cNFT minting at settlement, and a transparent fee schedule (2.5% protocol fee at settlement, flat per-op fees on every chain interaction). Provider import adapters bring memories in from mem0 and arbitrary JSON exports; wallet export to Phantom / Backpack / Solflare / Glow is one CLI call. 
This paper describes the motivation; the substrate; the seven-store memory typology; the worthiness gate with adaptive five-factor admission control and a Bayesian confidence model; AGM-style belief revision; an eight-phase NREM/REM consolidation pipeline grounded in synaptic homeostasis theory (Tononi & Cirelli 2003); self-improving retrieval via Thompson Sampling explore/exploit, context-dependent encoding, Q-value utility scoring, and phase-aware quantum amplitude scoring over a zero-LLM entity-linked knowledge graph; hybrid search with reciprocal rank fusion; multi-agent handoff; the security posture; the on-chain primitives (signed exports → mint → marketplace) and protocol fee schedule; a comparison with existing approaches; and the economics of funding open-source infrastructure with a token.

motivation

the forgetting agent

Consider a coding agent asked to implement a multi-day feature. On day one it decides to use Retry-After headers for backoff because the server controls the rate-limit window. On day two, a new session, the agent is told to "improve backoff" and — with no memory of yesterday's rationale — reinvents an exponential backoff scheme that silently fights the server's headers. On day three a third session reconciles the two by introducing yet another layer. The code grows; coherence decays.

This is the canonical failure mode of stateless agents. It is not a failure of reasoning; each session is locally correct. It is a failure of memory — specifically, the inability to preserve the epistemic status of a decision across a session boundary.

why context windows don't fix it

The naive fix — feed the full transcript into the next session's context window — scales in three ways that all break down in practice.

The empirical case against simply scaling context windows is well-established in the literature. Liu et al. 2024 ("Lost in the Middle") showed that language models systematically attend more strongly to information at the beginning and end of their context windows than to information in the middle, with a characteristic U-shaped accuracy curve that gets worse as context length increases. The finding holds across model families and across multi-document QA, key-value retrieval, and reading-comprehension tasks. The naive fix degrades retrieval quality before it runs out of length.

why retrieval alone doesn't fix it either

The next line of defence — a vector database bolted onto the agent — helps, but not enough. Vector retrieval treats memory as a bag of facts indexed by surface similarity. This misses at least three structural properties of real memory:

  1. Provenance. A note written by the agent itself is not the same as a note from the user. A decision ratified twice is not the same as a decision mentioned once. Retrieval that ignores provenance promotes the popular over the reliable.
  2. Temporal structure. "Alice is CTO" written on Monday is superseded by "Alice left the company" written on Tuesday. A retrieval system that returns both with similar cosine scores is broken at the level of meaning.
  3. Epistemic status. Some facts are confident; some are tentative hypotheses; some are contradictions pending resolution. Flattening them into the same embedding space loses the gradient that matters most.

That vector retrieval alone is insufficient for agent memory is now broadly held in the literature. Park et al. 2023 ("Generative Agents: Interactive Simulacra of Human Behavior") introduced a memory-stream architecture with importance scoring and a reflection step that periodically synthesizes higher-level beliefs from raw observations, already a step beyond flat similarity retrieval. Packer et al. 2023 (MemGPT) reframed agent memory as an OS-style hierarchy of context, recall, and archival tiers with explicit paging between them. Zhang et al. 2024 surveyed the space and concluded that effective agent memory needs explicit structure, lifecycle management, and consolidation. brainctl is positioned in that line of work.

design goals

brainctl is an attempt to build a memory layer that is:

non-goals and scope

It is equally important to be clear about what brainctl is not trying to be. The agent-memory problem is adjacent to several other hard problems, and conflating them leads to over-scoped projects that do none of them well. The non-goals below are not things we will get to later; they are things we are actively choosing to leave to other tools.

These non-goals are chosen, not forced. They exist because scope discipline is the difference between a memory layer that can be understood in one sitting and a platform that accretes features until no one can reason about it as a whole.

a concrete multi-day agent trace

To make the motivation less abstract, here is the kind of trace brainctl is trying to preserve. Consider a coding agent implementing an API-v2 migration over three sessions spanning four days. Without brainctl, each session starts from zero and the trace looks like:

  day 1   session A
    orient    -> [empty]
    work      -> decide to use Retry-After headers for backoff
                 (rationale: server controls the rate-limit window)
                 implement initial fetcher against /api/v2/orders
                 notice rate limit kicks in at ~100 req / 15s
    wrap_up   -> [lost]

  day 2   session B
    orient    -> [empty]
    user      -> "please improve the backoff"
    work      -> invent exponential backoff from scratch
                 (rationale: handle rate limits gracefully)
                 silently conflicts with day-1 Retry-After logic
    wrap_up   -> [lost]

  day 4   session C
    orient    -> [empty]
    user      -> "the rate limiter is broken, there's two layers"
    work      -> introduce a reconciliation layer on top of both
                 (rationale: can't figure out which to remove)
    wrap_up   -> [lost]

  outcome: 3 sessions, 3 uncoordinated decisions, 1 bug,
           coherence lost at the first handoff.

With brainctl, the same three sessions look like:

  day 1   session A
    orient    -> brain.orient(project="api-v2")
                 -> context package: [empty — new project]
    work      -> brain.decide("use Retry-After for backoff",
                              "server controls rate-limit window")
                 brain.entity("RateLimitAPI", "service",
                              observations=["100 req/15s"])
                 brain.remember("rate-limit: 100/15s",
                                category="integration")
    wrap_up   -> handoff_packet {
                   goal: "implement api-v2 order fetcher",
                   current_state: "fetcher against /orders working,
                                   Retry-After backoff in place",
                   open_loops: ["pagination not yet implemented"],
                   next_step: "add cursor-based pagination"
                 }

  day 2   session B
    orient    -> brain.orient(project="api-v2")
                 -> context package: {
                      handoff: (from session A, verified),
                      decisions: ["use Retry-After for backoff"],
                      entities: [RateLimitAPI],
                      memories: ["rate-limit: 100/15s"]
                    }
    user      -> "please improve the backoff"
    work      -> brain.search("backoff")
                 -> hit: decision(Retry-After) w/ rationale
                 agent proposes: "tighten Retry-After parsing,
                                  add jitter, keep existing model"
                 brain.decide("add jitter to Retry-After delay",
                              "avoid thundering herd on recovery")
    wrap_up   -> handoff_packet { ... }

  day 4   session C
    orient    -> same flow
                 -> context now carries both day-1 and day-2
                    decisions with full provenance
    user      -> "there might be an issue with rate limits"
    work      -> brain.search("rate limit")
                 agent finds both decisions, understands the
                 layered design, investigates the real bug.

  outcome: 3 sessions, 1 coherent design, no contradiction
           loop, each handoff preserves the epistemic state.

The difference is not that brainctl is smarter than the agent. It is that brainctl prevents the agent from being repeatedly reintroduced to its own prior reasoning. The cost of forgetting is paid at every session boundary otherwise; brainctl moves that cost to a one-time write.

architecture

substrate choice: sqlite, one file

SQLite was chosen over Postgres, DuckDB, LMDB, and every dedicated vector database. The reasons are practical rather than ideological.

The database runs with foreign keys enforced, WAL journaling, and a 64 MB page cache. A lazy shared connection per Brain instance amortizes connection setup across calls.
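A minimal sketch of these connection settings using Python's standard sqlite3 module follows. It is illustrative rather than brainctl's own code: the LazyConnection class is hypothetical, synchronous=NORMAL is taken from the durability discussion below, and the 64 MB cache is written as a negative cache_size, which SQLite interprets as a size in KiB.

```python
import sqlite3
from typing import Optional


class LazyConnection:
    """Hypothetical helper: one shared connection, opened on first use."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._conn: Optional[sqlite3.Connection] = None

    def get(self) -> sqlite3.Connection:
        if self._conn is None:
            conn = sqlite3.connect(self.path)
            conn.execute("PRAGMA foreign_keys = ON")     # enforce foreign keys
            conn.execute("PRAGMA journal_mode = WAL")    # write-ahead logging
            conn.execute("PRAGMA synchronous = NORMAL")  # usual pairing with WAL
            conn.execute("PRAGMA cache_size = -65536")   # negative = KiB, so 64 MiB
            self._conn = conn
        return self._conn  # later calls reuse the same connection

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Every call to get() after the first returns the same connection, so the file open and PRAGMA setup are paid once per instance.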

Schema evolution is a first-class concern. As of v1.5.0 the package ships a safe migration runner (brainctl migrate) that detects drift between what the database records as applied and what the migration files actually require, classifies each pending migration as likely-applied, partial, or needs-apply via column- and table-level heuristics, and only then replays what is missing. v1.5.1 hardened the heuristic for GENERATED ALWAYS AS VIRTUAL columns and ADD COLUMN IF NOT EXISTS patterns. The practical effect: users who pip-upgrade to a new brainctl release can run one command and have their existing brain.db file brought up to the new schema without losing any data — the single-file invariant survives upgrades, not just clean installs.
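The likely-applied / partial / needs-apply classification can be sketched as a comparison between the tables and columns a migration is expected to create and what the live schema reports. The Migration record and the counting rule below are assumptions for illustration; the paper does not spell out the actual heuristics in brainctl migrate. pragma_table_xinfo is used because it reports generated and hidden columns as well as ordinary ones.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Migration:
    """Hypothetical summary of what one migration file should leave behind."""
    name: str
    tables: list = field(default_factory=list)   # tables it creates
    columns: dict = field(default_factory=dict)  # table -> columns it adds


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def column_names(conn: sqlite3.Connection, table: str) -> set:
    # table_xinfo also lists generated (e.g. GENERATED ALWAYS AS ... VIRTUAL)
    # and hidden columns; it returns no rows for a missing table.
    rows = conn.execute("SELECT name FROM pragma_table_xinfo(?)", (table,))
    return {name for (name,) in rows}


def classify(conn: sqlite3.Connection, m: Migration) -> str:
    expected = present = 0
    for table in m.tables:
        expected += 1
        present += table_exists(conn, table)
    for table, cols in m.columns.items():
        have = column_names(conn, table)
        for col in cols:
            expected += 1
            present += col in have
    if expected and present == expected:
        return "likely-applied"  # everything exists: record it, don't replay
    if present:
        return "partial"         # some objects exist: replay only what is missing
    return "needs-apply"
```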

memory typology

Following Tulving (1972), brainctl represents memory as several first-class stores with different write and retrieval patterns. These are not just different tables — they have different invariants and different consolidation paths. The three Tulvingian types — episodic, semantic, and procedural — are present as separate first-class data models, joined by four supporting stores (decisional, associative, affective, prospective) that map onto specific cognitive functions the typology alone does not cover. Procedural memory was added as a first-class store in v2.7.0 — prior versions mapped that label onto the decisional store, a flattening this typology corrects.

  store       | type                                             | table(s)                                       | invariants
  ------------+--------------------------------------------------+------------------------------------------------+---------------------------------------------------------------
  episodic    | time-stamped events, causally linked             | events                                         | append-only, cheap, high volume
  semantic    | abstract facts, conventions, generalizations     | memories                                       | W(m) gated, FTS5-indexed, confidence-tracked
  procedural  | reusable workflows + tool sequences with fitness | procedures, procedure_steps, procedure_sources | step-ordered, feedback-tracked, FTS5-indexed (migration 052)
  decisional  | decisions and their rationale                    | decisions                                      | provenance required, immutable rationale
  associative | typed knowledge graph                            | entities, knowledge_edges                      | nodes carry JSON observations; edges are typed and directional
  affective   | emotional salience signal                        | affect_log                                     | VAD coordinates (valence, arousal, dominance)
  prospective | intentions that fire on future queries           | memory_triggers                                | keyword-matched, lifecycle-tracked

Semantic memories are categorized into nine types — convention, decision, environment, identity, integration, lesson, preference, project, and user — each with its own default confidence prior and decay constants. Procedures, separately, carry a status (draft, active, retired), an execution-feedback counter, and a fitness score updated by the procedure_feedback tool when a procedure's steps are run and the outcome is recorded. Event types are drawn from a similarly closed vocabulary: artifact, decision, error, handoff, result, session_start, session_end, task_update, warning, observation. Entity types cover agent, concept, document, event, location, organization, person, project, service, tool.
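Because these vocabularies are closed, they can be checked at write time. A minimal sketch, with the sets copied from the lists above; check_vocab is a hypothetical helper, not a documented brainctl function.

```python
SEMANTIC_CATEGORIES = frozenset({
    "convention", "decision", "environment", "identity", "integration",
    "lesson", "preference", "project", "user",
})
EVENT_TYPES = frozenset({
    "artifact", "decision", "error", "handoff", "result", "session_start",
    "session_end", "task_update", "warning", "observation",
})
ENTITY_TYPES = frozenset({
    "agent", "concept", "document", "event", "location", "organization",
    "person", "project", "service", "tool",
})
PROCEDURE_STATUSES = frozenset({"draft", "active", "retired"})


def check_vocab(value: str, vocab: frozenset, kind: str) -> str:
    """Return value unchanged, or raise if it is outside the closed vocabulary."""
    if value not in vocab:
        raise ValueError(f"unknown {kind} {value!r}; expected one of {sorted(vocab)}")
    return value
```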

The live schema at v2.7.0 contains ~62 user-facing tables across 51 migrations. The primitives above sit alongside auxiliary tables for RBAC (memory_trust_scores, access_log), quarantine (decoherent_memories, recovery_candidates), the global workspace (workspace_broadcasts, workspace_phi), neuromodulation state, theory-of-mind models, EWC importance weights, and Bayesian uncertainty logs.

the Brain interface

The Python API exposes a single class whose public surface is deliberately small. Six methods cover the entire session lifecycle.

from agentmemory import Brain

brain = Brain(agent_id="my-agent")

# session start — pull a context package
ctx = brain.orient(project="api-v2")

# during work — write
brain.remember("rate-limit: 100/15s", category="integration")
brain.entity("RateLimitAPI", "service",
             observations=["100 req/15s"])
brain.decide("use Retry-After for backoff",
             "server controls timing")

# during work — read
hits = brain.search("rate limits", k=5)

# session end — produce the handoff
packet = brain.wrap_up("auth module complete",
                       project="api-v2")

A handoff packet contains four fields: goal, current_state, open_loops, and next_step. These are the minimum sufficient statistics for the next agent to resume work cold. Packets are stored in handoff_packets with a signature over their contents so an auditor can verify the receiving agent saw what the sender wrote.
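The paper does not state which signature scheme handoff packets use. As an assumption-labelled sketch, the example below signs a canonical JSON encoding of the four fields with HMAC-SHA256 from the standard library; the function names are hypothetical. A shared-key HMAC means the auditor must hold the key, so an asymmetric scheme such as the Ed25519 signing used for memory bundles would be the natural choice when sender and auditor should not share secrets.

```python
import hashlib
import hmac
import json

FIELDS = ("goal", "current_state", "open_loops", "next_step")


def canonical_bytes(packet: dict) -> bytes:
    """Serialize the four fields deterministically (sorted keys, no spaces)."""
    body = {name: packet[name] for name in FIELDS}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_packet(packet: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(packet), hashlib.sha256).hexdigest()


def verify_packet(packet: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_packet(packet, key), signature)
```

Any edit to a signed field, however small, changes the canonical bytes and therefore fails verification.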

the mcp surface

The same operations are exposed over the Model Context Protocol as 209 tools, grouped by capability: memory, events, entities, decisions, procedures, consolidation, belief management, affect, workspace, federation, neuromodulation, theory-of-mind, expertise tracking, reflexion, and uncertainty. The MCP server is a thin adapter — the underlying logic lives in the Python package — so Python and MCP callers see identical semantics. Two transports ship in-tree: brainctl-mcp over stdio (the default; spawned by Claude Desktop, Codex, Cursor, etc.) and brainctl-mcp-http (added in v2.5.0), a streamable-HTTP server with bearer-token auth and an allowlist for remote callers (xAI Grok, Strand, hosted-agent platforms).

For clients that cap the total MCP tool count — Google's Antigravity IDE enforces a hard limit of 100 — v2.6.4 added a BRAINCTL_ALLOWED_TOOLS env var. When set to a comma-separated list of tool names, the stdio server returns only the listed tools from tools/list and rejects everything else from tools/call. Unknown names are a hard error at startup, with a difflib.get_close_matches hint so a typo like memory-add reports "did you mean memory_add?" rather than silently dropping the tool. Unset is the default and exposes the full 209-tool surface for backward compatibility.
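The allowlist behaviour described above (unset exposes everything, unknown names fail at startup with a close-match hint, tools/list is filtered, tools/call is rejected) can be sketched with difflib. The function names and error types are illustrative assumptions, not the server's actual code, and treating an empty value like an unset one is an added assumption.

```python
import difflib
import os
from typing import Iterable, Mapping, Optional, Set


def load_allowlist(all_tools: Iterable[str],
                   env: Optional[Mapping[str, str]] = None) -> Optional[Set[str]]:
    """Parse BRAINCTL_ALLOWED_TOOLS; None means the full surface is exposed."""
    env = os.environ if env is None else env
    raw = env.get("BRAINCTL_ALLOWED_TOOLS", "")
    if not raw.strip():
        return None  # unset (or empty): backward-compatible default
    tools = list(all_tools)
    names = {n.strip() for n in raw.split(",") if n.strip()}
    unknown = sorted(names - set(tools))
    if unknown:
        # Fail at startup rather than silently dropping a misspelled tool.
        name = unknown[0]
        hint = difflib.get_close_matches(name, tools, n=1)
        suffix = f" (did you mean {hint[0]}?)" if hint else ""
        raise ValueError(f"unknown tool {name!r} in BRAINCTL_ALLOWED_TOOLS{suffix}")
    return names


def list_tools(all_tools: Iterable[str], allowed: Optional[Set[str]]) -> list:
    """What tools/list would return."""
    return [t for t in all_tools if allowed is None or t in allowed]


def check_call(name: str, allowed: Optional[Set[str]]) -> None:
    """Gate for tools/call: anything outside the allowlist is rejected."""
    if allowed is not None and name not in allowed:
        raise PermissionError(f"tool {name!r} is not in BRAINCTL_ALLOWED_TOOLS")
```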

At v2.7.0, nineteen first-party plugins ship in-tree across two families. Agent frameworks: Claude Code, Codex CLI, Cursor, Eliza, Gemini CLI, Goose, Hermes, OpenClaw, OpenCode, Pi, Rig, Virtuals Game, and Zerebro. Trading bots: Freqtrade, Jesse, Hummingbot, NautilusTrader, OctoBot, and Coinbase AgentKit. Each plugin is an idempotent installer that wires the brainctl MCP server into its host framework's native configuration surface and, where the framework has a distinct memory abstraction, adapts it to the Brain interface. Trading-bot plugins additionally ship a strategy-mixin pattern that gives a live strategy persistent recall over its own past trades, regimes, and parameter sweeps without forcing the operator to write the persistence layer themselves.

The plugin tree spans three structurally distinct shapes — pure-MCP registration via the host's native config (Claude Code, Codex CLI, Cursor, Gemini CLI, Goose, Hermes, the trading bots), MCP plus first-party hook scripts in the host's native runtime (OpenClaw skills, OpenCode TypeScript hooks), and proxied-MCP via a community adapter for hosts that deliberately ship without built-in MCP support (Pi via the pi-mcp-adapter proxy convention). The shape is dictated by what each host framework actually exposes, not by preference; the surface area an integration can occupy is exactly the surface its host gives it.

The Codex CLI plugin, added in v1.4.0, is representative. Codex discovers MCP servers from ~/.codex/config.toml, so the plugin is a sentinel-wrapped installer that merges a [mcp_servers.brainctl] block into the user's config without disturbing other servers, with automatic backup, dry-run preview, and clean uninstall. It ships with an AGENTS.md.template that teaches Codex the orient / wrap_up lifecycle on every session start.
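The sentinel-wrapped merge can be sketched as a pure text transformation over config.toml. The marker strings and function names are hypothetical; the block uses Codex's [mcp_servers.<name>] table with a command key, and the brainctl-mcp command name comes from the transport description above. Backup and dry-run are left to the caller, which can save a copy of the old text and print the new text instead of writing it.

```python
import re

BEGIN = "# >>> brainctl (managed block, do not edit) >>>"
END = "# <<< brainctl (managed block, do not edit) <<<"
BLOCK = f'{BEGIN}\n[mcp_servers.brainctl]\ncommand = "brainctl-mcp"\n{END}\n'

# Non-greedy match from BEGIN to END, plus the newline that follows END.
_MANAGED = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def merge_block(config_text: str) -> str:
    """Insert or refresh the brainctl block; other servers are untouched."""
    if _MANAGED.search(config_text):
        # A function replacement avoids backslash processing in BLOCK.
        return _MANAGED.sub(lambda _m: BLOCK, config_text, count=1)
    sep = "" if not config_text or config_text.endswith("\n") else "\n"
    return config_text + sep + BLOCK


def remove_block(config_text: str) -> str:
    """Clean uninstall: drop only the managed block."""
    return _MANAGED.sub("", config_text, count=1)
```

Running merge_block twice yields the same text, which is what makes the installer idempotent.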

The OpenClaw plugin (added in v1.6.0) takes a different shape from the config-file-merge installers: it ships as a skill with an AGENTS.md snippet injection, because OpenClaw's multi-agent topology discovers brainctl through its skill registry rather than a static config file. Any remaining stdio-speaking MCP client (VS Code, Claude Desktop, Zed) works out of the box without a dedicated plugin.

concurrency and durability

SQLite's concurrency model is both simpler and stronger than most networked databases for the access pattern brainctl has. Writes are serialized — at any moment there is exactly one writer — and reads are concurrent with both reads and the active writer. In Write-Ahead Logging (WAL) mode, which brainctl uses unconditionally, readers never block writers and writers never block readers; the only lock contention is writer-on-writer, resolved by a short retry with exponential backoff.
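The writer-on-writer retry can be sketched as a small wrapper that retries only on SQLite's "database is locked" error and doubles the delay each time. The function name, attempt count, base delay, and jitter are assumptions. Python's sqlite3.connect(..., timeout=...) already installs SQLite's busy handler, so an explicit loop like this mainly adds control over the backoff schedule.

```python
import random
import sqlite3
import time


def with_write_retry(fn, attempts: int = 5, base_delay: float = 0.01, sleep=time.sleep):
    """Call fn(); on 'database is locked', back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise  # a different error, or out of attempts
            # 10 ms, 20 ms, 40 ms, ... plus up to 10% jitter
            sleep(base_delay * (2 ** attempt) * (1 + 0.1 * random.random()))
```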

For a memory layer this is the right model. Agents write sparsely (a few to a few dozen rows per session), read frequently (every orient call is a small burst of reads), and almost never need two concurrent writers on the same brain. In the rare case where two agents share a brain and race on a write, the second writer retries; brainctl's connection pool hides this from the caller. The lazy shared connection per Brain instance (added in v1.2.0) amortizes SQLite connection setup across a session, so a typical orient + dozen-write + wrap_up flow pays the connection cost once.

Durability comes from three layers. First, WAL journaling with synchronous=NORMAL writes every commit to the WAL file, which survives process crashes; after a power loss or OS crash, the most recent commits (those made since the WAL was last synced at a checkpoint) may be rolled back, but the database itself stays consistent. Second, the W(m) gate runs inside a transaction that either admits or rejects the memory atomically; there is no such thing as a partially-admitted memory. Third, the migration runner (v1.5.0, §2.1) wraps each migration in savepoints so an upgrade that fails halfway rolls back to the pre-migration state with no manual intervention. Operators who want stronger durability guarantees can switch to synchronous=FULL with a one-line PRAGMA change; the trade-off is roughly 2× write latency in exchange for fsync-per-commit.
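Savepoint-wrapped migration can be sketched as follows. The apply_migration name and the list-of-statements format are assumptions. Statements run one at a time through execute() on a connection opened with isolation_level=None, so the Python module adds no implicit transaction control; executescript() is avoided because it may COMMIT a pending transaction before running.

```python
import sqlite3


def apply_migration(conn: sqlite3.Connection, statements: list) -> None:
    """Apply all statements or none: a failure rolls back to the savepoint."""
    conn.execute("SAVEPOINT migration")
    try:
        for stmt in statements:
            conn.execute(stmt)
    except sqlite3.Error:
        conn.execute("ROLLBACK TO SAVEPOINT migration")  # undo partial work
        conn.execute("RELEASE SAVEPOINT migration")      # close the savepoint
        raise
    conn.execute("RELEASE SAVEPOINT migration")          # commit the migration
```

Because SQLite DDL is transactional, a CREATE TABLE inside the savepoint disappears on rollback along with any data changes.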

Backup is trivial by design: copy the file. SQLite's online backup API (exposed via brainctl backup) does this safely against a running database by walking pages rather than blocking the writer. For point-in-time recovery the WAL file can be retained alongside the main database and replayed. No backup daemon, no snapshot coordinator, no distributed consensus.
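Python's standard sqlite3 module exposes the same online backup API through Connection.backup(). A minimal sketch, assuming file paths on local disk; the backup() wrapper and its step size are illustrative, and nothing here claims to describe how the brainctl backup command is implemented.

```python
import sqlite3


def backup(src_path: str, dst_path: str, pages_per_step: int = 256) -> None:
    """Copy a live database page by page using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Copying a bounded number of pages per step lets other connections
        # use the database between steps instead of waiting for the whole copy.
        src.backup(dst, pages=pages_per_step)
    finally:
        dst.close()
        src.close()
```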

privacy and data isolation

brainctl is designed for the case where the brain holds data the operator wants kept close: private code, user identifiers, internal decisions, draft reasoning. Several architectural choices flow from that posture.

No network on the hot path. The Brain interface performs zero outbound network calls during orient, remember, decide, entity, search, or wrap_up. Embeddings are produced by a local Ollama instance; the FTS5 index is in-process; the vector index is in-process. An operator can physically disconnect the network and a brainctl agent will still function for every read and write path. The only code that touches the network is opt-in ingestion pipelines the operator wires up themselves.

Scope-based isolation. Every entity, memory, and decision carries a scope field with values like global, project:api-v2, or agent:reviewer-bot. Queries filter by scope at the storage layer, so an agent invoked with scope project:api-v2 cannot see memories written under project:billing unless they are explicitly marked global. Enforcement happens in SQL via parameterized query rewriting, not in application code that could be bypassed.
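The storage-layer filter can be sketched as a parameterized query in which the scope is always a bound value. The memories schema in the example is a simplified assumption, and LIKE stands in for the FTS5 index brainctl actually uses; the point is only that a caller's scope is never spliced into the SQL text.

```python
import sqlite3


def scoped_search(conn: sqlite3.Connection, scope: str, term: str) -> list:
    """Return (id, content) rows visible from `scope`: its own rows plus global."""
    sql = (
        "SELECT id, content FROM memories "
        "WHERE scope IN (?, 'global') AND content LIKE ? "
        "ORDER BY id"
    )
    # Scope and term are bound parameters, so neither can widen the filter.
    return conn.execute(sql, (scope, f"%{term}%")).fetchall()
```

An agent scoped to project:api-v2 therefore sees its own rows and global rows, and never rows written under project:billing.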

Per-agent identity. Every write is attributed to an agent_id set at Brain(agent_id=...) construction time. The agent ID is the unit of source attribution for the source-monitoring layer (§7.3). Two agents sharing the same brain file see each other's memories only to the extent that scopes and trust levels allow — the trust-scoped RBAC layer (§7.4) enforces this at query time.

PII as a first-class signal. PII detection runs on every memory at write time (not audit time), and the result is stored alongside the memory in pii_audit. A read-time filter can redact PII before the memory reaches the working context. Operators can answer data-retention and right-to-erasure requests with a single SQL statement against the PII audit log, which matters for anyone shipping brainctl into a regulated environment.
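Write-time detection plus read-time redaction can be sketched with regular expressions. The detectors here are deliberately crude, hypothetical stand-ins (the paper does not say how brainctl detects PII); a production deployment would use a dedicated PII library and store the detect_pii result in pii_audit.

```python
import re

# Crude, hypothetical detectors used only to illustrate the flow.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text: str) -> dict:
    """Write-time scan: {kind: [matches]} for every detector that fired."""
    found = {kind: pattern.findall(text) for kind, pattern in PII_PATTERNS.items()}
    return {kind: hits for kind, hits in found.items() if hits}


def redact(text: str) -> str:
    """Read-time filter: replace each detected span with a typed placeholder."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text
```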

What brainctl does not do: it does not encrypt brain.db at rest. SQLite has a proprietary encryption extension (SEE) and a free alternative (SQLCipher), and brainctl is compatible with both but does not ship encryption on by default. Operators who need at-rest encryption should either use SQLCipher directly or rely on filesystem-level encryption (FileVault, LUKS, BitLocker) — the same posture most operators use for git repositories holding sensitive code.

the memory model

Before describing the model, two epistemic notes. First, brainctl is inspired by the cognitive-science research it cites; it is not a model of any of it. We do not simulate hippocampal cell assemblies, we do not replay neural firing patterns, and we do not implement biologically plausible learning rules. What we do is borrow the architectural patterns these systems have evolved — episodic/semantic separation, decay-with-reinforcement, schema-driven consolidation, source attribution, belief revision under contradiction — and implement them as a SQLite schema with worker processes. Some of the mappings are tight (the Bayesian α/β confidence model is literal Bayesian inference; AGM belief revision follows the formal postulates), some are loose (the dream cycle is structurally analogous to sharp-wave ripple replay, not a simulation of it). Where the analogy is loose, we say so explicitly in the relevant section.

The goal is the smallest set of mechanisms that prevent the failure modes of stateless agents — forgetting, confabulation, contradiction loops, catastrophic supersedes, attention starvation. Cognitive science is the prior art that already worked through these failure modes in a different substrate; we borrow it the way distributed systems engineers borrow from biology, as a source of solved problems.

episodic, semantic, procedural

The three Tulvingian memory types map onto distinct write patterns. brainctl writes episodic memory freely (the append-only event stream), gates semantic memory aggressively (the W(m) worthiness check at §3.2), and treats procedural memory as a separate first-class store (the procedures + steps + sources tables added in v2.7.0, with their own status lifecycle and execution-feedback loop). The asymmetry is intentional: episodic storage plays the role of the sharp-wave ripple buffer in the mammalian analogy, so it stays cheap and high-volume; semantic storage shapes every future retrieval and must stay dense; procedural storage records how an agent actually gets a class of tasks done, so it accumulates fitness evidence rather than fading on decay. The consolidation cycle (§4) is the pipeline that promotes stable episodic patterns into semantic entries during quiet hours; the procedure-feedback loop (§3.7) is the analogous strengthening signal for procedural memory.

the worthiness gate W(m)

Every candidate semantic memory m is scored by a five-factor worthiness function before admission to long-term storage. The design follows Zhang et al.'s Adaptive Memory Admission Control (A-MAC, ICLR 2026 Workshop), which demonstrated F1=0.583 on the LoCoMo benchmark with 31% latency reduction. The five factors decompose the admission decision into interpretable, independently tunable dimensions:

$$W(m) = w_u \cdot \mathrm{utility}(m) + w_c \cdot \mathrm{confidence}(m) + w_n \cdot \mathrm{novelty}(m) + w_r \cdot \mathrm{recency}(m) + w_t \cdot \mathrm{type\_prior}(m)$$

The gate admits m only when the weighted sum clears a category-specific threshold θ. Each factor pulls in a specific direction.

Future utility (weight 0.15). Estimates how likely the memory is to be retrieved and useful in a future session. A memory about a one-off debugging session has lower utility than a memory about a recurring integration pattern. In practice, brainctl approximates utility by combining the Q-value from temporal-difference learning on past retrieval outcomes (§5) with the category's historical access frequency.

Factual confidence (weight 0.15). The Bayesian posterior from §3.3 — a candidate with a high E[p] from its source trust score is more likely to be admitted than a low-confidence observation from an unverified tool output.

Semantic novelty (weight 0.20). How different m is from what the agent already knows. brainctl approximates novelty by the inverse of the maximum embedding similarity to any existing memory in the same scope: a candidate that closely matches something written an hour ago carries almost no novelty, while a candidate with no near neighbor above a similarity floor carries high novelty. This subsumes both the surprise signal (within category) and the cross-category deduplication signal — even a high-novelty write against its own category will be rejected if a near-identical memory exists elsewhere, preventing the "we wrote this as a lesson, then wrote the same thing as a convention three days later" pathology.

Temporal recency (weight 0.10). An exponential decay favoring recent candidates over stale ones, reflecting the observation that memories written closer to the current context are more likely to be contextually relevant.

Content type prior (weight 0.40). The single most influential factor. This is the historical accept rate per memory category — decision memories start with a higher admission bias than observation memories because the cost of losing a decision is higher than the cost of losing an observation. The prior also incorporates the dynamic ewc_importance score from §4.3: a candidate that would merge with or supersede an existing high-importance memory gets a negative prior shift, which is how brainctl protects load-bearing memories from being overwritten by slightly-different-but-wrong new writes.
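The five-factor score can be sketched directly from the formula and the weights above. Three pieces are assumptions made only to produce runnable code: each factor is taken to lie in [0, 1], novelty's "inverse of the maximum embedding similarity" is read as one minus the maximum cosine similarity, and recency uses a 24-hour half-life. The category threshold θ is passed in because the paper does not list its values.

```python
import math

# Weights as stated in the text; they sum to 1.0.
WEIGHTS = {"utility": 0.15, "confidence": 0.15, "novelty": 0.20,
           "recency": 0.10, "type_prior": 0.40}


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def novelty(candidate, neighbours) -> float:
    """Assumed form: 1 - max cosine similarity to memories in the same scope."""
    if not neighbours:
        return 1.0
    return 1.0 - max(cosine(candidate, v) for v in neighbours)


def recency(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay; the half-life is an assumed parameter."""
    return 0.5 ** (age_hours / half_life_hours)


def worthiness(factors: dict) -> float:
    """W(m) = sum of weight * factor over the five factors."""
    return sum(w * factors[name] for name, w in WEIGHTS.items())


def admit(factors: dict, theta: float) -> bool:
    return worthiness(factors) >= theta  # W(m) < theta means reject
```

With type_prior at 0.40, a candidate whose category prior is zero can reach at most W = 0.6 even if every other factor is perfect, which is the sense in which it is the single most influential factor.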

The outcome of rejection. If W(m) < θ, the candidate's evidence is not thrown away. Instead it is folded into its nearest match: the nearest match's recalled_count increments, its confidence posterior shifts slightly toward the new evidence, and its last-touched timestamp is updated so it resists decay. Only the rejected candidate's own row is discarded. This is how brainctl prevents the "we already know this fifty times" pathology that kills every naive journal-based memory: redundant writes strengthen the thing they would have restated instead of creating noise.
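The admit-or-merge branch can be sketched as below. The StoredMemory fields mirror the ones named above (recalled_count, a Beta(α, β) confidence, a last-touched time); treating a near-duplicate as corroborating evidence, and the 0.25 size of its α increment, are assumptions, since the text says only that the posterior shifts slightly.

```python
from dataclasses import dataclass


@dataclass
class StoredMemory:
    content: str
    recalled_count: int
    alpha: float         # Beta(alpha, beta) confidence posterior
    beta: float
    last_touched: float  # unix seconds; refreshed to resist decay


def merge_into_nearest(nearest: StoredMemory, now: float,
                       evidence_weight: float = 0.25) -> None:
    """Fold a rejected candidate into its nearest match instead of storing it."""
    nearest.recalled_count += 1       # the restatement counts as a recall
    nearest.alpha += evidence_weight  # assumed: near-duplicate = weak support
    nearest.last_touched = now        # reset the decay clock


def gate(score: float, theta: float, nearest: StoredMemory, now: float) -> str:
    """Admit when W(m) >= theta; otherwise merge and drop the candidate row."""
    if score >= theta:
        return "admit"
    merge_into_nearest(nearest, now)
    return "merged"
```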

Modification resistance. Older, high-importance memories develop increasing resistance to reconsolidation-induced change (O'Neill & Winters 2026). When a new write attempts to supersede a well-established memory, the gate's effective threshold rises with the target's age and importance, preventing catastrophic overwriting of knowledge that has been stable across many sessions.

Gate calibration feedback loop. The gate continuously tracks whether its admission decisions correlate with downstream utility: do high-scoring candidates actually get retrieved more often, and do retrieved memories actually contribute to task success? Calibration error is fed back into the factor weights via the memory_outcome_calibration table (Dunlosky & Metcalfe 2009), closing the loop between admission and outcome.

confidence: bayesian α/β

Each memory carries a Bayesian posterior on reliability, parameterized as a Beta(α, β) distribution.

α' = α + 1 on successful recall  ·  β' = β + 1 on refutation
E[p] = α / (α + β)

Memories begin at Beta(1, 1) (uniform prior) unless the caller supplies a trust override — user-written memories, for example, start at Beta(3, 1). Successful recalls are those that contributed to a confirmed outcome; refutations are updates where a subsequent decision invalidated the prior claim. High-stakes retrievals (the planning layer) filter by expected confidence so tentative hypotheses do not leak into load-bearing reasoning.
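The update rule and expected value translate directly into code. The Confidence class and the 0.7 planning cutoff are illustrative assumptions; the paper does not give the filter's threshold.

```python
from dataclasses import dataclass


@dataclass
class Confidence:
    alpha: float = 1.0  # Beta(1, 1): uniform prior by default
    beta: float = 1.0

    def recall_succeeded(self) -> None:
        self.alpha += 1  # alpha' = alpha + 1

    def refuted(self) -> None:
        self.beta += 1  # beta' = beta + 1

    @property
    def expected(self) -> float:
        """E[p] = alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)


def load_bearing(memories, min_confidence: float = 0.7) -> list:
    """Hypothetical planning-layer filter: keep only confident memories."""
    return [m for m in memories if m.expected >= min_confidence]
```

A user-written memory constructed as Confidence(3, 1) starts at E[p] = 0.75; one refutation moves it to Beta(3, 2) and E[p] = 0.6.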

A per-category calibration log (memory_outcome_calibration) tracks whether the confidence estimates are well-calibrated over time, using Brier scores against observed outcomes.
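A per-category Brier score is the mean squared gap between stated confidence and observed outcome. A minimal sketch, assuming each logged row carries a category, the E[p] that was predicted, and a 0/1 outcome; the row format is hypothetical rather than the memory_outcome_calibration schema.

```python
from collections import defaultdict


def brier_by_category(rows):
    """Mean of (predicted - outcome)^2 per category; outcome is 0 or 1.

    0.0 is perfect; a constant 0.5 prediction always scores 0.25.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, predicted, outcome in rows:
        totals[category] += (predicted - outcome) ** 2
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}
```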

decay and forgetting

Unsupported memories age on an Ebbinghaus-style retention curve:

retention(t) = exp(−t / S)

where t is time since last access and S is a strength parameter that grows with every successful recall, an analogue of the testing effect layered on Ebbinghaus's (1885) forgetting curve. When retention drops below a category-specific floor, the memory is marked for consolidation review rather than deleted outright. Decay is a soft pressure, not a hard delete, because a seemingly-stale memory may become load-bearing under a future topic shift.
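The retention curve and the review floor can be sketched as follows. The formula is the one above; measuring t and S in hours and growing S by a constant factor of 1.5 on each successful recall are assumptions, since the text says only that S grows.

```python
import math


def retention(t_hours: float, strength: float) -> float:
    """retention(t) = exp(-t / S)."""
    return math.exp(-t_hours / strength)


def after_successful_recall(strength: float, growth: float = 1.5) -> float:
    """Assumed update: S grows by a constant factor on each successful recall."""
    return strength * growth


def needs_review(t_hours: float, strength: float, floor: float) -> bool:
    """Below the category floor, flag for consolidation review (never delete)."""
    return retention(t_hours, strength) < floor
```

With S = 10 hours, a memory untouched for 20 hours sits at e^-2 ≈ 0.14; one successful recall raises S to 15, and the same gap then leaves retention near 0.26.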

The dual concern — forgetting too little vs forgetting too much — is the central tension in continual learning. McCloskey and Cohen 1989 first characterized catastrophic interference in connectionist networks: training a neural net sequentially on task A then task B causes near-total loss of task A performance. Parisi et al. 2019 reviewed the modern continual-learning literature and grouped responses into three families: regularization-based (penalize weight changes that hurt prior tasks, exemplified by EWC), structural (allocate new capacity for new tasks), and replay-based (interleave old samples with new ones during training). brainctl uses regularization and replay in combination — EWC-style importance weights protect load-bearing memories, the dream cycle replays episodic traces during quiet hours — and treats forgetting as a controlled pressure rather than something to eliminate. Total recall is its own pathology.

belief revision (AGM)

Contradictions are data. When a new fact m' contradicts an existing memory m, brainctl runs an Alchourrón–Gärdenfors–Makinson style revision (Alchourrón, Gärdenfors, Makinson 1985). AGM characterizes rational revision through a set of postulates, and three of them carry most of the weight here:

  1. Closure: the belief set remains closed under logical consequence after revision.
  2. Success: the new fact is admitted to the revised set.
  3. Inclusion (minimality): the revision makes the smallest change consistent with 1 and 2.

The losing belief is not deleted. It is written to belief_collapse_events with the collapse reason, the winner's citation, credibility rankings, and a full provenance chain. This lets operators reverse a revision if they later discover the winner was itself unreliable — the superseded belief can be recovered with its history intact. Open conflicts are surfaced via belief_conflicts and can be resolved interactively through the resolve_conflict tool.
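The following toy version of the revision step assumes that beliefs are atomic literals (p or ~p) held as a belief base. Closure is therefore not modeled. The names are hypothetical, and the dictionary log only stands in for belief_collapse_events.

```python
from dataclasses import dataclass, field


def negate(lit: str) -> str:
    # literals are written "p" or "~p"
    return lit[1:] if lit.startswith("~") else "~" + lit


@dataclass
class BeliefBase:
    beliefs: set = field(default_factory=set)
    collapse_events: list = field(default_factory=list)  # stand-in for belief_collapse_events

    def revise(self, new_lit: str, citation: str, reason: str = "contradicted") -> None:
        loser = negate(new_lit)
        if loser in self.beliefs:
            # minimal change: remove only the directly contradicted literal, and log it
            self.beliefs.remove(loser)
            self.collapse_events.append(
                {"loser": loser, "winner": new_lit, "citation": citation, "reason": reason})
        self.beliefs.add(new_lit)  # success: the new fact is always admitted

    def undo_last(self) -> None:
        # reverse the most recent collapse if the winner later proves unreliable
        event = self.collapse_events.pop()
        self.beliefs.discard(event["winner"])
        self.beliefs.add(event["loser"])


kb = BeliefBase({"invoices_table_exists"})
kb.revise("~invoices_table_exists", citation="decision:rename-to-charges")  # hypothetical citation id
```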

prospective memory

A memory system built only around retrospective storage — facts the agent has already learned — is structurally missing half of human memory. Prospective memory is the cognitive capacity to remember to do something in the future: to notice, when a specific context recurs, that there is a pending intention attached to it. Einstein and McDaniel 1990 characterized prospective memory as distinct from retrospective memory, with its own failure modes (the agent forgets the intention itself) and its own success signals (the intention fires at the right moment without explicit query).

brainctl implements prospective memory as a first-class primitive via the memory_triggers table (migration 014, extended in subsequent releases). A trigger is a small record with three parts: a precondition (a set of keywords, an entity reference, or a task context), an action hint (what the agent should remember to consider when the precondition matches), and a lifecycle state (pending / fired / retired).

# seed a prospective memory at decision time
# (assumes `brain` is an already-initialized brainctl Brain instance)
brain.trigger_create(
    precondition="billing schema",
    action="remember: invoices table was renamed to charges last week",
    scope="project:billing",
)

# later, during a future session, a query about billing
# automatically surfaces the trigger before the agent acts
ctx = brain.orient(project="billing")
# ctx includes any triggers whose preconditions match
# the orient context window.

The trigger_check tool is called automatically on every orient call and on every search that crosses a relevance threshold. Matched triggers are surfaced into the working context with a visual marker so the agent can distinguish a prospective-memory hit from a regular retrieval. Fired triggers transition to the fired state and no longer match, preventing the same intention from firing indefinitely; triggers that become stale without firing transition to retired on a configurable timeout.
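The trigger lifecycle can be sketched as a small state machine. This sketch makes several assumptions. Preconditions are plain keyword sets, whereas the text also allows entity references and task contexts. Matching is whole-word and case-insensitive. `Trigger` and `check_triggers` are hypothetical names, not brainctl's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trigger:
    keywords: frozenset      # precondition, modeled here as a keyword set
    action: str              # hint surfaced when the precondition matches
    created_at: datetime
    state: str = "pending"   # pending -> fired | retired


def check_triggers(triggers, context: str, now: datetime, timeout: timedelta) -> list:
    words = set(context.lower().split())
    hits = []
    for t in triggers:
        if t.state != "pending":
            continue                                 # fired or retired triggers never match again
        if t.keywords <= words:                      # every precondition keyword is present
            t.state = "fired"
            hits.append("[prospective] " + t.action) # marker separates it from ordinary retrieval
        elif now - t.created_at > timeout:
            t.state = "retired"                      # went stale without firing
    return hits


now = datetime(2026, 5, 1)
billing = Trigger(frozenset({"billing", "schema"}),
                  "invoices table was renamed to charges last week", now - timedelta(days=2))
stale = Trigger(frozenset({"ledger"}), "check the ledger backfill", now - timedelta(days=40))
print(check_triggers([billing, stale], "Update the billing schema for refunds", now, timedelta(days=30)))
```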

Missing prospective memory is one of the most common failure modes of naive agent memory. A stateful agent can remember that "we renamed this table" perfectly well as a retrospective fact, but if the agent is not prompted to recall it, the fact is functionally unreachable. Prospective triggers close that gap by making the act of recall context-addressed rather than query-addressed: the memory finds the agent, not the other way around.

procedural memory

Episodic memory captures what happened. Semantic memory captures what is true. Neither captures how to do something — the reusable sequences of tool calls, decisions, and recovery steps an agent assembles through repeated practice. Tulving 1972 separated procedural from declarative memory for precisely this reason: skill-like knowledge accrues through repetition rather than through one-shot encoding, and it lives in a different cognitive subsystem than facts and episodes.

brainctl shipped procedural memory as a first-class store in v2.7.0, after Velamj contributed it as PR #94 on 2026-04-24. Migration 052 adds three canonical tables: procedures (the procedure itself with a name, description, status, and aggregate fitness counter), procedure_steps (ordered tool invocations or sub-procedures with optional preconditions), and procedure_sources (provenance pointers back to the episodic events or semantic memories the procedure was derived from). An FTS5 virtual table sits over the procedures+steps content so queries can search by tool name, step text, or description simultaneously.

Status lifecycle. Procedures move through three states: draft (just authored, not yet validated against outcomes), active (validated by at least one successful procedure_feedback call), and retired (either superseded by a better procedure or explicitly retired by the agent after repeated failure). Status is consulted by procedure_search so a query for "how do I do X?" by default surfaces active procedures first.

Fitness tracking. Every execution outcome flows back through procedure_feedback, which increments per-procedure success and failure counters, updates a last-execution timestamp, and adjusts an internal fitness score. The fitness score is consulted at retrieval time as a small reranking signal: among procedures that match the query, those with higher demonstrated success rate surface earlier. This is the procedural analogue of the Bayesian α/β confidence model on semantic memories in §3.3 — concrete outcome evidence updating a posterior, with no LLM call required.
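The sketch below covers status transitions and fitness-aware ranking. The text does not specify the fitness formula. This sketch assumes a Laplace-smoothed success rate and a hypothetical 0.1 weight for the reranking nudge.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    name: str
    status: str = "draft"     # draft -> active -> retired
    successes: int = 0
    failures: int = 0

    def feedback(self, success: bool) -> None:
        if success:
            self.successes += 1
            if self.status == "draft":
                self.status = "active"    # first validated success activates a draft
        else:
            self.failures += 1

    @property
    def fitness(self) -> float:
        # Laplace-smoothed success rate: posterior mean of Beta(1 + successes, 1 + failures)
        return (self.successes + 1) / (self.successes + self.failures + 2)


def rank(matches, weight: float = 0.1) -> list:
    # matches: (procedure, relevance). Active procedures come first, then relevance
    # plus a small fitness nudge decides the order.
    ordered = sorted(matches, key=lambda m: (m[0].status != "active",
                                             -(m[1] + weight * m[0].fitness)))
    return [p.name for p, _ in ordered]


a = Procedure("deploy")
for _ in range(3):
    a.feedback(True)                      # fitness 4/5
b = Procedure("deploy-v2")
b.feedback(True); b.feedback(False); b.feedback(False)   # fitness 2/5
c = Procedure("draft-one")
print(rank([(b, 0.52), (a, 0.50), (c, 0.9)]))   # ['deploy', 'deploy-v2', 'draft-one']
```

The fitness term is deliberately small, so it reorders near-ties and never overrides a large relevance gap.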

Bridge synopsis. Each procedure also writes a small bridge row into memories with category convention or lesson, summarizing the procedure's name and purpose. Legacy memory_search calls therefore still find procedures via their text content, without callers needing to switch APIs. The bridge row is recomputed when the underlying procedure is updated, so the synopsis stays in sync with the canonical content.

CLI + MCP surface. brainctl procedure {add|get|list|search|update|feedback|backfill|stats} on the CLI side; eight MCP tools (procedure_add, procedure_get, procedure_list, procedure_search, procedure_update, procedure_feedback, procedure_backfill, procedure_stats) on the agent side. The backfill command exists specifically for the case where an agent has been operating without procedural memory and wants to derive procedures retroactively from what it has already recorded.

Acquisition discipline. procedure_backfill with dry_run=False walks the memories + decisions tables, applies a regex classifier (looks_procedural: how-to phrasing, if-then conditionals, rollback language, step markers, ordering hints), and writes accepted candidates straight into procedures — no operator approval in the loop. That covers acquisition from already-written semantic memories. Acquisition directly from the raw event stream (learned clustering of successful tool sequences without a hand-designed classifier) is the next layer, listed in §11.1.
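Here is a hypothetical stand-in for the looks_procedural classifier. The five cue families come from the text. The regular expressions and the two-cue acceptance threshold are assumptions for illustration, not the shipped patterns.

```python
import re

# Hypothetical patterns for the five cue families named in the text.
CUES = {
    "how_to": re.compile(r"\bhow (?:to|do i)\b", re.IGNORECASE),
    "conditional": re.compile(r"\bif\b.+\bthen\b", re.IGNORECASE),
    "rollback": re.compile(r"\b(?:roll ?back|revert|undo)\b", re.IGNORECASE),
    "step_marker": re.compile(r"(?:^|\s)(?:step \d+|\d+[.)])\s", re.IGNORECASE),
    "ordering": re.compile(r"\b(?:first|next|finally|before|after)\b", re.IGNORECASE),
}


def looks_procedural_sketch(text: str, min_cues: int = 2) -> bool:
    # accept when at least `min_cues` distinct cue families fire
    return sum(1 for pattern in CUES.values() if pattern.search(text)) >= min_cues


print(looks_procedural_sketch("If the deploy fails, then roll back with git revert."))  # True
print(looks_procedural_sketch("The billing service is owned by the payments team."))   # False
```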

consolidation

the consolidation pipeline

Consolidation runs as an eight-phase NREM/REM pipeline, structurally analogous to the mammalian sharp-wave ripple replay observed in the hippocampus during slow-wave sleep and quiet wakefulness. It is demand-driven rather than scheduled: a homeostatic pressure metric (total confidence mass divided by active memory count) accumulates during normal operation, and when pressure exceeds a configurable setpoint, consolidation fires without waiting for a fixed interval — matching Tononi & Cirelli's (2003) synaptic homeostasis hypothesis, in which sleep pressure builds during waking and is discharged during sleep. Consolidation can also be invoked manually via brainctl-consolidate dream-cycle. The quiet-hours cron is a separate housekeeping pipeline that runs decay passes and bookkeeping; it is not the consolidation pipeline.
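The homeostatic trigger reduces to a mean and a comparison. In this sketch, pressure is total confidence mass over active memory count, and the function names are hypothetical. The last lines show that applying the N3 multiplier setpoint / pressure uniformly would return pressure exactly to the setpoint.

```python
def consolidation_pressure(confidences: list) -> float:
    # total confidence mass divided by active memory count
    return sum(confidences) / len(confidences) if confidences else 0.0


def homeostatic_trigger(confidences: list, setpoint: float) -> bool:
    # fires whenever pressure exceeds the setpoint, with no fixed schedule
    return consolidation_pressure(confidences) > setpoint


confs = [0.9, 0.8, 0.7, 0.6]
setpoint = 0.6
discharged = confs
if homeostatic_trigger(confs, setpoint):
    factor = setpoint / consolidation_pressure(confs)    # N3 multiplier, < 1 here
    discharged = [c * factor for c in confs]
print(round(consolidation_pressure(discharged), 6))      # 0.6
```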

The neuroscience worth grounding here. Sharp-wave ripples (SWRs) are short, high-frequency (≈150–250 Hz) oscillations in the CA1 region of the hippocampus, characterized in detail by Buzsáki and colleagues over four decades and reviewed in Buzsáki 2015. Wilson and McNaughton 1994 showed that place-cell sequences active during a maze-traversal task reactivate during subsequent sleep, at compressed timescales, in the same temporal order as the original experience. Diba and Buzsáki 2007 extended this finding to reverse replay during awake immobility, and Karlsson and Frank 2009 demonstrated that even remote experiences — places the animal had not visited recently — reactivate during awake SWRs. The consistent finding across these studies is that hippocampal replay during quiet states is causally implicated in memory consolidation and the gradual integration of episodic detail into cortical semantic memory. McClelland, McNaughton, and O'Reilly 1995 formalized this in their Complementary Learning Systems theory: a fast, sparse hippocampal store interleaves new experiences during quiet hours into a slow, distributed cortical store, balancing rapid acquisition against catastrophic interference.

Replay is one of the few mechanisms that survived the move from biological neural systems into artificial ones essentially unchanged. Lin 1992 introduced experience replay in reinforcement learning: store past transitions in a buffer, revisit them during training. Mnih et al. 2015 — the Deep Q-Network paper that first achieved human-level play on Atari — attributed much of DQN's sample efficiency and stability to its experience replay buffer, sampling mini-batches uniformly from past transitions to break the harmful correlation between sequential samples. Schaul et al. 2016 extended this with prioritized experience replay, sampling buffer entries proportionally to their TD-error so that the most surprising transitions are revisited more often. brainctl's replay phase sits in this lineage: rather than uniform sampling (Lin / Mnih) or TD-error-proportional sampling (Schaul), it replays entity-clustered, salience-weighted candidates — closer to content-aware prioritized replay than to stochastic sampling, but in the same family.

Once triggered, the pipeline proceeds through eight ordered phases:

  1. N2 – synaptic tagging. Memories within the labile window (recently written, or flagged by high-importance events within a ±2-hour window) are tagged to protect them from the global downscaling that follows (Frey & Morris 1997). This mirrors the biological synaptic tag-and-capture mechanism, where strong encoding events set a molecular tag that protects synapses from homeostatic downscaling.
  2. N3 – proportional downscaling. A single multiplicative factor (setpoint / pressure) is applied to all non-permanent, non-tagged memories. High-importance memories resist downscaling because the factor is raised to the power (1 − importance), i.e. factor^(1 − importance). A memory with importance 1 is left unchanged, while one with importance 0 takes the full factor. This implements an analog of Elastic Weight Consolidation (Kirkpatrick et al. 2017). Memories that fall below a retirement threshold are soft-deleted (Tononi & Cirelli 2014).
  3. Entity-clustered replay. Replay candidates are grouped by shared entity references (Niediek et al. 2026) and ranked by salience magnitude (Robinson et al. 2026). Replay is decoupled from Hebbian strengthening — the system replays broadly, then tags selectively (Widloski & Foster 2025). Memories created near high-importance events (importance ≥ 0.7) receive elevated replay weight, mirroring Yang & Buzsáki's (2024) finding that awake sharp-wave ripples tag episodic memories for subsequent sleep consolidation.
  4. Coupling gate. Only memories with at least one knowledge-graph edge pass for promotion; isolated, unconnected memories are held back until the entity linker (§5.8) provides connectivity (Schwimmbeck et al. 2026).
  5. Schema-accelerated promotion. Memories with ≥ 3 entity links bypass the normal episodic holding period and are immediately promoted to the semantic tier, implementing Tse et al.'s (2007) finding that schema-consistent information consolidates an order of magnitude faster than schema-inconsistent information.
  6. De-overlap. Similar-but-distinct memories are detected and their boundaries sharpened, mirroring the brain's active separation of overlapping representations during sleep (Aquino Argueta et al. 2026).
  7. REM – dream synthesis. Bisociation synthesis generates cross-domain connections between memories that share latent structure but occupy different knowledge-graph clusters. Affect dampening preserves the factual content of emotionally tagged memories while reducing affective intensity (Walker & van der Helm 2009). Isolated-memory bridge discovery finds memories with zero edges, embeds them, and connects each to its nearest semantic neighbor if the similarity clears a threshold. This phase is where the pipeline does its actual creative work.
  8. Housekeeping. Hebbian strengthening of tagged co-accessed pairs, metric updates, label-propagation community detection on the knowledge graph, high-betweenness bridge node identification, and tag cycle decrement.
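Phases N2 and N3, the coupling gate and schema-accelerated promotion can be sketched over plain records. Several parameters are hypothetical: the labile window for recent writes (2 hours here), the retirement threshold and the normal holding period. The sketch also assumes `near_important_event` is precomputed from the ±2-hour window. The `Mem` record is not brainctl's schema.

```python
from dataclasses import dataclass


@dataclass
class Mem:
    id: str
    confidence: float
    importance: float            # 0..1
    edges: int                   # knowledge-graph entity links
    age_hours: float
    near_important_event: bool = False
    permanent: bool = False
    tagged: bool = False
    tier: str = "episodic"
    retired: bool = False


def n2_tag(mems, labile_hours: float = 2.0) -> None:
    # protect recent writes and writes near high-importance events from downscaling
    for m in mems:
        m.tagged = m.age_hours <= labile_hours or m.near_important_event


def n3_downscale(mems, setpoint: float, pressure: float, retire_below: float = 0.05) -> None:
    factor = setpoint / pressure                       # single global multiplier
    for m in mems:
        if m.permanent or m.tagged:
            continue
        m.confidence *= factor ** (1.0 - m.importance)  # importance 1 -> untouched
        if m.confidence < retire_below:
            m.retired = True                           # soft delete: flagged, row kept


def promote(mems, holding_hours: float = 24.0) -> None:
    for m in mems:
        if m.retired or m.edges == 0:
            continue                                   # coupling gate: isolated memories wait
        if m.edges >= 3 or m.age_hours >= holding_hours:
            m.tier = "semantic"                        # >= 3 links skip the holding period


a = Mem("a", 0.8, importance=0.0, edges=0, age_hours=100.0)
b = Mem("b", 0.8, importance=1.0, edges=3, age_hours=100.0)
c = Mem("c", 0.8, importance=0.0, edges=1, age_hours=1.0)
d = Mem("d", 0.06, importance=0.0, edges=2, age_hours=100.0)
mems = [a, b, c, d]
n2_tag(mems)
n3_downscale(mems, setpoint=0.4, pressure=0.8)
promote(mems)
```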

Spacing-effect decay. Between consolidation cycles, memory stability increases when a memory is recalled at well-spaced intervals (inter-study interval ≥ 15% of retention interval per category), based on Cepeda et al.'s (2006) meta-analysis of 839 assessments from 317 experiments on distributed practice. Each memory carries a next_review_at timestamp computed from its temporal class and stability; the consolidation cycle checks for due reviews and replays them at expanding intervals, constituting an integrated spaced-repetition system (Cepeda et al. 2006; Murre & Dros 2015).
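Spacing-gated stability growth and review scheduling can be sketched as follows. Only the 15% ratio comes from the text. The sketch assumes hours as the unit. It also assumes a hypothetical doubling of stability on a well-spaced recall and a hypothetical per-class multiplier for next_review_at.

```python
from datetime import datetime, timedelta

SPACING_RATIO = 0.15  # ISI must be at least 15% of the retention interval (from the text)


def update_stability(stability_h: float, isi_hours: float, retention_interval_hours: float,
                     gain: float = 2.0) -> float:
    # only well-spaced recalls grow stability; massed recalls leave it unchanged
    if isi_hours >= SPACING_RATIO * retention_interval_hours:
        return stability_h * gain
    return stability_h


def next_review_at(last_review: datetime, stability_h: float,
                   class_multiplier: float = 1.0) -> datetime:
    # expanding schedule: the gap grows as stability grows
    return last_review + timedelta(hours=stability_h * class_multiplier)


s = 24.0
s = update_stability(s, isi_hours=12, retention_interval_hours=168)  # 12 < 25.2: unchanged
s = update_stability(s, isi_hours=48, retention_interval_hours=168)  # 48 >= 25.2: doubles
print(next_review_at(datetime(2026, 5, 1), s))                       # 2026-05-03 00:00:00
```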

The load-bearing functional analogy: selective re-processing of episodic experience during a quiet period improves the structure of long-term memory. The eight phases implement this in sequence — tag what matters, downscale what does not, replay broadly, gate on connectivity, accelerate schema-consistent memories, separate overlapping representations, synthesize cross-domain connections, clean up. The replay_priority and ripple_tags columns in the schema borrow SWR terminology as a naming convention for the offline re-processing pass; the consolidation pipeline and the W(m) write-time admission gate are separate pipelines that operate at different points in the memory lifecycle.

proactive interference gate

A Proactive Interference Index — the PII gate — blocks supersedes that would erase too-recent context. Without it, a long-running agent can overwrite load-bearing memories with a noisy summarization pass and enter a catastrophic forgetting state. The gate computes a recency-weighted dependency score over the memory graph and refuses supersede operations that would drop the score of any descendant below a threshold.

The gate is enforced at write time, logged to pii_audit, and can be reviewed. Rejected supersedes are not dropped silently — they are written to a pending queue and either reconciled by the agent in a later session or escalated to the operator.
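The following sketch is one way to read the PII gate as code. It assumes exponential recency weights with a hypothetical 72-hour time constant. For simplicity it checks only direct dependents rather than all descendants. Plain lists stand in for pii_audit and the pending queue.

```python
import math


def recency_weight(age_hours: float, tau_hours: float = 72.0) -> float:
    # recent supporting memories count for more than old ones (tau is hypothetical)
    return math.exp(-age_hours / tau_hours)


def dependency_score(node, parents, ages, removed=frozenset()) -> float:
    # recency-weighted support a node receives from its live parents
    return sum(recency_weight(ages[p]) for p in parents.get(node, []) if p not in removed)


def pii_gate(target, parents, ages, threshold, audit, pending) -> bool:
    # direct dependents only; a fuller version would walk all descendants
    dependents = [n for n, ps in parents.items() if target in ps]
    for dep in dependents:
        after = dependency_score(dep, parents, ages, removed={target})
        if after < threshold:
            audit.append({"target": target, "dependent": dep,
                          "score_after": after, "verdict": "rejected"})
            pending.append(target)        # queued for reconciliation, not silently dropped
            return False
    audit.append({"target": target, "verdict": "accepted"})
    return True


parents = {"plan:migrate-billing": ["fact:recent-rename", "fact:old-schema"]}
ages = {"fact:recent-rename": 1.0, "fact:old-schema": 500.0}
audit, pending = [], []
print(pii_gate("fact:recent-rename", parents, ages, 0.5, audit, pending))  # False
print(pii_gate("fact:old-schema", parents, ages, 0.5, audit, pending))     # True
```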

ewc-style importance weighting

Inspired by Elastic Weight Consolidation (Kirkpatrick et al. 2017), brainctl maintains an importance score for each memory based on how often it is touched during planning and how many downstream decisions cite it. The analogue to EWC's Fisher information is a simpler access-frequency × citation-depth product, stored in ewc_importance. The score is consulted by the W(m) write-time gate: when a new memory would merge into or supersede an existing one, a candidate that wants to displace a high-importance memory has to clear a higher worthiness bar than one that would displace an obscure note. Importance is a write-time prior, not a consolidation operation.
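The sketch below shows the importance prior and the displacement bar it produces. The access × citation product follows the text. The saturating shape of the bar and its constants are hypothetical.

```python
def ewc_importance(access_count: int, citation_depth: int) -> float:
    # simplified stand-in for EWC's Fisher information: access frequency x citation depth
    return float(access_count * citation_depth)


def displacement_bar(importance: float, base: float = 0.5, extra: float = 0.4,
                     half_saturation: float = 20.0) -> float:
    # worthiness needed to merge into or supersede an incumbent; rises toward base + extra
    return base + extra * importance / (importance + half_saturation)


def may_displace(candidate_worthiness: float, incumbent_importance: float) -> bool:
    return candidate_worthiness >= displacement_bar(incumbent_importance)


obscure = ewc_importance(access_count=1, citation_depth=0)    # 0.0
core = ewc_importance(access_count=10, citation_depth=4)      # 40.0
print(may_displace(0.7, obscure), may_displace(0.7, core))    # True False
```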

schema integration

A complementary frame for what consolidation is doing comes from schema theory. Bartlett 1932, in Remembering, showed that human memory is reconstructive rather than reproductive: people do not retrieve experiences verbatim, they reconstruct them by fitting fragments into pre-existing schemas — organized knowledge structures that capture what kinds of things tend to go together. His "War of the Ghosts" experiment demonstrated that participants reshaped an unfamiliar Native American folk tale toward their own cultural schemas with each retelling. Rumelhart 1980 later formalized schemas as the building blocks of cognition: data structures for representing the generic concepts stored in memory, into which new experiences are slotted and against which they are interpreted.

brainctl's semantic memory is, in effect, a small set of explicit schemas. The nine memory categories — convention, decision, environment, identity, integration, lesson, preference, project, user — are not arbitrary tags; they are the schemas that semantic memories must fit into to be admitted, each with its own confidence prior and decay constants. Viewed through the schema-theoretic lens, the W(m) worthiness gate is a schema-fit check: if a candidate memory does not fit any existing schema and is not novel enough to warrant a new instance, it gets merged into its nearest schematic match. Consolidation, in turn, is the process of pulling stable patterns out of episodic detail and fitting them into these schemas — the same compression Bartlett observed in human reconstruction.

cost, scheduling, and failure modes

Consolidation is expensive relative to the hot path — a single cycle touches hundreds or thousands of rows, runs embedding comparisons, performs graph traversal, and invokes a local LLM for the REM-phase bisociation step. Running it on the critical path would blow up orient and wrap_up latency. The whole point of the homeostatic trigger (§4.1) is to take this work offline and batch it behind the back of the agent.

Scheduling semantics. The homeostatic trigger fires on demand when consolidation pressure exceeds the configured setpoint. Two supplementary conditions serve as fallbacks: inactivity (no events written in the last IDLE_SECONDS, default 300) or volume pressure (the number of new episodic writes since the last cycle exceeds PRESSURE_EVENTS, default 50). The volume condition prevents a continuously-active agent from never consolidating. A manual invocation via brainctl-consolidate dream-cycle ignores all conditions and runs immediately. The quiet-hours cron is a separate pipeline that runs housekeeping (decay passes, calibration updates, stale-trigger sweeps), not the eight-phase consolidation pipeline itself.
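The firing conditions can be written as a single function. The defaults IDLE_SECONDS = 300 and PRESSURE_EVENTS = 50 come from the text. The function name, its signature and the order in which the conditions are checked are assumptions.

```python
import time
from typing import Optional

IDLE_SECONDS = 300       # default from the text
PRESSURE_EVENTS = 50     # default from the text


def should_consolidate(pressure: float, setpoint: float, last_event_ts: float,
                       writes_since_cycle: int, now: Optional[float] = None,
                       manual: bool = False) -> Optional[str]:
    # returns the reason a cycle should start, or None to keep waiting
    now = time.time() if now is None else now
    if manual:
        return "manual"              # dream-cycle invocation ignores every condition
    if pressure > setpoint:
        return "homeostatic"         # primary trigger
    if now - last_event_ts >= IDLE_SECONDS:
        return "idle"                # fallback: no events in the last IDLE_SECONDS
    if writes_since_cycle > PRESSURE_EVENTS:
        return "volume"              # fallback: a busy agent still consolidates
    return None
```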

Cost envelope. Rough figures on a single-agent brain with ~5k semantic memories and ~20k events, measured on an M2 MacBook Pro: the combined NREM phases (tagging, downscaling, replay, coupling, schema acceleration, de-overlap) run in ~2–3 seconds, REM bisociation takes 2–4 seconds depending on how many hypothesis candidates the LLM is asked to score, and housekeeping runs in ~500 ms. Summed, a full cycle on a brain of that size takes roughly 4.5–7.5 seconds of wall-clock time. The REM phase is the largest single cost because it is the only phase that invokes the LLM; the rest are pure SQL and Python compute.

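Failure modes and idempotency. Consolidation is designed to be crash-safe. Each phase writes to the database in its own transaction and records its progress in consolidation_events. If the process crashes mid-cycle, the in-flight phase's uncommitted transaction is rolled back, while phases that already committed stay committed and their consolidation_events entries record how far the cycle got.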

The worst outcome of an interrupted consolidation is a small amount of duplicated work on the next run, not corruption. Consolidation is also safe to skip entirely — a brain that never consolidates is still fully functional, just denser and with less hypothesis-generated structure.
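Per-phase transactions with a progress log can be demonstrated with Python's sqlite3 module. The phase list mirrors the eight phases above. The consolidation_events columns and the resume logic are assumptions. The demo simulates a crash in the REM phase and then resumes the cycle.

```python
import sqlite3

PHASES = ["n2_tag", "n3_downscale", "replay", "coupling",
          "schema", "deoverlap", "rem", "housekeeping"]


def run_cycle(conn: sqlite3.Connection, cycle_id: int, phase_fns: dict) -> None:
    done = {row[0] for row in conn.execute(
        "SELECT phase FROM consolidation_events WHERE cycle_id = ?", (cycle_id,))}
    for phase in PHASES:
        if phase in done:
            continue                       # committed before the crash; not redone
        with conn:                         # one transaction per phase: commit, or roll back on error
            phase_fns[phase](conn)
            conn.execute("INSERT INTO consolidation_events (cycle_id, phase) VALUES (?, ?)",
                         (cycle_id, phase))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consolidation_events (cycle_id INTEGER, phase TEXT)")
conn.execute("CREATE TABLE work (phase TEXT)")
crash = {"rem": True}


def make_phase(name):
    def phase(c):
        c.execute("INSERT INTO work (phase) VALUES (?)", (name,))
        if crash.get(name):
            raise RuntimeError("simulated crash in " + name)
    return phase


fns = {p: make_phase(p) for p in PHASES}
try:
    run_cycle(conn, 1, fns)                # dies inside the REM phase
except RuntimeError:
    pass
crash["rem"] = False
run_cycle(conn, 1, fns)                    # resume: only rem and housekeeping run
```

The only repeated work is the phase that was in flight when the process died. That phase's partial writes were rolled back, so rerunning it does not double-apply anything.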

What can go wrong. A misconfigured quiet-hours cron can trigger housekeeping at the same moment the homeostatic trigger fires a consolidation cycle, causing writer contention (resolved by SQLite's retry but measurable as latency). A runaway REM-phase LLM call can hang the cycle if the upstream model is unreachable — a configurable timeout aborts the phase and marks it for retry on the next cycle. The consolidation_stats tool surfaces per-phase timing and error counts for operators who want to monitor the pipeline.

causal reasoning and temporal structure

An agent memory system that stores and retrieves facts without causal structure is fundamentally limited: it cannot explain why a fact matters, what would change if the fact were different, or how facts at different time scales relate. brainctl addresses this with four mechanisms that layer causal and temporal structure on top of the consolidation pipeline.

Typed causal edges. The knowledge graph supports three causal relation types: causes, enables, and prevents. A causal chain tracer follows these edges forward from any memory or event up to a configurable hop limit, giving the agent (or an auditor) a concrete answer to "what led to this outcome?"

Counterfactual attribution. Working backward from a task outcome, counterfactual attribution (Kang et al. 2025) traces the causal graph in reverse and boosts the Q-values of contributing memories proportional to their edge weight — answering "which memories caused this success?" This closes the loop between retrieval (Q-value reranking from §5) and outcome (causal attribution), making the system genuinely self-improving: memories that demonstrably contributed to good outcomes surface more often in future retrievals.
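Forward tracing and backward attribution are both breadth-first walks over the three causal edge types. In this sketch the hop limit, the learning rate and the edge-list format are hypothetical. Each contributor's boost uses only the weight of the edge through which it was reached, not the product of weights along the whole path.

```python
from collections import deque

CAUSAL = {"causes", "enables", "prevents"}  # the three causal relation types


def _adjacency(edges, reverse=False):
    adj = {}
    for src, dst, rel, w in edges:
        if rel not in CAUSAL:
            continue                                   # ignore non-causal edges
        a, b = (dst, src) if reverse else (src, dst)
        adj.setdefault(a, []).append((b, w))
    return adj


def trace_forward(edges, start, max_hops=3):
    # "what did this lead to?": BFS along causal edges up to max_hops
    adj, seen, order = _adjacency(edges), {start}, []
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt, _ in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append((nxt, hops + 1))
                frontier.append((nxt, hops + 1))
    return order


def attribute(edges, outcome, q, reward, lr=0.1, max_hops=3):
    # "which memories caused this success?": walk causal edges backward,
    # boosting each contributor's Q-value in proportion to edge weight
    adj, seen = _adjacency(edges, reverse=True), {outcome}
    frontier = deque([(outcome, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for src, w in adj.get(node, []):
            q[src] = q.get(src, 0.0) + lr * reward * w
            if src not in seen:
                seen.add(src)
                frontier.append((src, hops + 1))
    return q


edges = [("m1", "m2", "causes", 1.0), ("m2", "outcome", "enables", 0.5),
         ("m3", "outcome", "causes", 0.8), ("m4", "m1", "related_to", 1.0)]
print(trace_forward(edges, "m1"))               # [('m2', 1), ('outcome', 2)]
q = attribute(edges, "outcome", {}, reward=1.0)
```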

Temporal abstraction hierarchy. Memories are assigned to a six-level temporal hierarchy based on age: moment (< 12h), session (12h–1d), day (1–7d), week (7–30d), month (30–90d), quarter (> 90d). Hierarchical summarization compresses moment-level memories into day-level summaries, day-level into week-level, and so on — achieving significant memory length reduction while maintaining retrieval quality across time scales (Shu et al. 2025). The hierarchy gives the consolidation pipeline a natural multi-resolution structure: recent memories are preserved in full detail, while older memories are compressed into summaries that retain their load-bearing content.
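Assigning a memory to its level is a lookup against the age boundaries listed above. The text leaves the boundary cases unstated, so this sketch assumes each lower bound is inclusive.

```python
from datetime import timedelta

# (exclusive upper bound, level) pairs; anything older falls through to "quarter"
LEVELS = [
    (timedelta(hours=12), "moment"),
    (timedelta(days=1), "session"),
    (timedelta(days=7), "day"),
    (timedelta(days=30), "week"),
    (timedelta(days=90), "month"),
]


def temporal_level(age: timedelta) -> str:
    for upper, level in LEVELS:
        if age < upper:
            return level
    return "quarter"


print(temporal_level(timedelta(days=3)))   # day
```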

Belief collapse. Superposed beliefs — states where the agent holds multiple mutually exclusive interpretations simultaneously — are resolved to definite states via four collapse triggers: task checkout (the agent acts on a belief), direct query (an external observer asks), evidence threshold (accumulated evidence exceeds a confidence bound), and time decoherence (the belief has been superposed longer than a configurable window). Each collapse event is logged with the measured amplitude and collapse context, preserving the pre-collapse state for auditability. The quantum Zeno effect applies: frequent measurement slows collapse, so agents that query a belief repeatedly keep it superposed longer than agents that ignore it.

retrieval

Retrieval-augmented language models are now an established design pattern. Lewis et al. 2020 introduced retrieval-augmented generation (RAG), pairing a dense retriever with a generator that conditions on retrieved passages. Karpukhin et al. 2020 (Dense Passage Retrieval) showed that a learned dense retriever could outperform traditional sparse methods like BM25 on open-domain QA. Khandelwal et al. 2020 (kNN-LM) demonstrated that augmenting a language model with nearest-neighbor lookups against a datastore at inference time improves perplexity without retraining the model. Borgeaud et al. 2022 (RETRO) scaled retrieval to a 2-trillion-token database and showed that a 7.5B-parameter model with retrieval performs comparably to GPT-3 and Jurassic-1 on the Pile despite using 25× fewer parameters. The pattern is mature.

brainctl's retrieval layer is a small variant of this pattern with two differences. First, the retrieval target is the agent's own structured memory (a typed graph plus episodic and semantic stores) rather than an external corpus of passages. Second, lexical and semantic rankings are merged with reciprocal rank fusion rather than a learned re-ranker, because a single agent's brain is a small, heterogeneous corpus, which is the regime in which RRF does better than learning-to-rank. The next four subsections describe the layer concretely.

hybrid search

All queries go through a single search interface that fans out to two indexes. FTS5 handles lexical search with BM25 ranking, stemming, and phrase queries. sqlite-vec handles semantic search with cosine similarity over locally-computed embeddings. Both return ranked result lists.

reciprocal rank fusion

The two result lists are merged with reciprocal rank fusion (Cormack, Clarke, Buettcher 2009):

score(d) = Σᵢ 1 / (k + rankᵢ(d))    with k = 60

RRF is robust to score scale differences between lexical and semantic rankers, does not require calibration, and is parameter-light. Empirically it beats both linear combination and learning-to-rank approaches on the kind of small, heterogeneous corpora a single agent's brain produces.
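The fusion formula translates directly into Python. Ranks are 1-based, and a document that is absent from one list simply receives no contribution from it. The function name is hypothetical.

```python
from collections import defaultdict


def rrf(rankings, k: int = 60):
    # score(d) = sum over rankers of 1 / (k + rank_i(d)), ranks starting at 1
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


lexical = ["a", "b", "c"]     # FTS5 / BM25 order
semantic = ["b", "c", "d"]    # vector-similarity order
print([doc for doc, _ in rrf([lexical, semantic])])   # ['b', 'c', 'a', 'd']
```

Document "b" is the lexical runner-up, but it ranks high in both lists and therefore wins. Only ranks enter the formula, so BM25 scores and cosine similarities never need to be put on a common scale.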

local embeddings

Embeddings are produced by a local Ollama instance running nomic-embed-text. The choice is deliberate — it means the hot path has zero external API calls, retrieval is free at inference time, and an operator can run the entire stack offline. The trade-off is a small quality gap versus frontier embedding models; in practice this is dominated by the hybrid retrieval step and by the worthiness gate keeping the corpus clean.

Embeddings live in embeddings alongside shadow tables generated by the sqlite-vec extension. A lazy recompute strategy means re-embedding only touches rows whose content hash has changed.
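Hash-gated re-embedding can be sketched as follows. The text does not name a hash function, so this sketch assumes SHA-256. An injected `embed` callable takes the place of the Ollama call, which keeps the logic runnable offline.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lazy_reembed(rows: dict, cache: dict, embed) -> list:
    # rows: id -> text; cache: id -> (content hash, vector). Returns ids that were re-embedded.
    touched = []
    for row_id, text in rows.items():
        h = content_hash(text)
        if row_id in cache and cache[row_id][0] == h:
            continue                      # unchanged content: keep the stored vector
        cache[row_id] = (h, embed(text))
        touched.append(row_id)
    return touched


calls = []


def fake_embed(text):
    calls.append(text)
    return [float(len(text))]


cache = {}
rows = {1: "alpha", 2: "beta"}
first = lazy_reembed(rows, cache, fake_embed)    # [1, 2]
rows[2] = "beta v2"
second = lazy_reembed(rows, cache, fake_embed)   # [2]
```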

v2.4.0 widened the embedding surface without changing the retrieval substrate. A five-model registry — nomic-embed-text (default), bge-m3, mxbai-embed-large, snowflake-arctic-embed2, and qwen3-embedding:8b — is selectable via BRAINCTL_EMBED_MODEL and re-indexable in place via brainctl reindex --model <name> with dim-mismatch validation. An optional cross-encoder reranker (bge-reranker-v2-m3) ships behind the brainctl[rerank] extra and lazy-imports sentence-transformers; it is default-off, falls back to a no-op when the dep is missing, and slots after retrieval rather than inside it.
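The sketch below combines model selection with a dimension check before vectors are written. The environment variable comes from the text. The dimensions are the output sizes commonly published for these models and should be verified against the locally pulled versions. `check_dims` is a hypothetical name.

```python
import os

# Commonly published output dimensions; verify against your local model pulls.
REGISTRY = {
    "nomic-embed-text": 768,
    "bge-m3": 1024,
    "mxbai-embed-large": 1024,
    "snowflake-arctic-embed2": 1024,
    "qwen3-embedding:8b": 4096,
}


def active_model() -> str:
    return os.environ.get("BRAINCTL_EMBED_MODEL", "nomic-embed-text")


def check_dims(model: str, index_dim: int) -> None:
    # refuse to mix vectors of different widths in one index
    if model not in REGISTRY:
        raise ValueError(f"unknown embedding model: {model}")
    if REGISTRY[model] != index_dim:
        raise ValueError(f"dim mismatch: {model} emits {REGISTRY[model]}-d vectors, "
                         f"index stores {index_dim}-d; reindex first")
```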

Neither change alters the FTS+vec RRF backbone. The registry widens the set of reachable embedding shapes (multilingual, longer context, larger dim) so an operator can pick an embedding that matches the corpus instead of being locked to one default; the reranker adds an opt-in reordering pass for cases where top-K precision matters more than retrieval latency. Both are instrumentation, not a new ranking model.

spreading activation

Hybrid search returns the memories that match a query. Spreading activation, in the sense of Collins and Loftus 1975, returns the memories that are connected to the matches — the neighbors in semantic space whose activation matters because the matches activated them. Their original model was a semantic network with weighted edges and parallel decay: when a node is activated, activation spreads to connected nodes proportionally to edge weight, with the activation decaying over distance and time. The model was originally proposed to explain priming effects in semantic memory experiments — why hearing "doctor" speeds up recognition of "nurse" even when the two are not co-presented.

brainctl's knowledge graph — entities as nodes, knowledge_edges as typed directional relations — is the substrate for an approximation of this. When the search interface surfaces an entity, the retrieval pass also walks the graph one or two hops out and gathers the connected entities, weighted by recency, edge type, and access history. The effect is that a query for RateLimitAPI does not just return the entity; it returns the related decisions, the upstream services that depend on it, the recent observations attached to it, and any contradictions in flight.
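The following is a two-hop spreading-activation pass in the Collins and Loftus style. It assumes that each edge weight already folds in recency, edge type and access history, and it uses a hypothetical per-hop decay of 0.5. Activation arriving from several sources is summed. The entity names are invented for the example.

```python
from collections import defaultdict


def spread(graph, seeds, hops: int = 2, decay: float = 0.5):
    # graph: node -> [(neighbor, weight in [0, 1])]; seeds: node -> initial activation
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        arriving = defaultdict(float)
        for node, act in frontier.items():
            for neighbor, weight in graph.get(node, []):
                arriving[neighbor] += act * weight * decay     # attenuates with each hop
        for node, act in arriving.items():
            activation[node] = activation.get(node, 0.0) + act  # sources sum
        frontier = arriving
    return sorted(activation.items(), key=lambda kv: kv[1], reverse=True)


graph = {
    "RateLimitAPI": [("decision:raise-limit", 0.9), ("svc:gateway", 0.6)],
    "svc:gateway": [("obs:429-spike", 0.8)],
}
ranked = spread(graph, {"RateLimitAPI": 1.0})
```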

salience routing

Retrieval is only the first step. A global-workspace-inspired attention budget (Dehaene & Changeux 2011, Baars 1988) decides which of the matched memories actually surface into the agent's working context. The budget weighs:

Only winners make it into the prompt. The workspace broadcast is logged to workspace_broadcasts so the agent (or an auditor) can later inspect what was attended to and why. This is the difference between "what does retrieval return" and "what does the agent actually see" — a distinction that matters when the agent makes a mistake and you need to trace it.
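The sketch below shows budgeted admission. It assumes each candidate's salience has already been computed and its token count is known. Greedy selection by salience is an assumption and not necessarily brainctl's policy. The list stands in for workspace_broadcasts and records losers as well as winners.

```python
def broadcast(candidates, budget_tokens: int, log: list) -> list:
    # candidates: (memory_id, salience, tokens); highest salience is considered first
    winners, used = [], 0
    for mem_id, salience, tokens in sorted(candidates, key=lambda c: c[1], reverse=True):
        admitted = used + tokens <= budget_tokens
        if admitted:
            winners.append(mem_id)
            used += tokens
        log.append({"memory": mem_id, "salience": salience, "admitted": admitted})
    return winners


log = []
print(broadcast([("a", 0.9, 300), ("b", 0.8, 500), ("c", 0.5, 150)], 500, log))  # ['a', 'c']
```

Logging the rejected candidates lets an auditor see why a memory was not attended to as well as what was.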

retrieval in the session lifecycle

Retrieval in brainctl is not a single operation invoked whenever the agent is curious. It is three distinct operations tied to three distinct phases of the session lifecycle, and the difference matters for both performance and correctness.

Orient retrieval (bulk, session-start). Brain.orient(project=...) is called once at the beginning of a session and returns a composed context package rather than a results list. The package includes: the most recent handoff packet for the project, the top-N semantic memories by access-frequency within scope, the relevant entities and their immediate neighbors in the knowledge graph, any open belief conflicts, any prospective memory triggers whose preconditions match the project keywords, and the decay-protected EWC-important memories. The package is assembled by running each of those sub-queries in parallel and merging the results under a token budget. The agent sees it once at the start of the session and does not need to re-query the basics.
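The parallel fan-out and the budgeted merge can be sketched with concurrent.futures. The section names, their priority order and the word-count tokenizer are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_orient(subqueries, budget_tokens: int, count_tokens=lambda s: len(s.split())):
    # subqueries: ordered (section, zero-argument callable) pairs, highest priority first
    with ThreadPoolExecutor() as pool:
        futures = [(name, pool.submit(fn)) for name, fn in subqueries]
        results = [(name, future.result()) for name, future in futures]
    package, used = {}, 0
    for name, items in results:                  # merge in priority order
        kept = []
        for item in items:
            cost = count_tokens(item)
            if used + cost <= budget_tokens:     # skip anything that would overflow the budget
                kept.append(item)
                used += cost
        package[name] = kept
    return package


package = assemble_orient(
    [("handoff", lambda: ["finished step 2 of migration"]),
     ("memories", lambda: ["charges replaced invoices", "billing job runs nightly"]),
     ("triggers", lambda: ["check refund webhook ordering"])],
    budget_tokens=10,
)
```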

Ad-hoc search (mid-session, query-addressed). Brain.search(query, k=...) is the operation the agent calls when it actually doesn't know something. It runs the full hybrid retrieval pipeline from §5.1–§5.4 and returns a ranked list. The caller is expected to pass a specific question, not a vague topic: the hybrid retriever works best when the query carries enough lexical signal for FTS5 to contribute a useful ranking alongside the vector index before the two lists are fused. A good rule of thumb: if the agent can phrase the query in one sentence, it's a good search; if it would phrase it as a whole paragraph, it should use orient with a narrower scope instead.

Wrap-up retrieval (session-end, self-referential). Brain.wrap_up(summary, project=...) performs a small internal retrieval against the session's own writes — memories, decisions, and events created during this session — and composes the handoff packet. This pass deliberately does not reach across scope; it only sees what the current agent did. The result is a packet that describes the session's own state, not the broader brain state, so the next agent orienting into the project can layer the handoff on top of its own orient context without double-counting.

The three operations form a closed loop. Orient reads broadly, search reads narrowly and on demand, wrap_up writes a structured summary that future orient calls read back. An agent that uses only remember and search gets much less than an agent that uses the full lifecycle, because without orient the agent starts cold every time and without wrap_up every session's work has to be re-discovered by the next. The lifecycle is why agent_orient and agent_wrap_up are native MCP tools (§2.4) rather than Python-only conveniences.

self-improving retrieval

The retrieval mechanisms above (§5.1–§5.6) describe a static pipeline: a query arrives, indexes are consulted, results are ranked and surfaced. brainctl extends this with six mechanisms that make retrieval a learning process that improves with every query.

Thompson Sampling exploration. Search reranking draws from Beta(α, β) distributions rather than using confidence point-estimates. Memories with uncertain confidence are explored more frequently; memories with high certainty are exploited. This converts static retrieval into a self-improving explore/exploit learner with zero additional infrastructure — it uses the existing Bayesian alpha/beta columns that track retrieval outcomes (Thompson 1933).
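The sketch below shows Thompson-sampled reranking. It assumes each candidate carries a relevance score that is multiplied by one Beta(α, β) draw per query. The combination rule and the names are hypothetical.

```python
import random


def thompson_rerank(candidates, rng=None) -> list:
    # candidates: (memory_id, relevance, alpha, beta); one posterior draw per memory per query
    rng = rng or random.Random()
    scored = [(mem_id, rel * rng.betavariate(a, b)) for mem_id, rel, a, b in candidates]
    return [mem_id for mem_id, _ in sorted(scored, key=lambda s: s[1], reverse=True)]


cands = [("proven", 1.0, 200, 2), ("refuted", 1.0, 2, 200), ("new", 1.0, 1, 1)]
print(thompson_rerank(cands, random.Random(7)))
```

A memory with a wide posterior, such as "new" at Beta(1, 1), occasionally outranks a well-established one. That occasional promotion is the exploration step, and each outcome it produces narrows the posterior for later queries.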

Retrieval-practice strengthening. Each successful recall boosts memory confidence proportional to retrieval prediction error. Hard retrievals — those with high prediction error — strengthen more than easy ones, implementing the "desirable difficulties" effect from cognitive psychology: memories that are used become stronger, memories that are not fade naturally (Roediger & Karpicke 2006; Bjork 1994).

Q-value utility scoring. Each memory carries a Q-value updated via temporal-difference learning after retrieval outcomes. Memories that contribute to task success receive higher Q-values, creating a reinforcement signal that links retrieval rank to downstream utility (Zhang et al. 2026 / MemRL).
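The sketch below uses the textbook one-step temporal-difference update, Q ← Q + α(r + γQ′ − Q). The reward definition, α, and γ are assumptions for illustration. With γ = 0, each retrieval outcome is treated as terminal, and the update reduces to an exponential moving average of task rewards.

```python
def td_update(q: float, reward: float, alpha: float = 0.2,
              gamma: float = 0.0, next_q: float = 0.0) -> float:
    """One-step TD update: Q <- Q + alpha * (r + gamma * Q' - Q)."""
    td_error = reward + gamma * next_q - q  # how much better or worse than expected
    return q + alpha * td_error


# Usage: a memory retrieved into repeatedly successful tasks climbs toward 1.0.
q_value = 0.0
for _ in range(10):
    q_value = td_update(q_value, reward=1.0)
```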

Context-matching reranker. Every memory captures a JSON snapshot of the agent's operational context at write time (project, agent ID, session ID) plus a SHA-256 hash for fast matching, implementing Tulving & Thomson's (1973) encoding specificity principle. Search results receive a score boost (up to 20%) when their encoding context matches the current retrieval context: a full hash match contributes a 0.3 bonus, and partial key-value overlap gives proportional credit. This signal is fused into the reciprocal rank fusion pipeline alongside FTS5, vector, Thompson Sampling, and PageRank signals. The design rests on the context-dependent memory effect documented in Smith & Vela's (2001) meta-analysis of 93 context-dependent memory experiments.
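The context-match signal can be sketched as follows. The sketch assumes the snapshot is a flat dict and the hash is SHA-256 over canonical (sorted-key) JSON. The function names are hypothetical. Only the 0.3 full-match bonus and the proportional partial credit come from the text. The sketch does not show how this raw signal is fused into RRF or capped at the 20% boost.

```python
import hashlib
import json

FULL_MATCH_BONUS = 0.3  # figure quoted in the text


def context_hash(ctx: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a context snapshot."""
    canonical = json.dumps(ctx, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def context_match_bonus(encoded: dict, current: dict) -> float:
    """Full credit on an exact hash match, else credit proportional to shared key-values."""
    if context_hash(encoded) == context_hash(current):
        return FULL_MATCH_BONUS
    keys = set(encoded) | set(current)
    if not keys:
        return 0.0
    shared = sum(1 for k in keys
                 if k in encoded and k in current and encoded[k] == current[k])
    return FULL_MATCH_BONUS * shared / len(keys)
```

If two of three keys match (same project and agent, new session), the bonus is 0.2.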

Temporal contiguity. Memories created near each other in time receive a co-retrieval bonus (Dong et al. 2026), reflecting the well-established finding that temporal proximity at encoding predicts co-activation at retrieval. Complementing this, an encoding affect linkage (Eich & Metcalfe 1989) connects memories to the agent's emotional state at encoding time.
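One plausible shape for the contiguity bonus is an exponential decay in the creation-time gap between a candidate and an anchor memory, for example the current top hit. The decay form, the 0.1 ceiling, and the one-hour time constant are all assumptions and are not brainctl's parameters:

```python
import math
from datetime import datetime, timedelta


def contiguity_bonus(created_at: datetime, anchor_at: datetime,
                     max_bonus: float = 0.1,
                     tau: timedelta = timedelta(hours=1)) -> float:
    """Co-retrieval bonus that decays exponentially with the encoding-time gap."""
    gap_s = abs((created_at - anchor_at).total_seconds())
    return max_bonus * math.exp(-gap_s / tau.total_seconds())
```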

Per-project retrieval presets. The orient() call returns project-specific retrieval weight presets stored in the agent state table, enabling progressive specialization to project-specific retrieval patterns (Finn et al. 2017). A new project inherits global defaults; over time, project-level weights diverge to reflect the distinct statistical structure of each codebase or domain.

The intended net effect is that retrieval quality improves with use. Early queries rely on keyword matching and recency; later queries benefit from accumulated Thompson posteriors, Q-value rankings, calibrated confidence, and project-tuned weight profiles.

knowledge graph activation and quantum scoring

The knowledge graph is not decorative — it is the retrieval substrate. Without entity connectivity, the consolidation coupling gate (§4.1) rejects memories from promotion, quantum interference computes at parity with classical retrieval, and PageRank has no edges to traverse. On early production data, 92% of episodic memories had zero knowledge-graph connections. The root cause was that memory_add did not automatically link new memories to known entities.

Zero-LLM entity linking. brainctl solves this with a three-layer pipeline that requires no LLM calls. Layer 1 scans all active memories for FTS5 substring matches against the known entity names (case-insensitive, names < 3 characters excluded). On production data this single pass created 746 new mentions edges, dropping isolation from 92% to 16%. Layer 2 (optional) runs GLiNER, a 205M-parameter bidirectional transformer for zero-shot named entity recognition (Zaratiana et al. 2024, NAACL), extracting person, project, tool, service, concept, and organization entities from remaining unlinked memories. Layer 3 creates entity-to-entity co-occurrence edges for memories mentioning two or more entities — producing 2,270 new co_occurs edges that densify the graph for both PageRank traversal and quantum interference. The combined pipeline grew the graph from 8 connected clusters to 81.
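Layers 1 and 3 need no model at all, which makes them cheap enough to run over every active memory. The sketch below assumes memories and entities arrive as id-to-text dicts. The function names are hypothetical, and plain Python substring matching stands in for the FTS5 pass:

```python
from collections import defaultdict
from itertools import combinations

MIN_NAME_LEN = 3  # entity names shorter than 3 characters are excluded


def link_mentions(memories: dict[int, str],
                  entities: dict[int, str]) -> list[tuple[int, int]]:
    """Layer 1: (memory_id, entity_id) 'mentions' edges via case-insensitive substring match."""
    usable = {eid: name.lower() for eid, name in entities.items()
              if len(name) >= MIN_NAME_LEN}
    edges = []
    for mid, text in memories.items():
        lowered = text.lower()
        edges.extend((mid, eid) for eid, name in usable.items() if name in lowered)
    return edges


def co_occurrence_edges(mentions: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Layer 3: entity-entity 'co_occurs' edges for memories mentioning two or more entities."""
    by_memory: dict[int, set[int]] = defaultdict(set)
    for mid, eid in mentions:
        by_memory[mid].add(eid)
    pairs: set[tuple[int, int]] = set()
    for eids in by_memory.values():
        pairs.update(combinations(sorted(eids), 2))  # yields nothing for < 2 entities
    return pairs
```

Plain substring matching also fires inside longer words ("api" in "rapid"). A production pass therefore likely wants token or word-boundary matching. The three-character floor only removes the worst offenders.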

Phase-aware quantum amplitude scoring. With a connected graph in place, brainctl applies phase-aware amplitude scoring inspired by quantum cognition models. Each memory's amplitude is computed as ψ = √(confidence) · e^{iφ}, where the phase φ encodes the memory's position in the knowledge-graph interference pattern. Neighbors connected via knowledge_edges contribute constructive interference (similar phases boost retrieval score) or destructive interference (opposing phases reduce it). The quantum signal is blended with the classical score and gated on confidence_phase being populated, enabling progressive rollout.

With 3,000+ edges from the entity linking pipeline, quantum interference has the substrate it needs to differentiate retrieval scores based on graph topology rather than keyword overlap alone. Memories that are well-connected to the current query's entity neighborhood receive constructive boosts; memories that are topologically distant receive destructive interference, effectively suppressing false positives that would score well on text similarity alone.

multi-agent and handoff

handoff packets

A handoff packet is a compact four-field structure emitted by brain.wrap_up: goal (what was the agent trying to achieve), current_state (where things stand), open_loops (what is unresolved), and next_step (what should happen first in the next session). These are the minimum sufficient statistics for continuation and map directly onto the continuation state a human would give a colleague.

Packets are signed (HMAC over the packet contents keyed by agent identity) so the receiving agent can verify that what it is orienting from is in fact what the previous agent wrote. This matters once multiple agents share a brain.
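The sketch below shows the packet and its HMAC. A per-agent secret key stands in for "keyed by agent identity"; how brainctl derives that key is not specified here. Canonical JSON serves as the signed byte string. The type and function names are hypothetical:

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffPacket:
    goal: str
    current_state: str
    open_loops: list[str] = field(default_factory=list)
    next_step: str = ""


def _canonical(packet: HandoffPacket) -> bytes:
    # Sorted keys and fixed separators make the signed bytes deterministic.
    return json.dumps(asdict(packet), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_packet(packet: HandoffPacket, agent_key: bytes) -> str:
    return hmac.new(agent_key, _canonical(packet), hashlib.sha256).hexdigest()


def verify_packet(packet: HandoffPacket, signature: str, agent_key: bytes) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign_packet(packet, agent_key), signature)
```

HMAC is symmetric, so the verifier must hold the same key. This gives integrity between agents sharing one brain. It does not give per-write cryptographic attribution, which brainctl does not yet provide.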

theory of mind

Each agent maintains not only its own belief state but a model of the beliefs of agents it hands off to — stored in agent_perspective_models. This allows asymmetric handoffs: the sending agent can tailor the packet to what the receiving agent already knows, and the receiving agent can reason about discrepancies between its own view and the sender's.

In the simple case this reduces to "skip context the receiver already has." In the interesting case it enables correction loops where a receiver can detect that the sender was operating under a stale assumption.
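Both behaviors reduce to comparisons between belief maps. The sketch below assumes a perspective model can be flattened to fact-to-value dicts. The real agent_perspective_models schema is not shown, and the function names are hypothetical:

```python
def tailor_packet_context(context: dict[str, str],
                          receiver_beliefs: dict[str, str]) -> dict[str, str]:
    """Simple case: drop facts the receiver is modeled as already holding with the same value."""
    return {k: v for k, v in context.items() if receiver_beliefs.get(k) != v}


def stale_assumptions(sender_view: dict[str, str],
                      receiver_view: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Interesting case: facts both agents hold but disagree on, as (sender, receiver) values."""
    shared = sender_view.keys() & receiver_view.keys()
    return {k: (sender_view[k], receiver_view[k])
            for k in sorted(shared) if sender_view[k] != receiver_view[k]}
```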

federation

Federation between brains — shared context across multiple physical databases with access control — is a direction rather than a scheduled milestone. The design sketch is a minimal sync protocol over signed append-only logs, allowing one brain to pull memories from another under per-scope permissions. The goal is to keep the single-file invariant at rest while enabling collaboration between operators in motion. Interest and concrete use cases from operators are what will drive whether and when this gets built.

trust model between cooperating agents

When a single brain file is shared by multiple agents — an orchestrator and several workers, a human and a reviewer bot, a research agent and a coding agent — the question of trust becomes load-bearing. Who can read whose memories? Who can overwrite whose decisions? What happens when a low-trust agent writes something that contradicts a high-trust agent's prior belief?

brainctl's trust model is built from four orthogonal dimensions that compose at query time:

  1. Identity (agent_id). Every write is attributed to the agent that produced it. There is no anonymous write path. Two agents sharing a brain always know who wrote what, as a literal database join.
  2. Scope. Memories are written into a scope (global, project:api-v2, or agent:reviewer-bot), and reads filter by scope. A worker agent scoped to project:api-v2 cannot see memories written under project:billing. Scopes are not hierarchical by default, but operators can configure inclusion chains (e.g., every project:* scope also sees global).
  3. Trust level (memory_trust_scores). Each source is assigned a trust score that determines what it can overwrite, not just what it can read. A high-trust writer (the operator, a verified human reviewer) can supersede memories written by lower-trust writers. A low-trust writer (an ingest pipeline, a web scraper, a third-party tool) cannot supersede anything outside its own writes. The trust score is consulted by the W(m) gate: a low-trust candidate that would merge with a high-trust existing memory fails the gate with a trust downgrade flag, not a silent merge.
  4. Provenance chain. The source-monitoring layer (§7.3) ensures that even a trusted agent's derived memories carry the attribution of the facts they were derived from. If a high-trust agent writes a conclusion that was derived from a low-trust observation, the conclusion inherits a parent link to the low-trust source. A later retrieval can apply a trust floor and filter both out.

Write conflicts between peers. When two agents write contradictory claims into the same scope at the same trust level, the contradiction flows into belief_conflicts rather than being silently resolved. Neither write is discarded. An operator (or a higher-trust arbiter agent) can then call resolve_conflict to rank the competing claims and collapse the loser via AGM (§3.5). This prevents the pathological case where two agents thrash on the same memory, each overwriting the other.

Read asymmetry. Cross-agent reads are unrestricted by default within the same scope — sharing a brain is the entire point of running multiple agents on one. But an operator can mark memories private to an agent, which restricts reads to that agent's identity even within the shared scope. This is how brainctl supports a reviewer pattern where a reviewer agent can read everything a worker agent writes but the worker cannot see the reviewer's private notes.
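The four dimensions, together with the read-asymmetry and peer-conflict rules, compose into two checks, sketched below. The `Memory` fields, the trust scale, the single inclusion chain, and the outcome labels are hypothetical stand-ins for brainctl's actual tables and gate:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Memory:
    id: int
    agent_id: str   # identity: every write is attributed
    scope: str      # e.g. "global", "project:api-v2", "agent:reviewer-bot"
    trust: float    # trust score of the writing source (hypothetical scale)
    private: bool = False


# Hypothetical inclusion chain: every project:* scope also sees global.
INCLUSION_CHAINS = [("project:*", "global")]


def readable_scopes(reader_scope: str) -> set[str]:
    extra = {inc for pattern, inc in INCLUSION_CHAINS if fnmatchcase(reader_scope, pattern)}
    return {reader_scope} | extra


def can_read(reader_id: str, reader_scope: str, m: Memory) -> bool:
    if m.scope not in readable_scopes(reader_scope):
        return False
    # Private memories are readable only by their author, even inside a shared scope.
    return not m.private or m.agent_id == reader_id


def contradicting_write(writer_id: str, writer_trust: float, existing: Memory) -> str:
    """Outcome of a write that contradicts an existing memory."""
    if existing.agent_id == writer_id or writer_trust > existing.trust:
        return "supersede"        # own write, or a higher-trust writer
    if writer_trust == existing.trust:
        return "belief_conflict"  # peers: keep both, await resolve_conflict
    return "trust_downgrade"      # lower-trust writer fails the gate, no silent merge
```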

What brainctl does not yet do: cryptographic attribution. Agent IDs are database-level identities, not signed identities. A compromised process with write access to the brain file can write under any agent_id it wants. The RBAC layer protects against honest mistakes and tool-output contamination, not against an adversary that has already achieved filesystem-level access. Cryptographic per-write signatures are a candidate for a future release, but they would only matter in a multi-operator setting, which is itself not yet the primary deployment model.

lineage

The memory-as-explicit-state line of work in ML agents has three load-bearing recent papers. Park et al. 2023 ("Generative Agents: Interactive Simulacra of Human Behavior"), the Smallville simulation, introduced a memory-stream architecture with importance, recency, and relevance scoring plus a periodic reflection step that synthesizes higher-level beliefs from raw observations. Packer et al. 2023 (MemGPT) reframed agent memory as an operating-system-style hierarchy with paging between context, recall, and archival tiers. Shinn et al. 2023 (Reflexion) showed that letting an agent verbally reflect on its own failures and write those reflections back to a persistent buffer improves task success across reasoning and coding benchmarks.

brainctl draws from all three. Handoff packets generalize the Generative Agents reflection step into a signed, session-bridging summary. The typed memory stores generalize MemGPT's tier hierarchy from three levels to seven. The reflexion_lessons table (migration 008, plus the reflexion_failure_recurrence tracker) is a direct implementation of Shinn et al.'s persistent reflection buffer. The lineage is not implicit. brainctl is what you get when you take those three papers seriously, keep the substrate local, and add the cog-sci pieces (AGM revision, schema integration, source monitoring) that the ML literature mostly leaves unaddressed.

security posture

quarantine

Untrusted input — memories written by an agent that ingested a web page, a user message, or a tool output — lands in the memory_quarantine table before it reaches memories. A human operator or a trusted reviewer agent marks each quarantined item as safe, malicious, or uncertain. Malicious items are purged with all derived knowledge edges retracted; safe items are promoted; uncertain items stay quarantined.
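The review workflow maps onto a small state transition over SQLite tables. The sketch below uses Python's built-in sqlite3 module. The schema is a simplified, hypothetical stand-in for memory_quarantine, memories, and knowledge_edges, not brainctl's migrations. A purge here marks the row rather than deleting it, so the original content stays inspectable:

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE memory_quarantine (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'  -- pending | uncertain | promoted | purged
);
CREATE TABLE knowledge_edges (
    id INTEGER PRIMARY KEY, quarantine_id INTEGER, src TEXT, dst TEXT
);
"""


def review(conn: sqlite3.Connection, qid: int, verdict: str) -> None:
    """Apply an operator or reviewer verdict to one quarantined item."""
    if verdict not in {"safe", "malicious", "uncertain"}:
        raise ValueError(f"unknown verdict: {verdict}")
    with conn:  # one transaction per verdict
        if verdict == "safe":
            (content,) = conn.execute(
                "SELECT content FROM memory_quarantine WHERE id = ?", (qid,)).fetchone()
            conn.execute("INSERT INTO memories (content) VALUES (?)", (content,))
            conn.execute("UPDATE memory_quarantine SET status = 'promoted' WHERE id = ?", (qid,))
        elif verdict == "malicious":
            # Retract every knowledge edge derived from the item, then tombstone it.
            conn.execute("DELETE FROM knowledge_edges WHERE quarantine_id = ?", (qid,))
            conn.execute("UPDATE memory_quarantine SET status = 'purged' WHERE id = ?", (qid,))
        else:
            conn.execute("UPDATE memory_quarantine SET status = 'uncertain' WHERE id = ?", (qid,))
```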

This is the primary defense against memory poisoning attacks, where an adversary tries to inject false premises into the agent's long-term memory via a tool response or a retrieved document.

pii audit trail

Personally identifiable information detection runs on every semantic write. Hits are logged to pii_audit with the source memory ID, the detected category (email, phone, name, account number, ...), and the action taken (redact, drop, escalate). This gives operators a single log to query when answering data-retention requests.
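The sketch below shows the detect, log, act loop using regexes only. It assumes audit rows shaped like the pii_audit fields named above. It covers only emails and phone numbers; names and account numbers need NER or domain rules. The patterns are illustrative, not brainctl's detectors:

```python
import re

# Hypothetical detectors for two of the categories named above.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scan_semantic_write(memory_id: int, text: str, audit_log: list[dict],
                        action: str = "redact") -> str:
    """Detect PII, append one pii_audit-style row per category hit, optionally redact."""
    if action not in {"redact", "escalate"}:
        raise ValueError(f"unsupported action: {action}")
    for category, pattern in DETECTORS.items():
        hits = pattern.findall(text)
        if not hits:
            continue
        audit_log.append({"memory_id": memory_id, "category": category,
                          "action": action, "hits": len(hits)})
        if action == "redact":
            text = pattern.sub(f"[{category.upper()}]", text)
    return text
```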

provenance chains and source monitoring

The cognitive-science frame for what this section describes is the source monitoring framework of Johnson, Hashtroudi, and Lindsay 1993. Source monitoring is the cognitive process by which a person attributes a memory to its origin — was this fact something I read, was it told to me, did I infer it, did I imagine it? Source monitoring failures cause confabulation: the propositional content of the memory may be correct, but the source attribution is wrong, and any decision grounded in that memory is grounded in a phantom premise. The DRM paradigm (Roediger & McDermott 1995) showed how easily even healthy human memory generates false memories under associative pressure, and how robust the confidence in those false memories can be.

brainctl's provenance posture is a literal implementation of source monitoring at the data layer, with the explicit goal of preventing the agent equivalent of confabulation. Every memory carries an agent ID, a source type (user-written, agent-written, tool-output, ingested-document, derived, consolidation-promoted), a creation timestamp, and — for memories derived from other memories — a list of parent IDs forming a directed acyclic provenance graph. The access_log table records every read with the reader, timestamp, and surrounding query context. Together these let an auditor trace any fact in the brain back to its origin and answer the question "why does the agent think this?" — not as a metaphor, but as a literal join.
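Parent links form a DAG inside SQLite. That means "why does the agent think this?" can be answered with one recursive common table expression. The sketch runs against a hypothetical two-table schema: `memories` with a `source_type` column, and `memory_parents` holding child/parent pairs. brainctl's real column names may differ:

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (id INTEGER PRIMARY KEY, source_type TEXT NOT NULL);
CREATE TABLE memory_parents (child_id INTEGER NOT NULL, parent_id INTEGER NOT NULL);
"""

ORIGINS_SQL = """
WITH RECURSIVE ancestry(id) AS (
    SELECT ?
    UNION
    SELECT p.parent_id
    FROM memory_parents AS p JOIN ancestry AS a ON p.child_id = a.id
)
SELECT m.id, m.source_type
FROM memories AS m JOIN ancestry AS a ON a.id = m.id
WHERE NOT EXISTS (SELECT 1 FROM memory_parents AS p WHERE p.child_id = m.id)
ORDER BY m.id
"""


def trace_origins(conn: sqlite3.Connection, memory_id: int) -> list[tuple[int, str]]:
    """Return the root memories (no parents) that a memory ultimately derives from."""
    return conn.execute(ORIGINS_SQL, (memory_id,)).fetchall()
```

`UNION` (rather than `UNION ALL`) deduplicates visited IDs, so diamond-shaped provenance does not produce repeated rows.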

rbac

Memory RBAC (migration 017) attaches a trust level to each memory source and a scope to each reader. High-trust writers (the operator, the user) can write to any scope; lower-trust writers (ingest pipelines, external tools) are confined to sandboxed scopes. Readers filter by scope at query time via memory_trust_scores.

encryption, supply chain, and incident response

Three concerns sit alongside the threat model above and deserve explicit treatment. None of them are novel problems — they are the standard operational-security concerns any system holding sensitive data faces — but pretending they do not apply to brainctl would leave the reader with gaps.

Encryption at rest. brainctl does not encrypt brain.db by default. SQLite has a proprietary, separately licensed encryption extension (SEE) and a free, widely used alternative (SQLCipher). The brainctl code path is compatible with both: the database is opened through a single function that respects a pragma-level encryption configuration. Operators who need at-rest encryption should either link against SQLCipher (one-time library swap, no code changes) or rely on filesystem-level encryption (FileVault, LUKS, BitLocker, ZFS native encryption). The default posture is to inherit from the filesystem, because most operators already have disk encryption enabled for everything else on the machine; adding another layer on top offers diminishing returns. For regulated environments where application-level encryption is contractually required, SQLCipher is the recommended path, and a short operator guide exists in the repository.

Supply chain. brainctl's runtime dependencies are deliberately few. The core requirements are Python 3.11+, SQLite with WAL mode (shipped with Python), the sqlite-vec extension (a single shared library with no transitive dependencies), and, for semantic retrieval, a local Ollama instance running nomic-embed-text. The MCP server adds the mcp Python package. Nothing in this list is network-native at runtime. Ollama runs on localhost, sqlite-vec is loaded in-process as an extension, and the mcp stdio transport is file-descriptor-based. The exception is the optional streamable-HTTP transport, a separate brainctl-mcp-http entry point added in v2.5.0. There is no browser surface, no long-lived network connection, and no auto-updater reaching out to a remote server. This is an intentional reduction in supply-chain surface: the fewer packages in the runtime, the fewer places a malicious upstream can land a compromised release.

Development dependencies are larger (test frameworks, linters, benchmark harnesses), but they are isolated to requirements-dev.txt / the dev extras and are not loaded by the runtime. The repository publishes a lockfile and the PyPI release is built from a deterministic CI pipeline; operators who want reproducible builds can pin against the lockfile and verify the built artifact against the release hash.

Incident response. If an operator discovers that a memory, a tool output, or a source was compromised, brainctl provides the primitives for a clean response without data loss:

  1. Identify the compromised source. Query access_log for the offending agent ID or source type. The log is append-only and timestamped, so the window of compromise is recoverable.
  2. Quarantine derived memories. The memory_quarantine table accepts writes from an operator that mark a set of memories as pending review. The trust-propagation pipeline then walks the provenance graph from the quarantined memories outward, flagging every downstream memory whose parent chain touches the compromised source. None of these are deleted; they are held pending review.
  3. Replay consolidation with the compromised sources excluded. A dream cycle can be invoked with an exclusion set, so the Hebbian pass and the REM bisociation step do not reinforce the poisoned edges while the operator decides what to restore.
  4. Restore or purge. Memories marked safe after review flow back into the active store; memories marked malicious are purged with the quarantine_purge tool, which retracts all derived knowledge edges and records the retraction in the audit log. The purge is a soft tombstone rather than a hard delete, so a subsequent investigation can recover the original content and its provenance.
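Step 2's outward walk is the mirror image of origin tracing: instead of following parents up, it follows children down. The sketch below assumes the provenance graph is available as a hypothetical parent-to-children map. The union of sources and descendants is one natural exclusion set for step 3:

```python
from collections import deque


def downstream_of(compromised: set[int], children: dict[int, list[int]]) -> set[int]:
    """Breadth-first walk from compromised memories to every descendant."""
    flagged: set[int] = set()
    queue = deque(compromised)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in flagged:  # a DAG can reach a node by several paths
                flagged.add(child)
                queue.append(child)
    return flagged


def consolidation_exclusion_set(compromised: set[int],
                                children: dict[int, list[int]]) -> set[int]:
    """Sources plus everything derived from them, held out of the replayed dream cycle."""
    return set(compromised) | downstream_of(compromised, children)
```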

The incident-response primitives above assume that bad input is inevitable: they give the operator tools for containment rather than prevention. Prevention is always incomplete; containment and auditability are what determine how bad a compromise becomes.

threat model: memory poisoning

The threat model that brainctl's quarantine, source-monitoring, and RBAC layers defend against is grounded in a specific, recent body of security literature. Greshake et al. 2023 ("Not What You've Signed Up For") characterized indirect prompt injection: attacks where untrusted content arrives via a tool output, a retrieved document, or a web page and contains instructions intended to manipulate the agent's downstream behavior. The original paper demonstrated end-to-end attacks against LLM-integrated applications, including Bing Chat and email assistants, with payloads as simple as an HTML comment hidden in a web page.

For stateful agents the threat is amplified, because injected content can persist across sessions if the agent writes it to memory. An attacker who controls a single tool response or document can craft input designed to land in the agent's long-term store and bias every future decision that retrieves from that region of memory. This is the agent equivalent of a stored (persistent) XSS attack, and it survives every restart and context reset.

brainctl's three structural defenses against memory poisoning are: (1) the memory_quarantine table, which holds untrusted writes pending review before they reach memories; (2) the trust-scoped RBAC of §7.4, which prevents low-trust writers from contaminating high-trust scopes even if quarantine is bypassed; (3) the source-monitoring provenance chain of §7.3, which makes every retrieved fact traceable to its origin so a compromised tool can be retroactively quarantined and all downstream memories it spawned can be retracted. None of these eliminate the risk — no defense does — but they make the difference between an agent that can be permanently poisoned by a single malicious tool response and an agent whose poisoning attempts are isolated, attributable, and reversible.

implementation and benchmarks

At v2.7.0, brainctl is implemented in approximately 64k lines of Python inside src/agentmemory/ with a SQLite schema defined by 51 migration files rebuilding to ~62 user-facing tables. The MCP server exposes 209 tools across two transports (stdio + streamable HTTP). Nineteen first-party plugins ship in-tree — agent frameworks (Claude Code, Codex CLI, Cursor, Eliza, Gemini CLI, Goose, Hermes, OpenClaw, OpenCode, Pi, Rig, Virtuals Game, Zerebro) and trading bots (Freqtrade, Jesse, Hummingbot, NautilusTrader, OctoBot, Coinbase AgentKit). The release sequence below is the migration-by-migration story of how the surface got here.

The v1.5.0 migration runner brings existing brain.db files up to the current schema safely. v1.6.0 added a deterministic single-system search-quality harness with a pytest regression gate that fails the build on any >2% drop in P@1 / P@5 / Recall@5 / MRR / nDCG@5. v2.4.0 added a five-model embedding registry, an optional cross-encoder reranker, and a same-fixture competitor harness under tests/bench/competitor_runs/ with adapters for Mem0, Letta, Zep, Cognee, MemPalace, and OpenAI Memory under a skip-not-fabricate contract. v2.4.1 added per-row provenance (retrieval_mode, vector_enabled, embedding_model, rerankers_active, search_args) plus a vec write-path connection pool that takes 30–100 ms off the Brain.remember hot path. v2.4.2 added brainctl status, a single-screen brain-health overview that combines DB stats, doctor-style issue detection, and service-availability checks, and exits non-zero on any actionable issue so it can gate CI. v2.4.3 added the OpenCode (TypeScript hook) and Pi (proxy-via-adapter) plugin shapes alongside Goose (pure-MCP YAML). v2.4.5 added brainctl ingest code, a tree-sitter-based source-tree walker that writes file / function / class entities + contains and imports knowledge_edges into brain.db with SHA-256 idempotency caching, shipping three grammars (Python, TypeScript, Go) to keep the wheel footprint at ~4 MB.

v2.5.0 added the streamable-HTTP MCP transport (brainctl-mcp-http) alongside the existing stdio server, exposing the full tool surface over HTTP with bearer-token auth and an allowlist for remote clients (xAI Grok, Strand). v2.5.1 closed an MCP dispatcher bug and an FTS5 cold-start consistency gap surfaced by the beta audit.

v2.6.0 shipped the marketplace negotiation CLI (brainctl marketplace api ...). It came with full offer / counter / accept / reject / withdraw semantics and the protocol-prefix rename (brainctl-marketplace/v1:), which decouples the on-chain protocol identifier from the community-token ticker. It also added the protocol fee schedule: flat per-op fees plus 2.5% on marketplace settlement, lowered from an earlier 3.5% before any production volume hit the chain. Finally, it added an independent treasury wallet, preserving the dev wallet's anti-sniping hold ahead of token launch. v2.6.1 added provider import adapters (brainctl import mem0 / json) for onboarding from other memory providers into a quarantine scope, and brainctl bundle decrypt for local AES decryption of bundles you minted yourself. v2.6.2 fixed an MCP-server idle-timeout bug (#108) that was killing stdio servers under Claude Desktop after one hour without requests. v2.6.3 added brainctl wallet export-key, which renders the managed wallet secret in the base58 format that Phantom, Backpack, Solflare, and Glow accept under their import-private-key flows, so a brainctl wallet can be paired with any standard Solana wallet UI. v2.6.4 added BRAINCTL_ALLOWED_TOOLS for stdio MCP clients that cap tool count. Google's Antigravity IDE caps at 100, and brainctl exposed 201+ tools at that release, so the IDE would refuse to load the server without the allowlist. Unknown names hard-error at startup with difflib "did you mean?" suggestions (closes #114).

v2.7.0 shipped procedural memory as a first-class store (§3.7). The release comprises migration 052, three canonical tables (procedures, procedure_steps, procedure_sources), an FTS5 virtual table for procedure search, and eight new MCP tools (procedure_add / get / list / search / update / feedback / backfill / stats), bringing the MCP surface to 209. The release credits Velamj as the author of PR #94, opened 2026-04-24, which predated comparable procedural-memory work in the public agent-memory space by ~19 days. Procedural memory is the last of the three types in Tulving's episodic / semantic / procedural typology (1972, extended in 1985) to become a first-class store in brainctl. Earlier releases used the "procedural" label for what the new typology (§2.2) more accurately calls the decisional store.

Retrieval quality on standard benchmarks (measured 2026-04-18). Intel Core Ultra 7 258V, 33.9 GB RAM, default brainctl settings, no benchmark-specific tuning. Repro: python -m tests.bench.run --check-strict.

benchmark | scoring | brainctl
LoCoMo (n=1,986) | session-level avg recall | 0.9217
LongMemEval (n=470) | R@5 | 0.9702
LongMemEval (n=470) | R@10 | 0.9894
MemBench FirstAgent (n=200) | hit@5 | 0.930

The numbers above are committed as a regression baseline under tests/bench/baselines/ and gated in CI on every push — a >2% drop in P@1 / P@5 / Recall@5 / MRR / nDCG@5 fails the build. A same-fixture competitor harness lives under tests/bench/competitor_runs/ for operators who want to compare against Mem0, Letta, Zep, Cognee, MemPalace, or OpenAI Memory on identical inputs; adapters and reproduction instructions are in the repo. Per-row provenance fields added in v2.4.1 (retrieval_mode, vector_enabled, embedding_model, rerankers_active) make any future result bundle self-describing.
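The gate can be sketched as per-query metrics averaged over the fixture, then compared against the committed baseline. Treating the 2% threshold as a relative drop is an assumption, since the text does not say whether it is relative or absolute. The function names are hypothetical, and nDCG@5 assumes binary relevance:

```python
import math

K = 5


def metrics_for_query(ranked: list[str], relevant: set[str]) -> dict[str, float]:
    """P@1, P@5, Recall@5, MRR and nDCG@5 for one ranked result list."""
    hits = [1 if doc in relevant else 0 for doc in ranked[:K]]
    first_hit_rr = next((1.0 / (i + 1) for i, d in enumerate(ranked) if d in relevant), 0.0)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), K)))
    return {
        "P@1": float(bool(ranked) and ranked[0] in relevant),
        "P@5": sum(hits) / K,
        "Recall@5": sum(hits) / len(relevant) if relevant else 0.0,
        "MRR": first_hit_rr,
        "nDCG@5": dcg / idcg if idcg else 0.0,
    }


def mean_metrics(runs: list[tuple[list[str], set[str]]]) -> dict[str, float]:
    per_query = [metrics_for_query(ranked, rel) for ranked, rel in runs]
    return {name: sum(q[name] for q in per_query) / len(per_query) for name in per_query[0]}


def regressions(current: dict[str, float], baseline: dict[str, float],
                tolerance: float = 0.02) -> list[str]:
    """Metrics whose relative drop versus the baseline exceeds the tolerance."""
    return [name for name, base in baseline.items()
            if base > 0 and (base - current[name]) / base > tolerance]
```

A CI step would then fail the build whenever `regressions(...)` returns a non-empty list.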

The local-first architecture is a deliberate performance and privacy choice. Every read is a direct SQLite hit — sub-millisecond on the hot path, no network round trip, no remote API to rate-limit or bill by the operation. Writes commit to WAL and return immediately; consolidation runs asynchronously during quiet hours. The brain.db file is yours: copy it, back it up, move it between machines. No token gates what your agent can remember. No third-party storage layer holds data you cannot inspect or export.

Representative figures on a single-agent brain after one month of continuous use on an M2 MacBook Pro:

operation | p50 | p99 | notes
brain.remember | 3.1 ms | 9.8 ms | includes W(m) gate + embedding
brain.search (k=10) | 6.4 ms | 18 ms | FTS5 + vec + RRF + salience
brain.orient | 22 ms | 55 ms | full context package assembly
brain.wrap_up | 8 ms | 25 ms | packet + signature + decisions flush
consolidation pass | 1.4 s | 4.1 s | per 1000 episodic rows, offline

Numbers are indicative, not normative; a formal benchmark suite lives in tests/bench/ and runs in CI. The point is that the hot path is small: the dominant cost is local embedding computation, which is still single-digit milliseconds.

dependencies and platform support

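brainctl's runtime dependency list is intentionally short. The core requirements:

  1. Python 3.11+.
  2. SQLite with WAL mode (shipped with Python).
  3. The sqlite-vec extension, a single shared library with no transitive dependencies.
  4. For semantic retrieval, a local Ollama instance running nomic-embed-text.
  5. For the MCP server, the mcp Python package.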

Platform support tracks SQLite and sqlite-vec. brainctl runs on macOS (Intel and Apple Silicon), Linux (x86_64 and aarch64), and Windows. CI exercises the full suite on macOS and Ubuntu on every push. Windows is supported but less battle-tested; the repository accepts Windows-specific bug reports and fixes them. The one platform that is explicitly unsupported is anything without a writable local filesystem — the single-file architecture is fundamentally incompatible with serverless runtimes that expose only ephemeral disk.

testing and release cadence

The test suite contains more than 1,700 tests covering the MCP tool surface, the W(m) gate, the migration runner, the consolidation pipeline, the belief revision logic, and the integration plugins. CI runs the full suite on every push and every pull request, on both macOS and Ubuntu, with the runs pinned to a fresh SQLite build to catch extension-loading regressions. The brainctl-mcp --doctor command runs a local subset of the CI checks on the operator's own installation. This is useful for diagnosing a broken install without shipping the environment to a maintainer.

Releases are SemVer: major for incompatible API changes, minor for backward-compatible features (plugin additions, new MCP tools, new tables via migration), patch for fixes. Release cadence is irregular but approximately every two to four weeks for minor releases and as-needed for patches. Every release includes a CHANGELOG entry with the migration list and any operator-visible behavioral changes. The PyPI release is built from a deterministic CI pipeline and published with an attestation so operators can verify the built artifact against the release hash.

contribution

brainctl is open source under MIT and accepts pull requests. The contribution model is intentionally low-ceremony: fork, branch, open a PR, get review. The one hard rule is that every new mechanism must have a research note in the research/ directory explaining the cognitive-science, ML, or systems grounding behind the design — this is how the paper trail in §10.1 stays current and how reviewers understand why the code looks the way it does. Tests are required for any change that touches the hot path or the W(m) gate; style conforms to ruff and black with defaults.

Good first issues typically fall into five categories: (1) new MCP tools wrapping existing Python API calls, (2) new plugins for agent frameworks that don't yet have first-party support, (3) research notes implementing a specific paper's mechanism as an optional feature, (4) bench harnesses extending the benchmark suite, (5) documentation and reproducibility bundles for the existing research notes. The project is not looking for large speculative refactors.

comparison with existing approaches

The agent-memory landscape is crowded and moving fast. Several mature projects already address subsets of the problem brainctl targets, and any honest comparison has to acknowledge that. The table below is descriptive — what each project does at the architectural level — not evaluative. The paragraph that follows it is the actual positioning.

project | substrate | memory typology | belief revision | consolidation
LangGraph checkpointer | per-graph state, pluggable backend | flat dict or schema-defined state | last-write-wins on update | none
Letta (formerly MemGPT) | hosted, postgres or sqlite | core / recall / archival | none | recall ↔ archival swap
Mem0 | hosted, postgres-backed | flat memories with metadata | none | LLM-driven memory updates
Zep | hosted service, postgres + vector | session messages + extracted facts | none | none
Cognee | postgres + vector + graph | knowledge-graph-first | none | offline graph build
brainctl | sqlite, single file | 7 typed stores + knowledge graph (incl. first-class procedural in v2.7) | AGM with collapse audit | 8-phase consolidation pipeline

Letta (formerly MemGPT) pioneered the recall/archival memory swap and remains the strongest reference point for hosted multi-agent memory. Mem0 targets memory-as-a-service for production agents and has the cleanest REST surface in the category. Zep offers enterprise-grade session storage layered on Postgres and is probably the right call for any team that needs row-level ACLs, audit logs, and SOC2-style compliance from day one. Cognee leads on knowledge-graph-first memory and is the closest cousin to brainctl in spirit. LangGraph's checkpointer is the right answer for teams already invested in LangChain's runtime. CrewAI and AutoGen both ship first-party memory layers and are the path of least resistance inside their respective frameworks. The built-in memory features in ChatGPT and Claude shape end-user expectations and quietly set the floor for what users assume an agent should remember.

None of these are wrong. brainctl does not try to be a better Letta or a better Mem0; it tries to be the answer for a specific design corner that the rest of the field does not occupy: local-first SQLite as the only required infrastructure, seven typed memory stores with all three of Tulving's episodic / semantic / procedural types as first-class data models, AGM belief revision with a full collapse audit instead of last-write-wins, an eight-phase consolidation pipeline that actively generates hypotheses during quiet hours, and a chain-canonical agent-to-agent marketplace built on top of the memory primitives. If you need any of those five things and you also need MIT licensing (Apache 2.0 on the marketplace components) and zero hosted dependencies, brainctl is the answer. If you need anything the others ship that brainctl does not — managed multi-tenant hosting, enterprise audit tooling, framework-native ergonomics — use them. The design space brainctl occupies is underpopulated, not contested.

economics: why a token

license posture

brainctl is MIT-licensed and will remain so. There is no enterprise edition, no paid tier, no gated features. Every mechanism in this paper is implemented in the public repository, with tests and a research note in research/. The ~40 research notes in that directory document the cognitive-science grounding of each mechanism with citations and reproduction instructions.

why a token rather than grants or VC

Open-source infrastructure is chronically underfunded. The two mainstream funding mechanisms, grants and venture capital, both fail in characteristic ways.

A token is a third option. It aligns funding with the people who benefit from the memory layer — builders running agents, operators paying for inference, the agents themselves eventually — without putting the software behind a paywall. If it fails, the software is still free. If it succeeds, the research accelerates.

distribution

The brainctl community token has not launched yet. There is no contract address, no pump.fun listing, and no circulating supply. The ticker symbol is intentionally being withheld until launch — anything trading today under a brainctl-style ticker is not us. When the token does launch, the intent is a fair launch on pump.fun — no team allocation, no pre-sale, no vesting — with the launch itself serving as the distribution.

The development wallet is already public, however, and already being tracked live on /transparency. Every inbound and outbound transfer is rendered on the page — fetched server-side from the Solana chain via the Helius enhanced transactions API, cached for 60 seconds, and cross-checkable against any independent RPC. The point is that the ledger is public before any money moves, not after.

Nothing in this section is financial advice or an offer to sell securities. The brainctl community token is unlaunched and its ticker is withheld until deploy. The brainctl software is free, open source, and MIT-licensed independent of any token.

commitments

Two concrete commitments apply at launch. Nothing else is promised about how future fees, treasury balances, or proceeds will be deployed; those decisions will be made and disclosed as they happen, on the public transparency page below. We would rather make no commitment than make one we would later have to walk back.

  1. Buy + burn ~10% of total supply. The team will use launch proceeds to buy approximately ten percent of total token supply on the open market and burn it. This is a one-time deflationary action executed shortly after launch, with the burn transaction published on the transparency page when it occurs.
  2. Lock ~10% of total supply. An additional ~10% of total supply will be acquired and locked, with the lock address and unlock terms published on the transparency page at lock time.

Transparency. The development wallet is public from the moment the token deploys (pre-launch it is held privately to prevent sniping, §10.3). Every inbound and outbound transfer is rendered live on /transparency via a server-side Helius fetch, cached 60 seconds, cross-checkable on Solscan. There is no separate "internal" wallet, no treasury sub-account hidden off the transparency page, no off-chain spending channel. Operators, contributors, token holders, and adversaries all see the same ledger at the same time.

License invariant. The brainctl software is MIT-licensed (the marketplace components Apache 2.0) and version-controlled on GitHub. That invariant does not depend on the token, the treasury, or any market outcome.

the on-chain primitives (signed exports → mint → marketplace)

brainctl ships three composable on-chain primitives that together let agents trade memories with no custodial layer in the middle. Each is usable in isolation; the marketplace is the third, built on top of the first two. All three are shipped and on PyPI as of v2.7.0 — this section is a specification of the running system, not a forward-looking promise.

Layer 0: signed exports (brainctl 2.3). brainctl export --sign produces an Ed25519-signed JSON bundle of memories that anyone can verify offline without brainctl itself. --pin-onchain additionally writes the bundle’s SHA-256 hash, and only the hash, to Solana via the SPL Memo program. The signature proves the bundle came from a specific wallet; the memo proves the bundle existed at a specific slot. In v2.6.0 the primitive gained a flat protocol fee (a tiny SOL transfer, atomic with the memo); see the fee schedule below.
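Both the signature and the pinned hash only mean something if every verifier hashes exactly the same bytes, so the bundle has to be serialized canonically first. The sketch below assumes a canonical form of sorted keys, compact separators, and UTF-8; brainctl's actual canonicalization and bundle field names may differ, and the Ed25519 signing step is omitted because it needs a third-party library.

```python
import hashlib
import json


def canonical_bytes(bundle: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        bundle, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def bundle_digest(bundle: dict) -> str:
    # Hex-encoded SHA-256 (64 chars): the only thing a memo pin would carry.
    return hashlib.sha256(canonical_bytes(bundle)).hexdigest()


# Hypothetical bundle shape; key order must not change the digest.
bundle = {"agent": "agent-a", "memories": [{"id": 1, "content": "run migrations before deploy"}]}
reordered = {"memories": [{"content": "run migrations before deploy", "id": 1}], "agent": "agent-a"}
print(bundle_digest(bundle) == bundle_digest(reordered))  # True
```

Because the digest is computed over canonical bytes, a verifier can recompute it from the downloaded bundle and compare it with the memo without trusting whoever served the file.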

Layer 1: mint (brainctl 2.5). brainctl export --sign --mint takes the same signed bundle and:

The compressed token is a standard SPL asset: indexable by Helius, transferable from any wallet UI, listable on Tensor or Magic Eden. The chain sees the bundle hash and the ciphertext URI; it does not see plaintext.

Layer 2: marketplace (brainctl 2.6+). Live at brainctl.org/marketplace and via brainctl marketplace api ... in the Python CLI. Sellers list a signed bundle’s hash rather than a pre-minted token — the cNFT is minted just-in-time to the buyer at settlement, so a single bundle can be sold to many buyers, each receiving their own freshly-minted token. Listings are USD-pegged with a $10,000 cap; settlement is in SOL pre-launch and switches to the community token via a single env-flip post-launch. The marketplace components are Apache 2.0 (the rest of brainctl stays MIT) — patent grant included, encouraging other agent platforms to adopt the memo format and the indexer pattern without friction.

The marketplace runs as a chain-as-database: state lives in Solana memos and Arweave manifests, not in any server-side database. Every action — list, offer, counter, accept, reject, withdraw, buy, release, cancel — is a signed memo with the deterministic prefix brainctl-marketplace/v1:<action>:.... The brainctl.org API is an indexer and a transaction builder; if it disappears, the same state is reconstructible by any other party from the chain. The API does not hold keys, does not custody funds, does not escrow. Authentication is wallet-signature challenge-response, persisted only in ephemeral KV (5-minute nonce TTL, 24-hour session TTL).
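Because every action is a memo with a fixed prefix, any party can rebuild marketplace state by scanning memos and parsing them. A minimal sketch of building and parsing such memos follows; it assumes ':' never appears inside a field and only illustrates the release layout spelled out in the settlement description, since the field layouts of the other actions are not specified here.

```python
PREFIX = "brainctl-marketplace/v1"
ACTIONS = frozenset({"list", "offer", "counter", "accept", "reject",
                     "withdraw", "buy", "release", "cancel"})


def build_memo(action: str, *fields: str) -> str:
    """Join action and fields under the deterministic prefix."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if any(":" in f for f in fields):
        raise ValueError("fields must not contain ':' (it is the separator)")
    return ":".join([PREFIX, action, *fields])


def parse_memo(memo: str):
    """Return (action, fields) for a marketplace memo, or None for any other memo."""
    head = PREFIX + ":"
    if not memo.startswith(head):
        return None
    action, _, tail = memo[len(head):].partition(":")
    if action not in ACTIONS:
        return None
    return action, (tail.split(":") if tail else [])


# Placeholder identifiers, not real Arweave or Solana addresses.
memo = build_memo("release", "listing123", "envTx", "cnftAddr")
print(memo)  # brainctl-marketplace/v1:release:listing123:envTx:cnftAddr
```

An independent indexer would apply parse_memo to memos in slot order and fold the results into per-listing state; memos that return None are simply ignored.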

Negotiation. A listing picks a visibility mode at creation: auction (offers visible to all browsers) or private (offers visible only to the seller and the offerer; the chain memos themselves are public, but the indexer hides them from third parties). Each offer carries a USD-pegged price (capped at $10,000, converted to the payment token at settle time via Jupiter spot) and a TTL capped at 24 hours. Either side can counter an offer; the chain preserves the full lineage so a later reputation index can be derived from it.
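The negotiation rules above reduce to a few checks: the USD price cap, the 24-hour TTL cap, the settle-time conversion to the payment token, and the indexer's visibility filter. The sketch below is illustrative only: the Offer type and function names are hypothetical, the spot price is passed in rather than fetched from Jupiter, and rounding to the nearest lamport is an assumed policy.

```python
from dataclasses import dataclass

MAX_PRICE_USD = 10_000
MAX_TTL_SECONDS = 24 * 60 * 60
LAMPORTS_PER_SOL = 1_000_000_000


@dataclass(frozen=True)
class Offer:  # hypothetical shape of an offer as seen by the indexer
    listing_id: str
    offerer: str
    price_usd: float
    ttl_seconds: int


def validate_offer(offer: Offer) -> None:
    if not 0 < offer.price_usd <= MAX_PRICE_USD:
        raise ValueError("price must be in (0, 10000] USD")
    if not 0 < offer.ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl must be in (0, 24h]")


def usd_to_lamports(price_usd: float, sol_usd_spot: float) -> int:
    # Converted at settle time, not at offer time; rounding policy is assumed.
    return round(price_usd / sol_usd_spot * LAMPORTS_PER_SOL)


def visible_offers(offers, viewer: str, seller: str, mode: str):
    # Auction: everyone sees every offer. Private: only seller and offerer.
    if mode == "auction":
        return list(offers)
    return [o for o in offers if viewer in (seller, o.offerer)]
```

Note that the private filter is an indexer policy, not a cryptographic one: the underlying memos remain public on chain.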

Settlement. The buyer’s settle transaction does, in a single signature:

On detection of the buy memo, the seller’s daemon (brainctl marketplace api listen) mints a fresh compressed token to the buyer, SealedBox-encrypts the bundle’s AES key to the buyer’s X25519 pubkey, uploads the envelope to Arweave, and posts a release memo brainctl-marketplace/v1:release:<listing>:<envelope>:<minted_cnft>. The buyer polls for the release memo, decrypts the envelope locally to recover the AES key, fetches the encrypted bundle from the listing’s Arweave URI, decrypts, and optionally ingests into its own brain.db under scope=imported:<listing_id> — a quarantine scope that does not blend into the agent’s primary memory until explicit promotion.
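The release memo as specified names the listing but not the buyer, and one bundle can be sold to many buyers, so a polling buyer may see several releases for the same listing. Since each envelope is SealedBox-encrypted to one buyer's X25519 key, only that buyer's envelope will open. A minimal sketch of the filtering step follows; the (signer, slot, memo) tuple shape and the after-buy-slot rule are assumptions, and decryption is omitted because it needs a third-party library.

```python
RELEASE_HEAD = "brainctl-marketplace/v1:release:"


def candidate_releases(memos, listing_id: str, seller: str, after_slot: int):
    """memos: iterable of (signer, slot, memo_text), e.g. from an indexer.
    Returns (envelope, minted_cnft) pairs worth trying to decrypt."""
    out = []
    for signer, slot, text in memos:
        # Only releases signed by the seller, after our buy memo, count.
        if signer != seller or slot <= after_slot or not text.startswith(RELEASE_HEAD):
            continue
        parts = text[len(RELEASE_HEAD):].split(":")
        if len(parts) == 3 and parts[0] == listing_id:
            out.append((parts[1], parts[2]))
    return out


def quarantine_scope(listing_id: str) -> str:
    # Imported memories stay in this scope until explicitly promoted.
    return f"imported:{listing_id}"
```

Checking the signer matters: anyone can post a memo with the release prefix, but only the seller's wallet signature ties it to the listing.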

Protocol fee schedule. Two layers of fee, both atomic with the operation they accompany so the buyer / seller never lands in a partially-settled state. The flat per-op fees (calibrated to SOL ≈ $200, overridable per-deployment via env vars) are:
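The two fee layers are simple arithmetic once expressed in integer lamports: the 2.5% settlement fee as 250 basis points, and each flat per-op fee converted from USD at the calibration price. The sketch below does not reproduce the actual per-op fee amounts; the $0.01 figure in the check is illustrative, BRAINCTL_SOL_USD is a hypothetical override name, and rounding the settlement fee down is an assumed policy.

```python
import os
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

LAMPORTS_PER_SOL = 1_000_000_000
PROTOCOL_FEE_BPS = 250  # 2.5% settlement fee in basis points


def settlement_fee_lamports(price_lamports: int) -> int:
    """Protocol fee on a settlement; integer math, rounded down."""
    return price_lamports * PROTOCOL_FEE_BPS // 10_000


def flat_fee_lamports(fee_usd: str, sol_usd: Optional[str] = None) -> int:
    """Convert a flat USD fee to lamports at the calibration price (SOL ~ $200).
    The env var consulted here is a hypothetical name, not a documented one."""
    price = Decimal(sol_usd if sol_usd is not None
                    else os.environ.get("BRAINCTL_SOL_USD", "200"))
    lamports = Decimal(fee_usd) / price * LAMPORTS_PER_SOL
    return int(lamports.to_integral_value(rounding=ROUND_HALF_UP))
```

Using Decimal for the USD conversion avoids binary floating-point drift on small fees; at $200/SOL, a $0.01 fee is 50,000 lamports.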

The treasury wallet that receives these fees is deliberately separate from the dev wallet that holds the community-token allocation and the quarterly-burn pool. The split preserves the dev wallet’s anti-sniping hold (the address is held privately ahead of token launch, so snipers cannot watch it for the createToken transaction and front-run launch participants) while still letting the marketplace fee infrastructure run on a fully public, queryable address from day one.

Trust model. The remaining trust requirement is on the seller actually releasing the bundle key after payment lands. The shipped enforcement is the chain record itself: every release (or non-release) is signed by the seller’s wallet and visible to every future buyer, so a seller who collects payment without releasing keys ends up with a wallet history that disqualifies them from future sales. Stake-to-list, dispute window, and slashing are referenced in the constants (LISTING_STAKE_USD = 1.0 in agentmemory.marketplace) but are not yet enforced on chain. The next major iteration replaces the trusted-seller release entirely with Lit Protocol threshold encryption conditional on payment-confirmation finality, which removes the trust assumption. That work is roadmap, not shipped.
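Since the chain record is the enforcement, the core query is "how many paid purchases has this seller failed to release?" Because release memos name the listing but not the buyer, a sketch can only compare counts per listing. The function name and input shapes below are hypothetical, and a real check would also allow a grace window for buys that landed moments ago.

```python
from collections import Counter

RELEASE_HEAD = "brainctl-marketplace/v1:release:"


def release_shortfall(buy_listings, release_memos, seller: str) -> dict:
    """buy_listings: one listing id per buy memo against this seller's listings.
    release_memos: (signer, memo_text) pairs.
    Returns {listing_id: buys_without_a_matching_release}."""
    releases = Counter()
    for signer, text in release_memos:
        # Only releases signed by the seller's own wallet count.
        if signer == seller and text.startswith(RELEASE_HEAD):
            releases[text[len(RELEASE_HEAD):].split(":")[0]] += 1
    buys = Counter(buy_listings)
    return {lid: n - releases[lid] for lid, n in buys.items() if n > releases[lid]}
```

A non-empty result is exactly the kind of wallet history the trust model expects future buyers to read.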

direction

brainctl does not maintain a fixed quarterly roadmap. The project is issue-driven: priorities come from the public GitHub issue tracker, from operator feedback, and from whichever research threads are returning interesting signal. A frozen eighteen-month plan would be fiction, and the alternative — publishing one anyway — is exactly the kind of thing a serious open-source project should refuse to do.

What the authors are currently interested in, without committing to timelines, includes:

Anything here may change. If a direction matters to you, the fastest way to affect priorities is to open an issue or send a pull request — every v2.6.x release this month closed a community-filed issue (#108 idle timeout, #113 Windows SIGHUP, #114 tool allowlist) and v2.7.0 itself shipped an external contributor's PR (#94 procedural memory, by Velamj).

open questions and research frontier

The second half of this section is an honest inventory of hard problems that are not solved in brainctl today. The list matters because a whitepaper that only describes what works gives the reader a dishonest model of what the project actually is. These are the places where the cognitive science or the ML literature suggests a better answer than what brainctl currently implements, and where work we would be happy to see (or do ourselves) is wide open.

The distinction between this section and the roadmap above: §11 lists work we intend to ship, with engineering paths visible. §11.1 lists research questions we don't yet know how to answer well. Some items in §11 will collapse into this list when the engineering reveals an open design question; some items here will graduate to §11 when the path becomes clear.

Learned schemas. The nine memory categories in §2.2 and the schema-integration story in §4.4 use hand-designed schemas. Humans acquire new schemas throughout life; a schema-acquiring agent would be able to grow its own category system in response to new domains. Concretely: given a corpus of observation memories the agent keeps writing, can brainctl identify that a new category has emerged and propose its addition? This is a clustering-plus-validation problem with the extra constraint that the new category should be stable across sessions.

Procedural fitness vs novelty. §3.7 ships a fitness score that updates from execution outcomes — high-fitness procedures naturally surface earlier in procedure_search. That's exploitation. The counterpart problem is when to stop reaching for the well-trodden procedure and try something new — exploration. Retrieval has Thompson Sampling for this; procedural memory currently does not. A naïve copy of the same explore/exploit mechanism into procedural retrieval is the obvious starting point, but procedures execute side-effects (real tool calls, durable changes), so the exploration cost is higher than for read-only retrieval. Calibrating the explore/exploit tradeoff to that asymmetry is an open design question.
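One way to make the side-effect asymmetry concrete is a cost-gated variant of Thompson Sampling: draw from each procedure's Beta posterior as usual, but only switch away from the best-known procedure when the challenger's sampled advantage exceeds a per-procedure side-effect cost. This is a hypothetical sketch, not what brainctl ships; the Procedure fields and the cost scale (0 for read-only, 1 for irreversible) are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Procedure:
    name: str
    successes: int
    failures: int
    side_effect_cost: float  # hypothetical: 0.0 read-only .. 1.0 irreversible


def posterior_mean(p: Procedure) -> float:
    # Mean of Beta(successes + 1, failures + 1).
    return (p.successes + 1) / (p.successes + p.failures + 2)


def choose_procedure(procs, rng: random.Random) -> Procedure:
    incumbent = max(procs, key=posterior_mean)  # pure exploitation
    draws = {p.name: rng.betavariate(p.successes + 1, p.failures + 1) for p in procs}
    challenger = max(procs, key=lambda p: draws[p.name])  # Thompson pick
    # Explore only if the sampled edge over the incumbent beats the cost margin.
    edge = draws[challenger.name] - posterior_mean(incumbent)
    if challenger is not incumbent and edge > challenger.side_effect_cost:
        return challenger
    return incumbent
```

With a cost of 1.0 the gate can never open (a Beta draw never exceeds 1), so irreversible procedures are never explored; lower costs let exploration creep back in.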

Learned procedural acquisition. procedure_backfill already auto-creates procedures from candidate memories using a hand-designed regex heuristic (looks_procedural: how-to phrasing, if-then conditionals, rollback language, step markers). That covers acquisition from already-written memories. What is genuinely open is learned acquisition directly from the event stream — clustering successful trajectories at the tool-sequence level and promoting stable clusters into procedures without a hand-designed classifier. The clustering is easy; the validation — "is this cluster actually capturing the same task, or is it three superficially-similar but distinct workflows?" — is the hard part, and it's an open ML question.
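For contrast with learned acquisition, a hand-designed classifier of the kind described (how-to phrasing, if-then conditionals, rollback language, step markers) looks roughly like this. It is a hypothetical reconstruction, not brainctl's looks_procedural: the patterns and the two-signal threshold are assumptions chosen for illustration.

```python
import re

PATTERNS = {
    "how_to": re.compile(r"\b(how to|steps? to)\b", re.I),
    "conditional": re.compile(r"\bif\b.+\bthen\b", re.I),
    "rollback": re.compile(r"\b(roll ?back|revert|undo)\b", re.I),
    "step_markers": re.compile(r"(^|\n)\s*(\d+[.)]|step \d+|first,|finally,)", re.I),
}


def procedural_signals(text: str) -> set:
    """Names of the heuristic signals present in a memory's text."""
    return {name for name, pat in PATTERNS.items() if pat.search(text)}


def looks_procedural_sketch(text: str, min_signals: int = 2) -> bool:
    return len(procedural_signals(text)) >= min_signals
```

The brittleness is the point of the open question: every pattern here encodes one author's idea of what a procedure sounds like, which is what clustering tool-sequence trajectories would try to replace.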

Intent-matched prospective memory. §3.6 is honest that prospective memory triggers match on keywords, not intent. A trigger with precondition "billing schema" fires for every query mentioning those words, including unrelated contexts. An intent-matched variant would use the embedding of the precondition and a threshold against the query embedding, or a small trained classifier, to decide whether the trigger is actually relevant to the agent's current task. The data structures exist; the matching logic is the gap.
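The gap between keyword and intent matching is easy to show side by side. The sketch below uses toy three-dimensional vectors in place of real embeddings, and the 0.75 threshold is a hypothetical value; in practice both vectors would come from the same embedding model brainctl already uses for vector search.

```python
import math
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def keyword_fires(precondition: str, query: str) -> bool:
    """Keyword behaviour: fire whenever every precondition word appears."""
    return tokens(precondition) <= tokens(query)


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intent_fires(precondition_vec, query_vec, threshold: float = 0.75) -> bool:
    """Intent behaviour: fire only when the query is semantically close."""
    return cosine(precondition_vec, query_vec) >= threshold
```

A marketing query that happens to mention "billing schema" passes keyword_fires, while an embedding pointing in a different direction keeps intent_fires quiet.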

Spreading activation: equilibrium dynamics + inhibition. The current spreading_activation implementation already does more than bounded BFS — it propagates with per-edge-type weights (semantic_similar: 1.0, causes: 0.9, causal_chain_member: 0.8, etc.) and exponential decay per hop (decay ** (hop + 1)), citing Collins & Loftus 1975 directly. What is not there: iterative relaxation until equilibrium (the implementation does a fixed two-hop sweep), sibling inhibition (Collins & Loftus had inhibitory edges between siblings of an activated parent), and path accumulation (the current code takes the max contribution per target, not the sum across paths). Each of those adjustments changes how multi-hop reasoning surfaces related concepts; whether the fidelity gain justifies the complexity cost is the open empirical question.
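The max-versus-sum question can be seen on a four-node graph. The sketch below uses the per-edge-type weights and the decay ** (hop + 1) factor named above, under one assumed reading: each hop multiplies the parent's activation by the edge weight and that factor. DEFAULT_WEIGHT is hypothetical, and neither inhibition nor iteration to equilibrium is modelled.

```python
EDGE_WEIGHTS = {"semantic_similar": 1.0, "causes": 0.9, "causal_chain_member": 0.8}
DEFAULT_WEIGHT = 0.5  # hypothetical weight for edge types not listed above


def spread(graph, seeds, decay=0.5, hops=2, accumulate="max"):
    """graph: node -> list of (target, edge_type); seeds: node -> activation.
    Fixed-hop sweep; returns activation of every non-seed node reached."""
    if accumulate not in ("max", "sum"):
        raise ValueError(accumulate)
    combine = max if accumulate == "max" else (lambda a, b: a + b)
    activation, frontier = {}, dict(seeds)
    for hop in range(hops):
        reached = {}
        for node, act in frontier.items():
            for target, edge_type in graph.get(node, []):
                w = EDGE_WEIGHTS.get(edge_type, DEFAULT_WEIGHT)
                contrib = act * w * decay ** (hop + 1)
                reached[target] = combine(reached.get(target, 0.0), contrib)
        for target, value in reached.items():
            if target not in seeds:
                activation[target] = combine(activation.get(target, 0.0), value)
        frontier = reached
    return activation


graph = {
    "a": [("b", "semantic_similar"), ("c", "causes")],
    "b": [("d", "causes")],
    "c": [("d", "causal_chain_member")],
}
```

Node d is reachable by two paths (0.1125 via b and 0.09 via c, with decay 0.5): max keeps 0.1125, while sum gives 0.2025, nearly doubling its rank against single-path neighbours.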

Neural replay at short timescales. §4.1 is honest that brainctl's replay has nothing to do with theta rhythms or sequence preservation at biologically meaningful timescales. There is a research question whether any of that matters for agents: mammalian sharp-wave ripples compress 1–10 seconds of experience into ~100 ms, and the compression ratio itself may be a load-bearing part of why replay works. Whether an agent memory system benefits from time-compressed replay of sequences, or whether the current unordered top-K replay captures everything that actually matters, is not known.

Closed-loop W(m) calibration. The memory_outcome_calibration table is written to (every retrieval outcome flows in via outcome_eval), but nothing currently reads that table to adjust the W(m) coefficients (α, β, γ from §3.2). The loop is half-closed: outcome data is captured but coefficient updates are still manual. A worthiness gate that tuned its own coefficients from accumulated calibration data — same Bayesian update pattern as §3.3, applied one level up — would close the loop. The data and the schema exist; the adjustment policy is the gap.

Whole-pipeline retrieval calibration. The RRF k parameter has been tuned empirically for brainctl: the shipped value is k=30, deliberately moved off Cormack 2009's canonical k=60 after observing that brainctl's corpus (tens to low hundreds of memories per scope, not the millions in traditional IR benchmarks) shifted the optimum. What remains open is whether every other retrieval-time constant (the salience-routing weights in §5.5, the per-edge-type weights in spreading_activation, the Q-value temperature for Thompson exploration, the temporal-recency half-life) has been similarly calibrated. Most have not been ablated. A full sensitivity analysis over a common agent-trajectory benchmark, run alongside the competitor benchmarks in §11, would close that loop too.
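Reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, so k controls how sharply the top ranks dominate. A minimal sketch follows; the alphabetical tie-break and the memory ids are assumptions for illustration.

```python
def rrf(rankings, k=30):
    """Fuse ranked lists of ids; returns (fused_order, scores)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id (an assumed, arbitrary rule).
    order = sorted(scores, key=lambda d: (-scores[d], d))
    return order, scores


fts_hits = ["m1", "m2", "m3"]     # hypothetical full-text ranking
vector_hits = ["m2", "m3", "m1"]  # hypothetical vector ranking
order, scores = rrf([fts_hits, vector_hits], k=30)
print(order)  # ['m2', 'm1', 'm3']
```

Smaller k widens the gap between ranks: at k=30 a rank-1 hit outweighs a rank-10 hit by 40/31 ≈ 1.29x, at k=60 by only 70/61 ≈ 1.15x, which is one plausible reason a small per-scope corpus favours the lower value.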

Cryptographic agent attribution. §6.4 notes that agent IDs are database-level identities, not signed identities, and that a compromised process with filesystem access can write under any agent ID. Cryptographic per-write signatures would harden multi-operator deployments but require a key distribution story that brainctl currently does not have. This becomes urgent once federation ships, and the answer probably ties into the same wallet-identity model the marketplace already uses for authentication.

Cross-brain conflict semantics. Federation is in the roadmap, but the hard parts are not transport — signed append-only logs are well-understood — they are the semantics of cross-brain conflict resolution. When two brains hold contradictory AGM beliefs and attempt to merge, whose provenance chain wins? What happens to the importance scores of memories pulled from a remote brain — do they inherit, decay, or normalize? Is the receiving brain's W(m) gate the right admission authority, or does the remote write's home gate matter? Open design questions all the way down.

Marketplace reputation that resists gaming. §10.5's current trust enforcement is just the chain record itself: every release (or non-release) is signed and visible. That's sufficient if the community can read provenance fluently, but it scales poorly — agents shopping for memories need a single signal, not a forensic exercise. A reputation index is the natural next layer, but every naïve formulation (release rate, sale velocity, age-of-wallet) is gameable. Sybil-resistant reputation that derives meaningful signal from on-chain history without being trivially manipulable is an open marketplace-economics question, and the answer likely cross-pollinates with the broader on-chain identity literature.

Trustless key release at threshold. Lit Protocol threshold encryption is in the roadmap (§11) as the mechanism that removes the trusted-seller assumption from marketplace settlement. The design space is wider than "just use Lit": what threshold size resists seller-validator collusion, how does the payment-confirmation oracle integrate with Solana finality (single-slot, sub-slot, or n-block-back), what happens when the threshold network has stale state vs the chain, and how is key rotation handled for long-lived listings. These are cryptographic-systems-design open questions, not just integration work.

Hosted-MCP per-call billing economics. The x402 middleware in §11 is shaped like a Stripe-without-Stripe primitive: agent calls a tool, server returns 402, agent's wallet pays, tool runs. The mechanism is well-defined; the pricing is not. Should hosted brainctl-mcp-http charge per-tool-call (every retrieval is the same price), per-result (only successful retrievals charged), per-memory-volume (writes scale with bundle size), per-token (input + output of the LLM the tool was feeding)? Each has different incentive structures and different spam-resistance properties. Empirical answers require running the experiment.
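The four candidate policies differ only in what they meter, which a few lines make explicit. Everything below is hypothetical: the Call fields, the tool names, and the lamport prices are illustrative values, and the 402 handshake itself is not shown.

```python
from dataclasses import dataclass

# Hypothetical prices, in lamports.
PRICE_PER_CALL = 5_000
PRICE_PER_KIB = 1_000
PRICE_PER_1K_TOKENS = 2_000


@dataclass
class Call:
    tool: str
    succeeded: bool
    bytes_written: int
    tokens: int  # LLM input + output tokens the tool result fed


def charge(call: Call, policy: str) -> int:
    if policy == "per_call":
        return PRICE_PER_CALL
    if policy == "per_result":
        return PRICE_PER_CALL if call.succeeded else 0
    if policy == "per_volume":
        return -(-call.bytes_written // 1024) * PRICE_PER_KIB  # ceil to KiB
    if policy == "per_token":
        return -(-call.tokens // 1000) * PRICE_PER_1K_TOKENS  # ceil to 1k
    raise ValueError(f"unknown policy: {policy!r}")
```

The spam-resistance difference shows up immediately: under per_result a stream of failing calls costs the caller nothing, while per_call charges for every attempt.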

MCP surface vs cognitive load. brainctl exposes 209 tools and is still growing. Anthropic, OpenAI, and Google all publish recommended ceilings (typically 50–100 tools per server) below which agent tool-selection accuracy holds, but the empirical basis is thin. v2.6.4's BRAINCTL_ALLOWED_TOOLS is a coping mechanism — it lets tool-capped clients work — but the underlying question is whether the full surface degrades any agent's reasoning, even on uncapped clients. Capability groups (§11) help if the answer is "yes, curate"; they're wasted scope if the answer is "no, just sort intelligently". A clean ablation would settle it.
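An allowlist of this kind amounts to filtering the advertised tool list before the server registers it. The sketch assumes BRAINCTL_ALLOWED_TOOLS holds comma-separated exact tool names and that an unset or empty value means "expose everything"; the real parsing may support other forms, and the tool names in the check are hypothetical.

```python
import os


def allowed_tools(all_tools, env=None):
    """Return the subset of tools to expose, preserving server order."""
    env = os.environ if env is None else env
    raw = env.get("BRAINCTL_ALLOWED_TOOLS", "").strip()
    if not raw:
        return list(all_tools)  # no allowlist: full surface
    allow = {name.strip() for name in raw.split(",") if name.strip()}
    return [t for t in all_tools if t in allow]
```

Preserving the server's order rather than the allowlist's keeps the exposed surface stable when operators reorder the variable.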

Formal verification of the W(m) gate. The gate's correctness properties — monotonicity under trust level, idempotency under rewrite, convergence under repeated redundant writes — are stated informally in §3.2 but not proven. A formal model of the gate in a proof assistant would be a strong signal of seriousness and a genuinely useful artifact for anyone extending the system.

This list is not exhaustive. It is what we can currently articulate as open problems from inside the project; the most interesting research questions are usually the ones that are not yet legible as questions. We expect this list to grow, not shrink, as the project matures — a healthy research program is one that accumulates hard problems faster than it solves them.

references
