CLS++ — Continuous Memory System
Comprehensive Design + Operational Readiness

| Field | Value |
|---|---|
| Author(s) | Rajamohan Jabbala |
| Reviewers | - |
| Status | Approved |
| Created | 2026-02-19 |
| Last Updated | 2026-02-19 |
| Repository | github.com/rajamohan1950/CLS-PlusPlus |
| Patent Filing | Provisional Patent, October 2025 (35 U.S.C. § 111(b)) |

Table of Contents
- Context and Scope
- Requirements
- System Design
- Capacity Planning and Sizing
- Resilience and Fault Tolerance
- Observability and Operations
- Testing Strategy
- DORA Metrics and Engineering Excellence
- Alternatives Considered
- Cross-Cutting Concerns
- Implementation Plan
- Open Questions and Future Work
- Known Issues, Design Gaps, and QA Findings
- Appendix
1. Context and Scope

Problem Statement

Every large language model in production today operates with amnesia. A session ends, the context window clears, and the model forgets everything: your name, your preferences, the correction you made three conversations ago, the fact you told it six months ago. The only workaround available today is RAG (Retrieval-Augmented Generation): embed the past, search the past, inject it back into the prompt.

But RAG is not memory. It is lookup. It has no notion of what matters more than something else, no mechanism to strengthen a fact through repeated confirmation, no way to update a stale belief without destroying the old one, and no ability to forget gracefully when information is no longer relevant.

The result is a class of AI products that are powerful within a session and useless across sessions. Users repeat themselves every conversation. Agents contradict themselves. Systems hallucinate facts that a real memory would have corrected. The gap between session-level intelligence and lifelong intelligence is the core unsolved problem in applied AI today.

Existing partial solutions each address one dimension without solving the whole:

- MemGPT retrieves semantically similar content but has no permanence, no decay, no belief revision.
- Recurrent Memory Transformer (RMT) achieves temporal continuity during a session but loses all state at session end.
- Compressive Transformer (DeepMind) compresses old activations but is unaware of which memories matter: it remembers vaguely, not selectively.
- Infini-Attention (Google) handles infinite context in a single stream but has no hierarchical structure, no sleep cycle, no stable long-term store.
- RETRO (DeepMind) retrieves from a global text corpus but has no personal or adaptive memory: it knows everything and remembers nothing.

None of these systems can answer a basic question that any human relationship requires: "What did we decide the last time we talked, and has anything changed since?"
Solution

CLS++ (Continuous Learning System++) is a brain-inspired memory architecture that gives AI agents the ability to remember, adapt, and reason across time without losing contextual continuity between sessions.

CLS++ draws directly from neuroscientific Complementary Learning Systems (CLS) theory (McClelland, McNaughton & O'Reilly, 1995), which describes how the human brain uses two complementary systems (the hippocampus for fast episodic encoding and the neocortex for slow semantic integration) to build lifelong memory without catastrophic forgetting.

CLS++ implements this theory computationally with:

- four stores (Working Buffer, Indexing Store, Schema Graph, Deep Recess);
- a plasticity control loop with five biological signals (Salience, Usage, Authority, Conflict, Surprise);
- engram formation analogous to long-term potentiation;
- a reconsolidation gate for safe belief revision;
- a nightly sleep cycle that performs the same consolidation, decay, and synthesis functions as biological REM and NREM sleep.

The result is not a smarter context window. It is a living memory substrate: one that strengthens what matters, forgets what doesn't, corrects itself when shown evidence, and becomes more reliable the longer it runs.

Design Philosophy

- Biology is the specification: Every component maps to a brain region. This is not analogy; it is the design constraint. When in doubt, ask what the hippocampus would do.
- Memory is a process, not storage: CLS++ does not store facts. It manages the lifecycle of beliefs: from fleeting attention through episodic capture, semantic integration, stable engram formation, and eventual reconsolidation or decay.
- Forgetting is a feature: Exponential decay is not data loss. It is cognitive hygiene. A memory that isn't reinforced should fade. A belief that accumulates conflicting evidence should be revisited. Systems that never forget become unreliable faster than systems that forget well.
- Provenance is non-negotiable: Every fact in every store carries a chain of evidence: source, timestamp, confidence, version, checksum. No fact exists without attribution. No belief updates without a quorum of evidence.
- Single-tenant by default, multi-tenant by architecture: CLS++ ships with per-namespace isolation and consent tags on every memory write. Multi-tenancy is additive, not a retrofit.

Non-Goals

- General-purpose vector database: CLS++ is a memory system with biological semantics; raw similarity search belongs in Pinecone or pgvector.
- Foundation model training: CLS++ augments LLM inference; it does not replace pretraining.
- Real-time event streaming: Memory writes are synchronous; sub-millisecond event ingestion at Kafka scale is out of scope.
- Emotion recognition or affective computing: The value channel (reward modulation) is outcome-based, not sentiment-based.
- Replacing the context window: CLS++ feeds context into the context window; it does not eliminate the transformer's attention mechanism.
2. Requirements

2.1 Functional Requirements

| ID | Requirement | Priority | Acceptance Criteria |
|---|---|---|---|
| FR-001 | Four-store architecture: Working Buffer (L0), Indexing Store (L1), Schema Graph (L2), Deep Recess (L3) operate as independent stores with defined promotion pathways between them | P0 | All four stores accept reads and writes independently; promotion pipeline moves items L0→L1→L2→L3 based on plasticity score; unit tests confirm promotion at correct thresholds |
| FR-002 | Working Buffer: Token ring buffer with configurable capacity (default 4,096 tokens), TTL-based eviction, and crash-safe snapshot to Indexing Store | P0 | Insert/evict O(1); no memory leak over 1-hour soak; snapshot triggers on TTL expiry; evicted tokens appear in Indexing Store within 500ms |
| FR-003 | Indexing Store: Episodic embedding store with metadata (timestamp, salience, source, consent); supports top-k cosine retrieval | P0 | P95 kNN < 50ms at 100,000 items; embedding deterministic (cosine self-similarity > 0.999); metadata fully preserved on round-trip |
| FR-004 | Schema Graph: Semantic concept graph with typed edges (subject→predicate→object, weight); supports multi-hop traversal up to configurable depth | P0 | Insert 2,000 nodes and 10,000 edges; query P95 < 80ms; hop traversal budget-bounded (no unbounded walks); edge weights decay per sleep cycle |
| FR-005 | Deep Recess: Append-only engram archive with SHA-256 checksumming, Parquet columnar storage, Vector Quantization compression, and Bloom filter existence checks | P0 | Write P95 < 30ms per record; checksum verified on every read; compression ≥ 10:1 vs raw storage; Bloom filter false-positive rate < 0.1% |
| FR-006 | Plasticity Engine: Compute promotion score from five signals (S, U, A, C, Δ) using the formula Score = α·S + β·log(1+U) + γ·A − λ·C + δ·Δ; apply configurable thresholds | P0 | Score computation deterministic on fixed seed; promotion correctly triggered when Score > 1.5; unit parity with mathematical specification to 1e-6 tolerance |
| FR-007 | Salience signal (S): Classify each incoming event for contextual importance (0–1) using lightweight LLM classifier or heuristic scorer | P0 | Salience assigned within 100ms of event write; coherent ordering confirmed on labeled test set (top-quartile items reliably more salient than bottom) |
| FR-008 | Usage signal (U): Increment usage counter on every retrieval; counter persists across process restarts via DB-backed store | P0 | Counter increments atomically; no double-increment under concurrent reads; survives container restart |
| FR-009 | Conflict signal (C): Detect contradiction between new fact and existing memory using cosine distance + semantic overlap; output conflict score (0–1) | P0 | Conflict correctly scored ≥ 0.7 on labeled contradiction pairs; correctly scored < 0.2 on consistent pairs; P95 scoring < 200ms |
| FR-010 | Surprise signal (Δ): Measure novelty as deviation from Schema Graph prediction; high Δ accelerates reinforcement analogous to noradrenaline effect | P1 | Surprise inversely correlated with schema coverage in labeled test; high-Δ items show faster salience convergence |
| FR-011 | Decay function: Apply exponential decay S_t = S_0 · e^(−k·t) to salience of unused items; k is per-item configurable; items with S_t < 0.2 are archived or pruned | P0 | Decay mathematically correct to 1e-4 tolerance; pruning only triggers at correct threshold; no premature pruning of reinforced items |
| FR-012 | Engram formation: Promote a memory to Deep Recess when usage_days ≥ 5 AND confidence ≥ 0.85 AND conflict < 0.2; checksum the engram before engraving | P0 | Engram formation correctly triggered in simulation (Day 5 test); checksum computed as SHA-256(content ∥ timestamp ∥ source); engram never written without passing all three gates |
| FR-013 | Reconsolidation Gate: When new fact conflicts with engram (conflict > 0.3), compute evidence quorum Q = Σ(w_i · confidence_i); archive old and engrave new only when Q ≥ 0.8 | P0 | Wrong-update rate < 1% on synthetic conflict corpus; old version archived with version number intact; lineage table preserves temporal history |
| FR-014 | Sleep cycle: Nightly background job: Phase 1 (strengthen), Phase 2 (decay/prune), Phase 3 (merge duplicates), Phase 4 (compact to Deep Recess); emit morning report | P0 | Retention > 90% on gold set; noise reduction > 60%; compression ≥ 10:1; completes within 60-minute budget; idempotent (safe to re-run on crash) |
| FR-015 | Dream replay: During sleep, replay high-salience episodes by re-embedding; update schema graph links analogous to REM sleep synthesis | P1 | Dream replay increases schema edge weights for replayed items; consistency check score > 0.85 post-replay on labeled probe set |
| FR-016 | Memory read API: POST /v1/memory/read: semantic query against all four stores with schema expansion, conflict labeling, and provenance in response | P0 | P95 end-to-end read < 120ms at 100,000 items; provenance array (store, id, version) always present; contested facts labeled as current / historical / contested |
| FR-017 | Memory write API: POST /v1/memory/write: accept text + metadata, embed, insert to Indexing Store, compute plasticity, optionally promote to Schema Graph | P0 | Write P95 < 150ms; consent tag required (400 if absent); promotion flag returned in response |
| FR-018 | Sleep trigger API: POST /v1/memory/sleep: admin-only trigger for nightly maintenance cycle with configurable budget (max items, max seconds) | P0 | Returns structured morning report; 409 if already running; bounded by budget parameters |
| FR-019 | Conflict adjudication API: POST /v1/memory/adjudicate_conflict: submit new fact + evidence list; gate returns merge / replaced / rejected with lineage | P0 | Decision correct on labeled conflict corpus; 422 when quorum unmet; full lineage preserved on replace |
| FR-020 | Item lineage API: GET /v1/memory/item/{id}: return all versions of a memory item with checksums and timestamps | P0 | Returns complete version history; checksums match stored values; empty array (not 404) when no history |
| FR-021 | Health API: GET /v1/memory/health: composite health score + per-store metrics (LHRA, DHR, drift, compression, latency) | P1 | Health score computed from weighted store metrics; refreshes within 60 seconds; all target metrics present in response |
| FR-022 | Namespace isolation: All memory items tagged with namespace (e.g., learning_style, policy, health); reads and writes are scoped; reconsolidation never crosses namespace boundaries | P0 | Cross-namespace queries return empty (no bleed); reconsolidation gate enforces namespace boundary; consent enforcement respects namespace ACLs |
| FR-023 | Right-to-Forget: Tombstone-based deletion propagates through all four stores + caches + backups within 24 hours; verifiable deletion receipt issued | P1 | RTBF SLA: 95% completed < 24h; deletion receipt is SHA-256 hash of deleted record IDs; Bloom filter updated within 1 compaction cycle |
| FR-024 | Consent enforcement: Every memory write includes consent tag (explicit / implicit / none); reads with require_consent=true return 403 for none-tagged items; sleep cycle respects consent during deduplication | P0 | Zero consent violations in integration test suite; none-tagged items never returned on consent-required reads; sleep cycle merges only across compatible consent levels |
| FR-025 | Multi-tenancy: Row-level isolation by tenant_id on all store tables; namespace ACLs per tenant; tenant-specific plasticity coefficient tuning | P1 | Zero cross-tenant data leakage in isolation test suite; tenant-scoped plasticity coefficients persist across restarts; routing layer directs requests to correct shard |

2.2 Non-Functional Requirements

Performance

| ID | Requirement | Target | Measurement |
|---|---|---|---|
| NFR-P01 | Indexing Store kNN retrieval | P95 < 50ms at 100K items; P95 < 120ms at 1M items | Ring buffer with per-query RTT |
| NFR-P02 | Schema Graph traversal | P95 < 80ms (2 hops, budget=5,000 nodes) | Per-query RTT |
| NFR-P03 | Deep Recess engram write | P95 < 30ms per record | Per-write RTT |
| NFR-P04 | End-to-end read (all stores) | P95 < 120ms at 100K items | API middleware RTT |
| NFR-P05 | Plasticity score computation | < 1ms per item (stateless scorer) | Unit benchmark |
| NFR-P06 | Sleep cycle | Completes within 60 min budget at 20,000 items | Job execution timer |
| NFR-P07 | Conflict scoring | P95 < 200ms | Per-conflict RTT |

Availability and Reliability

| ID | Requirement | Target | Measurement |
|---|---|---|---|
| NFR-A01 | Service uptime | > 99% (Docker healthcheck every 10s, 5 retries) | Healthcheck pass rate |
| NFR-A02 | Deep Recess RPO | 2 minutes (backup daemon at same cadence as host) | Backup log freshness |
| NFR-A03 | RTO (full restore) | < 10 minutes | restore.sh execution time |
| NFR-A04 | Sleep cycle idempotency | Safe to re-run after crash; checkpointed every 1,000 items | Checksum re-verification on restart |
| NFR-A05 | Engram integrity | Zero undetected mutations (SHA-256 verified on every read) | Checksum mismatch rate = 0 |

Security

| ID | Requirement | Target | Measurement |
|---|---|---|---|
| NFR-SEC01 | Authentication | OAuth2 Bearer (m2m or user), mTLS optional for service-to-service | Token freshness, scope audit |
| NFR-SEC02 | Encryption at rest | AES-GCM per tenant; KMS-rotated keys | Key rotation audit log |
| NFR-SEC03 | Encryption in transit | TLS 1.3 minimum on all API calls | Certificate audit |
| NFR-SEC04 | PII in logs | Zero PII in structured logs; audit-only access to raw content | Log scan on each deploy |

Maintainability and Operability

| ID | Requirement | Target | Measurement |
|---|---|---|---|
| NFR-M01 | Deployment | docker compose up -d --build < 5 min | Build time measurement |
| NFR-M02 | Observability | All four stores + sleep cycle + plasticity visible in dashboard | Dashboard coverage |
| NFR-M03 | Test coverage | Unit, integration, evaluation harness; all gates exit-criteria driven | pytest + eval harness report |
| NFR-M04 | API contract versioning | All schemas versioned; non-breaking upgrades via schema_version field | Contract test suite |

2.3 Out of Scope (Explicit Exclusions)

- Sub-millisecond event ingestion: CLS++ writes are synchronous; Kafka-speed ingestion requires a streaming layer outside this system.
- Replacing vector databases: CLS++ uses a vector store (FAISS/pgvector) internally; it is not a replacement for Pinecone or Weaviate.
- Foundation model pretraining: CLS++ is an inference-time memory system; it does not modify model weights (LoRA adapters are a future extension, Section 12).
- Mobile or on-device deployment: Server-side only; edge/federated deployment is future work (Section 18 of the research paper).
- A/B testing infrastructure: Plasticity coefficient tuning is manual; an automated experimentation framework is out of scope.
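The scoring rules in FR-006, FR-011, and FR-012 can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function and class names are hypothetical, the coefficient defaults are the ones listed in Section 3.3, and the natural logarithm is an assumption because the requirement does not name the log base.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    """Default plasticity coefficients as listed in Section 3.3."""
    alpha: float = 1.0  # salience
    beta: float = 0.3   # usage (logarithmic)
    gamma: float = 0.5  # authority
    lam: float = 0.7    # conflict penalty
    delta: float = 0.4  # surprise


PROMOTION_THRESHOLD = 1.5  # FR-006: promote when Score > 1.5
PRUNE_THRESHOLD = 0.2      # FR-011: archive or prune when S_t < 0.2


def promotion_score(s, u, a, c, surprise, k=Coefficients()):
    """Score = alpha*S + beta*log(1+U) + gamma*A - lam*C + delta*Delta."""
    # Assumption: natural log; the source does not specify the base.
    return (k.alpha * s + k.beta * math.log1p(u) + k.gamma * a
            - k.lam * c + k.delta * surprise)


def decayed_salience(s0, k, days):
    """S_t = S_0 * exp(-k * t), with t in days since last reinforcement."""
    return s0 * math.exp(-k * days)


def engram_ready(usage_days, confidence, conflict):
    """All three FR-012 gates must pass before a memory is engraved."""
    return usage_days >= 5 and confidence >= 0.85 and conflict < 0.2


if __name__ == "__main__":
    # Illustrative inputs: S=0.8, A=0.9, surprise=0.4, no conflict.
    fresh = promotion_score(0.8, 0, 0.9, 0.0, 0.4)
    recalled = promotion_score(0.8, 1, 0.9, 0.0, 0.4)
    print(f"never recalled: {fresh:.3f}, promote={fresh > PROMOTION_THRESHOLD}")
    print(f"recalled once:  {recalled:.3f}, promote={recalled > PROMOTION_THRESHOLD}")
```

With these inputs a never-recalled item scores about 1.41 and stays in the Indexing Store, while a single retrieval lifts it to about 1.62 and over the promotion threshold. That is the intended effect of the logarithmic usage term: early recalls matter most.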
3. System Design

3.1 Architecture Overview

    Client (API / SDK) ──► Docker Compose Network

    Host Machine
    ├─ backup.sh (pg_dump + MinIO sync)
    │    launchd daemon, every 2 min; Deep Recess Parquet → S3
    └─ Docker Compose Network
       ├─ PostgreSQL :5434 (stores L0, L1, L2 + metadata) ──► CLS++ Core Service
       ├─ MinIO :9000/:9001 (Deep Recess Parquet + VQ blobs) ──► CLS++ Core Service
       ├─ CLS++ Core Service :8080
       │  ├─ Working Buffer (L0)
       │  ├─ Plasticity Engine
       │  ├─ Indexing Store (L1)
       │  ├─ Schema Graph (L2)
       │  ├─ Deep Recess (L3)
       │  ├─ Reconsolidation Gate
       │  └─ Sleep Orchestrator (cron)
       ├─ Embedding Service :8081
       ├─ Directory Svc (etcd/consul) :2379
       └─ Dashboard :3000 (Streamlit / Next.js)

Service Inventory:

| Service | Port | Technology | Role | Healthcheck |
|---|---|---|---|---|
| cls-core | 8080 | FastAPI (Rust core + Python wrapper) | All four stores, plasticity engine, reconsolidation gate, sleep orchestrator | curl /v1/memory/health every 10s |
| embedding-svc | 8081 | FastAPI + sentence-transformers (or OpenAI) | Deterministic embedding generation; gRPC or HTTP | curl /health every 10s; self-similarity probe |
| postgres | 5434→5432 | PostgreSQL 16 + pgvector | L0 snapshot, L1 episodic metadata + vectors, L2 schema graph edges, engram lineage | pg_isready every 10s |
| minio | 9000/9001 | MinIO (S3-compatible) | L3 Deep Recess: Parquet files + VQ blob store | Built-in |
| etcd | 2379 | etcd 3.5 | Directory service: shard maps, namespace ACLs, routing table | etcdctl endpoint health every 30s |
| dashboard | 3000 | Streamlit or Next.js | Memory health, LHRA/DHR metrics, sleep reports, promotion charts | HTTP 200 on / |

3.2 Four-Store Architecture

Each store is a distinct persistence and retrieval layer. Data moves forward only through the promotion pipeline, never backward. The stores form a temporal gradient from volatile (L0) to permanent (L3).

    User Input / Agent Event
               │
               ▼
    ┌─────────────────────────┐  VOLATILE
    │ L0: Working Buffer      │  Capacity: 4,096 tokens
    │ Prefrontal Cortex model │  Eviction: TTL (seconds to minutes)
    │ Data: raw tokens        │  Storage: in-process deque + Redis snapshot
    └──────────┬──────────────┘
               │ TTL expiry or explicit flush
               ▼
    ┌─────────────────────────┐  SHORT-TERM EPISODIC
    │ L1: Indexing Store      │  Capacity: weeks of episodes
    │ Hippocampus model       │  Eviction: decay + sleep pruning
    │ Data: embeddings + meta │  Storage: FAISS / pgvector + SQLite/PG metadata
    └──────────┬──────────────┘
               │ Score_promo > 1.5 (plasticity gate)
               ▼
    ┌─────────────────────────┐  LONG-TERM SEMANTIC
    │ L2: Schema Graph        │  Capacity: unbounded (degree-capped)
    │ Neocortex model         │  Eviction: edge weight decay + sparsification
    │ Data: concept nodes,    │  Storage: Neo4j / Memgraph / custom KV-graph
    │       typed edges       │
    └──────────┬──────────────┘
               │ confidence ≥ 0.85 AND usage_days ≥ 5 AND conflict < 0.2
               ▼
    ┌─────────────────────────┐  PERMANENT
    │ L3: Deep Recess         │  Capacity: unlimited (append-only)
    │ Thalamus + BG model     │  Eviction: never (versioning only)
    │ Data: checksummed       │  Storage: MinIO Parquet + VQ compression
    │       engrams           │
    └─────────────────────────┘

3.3 Plasticity Engine

The plasticity engine is a stateless scorer. It computes a promotion score for every memory item using five biologically-grounded signals.
Promotion formula:

    Score_promo = α·S + β·log(1+U) + γ·A − λ·C + δ·Δ

Default coefficients:

    α = 1.0  (salience: how much this event stood out)
    β = 0.3  (usage: logarithmic, frequent recall strengthens)
    γ = 0.5  (authority: how trustworthy the source is)
    λ = 0.7  (conflict: penalizes contradictory facts)
    δ = 0.4  (surprise: novelty accelerates reinforcement)

Thresholds:

    Promotion threshold: Score_promo > 1.5                         → L1 → L2
    Engram threshold:    Score_promo > 2.2 AND confidence ≥ 0.85   → L2 → L3

Decay function:

    S_t = S_0 · e^(−k · t)
    where k = decay constant (tunable per namespace)
          t = days since last reinforcement
    Prune / archive when S_t < 0.2

Reinforcement (usage boost):

    S_{t+1} = S_t + η·(1 − S_t)
    where η = reinforcement speed (default 0.05)
    High-Δ items use η_boost = η · (1 + Δ)

3.4 Reconsolidation Gate

When new information conflicts with an existing engram, the gate manages belief revision safely: it archives the old before engraving the new, and only when a quorum of evidence supports the update.

    NEW FACT ARRIVES
          │
          ▼
    similarity(new, old) > τ_s = 0.7?   ── NO ──► STORE AS NEW MEMORY
          │ YES
          ▼
    conflict(new, old) > τ_c = 0.3?     ── NO ──► MERGE (minor update)
          │ YES
          ▼
    Q = Σ(w_i · confidence_i) ≥ 0.8?    ── NO ──► REJECT UPDATE (keep old)
          │ YES
          ▼
    archive(old)          ← versioned, never deleted
    engrave(new)          ← checksum, new version number
    update lineage table  ← temporal history preserved
    return "replaced"

Gate states:

| State | Description | Transition |
|---|---|---|
| Stable | Engram frozen, no modification allowed | New conflicting evidence → Labile |
| Labile | Engram opened for review, accepting evidence | Quorum met → Re-stabilized; quorum unmet → back to Stable |
| Re-stabilized | New engram engraved, old archived | No further transition |

3.5 Sleep Cycle

The sleep cycle is a scheduled background job that maintains memory hygiene. It runs nightly in four phases, bounded by configurable token budgets. It is fully idempotent: checkpointing every 1,000 items makes it safe to re-run after a crash.
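The checkpoint-and-resume behaviour described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the sleep orchestrator itself: `run_phase` and the dictionary-backed checkpoint are hypothetical stand-ins for a durable store, and the sketch assumes the item list is returned in the same stable order on every run.

```python
from typing import Callable, Dict, List

CHECKPOINT_EVERY = 1_000  # from the text: checkpoint every 1,000 items


def run_phase(item_ids: List[str],
              process: Callable[[str], None],
              checkpoint: Dict[str, int],
              phase: str) -> int:
    """Process item_ids in order, resuming after the last saved offset.

    `checkpoint` maps a phase name to the number of items already completed.
    Here it is a plain dict; a real job would persist it durably.
    Returns the number of items processed by this invocation.
    """
    start = checkpoint.get(phase, 0)
    done = start
    for item_id in item_ids[start:]:
        process(item_id)          # may raise; progress so far stays checkpointed
        done += 1
        if done % CHECKPOINT_EVERY == 0:
            checkpoint[phase] = done
    checkpoint[phase] = done      # final offset once the phase completes
    return done - start


if __name__ == "__main__":
    items = [f"mem_{i}" for i in range(2_500)]
    progress: Dict[str, int] = {}
    handled = run_phase(items, lambda _id: None, progress, "N2")
    print(handled, progress)
```

Items handled after the last checkpoint and before a crash are processed again on the next run, so the per-item operation has to be idempotent as well. The strengthen and decay updates would need to be keyed to the cycle date, for example, rather than applied blindly a second time.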
    02:00 IST  Sleep cycle begins
    │
    ├─ Phase N1 (10 min): RANK
    │    Rank all L1 items by (salience × usage × authority)
    │    Identify top 20% (strengthen) and bottom 40% (decay/prune)
    ├─ Phase N2 (15 min): STRENGTHEN + DECAY
    │    Strengthen high-rank: confidence += η·(1 − confidence)
    │    Decay low-rank: salience *= e^(−k·Δt)
    │    Prune if salience < 0.2 AND usage_days > 30
    ├─ Phase N3 (10 min): DEDUPLICATE
    │    Merge L1 items with cosine similarity > 0.92
    │    embedding_merged = Σ(w_i · emb_i) / Σ(w_i)
    │    Retain highest-confidence item; archive duplicates
    ├─ Phase REM (15 min): CONSOLIDATE + DREAM
    │    Identify engram candidates: confidence ≥ 0.85 AND permanence=False
    │    Engrave to Deep Recess (Parquet + checksum)
    │    Dream replay: re-embed top-20 high-salience episodes
    │    Update Schema Graph edges for replayed items
    │
    └─ 03:00 IST  Morning report emitted
         ├── reinforced: N items
         ├── pruned: N items
         ├── deduped: N items
         ├── engraved: N items (new Deep Recess engrams)
         └── health: composite stability score

Maintenance schedule:

| Phase | Time | Duration | Function |
|---|---|---|---|
| N1: Rank | 02:00 | 10 min | Score all L1 items; sort for strengthen/prune |
| N2: Strengthen + Decay | 02:10 | 15 min | Confidence boost for strong; exponential decay for weak; prune below threshold |
| N3: Deduplicate | 02:25 | 10 min | Cosine merge (threshold 0.92); weighted embedding average |
| REM: Consolidate + Dream | 02:35 | 15 min | Engram formation; dream re-embedding; schema edge update |
| Report | 02:50 | 5 min | Morning report; audit log write; health score update |
| Idle | 02:55 | - | Cycle complete; system resumes normal operation |

3.6 APIs

CLS++ Core API (/v1/)

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /memory/write | POST | Bearer | Accept text + meta, embed, insert to L1, compute plasticity, optionally promote to L2 |
| /memory/read | POST | Bearer | Semantic query across all stores; schema expansion; conflict labeling; provenance in response |
| /memory/sleep | POST | Bearer (admin) | Trigger nightly maintenance with budget constraints |
| /memory/adjudicate_conflict | POST | Bearer | Submit conflicting fact + evidence list; reconsolidation gate returns merge/replaced/rejected |
| /memory/item/{id} | GET | Bearer | Full item with all versions, checksums, and lineage |
| /memory/health | GET | Bearer | Composite health score + per-store metrics |
| /memory/export | GET | Bearer | Full tenant memory export (JSON, RTBF-safe) |
| /memory/forget | DELETE | Bearer | RTBF: tombstone all items for tenant; queue compaction |

Write request schema (JSON):

    {
      "text": "Raj prefers diagram-first explanations.",
      "meta": {
        "subject": "raj",
        "namespace": "learning_style",
        "source": "user",
        "authority": 0.9,
        "consent": "explicit",
        "salience": 0.8,
        "surprise": 0.4,
        "tags": ["preference", "pedagogy"]
      }
    }

Read request schema (JSON):

    {
      "query": "How should I explain backpropagation to Raj?",
      "options": {
        "top_k": 5,
        "hops": 2,
        "namespaces": ["learning_style"],
        "require_consent": true,
        "return_lineage": true
      }
    }

Read response schema (JSON):

    {
      "answers": [
        {
          "text": "Use diagram-first, then short text, then runnable code.",
          "label": "current",
          "confidence": 0.92,
          "provenance": [
            {"store": "engram", "id": "e107", "version": 2},
            {"store": "schema", "id": "edge_raj.prefers->diagrams_first"},
            {"store": "indexing", "id": "mem_9f2c"}
          ]
        }
      ],
      "latency_ms": 87
    }

3.7 Data Model and Storage

PostgreSQL Schema

Extensions: uuid-ossp, pgvector, pg_trgm

    memory_items
    ├── id (UUID PK)
    ├── tenant_id
    ├── namespace
    ├── content (TEXT)
    ├── vector (1536-d)
    ├── salience (FLOAT)
    ├── usage (INT)
    ├── authority (FLOAT)
    ├── conflict (FLOAT)
    ├── confidence
    ├── permanence (BOOL)
    ├── consent (TEXT)
    ├── source (TEXT)
    ├── last_use (FLOAT)
    └── created_at

    schema_graph_nodes
    ├── id (UUID PK)
    ├── tenant_id
    ├── namespace
    ├── concept (TEXT)
    ├── data (JSONB)
    └── updated_at
          │
          ▼
    schema_graph_edges
    ├── src (TEXT, FK nodes)
    ├── rel (TEXT)
    ├── dst (TEXT, FK nodes)
    ├── weight (FLOAT)
    ├── tenant_id
    └── updated_at

    engram_lineage
    ├── id (UUID PK)
    ├── tenant_id
    ├── subject_key
    ├── current_version
    ├── versions (JSONB)  [{version, ts, checksum, parquet_key}]
    └── namespace

    reconsolidation_log
    ├── id (UUID PK)
    ├── old_id (UUID)
    ├── new_id (UUID)
    ├── decision (TEXT)
    ├── quorum_score (FLOAT)
    ├── evidence (JSONB)
    └── decided_at

    consent_tags
    ├── item_id (FK)
    ├── consent (TEXT)
    ├── tenant_id
    └── updated_at

    sleep_cycle_reports
    ├── id (UUID PK)
    ├── started_at
    ├── completed_at
    ├── reinforced (INT)
    ├── pruned (INT)
    ├── deduped (INT)
    ├── engraved (INT)
    └── health_score (FLOAT)

MinIO Parquet Layout (Deep Recess)

    s3://cls-deep-recess/
      {tenant_id}/
        {namespace}/
          {YYYY}/
            {MM}/
              {DD}/
                *.parquet      ← columnar engrams, VQ-compressed vectors
                *.bloom        ← Bloom filter for existence checks
                manifest.json  ← partition summary + min/max stats
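To show how the reconsolidation gate of Section 3.4 relates to the engram_lineage rows above, here is a minimal in-memory sketch. All names (`decide`, `Lineage`, `engram_checksum`) are hypothetical. Two choices are assumptions: the evidence weights are taken as already normalised by the caller, and the NUL-byte separator in the checksum is one interpretation of the "∥" concatenation in FR-012, which does not specify a byte-level encoding.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TAU_S = 0.7   # similarity threshold (Section 3.4)
TAU_C = 0.3   # conflict threshold (Section 3.4)
QUORUM = 0.8  # evidence quorum (Section 3.4)


def engram_checksum(content: str, timestamp: str, source: str) -> str:
    """SHA-256 over content, timestamp, and source (FR-012)."""
    # Assumption: fields are joined with a NUL byte to avoid ambiguity.
    payload = "\x00".join((content, timestamp, source)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Lineage:
    """In-memory stand-in for one engram_lineage row."""
    subject_key: str
    current_version: int = 0
    versions: List[Dict[str, object]] = field(default_factory=list)

    def engrave(self, content: str, timestamp: str, source: str) -> None:
        # Older versions stay in the list: archived, never deleted.
        self.current_version += 1
        self.versions.append({
            "version": self.current_version,
            "ts": timestamp,
            "checksum": engram_checksum(content, timestamp, source),
        })


def decide(similarity: float, conflict: float,
           evidence: List[Tuple[float, float]]) -> str:
    """Gate decision for a new fact; evidence is (weight, confidence) pairs."""
    if similarity <= TAU_S:
        return "new"        # unrelated: store as a new memory
    if conflict <= TAU_C:
        return "merge"      # minor update to the existing engram
    quorum = sum(w * c for w, c in evidence)
    return "replaced" if quorum >= QUORUM else "rejected"


if __name__ == "__main__":
    row = Lineage("raj.explanation_style")
    row.engrave("Raj prefers diagram-first explanations.", "2026-02-19T02:35:00Z", "user")
    verdict = decide(0.9, 0.6, [(0.5, 0.9), (0.5, 0.8)])
    if verdict == "replaced":
        row.engrave("Raj prefers code-first explanations.", "2026-03-01T02:35:00Z", "user")
    print(verdict, row.current_version, len(row.versions))
```

The sketch keeps the decision function pure and separate from the write path, which makes the thresholds easy to test against a labeled conflict corpus. A real implementation would also record each decision in reconsolidation_log.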
4. Capacity Planning and Sizing

4.1 Traffic Estimates

| Scale | Users | Memories/user | Total Items | Reads/sec | Writes/sec | Sleep Items/night |
|---|---|---|---|---|---|---|
| Dev | 1 | 100K | 100K | 5 | 1 | 2,000 |
| Small Prod | 1,000 | 500K | 500M | 5,000 | 1,000 | 2M |
| Medium Prod | 100K | 1M | 100B | 500K | 100K | 200M |
| Large Prod | 10M | 1M | 10T | 50M | 10M | 20B |

4.2 Compute Sizing

| Component | Dev (1 node) | Small Prod (1K users) | Medium Prod (100K users) |
|---|---|---|---|
| cls-core | 2 vCPU / 4GB RAM | 8 vCPU / 32GB RAM × 4 | 32 vCPU / 128GB RAM × 50 |
| embedding-svc | 1 vCPU / 2GB RAM | GPU (T4) × 2 | GPU (L4) × 20 |
| postgres + pgvector | 2 vCPU / 8GB RAM | 32 vCPU / 128GB NVMe × 8 | 64 vCPU / 256GB NVMe × 100 |
| minio | 1 vCPU / 2GB RAM | Object storage cluster | S3-compatible (managed) |
| etcd | 1 vCPU / 1GB RAM | 3-node HA cluster | 5-node HA cluster |
| Total RAM | ~17GB | ~640GB | ~12.8TB |

4.3 Storage Sizing

| Data Type | Per User | 1K Users | 100K Users | Growth Rate |
|---|---|---|---|---|
| L1 Indexing (pgvector, fp32) | 6GB (1M × 1536-d) | 6TB | 600TB | ~100MB/user/month |
| L1 with PQ (8-bit) | 600MB | 600GB | 60TB | ~10MB/user/month |
| L2 Schema Graph | 500MB | 500GB | 50TB | ~10MB/user/month |
| L3 Deep Recess (Parquet + VQ) | 600MB | 600GB | 60TB | ~5MB/user/month (compressed) |
| PostgreSQL metadata | 200MB | 200GB | 20TB | ~5MB/user/month |
| Total (with compression) | ~2GB | ~2TB | ~200TB | |

4.4 Sharding Strategy

| Store | Partition Key | Shard Count (1K users) | Notes |
|---|---|---|---|
| L1 Indexing | tenant_id → namespace → hash(id) | 8–16 shards | IVF-PQ index per shard; BM25 pre-filter before ANN |
| L2 Schema Graph | tenant_id → community detection | 4–8 partitions | METIS/Leiden community partition; min-cut across social graph structure |
| L3 Deep Recess | tenant_id/namespace/YYYY/MM/DD | S3 prefix routing | Parquet partition pruning; Bloom filters per file |
| Routing | etcd directory service | N/A | Consistent hash; TTL cache at API tier |

4.5 Cost Estimate

| Resource | Dev (1 user) | Small Prod (1K users/mo) | Notes |
|---|---|---|---|
| Compute (EC2/GKE equivalent) | $0 (local) | ~$8,000/mo | CPU-only for core; GPU for embedding |
| Storage (S3 + block) | $0 (local) | ~$500/mo | Compressed Parquet; PQ vectors |
| Embedding API (if OpenAI) | ~$5/mo | ~$500/mo | Replace with local all-MiniLM-L6-v2 to reduce ~90% |
| Backup storage | $0 (local) | ~$100/mo | Compressed pg_dump + Parquet snapshot |
| Total | ~$5/mo | ~$9,100/mo | Self-hosted embedding eliminates largest variable cost |
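The L1 vector figures in Section 4.3 follow from simple arithmetic, reproduced below as a sanity check. The helper names are illustrative, sizes use decimal units (1 GB = 10^9 bytes), and index and metadata overhead are deliberately ignored, so these are lower bounds rather than provisioning numbers.

```python
BYTES_PER_FP32 = 4
GB = 10 ** 9
TB = 10 ** 12


def vector_store_bytes(items: int, dims: int,
                       bytes_per_dim: float = BYTES_PER_FP32) -> float:
    """Raw size of `items` dense vectors, ignoring index and metadata overhead."""
    return items * dims * bytes_per_dim


def bytes_per_vector(total_bytes: float, items: int) -> float:
    """Average stored size of one vector."""
    return total_bytes / items


# One user: 1M memories with 1536-d fp32 embeddings.
per_user = vector_store_bytes(1_000_000, 1536)

# The PQ row budgets 600MB per user, which implies this per-vector size.
pq_per_vector = bytes_per_vector(600e6, 1_000_000)
fp32_per_vector = 1536 * BYTES_PER_FP32

if __name__ == "__main__":
    print(f"fp32, 1 user:     {per_user / GB:.3f} GB")
    print(f"fp32, 1K users:   {per_user * 1_000 / TB:.3f} TB")
    print(f"fp32, 100K users: {per_user * 100_000 / TB:.1f} TB")
    print(f"PQ budget: {pq_per_vector:.0f} B/vector, "
          f"{fp32_per_vector / pq_per_vector:.2f}:1 vs fp32")
```

The exact fp32 figure is 6.144 GB per user, which the table rounds to 6GB. The 600MB quantized budget works out to about 600 bytes per vector, roughly a 10:1 reduction.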
- Resilience and Fault Tolerance

5.1 Timeout Configuration

| Dependency | Connect Timeout | Read Timeout | Total Timeout | Retry? |
|---|---|---|---|---|
| Embedding Service | 3s | 10s | 13s | Yes (1 retry; fallback to local model) |
| PostgreSQL (pgvector) | 5s | 30s | 35s | Yes (reads safe; writes idempotent with UUID) |
| MinIO (Parquet write) | 5s | 30s | 35s | Yes (append-only; duplicate check via Bloom) |
| etcd (directory) | 2s | 5s | 7s | Yes (cached routing tolerates staleness) |
| Sleep cycle (per-item) | N/A | 200ms | 200ms | Skip + checkpoint (bounded work per item) |
| Conflict scoring | N/A | 200ms | 200ms | Skip (conflict defaults to 0.0 on timeout) |

5.2 Retry Strategy

| Operation | Max Retries | Backoff | Dead Letter |
|---|---|---|---|
| Memory write (L1 insert) | 3 | 0 → 1s → 5s | Write to failed_writes table; retry on next Doctor cycle |
| Engram write (L3) | 5 | 0 → 1s → 5s → 30s → 60s | Halt sleep cycle item; checkpoint; alert |
| Conflict adjudication | 0 | N/A | Return rejected with error; caller retries via API |
| Sleep cycle item | 1 (skip on fail) | N/A | Logged in morning report; retried next night |
| Embedding (external API) | 2 | Immediate → 2s | Fallback to local model (all-MiniLM-L6-v2) |

5.3 Circuit Breaker Configuration

| Condition | Type | Auto-Close | Effect |
|---|---|---|---|
| Embedding service down | Normal (300s) | Yes | Fallback to local model; no write block |
| PostgreSQL unavailable | Normal (600s) | Yes | All writes buffered in Working Buffer; reads return L3-only results |
| MinIO unavailable | Normal (600s) | Yes | Engram writes queued in DB; flushed on recovery |
| Backup daemon down > 10 min | Emergency | No | Halt ALL writes; alert; only reset_emergency() clears |
| etcd split-brain | Soft degradation | N/A | Use TTL-cached routing; accept stale shard maps for 60s |

5.4 Failure Modes and Degradation

| Failure Scenario | Impact | Detection | Automatic Mitigation | Recovery |
|---|---|---|---|---|
| pgvector index corruption | L1 kNN fails; queries return empty | Health check probes kNN daily | Serve L2/L3 only; flag degraded in health response | Rebuild index from stored vectors (30–60 min job) |
| Sleep cycle crash | Some memories not consolidated; no data loss | Checkpoint gap detected on restart | Resume from last checkpoint; partial report emitted | Complete next night automatically |
| Deep Recess checksum mismatch | Potentially corrupted engram | SHA-256 verified on every read | Block read; return 500; alert | Restore from last Parquet snapshot |
| Schema Graph fan-out explosion | Graph traversal OOM | Degree monitor in health check | Degree cap enforced; evict lowest-weight edges | Sparsification job in next sleep cycle |
| Reconsolidation race | Two conflicting updates simultaneously | Per-subject FIFO queue (Kafka/SQS) | Serialized per subject_key; second update waits | Auto-resolved via queue ordering |

5.5 Fault Tolerance Architecture: 4-Layer Defense

- Layer 1: Docker restart: unless-stopped (container crash → auto-restart in < 10s)
- Layer 2: Sleep cycle checkpointing (crash mid-sleep → resume from checkpoint on next run)
- Layer 3: Reconsolidation evidence quorum (bad input → quorum gate rejects unless confirmed by multiple sources)
- Layer 4: Deep Recess immutability (engram corruption → checksum catches before serving; restore from backup)
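As a minimal sketch of the 5.2 retry strategy, the helper below replays an operation along a fixed backoff schedule and hands the final error to a dead-letter callback. The names `retry_with_backoff`, `dead_letter`, and the two schedule constants are hypothetical. It also assumes that each value in a schedule is the wait before the matching retry, following one initial attempt; the table does not state this explicitly.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Backoff schedules copied from the 5.2 table (seconds to wait before each retry).
L1_INSERT_BACKOFF = (0.0, 1.0, 5.0)
L3_ENGRAM_BACKOFF = (0.0, 1.0, 5.0, 30.0, 60.0)


def retry_with_backoff(
    op: Callable[[], T],
    delays: Sequence[float],
    dead_letter: Optional[Callable[[Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run op once, then retry once per entry in delays.

    Assumption: each delay is the pause before the matching retry.
    If every attempt fails, the last error is passed to dead_letter
    (for example, a writer for the failed_writes table). Without a
    dead_letter callback the last error is re-raised.
    """
    try:
        return op()
    except Exception as exc:  # broad on purpose: this is a generic sketch
        last_error: Exception = exc
    for delay in delays:
        if delay > 0:
            sleep(delay)
        try:
            return op()
        except Exception as exc:
            last_error = exc
    if dead_letter is None:
        raise last_error
    dead_letter(last_error)
    return None
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting; a production version would likely catch only the store-specific transient errors.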
- Observability and Operations

6.1 SLIs, SLOs, and SLAs

| SLI | SLO Target | Measurement Method | Error Budget |
|---|---|---|---|
| Read P95 latency | < 120ms at 100K items | Ring buffer with per-query RTT | 5% reads may exceed (P95 definition) |
| Write P95 latency | < 150ms | Ring buffer with per-write RTT | 5% writes may exceed |
| Adjudication P95 | < 250ms | Per-adjudication RTT | 5% |
| Long-Horizon Recall (LHRA 14d) | ≥ 0.90 | Weekly probe harness | 10% recall degradation allowed |
| Drift Resistance (DR) | ≥ 0.98 | Weekly adversarial injection test | ≤ 2% false adoption |
| Correction Persistence (CP) | ≥ 0.95 | Post-reconsolidation probe (d+1, d+7, d+14) | 5% regression allowed |
| Didn't-Have-to-Repeat (DHR) | ≥ 0.85 | User repetition tracking across sessions | 15% repetition allowed |
| Sleep cycle completion | ≥ 95% within 60 min budget | Job execution timer | 5% budget overruns |
| Engram integrity (checksum) | 0 failures | Checksum mismatch counter | 0 budget (any failure = alert) |
| RTBF SLA | 95% within 24h | Deletion receipt timestamp delta | 5% may take up to 48h |

6.2 Latency Breakdown

| Component | P50 Target | P95 Target | Measurement | Optimization Lever |
|---|---|---|---|---|
| Working Buffer read | < 1ms | < 3ms | In-process timer | In-memory deque; no I/O |
| L1 Indexing kNN (100K items) | < 20ms | < 50ms | record_query_rtt() per call | HNSW ef_search tuning; BM25 pre-filter |
| L2 Schema traversal (2 hops) | < 40ms | < 80ms | Per-traversal timer | Ego-net Redis cache; degree cap |
| L3 Deep Recess lookup | < 10ms | < 30ms | Per-lookup timer | Bloom filter eliminates miss reads; Parquet predicate pushdown |
| Plasticity score | < 0.5ms | < 1ms | Unit benchmark | Stateless scorer; no I/O |
| Conflict scoring | < 80ms | < 200ms | Per-conflict timer | Cosine similarity batch; cache hot comparisons |
| Embedding generation | < 10ms | < 40ms | record_embed_rtt() | Local model (all-MiniLM) vs API; batched encode |
| End-to-end read (all stores) | < 60ms | < 120ms | API middleware RTT | Parallel async retrieval across stores |

6.3 Dashboards

| Dashboard | Audience | Key Panels | Data Source |
|---|---|---|---|
| Memory Health | Operator | L0 buffer fill %, L1 item count + kNN P95, L2 node/edge count + degree distribution, L3 engram count + checksum pass rate, overall health score | /v1/memory/health |
| Quality Metrics | Developer | LHRA by lag bucket (7d/14d/30d), DHR trend, DR gauge, CP trend, promotion rate, conflict rate | Eval harness outputs + ring buffers |
| Sleep Reports | Operator | Per-cycle report: reinforced/pruned/deduped/engraved counts, cycle duration, budget adherence, health delta | sleep_cycle_reports table |
| Performance | Developer | Read/write P50/P95/P99 latency, embedding RTT, schema traversal RTT, error rate by store | Ring buffers: request_latency, embed_rtt, graph_rtt |
| Reconsolidation | Developer | Decision distribution (merge/replaced/rejected), quorum score histogram, conflict rate by namespace | reconsolidation_log table |
| Privacy + Integrity | Security | RTBF pending/completed/failed, consent violation count (must be 0), checksum failure count (must be 0), deletion receipts issued | Audit tables |

6.3.1 Health Indicators

The health endpoint returns a composite score and per-dimension breakdown. The dashboard follows exception-based notification: silence = healthy.

| Indicator | Healthy | Warning | Critical |
|---|---|---|---|
| LHRA 14d | ≥ 0.90 | 0.85–0.90 | < 0.85 |
| Drift Resistance | ≥ 0.98 | 0.95–0.98 | < 0.95 |
| Read P95 | < 120ms | 120–200ms | > 200ms |
| Checksum failures | 0 | N/A | Any |
| Sleep cycle overrun | Never | 1–2× budget | > 2× budget |
| Consent violations | 0 | N/A | Any |

6.4 Alarms and Alerting

| Alarm | Condition | Severity | Auto-Action |
|---|---|---|---|
| LHRA degradation | 14d recall drops > 5% week-over-week | WARNING | Alert; increase reinforcement η in next sleep cycle |
| Drift attack | DR drops below 0.95 | CRITICAL | Raise quorum threshold to 0.9; alert operator |
| Checksum mismatch | Any engram fails SHA-256 check | CRITICAL | Block reads from affected Parquet partition; restore from backup |
| Sleep budget breach | Cycle exceeds 60 min | WARNING | Emit partial report; carry remainder to next night |
| Schema graph fan-out | Max node degree > cap (default 256) | WARNING | Trigger sparsification in next sleep; cap new edge writes |
| Consent violation | Any read returns none-tagged item with require_consent=true | CRITICAL | Halt tenant reads; audit log; alert |
| Backup daemon down | Backup log freshness > 10 min | CRITICAL (Emergency) | Trip emergency circuit breaker; halt all writes |
| Embedding SLA breach | P95 > 40ms for 5 consecutive minutes | WARNING | Switch to local fallback model; alert |

6.5 Logging and Tracing

Structured prefixes:
- [WRITE]: Every memory write with plasticity score, promotion decision, tenant/namespace
- [READ]: Every read with query, stores hit, top-k results, latency per store
- [SLEEP]: Per-phase timing, item counts, checkpoint progress
- [RECONSOLIDATE]: Conflict score, quorum, decision, old/new versions
- [ENGRAM]: Formation event with checksum, confidence, usage_days
- [CIRCUIT BREAKER]: Trip and reset events with reason
- [RTBF]: Deletion requests with tenant, item counts, receipt hash

Audit trail (memory_audit_log table): Every write, read, adjudication, engram formation, reconsolidation, and RTBF event is logged with: action_type, tenant_id, namespace, item_id, input_summary (no PII), decision, latency_ms, error (if any)

Log levels:
- ERROR: Checksum failure, consent violation, reconsolidation corruption
- WARNING: Sleep budget overrun, graph fan-out, LHRA degradation
- INFO: Engram formation, reconsolidation decisions, sleep cycle milestones
- DEBUG: Per-item plasticity scores, embedding RTTs (disabled in production)
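The SLIs in 6.1 are measured from ring buffers of per-request RTTs, and 6.3.1 maps read P95 onto healthy/warning/critical bands. The sketch below shows one way this could look; `LatencyRing`, `classify_read_p95`, the default capacity of 1024, and the nearest-rank percentile method are assumptions for illustration, not the documented implementation.

```python
import math
from collections import deque


class LatencyRing:
    """Fixed-size ring buffer of recent latencies in milliseconds."""

    def __init__(self, capacity: int = 1024) -> None:
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._samples = deque(maxlen=capacity)

    def record(self, latency_ms: float) -> None:
        self._samples.append(latency_ms)

    def percentile(self, pct: float) -> float:
        """Nearest-rank percentile over the samples currently held."""
        if not self._samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(pct * len(ordered) / 100.0))
        return ordered[rank - 1]


def classify_read_p95(p95_ms: float) -> str:
    """Map read P95 onto the 6.3.1 bands: < 120 healthy, 120-200 warning, > 200 critical."""
    if p95_ms < 120:
        return "healthy"
    if p95_ms <= 200:
        return "warning"
    return "critical"
```

A caller would `record()` each request's RTT in the API middleware and evaluate `classify_read_p95(ring.percentile(95))` when the health endpoint is polled.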
- Testing Strategy

7.1 Test Pyramid

| Test Type | Scope | File Count | Execution | Pass Criteria |
|---|---|---|---|---|
| Unit Tests: Plasticity | score(), decay(), boost(), coefficient tuning | 8 | Every commit, < 1 min | Mathematical parity with spec to 1e-6; promotion at correct thresholds |
| Unit Tests: Engram Formation | Formation trigger logic, checksum, append-only guard | 6 | Every commit, < 1 min | Correct formation on Day 5 simulation; no write without checksum |
| Unit Tests: Reconsolidation Gate | Conflict scoring, quorum, archive/engrave, lineage | 8 | Every commit, < 1 min | Wrong-update rate = 0 on labeled corpus; lineage always preserved |
| Unit Tests: Sleep Cycle | Phase-by-phase correctness, checkpoint/resume, idempotency | 10 | Every commit, < 2 min | Gold set retention > 90%; noise reduction > 60%; idempotent re-run produces identical output |
| Integration Tests: API | All 8 endpoints; schema validation; error codes | 8 | Every PR, < 5 min | 100% pass; contract tests never break |
| Integration Tests: Store | Real DB write/read round-trips for all four stores | 12 | Every PR, < 5 min | 100% pass; in-memory SQLite for L1/L2; MinIO mock for L3 |
| Integration Tests: Consent + RTBF | Consent enforcement, tombstone propagation, deletion receipt | 6 | Every deploy (blocking) | Zero consent violations; RTBF receipt issued within test harness window |
| Integration Tests: Namespace isolation | Cross-namespace read returns empty; reconsolidation stays scoped | 5 | Every deploy (blocking) | Zero bleed across namespaces |
| Evaluation: LHRA Harness | Seed K facts; probe at 7d/14d/30d lag with paraphrases | 1 harness | Weekly (automated) | 7d ≥ 0.92; 14d ≥ 0.90; 30d ≥ 0.85 |
| Evaluation: Drift Resistance | Plant N truths; inject M conflicts (low authority); measure false adoption | 1 harness | Weekly | DR ≥ 0.98 (≤ 2% false adoptions per 500 injections) |
| Evaluation: Correction Persistence | Reconsolidate fact; probe at d+1, d+7, d+14 under noise | 1 harness | Weekly | CP ≥ 0.95 |
| Load Tests | 200 RPS mixed (70% read, 20% write, 10% adjudicate) at 100K items | 1 | Monthly | No deadlocks; P95 within SLO; no memory leaks |
| Chaos Tests | Kill stores mid-query; corrupt one Parquet partition; split-brain etcd | 1 checklist | Weekly | No data loss; graceful degradation; recovery within RTO |

7.2 Test Infrastructure

- L1 (Indexing Store) mock: In-process FAISS Flat index; cosine similarity function; deterministic on fixed seed vectors
- L2 (Schema Graph) mock: In-process dict-based adjacency list; weight decay applied correctly
- L3 (Deep Recess) mock: In-memory dict; SHA-256 checksums computed and verified
- Embedding mock: Deterministic function seeded by hash of input text; cosine self-similarity always 1.0
- Plasticity test fixtures: make_memory(S, U, A, C, Δ) factory with all signals configurable
- Reconsolidation fixtures: make_conflict_case(old_fact, new_fact, evidence_list, expected_decision) corpus of 50 labeled cases
- Sleep cycle harness: Seeds known memory set; runs all four phases; asserts exact output counts and checksums
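The embedding mock described in 7.2 (deterministic, seeded by a hash of the input text, self-similarity of 1.0) could be sketched with the standard library alone. `mock_embed` and `cosine` are hypothetical names; the choice of SHA-256 as the seed hash and Gaussian components is an assumption, and the 384-dimension default mirrors the all-MiniLM-L6-v2 entry in A.2.

```python
import hashlib
import math
import random
from typing import List, Sequence


def mock_embed(text: str, dim: int = 384) -> List[float]:
    """Return a unit-length pseudo-embedding that depends only on the text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Seed a private RNG from the hash so identical text gives identical vectors.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the vectors carry no semantics, this mock suits round-trip and determinism tests; recall-quality tests such as the LHRA harness would still need a real model.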
- DORA Metrics and Engineering Excellence

8.1 DORA Metrics

| Metric | Target | How Measured |
|---|---|---|
| Deployment Frequency | Weekly (main branch deploy on green CI) | CI/CD deploy timestamp |
| Lead Time for Change | < 2 days (single developer, disciplined PR flow) | Commit → deploy delta |
| Change Failure Rate | < 5% (all acceptance gates must pass) | Eval harness failures post-deploy |
| MTTR | < 15 min (auto-heal via circuit breaker + checkpoint resume) | Incident detection → health restored timestamp |

8.2 Memory-Domain Quality Metrics

| Metric | Symbol | Formula | Target |
|---|---|---|---|
| Long-Horizon Recall Accuracy | LHRA(Δ) | correct recalls at lag Δ / total queries at lag Δ | 7d ≥ 0.92 / 14d ≥ 0.90 / 30d ≥ 0.85 |
| Drift Resistance | DR | 1 − (false adoptions / total injections) | ≥ 0.98 |
| Correction Persistence | CP | fraction of probes returning t₁ (updated truth) | ≥ 0.95 |
| Didn't-Have-to-Repeat | DHR | 1 − (repetitions requested / eligible queries) | ≥ 0.85 |
| Compression Ratio | CC | raw bytes / compressed bytes (Deep Recess) | ≥ 10:1 |
| Memory System Health | H | (1/N) Σ(ω₁·conf_i + ω₂·stability_i) | > 0.85 |
| Promotion Rate | PR | items promoted / total items (daily) | 5–10% (too high = threshold too low) |
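The ratio metrics in 8.2 translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, the weights ω₁ = 0.6 and ω₂ = 0.4 are taken from the A.4 reference, and raising `ValueError` on an empty denominator is an assumption about how an eval harness might treat missing data.

```python
from typing import Iterable, Tuple


def _ratio(numerator: float, denominator: float) -> float:
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def lhra(correct_recalls: int, total_queries: int) -> float:
    """Long-Horizon Recall Accuracy at one lag: correct / total."""
    return _ratio(correct_recalls, total_queries)


def drift_resistance(false_adoptions: int, total_injections: int) -> float:
    """DR = 1 - (false adoptions / total injections)."""
    return 1.0 - _ratio(false_adoptions, total_injections)


def didnt_have_to_repeat(repetitions: int, eligible_queries: int) -> float:
    """DHR = 1 - (repetitions requested / eligible queries)."""
    return 1.0 - _ratio(repetitions, eligible_queries)


def system_health(
    items: Iterable[Tuple[float, float]],
    w_conf: float = 0.6,
    w_stability: float = 0.4,
) -> float:
    """H = (1/N) * sum(w_conf * conf_i + w_stability * stability_i).

    Each item is a (confidence, stability) pair.
    """
    pairs = list(items)
    total = sum(w_conf * conf + w_stability * stab for conf, stab in pairs)
    return _ratio(total, len(pairs))
```

For example, 10 false adoptions out of 500 injections gives a DR of about 0.98, which sits exactly on the target boundary.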
- Alternatives Considered
Each decision below lists what was chosen, the alternatives considered, the rationale, and the accepted trade-off.

1. L1 vector store
Chosen: pgvector (PostgreSQL) for dev; FAISS/HNSW shards for prod
Alternatives: Pinecone, Qdrant, Weaviate, Milvus
Rationale: For single-tenant dev, pgvector runs in the same PostgreSQL instance with zero additional infrastructure. For production, FAISS/HNSW shards on NVMe provide controlled latency. Managed services (Pinecone) add external API dependency, egress cost, and no meaningful advantage below 100M vectors.
Accept: self-managed index rebuild on hardware failure; must tune ef_search/nprobe for SLO.

2. L2 graph store
Chosen: Neo4j (dev/small prod); custom KV-graph (large prod)
Alternatives: Amazon Neptune, TigerGraph, ArangoDB, in-memory NetworkX
Rationale: Neo4j is the most mature graph DB with Cypher query language, GDS algorithms, and simple Docker deployment. At large scale, a custom KV-graph (edge list per node, Redis-cached hot ego-nets) outperforms Neo4j on write throughput.
Accept: two codepaths for dev vs prod; graph partitioning requires METIS/Leiden at scale.

3. L3 archival format
Chosen: Parquet + Vector Quantization (PyArrow + Faiss PQ)
Alternatives: HDF5, LanceDB, custom binary
Rationale: Parquet is columnar, supports predicate pushdown (partition pruning by date/namespace), and compresses well with ZSTD (text) and PQ (vectors). MinIO provides an S3-compatible object store with no cloud dependency. LanceDB is promising but less mature.
Accept: no random-access update (append-only by design); PQ is lossy (calibrated to < 2% recall drop).

4. Conflict scoring
Chosen: Cosine distance + semantic overlap (two-stage)
Alternatives: Pure embedding cosine, LLM-as-judge, n-gram overlap only
Rationale: Pure cosine misses semantic contradictions that are lexically distant ("the meeting is at 9 AM" vs "the meeting is at 3 PM": these are nearly orthogonal vectors but clearly contradictory). LLM-as-judge is accurate but adds 2–5s latency per conflict check, making real-time write paths impractical. Two-stage (fast cosine pre-filter + semantic overlap for high-similarity pairs) hits > 95% accuracy at < 200ms.
Accept: some false negatives on indirect contradiction; mitigated by reconsolidation quorum requirement.

5. Sleep cycle scheduling
Chosen: Single nightly job (cron, 02:00 IST)
Alternatives: Continuous background process, event-driven compaction, weekly only
Rationale: Continuous compaction creates write contention during peak hours. Event-driven compaction (trigger on every N writes) adds operational complexity and unpredictable load spikes. Weekly-only misses the rhythm of daily reinforcement, causing memory drift. Nightly aligns with biological sleep: same period of low user activity, predictable resource budget.
Accept: 24-hour lag for newly-promoted engrams; acceptable for executive search cadence.

6. Evidence quorum mechanism
Chosen: Weighted sum Q = Σ(w_i · confidence_i) ≥ 0.8
Alternatives: Majority vote (N sources agree), single high-authority override, Bayesian update
Rationale: Majority vote is brittle at low evidence counts. Single override is too easy to game (one authoritative-looking source can flip any belief). Bayesian update is mathematically elegant but requires a prior distribution per fact (impractical at scale). Weighted sum provides a middle path: multiple independent sources, each weighted by their domain authority, must collectively reach quorum.
Accept: quorum threshold is a hyperparameter that requires domain tuning; medical and financial namespaces should use higher thresholds (0.9).

7. API design
Chosen: Mutation only through /adjudicate_conflict; Deep Recess append-only
Alternatives: Direct PATCH on memory items, event-sourcing with replay
Rationale: Direct PATCH makes audit impossible because you lose the history of why a belief changed. Full event-sourcing with replay is architecturally pure but adds significant operational complexity (replay time grows unboundedly). Append-only Deep Recess + adjudication-gated mutation gives the audit trail of event-sourcing with the operational simplicity of a mutable store for L1/L2.
Accept: older versions occupy storage permanently; cold storage tiering required at scale.

10. Cross-Cutting Concerns

10.1 Security

Authentication model:
- Client → CLS++ API: OAuth2 Bearer (m2m or user), HS256 / RS256
- API → Internal stores: Internal Docker network (trusted zone); mTLS optional for production
- API → MinIO: S3 access keys via environment variables (KMS-rotated)
- API → etcd: Client certificate authentication

Secrets management:
- All API keys and credentials in environment variables, never in source or images
- Per-tenant AES-GCM encryption keys in KMS; key ID stored per tenant in tenant_config table
- LoRA adapter training data: de-identified before leaving the memory pipeline; training job runs in isolated VM

Input validation:
- All memory writes: text length capped (default 32KB), consent tag required, authority must be in [0,1]
- Namespace strings: alphanumeric + underscore only; max 64 characters; no path traversal
- Evidence lists for adjudication: max 10 items; each confidence in [0,1]; source_weight in [0,1]

Threat model:
- Memory poisoning: addressed by authority weighting + quorum gates (low-authority sources cannot flip engrams alone)
- Cross-tenant read: addressed by RLS policies on all PostgreSQL tables + integration test suite
- Timing attacks on conflict scoring: non-constant-time cosine similarity; acceptable given non-security-critical nature of confidence values

10.2 Privacy

Consent model:

| Tag | Meaning | Read enforcement | Sleep behavior |
|---|---|---|---|
| explicit | User actively consented to this memory | Returned on all reads | Merged freely with same-tag items |
| implicit | Inferred from behavior (e.g., usage pattern) | Returned only when caller opts in | Only merged with implicit or explicit |
| none | System-generated or unverified | Returns 403 when require_consent=true | Isolated; never merged with consented items |

Right-to-Forget (RTBF) flow:
- DELETE /v1/memory/forget → tombstone all L1, L2, L3 records for tenant
- Bloom filters updated within 1 compaction window
- Parquet compaction job scheduled within 24h; old partition files replaced with scrubbed versions
- Deletion receipt = SHA-256(sorted list of deleted item IDs) returned to caller
- Backup daemon scrubs tenant from next backup within 48h

Ethical design principles:
- Forgetting is a feature: Items with S_t < 0.2 are pruned. Perfect recall is not the goal.
- Right to cognitive dignity: Users control what is remembered (consent tags) and can erase it (RTBF).
- No silent mutations: Every belief change creates a version record. Nothing is overwritten without evidence.
- Audit-ready history: Reconsolidation log is permanent. You can always answer: "Why does the system believe this?"

10.3 Cost Management

| Resource | Dev | Small Prod (1K users) | Optimization |
|---|---|---|---|
| Compute | $0 (local) | ~$8,000/mo | ARM Graviton instances: −40% power; spot for sleep jobs: −60% |
| Storage (S3/MinIO) | $0 (local) | ~$500/mo | PQ compression (10:1) already applied; cold tier for > 180-day Parquet |
| Embedding API | ~$5/mo | ~$500/mo | Self-hosted all-MiniLM-L6-v2: eliminates API cost entirely |
| Total | ~$5/mo | ~$9,100/mo | With all optimizations: ~$5,000/mo at 1K users |
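The RTBF flow in 10.2 defines the deletion receipt as SHA-256 over the sorted list of deleted item IDs. A minimal sketch follows; the function name `deletion_receipt` and the serialization (IDs joined with newlines, encoded as UTF-8) are assumptions, since the flow does not specify how the list is turned into bytes.

```python
import hashlib
from typing import Iterable


def deletion_receipt(deleted_item_ids: Iterable[str]) -> str:
    """Return the hex SHA-256 of the sorted deleted item IDs.

    Sorting makes the receipt independent of deletion order, so two
    stores that deleted the same items produce the same receipt.
    Assumption: IDs are newline-joined and UTF-8 encoded before hashing.
    """
    canonical = "\n".join(sorted(deleted_item_ids))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Whatever serialization is chosen needs to be fixed and documented, because a verifier can only recompute the receipt if it canonicalizes the ID list the same way.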
- Implementation Plan

Phase 1: Foundation: Four Stores + Plasticity (Weeks 1–6)

| Deliverable | Key Files | Exit Criteria |
|---|---|---|
| Monorepo + CI scaffold | Cargo.toml, pyproject.toml, .github/workflows/ | CI green in < 5 min; mypy + ruff pass; unit coverage ≥ 60% |
| Working Buffer (L0) | stores/working_buffer.py | Insert/evict O(1); no memory leak over 1h soak; snapshot API functional |
| Embedding service | services/embedding/main.py | gRPC/HTTP up; P95 embed < 40ms; cosine self-similarity > 0.999 |
| Indexing Store (L1) | stores/indexing_store.py (FAISS / pgvector) | P95 kNN < 50ms at 100K items; metadata round-trip exact |
| Schema Graph (L2) | stores/schema_graph.py (Neo4j adapter) | 2K nodes + 10K edges; traversal P95 < 80ms; degree cap enforced |
| Plasticity Engine | core/plasticity.py | Mathematical parity with spec to 1e-6; promotion at correct threshold; all unit tests pass |
| Write API (L0 → L1 → L2) | api/routes/memory.py | POST /v1/memory/write end-to-end; promotion flag in response; 400 on missing consent |
| 60 unit + integration tests | tests/ | All pass; plasticity, store, and API test suites green |

Phase 2: Deep Memory: L3, Engrams, Reconsolidation (Weeks 7–10)

| Deliverable | Key Files | Exit Criteria |
|---|---|---|
| Deep Recess (L3) | stores/deep_recess.py | Write P95 < 30ms; checksum verified on every read; Bloom filter operational |
| Engram formation | core/engram.py | Formation triggered correctly on Day 5 simulation; checksum computed as spec; append-only confirmed |
| Reconsolidation Gate | core/reconsolidation.py | Wrong-update rate = 0 on labeled corpus; lineage table populated on every replace |
| Read API (all stores) | api/routes/memory.py | POST /v1/memory/read returns provenance from all 4 stores; conflict labeling (current/historical/contested) correct |
| Adjudication API | api/routes/memory.py | POST /v1/memory/adjudicate_conflict correct on all 50 labeled conflict cases |
| Item lineage API | api/routes/memory.py | GET /v1/memory/item/{id} returns all versions with checksums |
| 40 additional tests | tests/ | Total ≥ 100 tests; reconsolidation and engram suites green |

Phase 3: Sleep Cycle + Evaluation Harness (Weeks 11–14)

| Deliverable | Key Files | Exit Criteria |
|---|---|---|
| Nightly sleep cycle (all phases) | jobs/sleep_cycle.py | Retention > 90% gold set; noise reduction > 60%; completes within 60 min budget; idempotent re-run |
| Dream replay | jobs/sleep_cycle.py | Schema edge weights updated for replayed items; consistency score > 0.85 |
| LHRA evaluation harness | eval/lhra_harness.py | 7d ≥ 0.92; 14d ≥ 0.90; 30d ≥ 0.85 on seed corpus |
| Drift resistance harness | eval/drift_harness.py | DR ≥ 0.98 on 500 injections |
| Correction persistence harness | eval/cp_harness.py | CP ≥ 0.95 at d+14 |
| Sleep + Health APIs | api/routes/memory.py | POST /v1/memory/sleep; GET /v1/memory/health with composite score |
| Dashboard v1 | dashboard/ | All 6 panels live; refreshes on 60s poll; morning report visible |
| 30 additional tests + chaos suite | tests/ | Total ≥ 130 tests; all eval harnesses green |

Phase 4: Multi-Tenancy, Privacy, Security (Weeks 15–18)

| Deliverable | Key Files | Exit Criteria |
|---|---|---|
| Multi-tenant RLS (PostgreSQL) | infra/postgres/schema.sql | Zero cross-tenant leakage in isolation test suite (6 tests, blocking) |
| Per-tenant encryption | core/crypto.py | AES-GCM encryption on L3 content; key per tenant; rotation support |
| Namespace ACLs | core/namespace.py | Cross-namespace reads return empty; reconsolidation scoped correctly |
| RTBF implementation | api/routes/memory.py | Tombstone + compaction + deletion receipt issued; 95% within test-harness SLA |
| Consent enforcement | core/consent.py | Zero violations in integration suite; sleep cycle respects consent tags |
| Export API | api/routes/memory.py | Full tenant export as valid JSON in < 30s |
| 20 security + privacy tests | tests/ | Total ≥ 150 tests; all security tests blocking on deploy |

Phase 5: Hardening, Load Test, v1 Ship (Weeks 19–20)

| Deliverable | Key Files | Exit Criteria |
|---|---|---|
| Load test (200 RPS) | tests/test_load.py | P95 read < 120ms; P95 write < 150ms; no deadlocks; no memory leaks |
| Chaos test suite | tests/chaos_checklist.md | Kill each store; corrupt Parquet partition; etcd split-brain; all recover within RTO |
| Full evaluation run | Eval harnesses | LHRA 14d ≥ 0.90; DR ≥ 0.98; CP ≥ 0.95; DHR ≥ 0.85 |
| Python SDK | python/clspp/client.py | write(), read(), sleep(), adjudicate() methods; full test coverage |
| cURL smoke tests + OpenAPI spec | api/openapi.yaml | All endpoints documented; smoke tests pass from clean environment |
| v1 evaluation report | reports/clspp_eval_v1.md | All acceptance gates met; baselines (stateless RAG, MemGPT) included for comparison |
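The Reconsolidation Gate and Adjudication API delivered in Phase 2 both depend on the evidence quorum Q = Σ(w_i · confidence_i) ≥ 0.8, subject to the evidence-list limits in 10.1 (at most 10 items, each value in [0,1]). The sketch below shows how that check might look; the `Evidence` dataclass and the function names are hypothetical, and the real gate also involves conflict scoring and lineage, which are omitted here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

QUORUM_THRESHOLD = 0.8    # default; 0.9 is suggested for medical/financial namespaces
MAX_EVIDENCE_ITEMS = 10   # input validation limit from 10.1


@dataclass(frozen=True)
class Evidence:
    source_weight: float  # domain authority of the source, in [0, 1]
    confidence: float     # source's confidence in the new fact, in [0, 1]


def quorum_score(evidence: Sequence[Evidence]) -> float:
    """Validate the evidence list and return Q = sum(w_i * confidence_i)."""
    if len(evidence) > MAX_EVIDENCE_ITEMS:
        raise ValueError("evidence list exceeds 10 items")
    for item in evidence:
        if not 0.0 <= item.source_weight <= 1.0:
            raise ValueError("source_weight must be in [0, 1]")
        if not 0.0 <= item.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
    return sum(item.source_weight * item.confidence for item in evidence)


def quorum_met(
    evidence: Sequence[Evidence], threshold: float = QUORUM_THRESHOLD
) -> Tuple[bool, float]:
    """Return (quorum reached?, Q) for the given evidence."""
    score = quorum_score(evidence)
    return score >= threshold, score
```

With the default threshold, a single source at weight 0.9 and confidence 0.8 scores 0.72 and is rejected, while two moderate sources can reach quorum together, which matches the stated aim that multiple independent sources must collectively reach quorum.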
- Open Questions and Future Work

Open Questions

| # | Question | Impact | Owner | Target Date |
|---|---|---|---|---|
| 1 | Should the Schema Graph use a mature graph DB (Neo4j/Memgraph) or a custom KV-graph at all scales? | Operational complexity vs query flexibility | Rajamohan | Phase 1 start |
| 2 | What is the right decay constant k per namespace? A learning_style preference decays slower than a current_mood state. | Incorrect k causes premature pruning or stale beliefs | Rajamohan | Phase 1 |
| 3 | Should dream replay generate synthetic paraphrase probes (generative replay) or only re-embed existing items? | Generative replay improves robustness but risks injecting synthetic noise into the memory store | Rajamohan | Phase 3 |
| 4 | At what point should we replace pgvector with a dedicated ANN engine (FAISS shards, Qdrant)? | Currently sufficient at 1M vectors per tenant; break point estimated at ~50M vectors | Rajamohan | TBD (scale-triggered) |
| 5 | Should the authority coefficient γ be globally fixed or per-namespace tunable? | Medical and legal namespaces require much higher authority thresholds than general preferences | Rajamohan | Phase 2 |

Future Work Roadmap
| # | Enhancement | Priority | Complexity | Dependencies |
|---|---|---|---|---|
| 1 | LLM Fine-Tuning Adapters (LoRA): Per-namespace LoRA adapters trained on de-identified Schema Graph; selected at inference time | P2 | High | Phase 5 complete; de-identification pipeline |
| 2 | Agent Framework Plugins: LangChain VectorStore, LlamaIndex BaseMemory, custom OpenDevin adapter | P1 | Medium | Phase 5 complete; plugin spec |
| 3 | Value-Weighted Plasticity: Add reward channel R from task outcomes; Score += ξ·R | P2 | Low | Phase 3 complete; outcome tracking |
| 4 | Federated CLS++: On-device episodic store; federated plasticity coefficient aggregation without raw data sharing | P3 | Very High | Mobile client; secure aggregation |
| 5 | Causal Memory Overlay: Causal DAG on Schema Graph; do-calculus for "what caused this outcome?" reasoning | P3 | High | Phase 5 complete; domain with interventional data |
| 6 | Right-to-Forget Proofs (Cryptographic): Merkle tree over live engram IDs; signed non-membership proofs after RTBF | P2 | Medium | Phase 4 complete |
| 7 | Generative Dream Curriculum: During REM phase, generate counterfactual probes; test robustness without storing synthetic content | P2 | Medium | Phase 3 complete |
| 8 | Kubernetes Migration: StatefulSets for L1/L2; Argo CronJobs for sleep; HPA on API tier | P3 | High | > 10K users |

13. Known Issues, Design Gaps, and QA Findings
Each entry lists the issue, its severity and status, and the planned mitigation.

1. Schema Graph unbounded growth (Severity: HIGH; Status: Open)
Issue: Without aggressive degree capping and periodic sparsification, the Schema Graph grows O(N²) in edge count as concepts accumulate. At 100K nodes and no cap, graph traversal becomes impractical.
Mitigation: Enforce hard degree cap (default 256 edges per node); run edge weight decay and PageRank-based sparsification on every sleep cycle; monitor degree distribution histogram in dashboard.

2. Conflict scoring false negatives on indirect contradiction (Severity: MEDIUM; Status: Open)
Issue: Two facts that contradict each other indirectly (e.g., "Raj is in Bangalore" and "Raj's meeting is in London tomorrow") may score low cosine conflict because their embedding spaces are distant.
Mitigation: Add a second-pass LLM conflict check on items in the same subject namespace before engram formation; flag for manual review rather than automatic reconsolidation.

3. Reconsolidation race on concurrent writes (Severity: HIGH; Status: Open)
Issue: Two different clients writing conflicting facts for the same subject simultaneously may both pass the gate before either is archived, resulting in two conflicting engrams.
Mitigation: Implement per-subject-key FIFO queue (Kafka/SQS) for reconsolidation; all adjudications for a subject serialize through the queue. Second write waits until first reconsolidation is committed.

4. Parquet partition scan on exact-ID lookup (Severity: MEDIUM; Status: Open)
Issue: The current Deep Recess layout uses date-partitioned Parquet files. A point lookup by engram ID requires scanning all partitions unless a separate manifest index is maintained.
Mitigation: Maintain an engram_manifest table in PostgreSQL mapping id → parquet_key; Bloom filter for fast existence check before any scan. Manifest updated atomically with each engrave operation.

5. Decay constant k is currently global (Severity: MEDIUM; Status: Open)
Issue: All memory items decay at the same rate regardless of namespace. A current-mood preference should decay in hours; a permanent preference like communication style should decay over months.
Mitigation: Add decay_k field to namespace configuration table; plasticity engine reads per-namespace k at score time; default k preserved for backwards compatibility.

6. Dream replay can strengthen wrong items (Severity: MEDIUM; Status: Open)
Issue: If high-salience items in L1 contain early misinformation (before reconsolidation corrects them), dream replay reinforces the wrong belief.
Mitigation: Dream replay only re-embeds items with permanence=False AND confidence < 0.85; confirmed engrams (L3) are never dream-replayed since they've already passed the formation gate.

7. Sleep cycle has no partial progress visibility (Severity: LOW; Status: Open)
Issue: If the cycle runs for 50 of 60 minutes and then is killed, there is no way to know which of the four phases completed.
Mitigation: Emit a sleep_progress record to the sleep_cycle_reports table after each phase; checkpoint includes last completed phase; morning report shows partial progress on incomplete cycles.
| 8 | pgvector recall degrades silently under PQ compression: Product Quantization introduces recall error that grows as the compression ratio increases. At 8-bit PQ (16:1 compression), recall@10 may drop from 0.98 to 0.91 without triggering any alert | MEDIUM | Open | Run daily recall@k canary probes using shadow queries with known nearest neighbors; alert when recall drops > 5% from baseline; reduce compression if recall SLO is threatened |

Appendix

A.1 Biological Mapping Reference

| Brain Region | CLS++ Module | Function | Computational Analogue |
| --- | --- | --- | --- |
| Prefrontal Cortex | Working Buffer (L0) | Short-term reasoning, active context | Token ring buffer / Redis |
| Hippocampus | Indexing Store (L1) | Fast episodic encoding and pattern completion | FAISS/pgvector + metadata |
| Neocortex | Schema Graph (L2) | Slow semantic integration, concept abstraction | Neo4j / custom KV-graph |
| Thalamus + Basal Ganglia | Deep Recess (L3) | Attention gating, stable long-term archive | MinIO Parquet + VQ |
| Dopamine system | Salience signal (S) | Reward prediction, importance weighting | LLM salience classifier |
| Noradrenaline system | Surprise signal (Δ) | Novelty detection, learning rate boost | Deviation from schema prediction |
| REM sleep | Dream replay | Schema synthesis, generalization | Re-embedding + edge weight update |
| NREM sleep (deep) | Consolidation phase | Synaptic pruning, noise reduction | Decay + deduplication + engram formation |
| Long-Term Potentiation (LTP) | Reinforcement boost | Synapse strengthening on repeated activation | S_{t+1} = S_t + η·(1 − S_t) |
| Memory reconsolidation | Reconsolidation Gate | Belief revision on reactivation | Evidence quorum + archive/engrave |

A.2 Technology Stack

| Layer | Technology | Version | Role |
| --- | --- | --- | --- |
| Core service | Rust (Axum) + Python (FastAPI) | Rust stable, FastAPI 0.115 | API layer; Rust for hot paths |
| Vector store (dev) | pgvector (PostgreSQL extension) | 0.7 | L1 kNN, dev + small prod |
| Vector store (prod) | FAISS IVF-PQ | 1.8 | L1 kNN, large prod shards |
| Graph store | Neo4j Community / Memgraph | Neo4j 5.x | L2 Schema Graph |
| Archive store | MinIO + PyArrow (Parquet) | MinIO latest, PyArrow 16 | L3 Deep Recess |
| Compression | FAISS PQ + ZSTD | FAISS 1.8, ZSTD 1.5 | VQ embeddings + text compression |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | 2.7 | Local, zero API cost; 384-d or 1536-d |
| Directory service | etcd | 3.5 | Shard routing, namespace ACLs |
| Scheduler | APScheduler | 3.10 | Sleep cycle cron jobs |
| Testing | pytest + pytest-asyncio + aiosqlite | pytest 8.3 | In-memory SQLite for integration tests |
| Metrics | Prometheus + Grafana | Latest | Ring buffers + Prom export |
| Tracing | OpenTelemetry | 1.x | Spans across API → store → sleep |

A.3 Plasticity Coefficient Defaults

| Signal | Symbol | Default Weight | Tuning Range | Notes |
| --- | --- | --- | --- | --- |
| Salience | S | α = 1.0 | 0.5–2.0 | Highest weight; drives most promotions |
| Usage | U | β = 0.3 | 0.1–0.5 | Log-scaled; prevents frequency-only dominance |
| Authority | A | γ = 0.5 | 0.3–1.0 | Raise for medical/legal namespaces (γ = 0.9) |
| Conflict | C | λ = 0.7 | 0.5–1.5 | Negative weight; raise for high-integrity namespaces |
| Surprise | Δ | δ = 0.4 | 0.2–0.8 | Novelty accelerator; lower for stable-fact domains |

A.4 Mathematical Reference

Confidence update:
  conf_{t+1} = conf_t + η₁·(A + S + log(1+U)) − η₂·C
  η₁ = 0.05 (positive reinforcement rate)
  η₂ = 0.03 (conflict penalty rate)
  bounded: 0 ≤ conf_t ≤ 1

Stability curve:
  Stability_i(t) = e^(−k_i · (t − t_last_use))
  k_i = per-item decay constant
  High confidence → smaller k_i (slower decay)

System health metric:
  Health = (1/N) · Σ_i (ω₁·conf_i + ω₂·Stability_i)
  ω₁ = 0.6, ω₂ = 0.4
  Target: Health > 0.85

Schema graph edge update (Hebbian learning):
  w_{ij,t+1} = w_{ij,t} + ζ · sim(m_i, m_j)
  w_{ij} ← w_{ij} / Σ_k w_{ik} (normalized)
  ζ = 0.01 (co-activation learning rate)

A.5 Project File Structure

cls-plus-plus/
├── Cargo.toml                        # Rust workspace
├── pyproject.toml                    # Python workspace (ruff, mypy)
├── Makefile
├── docker-compose.yml                # All 6 services
├── docker-compose.override.yml       # Dev hot-reload
│
├── services/
│   ├── api/                          # Rust (Axum): routes, auth, provenance
│   │   └── src/
│   │       ├── main.rs
│   │       ├── routes.rs             # All 8 endpoints
│   │       ├── auth.rs               # OAuth2 Bearer
│   │       └── provenance.rs         # Store + version attribution
│   │
│   └── embedding/                    # Python (FastAPI): sentence-transformers
│       └── app/
│           ├── main.py
│           └── embedder.py
│
├── python/
│   └── clspp/
│       ├── client.py                 # Python SDK
│       ├── stores/
│       │   ├── working_buffer.py     # L0: deque + Redis
│       │   ├── indexing_store.py     # L1: FAISS/pgvector
│       │   ├── schema_graph.py       # L2: Neo4j adapter
│       │   └── deep_recess.py        # L3: MinIO Parquet
│       ├── core/
│       │   ├── plasticity.py         # 5-signal scorer + decay + boost
│       │   ├── engram.py             # Formation logic + checksum
│       │   ├── reconsolidation.py    # Conflict detection + quorum + gate
│       │   └── consent.py            # Consent enforcement + RTBF
│       ├── jobs/
│       │   └── sleep_cycle.py        # N1/N2/N3/REM phases + checkpoint
│       └── eval/
│           ├── lhra_harness.py       # Long-horizon recall probes
│           ├── drift_harness.py      # Adversarial injection tests
│           └── cp_harness.py         # Correction persistence probes
│
├── infra/
│   ├── postgres/
│   │   └── schema.sql                # All tables + pgvector + RLS
│   └── k8s/                          # Future: StatefulSets, CronJobs
│
├── tests/
│   ├── conftest.py                   # Fixtures, store mocks, factories
│   ├── test_unit_plasticity.py
│   ├── test_unit_engram.py
│   ├── test_unit_reconsolidation.py
│   ├── test_unit_sleep_cycle.py
│   ├── test_integration_stores.py
│   ├── test_integration_api.py
│   ├── test_integration_consent.py
│   ├── test_integration_namespace.py
│   └── test_load_200rps.py
│
├── scripts/
│   ├── backup.sh                     # pg_dump + MinIO Parquet sync
│   ├── restore.sh
│   └── chaos_checklist.md
└── reports/
    └── clspp_eval_v1.md              # v1 evaluation report (post Phase 5)
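
The four formulas in A.4 Mathematical Reference can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contents of plasticity.py: the function names, the dict-of-dicts edge layout, the use of the natural logarithm for log(1+U), and enforcing the confidence bound by clamping on every update are all choices made for this example. The constants are the defaults listed in A.4.

```python
import math

# Defaults taken from A.4; the constant names are hypothetical.
ETA_POS = 0.05     # η₁, positive reinforcement rate
ETA_NEG = 0.03     # η₂, conflict penalty rate
OMEGA_CONF = 0.6   # ω₁, weight of confidence in the health metric
OMEGA_STAB = 0.4   # ω₂, weight of stability in the health metric
ZETA = 0.01        # ζ, co-activation learning rate


def update_confidence(conf, authority, salience, usage, conflict):
    """conf_{t+1} = conf_t + η₁·(A + S + log(1+U)) − η₂·C, clamped to [0, 1].

    Assumption: log is the natural logarithm and usage is a count >= 0.
    """
    raw = conf + ETA_POS * (authority + salience + math.log1p(usage)) - ETA_NEG * conflict
    return min(1.0, max(0.0, raw))


def stability(k, t_now, t_last_use):
    """Stability_i(t) = e^(−k_i · (t − t_last_use)).

    Assumption: both timestamps use the same unit, and k is expressed per that unit.
    """
    return math.exp(-k * (t_now - t_last_use))


def health(items):
    """Health = (1/N) · Σ_i (ω₁·conf_i + ω₂·Stability_i).

    items is a sequence of (conf, stability) pairs, one per memory item.
    """
    if not items:
        raise ValueError("health is undefined for an empty store")
    total = sum(OMEGA_CONF * conf + OMEGA_STAB * stab for conf, stab in items)
    return total / len(items)


def hebbian_update(weights, i, j, sim):
    """Add ζ·sim(m_i, m_j) to edge (i, j), then normalize node i's outgoing weights.

    weights maps node -> {neighbor: weight}. Assumption: sim is non-negative,
    so the row sum stays positive and normalization is well defined.
    """
    row = weights.setdefault(i, {})
    row[j] = row.get(j, 0.0) + ZETA * sim
    total = sum(row.values())
    if total > 0:
        for k in row:
            row[k] /= total
    return row
```

A sleep-cycle job could call `update_confidence` and `stability` per item and then pass the resulting pairs to `health` to compare against the Health > 0.85 target. How k_i is derived from confidence is not specified in A.4, so the sketch takes it as an input.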
A.6 Patent Claims Summary

Claims filed in Provisional Patent (October 2025, 35 U.S.C. § 111(b)):

| Claim | Type | Summary |
| --- | --- | --- |
| 1 | System | Four-store CLS++ architecture (Working Buffer, Indexing Store, Schema Graph, Deep Recess) with plasticity controller and sleep cycle |
| 2 | Method | Method for continuous AI memory: encode, score, promote, consolidate, sleep, reconsolidate |
| 3 | Storage Medium | Computer-readable medium storing instructions for Claim 2 |
| 4 | Distributed | Same system operating across distributed cluster with directory service and dynamic shard rebalancing |
| 5 | Plasticity Adaptation | Plasticity coefficients (α, β, γ, λ, δ) that adapt dynamically via reinforcement learning to optimize LHRA |
| 6 | Security | Per-memory consent tagging; encrypted lanes; distributed deletion receipts for RTBF |

This document was written on 2026-02-19 and covers the CLS++ v1 design as specified in the research paper by Rajamohan Jabbala (AlphaForge AI Labs, October 2025). All section numbers, mathematical formulas, and architectural decisions are grounded in the original 179-page specification. This design document is the implementation-ready translation of that specification.