Memory system

Three memory tiers, five storage layers. Strict role separation is what prevents a long-running agentic system from collapsing under its own weight.

Why three tiers

An agent with only one memory — its context window — cannot hold a conversation beyond a few hours. An agent with a single external memory (a vector store, for example) cannot distinguish what just happened from what we discussed last week.

Nika OS posits three tiers for this reason:

  1. Working memory: the context window of the current instance. Bounded (~200k tokens). Automatically compacted at 61% fill.
  2. Episodic memory: the operational history (sessions, jobs, tournaments, decisions, pod events). Stored in Qdrant + JSONL.
  3. Semantic memory: general knowledge (ingested documents, patterns, code snippets, articles, answers to customer questions). Stored in Qdrant (nika_vault, 413,000+ points).

The five storage layers

Layer         | Store                  | Content                                            | Good for                     | Bad for
--------------+------------------------+----------------------------------------------------+------------------------------+------------------------------
Semantic      | Qdrant                 | Summaries, decisions, docs, patterns, preferences  | Semantic recall, similarity  | Exact state, locks, ownership
Workflow      | hierarchy.py + YAML    | Entity status, parent/child, deadlines, assignment | Execution source of truth    | Semantic search
IPC           | Redis Streams + Hashes | Entity feed, working memory, signaling, contracts  | Real-time inter-pod comms    | Long-term history
Transactional | Redis                  | Semaphore, heartbeat, metrics, TTL                 | Real-time state, concurrency | Long-term history
Comms         | JSONL bus              | Inter-agent messages, review requests, dispatches  | Audit trail, coordination    | Complex search

The golden rule: semantic RAG is never the source of truth for execution state. Qdrant recalls what has been. The YAML hierarchy says what is now. Redis IPC carries what is happening at time T.

Mixing these roles produces subtle bugs: a pod that believes a job is done because Qdrant holds a summary of it, while the YAML still says in_progress, will make the wrong decisions.
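
The golden rule can be sketched as follows. This is a minimal illustration, not the actual hierarchy.py API: the `Status` enum, the `job_is_done` helper, and the in-memory dictionaries are all assumptions standing in for the YAML hierarchy and for Qdrant results. Only the `in_progress` value comes from the text above.

```python
from enum import Enum


class Status(str, Enum):
    # Only "in_progress" is named in the text; the other values are assumed.
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"


def job_is_done(job_id: str, hierarchy: dict[str, dict]) -> bool:
    """Decide completion from the workflow layer only, never from RAG hits."""
    entity = hierarchy.get(job_id)
    if entity is None:
        raise KeyError(f"unknown entity: {job_id}")
    return Status(entity["status"]) is Status.DONE


# Hypothetical data: Qdrant returns a summary that sounds final,
# while the YAML-backed hierarchy still says in_progress.
hierarchy_state = {"JOB-HOOKS-V3": {"status": "in_progress"}}
semantic_hits = [{"text": "JOB-HOOKS-V3 wrapped up, hooks refactored"}]

# The semantic hits are deliberately ignored for the execution decision.
done = job_is_done("JOB-HOOKS-V3", hierarchy_state)
```

Here `done` is false even though a plausible-looking summary exists, which is the behavior the rule asks for: summaries inform recall, the hierarchy decides state.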

The NIKA_META envelope

Every Qdrant point, every bus message, and every memory chunk carries a standardized NIKA_META envelope. This envelope answers the questions: who? where? when? what? why? and what is it linked to?

Group                     | Fields                                                | Examples
--------------------------+-------------------------------------------------------+------------------------------------------
Identity (WHO)            | session_id, tmux, agent_type, trigger                 | session_id=abc, agent=alpha, trigger=user
Hierarchy (WHERE)         | project_id, job_id, task_id                           | PROJ-NIKA-CORE, JOB-HOOKS-V3, TASK-xxx
Temporal (WHEN)           | timestamp, duration_s                                 | 2026-05-22T10:48:25Z, 120
Classification (WHAT/WHY) | action_type, domain, intent                           | code, hooks, feature
Graph (LINKS)             | entity_ids[], files[], tools[], parent_id, produces[] | [JOB-xxx], [on_stop.py], [Edit, Bash]

Values are constrained by controlled enumerations:

  • action_type: code, research, debug, deploy, config, review, doc, system, comms
  • domain: hooks, infra, rag, hierarchy, mcp, ui, finance, browser, agent, memory
  • intent: feature, fix, refactor, optimize, explore, maintain, migrate, test
  • trigger: user, cron, autonomous, hook, pod, daemon, system
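
One way to enforce these enumerations is to validate the envelope at construction time. The sketch below is an assumption about how that could look, not the project's actual code: the `NikaMeta` class name is hypothetical, and for brevity it carries only a subset of the fields listed in the table (tmux, task_id, duration_s, parent_id, and produces[] are omitted).

```python
from dataclasses import dataclass, field

# Controlled enumerations, copied from the lists above.
ENUMS = {
    "action_type": {"code", "research", "debug", "deploy", "config",
                    "review", "doc", "system", "comms"},
    "domain": {"hooks", "infra", "rag", "hierarchy", "mcp", "ui",
               "finance", "browser", "agent", "memory"},
    "intent": {"feature", "fix", "refactor", "optimize", "explore",
               "maintain", "migrate", "test"},
    "trigger": {"user", "cron", "autonomous", "hook", "pod", "daemon", "system"},
}


@dataclass
class NikaMeta:
    """Hypothetical, partial model of the NIKA_META envelope."""
    session_id: str
    agent_type: str
    trigger: str
    project_id: str
    job_id: str
    timestamp: str
    action_type: str
    domain: str
    intent: str
    entity_ids: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any value that falls outside its controlled enumeration.
        for name, allowed in ENUMS.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is outside the controlled enumeration")


meta = NikaMeta(
    session_id="abc", agent_type="alpha", trigger="user",
    project_id="PROJ-NIKA-CORE", job_id="JOB-HOOKS-V3",
    timestamp="2026-05-22T10:48:25Z",
    action_type="code", domain="hooks", intent="feature",
    files=["on_stop.py"], tools=["Edit", "Bash"],
)
```

Rejecting unknown values at write time keeps later filters reliable: a search on domain=hooks cannot silently miss points tagged with a misspelled or ad hoc domain.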

This envelope discipline enables three things:

  1. Filtering a RAG search by project, domain, or action type.
  2. Reconstructing a graph of related events (file edited by which pod, stemming from which job).
  3. Auditing retrospectively why the system made a given decision.
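
The first two uses can be illustrated on plain dictionaries. This is a sketch under assumptions: the `points` list stands in for Qdrant results with their payloads, and `filter_points`, `edits_by_file`, and the "beta" pod are hypothetical names. A real deployment would presumably push the filter down into the vector store rather than filter in Python.

```python
from collections import defaultdict

# Hypothetical points, each carrying a NIKA_META-style payload.
points = [
    {"id": 1, "meta": {"project_id": "PROJ-NIKA-CORE", "job_id": "JOB-HOOKS-V3",
                       "domain": "hooks", "action_type": "code",
                       "agent_type": "alpha", "files": ["on_stop.py"]}},
    {"id": 2, "meta": {"project_id": "PROJ-NIKA-CORE", "job_id": "JOB-xxx",
                       "domain": "rag", "action_type": "research",
                       "agent_type": "beta", "files": []}},
    {"id": 3, "meta": {"project_id": "PROJ-NIKA-CORE", "job_id": "JOB-HOOKS-V3",
                       "domain": "hooks", "action_type": "debug",
                       "agent_type": "alpha", "files": ["on_stop.py"]}},
]


def filter_points(points: list[dict], **criteria: str) -> list[dict]:
    """Keep the points whose envelope matches every given field exactly."""
    return [
        p for p in points
        if all(p["meta"].get(key) == value for key, value in criteria.items())
    ]


def edits_by_file(points: list[dict]) -> dict[str, set[tuple[str, str]]]:
    """Map each file to the (pod, job) pairs that touched it."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for p in points:
        meta = p["meta"]
        for path in meta.get("files", []):
            graph[path].add((meta["agent_type"], meta["job_id"]))
    return dict(graph)


hook_points = filter_points(points, domain="hooks")
file_graph = edits_by_file(points)
```

With this data, `hook_points` holds points 1 and 3, and `file_graph` links on_stop.py to the alpha pod working on JOB-HOOKS-V3.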

Automatic compaction

Nika OS automatically compacts the context window at 61% fill (an arbitrary threshold, chosen to leave headroom). Compaction triggers the PreCompact hook, which produces a handoff packet:

PreCompact handoff packet
├── Decisions taken (with timestamps)
├── Modified files (with summarized lines/diff)
├── Pending tasks (with minimal context to resume)
├── RAG state (recent queries, results)
└── Link to fired lifecycle hooks

This packet is ingested into Qdrant. The pod restarts with a clean context and can retrieve the packet with a single RAG search at boot.
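
The threshold check and the packet shape might look like the sketch below. Everything here is an assumption for illustration: `should_compact`, `HandoffPacket`, and its field names are hypothetical, the sample values are invented, and the real PreCompact hook may structure its output differently. The five fields mirror the five branches of the tree above.

```python
import json
from dataclasses import asdict, dataclass, field

COMPACTION_PERCENT = 61  # threshold stated in the text


def should_compact(used_tokens: int, window_tokens: int) -> bool:
    """True once the window is at least 61% full (integer math, no float rounding)."""
    return used_tokens * 100 >= COMPACTION_PERCENT * window_tokens


@dataclass
class HandoffPacket:
    """Hypothetical container for what the PreCompact hook hands over."""
    decisions: list[dict] = field(default_factory=list)
    modified_files: list[dict] = field(default_factory=list)
    pending_tasks: list[dict] = field(default_factory=list)
    rag_state: dict = field(default_factory=dict)
    hook_links: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialized form, ready to be ingested as a point payload.
        return json.dumps(asdict(self), sort_keys=True)


# Invented sample content, for illustration only.
packet = HandoffPacket(
    decisions=[{"at": "2026-05-22T10:48:25Z", "text": "keep on_stop.py synchronous"}],
    modified_files=[{"path": "on_stop.py", "summary": "added retry around ingest"}],
    pending_tasks=[{"task_id": "TASK-xxx", "context": "finish hook tests"}],
)
payload = packet.to_json()
```

Assuming a 200,000-token window, `should_compact` becomes true at 122,000 tokens used.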

PALACE PROTOCOL: the factual hard gate

Before answering a factual question about a client, a project, a past decision, or a remembered entity, the system must first query RAG (nika_rag_search or qdrant-find).

If the retrieval returns no relevant hits, the correct answer is: “no memory in RAG for {X}”. Not a fabrication. Not an approximation. Not a “general” answer.

This rule is called PALACE PROTOCOL, after the mempalace project (the highest-scoring memory system on public benchmarks, free under MIT). The underlying principle: RAG is the only source of truth about the past.
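
The gate can be sketched as a thin wrapper around retrieval. The `answer_factual` function, the `search` callable (a stand-in for nika_rag_search, whose real signature is not shown here), the hit format, and the `min_score` relevance cutoff are all assumptions.

```python
from typing import Callable


def answer_factual(
    subject: str,
    search: Callable[[str], list[dict]],
    min_score: float = 0.5,  # assumed relevance cutoff
) -> str:
    """Answer only from retrieved memory; refuse explicitly when there is none."""
    hits = [h for h in search(subject) if h.get("score", 0.0) >= min_score]
    if not hits:
        # Hard gate: no fabrication, no approximation, no "general" answer.
        return f"no memory in RAG for {subject}"
    best = max(hits, key=lambda h: h["score"])
    return best["text"]


def empty_search(query: str) -> list[dict]:
    """Hypothetical retriever that finds nothing."""
    return []


reply = answer_factual("client ACME", empty_search)
```

With a retriever that returns nothing, `reply` is the literal refusal string rather than a guess. Hits below the cutoff are treated the same way as no hits at all.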

Auto-memory in Claude Code

Alongside RAG, the Alpha pod leverages Claude Code’s native auto-memory: a persistent directory ~/.claude/projects/-home-nika-vault/memory/ that contains:

  • an index file MEMORY.md (always loaded, max 200 lines);
  • topic-specific files (one-line description in frontmatter).

Entries are saved when the user provides non-trivial feedback, shares a current project, or corrects an approach. Each entry is classified as one of four types: user, feedback, project, reference. The system does not save what can be derived from the current code or from git log.
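
The two constraints stated above (four entry types, an index capped at 200 lines) can be sketched as a guard on index updates. This does not describe Claude Code's actual implementation: `add_index_entry` and the one-line entry format are hypothetical.

```python
MEMORY_TYPES = {"user", "feedback", "project", "reference"}
MAX_INDEX_LINES = 200  # cap on MEMORY.md stated above


def add_index_entry(
    index_lines: list[str], mem_type: str, filename: str, description: str
) -> list[str]:
    """Return a new index with one more entry, enforcing type and size limits."""
    if mem_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {mem_type!r}")
    if len(index_lines) + 1 > MAX_INDEX_LINES:
        raise ValueError("MEMORY.md would exceed 200 lines; prune or merge entries first")
    # Hypothetical line format; the real index layout may differ.
    entry = f"- [{mem_type}] {filename}: {description}"
    return index_lines + [entry]


index = add_index_entry([], "feedback", "review_style.md", "prefers small, focused diffs")
```

Returning a new list instead of mutating the input keeps the check and the write separate, so a rejected entry never leaves the index half-updated.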