Memory system

Three memory tiers, five storage layers. Strict role separation is what prevents a long-running agentic system from collapsing under its own weight.

Why three tiers

An agent with only one memory — its context window — cannot hold a conversation beyond a few hours. An agent with a single external memory (a vector store, for example) cannot distinguish what just happened from what we discussed last week.

Nika OS posits three tiers for this reason:

  1. Working memory: the context window of the current instance. Bounded (~200k tokens). Automatically compacted at 61 % fill.
  2. Episodic memory: the operational history (sessions, jobs, tournaments, decisions, pod events). Stored in Qdrant + JSONL.
  3. Semantic memory: general knowledge (ingested documents, patterns, code snippets, articles, answers to questions). Stored in Qdrant (nika_vault).

The five storage layers

| Layer         | Store                   | Content                                            | Good for                    | Bad for                       |
|---------------|-------------------------|----------------------------------------------------|-----------------------------|-------------------------------|
| Semantic      | Qdrant                  | Summaries, decisions, docs, patterns, preferences  | Semantic recall, similarity | Exact state, locks, ownership |
| Workflow      | hierarchy.py + YAML     | Entity status, parent/child, deadlines, assignment | Execution source of truth   | Semantic search               |
| IPC           | Redis Streams + Hashes  | Entity feed, working memory, signaling, contracts  | Real-time inter-pod comms   | Long-term history             |
| Transactional | Redis                   | Semaphore, heartbeat, metrics, TTL                 | Real-time state, concurrency | Long-term history            |
| Comms         | JSONL bus               | Inter-agent messages, review requests, dispatches  | Audit trail, coordination   | Complex search                |

The golden rule: semantic RAG is never the source of truth for execution state. Qdrant recalls what has been. The YAML hierarchy says what is now. Redis IPC carries what is happening at time T.

Mixing these roles produces subtle bugs: a pod that believes a job is done because Qdrant holds a summary of it, while the YAML still says in_progress, will make the wrong decisions.
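The golden rule can be sketched as a status lookup that consults the workflow layer only. This is a minimal illustration, not Nika OS code: the function names, the in-memory dict standing in for the YAML hierarchy, and the `done` status value are assumptions (only `in_progress` appears in the text).

```python
def resolve_job_status(job_id: str, workflow_state: dict) -> str:
    """Return the execution status recorded in the workflow layer."""
    if job_id not in workflow_state:
        raise LookupError(f"{job_id} is not in the workflow hierarchy")
    return workflow_state[job_id]


def is_job_done(job_id: str, workflow_state: dict, semantic_hits: list) -> bool:
    """Decide completion from workflow state; semantic hits are context only."""
    # semantic_hits is accepted but deliberately never consulted here:
    # a summary in the vector store does not prove the job is finished.
    return resolve_job_status(job_id, workflow_state) == "done"


# Hypothetical data: the YAML hierarchy loaded into a dict, plus one recalled summary.
workflow_state = {"JOB-HOOKS-V3": "in_progress"}
semantic_hits = ["Summary: JOB-HOOKS-V3 hooks rewritten and merged"]

print(is_job_done("JOB-HOOKS-V3", workflow_state, semantic_hits))  # False
```

Keeping the semantic hits in the signature while ignoring them makes the separation explicit: recalled summaries may enrich a prompt, but they never change the decision.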

The NIKA_META envelope

Every Qdrant point, every bus message, every memory chunk carries a standardized NIKA_META envelope. This envelope answers the questions: who? where? when? what? why? how is it linked to what?

| Group                     | Fields                                                  | Examples                                    |
|---------------------------|---------------------------------------------------------|---------------------------------------------|
| Identity (WHO)            | session_id, tmux, agent_type, trigger                   | session_id=abc, agent=kernel, trigger=user  |
| Hierarchy (WHERE)         | project_id, job_id, task_id                             | PROJ-NIKA-CORE, JOB-HOOKS-V3, TASK-xxx      |
| Temporal (WHEN)           | timestamp, duration_s                                   | 2026-05-22T10:48:25Z, 120                   |
| Classification (WHAT/WHY) | action_type, domain, intent                             | code, hooks, feature                        |
| Graph (LINKS)             | entity_ids[], files[], tools[], parent_id, produces[]   | [JOB-xxx], [on_stop.py], [Edit, Bash]       |

Values are constrained by controlled enumerations:

  • action_type: code, research, debug, deploy, config, review, doc, system, comms
  • domain: hooks, infra, rag, hierarchy, mcp, ui, finance, browser, agent, memory
  • intent: feature, fix, refactor, optimize, explore, maintain, migrate, test
  • trigger: user, cron, autonomous, hook, pod, daemon, system
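A validator for the enumerated fields might look like the sketch below. The enumeration values are copied from the list above; the function name `validate_envelope`, the dict representation of the envelope, and the choice to treat a missing field as an error are assumptions made for illustration.

```python
ENUMS = {
    "action_type": {"code", "research", "debug", "deploy", "config",
                    "review", "doc", "system", "comms"},
    "domain": {"hooks", "infra", "rag", "hierarchy", "mcp", "ui",
               "finance", "browser", "agent", "memory"},
    "intent": {"feature", "fix", "refactor", "optimize", "explore",
               "maintain", "migrate", "test"},
    "trigger": {"user", "cron", "autonomous", "hook", "pod", "daemon", "system"},
}


def validate_envelope(meta: dict) -> list:
    """Return one error message per enumerated field with a disallowed value."""
    errors = []
    for field, allowed in ENUMS.items():
        value = meta.get(field)  # a missing field yields None, which is rejected
        if value not in allowed:
            errors.append(f"{field}={value!r} is not one of {sorted(allowed)}")
    return errors


# Example envelope built from the sample values in the table above.
envelope = {
    "session_id": "abc",
    "trigger": "user",
    "project_id": "PROJ-NIKA-CORE",
    "job_id": "JOB-HOOKS-V3",
    "timestamp": "2026-05-22T10:48:25Z",
    "action_type": "code",
    "domain": "hooks",
    "intent": "feature",
}

print(validate_envelope(envelope))  # []
```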

This envelope discipline enables three things:

  1. Filtering a RAG search by project, domain, or action type.
  2. Reconstructing a graph of related events (file edited by which pod, stemming from which job).
  3. Auditing retrospectively why the system made a given decision.
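The first two capabilities can be illustrated over an in-memory list of points. This is a sketch under stated assumptions: the helper names `filter_points` and `events_touching`, the sample points, and the identifier JOB-RAG-TUNING are invented; only the envelope field names come from the table above. A real deployment would push the same criteria down into the vector store's own filter mechanism.

```python
def filter_points(points: list, **criteria) -> list:
    """Keep points whose envelope matches every key=value criterion."""
    return [p for p in points
            if all(p["meta"].get(k) == v for k, v in criteria.items())]


def events_touching(points: list, filename: str) -> list:
    """Return (agent_type, job_id) pairs for every event that lists the file."""
    return [(p["meta"]["agent_type"], p["meta"]["job_id"])
            for p in points if filename in p["meta"].get("files", [])]


# Invented sample points for illustration.
points = [
    {"text": "Rewrote the stop hook",
     "meta": {"project_id": "PROJ-NIKA-CORE", "job_id": "JOB-HOOKS-V3",
              "agent_type": "kernel", "domain": "hooks",
              "action_type": "code", "files": ["on_stop.py"]}},
    {"text": "Compared vector index settings",
     "meta": {"project_id": "PROJ-NIKA-CORE", "job_id": "JOB-RAG-TUNING",
              "agent_type": "kernel", "domain": "rag",
              "action_type": "research", "files": []}},
]

hooks_code = filter_points(points, domain="hooks", action_type="code")
print([p["text"] for p in hooks_code])        # ['Rewrote the stop hook']
print(events_touching(points, "on_stop.py"))  # [('kernel', 'JOB-HOOKS-V3')]
```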

Automatic compaction

Nika OS automatically compacts the context window at 61 % fill (an arbitrary threshold chosen to preserve maneuvering room). Compaction triggers the PreCompact hook, which produces a handoff packet:

PreCompact handoff packet
├── Decisions taken (with timestamps)
├── Modified files (with summarized lines/diff)
├── Pending tasks (with minimal context to resume)
├── RAG state (recent queries, results)
└── Link to fired lifecycle hooks

This packet is ingested into Qdrant. The pod restarts with a clean context and can retrieve the packet through a single RAG search at boot.
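The trigger and the packet shape can be sketched as follows. The 61 % threshold and the five packet sections come from the text; the function names, the JSON serialization, the dict keys, and the 200,000-token window used in the example are assumptions, as is the reading of "at 61 %" as "at or above 61 %".

```python
import json

COMPACT_AT_PERCENT = 61  # threshold stated in the text


def should_compact(used_tokens: int, window_tokens: int) -> bool:
    """True once the window is at least 61 % full (integer math, no float rounding)."""
    return used_tokens * 100 >= window_tokens * COMPACT_AT_PERCENT


def build_handoff_packet(decisions, modified_files, pending_tasks,
                         rag_state, hook_links) -> str:
    """Serialize the five packet sections to JSON, ready for ingestion."""
    packet = {
        "decisions": decisions,
        "modified_files": modified_files,
        "pending_tasks": pending_tasks,
        "rag_state": rag_state,
        "hook_links": hook_links,
    }
    return json.dumps(packet, indent=2)


WINDOW_TOKENS = 200_000  # assumed window size for the example

print(should_compact(100_000, WINDOW_TOKENS))  # False (50 % full)
print(should_compact(130_000, WINDOW_TOKENS))  # True (65 % full)
```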

WATERFALL RETRIEVAL: the factual hard gate

Before answering a factual question about a client, a project, a past decision, or a remembered entity, the system must first run a wide→narrow retrieval cascade: semantic RAG search (nika_rag_search or qdrant-find) → naming-key/metadata filter → dependency graph (graphify) → exact grep + structured state (Postgres) → targeted hits.

If the cascade returns no relevant hits, the correct answer is: “no memory in RAG for {X}”. Not a fabrication. Not an approximation. Not a “general” answer.

This rule is called WATERFALL RETRIEVAL (formerly PALACE PROTOCOL), after the cascade that starts from broad meaning and converges to exact hits. The underlying principle is unchanged: retrieval is the only source of truth about the past — and the loop closes through well-named deliverables, which are themselves immediately findable by every stage of the cascade.
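The cascade and its hard gate can be sketched as a chain of narrowing stages. This is a toy model, not the actual pipeline: `broad_match`, `project_filter`, and `exact_match` are simple stand-ins for semantic search, the metadata filter, and exact grep (the graph and Postgres stages are omitted), and the corpus, including the identifier PROJ-OTHER, is invented.

```python
def waterfall(query: str, corpus: list, stages: list) -> list:
    """Run the stages in order, each narrowing the previous stage's candidates."""
    candidates = corpus
    for stage in stages:
        candidates = stage(query, candidates)
        if not candidates:
            break  # nothing left to narrow
    return candidates


def answer_factual(subject: str, corpus: list, stages: list) -> str:
    """Hard gate: answer only from retrieved hits, otherwise admit no memory."""
    hits = waterfall(subject, corpus, stages)
    if not hits:
        return f"no memory in RAG for {subject}"
    return "; ".join(hit["text"] for hit in hits)


def broad_match(query, candidates):
    # Stand-in for semantic search: any query word appears in the text.
    words = query.lower().split()
    return [c for c in candidates if any(w in c["text"].lower() for w in words)]


def project_filter(project_id):
    # Stand-in for the naming-key/metadata filter.
    def stage(query, candidates):
        return [c for c in candidates if c["project_id"] == project_id]
    return stage


def exact_match(query, candidates):
    # Stand-in for exact grep: the whole query appears verbatim.
    return [c for c in candidates if query.lower() in c["text"].lower()]


# Invented corpus for illustration.
corpus = [
    {"text": "Decision: PreCompact hook writes a handoff packet",
     "project_id": "PROJ-NIKA-CORE"},
    {"text": "Note: hook timeouts raised", "project_id": "PROJ-OTHER"},
]
stages = [broad_match, project_filter("PROJ-NIKA-CORE"), exact_match]

print(answer_factual("PreCompact hook", corpus, stages))
# Decision: PreCompact hook writes a handoff packet
print(answer_factual("client Acme", corpus, stages))
# no memory in RAG for client Acme
```

The gate lives in `answer_factual`, not in the stages: whatever the retrieval back ends are, an empty result maps to the fixed "no memory" reply rather than to a generated guess.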

Auto-memory

Alongside RAG, the kernel agent leverages the CLI agent’s native auto-memory when it is available: a persistent directory (resolved at install, $NIKA_CONFIG/memory/) that contains:

  • an index file MEMORY.md (always loaded, max 200 lines);
  • topic-specific files (one-line description in frontmatter).

Entries are saved when the user provides non-trivial feedback, shares a current project, or corrects an approach. Each entry is triaged into one of four types: user, feedback, project, reference. The system does not save anything that can be derived from the current code or from git log.
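The index constraints can be sketched as a small helper. The four types and the 200-line cap come from the text; the function name `add_index_entry`, the `IndexFullError` exception, the example filename, the index line format, and the choice to raise when the cap is reached are all assumptions, since the actual behavior at the limit is not described here.

```python
MEMORY_TYPES = ("user", "feedback", "project", "reference")
MAX_INDEX_LINES = 200  # cap on MEMORY.md stated in the text


class IndexFullError(Exception):
    """Raised when MEMORY.md has reached its line limit (hypothetical policy)."""


def add_index_entry(index_lines: list, memory_type: str,
                    filename: str, description: str) -> list:
    """Return a new index with one more entry, enforcing type and size limits."""
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {memory_type!r}")
    if len(index_lines) >= MAX_INDEX_LINES:
        raise IndexFullError("MEMORY.md is full; prune before adding entries")
    # Assumed line format; the real index layout is not specified here.
    return index_lines + [f"- [{memory_type}] {filename}: {description}"]


index = add_index_entry([], "feedback", "review_style.md",
                        "User prefers small, focused diffs")
print(index[0])  # - [feedback] review_style.md: User prefers small, focused diffs
```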