Memory system

Three memory tiers, five storage layers. Strict role separation is what prevents a long-running agentic system from collapsing under its own weight.

Why three tiers

An agent with only one memory — its context window — cannot hold a conversation beyond a few hours. An agent with a single external memory (a vector store, for example) cannot distinguish what just happened from what we discussed last week.

Nika OS posits three tiers for this reason:

Working memory — the context window of the current instance. Bounded (~200k tokens). Automatically compacted at 61 % fill.
Episodic memory — the operational history: sessions, jobs, tournaments, decisions, pod events. Stored in Qdrant + JSONL.
Semantic memory — general knowledge: ingested documents, patterns, code snippets, articles, answers to questions. Stored in Qdrant (nika_vault).

The five storage layers

Layer	Store	Content	Good for	Bad for
Semantic	Qdrant	Summaries, decisions, docs, patterns, preferences	Semantic recall, similarity	Exact state, locks, ownership
Workflow	`hierarchy.py` + YAML	Entity status, parent/child, deadlines, assignment	Execution source of truth	Semantic search
IPC	Redis Streams + Hashes	Entity feed, working memory, signaling, contracts	Real-time inter-pod comms	Long-term history
Transactional	Redis	Semaphore, heartbeat, metrics, TTL	Real-time state, concurrency	Long-term history
Comms	JSONL bus	Inter-agent messages, review requests, dispatches	Audit trail, coordination	Complex search

The golden rule: semantic RAG is never the source of truth for execution state. Qdrant recalls what has been. The YAML hierarchy says what is now. Redis IPC carries what is happening at time T.

Mixing these roles produces subtle bugs: a pod that believes a job is done because Qdrant holds a summary of it, while the YAML still says in_progress, will make the wrong decisions.
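The rule can be illustrated with a minimal sketch. The `Hierarchy` mapping, the function names, and the `"done"` status value are assumptions made for illustration; only `in_progress` appears in the text above, and the real `hierarchy.py` API is not shown here.

```python
from typing import Mapping

# Hypothetical in-memory view of the YAML hierarchy: job id -> entity fields.
Hierarchy = Mapping[str, Mapping[str, str]]


def execution_status(job_id: str, hierarchy: Hierarchy) -> str:
    """Return the status recorded in the workflow hierarchy, or "unknown"."""
    entity = hierarchy.get(job_id)
    if entity is None:
        return "unknown"
    return entity.get("status", "unknown")


def is_done(job_id: str, hierarchy: Hierarchy) -> bool:
    # Only the hierarchy is consulted; RAG summaries never enter this decision.
    # "done" is an assumed status value, not one confirmed by the source.
    return execution_status(job_id, hierarchy) == "done"


# Illustrative data: a summary exists in semantic memory,
# yet the hierarchy still reports the job as in progress.
hierarchy = {"JOB-HOOKS-V3": {"status": "in_progress"}}
rag_hits = [{"job_id": "JOB-HOOKS-V3", "summary": "Hooks v3 completed"}]

assert rag_hits and not is_done("JOB-HOOKS-V3", hierarchy)
```

The point of the sketch is that `is_done` has no parameter through which a RAG hit could influence the answer.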

The `NIKA_META` envelope

Every Qdrant point, every bus message, and every memory chunk carries a standardized NIKA_META envelope. This envelope answers the questions: who? where? when? what? why? and what is it linked to?

Group	Fields	Examples
Identity (WHO)	session_id, tmux, agent_type, trigger	`session_id=abc`, `agent=kernel`, `trigger=user`
Hierarchy (WHERE)	project_id, job_id, task_id	`PROJ-NIKA-CORE`, `JOB-HOOKS-V3`, `TASK-xxx`
Temporal (WHEN)	timestamp, duration_s	`2026-05-22T10:48:25Z`, `120`
Classification (WHAT/WHY)	action_type, domain, intent	`code`, `hooks`, `feature`
Graph (LINKS)	entity_ids[], files[], tools[], parent_id, produces[]	`[JOB-xxx]`, `[on_stop.py]`, `[Edit, Bash]`

Values are constrained by controlled enumerations:

action_type: code, research, debug, deploy, config, review, doc, system, comms
domain: hooks, infra, rag, hierarchy, mcp, ui, finance, browser, agent, memory
intent: feature, fix, refactor, optimize, explore, maintain, migrate, test
trigger: user, cron, autonomous, hook, pod, daemon, system

This envelope discipline enables three things:

Filtering a RAG search by project, domain, or action type.
Reconstructing a graph of related events (file edited by which pod, stemming from which job).
Auditing retrospectively why the system made a given decision.
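As a rough sketch, the envelope and its controlled enumerations could be modeled as a validated dataclass. The class name `NikaMeta`, the `matches` helper, and the choice of which fields are optional are assumptions for illustration; the field names and enumeration values are taken from the tables above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ACTION_TYPES = {"code", "research", "debug", "deploy", "config",
                "review", "doc", "system", "comms"}
DOMAINS = {"hooks", "infra", "rag", "hierarchy", "mcp", "ui",
           "finance", "browser", "agent", "memory"}
INTENTS = {"feature", "fix", "refactor", "optimize", "explore",
           "maintain", "migrate", "test"}
TRIGGERS = {"user", "cron", "autonomous", "hook", "pod", "daemon", "system"}


@dataclass
class NikaMeta:
    # Identity (WHO)
    session_id: str
    agent_type: str
    trigger: str
    # Hierarchy (WHERE)
    project_id: str
    # Temporal (WHEN)
    timestamp: str
    # Classification (WHAT/WHY)
    action_type: str
    domain: str
    intent: str
    # Assumed optional in this sketch.
    tmux: Optional[str] = None
    job_id: Optional[str] = None
    task_id: Optional[str] = None
    duration_s: Optional[float] = None
    # Graph (LINKS)
    entity_ids: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    parent_id: Optional[str] = None
    produces: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any value outside the controlled enumerations.
        checks = (
            ("action_type", self.action_type, ACTION_TYPES),
            ("domain", self.domain, DOMAINS),
            ("intent", self.intent, INTENTS),
            ("trigger", self.trigger, TRIGGERS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")


def matches(meta: NikaMeta, **filters: str) -> bool:
    """True when every named field equals the requested value."""
    return all(getattr(meta, key) == value for key, value in filters.items())


meta = NikaMeta(
    session_id="abc", agent_type="kernel", trigger="user",
    project_id="PROJ-NIKA-CORE", timestamp="2026-05-22T10:48:25Z",
    action_type="code", domain="hooks", intent="feature",
    job_id="JOB-HOOKS-V3", files=["on_stop.py"], tools=["Edit", "Bash"],
)
assert matches(meta, project_id="PROJ-NIKA-CORE", domain="hooks")
```

In practice the same filter would presumably be pushed down into the Qdrant query rather than applied in Python; `matches` only shows which fields make that filtering possible.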

Automatic compaction

Nika OS automatically compacts the context window at 61 % fill (a somewhat arbitrary threshold, chosen to preserve headroom). Compaction triggers the PreCompact hook, which produces a handoff packet:

PreCompact handoff packet
├── Decisions taken (with timestamps)
├── Modified files (with summarized lines/diff)
├── Pending tasks (with minimal context to resume)
├── RAG state (recent queries, results)
└── Link to fired lifecycle hooks

This packet is ingested into Qdrant. The pod restarts with a clean context and can recover the packet through a single RAG search at boot.
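A minimal sketch of the threshold check and the packet shape follows. The names `should_compact`, `HandoffPacket`, and `to_payload` are hypothetical, as is the decision to trigger at or above 61 %; the real PreCompact hook and the Qdrant ingestion call are not shown.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

COMPACT_THRESHOLD_PERCENT = 61  # threshold stated in the text


def should_compact(used_tokens: int, window_tokens: int) -> bool:
    # Integer arithmetic avoids float rounding exactly at the threshold.
    # Triggering "at or above" 61 % is an assumption of this sketch.
    return used_tokens * 100 >= window_tokens * COMPACT_THRESHOLD_PERCENT


@dataclass
class HandoffPacket:
    decisions: List[Dict[str, str]] = field(default_factory=list)       # with timestamps
    modified_files: List[Dict[str, str]] = field(default_factory=list)  # with summarized diff
    pending_tasks: List[Dict[str, str]] = field(default_factory=list)   # with resume context
    rag_state: Dict[str, List[str]] = field(default_factory=dict)       # recent queries, results
    hook_links: List[str] = field(default_factory=list)                 # fired lifecycle hooks

    def to_payload(self) -> dict:
        """Plain dict, ready to be serialized and ingested by a separate step."""
        return asdict(self)


# Illustrative values only.
packet = HandoffPacket(
    decisions=[{"timestamp": "2026-05-22T10:48:25Z", "decision": "keep on_stop.py hook"}],
    modified_files=[{"path": "on_stop.py", "diff_summary": "added handoff call"}],
    hook_links=["PreCompact"],
)

if should_compact(used_tokens=130_000, window_tokens=200_000):
    payload = packet.to_payload()
```

Keeping the packet as plain data means the boot-time RAG search only has to return one document to restore the pod's working context.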

WATERFALL RETRIEVAL: the factual hard gate

Before answering a factual question about a client, a project, a past decision, or a remembered entity, the system must first run a wide→narrow retrieval cascade: semantic RAG search (nika_rag_search or qdrant-find) → naming-key/metadata filter → dependency graph (graphify) → exact grep + structured state (Postgres) → targeted hits.

If the cascade returns no relevant hits, the correct answer is: “no memory in RAG for {X}”. Not a fabrication. Not an approximation. Not a “general” answer.

This rule is called WATERFALL RETRIEVAL (formerly PALACE PROTOCOL), after the cascade that starts from broad meaning and converges to exact hits. The underlying principle is unchanged: retrieval is the only source of truth about the past — and the loop closes through well-named deliverables, which are themselves immediately findable by every stage of the cascade.
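The hard gate can be sketched as a chain of stages, each narrowing the hits of the previous one. This is an illustration under assumptions: the `Stage` signature, the stub stages, and the in-memory `CORPUS` are hypothetical stand-ins for nika_rag_search, the metadata filter, graphify, and the grep/Postgres steps, whose real interfaces are not described here. Treating every stage as a pure narrowing step is also an assumption.

```python
from typing import Callable, Dict, List, Sequence

Hit = Dict[str, str]
# A stage receives the subject and the previous hits, and returns narrower hits.
Stage = Callable[[str, List[Hit]], List[Hit]]


def waterfall_retrieve(subject: str, stages: Sequence[Stage]) -> List[Hit]:
    hits: List[Hit] = []
    for stage in stages:
        hits = stage(subject, hits)
        if not hits:
            break  # nothing left to narrow; later stages cannot recover hits
    return hits


def answer(subject: str, stages: Sequence[Stage]) -> str:
    hits = waterfall_retrieve(subject, stages)
    if not hits:
        # The hard gate: no fabrication, no approximation, no "general" answer.
        return f"no memory in RAG for {subject}"
    return "\n".join(hit["text"] for hit in hits)


# Stub data and stages, for illustration only.
CORPUS: List[Hit] = [
    {"project_id": "PROJ-NIKA-CORE", "text": "JOB-HOOKS-V3 edits on_stop.py"},
    {"project_id": "PROJ-OTHER", "text": "unrelated hooks note"},
]


def semantic_stage(subject: str, _previous: List[Hit]) -> List[Hit]:
    # Stand-in for semantic search: a case-insensitive substring match.
    return [hit for hit in CORPUS if subject.lower() in hit["text"].lower()]


def metadata_stage(_subject: str, previous: List[Hit]) -> List[Hit]:
    # Stand-in for the naming-key/metadata filter.
    return [hit for hit in previous if hit["project_id"] == "PROJ-NIKA-CORE"]


stages = [semantic_stage, metadata_stage]
print(answer("hooks", stages))
print(answer("billing", stages))
```

The first call prints the single hit that survives both stages; the second prints the refusal string, because the first stage already comes back empty.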

Auto-memory

Alongside RAG, the kernel agent leverages the CLI agent’s native auto-memory when it is available: a persistent directory (resolved at install, $NIKA_CONFIG/memory/) that contains:

an index file MEMORY.md (always loaded, max 200 lines);
topic-specific files (one-line description in frontmatter).

Entries are saved when the user provides non-trivial feedback, shares a current project, or corrects an approach. Each entry is triaged into one of four types: user, feedback, project, reference. The system does not save anything that can be derived from the current code or from git log.
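The two constraints stated above (four entry types, an index capped at 200 lines) can be expressed as simple checks. This sketch does not reproduce the CLI agent's actual file format or enforcement; `MemoryEntry` and `index_fits` are hypothetical names.

```python
from dataclasses import dataclass

MEMORY_TYPES = ("user", "feedback", "project", "reference")
MAX_INDEX_LINES = 200  # cap on MEMORY.md stated in the text


@dataclass(frozen=True)
class MemoryEntry:
    memory_type: str
    description: str  # the one-line description kept in frontmatter

    def __post_init__(self) -> None:
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.memory_type!r}")
        if "\n" in self.description:
            raise ValueError("description must fit on one line")


def index_fits(index_text: str) -> bool:
    """True when the index stays within the 200-line cap."""
    return len(index_text.splitlines()) <= MAX_INDEX_LINES


entry = MemoryEntry("feedback", "Prefers small, reviewable diffs")
```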