Agent Life Format (ALF) — Specification

Version: 1.0.0-rc.1
Date: 2026-02-26
Status: Release Candidate
Repository: https://github.com/agent-life/agent-life-data-format
Site: https://agent-life.ai
License: MIT


Abstract

Your AI agent accumulates memory, develops a persona, builds relationships with you, and owns credentials that allows it to use services and tools. All of that state is locked inside one framework's proprietary storage. The Agent Life Format (ALF) is an open standard that makes it portable.

ALF is designed as a runtime-neutral superset: it captures the union of data models used by today's agent frameworks so that exporting from any supported runtime preserves all information. The format is intentionally decoupled from any specific LLM provider, embedding model, or execution engine.

ALF is a serialization format for the complete durable state of an AI agent — memories, identity, relationships, and credentials. ALF enables agent backup, migration between runtimes, and incremental sync to cloud storage without information loss. An .alf file is a standard ZIP archive containing everything your agent knows and is — structured so that any compatible runtime can read it, restore it, or migrate it without losing information.

Backup

Snapshot your agent's accumulated knowledge and personality to a single portable file. Incremental sync means your backup grows as your agent grows.

Migration

Move between runtimes (e.g. OpenClaw to ZeroClaw) without starting over. One neutral format, N adapters — not N² point-to-point converters.

Incremental Sync

Push lightweight deltas to cloud storage after each session. Only changed records transfer — sealed quarterly partitions never retransmit.

ALF is runtime-neutral by design. It captures the union of data models used by today's agent frameworks so that exporting from any supported runtime preserves all information. The format is decoupled from any specific LLM provider, embedding model, or execution engine.

The Four Data Layers

ALF organizes everything your agent accumulates into four layers, from the most frequently changing (memory) to the most sensitive (credentials):

ALF Data Layers
1

Memory & Experience

Facts, episodes, procedures, preferences, summaries — with embeddings, entities, temporal metadata, and relational links

standard
2

Identity & Persona

Persona (structured fields + prose blocks), capabilities, sub-agent roster, AIEOS-compatible extensions

standard
3

Principals & Relationships

User profiles, agent-to-agent relationships, communication preferences

standard
4

Credentials

API keys, OAuth tokens, SSH keys — zero-knowledge encrypted on your device, never readable by the sync service

critical · encrypted

How Data Moves

An adapter exports your agent's native storage into the neutral ALF format. From there, it can be backed up, synced incrementally, or imported into a different runtime:

Agent Runtime OpenClaw, ZeroClaw, ...
export
.alf neutral format
delta sync
Cloud Storage encrypted, versioned
import
New Runtime restore or migrate

The rest of this document is the formal specification. Sections 1–2 cover purpose and background. Sections 3–4 define the data layers and container format. Sections 5–8 explain design decisions and requirements. Sections 9–11 cover testing, coverage, and the design decision log.

Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Notation


1. Purpose and Scope

This document specifies the Agent Life Format (ALF) — an open, portable serialization format for the complete durable state of an AI agent. The format satisfies two goals:

  1. Storage-efficient for backup and sync — compact enough for incremental transfer to an off-host service.
  2. Query-efficient for direct use — structured enough that a cloud agent runtime could use the neutral store as its primary persistence layer (future goal).

ALF is designed as a superset of the data models used by today's major agent runtimes, so that exporting from any supported runtime loses no information. The initial target runtimes are OpenClaw and ZeroClaw, with the architecture open to additional adapters.

Scope boundaries — what ALF stores and what it does not:

ALF captures an agent's knowledge state (memories, learned facts, preferences, relational links), identity state (persona, behavioral configuration, capabilities, sub-agent roster), relationship state (principals, user context), and credentials (encrypted). These are the durable, portable layers that define who an agent is and what it knows — the things that matter when backing up, migrating, or restoring an agent.

ALF explicitly excludes active execution state: in-flight tool calls, pending workflow steps, loop counters, call stacks, LangGraph checkpoints, queued callbacks, and any other transient runtime process state. Execution state is inherently runtime-specific (a LangGraph checkpoint cannot be restored into an OpenClaw runtime, and vice versa), ephemeral (it exists only for the duration of a task), and not part of the agent's durable identity or knowledge. Attempting to serialize execution state would couple the format to every runtime's internal execution model, violating the core design philosophy of runtime neutrality.

Practical implication: Migrating or restoring an agent mid-task will abort the in-progress task. The agent retains all knowledge accumulated up to the point of export (including memories from the current session), but any unfinished multi-step workflow is lost. This is the correct tradeoff — execution state is cheap to recreate (re-run the task), while knowledge and identity are expensive to recreate (months of accumulated context).


2. Background — Current Landscape

This section is informational. It summarizes the data models of existing agent frameworks to motivate the design choices in subsequent sections.

2.1 OpenClaw Data Model

OpenClaw stores all agent state as plain Markdown files in a workspace directory. The key files are:

File Layer Description
SOUL.md Identity Personality, tone, values, boundaries. Freeform prose loaded every session.
IDENTITY.md Identity Structured profile: name, role, goals, voice.
AGENTS.md Identity / Procedural Operating instructions, priorities, workflow rules, memory management rules.
USER.md User Context Communication preferences, timezone, work context, known constraints.
MEMORY.md Memory (long-term) Curated persistent facts and compressed history. Recommended <200 lines.
memory/YYYY-MM-DD.md Memory (daily) Per-day working notes. Auto-preprompted; older logs retrieved via semantic search.
TOOLS.md Capabilities Notes about available local tools and conventions (guidance, not enforcement).
HEARTBEAT.md Procedural Checklist for periodic autonomous runs.
BOOT.md Procedural Startup checklist executed on gateway restart.
BOOTSTRAP.md Procedural One-time first-run interview script. Deleted after setup.

Memory search is hybrid: 70% vector (semantic via sqlite-vec) + 30% BM25 keyword (FTS5). The Markdown files are the source of truth; vector indexes are derivative.

Credentials live in ~/.openclaw/ (config, sessions, environment variables), not in the workspace.

For the complete field-by-field mapping from OpenClaw's workspace to ALF, see the OpenClaw adapter documentation.

2.2 ZeroClaw Data Model

ZeroClaw uses a TOML config (~/.zeroclaw/config.toml) and supports multiple memory backends:

Backend Format Features
SQLite (default) memories.db Hybrid search (FTS5 + vector cosine), categories (conversation, daily, core), embedding cache.
Markdown memory/*.md Daily files + session files, archive directory, hygiene system. OpenClaw-compatible layout.
Lucid CLI sync + SQLite fallback External memory service bridge.
None No-op Stateless mode.

Identity supports two formats via [identity] format:

Credentials are encrypted at rest using ChaCha20-Poly1305 with a local .secret_key. OAuth tokens and auth profiles are stored in .zeroclaw/state/auth_profiles.json, also encrypted.

Memory tools: memory_store, memory_recall, memory_forget (agent-invoked, auto-save optional).

For the complete field-by-field mapping from ZeroClaw to ALF, see the ZeroClaw adapter documentation.

2.3 AIEOS v1.1 (AI Entity Object Specification)

AIEOS is a JSON schema standard focused on identity portability. Core structure:

AIEOS covers the identity/persona layer well but does not address experiential memory, user context, or credentials.

2.4 Mem0 Memory Architecture

Mem0 defines a memory-centric approach with three pillars:

Key distinctions from RAG: memory is stateful, user-aware, temporally aware, and adaptive. Mem0 stores memories as embeddings (the embedding is the primary copy, unlike OpenClaw where Markdown is the source of truth).

2.5 LangChain/LangGraph Memory Taxonomy

LangChain categorizes long-term memory into three types drawn from cognitive science:

Memory can be written "in the hot path" (synchronously during interaction) or "in the background" (asynchronously after interaction). Both approaches have tradeoffs in latency, consistency, and completeness.

2.6 AWS AgentCore Memory

AgentCore implements a multi-stage long-term memory pipeline:

  1. Extraction — LLM analyzes conversations to extract memories per strategy (semantic facts, user preferences, conversation summaries).
  2. Consolidation — new memories are compared against existing memories via semantic similarity; the system ADDs, UPDATEs, or NO-OPs to avoid duplicates and resolve conflicts. Outdated memories are marked INVALID (immutable audit trail) rather than deleted.
  3. Retrieval — semantic search returns relevant memories in ~200ms.

Compression rates of 89–95% are achieved (memory tokens vs. full conversation history tokens), which is critical for staying within context windows.


3. Data Layer Specification

The neutral format organizes agent state into four layers, each with its own schema, sensitivity, and sync characteristics.

3.1 Layer 1 — Memory and Experience

This is the highest-volume, most frequently updated layer. It must capture memories from all observed runtime formats without information loss.

3.1.1 Memory Record Schema

Each memory record MUST contain:

Field Type Required Description
id string (UUID v7) Yes Globally unique, time-sortable identifier.
agent_id string (UUID) Yes The agent this memory belongs to.
content string Yes The memory text. Plain text or Markdown.
memory_type enum Yes One of: semantic, episodic, procedural, preference, summary. See §3.1.2.
category string No Runtime-defined category (e.g. ZeroClaw's conversation, daily, core). Free-form, for adapter round-trip fidelity.
source object Yes Provenance. See §3.1.3.
temporal object Yes Temporal metadata. See §3.1.4.
status enum Yes One of: active, superseded, archived, deleted.
supersedes string (UUID) No ID of the memory this record replaces (for consolidation audit trail).
confidence float (0.0–1.0) No Confidence score if the source runtime provides one.
entities array of object No Extracted entities with type and name. See §3.1.5.
tags array of string No User or agent-defined tags for filtering.
namespace string Yes Scoping namespace. Default: "default". Enables separation of user-context memories, project-scoped memories, etc.
embeddings array of object No Pre-computed embedding vectors, tagged by model. See §3.1.6.
related_records array of object No Links to other memory records with typed relationships. See §3.1.12.
raw_source_format object No Opaque blob preserving the original runtime-specific representation for lossless round-trip.

3.1.2 Memory Types

The type taxonomy is drawn from the LangChain/cognitive-science model, extended with preference and summary from AgentCore:

3.1.3 Source Provenance

Every memory must record where it came from:

{
  "source": {
    "runtime": "openclaw",
    "runtime_version": "1.2.0",
    "origin": "daily_log",
    "origin_file": "memory/2026-02-10.md",
    "extraction_method": "agent_written",
    "session_id": "abc123",
    "interaction_id": "msg_456",
    "identity_version": 3
  }
}

3.1.4 Temporal Metadata

Temporal context is critical for memory consolidation and conflict resolution:

{
  "temporal": {
    "created_at": "2026-02-10T14:30:00Z",
    "updated_at": "2026-02-15T09:00:00Z",
    "observed_at": "2026-02-10T14:25:00Z",
    "valid_from": "2026-02-10T00:00:00Z",
    "valid_until": null,
    "last_accessed_at": "2026-02-20T11:00:00Z",
    "access_count": 7
  }
}

3.1.5 Entity References

For memory records that reference named entities:

{
  "entities": [
    { "name": "Acme Corp", "type": "organization", "role": "employer" },
    { "name": "Jane", "type": "person", "role": "colleague" }
  ]
}

Entity types: person, organization, project, location, tool, service, other.

3.1.6 Embeddings (Store-and-Tag Approach)

Design decision: The format stores embeddings as-is from the source runtime and tags them explicitly by model, rather than standardizing on a single "neutral" embedding model. This avoids coupling the format to a specific vendor, eliminates expensive bulk re-embedding on initial migration, and ages well as embedding models improve.

Rationale:

Schema: Each memory record MAY contain zero or more embeddings, keyed by model:

{
  "embeddings": [
    {
      "model": "openai/text-embedding-3-small",
      "dimensions": 1536,
      "vector": [0.012, -0.034, ...],
      "computed_at": "2026-02-10T14:30:00Z",
      "source": "runtime"
    }
  ]
}
Field Type Required Description
model string Yes Embedding model identifier in provider/model format.
dimensions integer Yes Vector dimensionality.
vector array of float Yes The embedding vector.
computed_at ISO 8601 timestamp Yes When this embedding was computed.
source enum Yes Who computed it: runtime, sync_service, import_adapter.

Behavioral rules:

3.1.7 Daily Logs (OpenClaw Compatibility)

OpenClaw's daily log pattern (memory/YYYY-MM-DD.md) is a special case. The neutral format represents these as:

3.1.8 Memory Consolidation Metadata

To support AgentCore-style consolidation:

3.1.9 Workspace Artifact References

Agent workspaces contain files beyond the agent's core memory and identity — generated code, data files, images, documents, and other artifacts the agent produced or referenced during tasks. These need to be classified and handled differently from managed agent state.

Three-tier model:

The format distinguishes three tiers of workspace files, based on their relationship to the agent's operational state — not on whether they are binary or text:

Tier 1 — Managed agent state. Runtime files the agent reads and writes as part of its core operation. Examples: OpenClaw's SOUL.md, MEMORY.md, shares_docs.md; ZeroClaw's config.toml, memories.db. These are the agent's mind — small, actively curated, and essential to runtime behavior. They go in raw/{runtime}/ and are represented in the structured layers (memory records, identity, principals). Already handled by the existing spec — no changes needed.

Tier 2 — Portable artifacts. Small workspace files worth carrying with the agent across migrations. Examples: a short script the agent wrote, a small data file it references frequently, a brief CSV it maintains. These are under a size threshold (default: 100 KB per file) and provide useful context if the agent migrates to a new runtime. Too large to inline in a memory record's content field, but small enough that excluding them loses value. These are stored in the artifacts/ directory within the archive and referenced from the artifact index.

Tier 3 — Referenced artifacts. Large workspace files that would bloat the archive. Examples: a 10,000-line CSV, a 50-file Python codebase, a collection of images, large PDFs. Cataloged by reference only (filename, hash, size, source path). The agent's knowledge about them is captured in memory records; the files themselves stay behind. A future version of the sync service can store these online (via the remote_ref field) without a schema change.

Why not binary vs. text? The original design used "binary file" as a proxy for "large, not agent state." But a 10,000-line CSV is text and would bloat the archive just as badly as a PNG. Conversely, a small icon used in a generated UI is binary but trivially portable. The meaningful distinction is managed state vs. artifact, and small vs. large — not binary vs. text.

Schema: The artifact index is a top-level file in the ALF archive (attachments.json):

{
  "artifact_size_threshold": 102400,
  "attachments": [
    {
      "id": "uuid",
      "filename": "shares_tracker.csv",
      "media_type": "text/csv",
      "size_bytes": 4200,
      "hash": {
        "algorithm": "sha256",
        "value": "a1b2c3d4..."
      },
      "source_path": "workspace/shares_tracker.csv",
      "archive_path": "artifacts/shares_tracker.csv",
      "remote_ref": null,
      "referenced_by": ["memory-record-uuid-1"]
    },
    {
      "id": "uuid",
      "filename": "architecture-diagram.png",
      "media_type": "image/png",
      "size_bytes": 284190,
      "hash": {
        "algorithm": "sha256",
        "value": "e3b0c442..."
      },
      "source_path": "workspace/images/architecture-diagram.png",
      "archive_path": null,
      "remote_ref": null,
      "referenced_by": ["memory-record-uuid-2", "memory-record-uuid-3"]
    }
  ]
}
Field Type Required Description
artifact_size_threshold integer No The size threshold (in bytes) used to classify Tier 2 vs. Tier 3. Default: 102400 (100 KB). Present at the top level for documentation and reproducibility.
id string (UUID) Yes Unique identifier for this artifact reference.
filename string Yes Original filename.
media_type string Yes MIME type (e.g., image/png, text/csv, text/x-python).
size_bytes integer Yes File size at time of export.
hash object Yes Content hash. algorithm (default sha256) + value.
source_path string Yes Path relative to the runtime workspace where the file was found.
archive_path string or null Yes Path within the ALF archive if the file is included (Tier 2). null if reference-only (Tier 3).
remote_ref string or null Yes URL for future online storage. null in initial releases.
referenced_by array of string (UUIDs) No Memory record IDs that reference this artifact.

Behavioral rules:

3.1.10 Memory-to-Identity Lineage

Agents evolve. An agent might act differently on Day 100 than on Day 1 because its system prompt or identity configuration was updated by the user. Episodic and procedural memories are particularly sensitive to this — if a memory says "I refused to answer this question" but the current identity says "answer all questions," it looks like a contradiction without context.

The identity version timeline and memory timestamps are both present in the format, so lineage could be reconstructed by correlating timestamps. But implicit correlation is fragile: it requires maintaining a separate index of "identity version N was active from T1 to T2" and doing range lookups. If an identity was updated twice in one session, time resolution might not disambiguate.

Solution: The source object on each memory record includes an optional identity_version integer field (see §3.1.3). On export, the adapter stamps the current identity version onto each new memory record. This makes lineage explicit and queryable — "show me all memories created under identity v3" is a simple filter, and "this refusal made sense because the identity at that time said X" is a direct lookup into the identity version history.

Design notes:

3.1.11 Hard Deletion and Data Purge

Problem: Section 3.1.8 defines soft deletes (tombstones) as the default deletion mechanism, and §7.2.2 states that sealed partitions are immutable. However, soft deletes are legally insufficient when a user exercises GDPR Article 17 ("right to erasure"), CCPA deletion rights, or simply needs to purge accidentally memorized sensitive data (passwords, government IDs, medical information). The format and sync protocol must support hard deletion that physically removes content.

Two deletion scopes:

1. Full agent deletion — the most common compliance request. A user asks for their entire account or a specific agent to be permanently erased. This is straightforward: the sync service destroys all objects associated with the agent (all partitions, identity, principals, credentials, deltas, the metadata index entry, and any compacted snapshots). No partition rewriting is needed — everything is deleted wholesale. The service retains only a minimal audit record (see below). Self-hosted deployments delete the .alf file directly.

2. Surgical record purge — less common but necessary. A user identifies specific memory records containing sensitive data that must be physically removed while the rest of the agent's state is preserved. This requires rewriting the affected partition.

Surgical purge operation:

The sync service (or a local tool for self-hosted deployments) performs the following:

  1. Identify the target records by ID and locate which partition(s) contain them.
  2. Read the affected partition, filter out the target records, and write a new partition with the same time range and sealed: true. The old partition is not modified — it is replaced.
  3. Update the manifest to reference the new partition (new file path, updated record count, new checksum).
  4. Destroy the old partition blob from object storage.
  5. Purge any delta blobs that contain the target records.
  6. Remove the target records from any search indexes or metadata caches.
  7. Emit an audit record (see below).

This preserves the mental model that any given partition is immutable — you never mutate one, you replace it. Caches keyed on partition hash naturally invalidate because the replacement has a different hash.

Audit trail: Both deletion scopes produce an audit record that proves the deletion occurred without revealing the purged content:

{
  "purge_id": "uuid",
  "agent_id": "uuid",
  "scope": "full_agent | record_purge",
  "record_ids": ["uuid-1", "uuid-2"],
  "partitions_affected": ["memory/partitions/2025-Q3.jsonl"],
  "reason": "gdpr_article_17 | ccpa_deletion | user_request | security_incident",
  "requested_at": "2026-03-01T10:00:00Z",
  "completed_at": "2026-03-01T10:00:05Z",
  "requested_by": "principal-uuid"
}

For full agent deletions, record_ids is omitted (everything was deleted) and partitions_affected lists all partitions that existed. The audit record itself MUST NOT contain any of the purged content. Audit records are retained according to the service's compliance retention policy.

Relationship to soft deletes: Soft deletes (§3.1.8) remain the default for normal agent operation — they're cheap, reversible, and support sync conflict resolution. Hard purge is a separate, exceptional operation triggered by explicit user request or compliance workflow. The two mechanisms coexist: a soft-deleted record still physically exists in its partition (and could be restored); a hard-purged record is physically gone.

Format-level requirement: The .alf container format itself does not need a special structure for hard deletes — they operate at the service and storage layer. However, adapters and tools that produce .alf files locally MUST support a purge command that rewrites the archive excluding specified record IDs, for self-hosted deployments that don't use the sync service.

Advanced memory systems model memories as graphs, not just lists. A failed task (episodic) might trigger a new rule (procedural). Two memories might contradict each other without one superseding the other. A summary might elaborate on a collection of episodic memories. The format should preserve these relationships for runtimes that use them.

Schema: Each memory record MAY contain an array of typed links to other records:

{
  "related_records": [
    {
      "id": "target-memory-uuid",
      "relation": "caused_by"
    },
    {
      "id": "another-memory-uuid",
      "relation": "contradicts"
    }
  ]
}
Field Type Required Description
id string (UUID) Yes ID of the related memory record.
relation string Yes The type of relationship. Free-form — runtimes may define their own relation types. Well-known values are documented below.

Well-known relation types:

Relation Meaning Example
caused_by This memory was created as a direct result of the linked memory. A procedural rule created because a task failed (the episodic failure record).
caused Inverse of caused_by. The linked memory was created as a result of this one. The episodic failure record linking forward to the rule it prompted.
contradicts This memory conflicts with the linked memory. Neither supersedes the other — the contradiction is unresolved or intentional. Two user preferences that conflict ("User likes spicy food" vs. "User asked to avoid spicy food last week").
elaborates_on This memory provides additional detail or context for the linked memory. A detailed episodic memory linked to a summary that covers the same event.
derived_from This memory was synthesized or extracted from the linked memory. A summary derived from a collection of episodic memories.
related_to General association without a specific directional meaning. Two memories about the same project that are contextually related.

Relationship to supersedes: The supersedes field (§3.1.1) remains a separate top-level field because it has specific lifecycle semantics — it changes the old record's status to superseded and is used by compaction. related_records is for informational and navigational links that don't affect record lifecycle. A memory can both supersede another record (via supersedes) and have relational links to other records (via related_records).

Behavioral rules:

3.2 Layer 2 — Identity and Persona

This layer captures who the agent is and how it behaves.

3.2.1 Dual Representation Requirement

The fundamental tension is between prose-based identity and structured identity. The format MUST support both:

  1. Structured fields — typed, machine-readable, portable. Based on AIEOS v1.1 as the starting point.
  2. Prose blocks — rich, contextual, interpreted by the LLM at runtime. Based on OpenClaw's SOUL.md / IDENTITY.md pattern.

An adapter importing into a prose-based runtime (OpenClaw) uses the prose blocks. An adapter importing into a structured runtime (ZeroClaw with AIEOS) uses the structured fields. If one representation is missing, the adapter MAY synthesize it (e.g., LLM-generate a SOUL.md from structured fields), but this is adapter behavior, not a format requirement.

3.2.2 Identity Schema

{
  "identity": {
    "id": "uuid",
    "agent_id": "uuid",
    "version": 3,
    "updated_at": "2026-02-15T09:00:00Z",

    "structured": {
      "names": {
        "primary": "Nova",
        "nickname": "N",
        "full": "Nova Assistant"
      },
      "role": "Personal AI assistant",
      "goals": ["Help user be productive", "Learn user preferences over time"],

      "psychology": {
        "neural_matrix": {
          "creativity": 0.9,
          "logic": 0.8,
          "empathy": 0.85,
          "assertiveness": 0.6,
          "curiosity": 0.95
        },
        "personality_traits": {
          "framework": "OCEAN",
          "scores": {
            "openness": 0.9,
            "conscientiousness": 0.75,
            "extraversion": 0.6,
            "agreeableness": 0.8,
            "neuroticism": 0.2
          }
        },
        "moral_alignment": "Neutral Good",
        "mbti": "ENTP"
      },

      "linguistics": {
        "formality_level": 0.4,
        "verbosity": 0.5,
        "humor_level": 0.6,
        "slang_usage": false,
        "preferred_language": "en",
        "idiolect": {
          "catchphrases": [],
          "verbal_tics": [],
          "avoided_words": ["honestly", "genuinely"]
        }
      },

      "capabilities": [
        {
          "name": "web_search",
          "description": "Search the web for current information",
          "priority": 2,
          "portability": "intrinsic"
        },
        {
          "name": "creative_writing",
          "description": "Write fiction, poetry, marketing copy",
          "priority": 1,
          "portability": "intrinsic"
        },
        {
          "name": "docker_management",
          "description": "Build and manage Docker containers",
          "priority": 3,
          "portability": "host_dependent",
          "host_requirements": "Docker Engine installed and accessible via CLI"
        },
        {
          "name": "github_repo_reader",
          "description": "Read and analyze GitHub repositories",
          "priority": 2,
          "portability": "intrinsic",
          "credential_ids": ["credential-uuid-for-github-token"]
        }
      ],

      "aieos_extensions": {}
    },

    "prose": {
      "soul": "Full content of SOUL.md or equivalent...",
      "operating_instructions": "Full content of AGENTS.md or equivalent...",
      "identity_profile": "Full content of IDENTITY.md or equivalent...",
      "custom_blocks": {
        "boot_checklist": "Content of BOOT.md...",
        "heartbeat_checklist": "Content of HEARTBEAT.md...",
        "tools_guidance": "Content of TOOLS.md..."
      }
    },

    "source_format": "openclaw",
    "raw_source": {}
  }
}

3.2.3 Identity Versioning

Identity MUST be versioned. Every change increments the version number and records a timestamp. The sync service retains prior versions for rollback. This enables the "Framework Evaluation" use case — snapshot identity before a trial migration, restore on return.

3.2.4 Sub-Agent Roster

A managing agent that orchestrates sub-agents needs to persist a structured roster of those sub-agents — what they can do, when to use them, and what they ran on. This is distinct from free-text memories about sub-agents because it must be structured enough for programmatic routing decisions, not just LLM context.

The sub-agent roster is stored in the identity layer because it describes the agent's operational capabilities and relationships, not its experiential memory.

Schema: The roster is an array within identity.structured:

{
  "sub_agents": [
    {
      "agent_id": "uuid-or-null",
      "name": "Research Assistant",
      "description": "Deep research agent with multi-source synthesis",
      "capabilities": [
        {
          "name": "web_research",
          "description": "Deep research with source synthesis",
          "priority": 1
        },
        {
          "name": "summarization",
          "description": "Condense long documents",
          "priority": 3
        }
      ],
      "model_hints": {
        "primary_model": "anthropic/claude-sonnet-4-5",
        "last_model": "anthropic/claude-sonnet-4-5"
      },
      "routing_hints": "Use for any task requiring multi-source research or fact verification. Not suitable for creative writing.",
      "status": "active",
      "last_invoked_at": "2026-02-20T11:00:00Z",
      "performance_notes": "Excellent at technical topics. Occasionally over-cites sources."
    }
  ]
}
Field Type Required Description
agent_id string (UUID) or null Yes References the sub-agent's own .alf if it has one. null for ephemeral sub-agents that the managing agent creates and destroys per task.
name string Yes Display name of the sub-agent.
description string No Brief summary of what this sub-agent does.
capabilities array of object Yes What the sub-agent can do, with name, description, and priority (1 = highest). Used by the managing agent for task routing.
model_hints object No primary_model (model the sub-agent spent most of its life on) and last_model (most recently used). See §3.2.5.
routing_hints string No Prose guidance for when and how to use this sub-agent. Interpreted by the managing agent's LLM.
status enum Yes One of: active, inactive, unavailable.
last_invoked_at ISO 8601 timestamp No When the managing agent last delegated a task to this sub-agent.
performance_notes string No The managing agent's observations about the sub-agent's strengths and weaknesses.

Behavioral rules:

3.2.5 Model Hints

Model information is recorded as hints — factual metadata about what the agent ran on, not prescriptive requirements. The format does not dictate model choices (consistent with design philosophy #1), but preserving this information helps users and adapters set appropriate expectations.

Main agent model hints are stored in the manifest (§4.2) as runtime_hints, since the model is a runtime configuration choice rather than part of the agent's identity:

{
  "runtime_hints": {
    "primary_model": "anthropic/claude-opus-4-5",
    "last_model": "anthropic/claude-opus-4-5",
    "provider": "anthropic",
    "context_window": 200000,
    "notes": null
  }
}
Field Type Required Description
primary_model string Yes The model the agent spent most of its life on, in provider/model format. Represents the capability baseline the persona was shaped by.
last_model string Yes The most recently used model. May differ from primary_model if the agent was temporarily switched for cost or availability reasons.
provider string No The LLM provider (e.g., anthropic, openai, openrouter).
context_window integer No Context window size in tokens. Relevant because it affects memory management — an agent tuned for 200K tokens will have different boot file and MEMORY.md sizing expectations than one running in 32K.
notes string or null No Free-form notes about the runtime configuration.

Sub-agent model hints use the same primary_model / last_model pair (see §3.2.4) without the provider-level fields, since sub-agents may use different providers than the managing agent.

3.2.6 AIEOS Compatibility (Compatible Superset with Opinionated Field Promotion)

AIEOS v1.1 is the only existing open standard for AI agent identity. ZeroClaw supports it natively. Rather than building a parallel schema, the ALF identity layer is a compatible superset of AIEOS — every AIEOS field can be stored and round-tripped — but we selectively promote only the fields that map to real agent behavior into first-class structured fields. Everything else is preserved in the aieos_extensions passthrough object.

Promoted to first-class fields (these appear in identity.structured with their own schema):

AIEOS Field ALF Field Rationale
identity.names structured.names Core identity — every agent has a name.
psychology.neural_matrix structured.psychology.neural_matrix AIEOS-specific but actively used by ZeroClaw for behavior tuning. Compact, useful.
psychology.traits.ocean structured.psychology.personality_traits (with framework: "OCEAN") OCEAN/Big Five is well-established personality science with validated psychometric properties.
linguistics.text_style structured.linguistics Directly controls agent output behavior (formality, verbosity).
linguistics.idiolect structured.linguistics.idiolect Catchphrases, verbal tics, avoided words — concrete behavioral specification.
motivations.core_drive, motivations.goals structured.goals Functional — drives agent planning and prioritization.
capabilities.skills, capabilities.tools structured.capabilities Functional — used for task routing and tool discovery. Annotated with portability (see below).

Capability portability (intrinsic vs. host-dependent):

Capabilities are stored in the identity layer because the agent needs to know what it can do for task routing and planning. However, not all capabilities are equally portable across environments. A creative_writing capability is intrinsic to the LLM — it works everywhere. A docker_management capability depends on the host having Docker installed — it disappears when the agent migrates to a host without it.

Rather than splitting capabilities into a separate "environment" layer (which would need to be re-populated on every deployment and conflates portable agent state with ephemeral host configuration), each capability carries a portability annotation:

Field Type Required Description
portability string No intrinsic (LLM-native, always available regardless of host) or host_dependent (requires specific host environment). Default: intrinsic.
host_requirements string No For host_dependent capabilities, a brief description of what the host must provide (e.g., "Docker Engine accessible via CLI", "macOS with AppleScript support"). Informational — helps the importing adapter and user understand what's needed.
credential_ids array of string (UUIDs) No IDs of credentials (from Layer 4, §3.4.2) required by this capability. The reverse linkage — credential to capability — is on the credential schema via capabilities_granted. Both directions are optional and informational.

On migration, the identity doesn't change — the agent still knows about Docker. The importing adapter checks host-dependent capabilities against the target environment and reports which ones are unavailable. The runtime can then mark them as inactive so the agent doesn't attempt to use them, but the knowledge is preserved for future environments that do support them. Similarly, credential_ids tells the adapter which credentials to wire up for each capability — if a required credential is missing or expired, the adapter can report the gap.

Passed through via aieos_extensions (stored for lossless round-trip, not given first-class fields):

AIEOS Field Reason for Passthrough
psychology.traits.mbti MBTI is widely regarded as lacking psychometric validity by personality researchers. OCEAN covers the same ground with better scientific basis. Promoting MBTI to a named field implicitly endorses it as a meaningful personality measure.
psychology.moral_compass.alignment (D&D-style, e.g., "Chaotic Good") Game terminology, not a serious ethical framework. For real agents, the prose layer (SOUL.md) captures ethical boundaries and values far more richly.
identity.physicality (somatotype, facial features, aesthetic archetypes) Designed for virtual avatars and character roleplay. Most practical agents have no physical presence. Not relevant to the backup/migration use case.
identity.bio (gender, age_biological) Fictional character metadata. Harmless as passthrough data, but not warranting first-class schema structure.
identity.origin (nationality, birthplace) Same — character backstory, not functional agent state.

How this works in practice:

3.3 Layer 3 — Principals and User Context

3.3.1 Principals Model

An agent may serve multiple principals — entities it takes direction from and maintains relationship context for. A principal can be a human user or another agent (in the case of sub-agents serving a managing agent). This generalizes the original single-user model to support agent-to-agent relationships.

Each principal has its own profile and preference history. The agent's accumulated experience with each principal (communication style learned, preferences observed, interaction history) is preserved independently.

{
  "principals": [
    {
      "id": "uuid",
      "principal_type": "human",
      "agent_id": null,
      "profile": { "...see §3.3.3..." }
    },
    {
      "id": "uuid",
      "principal_type": "agent",
      "agent_id": "uuid-of-managing-agent",
      "profile": { "...see §3.3.3..." }
    }
  ]
}
Field Type Required Description
id string (UUID) Yes Unique identifier for this principal relationship.
principal_type enum Yes human or agent.
agent_id string (UUID) or null No For agent-type principals, the managing agent's ID (resolvable to another .alf if it exists). null for human principals.
profile object Yes The principal's profile. Same schema as §3.3.3.

3.3.2 Separation from Memory

Principal context is stored as a distinct, queryable namespace within the memory layer (namespace: "principal_context:{principal_id}"), plus the dedicated profile document per principal. This dual approach accommodates:

For backwards compatibility and simplicity, a single-user agent (the common case) has exactly one principal of type human. Adapters for frameworks that have no multi-principal concept import/export only the primary human principal.

3.3.3 Principal Profile Schema

{
  "id": "uuid",
  "agent_id": "uuid",
  "principal_id": "uuid",
  "version": 5,
  "updated_at": "2026-02-20T11:00:00Z",

  "structured": {
    "name": "Alex",
    "principal_type": "human",
    "timezone": "America/Los_Angeles",
    "locale": "en-US",
    "communication_preferences": {
      "tone": "casual",
      "response_length": "concise",
      "formatting": "minimal markdown"
    },
    "work_context": {
      "role": "Software engineer",
      "company": "Acme Corp",
      "projects": ["Project Alpha", "Backend migration"]
    },
    "relationships": [],
    "custom_fields": {}
  },

  "prose": {
    "user_profile": "Full content of USER.md or equivalent..."
  },

  "source_format": "openclaw",
  "raw_source": {}
}

For agent-type principals, the work_context and communication_preferences fields capture how the managing agent prefers to interact — what formats it expects results in, how it delegates tasks, its escalation preferences.

3.3.4 Versioning

Same versioning rules as identity (§3.2.3). Each principal's profile is versioned independently.

3.4 Layer 4 — Accounts and Credentials

This layer is security-critical and uses a zero-knowledge architecture.

3.4.1 Encryption Requirements

3.4.2 Credential Record Schema

{
  "credential": {
    "id": "uuid",
    "agent_id": "uuid",
    "service": "openai",
    "credential_type": "api_key",
    "label": "OpenAI Production Key",
    "capabilities_granted": ["web_search", "code_generation"],
    "encrypted_payload": "base64-encoded-ciphertext",
    "encryption": {
      "algorithm": "xchacha20-poly1305",
      "kdf": "argon2id",
      "kdf_params": {
        "memory_cost": 65536,
        "time_cost": 3,
        "parallelism": 4
      },
      "nonce": "base64-encoded-nonce"
    },
    "created_at": "2026-01-15T00:00:00Z",
    "updated_at": "2026-02-10T00:00:00Z",
    "last_rotated_at": "2026-02-10T00:00:00Z",
    "expires_at": null,
    "tags": ["production", "llm-provider"]
  }
}

The capabilities_granted field (optional) lists the capability names (from identity.structured.capabilities) that this credential enables. This tells the importing adapter which tools to wire this credential to. Many credentials are general-purpose (an OpenAI API key powers multiple capabilities), and some don't map to specific capabilities at all — the field is informational and omitted when the linkage is unclear.

The reverse linkage — capability to credential — is on the capability schema via credential_ids (see §3.2.6).

3.4.3 Supported Credential Types

The credential_type field supports: api_key, oauth_token (with refresh token in the encrypted payload), webhook_secret, session_token, ssh_key, certificate, custom.

3.4.4 Credential Lifecycle


4. Container Format

4.1 File Format

A complete agent-life snapshot is a single file with the extension .alf (Agent Life Format). It is a standard ZIP archive containing:

agent-snapshot.alf
├── manifest.json          # Format version, agent metadata, layer inventory
├── identity.json          # Layer 2: Identity and persona
├── principals.json        # Layer 3: Principals (users and agent relationships)
├── credentials.json       # Layer 4: Encrypted credential records
├── memory/
│   ├── index.json         # Partition inventory and memory-level metadata
│   ├── partitions/
│   │   ├── 2025-Q3.jsonl  # Sealed partition: Jul–Sep 2025 (immutable)
│   │   ├── 2025-Q4.jsonl  # Sealed partition: Oct–Dec 2025 (immutable)
│   │   └── 2026-Q1.jsonl  # Current partition (mutable until sealed)
│   └── embeddings/        # Optional: pre-computed embedding vectors
│       └── vectors.bin    # Binary format for compactness
├── attachments.json       # Workspace artifact index (Tier 2 included, Tier 3 reference-only)
├── artifacts/             # Portable workspace artifacts under size threshold (Tier 2)
│   ├── shares_tracker.csv
│   └── deploy_script.sh
└── raw/                   # Lossless originals from source runtime (Tier 1)
│   ├── openclaw/
│   │   ├── SOUL.md
│   │   ├── AGENTS.md
│   │   ├── USER.md
│   │   ├── MEMORY.md
│   │   └── memory/
│   │       └── 2026-02-10.md
│   └── zeroclaw/
│       ├── config.toml    # With credentials stripped/encrypted
│       └── memories.db    # SQLite database copy

4.1.1 Time-Based Memory Partitioning

Memory records are partitioned by time period. The default partition granularity is quarterly (e.g., 2025-Q3.jsonl), but adapters MAY use monthly granularity for high-activity agents. Each partition is a JSONL file containing all memory records whose temporal.created_at falls within that period.

Sealed vs. unsealed partitions:

Why this matters: For a years-old agent, a full export is still one ZIP, but most of it is sealed partitions that haven't changed since they were first written. A client that already has last quarter's sealed partitions only needs to download the current partition and any newly sealed ones. This directly maps to efficient cloud storage (§7.2) — sealed partitions are written once to object storage and never touched again.

4.2 Manifest Schema

{
  "alf_version": "1.0.0",
  "created_at": "2026-02-24T12:00:00Z",
  "agent": {
    "id": "uuid",
    "name": "Nova",
    "source_runtime": "openclaw",
    "source_runtime_version": "1.2.0"
  },
  "runtime_hints": {
    "primary_model": "anthropic/claude-opus-4-5",
    "last_model": "anthropic/claude-opus-4-5",
    "provider": "anthropic",
    "context_window": 200000,
    "notes": null
  },
  "sync": {
    "last_sequence": 1247,
    "last_sync_at": "2026-02-24T12:00:00Z"
  },
  "layers": {
    "identity": { "version": 3, "file": "identity.json" },
    "principals": { "count": 1, "file": "principals.json" },
    "credentials": { "count": 4, "file": "credentials.json" },
    "memory": {
      "record_count": 1247,
      "index_file": "memory/index.json",
      "has_embeddings": true,
      "has_raw_source": true,
      "partitions": [
        { "file": "memory/partitions/2025-Q3.jsonl", "from": "2025-07-01", "to": "2025-09-30", "record_count": 342, "sealed": true },
        { "file": "memory/partitions/2025-Q4.jsonl", "from": "2025-10-01", "to": "2025-12-31", "record_count": 518, "sealed": true },
        { "file": "memory/partitions/2026-Q1.jsonl", "from": "2026-01-01", "to": null, "record_count": 387, "sealed": false }
      ]
    },
    "attachments": {
      "count": 12,
      "included_count": 3,
      "included_size_bytes": 42800,
      "referenced_count": 9,
      "referenced_size_bytes": 48248234,
      "file": "attachments.json"
    }
  },
  "raw_sources": ["openclaw"],
  "checksum": "sha256:abcdef..."
}

4.3 Incremental Sync Format

For incremental syncs (not full snapshots), the format uses a delta bundle — a smaller ZIP containing only changed records since a specified sync point:

delta-seq-1248.alf-delta
├── manifest.json          # Includes sync cursor (sequence number + timestamp)
├── identity.json          # Only present if identity changed
├── principals.json        # Only present if principals changed
├── memory/
│   └── delta.jsonl        # Only new/modified/deleted memory records
└── credentials.json       # Only present if credentials changed

Each delta record includes an operation field: create, update, delete (soft-delete with tombstone).

4.3.1 Sync Cursors — Sequence Numbers

The sync protocol uses monotonic sequence numbers as the primary cursor for delta operations, not timestamps. The server assigns a sequence number to each delta on receipt.

{
  "sync": {
    "base_sequence": 1247,
    "new_sequence": 1253,
    "base_timestamp": "2026-02-24T12:00:00Z",
    "new_timestamp": "2026-02-24T18:30:00Z"
  }
}

Why sequence numbers over timestamps:

Client sync flow: The client reports "I last synced at sequence 1247. Here are my changes since then." The server assigns them sequence numbers 1248–1253, stores them, and returns the new cursor. To catch up, the client requests "give me everything after sequence 1247" and receives deltas 1248+.

The client lumps all local changes since last sync into a single delta. This is the simplest correct approach and matches how every incremental backup tool works. The delta is already small for typical usage patterns (under 100 KB after a session).


5. Design Decisions and Rationale

5.1 Why JSON (not SQLite, not Protobuf)?

SQLite is used within runtimes (ZeroClaw, AgentCore) but is not suitable as an interchange format because it's opaque, platform-dependent at the byte level, and not streaming-friendly.

5.2 Why ZIP?

5.3 Why UUID v7 for Memory IDs?

UUID v7 is time-ordered, which means records sort naturally by creation time without a separate index. This is valuable for incremental sync (fetch all records after timestamp X) and for conflict resolution (later timestamp wins by default).

5.4 Why Dual Representation for Identity?

The prose vs. structured tension is real and cannot be resolved by choosing one side. OpenClaw's SOUL.md contains nuanced behavioral instructions that lose meaning when decomposed into OCEAN scores. ZeroClaw's AIEOS profiles contain precise numeric parameters that lose precision when rendered as prose. The neutral format preserves both and lets each adapter use whichever it supports.

5.5 Raw Source Preservation

The raw/ directory contains unmodified copies of the source runtime's native files. This guarantees lossless round-trip: if you export from OpenClaw and later import back to OpenClaw, the adapter can use the raw files directly instead of synthesizing from the neutral representation. This is a safety net against information loss in the structured translation.


6. Adapter Interface Requirements

Each framework adapter implements the following operations:

6.1 Export (Runtime → ALF)

export(workspace_path, options) → alf_snapshot

The adapter reads the runtime's native state and produces a complete ALF snapshot. Requirements:

6.2 Import (ALF → Runtime)

import(alf_snapshot, workspace_path, options) → result

The adapter reads an ALF snapshot and writes it into the target runtime's native format. Requirements:

6.3 Delta Export (for Incremental Sync)

export_delta(workspace_path, since_sequence, options) → alf_delta

Produces a delta bundle containing only changes since the last sync point, identified by sequence number (§4.3.1). The adapter tracks the last-synced sequence locally and includes all changes since then in a single delta.

6.4 Purge (for Hard Deletion)

purge(alf_path, record_ids, options) → purged_alf

Rewrites a local .alf file excluding the specified record IDs. Required for self-hosted deployments that need GDPR/CCPA compliance without the sync service. See §3.1.11.


7. Sync Protocol Requirements

7.1 Sync Cadence Options

Users choose their cadence: after_interaction, after_session, scheduled (cron), manual. The format and protocol must support all modes.

7.2 Storage and Transport Considerations

The format and protocol must support an efficient cloud storage model. Detailed service architecture is deferred to a separate design document, but the following high-level constraints shape format-level decisions and MUST NOT be precluded by the schema or protocol.

7.2.1 Event-Sourcing Storage Model

The recommended cloud storage pattern treats deltas as events and snapshots as projections:

Read path: A client requesting a full export receives the latest compacted snapshot plus any deltas after it. A client requesting "what changed since sequence X" gets the relevant deltas directly from the event log. Both operations are index lookups + blob fetches — no server-side parsing or merging required.

Write path: A client pushes a delta. The server assigns a sequence number, stores the delta as an immutable blob, and updates the metadata index. No rewriting of existing data.

7.2.2 Partition-Aligned Object Storage

Time-based memory partitions (§4.1.1) align directly with the cloud storage model:

This means a 3-year-old agent with 50 MB of total memory doesn't cost more to sync incrementally than a 1-month-old agent — the sealed partitions are already in storage and never re-transferred.

7.2.3 Sequence-Based Delta Retrieval

The sync service indexes deltas by agent ID and sequence number. The primary query patterns are:

Detailed API design (authentication, pagination, rate limiting, webhook notifications for sync events) is deferred to the sync protocol design document.

7.2.4 Cost Model Implications

The format and protocol are designed so that the primary cost drivers for the sync service are:

Cost Driver Scales With Mitigation
Object storage (bulk data) Total agent size Sealed partitions written once, never rewritten. Large artifacts excluded by default (§3.1.9).
Metadata index (database) Number of deltas per agent Compaction collapses deltas into snapshots. Index rows are small (ID, sequence, blob pointer).
Bandwidth (transfer) Sync frequency × delta size Deltas are small (< 100 KB typical). Sealed partitions are cached and never re-transferred.
Compute (compaction) Sync frequency × agent count Background batch job. JSONL append — no parsing or merging required.

The service should be viable at low cost per agent, enabling a free tier for personal use and paid tiers for high-frequency sync or multi-device scenarios.

7.3 Conflict Resolution (Initial Release)

Initial releases support single-writer only (one runtime syncing to one agent-life store). Conflict resolution is last-write-wins by sequence number.

7.4 Conflict Resolution (Future: Multi-Writer)

The schema includes fields to support future multi-writer scenarios:


8. Non-Functional Requirements

8.1 Size and Performance

8.2 Compatibility and Schema Evolution

Unknown enum handling: When a reader encounters an unrecognized value for an enum field, it MUST NOT reject the record. Instead, it applies a safe default interpretation:

Enum Field Safe Default for Unknown Values
memory_type Treat as semantic (general fact). This is the safest interpretation — the memory is preserved and queryable, just not categorized into a specific cognitive type.
principal_type Treat as human (the common case).
credential_type Treat as custom (opaque, no special handling).
status Treat as active (err on the side of visibility).
source (embedding) Treat as runtime.

This rule ensures that a snapshot produced by a newer version of the format can still be consumed by an older reader with graceful degradation rather than failure. The unrecognized value MUST be preserved on round-trip — the reader stores what it received, even if it didn't understand it, so that a future export doesn't lose the original type information.

8.3 Security

8.4 Licensing

8.5 Data Privacy and Encryption

Agents have access to a user's most intimate data — financial records, health information, private messages, personal preferences. The encryption model must reflect this sensitivity.

8.5.1 Threat Model

The encryption architecture addresses three distinct threat categories:

Threat Example Mitigation
Wire interception Man-in-the-middle on client-to-service traffic Transport encryption (§8.5.2)
Storage-layer breach Stolen disk, misconfigured bucket, compromised backup, leaked database replica At-rest encryption with per-tenant keys (§8.5.3)
Compromised service process Attacker gains access to a running service instance that holds decrypted data in memory Access controls, audit logging, data residency minimization (§8.5.4)

What at-rest encryption does and does not do: At-rest encryption protects against infrastructure-level threats — an attacker who gains access to the storage medium but not to the running service. It does not protect against a compromised service process, which necessarily holds decrypted data in memory while indexing, searching, and compacting. The spec names this boundary explicitly rather than implying encryption alone is sufficient.

8.5.2 Transport Encryption

8.5.3 At-Rest Encryption Tiers

The service defines three encryption tiers. The tier applies to all agent data within a tenant unless per-agent key configuration is enabled (see §8.5.3.2).

Tier 1 — Per-Tenant Service-Managed Keys (Standard)

Every tenant (user account) receives a unique encryption key, generated and managed by the service via a KMS (AWS KMS, GCP Cloud KMS, HashiCorp Vault, or equivalent). All agent data for that tenant — memory partitions, identity, principals, delta blobs, compacted snapshots — is encrypted with that tenant's key before being written to object storage or the metadata index.

Properties:

This is the launch tier and the right default for most users.

Tier 2 — Customer-Managed Keys (BYOK)

The tenant provides and controls their own encryption key via their own KMS. The service holds a key reference (ARN, resource ID) rather than the key material itself. The customer controls rotation and can revoke the service's access to the key at any time.

Properties:

This tier is a future addition. The architecture MUST NOT preclude it — the encryption layer should be key-source-agnostic from day one.

Tier 3 — Layer 4 Zero-Knowledge (Credentials Only)

Credentials (Layer 4) remain fully client-side encrypted with user-controlled keys, as specified in §3.4 and §8.3. The service stores only ciphertext and never possesses the decryption key. This is unchanged from the existing spec. It applies regardless of which at-rest tier is active for Layers 1–3.

Why not zero-knowledge for all layers? The service must read Layers 1–3 to provide its core functionality: re-indexing memory for search, compacting deltas into snapshots, and serving query results. Zero-knowledge encryption for these layers would reduce the service to dumb blob storage with no ability to index or search — the user would have to download, decrypt, and search locally, eliminating the value of a sync service.

8.5.3.1 Why Not a Service-Wide Key?

A single key encrypting all tenant data is the "we encrypted the S3 bucket" checkbox. It satisfies basic compliance requirements but doesn't meaningfully contain breaches — a single key compromise exposes every user's data. For data as sensitive as agent memory (which may contain health records, financial details, and private messages), this is not acceptable.

8.5.3.2 Key Lookup Architecture

Keys are looked up by a scope identifier — by default, the tenant ID. This means per-agent keys are a deployment-time configuration choice, not a schema change: if the scope identifier is set to agent_id instead of tenant_id, each agent within a tenant gets its own key. This adds value for power users running multiple agents with different sensitivity levels (e.g., a personal assistant with health data vs. a coding assistant with no sensitive data). Per-agent keys are not the default because for most users (one tenant, one agent) they are identical to per-tenant keys, and the additional key management complexity is not justified.

8.5.4 Defense in Depth Beyond Encryption

At-rest encryption is necessary but not sufficient. The sync service MUST also implement:


9. Coverage Matrix — No Information Loss

This matrix demonstrates that the neutral format captures a superset of data from both target runtimes.

Data Item OpenClaw Source ZeroClaw Source ALF Location
Agent name / role IDENTITY.md AIEOS identity.names identity.structured.names + identity.prose.identity_profile
Personality / tone SOUL.md AIEOS psychology, linguistics identity.structured.psychology + identity.prose.soul
Operating rules AGENTS.md Workspace AGENTS.md identity.prose.operating_instructions
User preferences USER.md Workspace USER.md / memory Primary human principal profile + namespace: "principal_context:{id}" memories
Agent-to-agent relationships N/A N/A principals.json with principal_type: "agent" (§3.3.1)
Sub-agent roster (in AGENTS.md or memory) (in memory) identity.structured.sub_agents (§3.2.4)
Model used (not stored) config.toml default_model manifest.runtime_hints (§3.2.5) + sub-agent model_hints
Long-term memory MEMORY.md SQLite core category Memory records with memory_type: "summary"
Daily logs memory/YYYY-MM-DD.md SQLite daily category / Markdown daily files Memory records with category: "daily_log"
Conversation memories (in daily logs) SQLite conversation category Memory records with memory_type: "episodic"
Startup checklist BOOT.md (not standard) identity.prose.custom_blocks.boot_checklist
Heartbeat checklist HEARTBEAT.md Workspace HEARTBEAT.md identity.prose.custom_blocks.heartbeat_checklist
Tools guidance TOOLS.md Workspace TOOLS.md identity.prose.custom_blocks.tools_guidance
Capabilities (in AGENTS.md) AIEOS capabilities identity.structured.capabilities
API keys ~/.openclaw/ env vars config.toml [secrets] credentials.json (encrypted)
OAuth tokens (not standard) .zeroclaw/state/auth_profiles.json credentials.json (encrypted)
Vector search weights N/A (hardcoded 70/30) config.toml vector_weight, keyword_weight manifest.json adapter hints
AIEOS neural matrix N/A AIEOS psychology.neural_matrix identity.structured.psychology.neural_matrix
AIEOS linguistics N/A AIEOS linguistics identity.structured.linguistics
Raw native files All workspace .md files config.toml, memories.db raw/openclaw/, raw/zeroclaw/
Workspace artifacts Workspace non-runtime files Workspace non-runtime files artifacts/ (Tier 2, included) + attachments.json (Tier 3, reference-only) (§3.1.9)

10. Testing Strategy

10.1 Schema Validation Tests

10.2 Round-Trip Tests

For each supported runtime:

  1. Set up a runtime with a rich agent state (identity, 100+ memories, 5+ credentials, user context).
  2. Export to ALF.
  3. Import back to the same runtime type.
  4. Assert byte-for-byte equality of raw source files.
  5. Assert semantic equality of structured fields.
  6. Assert source.identity_version is preserved on all memory records that had it.
  7. Assert related_records arrays are preserved on all memory records that had them, with relation types and target IDs intact.

10.2.1 Memory-to-Identity Lineage Tests

  1. Create an agent with identity version 1. Generate several memories. Export. Verify all memory records have source.identity_version: 1.
  2. Update the identity (producing version 2). Generate more memories. Export. Verify new memories have source.identity_version: 2 while old memories retain version 1.
  3. Filter memories by source.identity_version. Verify the filter produces correct, disjoint sets.
  4. Import a snapshot containing memories with mixed identity versions. Verify all version values survive the round-trip.
  5. Import a snapshot containing legacy memories without source.identity_version (migrated data). Verify these import without error and the field remains absent (not backfilled with a guess).
  1. Create memories with related_records links using well-known relation types (caused_by, contradicts, elaborates_on). Export and re-import. Verify all links survive the round-trip with correct target IDs and relation types.
  2. Create memories with custom (runtime-defined) relation types. Verify these are preserved through round-trip without modification.
  3. Create a bidirectional relationship (record A contradicts record B, record B contradicts record A). Verify both directions are preserved independently.
  4. Surgically purge a memory that is the target of a relational link from another record. Verify the linking record imports without error (dangling reference handled gracefully).
  5. Import a snapshot with relational links into a runtime that does not support graph relationships. Verify the import succeeds and the links are preserved in the exported ALF for future use (even if the runtime ignores them).

10.3 Cross-Runtime Migration Tests

  1. Export from Runtime A (e.g., OpenClaw).
  2. Import to Runtime B (e.g., ZeroClaw).
  3. Assert no data loss: every memory is present, identity is recognizable, credentials are functional.
  4. Export from Runtime B.
  5. Import back to Runtime A.
  6. Assert equivalence with the original (allowing for representation differences in prose↔structured translation).

10.4 Incremental Sync Tests

  1. Take a full snapshot at sequence S0.
  2. Make changes (add memories, update identity, rotate a credential).
  3. Export delta since S0. Verify the delta manifest contains base_sequence: S0 and a new_sequence > S0.
  4. Apply delta to a copy of the S0 snapshot.
  5. Assert the result equals a full snapshot taken at the new sequence.
  6. Verify sequence numbers are monotonically increasing across consecutive deltas.
  7. Verify a client requesting deltas "since S0" receives exactly the changes made in step 2.

10.5 Credential Security Tests

10.5.1 Credential-to-Capability Linkage Tests

  1. Bidirectional linkage round-trip: Create a credential with capabilities_granted: ["github_repo_reader"] and a capability with credential_ids: ["credential-uuid"]. Export and re-import. Verify both directions of the linkage survive.
  2. Capability lookup from credential: Given a credential with capabilities_granted, verify the listed capability names resolve to entries in identity.structured.capabilities.
  3. Credential lookup from capability: Given a capability with credential_ids, verify the listed UUIDs resolve to entries in credentials.json.
  4. Missing credential on import: Import a snapshot where a capability references a credential_id that doesn't exist in credentials.json (e.g., credential was purged). Verify the import succeeds with a warning, not an error.
  5. General-purpose credential: Create a credential with no capabilities_granted (e.g., a general-purpose OpenAI key). Verify it exports and imports cleanly with the field absent.
  6. Intrinsic capability without credentials: Verify capabilities with portability: "intrinsic" and no credential_ids export cleanly.

10.6 Workspace Artifact Tests

  1. Tier classification: Export from a workspace containing a mix of runtime files (SOUL.md), small artifacts (4 KB CSV, 20 KB script), and large artifacts (500 KB image, 2 MB PDF). Verify: Tier 1 files go to raw/, Tier 2 files go to artifacts/ with archive_path set, Tier 3 files are reference-only with archive_path: null.
  2. Size threshold: Export with the default 100 KB threshold. Verify files under 100 KB are in artifacts/, files over 100 KB are reference-only. Re-export with a custom threshold of 10 KB. Verify the classification changes accordingly and artifact_size_threshold in attachments.json reflects the configured value.
  3. Tier 2 round-trip: Export, then import to a new workspace. Verify all Tier 2 artifacts from artifacts/ are extracted and placed in the target workspace.
  4. Tier 3 reporting: On import, verify the adapter reports Tier 3 (reference-only) artifacts to the user with filenames and sizes.
  5. Hash integrity: Verify all entries in attachments.json have correct SHA-256 hashes matching the source files. For Tier 2, also verify the hash matches the file in artifacts/.
  6. Referenced_by linkage: Verify referenced_by correctly connects artifacts to the memory records that reference them.
  7. Mixed media types: Verify the index correctly catalogs text artifacts (CSV, Python, JSON) alongside binary artifacts (PNG, PDF) with correct MIME types.
  8. Memory import with unresolved artifacts: Verify memory records referencing Tier 3 (absent) artifacts import successfully without errors.
  9. Manifest counts: Verify the manifest's included_count and referenced_count match the actual Tier 2 and Tier 3 counts in attachments.json.

10.7 Principal and Relationship Tests

  1. Export from a runtime with a single human user. Verify principals.json contains one entry with principal_type: "human".
  2. Import a single-principal snapshot into a framework that has no multi-principal concept. Verify the primary human principal's profile maps correctly to the native user context (e.g., OpenClaw's USER.md).
  3. Construct a snapshot with multiple principals (one human, one agent-type). Verify each has an independent profile, independent versioning, and separate memory namespaces.
  4. Verify agent-type principals have a valid agent_id or explicit null.

10.8 Sub-Agent Roster Tests

  1. Export from a managing agent with sub-agents. Verify identity.structured.sub_agents contains entries with capabilities, model hints, and routing hints.
  2. Import a managing agent snapshot to a new host. Verify the roster is preserved and accessible to the runtime for future sub-agent recreation.
  3. Verify sub-agents with agent_id: null (ephemeral) import cleanly alongside sub-agents with valid agent_id references.
  4. Verify model hints (primary_model, last_model) are preserved through round-trip export/import.

10.9 Capability Portability Tests

  1. Portability annotation round-trip: Export an agent with both intrinsic and host_dependent capabilities. Verify portability and host_requirements fields are preserved through export/import.
  2. Host-dependent reporting on import: Import a snapshot containing a host_dependent capability (e.g., docker_management requiring Docker CLI) into an environment without Docker. Verify the adapter reports the unavailable capability to the user.
  3. Capability preservation: After importing to a host that lacks Docker, verify the docker_management capability is still present in identity.structured.capabilities — the agent retains the knowledge even though the capability is currently unavailable.
  4. Default portability: Export an agent with capabilities that have no explicit portability field. Verify the importer treats them as intrinsic (the default).
  5. Sub-agent capability portability: Verify that capabilities in the sub-agent roster also support portability and host_requirements and are preserved through round-trip.
  6. Re-migration restoration: Import an agent (with an unavailable host-dependent capability) to a second host that does have Docker. Verify the capability is available again without any identity changes — the agent didn't "forget" it.

10.10 Scale Tests

10.11 Partitioning Tests

  1. Export an agent with memories spanning multiple quarters. Verify the snapshot contains one JSONL file per quarter under memory/partitions/.
  2. Verify all partitions except the current one have sealed: true in the manifest.
  3. Verify each record's temporal.created_at falls within its partition's from/to range.
  4. Re-export the same agent after adding new memories. Verify sealed partition files are byte-identical to the previous export — only the current partition changed.
  5. Verify that a status change (supersession or deletion) of a record in a sealed partition creates a new record in the current partition (sealed partitions are never rewritten).
  6. Generate a high-activity synthetic agent. Verify the adapter can switch to monthly partitioning and that the manifest correctly reflects monthly from/to ranges.

10.12 Storage and Transport Tests

  1. Simulate the event-sourcing write path: push three consecutive deltas, verify each receives a monotonically increasing sequence number.
  2. Simulate the delta retrieval path: request deltas since a given sequence, verify only deltas after that sequence are returned, in order.
  3. Simulate compaction: given a snapshot at S0 and deltas S1–S5, compact into a new snapshot. Verify the compacted snapshot equals a full export at S5.
  4. Verify sealed partitions can be served from a cache (blob hash matches across repeated retrievals).
  5. Verify full restore path: snapshot + uncompacted deltas produces the same result as a fresh full export.

10.13 Data Privacy and Encryption Tests

  1. Per-tenant key isolation: Create two tenants. Store data for both. Verify tenant A's key cannot decrypt tenant B's data at the storage layer.
  2. Encryption at rest: Write a memory partition to object storage. Read the raw bytes from the storage backend (bypassing the service). Verify the content is ciphertext, not readable JSONL.
  3. Key rotation: Rotate a tenant's key. Trigger a compaction cycle. Verify all data (including sealed partitions) is re-encrypted with the new key and the old key no longer decrypts it.
  4. Per-agent key scope: Configure per-agent key lookup. Create two agents under one tenant. Verify each agent's data is encrypted with a different key.
  5. BYOK key revocation: Configure a customer-managed key. Revoke the service's access to the key. Verify all reads for that tenant fail with an appropriate error (cryptographic delete).
  6. Transport encryption: Attempt a connection without TLS (or with TLS < 1.3). Verify the connection is rejected.
  7. Data residency minimization: After a search or compaction operation completes, verify no decrypted data remains on disk (temp files, swap, logs). Decrypted data should only exist in memory for the duration of the operation.
  8. Tenant isolation at query layer: Issue a metadata index query scoped to tenant A that attempts to reference tenant B's records (e.g., via a crafted agent ID). Verify the query returns no results from tenant B.

10.14 Hard Deletion Tests

  1. Full agent deletion: Create an agent with memories across multiple sealed partitions, identity, principals, and credentials. Issue a full agent deletion. Verify all partition blobs, identity, principals, credentials, deltas, metadata index entries, and compacted snapshots are destroyed. Verify an audit record exists with scope: "full_agent" and does not contain any purged content.
  2. Surgical record purge: Create an agent with a sealed partition containing 100 records. Purge 3 specific records by ID. Verify: (a) a new partition exists with 97 records, (b) the old partition blob is destroyed from object storage, (c) the manifest references the new partition with updated record count and checksum, (d) the purged record IDs do not appear anywhere in the new partition, (e) an audit record exists listing the purged record IDs.
  3. Purge from multiple partitions: Purge records spanning two sealed partitions. Verify both partitions are replaced independently and both old blobs are destroyed.
  4. Purge from current (unsealed) partition: Purge a record from the current partition. Verify the record is removed and the partition is rewritten (same operation, just on the unsealed partition).
  5. Delta cleanup: After a surgical purge, verify any delta blobs containing the purged records are also destroyed or rewritten to exclude them.
  6. Search index cleanup: After a purge, query the search/retrieval index for the purged record's content. Verify no results are returned.
  7. Cache invalidation: After a partition replacement, verify that clients caching the old partition hash receive the new partition on next sync (hash mismatch forces re-download).
  8. Local .alf purge: Use the adapter purge command on a local .alf file. Verify the rewritten archive excludes the target records and the file is a valid ALF archive.
  9. Audit record integrity: Verify the audit record for a purge contains record IDs, partition paths, reason, and timestamps — but does NOT contain any of the purged content or metadata that could reconstruct it.

11. Design Decision Log

This section documents key design decisions with their rationale. Each entry records a design question that was considered during specification development and the resolution adopted. Detailed technical context lives in the referenced sections.

  1. Embedding portability. The format stores embeddings as-is from the source runtime, tagged by model (see §3.1.6). No neutral embedding model is mandated. Re-embedding happens at the edges (adapters, sync service) only when needed. This avoids vendor lock-in, eliminates bulk re-embedding costs, and allows future on-the-fly conversion when a client requests an import with a different embedding spec.

  2. Workspace artifact handling. Three-tier model (§3.1.9). Tier 1: managed agent state (SOUL.md, config files) goes in raw/. Tier 2: small portable artifacts under 100 KB (scripts, small data files) are included in artifacts/ with archive_path set. Tier 3: large artifacts are cataloged by reference only (archive_path: null, remote_ref: null with placeholder for future online storage). The classification is based on managed-state-vs-artifact and size, not binary-vs-text — a 10K-line CSV and a large PNG are both Tier 3. The referenced_by linkage connects artifacts to memory records. On import, Tier 2 artifacts are extracted to the workspace; Tier 3 artifacts are reported to the user.

  3. Multi-agent containers. One agent per .alf file. Inter-agent relationships are modeled as references, not containment. A managing agent records its sub-agents via a structured roster in the identity layer (§3.2.4) with capabilities, model hints, and routing guidance. A sub-agent records its managing agents as principals (§3.3.1) with principal_type: "agent". On restore, sub-agents won't exist on the new host — the roster gives the managing agent the information it needs to recreate them on request. Each sub-agent's own state, if persistent, lives in its own .alf file referenced by agent_id.

  4. Schema evolution. Unknown enum values are handled via safe defaults: unknown memory_typesemantic, unknown principal_typehuman, unknown credential_typecustom, unknown statusactive (see §8.2). Unrecognized values MUST be preserved on round-trip so future exports don't lose type information. Unknown fields are ignored by readers but retained in storage. Breaking changes require a major version bump.

  5. AIEOS alignment. Compatible superset with opinionated field promotion (see §3.2.6). Fields that map to real agent behavior (names, OCEAN traits, neural matrix, linguistics, capabilities, motivations) are promoted to first-class structured fields. Fields rooted in fictional character design (MBTI, D&D moral alignment, physicality/somatotype, bio metadata) are preserved for lossless round-trip via aieos_extensions but not promoted. This provides full ZeroClaw/AIEOS round-trip fidelity without elevating design choices that lack broad applicability as schema primitives.

  6. Storage, transport, and partitioning. Memory is partitioned by time period (quarterly by default) with sealed/unsealed semantics (§4.1.1). Sealed partitions are immutable and cache-friendly. The sync protocol uses monotonic sequence numbers as the primary cursor (§4.3.1), not timestamps — avoiding clock skew and boundary ambiguity. The cloud storage model follows an event-sourcing pattern (§7.2): deltas as append-only events in object storage, snapshots as periodic compacted projections, metadata index in a lightweight database. This keeps costs proportional to activity rather than total agent size, and ensures all operations (push delta, pull deltas, full restore) are index lookups + blob fetches with no server-side parsing.

  7. Memory-to-identity lineage. Each memory record's source object includes an optional identity_version integer (§3.1.10) linking what the agent did to who the agent was at the time. This makes lineage explicit and directly queryable rather than requiring timestamp correlation between the identity version history and memory creation times. The field reuses the identity layer's existing version counter (§3.2.3) — no new versioning mechanism needed. Adapters SHOULD stamp the current version on export. Records without the field (migrated data) have unknown lineage, preserving backward compatibility.

  8. Data privacy and at-rest encryption. Three-tier encryption model (§8.5.3). Tier 1 (launch default): per-tenant service-managed keys via KMS — a storage-layer breach exposes only ciphertext, and a single key compromise affects only one tenant. Tier 2 (future): customer-managed keys (BYOK) for regulated industries — customer retains key custody and can revoke service access. Tier 3: Layer 4 credentials remain zero-knowledge client-side encrypted regardless of tier. Service-wide keys rejected as insufficient for agent-sensitivity data. Per-agent keys supported as a deployment-time configuration via scope identifier (§8.5.3.2) but not the default. Transport: TLS 1.3 required, mTLS for service-internal. Defense-in-depth measures (access controls, audit logging, data residency minimization, tenant isolation) specified in §8.5.4 because at-rest encryption alone doesn't protect against a compromised service process.

  9. GDPR/CCPA hard deletion vs. immutable partitions. Two deletion scopes (§3.1.11). Full agent deletion (the common compliance request) destroys all objects wholesale — no partition rewriting needed. Surgical record purge replaces affected sealed partitions rather than modifying them in place, preserving the mental model that any given partition is immutable. The old partition is destroyed, the replacement has a different hash so caches invalidate naturally. Both scopes produce audit records proving deletion without revealing purged content. Soft deletes (§3.1.8) remain the default for normal operation; hard purge is an exceptional compliance operation. Adapters must support a local purge command for self-hosted deployments.

  10. Intrinsic vs. host-dependent capabilities. Capabilities remain in the identity layer (correct for task routing), but each carries a portability annotation: intrinsic (LLM-native, always available) or host_dependent (requires specific environment) with an optional host_requirements description (§3.2.6). No new "Environment" or "Host Context" layer — that would conflate portable agent state with ephemeral deployment configuration. On migration, the identity doesn't change; the importing adapter checks host-dependent capabilities against the target environment and reports unavailable ones. The agent retains knowledge of all capabilities for future environments that support them.

  11. Memory graph / relational linkages. Optional related_records array on memory records (§3.1.12) with typed, one-directional links. Relation types are free-form strings with well-known conventions (caused_by, contradicts, elaborates_on, derived_from, caused, related_to). supersedes remains a separate top-level field because it has lifecycle semantics (changes record status, used by compaction). Relational links are informational/navigational — they don't affect record lifecycle. Runtimes that don't use graph relationships ignore the field. Dangling references (from purged targets) are tolerated.

  12. Execution state / workflow checkpoints. Explicitly out of scope (§1). ALF stores knowledge state, identity state, relationship state, and credentials — the durable, portable layers. Active execution state (in-flight tool calls, pending workflow steps, loop counters, LangGraph checkpoints, call stacks) is excluded because it is inherently runtime-specific, ephemeral, and not part of the agent's durable identity or knowledge. Migrating mid-task aborts the in-progress task; the agent retains all accumulated knowledge but unfinished workflows are lost. This is the correct tradeoff — execution state is cheap to recreate (re-run the task), knowledge and identity are expensive to recreate (months of accumulated context).

  13. Credential-to-capability linkage. Bidirectional optional linkage (§3.4.2, §3.2.6). Credentials carry capabilities_granted (names of capabilities they enable). Capabilities carry credential_ids (UUIDs of credentials they require). Both are informational and optional — many credentials are general-purpose (one API key powering multiple capabilities) and many capabilities need no credentials (intrinsic LLM abilities). The linkage helps importers wire up auth automatically and report missing credentials as warnings.