State & Checkpoints Guide

State checkpoints let you save an agent's data at a point in time and restore it later. This is essential for recovering from corrupted outputs, debugging issues, and auditing what changed.

The Problem

Without checkpoints:

  • Agent writes bad data → data corrupted
  • No way to see what changed
  • No way to roll back
  • Weeks of work lost

With checkpoints:

  • Agent writes bad data
  • You identify the issue in audit logs
  • You roll back to the last checkpoint before the corruption
  • You continue from a known-good state

Creating Checkpoints

from anchor import Anchor

anchor = Anchor(api_key="anc_...")

# Create a checkpoint before risky operations
checkpoint = anchor.checkpoints.create(
    agent_id="agent-123",
    label="before-migration"  # Optional human-readable label
)

print(f"Checkpoint ID: {checkpoint.id}")
print(f"Created: {checkpoint.created_at}")
print(f"Data entries: {checkpoint.entry_count}")

When to Checkpoint

Create checkpoints at natural boundaries:

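from datetime import date  # needed for the daily label below

# Assumes anchor is the client created above and agent is an existing agent object.

# Before batch operations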
checkpoint = anchor.checkpoints.create(agent.id, label="pre-batch-import")
try:
    for item in large_dataset:
        anchor.data.write(agent.id, item.key, item.value)
except Exception:
    anchor.checkpoints.restore(agent.id, checkpoint.id)
    raise

# Before deployments
checkpoint = anchor.checkpoints.create(agent.id, label="pre-deploy-v2.1")
deploy_new_agent_version()

# Daily automatic checkpoints
checkpoint = anchor.checkpoints.create(agent.id, label=f"daily-{date.today()}")
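
If the daily checkpoint line runs more than once per day (for example after a scheduler retry), it creates a duplicate. One way to guard against that is to check existing labels first. The sketch below is only an illustration: daily_label and needs_daily_checkpoint are hypothetical helper names, not part of the Anchor SDK, and the list of existing labels is assumed to come from a checkpoint listing.

```python
from datetime import date


def daily_label(day: date) -> str:
    # Same format as f"daily-{date.today()}": str(date) is the ISO form YYYY-MM-DD.
    return f"daily-{day.isoformat()}"


def needs_daily_checkpoint(existing_labels, day: date) -> bool:
    """Return True if no checkpoint with this day's label exists yet."""
    return daily_label(day) not in set(existing_labels)


# Hypothetical labels taken from an earlier checkpoint listing.
existing = ["pre-deploy-v2.1", "daily-2024-06-01"]

already_done = needs_daily_checkpoint(existing, date(2024, 6, 1))
still_needed = needs_daily_checkpoint(existing, date(2024, 6, 2))
print(already_done, still_needed)
```

Only create the checkpoint when the helper returns True.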

Rolling Back

# Roll back to a specific checkpoint
result = anchor.checkpoints.restore(
    agent_id="agent-123",
    checkpoint_id="chk_abc123"
)

print(f"Rolled back to: {result.checkpoint_id}")
print(f"Entries restored: {result.entries_restored}")
print(f"Entries removed: {result.entries_removed}")

Safe Operations Pattern

def safe_batch_operation(agent_id: str, items: list):
    # 1. Create checkpoint
    checkpoint = anchor.checkpoints.create(agent_id, label="pre-batch")

    try:
        # 2. Perform operation
        for item in items:
            anchor.data.write(agent_id, item.key, item.value)

        # 3. Verify results
        if not verify_data_integrity(agent_id):
            raise ValueError("Data integrity check failed")

    except Exception as e:
        # 4. Rollback on failure
        anchor.checkpoints.restore(agent_id, checkpoint.id)
        # Chain the original exception so its traceback is preserved
        raise RuntimeError(f"Batch failed, rolled back: {e}") from e
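
The same pattern can be packaged as a context manager so that the rollback step cannot be forgotten. The sketch below is a minimal illustration rather than Anchor SDK code: InMemoryStore is a hypothetical stand-in for the client so the example runs on its own, and checkpointed is an illustrative name.

```python
import copy
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a checkpoint-capable store (not the Anchor client)."""

    def __init__(self):
        self.data = {}
        self._snapshots = {}

    def create_checkpoint(self, label):
        # Deep-copy so later writes cannot mutate the saved snapshot.
        checkpoint_id = f"chk_{len(self._snapshots) + 1}"
        self._snapshots[checkpoint_id] = copy.deepcopy(self.data)
        return checkpoint_id

    def restore(self, checkpoint_id):
        self.data = copy.deepcopy(self._snapshots[checkpoint_id])


@contextmanager
def checkpointed(store, label):
    """Snapshot on entry; restore the snapshot if the body raises."""
    checkpoint_id = store.create_checkpoint(label)
    try:
        yield checkpoint_id
    except Exception:
        store.restore(checkpoint_id)
        raise  # Re-raise so the caller still sees the failure.


store = InMemoryStore()
store.data["greeting"] = "hello"

try:
    with checkpointed(store, "pre-batch"):
        store.data["greeting"] = "corrupted"
        raise ValueError("Data integrity check failed")
except ValueError:
    pass  # The error propagated, and the data was rolled back first.

print(store.data)
```

With the real client, the two store methods would be replaced by the create and restore calls shown earlier in this guide.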

Comparing Checkpoints

# List checkpoints
checkpoints = anchor.checkpoints.list(agent.id)

for cp in checkpoints:
    print(f"{cp.id}: {cp.label} ({cp.created_at})")
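    # Fetch full details for each checkpoint so they can be compared
    details = anchor.checkpoints.get(agent.id, cp.id)
    # entry_count appears on newly created checkpoints; assumed to be present here too
    print(f"  entries: {details.entry_count}")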
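
Once the entries of two checkpoints are available as plain dictionaries, comparing them is a matter of set operations on the keys. This guide does not show how to export a checkpoint's entries, so the sketch below assumes that step has already happened; diff_snapshots is a hypothetical helper, not part of the Anchor SDK.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Return the keys added, removed, and changed between two snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # A key counts as changed if it exists in both snapshots with different values.
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical entry data standing in for two exported checkpoints.
before = {"user:1": "alice", "user:2": "bob", "config": "v1"}
after = {"user:1": "alice", "config": "v2", "user:3": "carol"}

print(diff_snapshots(before, after))
# {'added': ['user:3'], 'removed': ['user:2'], 'changed': ['config']}
```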

Best Practices

  • Checkpoint before risky operations (batch imports, deployments, experiments)
  • Use descriptive labels (e.g., "pre-deploy-v2.1" not "backup")
  • Set up retention policies to avoid keeping checkpoints forever
  • Verify the data after a rollback, for example by checking the restored and removed entry counts
  • Document rollbacks in audit logs
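
A retention policy can be as simple as "keep the N most recent checkpoints, and delete anything else older than D days". The sketch below only selects which checkpoint IDs would expire under such a rule. It is an illustration under stated assumptions: select_expired and its parameters are hypothetical, checkpoints are represented as plain dictionaries, and this guide does not show a delete call, so removal is left out.

```python
from datetime import datetime, timedelta


def select_expired(checkpoints, now, keep_latest=3, max_age_days=30):
    """Return IDs of checkpoints older than max_age_days,
    always sparing the keep_latest most recent ones."""
    newest_first = sorted(checkpoints, key=lambda cp: cp["created_at"], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    return [
        cp["id"]
        for index, cp in enumerate(newest_first)
        if index >= keep_latest and cp["created_at"] < cutoff
    ]


now = datetime(2024, 6, 30)
# Hypothetical checkpoint records; real ones would come from a checkpoint listing.
checkpoints = [
    {"id": "chk_a", "created_at": datetime(2024, 1, 1)},
    {"id": "chk_b", "created_at": datetime(2024, 3, 1)},
    {"id": "chk_c", "created_at": datetime(2024, 6, 1)},
    {"id": "chk_d", "created_at": datetime(2024, 6, 29)},
]

expired = select_expired(checkpoints, now, keep_latest=2, max_age_days=30)
print(expired)
# ['chk_b', 'chk_a']
```

Keeping a minimum number of recent checkpoints regardless of age means an idle agent never loses its last known-good state.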

For more details, see the Checkpoints API reference.