Built on Grounded Commitment Learning principles.

Aegis

Infrastructure Layer for Autonomous Agents

Aegis provides durability, verification, and policy enforcement beneath agent frameworks. It treats agent commitments as first-class objects with explicit failure modes and recovery strategies.

This is not a "better LangChain." Agent frameworks handle orchestration—what should the agent do next? Aegis handles infrastructure—how do we ensure commitments are kept across failures? You can run a LangGraph workflow on Aegis to gain durability and verification that LangGraph doesn't provide natively.

Event-sourced state · GCL commitments · Policy gateway · 303 tests

The Gap

Workflow Engines (Temporal, Prefect)

State survives restarts. Tasks retry. Execution history is clear. But no agent-specific abstractions: no commitments, no policy evaluation before tool invocation, no multi-agent coordination primitives.

Agent Frameworks (LangChain, AutoGPT)

Tool calling, memory, planning. But state is ephemeral—restart the process and you lose everything. No checkpoint/replay, no formal verification of what the agent promised, no structured recovery.

What's Missing

Neither treats agent commitments as first-class objects. When an agent says "I will complete this task by 5pm," that's a string in a conversation. No mechanism to verify fulfillment, detect violation, or recover gracefully.

Architecture

Requests flow top-to-bottom; events flow bottom-to-top. The state machine coordinates: receives events, applies transitions, triggers checkpoints, notifies listeners.
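The coordination loop described above can be illustrated with a small sketch. It is not Aegis's actual state machine: the names StateMachine, Event, and checkpoint_every, and the state labels, are hypothetical, and the checkpoint is reduced to an in-memory record.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event kinds, e.g. "user_message", "tool_result"
    payload: str


@dataclass
class StateMachine:
    # Allowed transitions: (current_state, event_kind) -> next_state
    transitions: dict[tuple[str, str], str]
    state: str = "idle"
    checkpoint_every: int = 2  # assumed policy: checkpoint every N transitions
    listeners: list[Callable[[str, Event], None]] = field(default_factory=list)
    checkpoints: list[tuple[int, str]] = field(default_factory=list)
    applied: int = 0

    def handle(self, event: Event) -> str:
        # 1. Receive the event and validate the transition.
        key = (self.state, event.kind)
        if key not in self.transitions:
            raise ValueError(f"no transition from {self.state!r} on {event.kind!r}")
        # 2. Apply the transition.
        self.state = self.transitions[key]
        self.applied += 1
        # 3. Trigger a checkpoint when the interval is reached.
        if self.applied % self.checkpoint_every == 0:
            self.checkpoints.append((self.applied, self.state))
        # 4. Notify listeners.
        for listener in self.listeners:
            listener(self.state, event)
        return self.state


seen: list[str] = []
machine = StateMachine(
    transitions={
        ("idle", "user_message"): "thinking",
        ("thinking", "tool_call"): "acting",
        ("acting", "tool_result"): "thinking",
    },
    listeners=[lambda state, event: seen.append(f"{event.kind}->{state}")],
)
for kind in ("user_message", "tool_call", "tool_result"):
    machine.handle(Event(kind, payload=""))
```

Rejecting unknown transitions with an exception keeps invalid events out of the log in this sketch; how Aegis itself reports them is not specified here.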

Core Components

1. Event-Sourced State

State is derived from an append-only event log. Each user message, LLM response, tool invocation, and commitment update is an event. Current state is computed by replaying events from the last checkpoint.

aegis/core/state.py
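from __future__ import annotations

from pydantic import BaseModel, ConfigDict

# Excerpt: Message is defined elsewhere in the package. The two fields
# below are inferred from their use in with_message().
class AgentState(BaseModel):
    model_config = ConfigDict(frozen=True)  # instances are immutable

    conversation_history: tuple[Message, ...]
    version: int

    def with_message(self, message: Message) -> AgentState:
        # Return a new state; the existing one is never mutated.
        return self.model_copy(
            update={
                "conversation_history": (*self.conversation_history, message),
                "version": self.version + 1,
            }
        )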

Trade-off: Storage grows with event count; replay adds latency on restore. Configurable checkpoint intervals mitigate this—checkpoint every N transitions or M seconds, whichever comes first.
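To make the replay and checkpoint-interval ideas concrete, here is a minimal, dependency-free sketch. It uses plain dataclasses rather than the Pydantic models above, and Snapshot, restore, should_checkpoint, and the default thresholds (100 transitions, 30 seconds) are assumptions for illustration, not Aegis's API or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int               # number of log events folded into this snapshot
    history: tuple[str, ...]   # state as of that version


def apply(history: tuple[str, ...], event: str) -> tuple[str, ...]:
    """Pure transition: build a new state instead of mutating the old one."""
    return (*history, event)


def restore(snapshot: Snapshot, log: list[str]) -> Snapshot:
    """Rebuild current state by replaying only the events after the snapshot."""
    history = snapshot.history
    for event in log[snapshot.version:]:
        history = apply(history, event)
    return Snapshot(version=len(log), history=history)


def should_checkpoint(
    transitions_since: int,
    last_checkpoint_at: float,
    now: float,
    every_n: int = 100,            # hypothetical default
    every_seconds: float = 30.0,   # hypothetical default
) -> bool:
    """Checkpoint after N transitions or M seconds, whichever comes first."""
    return transitions_since >= every_n or (now - last_checkpoint_at) >= every_seconds


log = ["user: hi", "llm: hello", "tool: search", "llm: done"]
checkpoint = Snapshot(version=2, history=("user: hi", "llm: hello"))
current = restore(checkpoint, log)  # replays only the last two events
```

Replaying from the checkpoint yields the same state as replaying the whole log from an empty snapshot; the checkpoint only shortens the replay.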

2. Commitments as First-Class Objects

Commitments use the GCL 5-tuple: (debtor, creditor, action, condition, deadline). The condition field contains an evaluable expression, not a description.

aegis/commitments/models.py
from datetime import datetime

from pydantic import BaseModel, ConfigDict

# Excerpt: CommitmentStatus is defined elsewhere in the package.
class RuntimeCommitment(BaseModel):
    model_config = ConfigDict(frozen=True)

    debtor: str      # Who made the commitment
    creditor: str    # Who receives the commitment
    action: str      # What was committed
    condition: str   # Evaluable expression: "task_complete AND error_count == 0"
    deadline: datetime | None
    status: CommitmentStatus  # CREATED → ACTIVE → FULFILLED/VIOLATED/CANCELLED

This enables: verification (check if condition holds), violation detection (deadline passed, condition failed), recovery (select and execute strategy), audit (track commitment lifecycle).
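As a rough sketch of verification and violation detection, the function below updates a commitment's status from a condition result and the current time. It is an illustration under assumptions: the Commitment and Status types are simplified stand-ins, the condition is passed in already evaluated, and treating a satisfied condition as fulfilled even after the deadline is a choice made here, not a documented Aegis rule.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    FULFILLED = "fulfilled"
    VIOLATED = "violated"


@dataclass(frozen=True)
class Commitment:
    debtor: str
    creditor: str
    action: str
    deadline: datetime | None
    status: Status = Status.ACTIVE


def check(c: Commitment, condition_holds: bool, now: datetime) -> Commitment:
    """Return the commitment with an updated status; the input is not mutated."""
    if c.status is not Status.ACTIVE:
        return c                                    # terminal states stay put
    if condition_holds:
        return replace(c, status=Status.FULFILLED)
    if c.deadline is not None and now > c.deadline:
        return replace(c, status=Status.VIOLATED)   # deadline passed, condition unmet
    return c                                        # still pending


deadline = datetime(2025, 1, 1, 17, 0, tzinfo=timezone.utc)
early = datetime(2025, 1, 1, 16, 0, tzinfo=timezone.utc)
late = datetime(2025, 1, 1, 18, 0, tzinfo=timezone.utc)
commitment = Commitment("agent-1", "user-7", "send report", deadline)
violated = check(commitment, condition_holds=False, now=late)
```

A recovery layer would then pick a strategy for any commitment that comes back VIOLATED; that step is outside this sketch.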

3. Policy Enforcement at the Gateway

Policy is enforced at invocation time, not planning time. The gateway sees the actual arguments: an agent might plan to "read a file," but the path it passes could be /etc/passwd.

aegis/tools/policy.py
from typing import Any

from pydantic import BaseModel

# Excerpt: PolicyAction is defined elsewhere in the package.
class PolicyRule(BaseModel):
    name: str
    action: PolicyAction  # ALLOW, DENY, REQUIRE_APPROVAL
    tool_pattern: str     # Glob: "file_*", "web_search"
    argument_conditions: dict[str, Any]  # {"path": {"not_contains": "/etc"}}

Trade-off: Can't prevent the agent from wasting tokens planning a disallowed action. The cost of a rejected tool call is low compared to the security benefit.
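A gateway check of this kind might look like the sketch below. It is an assumption-laden illustration rather than Aegis's evaluator: Rule and Action are simplified stand-ins, only the contains and not_contains operators are handled, and the "first matching rule wins, default DENY" semantics are chosen here for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatchcase
from typing import Any


class Action(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Rule:
    name: str
    action: Action
    tool_pattern: str
    argument_conditions: dict[str, Any] = field(default_factory=dict)


def conditions_hold(conditions: dict[str, Any], args: dict[str, Any]) -> bool:
    """Check each argument against its operators; unknown operators are errors."""
    for arg_name, ops in conditions.items():
        value = str(args.get(arg_name, ""))
        for op, operand in ops.items():
            if op == "contains":
                ok = operand in value
            elif op == "not_contains":
                ok = operand not in value
            else:
                raise ValueError(f"unknown operator {op!r}")
            if not ok:
                return False
    return True


def evaluate(rules: list[Rule], tool: str, args: dict[str, Any]) -> Action:
    """First rule whose glob and conditions both match wins; default is DENY."""
    for rule in rules:
        if fnmatchcase(tool, rule.tool_pattern) and conditions_hold(
            rule.argument_conditions, args
        ):
            return rule.action
    return Action.DENY


rules = [
    Rule("safe-file-access", Action.ALLOW, "file_*", {"path": {"not_contains": "/etc"}}),
    Rule("search", Action.ALLOW, "web_search"),
]
decision = evaluate(rules, "file_read", {"path": "/etc/passwd"})  # Action.DENY
```

Because the check runs on the concrete arguments, the /etc/passwd call is denied even though the tool name matches an ALLOW rule. A plain substring test is easy to sidestep with relative paths, so a real gateway would normalize paths before matching.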

GCL Integration

GCL provides the theoretical foundation; Aegis provides the runtime. The GCL 5-tuple maps directly to RuntimeCommitment:

GCL Concept | Aegis Implementation
Debtor | commitment.debtor (agent ID)
Creditor | commitment.creditor (user/system ID)
Action | commitment.action (string)
Condition | commitment.condition (evaluable expression)
Deadline | commitment.deadline (datetime)

When GCL isn't installed, Aegis falls back to its own expression evaluator, which supports basic comparisons, logical operators, and membership tests. Unsafe expressions (function calls, imports, attribute access) are rejected.
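One common way to build such a restricted evaluator is to parse the expression with Python's ast module and interpret only a whitelist of node types. The sketch below takes that approach; it is not the Aegis fallback evaluator, and the handling of uppercase AND/OR/NOT as aliases for Python's keywords is an assumption based on the example condition shown earlier.

```python
import ast
import operator
import re
from typing import Any

_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def evaluate(expression: str, context: dict[str, Any]) -> bool:
    # Assumption: uppercase AND/OR/NOT map to Python's keywords. This naive
    # substitution would also rewrite those words inside string literals.
    source = re.sub(r"\b(AND|OR|NOT)\b", lambda m: m.group(1).lower(), expression)
    return bool(_eval(ast.parse(source, mode="eval").body, context))


def _eval(node: ast.AST, ctx: dict[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in ctx:
            raise ValueError(f"unknown name {node.id!r}")
        return ctx[node.id]
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_eval(element, ctx) for element in node.elts]
    if isinstance(node, ast.BoolOp):
        values = (_eval(value, ctx) for value in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, ctx)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, ctx)
        for op, right_node in zip(node.ops, node.comparators):
            compare = _COMPARE.get(type(op))
            if compare is None:
                raise ValueError(f"unsupported operator {type(op).__name__}")
            right = _eval(right_node, ctx)
            if not compare(left, right):
                return False
            left = right
        return True
    # Calls, attribute access, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


ok = evaluate("task_complete AND error_count == 0",
              {"task_complete": True, "error_count": 0})
```

The expression is never passed to eval; anything outside the whitelist, such as a call or an attribute lookup, raises ValueError before it can run.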

Validation

303 tests covering state machine transitions, checkpoint integrity, policy evaluation edge cases, and multi-agent message ordering. Property-based tests via Hypothesis verify state serialization round-trips.

core/

State, checkpoint, replay: 42 tests

state_machine/

Transitions, validation: 28 tests

tools/

Gateway, policy, auth: 48 tests

commitments/

GCL integration, verification: 32 tests

llm/

Client, streaming, tool execution: 52 tests

recovery/

Violation detection, strategies: 26 tests

Limitations

Single-node only

State is stored locally. Distributed coordination (multiple agents across nodes, shared state) would require a distributed event log (Kafka, Redis Streams) and consensus for checkpoint coordination. This is not implemented.

No content-aware policy

Constitutional AI principles check metadata, not content. Evaluating whether a response "contains harmful content" requires an external classifier.

Synchronous recovery

Recovery strategies execute synchronously. Long-running recoveries (waiting for human approval) block the agent. Async recovery with callbacks is not yet implemented.

No performance benchmarks

Checkpoint latency, message throughput, and policy evaluation overhead have not been systematically measured under realistic workloads.