Built on Grounded Commitment Learning principles.
Aegis
Infrastructure Layer for Autonomous Agents
Aegis provides durability, verification, and policy enforcement beneath agent frameworks. It treats agent commitments as first-class objects with explicit failure modes and recovery strategies.
This is not a "better LangChain." Agent frameworks handle orchestration—what should the agent do next? Aegis handles infrastructure—how do we ensure commitments are kept across failures? You can run a LangGraph workflow on Aegis to gain durability and verification that LangGraph doesn't provide natively.
The Gap
Workflow Engines (Temporal, Prefect)
State survives restarts. Tasks retry. Execution history is clear. But no agent-specific abstractions: no commitments, no policy evaluation before tool invocation, no multi-agent coordination primitives.
Agent Frameworks (LangChain, AutoGPT)
Tool calling, memory, planning. But state is ephemeral—restart the process and you lose everything. No checkpoint/replay, no formal verification of what the agent promised, no structured recovery.
What's Missing
Neither treats agent commitments as first-class objects. When an agent says "I will complete this task by 5pm," that's a string in a conversation. No mechanism to verify fulfillment, detect violation, or recover gracefully.
Architecture
Requests flow top-to-bottom; events flow bottom-to-top. The state machine coordinates: receives events, applies transitions, triggers checkpoints, notifies listeners.
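The coordination loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Aegis's actual implementation: `Event`, `StateMachine`, `checkpoint_every`, and the listener signature are hypothetical names, and the "state" is reduced to a per-kind event counter to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    kind: str                 # e.g. "user_message", "tool_invocation"
    payload: dict[str, Any]


@dataclass
class StateMachine:
    checkpoint_every: int = 2  # hypothetical: checkpoint every N transitions
    state: dict[str, int] = field(default_factory=dict)
    log: list[Event] = field(default_factory=list)
    checkpoints: list[tuple[int, dict[str, int]]] = field(default_factory=list)
    listeners: list[Callable[[Event, dict[str, int]], None]] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        # 1. Receive the event and append it to the log.
        self.log.append(event)
        # 2. Apply the transition by building a new state, not mutating the old one.
        self.state = {**self.state, event.kind: self.state.get(event.kind, 0) + 1}
        # 3. Trigger a checkpoint when the interval is reached.
        if len(self.log) % self.checkpoint_every == 0:
            self.checkpoints.append((len(self.log), dict(self.state)))
        # 4. Notify listeners with the event and the resulting state.
        for listener in self.listeners:
            listener(event, self.state)
```

Each checkpoint records how many events it covers, so a restore only needs to replay the events logged after that count.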
Core Components
1. Event-Sourced State
State is derived from an append-only event log. Each user message, LLM response, tool invocation, and commitment update is an event. The current state is computed by replaying events from the last checkpoint.
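```python
from __future__ import annotations

from pydantic import BaseModel, ConfigDict


class AgentState(BaseModel):
    # Frozen: every transition returns a new state instead of mutating this one.
    model_config = ConfigDict(frozen=True)

    # (Field declarations such as conversation_history and version are not shown.)

    def with_message(self, message: Message) -> AgentState:
        # Copy the state with the message appended and the version bumped.
        return self.model_copy(
            update={
                "conversation_history": (*self.conversation_history, message),
                "version": self.version + 1,
            }
        )
```

Trade-off: Storage grows with event count; replay adds latency on restore. Configurable checkpoint intervals mitigate this: checkpoint every N transitions or M seconds, whichever comes first.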
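As a rough sketch of the restore path and the checkpoint rule, assuming events are plain strings and state is a tuple of them: `Checkpoint`, `apply`, `restore`, `should_checkpoint`, and the default thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    version: int               # number of log events already folded into `history`
    history: tuple[str, ...]   # snapshot of the state at that point


def apply(history: tuple[str, ...], event: str) -> tuple[str, ...]:
    """Pure transition: return a new history with the event appended."""
    return (*history, event)


def restore(checkpoint: Checkpoint, log: list[str]) -> tuple[str, ...]:
    """Rebuild the current state by replaying only events after the checkpoint."""
    history = checkpoint.history
    for event in log[checkpoint.version:]:
        history = apply(history, event)
    return history


def should_checkpoint(transitions_since: int, seconds_since: float,
                      max_transitions: int = 100, max_seconds: float = 30.0) -> bool:
    """Checkpoint every N transitions or M seconds, whichever comes first."""
    # The defaults are placeholder values, not Aegis's configuration.
    return transitions_since >= max_transitions or seconds_since >= max_seconds
```

Replay cost is proportional to the number of events since the last checkpoint, which is why shorter intervals trade extra storage for faster restores.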
2. Commitments as First-Class Objects
Commitments use the GCL 5-tuple: (debtor, creditor, action, condition, deadline). The condition field contains an evaluable expression, not a description.
```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict


class RuntimeCommitment(BaseModel):
    model_config = ConfigDict(frozen=True)

    debtor: str        # Who made the commitment
    creditor: str      # Who receives the commitment
    action: str        # What was committed
    condition: str     # Evaluable expression: "task_complete AND error_count == 0"
    deadline: datetime | None
    # CommitmentStatus (defined elsewhere) carries the lifecycle states.
    status: CommitmentStatus  # CREATED → ACTIVE → FULFILLED/VIOLATED/CANCELLED
```

This enables: verification (check if condition holds), violation detection (deadline passed, condition failed), recovery (select and execute strategy), audit (track commitment lifecycle).
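Verification and violation detection could look roughly like the sketch below. It uses stdlib dataclasses in place of Pydantic to stay self-contained, and `Status`, `Commitment`, and `check` are hypothetical names. It also assumes the condition has already been evaluated to a boolean, and that a condition which holds counts as fulfilment regardless of the deadline; the real rules may differ.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    FULFILLED = "fulfilled"
    VIOLATED = "violated"


@dataclass(frozen=True)
class Commitment:
    debtor: str
    creditor: str
    action: str
    deadline: Optional[datetime]
    status: Status = Status.ACTIVE


def check(commitment: Commitment, condition_holds: bool, now: datetime) -> Commitment:
    """Return the commitment with its status updated for the current moment."""
    if commitment.status is not Status.ACTIVE:
        return commitment  # terminal states do not change
    if condition_holds:
        return replace(commitment, status=Status.FULFILLED)
    if commitment.deadline is not None and now > commitment.deadline:
        return replace(commitment, status=Status.VIOLATED)
    return commitment      # still pending
```

Because the commitment is immutable, each check returns a new object, so every status change can be recorded as its own event for audit.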
3. Policy Enforcement at the Gateway
Policy is enforced at invocation time, not planning time. The gateway sees the actual arguments: an agent might plan to "read a file," but the actual path could be /etc/passwd.
```python
from typing import Any

from pydantic import BaseModel


class PolicyRule(BaseModel):
    name: str
    action: PolicyAction                 # ALLOW, DENY, REQUIRE_APPROVAL
    tool_pattern: str                    # Glob: "file_*", "web_search"
    argument_conditions: dict[str, Any]  # {"path": {"not_contains": "/etc"}}
```

Trade-off: Invocation-time enforcement can't prevent the agent from wasting tokens planning a disallowed action, but the cost of a rejected tool call is low compared to the security benefit.
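One way a gateway might evaluate such rules against a concrete tool call is sketched below. The first-match-wins ordering, the deny-by-default fallback, and the names `Action`, `Rule`, `conditions_hold`, and `evaluate` are all assumptions for illustration. Only the `not_contains` operator from the example above is handled, and any other operator fails closed.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatchcase
from typing import Any


class Action(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Rule:
    name: str
    action: Action
    tool_pattern: str
    argument_conditions: dict[str, Any] = field(default_factory=dict)


def conditions_hold(conditions: dict[str, Any], args: dict[str, Any]) -> bool:
    """True if every argument condition is satisfied by the actual arguments."""
    for arg_name, checks in conditions.items():
        value = str(args.get(arg_name, ""))
        for op, operand in checks.items():
            if op != "not_contains":
                raise ValueError(f"unsupported operator: {op}")
            if operand in value:
                return False
    return True


def evaluate(rules: list[Rule], tool: str, args: dict[str, Any]) -> Action:
    """First matching rule wins; unmatched calls are denied (assumed default)."""
    for rule in rules:
        if fnmatchcase(tool, rule.tool_pattern) and conditions_hold(rule.argument_conditions, args):
            return rule.action
    return Action.DENY
```

With an ALLOW rule for `file_*` conditioned on `{"path": {"not_contains": "/etc"}}`, a call to `file_read` with path `/etc/passwd` fails the condition, matches no rule, and falls through to the default deny.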
GCL Integration
GCL provides the theoretical foundation; Aegis provides the runtime. The GCL 5-tuple maps directly to RuntimeCommitment:
| GCL Concept | Aegis Implementation |
|---|---|
| Debtor | commitment.debtor (agent ID) |
| Creditor | commitment.creditor (user/system ID) |
| Action | commitment.action (string) |
| Condition | commitment.condition (evaluable expression) |
| Deadline | commitment.deadline (datetime) |
When GCL isn't installed, Aegis falls back to its own expression evaluator, which supports basic comparisons, logical operators, and membership tests. Unsafe expressions (function calls, imports, attribute access) are rejected.
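A restricted evaluator of this kind can be approximated with Python's `ast` module by whitelisting node types. The sketch below is not Aegis's evaluator: `safe_eval` is a hypothetical name, and it assumes that uppercase `AND`/`OR`/`NOT` in conditions map onto Python's keywords.

```python
import ast
import re
from typing import Any

# Only comparisons, boolean logic, membership tests, names, and literals.
_ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)


def safe_eval(expression: str, variables: dict[str, Any]) -> bool:
    """Evaluate a restricted boolean expression against known variables."""
    # Assumption: uppercase AND/OR/NOT are aliases for Python's keywords.
    # This naive rewrite would also touch those words inside string literals.
    source = re.sub(r"\b(AND|OR|NOT)\b", lambda m: m.group(1).lower(), expression)
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            # Calls, attribute access, lambdas, etc. are rejected here.
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in variables:
            raise ValueError(f"unknown name: {node.id}")
    # Builtins are removed; the whitelist above has already ruled out calls.
    code = compile(tree, "<condition>", "eval")
    return bool(eval(code, {"__builtins__": {}}, dict(variables)))
```

Under these assumptions, `safe_eval("task_complete AND error_count == 0", {"task_complete": True, "error_count": 0})` returns `True`, while an expression such as `__import__('os')` raises `ValueError` before anything is evaluated.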
Validation
303 tests covering state machine transitions, checkpoint integrity, policy evaluation edge cases, and multi-agent message ordering. Property-based tests via Hypothesis verify state serialization round-trips.
| Module | Coverage | Tests |
|---|---|---|
| core/ | State, checkpoint, replay | 42 |
| state_machine/ | Transitions, validation | 28 |
| tools/ | Gateway, policy, auth | 48 |
| commitments/ | GCL integration, verification | 32 |
| llm/ | Client, streaming, tool execution | 52 |
| recovery/ | Violation detection, strategies | 26 |
Limitations
Single-node only
State is stored locally. Distributed coordination (multiple agents across nodes, shared state) requires a distributed event log (Kafka, Redis Streams) and consensus for checkpoint coordination. This is not implemented.
No content-aware policy
Constitutional AI principles check metadata, not content. Evaluating whether a response "contains harmful content" requires an external classifier.
Synchronous recovery
Recovery strategies execute synchronously. Long-running recoveries (waiting for human approval) block the agent. Async recovery with callbacks is not yet implemented.
No performance benchmarks
Checkpoint latency, message throughput, and policy evaluation overhead have not been systematically measured under realistic workloads.