Part of my research on robust evaluation of adaptive systems.
Grounded Commitment Learning
Coordination Without Shared Semantics
Natural language coordination assumes shared semantics—that "complete the task" means the same thing to all agents. This assumption fails with semantic drift: agents with different training, architectures, or even the same agent over time may interpret identical phrases differently.
Multi-agent reinforcement learning (MARL) addresses this by learning coordination without language. Grounded Commitment Learning (GCL) takes a different approach: meaning is grounded in verifiable behavior. A commitment's meaning is defined not by how agents interpret it, but by what observable outcomes count as success or failure.
Alignment connection: This addresses a core challenge in scalable oversight—how do you verify agent behavior when you can't inspect internal states? By grounding commitments in observable actions rather than stated intentions, GCL provides behavioral verification that doesn't require interpretability of internal representations.
The Punishment Paradox
Counterintuitive finding: Increasing consequences for commitment violations decreases cooperation. This is the opposite of what traditional game theory predicts.
Why It Happens: Retaliation Cascades
High consequences trigger retaliation cascades: penalties provoke counter-defection, which spreads through the population. The correlation between consequence level and cooperation is strong: r = -0.951, p < 0.001. A toy simulation of this dynamic follows the statistics below.
Statistical Validation
- No consequences vs. full consequences: t = 36.18, p < 0.001
- Effect size: Cohen's d = 9.34
- Monotonic decrease across all 5 consequence levels
- n = 30 seeds per condition
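For intuition, here is a toy model of the cascade. This is a minimal sketch, not the experimental environment: the population size, baseline violation rate, and retaliation rule are all illustrative assumptions chosen only to show how raising the penalty amplifies counter-defection.

```python
import random

def run_episode(n_agents=50, penalty=1.0, rounds=100, seed=0):
    """Toy cascade model: a penalized violation makes the victim counter-defect
    with probability proportional to the penalty, so defection spreads."""
    rng = random.Random(seed)
    defecting = set()
    cooperation = []
    for _ in range(rounds):
        a, b = rng.sample(range(n_agents), 2)
        if a in defecting or b in defecting:
            # Violation observed: the victim retaliates with probability
            # scaled by the penalty level (illustrative rule).
            victim = b if a in defecting else a
            if rng.random() < min(1.0, 0.1 * penalty):
                defecting.add(victim)
        elif rng.random() < 0.05:
            defecting.add(a)  # baseline violation rate (assumed)
        cooperation.append(1 - len(defecting) / n_agents)
    return sum(cooperation) / rounds

for penalty in [0.0, 0.5, 1.0, 2.0, 4.0]:
    coop = sum(run_episode(penalty=penalty, seed=s) for s in range(30)) / 30
    print(f"penalty={penalty:.1f}  mean cooperation={coop:.3f}")
```

In this toy setting, mean cooperation falls monotonically as the penalty rises, mirroring the qualitative shape of the paradox.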
Redemption Resolves the Paradox
The solution: add a redemption pathway that allows agents to recover from failures. This maintains incentives while reducing the fear that prevents commitment-making (a minimal sketch follows the list below).
How Redemption Works
1. Failed agents can attempt recovery actions
2. Successful recovery reduces permanent reputation damage
3. Effort costs prevent gaming
4. Order effects controlled via eligibility snapshots
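A minimal sketch of this pathway, assuming a scalar reputation score; the recovery fraction, effort cost, and success rule are illustrative constants, not the validated parameters:

```python
from dataclasses import dataclass

@dataclass
class ReputationRecord:
    score: float = 1.0
    redemption_eligible: bool = False  # snapshotted when a failure occurs

def record_failure(rec, stake_loss):
    """Apply the staked penalty and snapshot redemption eligibility, so
    later actions cannot reorder their way into the pathway."""
    rec.score = max(0.0, rec.score - stake_loss)
    rec.redemption_eligible = True

def attempt_redemption(rec, stake_loss, effort, effort_cost=0.1, recovery_fraction=0.8):
    """One recovery attempt per failure; effort is a real cost, so the
    pathway cannot be gamed by free retries (constants are illustrative)."""
    if not rec.redemption_eligible:
        return False
    rec.redemption_eligible = False
    rec.score = max(0.0, rec.score - effort_cost * effort)  # pay the effort cost
    succeeded = effort >= 0.5                               # placeholder success rule
    if succeeded:
        rec.score = min(1.0, rec.score + recovery_fraction * stake_loss)
    return succeeded

rec = ReputationRecord()
record_failure(rec, stake_loss=0.5)        # score: 1.0 -> 0.5
attempt_redemption(rec, 0.5, effort=0.8)   # recovers 0.4, pays 0.08 -> 0.82
print(round(rec.score, 2))
```

The key design point is that recovery reduces, but does not erase, the penalty: incentives remain while the permanent damage that fuels retaliation cascades shrinks.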
Hart-Moore Validation (Experiment 21)
GCL connects to Hart-Moore incomplete contract theory from economics (Nobel Prize 2016). Experiment 21 validates all four theoretical predictions.
Prediction 1: Complete Contracts Enable Investment
Agents with complete contracts invested 73% more than those with incomplete contracts (t = 28.37, d = 7.33).
Prediction 2: GCL Approaches Complete Contract Benefits
GCL agents invested 47% more than incomplete-contract agents, capturing 64% of the complete-contract benefit (t = 15.72, d = 4.06).
Prediction 3: Incomplete Contracts Enable Hold-ups
Incomplete-contract environments showed 4.2× more hold-up incidents (t = 25.22, d = 6.51).
Prediction 4: GCL Reduces Hold-up Vulnerability
GCL reduced hold-up incidents by 36.8% (95% CI: [28.4%, 45.2%]; t = 10.38, d = 2.68).
Coordination Scaling (Experiment 23)
Coordination efficiency degrades logarithmically with population size (R² = 0.88, p = 0.0017), suggesting a Dunbar-like coordination limit around 100 agents; a curve-fit sketch follows the results below.
Scaling Results
- ~100 agents: efficiency drops to 50% of maximum
- Messages grow at 0.08 per agent (sublinear)
- Task concentration: Gini increases from 0.35 to 0.98 as population grows
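To make the logarithmic model concrete, here is a minimal curve-fit sketch. The data points are hypothetical values invented for illustration, not the experiment's measurements; they only show how a ~100-agent half-efficiency point falls out of a fit of the stated form.

```python
import numpy as np

# Hypothetical efficiency-by-population data (illustrative only).
pop = np.array([5, 10, 20, 40, 80, 160], dtype=float)
eff = np.array([0.95, 0.88, 0.79, 0.68, 0.55, 0.43])

# Least-squares fit of efficiency = a - b * ln(N)
A = np.column_stack([np.ones_like(pop), np.log(pop)])
sol, *_ = np.linalg.lstsq(A, eff, rcond=None)
a, b = sol[0], -sol[1]

# Population size at which efficiency falls to 50% of its maximum
n_half = np.exp((a - 0.5 * eff.max()) / b)
print(f"efficiency ≈ {a:.2f} - {b:.2f}·ln(N), 50% point ≈ {n_half:.0f} agents")
```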
Network Topology
- Global density: low (0.12) despite high local clustering (coefficient 0.699)
- Small-world structure: not detected (small-world coefficient 0.89)
- Hub-and-spoke topology emerges
Clarification: Task concentration (Gini) measures how tasks distribute across agents—larger populations concentrate tasks on fewer high-reputation agents. This differs from role specialization (HHI < 0.02 in Experiments 33-34), which measures whether agents focus on specific task types. The two can diverge: an agent may handle many tasks without specializing in any particular type.
Emergent Network Properties
GCL populations self-organize into structured networks without explicit coordination rules. Four properties emerge consistently across 1200+ runs (all p < 0.001):
Protocol Convergence
Agents converge on shared commitment templates without central coordination.
82.3% reduction in protocol diversity by episode 50. Convergence accelerates with population size (R² = 0.91).
Sparse Trust Networks
Agents form hub-and-spoke topologies rather than dense meshes.
Clustering coefficient = 0.699 (high local clustering). Global density remains low (0.12), enabling efficient coordination.
Task Concentration
High-capability agents attract disproportionate task volume.
Gini coefficient = 0.745 for task distribution. Top 20% of agents handle 68% of tasks—a natural consequence of reputation-weighted selection.
Efficiency Improvement
Coordination efficiency improves over time without parameter tuning.
26.5% improvement in task completion rate from episode 1 to episode 100. Improvement rate correlates with template sharing (r = 0.73).
Note on measurement: Task concentration (Gini = 0.745) measures how tasks distribute across agents. This differs from role specialization (HHI), which measures whether agents focus on specific task types. High task concentration can occur without role specialization—agents may handle many tasks across diverse types.
The GCL Framework
Why This Formalization?
Traditional multi-agent coordination assumes agents share semantic understanding. GCL replaces this assumption with verifiable behavioral contracts:
- Trigger (τ): When does this commitment activate? Removes ambiguity about scope.
- Action (a): What behavior is promised? Observable, not interpretive.
- Verification (φ): How do we know it succeeded? Third-party verifiable.
- Failures (F): What can go wrong, and what happens then? Enumerated, not implicit.
- Stake (σ): What does the agent risk? Skin in the game.
Formal Definition
A grounded commitment is a 5-tuple C = (τ, a, φ, F, σ):
- τ — trigger predicate
- a — action function
- φ — verification predicate
- F — failure modes with stakes and remediations
- σ — stake (reputation at risk)
Example commitment in this format:

```
[COMMITMENT]
ISSUER: Agent_A
TRIGGER: Task requires capability X
BEHAVIOR: Complete subtask within 3 rounds
SUCCESS: Subtask verified complete
FAILURES:
  - IF timeout THEN stake_loss=0.5, REMEDIATION: delegate
  - IF capability_mismatch THEN stake_loss=0.2, REMEDIATION: escalate
  - IF resource_exhaustion THEN stake_loss=0.3, REMEDIATION: request_resources
CONFIDENCE: 85%
STAKE: 1.0
[/COMMITMENT]
```

Key Insight: Failure-First Design
Unlike traditional contracts that specify success conditions, GCL commitments enumerate failure modes. Success is the complement of all failure conditions. This design choice enables auditability: auditors know exactly what to check, and agents cannot claim success by exploiting undefined edge cases.
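A minimal sketch of failure-first evaluation, mirroring the wire format above; the FailureMode record and the predicates are hypothetical stand-ins, not the framework's actual types:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FailureMode:
    name: str
    predicate: Callable[[dict], bool]  # fires on an observed outcome
    stake_loss: float
    remediation: str

def evaluate(failure_modes, outcome):
    """Failure-first evaluation: success is the complement of the enumerated
    failure conditions, so an auditor checks a closed, pre-specified list."""
    for mode in failure_modes:
        if mode.predicate(outcome):
            return False, mode          # first matching failure mode wins
    return True, None                   # nothing fired => verified success

modes = [
    FailureMode("timeout", lambda o: o["rounds"] > 3, 0.5, "delegate"),
    FailureMode("capability_mismatch", lambda o: not o["capable"], 0.2, "escalate"),
]
print(evaluate(modes, {"rounds": 2, "capable": True}))   # (True, None)
print(evaluate(modes, {"rounds": 5, "capable": True}))   # (False, timeout mode)
```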
Commitment-Grounded Learning
Agents learn what commitments to make via reinforcement learning. The policy maps states to commitment portfolios, optimizing for expected value minus stake risk:
```python
class GroundedCommitmentLearner:
    """Agent that learns what commitments to make via reinforcement learning.

    Key insight: Agents don't need shared understanding, just shared consequences.
    """

    def __init__(self, agent_id, capabilities, stake_budget):
        self.id = agent_id
        self.capabilities = capabilities
        self.stake_budget = stake_budget
        self.reputation = ReputationTracker()
        self.template_library = TemplateHierarchy()

    def propose_commitment(self, task, context):
        """Policy maps states to commitment portfolios."""
        capability_match = self.assess_capability(task)
        observability = self.assess_verifiability(task)

        if capability_match < 0.5 or observability < 0.3:
            return None  # Refuse rather than risk failure

        failure_modes = self.enumerate_failures(task)
        # Estimate payoff and downside before staking reputation
        # (assumed helper; the original snippet left these names unbound).
        expected_value, risk = self.assess_value_and_risk(task)

        return Commitment(
            issuer=self.id,
            trigger=task.trigger,
            behavior=task.required_behavior,
            success=task.success_condition,
            failures=failure_modes,
            confidence=capability_match * observability,
            stake=self.calculate_stake(expected_value, risk),
        )
```

Template Sharing (Experiment 24)
GCL agents learn commitment templates—reusable patterns for common task types. Template sharing accelerates coordination by transferring learned patterns between agents, reducing the cold-start problem that limits MARL approaches.
Cognitive inspiration: Templates draw on analogical learning in human cognition—the ability to recognize structural similarities across situations and transfer solutions accordingly. When humans learn that "promising to deliver X by deadline Y" works in one context, they apply that pattern to novel contexts without re-learning from scratch. GCL templates formalize this: agents encode successful commitment structures as reusable schemas, enabling rapid generalization to new task types. This parallels work on structure mapping (Gentner, 1983) and case-based reasoning (Kolodner, 1992) in cognitive science.
Connection to core findings: Template sharing amplifies the self-selection advantage. When high-capability agents share templates with lower-capability agents, the recipients gain access to proven commitment patterns without needing to discover them through trial and error. This reduces inequality while improving overall coordination.
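As a concrete illustration, here is a minimal sketch of a template as a reusable schema. The class name, slot syntax, and constants are assumptions made for illustration, not the experiment's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommitmentTemplate:
    """Reusable commitment schema: structure is fixed, slots are filled per task."""
    trigger_pattern: str    # e.g. "task requires capability {capability}"
    behavior_pattern: str   # e.g. "complete subtask within {deadline} rounds"
    failure_modes: tuple    # (name, stake_loss, remediation) triples
    default_stake: float

    def instantiate(self, **slots):
        """Fill slots for a concrete task; the failure structure transfers as-is."""
        return {
            "trigger": self.trigger_pattern.format(**slots),
            "behavior": self.behavior_pattern.format(**slots),
            "failures": self.failure_modes,
            "stake": self.default_stake,
        }

delivery = CommitmentTemplate(
    trigger_pattern="task requires capability {capability}",
    behavior_pattern="complete subtask within {deadline} rounds",
    failure_modes=(("timeout", 0.5, "delegate"),),
    default_stake=1.0,
)
print(delivery.instantiate(capability="X", deadline=3))
```

Sharing a template transfers this structure, not the sender's capabilities: the recipient still fills the slots with its own private assessment.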
Experimental Conditions
- No Sharing: templates never transfer (baseline)
- Random Sharing: random pairs exchange at a 10% rate
- Directed Sharing: high-to-low capability transfer
- Mutual Sharing: bidirectional exchange
Results
- Directed sharing: +17% cooperation (t = 4.5, p = 0.001)
- Bottom quartile improves 34% faster than top quartile
- Inequality reduced: Gini 0.35 → 0.25
- Random sharing: +8% (less effective than directed)
Implication: Template sharing provides a mechanism for capability transfer without revealing private agent information. High-capability agents share what works without exposing why they can execute it, preserving the information asymmetry that makes self-selection effective.
Implications for AI Safety
GCL provides a foundation for verifiable multi-agent coordination with properties relevant to scalable oversight and multi-agent alignment:
Auditability
Every agent action traces to a specific commitment with enumerated failure modes.
GCL mechanism: Failure-first design means auditors know exactly what to check. Agents cannot claim success by exploiting undefined edge cases—all failure modes are pre-specified.
Accountability
Failures have defined consequences. Agents stake reputation on every commitment.
GCL mechanism: The stake parameter (σ) creates skin in the game. Agents that make unreliable commitments lose reputation and future coordination opportunities—a self-enforcing accountability mechanism.
Alignment
Value-consistent commitments can be verified. The framework supports constraints on allowable commitments.
GCL mechanism: Commitment templates can encode policy constraints. Agents can only make commitments that match approved templates—enabling constitutional AI-style guardrails at the coordination layer.
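A minimal sketch of such a guardrail, assuming commitments and approved templates are simple records; the matching rule (identical failure structure, bounded stake) is an illustrative assumption, not the framework's actual admissibility check:

```python
def is_allowed(commitment, approved_templates):
    """Constitutional-style guardrail at the coordination layer: admit a
    commitment only if its failure structure matches an approved template
    and its stake stays within the template's bound (illustrative rule)."""
    return any(
        commitment["failures"] == tpl["failure_modes"]
        and commitment["stake"] <= tpl["max_stake"]
        for tpl in approved_templates
    )

approved = [{"failure_modes": (("timeout", 0.5, "delegate"),), "max_stake": 1.0}]
proposal = {"failures": (("timeout", 0.5, "delegate"),), "stake": 0.8}
print(is_allowed(proposal, approved))  # True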
Connection to Scalable Oversight
GCL addresses a key challenge in scalable oversight: how do you verify coordination between agents you cannot fully observe? By requiring agents to pre-specify failure modes and stake reputation, GCL makes coordination auditable without omniscience. Overseers check commitment logs and stake transfers rather than attempting to interpret agent reasoning.
The Core Insight
GCL dissolves rather than solves the interpretation problem. Agents don't need shared understanding, just shared consequences. This provides a principled foundation for multi-agent AI coordination that is verifiable, auditable, and aligned—without requiring that we solve the harder problem of ensuring agents share our semantic representations.
Self-Selection vs. External Assignment
Core finding: Self-selection outperforms oracle matching by 81% (p < 0.001), even with effort controlled. The primary mechanism is information asymmetry—agents possess private self-knowledge that external coordinators cannot access.
Controlled experiments isolate information and effort effects. Agents have privileged access to their own capabilities—information that external coordinators cannot observe, regardless of how much observable data they collect.
Mechanism Decomposition
Information asymmetry: Agents possess private information about task-agent fit that external coordinators cannot observe. Effect: +0.239 cooperation (self-select: 0.534 vs. oracle: 0.295, effort fixed at 0.8).
Emergent effort: Agents that self-select exert higher effort (0.904 vs. 0.776 for assigned agents). This is emergent behavior, not a designed parameter. Effect: +0.074 additional cooperation.
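Worked through: the two effects sum to 0.239 + 0.074 = 0.313 of additional cooperation, of which 0.239/0.313 ≈ 76% is information and 0.074/0.313 ≈ 24% is effort, matching the ~75/25 information/effort decomposition cited under Limitations.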
Why Self-Selection Outperforms Oracle Assignment
Even when an external coordinator has complete information about observable agent capabilities, agents retain private information about:
- Internal state: current capacity, resource availability, and readiness that affect task performance
- Task-specific fit: aspects of capability alignment not captured by general metrics
- Unobservable capabilities: information that is difficult or impossible to externalize to a coordinator
Experimental Design
- Self-select vs. oracle assignment conditions
- Fixed effort (0.8) vs. emergent effort conditions
- Oracle has complete observable information
- 1200+ independent runs, bootstrap CIs
Statistical Validation
- Cohen's d = 4.05 (large effect)
- p < 10⁻⁷²
- Power = 1.0
- Mechanism decomposition via controlled comparison
Design Implications
Coordination systems should leverage agent self-assessment rather than relying on external assignment, even when the external assigner has complete observable information. This principle underlies GCL's commitment-based approach: agents select commitments based on their private assessment of task-agent fit, and the framework provides mechanisms for verifiable execution without requiring capability disclosure.
Gaming-Resistant Reputation Mechanisms
Key finding: Reputation visibility without anti-gaming mechanisms induces strategic task selection. Difficulty-weighted reputation reduces gaming behavior by 59.8% (t = 8.42, p < 0.001, d = 2.17).
Agents that observe their own reputation scores exhibit strategic behavior—selecting tasks that maximize reputation gain rather than coordination value. Four reputation visibility conditions characterize this effect.
Gaming Behavior by Condition
- ✗ Naive visible (45% gaming rate): agents select low-difficulty tasks to inflate scores
- ○ Blind reputation (12% gaming rate): no visibility eliminates strategic selection but limits coordination
Anti-Gaming Mechanisms
- ✓ Difficulty-weighted (18% gaming rate): reputation gains normalized by task complexity
- ★ Social + difficulty (15% gaming rate): peer observation combined with difficulty weighting
The Self-Selection Requirement
These experiments revealed a critical constraint: external task assignment degrades coordination. This finding aligns with the information asymmetry mechanism—agents possess private information about task-agent fit that external assigners cannot access.
Design implication: Coordination structures must use voluntary task selection (get_volunteers()) rather than external assignment (assign_task()). This preserves the information advantage that self-selection provides while enabling reputation-based coordination.
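A minimal sketch of the get_volunteers() pattern named above; the Agent record, threshold rule, and reputation lookup are hypothetical stand-ins for the framework's interfaces:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    agent_id: str
    private_fit: dict   # task_type -> self-assessed capability (not observable)
    threshold: float = 0.5

def get_volunteers(task_type, agents):
    """Voluntary selection: each agent applies its private fit assessment,
    which no external assigner can read off observable data."""
    return [a for a in agents if a.private_fit.get(task_type, 0.0) >= a.threshold]

def select_agent(task_type, agents, reputation):
    """Break ties among volunteers using only public reputation."""
    volunteers = get_volunteers(task_type, agents)
    if not volunteers:
        return None  # no one commits; the task is re-posted
    return max(volunteers, key=lambda a: reputation.get(a.agent_id, 0.0))

agents = [Agent("a1", {"search": 0.9}), Agent("a2", {"search": 0.3})]
print(select_agent("search", agents, {"a1": 0.7, "a2": 0.9}).agent_id)  # a1
```

Note the division of labor: private self-knowledge filters who volunteers, while public reputation only ranks the volunteers that remain.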
Statistical Validation
Gaming reduction: t = 8.42, p < 0.001
Effect size: Cohen's d = 2.17
Mechanism
Difficulty weighting normalizes reputation gains by task complexity, removing the incentive for easy-task selection.
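A minimal sketch of one such weighting rule; the linear weight is an illustrative choice, not the validated mechanism's exact form:

```python
def reputation_gain(base_gain, task_difficulty, mean_difficulty):
    """Difficulty weighting: scale the gain by relative task complexity,
    so farming easy tasks yields proportionally small reputation updates."""
    return base_gain * (task_difficulty / mean_difficulty)

# An easy task (difficulty 0.2 vs. mean 1.0) earns one-fifth the reputation
# of an average task, removing the incentive for easy-task selection.
print(reputation_gain(1.0, 0.2, 1.0))  # 0.2
print(reputation_gain(1.0, 1.5, 1.0))  # 1.5
```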
Applications
Applicable to multi-agent systems with reputation, trust-based coordination, and any system where agents can observe their own scores.
GCL vs. Multi-Agent RL
Key finding: GCL achieves 25-50× better sample efficiency than MARL while maintaining 97% of MARL's coordination quality. In cold-start or non-stationary environments, this efficiency advantage dominates.
GCL converges in 2 episodes; MARL requires 52-102 episodes to reach equivalent coordination. This difference stems from GCL's use of agent self-knowledge—information that MARL must learn through trial and error.
(Charts omitted: sample-efficiency and coordination-quality comparisons between GCL and MARL.)
Non-Stationary Environments
GCL's advantage grows in non-stationary environments. When environment parameters change, MARL must relearn; GCL adapts immediately via agent self-knowledge:
(Chart omitted: performance delta vs. MARL across environment change frequencies.)
When to Use Each Approach
GCL is preferred when:
- Cold-start coordination (no training data)
- Non-stationary environments
- Privacy-preserving systems (agents keep capabilities private)
- Sample efficiency is critical
MARL may be better when:
- Training time is available
- The environment is stable
- Maximum performance is required (the 3% gap matters)
- Agent capabilities are fully observable
Limitations
Simulation Environment
Results validated in controlled simulations. Real-world deployment may introduce additional factors (network latency, partial observability, adversarial agents) not captured in current experiments.
Task Complexity
Experimental tasks are simplified relative to production multi-agent systems. Commitment verification in complex, multi-step tasks may require additional mechanisms not yet validated.
Scaling Bounds
Coordination overhead suggests hierarchical structures for populations exceeding ~100 agents. Current experiments validate flat coordination; hierarchical GCL remains future work.
Information Asymmetry Bounds
The 75/25 information/effort decomposition is specific to our experimental conditions. Ratios may vary with task structure, agent architecture, and capability observability.
Ongoing Work
The findings above represent validated results from 39 experiments with 1200+ independent runs. Current focus areas:
Papers in Preparation
- Information Asymmetry in Multi-Agent Coordination (in preparation for peer review)
- Self-Selection vs. Optimal Assignment: A Mechanism Design Analysis (in preparation for peer review)
Future Directions
- Quantifying information asymmetry bounds across agent architectures
- Hierarchical GCL for populations > 100 agents
- Real-world deployment validation
- Integration with constitutional AI approaches
Open Questions
The self-selection advantage raises a deeper question: what did agents learn to do that external matching couldn't capture?
- Meta-policy over exploration strategies. Hypothesis: self-selecting agents don't just learn which tasks to take; they learn how to explore the task space. The 75% information-asymmetry advantage may encode a meta-policy that adapts exploration based on private capability signals. External assignment can't replicate this because it lacks access to the agent's internal uncertainty estimates.
- Commitment templates as geometric operations. Can template learning be characterized as geometric transformations on a pre-linguistic capability manifold? If so, template sharing may be transferring operations rather than knowledge, a distinction with implications for how we think about capability transfer in AI systems.
- Emergent coordination primitives. What computational primitives underlie the emergent specialization we observe? The clustering coefficient (0.699) suggests agents discover coordination structures that weren't designed in. Understanding these primitives could inform how we design multi-agent AI systems that scale gracefully.