ML Research

Building AI systems that are safe, reliable, and genuinely useful

My research focuses on the intersection of frontier ML capabilities and production safety requirements. I work on continual learning (how models adapt without forgetting), safe RAG systems (how to ground LLM outputs in verified sources), and the infrastructure needed to deploy AI reliably in high-stakes environments.

🧠

Continual Learning: Nested Learning Extension

Research Implementation • November 2025
+89% at high regularization • Pareto-dominant • Single-day implementation

Extended Google's NeurIPS 2025 Nested Learning paper with bidirectional knowledge bridges — enabling direct cross-scale communication between non-adjacent optimization timescales.

What I Extended

  • 3 → 5 optimization timescales with geometric 5× progression (mirroring brain oscillation patterns)
  • 2 → 9 knowledge bridges including 5 non-adjacent connections that prevent information bottlenecks
  • Gated adaptive transfer with gradient-surprise detection for selective knowledge consolidation (see the sketch after this list)
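
As a concrete illustration, here is a minimal PyTorch sketch of a gated bridge with gradient-surprise detection. The class name, tensor shapes, and gating mechanics are my assumptions for exposition, not the paper's (or my implementation's) actual API:

```python
import torch
import torch.nn as nn

class GatedBridge(nn.Module):
    """Illustrative bridge between a fast and a slow optimization timescale."""
    def __init__(self, dim: int, surprise_threshold: float = 1.0):
        super().__init__()
        self.proj = nn.Linear(dim, dim)        # cross-scale projection
        self.gate = nn.Linear(2 * dim, 1)      # learned transfer gate
        self.surprise_threshold = surprise_threshold

    def forward(self, fast_state: torch.Tensor, slow_state: torch.Tensor,
                grad_norm: float) -> torch.Tensor:
        # Gradient-surprise detection: only consolidate into the slow scale
        # when the fast scale sees an unusually large ("surprising") gradient.
        if grad_norm < self.surprise_threshold:
            return slow_state
        g = torch.sigmoid(self.gate(torch.cat([fast_state, slow_state], dim=-1)))
        return slow_state + g * self.proj(fast_state)
```

The gate lets the slow timescale accept fast-timescale knowledge selectively rather than on every step, which is the intuition behind preventing both forgetting and interference.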

Key Result: +89% at High Regularization

At regularization strength λ=5.0, where the baseline collapses to ~10% accuracy, the bidirectional bridges maintain ~19% accuracy, a relative improvement of roughly 89%. This matters because high regularization is exactly where continual learning needs to work: when preserving prior knowledge is critical.

Why It Matters for AI Safety

Catastrophic forgetting is a fundamental challenge for safe AI deployment. As models are updated with new capabilities or fine-tuned for specific tasks, they risk losing safety-critical behaviors learned during alignment. Clinical AI systems need to learn new protocols without forgetting established safety constraints. This work demonstrates that architectural innovations can help preserve important behaviors while enabling continued learning — a key requirement for maintaining alignment in deployed systems.

🛡️

Safety-Critical RAG Systems

Production Research • High-Stakes Domains
Hallucination Mitigation • Evidence-Required Prompting • Confidence Calibration

In high-stakes domains like healthcare and legal, incorrect AI responses aren't just annoying — they're dangerous. My work on safety-critical RAG focuses on making retrieval-augmented generation reliable enough for professional decision support.

Hallucination Mitigation

LLMs confidently generate plausible-sounding but incorrect information. I implement multi-stage verification: retrieval confidence scoring, source attribution requirements, and explicit "I don't know" responses when evidence is insufficient.
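
A minimal sketch of the abstention gate, assuming per-passage confidence scores in [0, 1]; the thresholds, names, and `generate` callable are illustrative, not the production code:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # retrieval confidence in [0, 1]

def answer_or_abstain(question: str,
                      passages: List[Passage],
                      generate: Callable[[str, List[Passage]], str],
                      min_score: float = 0.7,
                      min_passages: int = 2) -> dict:
    """Answer only when enough high-confidence evidence survives the gate;
    otherwise return an explicit refusal instead of guessing."""
    evidence = [p for p in passages if p.score >= min_score]
    if len(evidence) < min_passages:
        return {"answer": "I don't know.", "citations": []}
    return {"answer": generate(question, evidence),
            "citations": [p.doc_id for p in evidence]}
```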

Evaluation Methods

Standard RAG metrics (recall@k, MRR) don't capture safety. I develop domain-specific evaluation: factual accuracy against source documents, harmful omission detection, and calibration of confidence scores.
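
Confidence calibration can be scored with, for example, expected calibration error (ECE). This is a generic sketch of that metric, not the exact evaluation harness described here:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10):
    """ECE: bin predictions by confidence, then average the gap between
    bin accuracy and bin confidence, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A well-calibrated system saying "80% confident" should be right about 80% of the time; in safety-critical settings, overconfidence is the failure mode to catch.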

Evidence-Required Prompting

Every response must cite specific retrieved passages. The system refuses to answer if retrieved evidence doesn't meet confidence thresholds — prioritizing safety over helpfulness.
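
One way to encode this contract is in the prompt itself. The template below is an illustrative assumption, not the production prompt:

```python
EVIDENCE_PROMPT = """Answer ONLY using the numbered passages below.
Cite passages as [1], [2] after every claim.
If the passages do not contain the answer, reply exactly: "I don't know."

Passages:
{passages}

Question: {question}
Answer:"""

def build_prompt(question: str, passages: list[str]) -> str:
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return EVIDENCE_PROMPT.format(passages=numbered, question=question)
```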

Hybrid Retrieval

Domain-specific queries require both semantic understanding and exact terminology matching. I combine dense embeddings with BM25 sparse retrieval via reciprocal rank fusion.
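
Reciprocal rank fusion combines ranked lists with score(d) = Σᵢ 1/(k + rankᵢ(d)), conventionally with k ≈ 60. A minimal sketch, assuming each retriever returns a best-first list of document IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists (e.g., one from dense retrieval, one from BM25).
    Documents ranked highly by multiple retrievers float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization across retrievers, which is why it is a common default for hybrid dense + sparse setups.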

Research Philosophy: Safety Over Helpfulness

The system is designed to say "I don't know" rather than risk providing incorrect information. This conservative approach — explicit confidence thresholds, mandatory source attribution, refusal to extrapolate beyond retrieved evidence — is essential for deployment in high-stakes domains.

🤖

Agentic AI & Multi-Agent Orchestration

Systems Research • 2024
Multi-Agent Coordination • Production LLM Systems • Async Workflow Orchestration

Built a production multi-agent orchestration platform to deeply understand the challenges of coordinating autonomous AI agents — LLM reliability at scale, async workflow management, and the infrastructure required for agentic systems.

What I Built

  • ~15,000 lines of TypeScript across backend (NestJS), web (Next.js), and mobile (React Native)
  • Hybrid model routing — DistilBERT for fast sentiment analysis, GPT-4 for complex content generation
  • Production job queue with priority scheduling, exponential backoff (sketched after this list), and real-time WebSocket updates
  • A/B testing infrastructure for AI-generated content with statistical significance tracking
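
For illustration, here is the standard exponential-backoff-with-jitter pattern behind the job queue's retries, shown in Python for brevity (the actual system is TypeScript; names and defaults are placeholders):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0, cap: float = 30.0) -> T:
    """Retry a flaky call (e.g., an LLM API request), doubling the wait
    after each failure and adding jitter to avoid synchronized retries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(cap, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))
```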

What I Learned

Production LLM systems are 20% prompt engineering and 80% infrastructure. Error handling, retries, cost management, and observability are where the real complexity lives.

Hybrid model architectures work well in practice. Route intelligently based on task requirements — fast models for high-volume tasks, powerful models for complex generation.
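
A sketch of that routing rule, with placeholder model names and a made-up length cutoff:

```python
def route_model(task: str, text: str) -> str:
    """Pick the cheapest model that can handle the task."""
    HIGH_VOLUME_TASKS = {"sentiment", "moderation", "tagging"}
    if task in HIGH_VOLUME_TASKS and len(text) < 2_000:
        return "distilbert-sentiment"  # fast, cheap, runs locally
    return "gpt-4"  # reserve the expensive model for complex generation
```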

Real-time feedback transforms user experience. WebSocket updates showing live progress make async AI tasks feel responsive and trustworthy.
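
As a toy sketch of the pattern, using Python's websockets package for brevity (the real system streams from a NestJS backend; job IDs and timings are invented):

```python
import asyncio
import json

import websockets  # pip install websockets

async def progress(websocket):
    # Stream job progress so the client sees live updates, not a spinner.
    for pct in (0, 25, 50, 75, 100):
        await websocket.send(json.dumps({"job_id": "demo", "progress": pct}))
        await asyncio.sleep(0.5)

async def main():
    async with websockets.serve(progress, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```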

Why It's Relevant for Labs

Agentic AI is moving from research demos to production systems. The challenges I solved — LLM reliability, async coordination, cost optimization, multi-model routing — are exactly what labs need as they deploy agent-based systems at scale. This hands-on experience with production agentic infrastructure is directly applicable to frontier AI development.

Research Philosophy

🔬

Reproducibility & Rigor

Every result should be reproducible. I reproduce baselines before extending, design ablations to isolate causal drivers, and prioritize engineering clarity.

🛡️

Safety as Design Constraint

Safety isn't an afterthought — it's a design requirement. Better to refuse than to be wrong. Explicit uncertainty quantification. Evaluate reliability under failure modes.

⚙️

Production Quality

Research code that can ship. Comprehensive testing, proper error handling, real-world reliability. The gap between demo and deployment is where most research fails.

🎯

Alignment-Aware Development

Building systems that remain helpful, harmless, and honest. Understanding how models behave under distribution shift, adversarial inputs, and edge cases is essential for safe deployment.

The most important technical challenge of our time is building AI systems that are genuinely beneficial. This requires both advancing capabilities and ensuring those capabilities are deployed safely. My work sits at this intersection — implementing frontier research with the engineering discipline required for high-stakes environments.

Interested in Collaborating?

I'm always interested in discussing frontier ML research, production AI systems, and the challenges of making research work in the real world.