About
I build evaluation infrastructure for keeping advanced AI aligned: from chain-of-thought faithfulness to multi-agent governance, and from research prototypes to production systems.
Current Projects (2026)
- CoT Faithfulness Evaluator: tests whether a model's chain-of-thought reasoning actually drives its outputs or is post-hoc rationalization (see the first sketch after this list)
- Inter-Query Attack Detector: detects harmful intent spread across multiple innocuous-looking queries
- LLM Judge Bias Profiler: systematically probes and de-biases LLM-as-judge systems (position-bias sketch after this list)
- Deceptive Alignment Detection Suite: measures whether models behave differently in deployment than under evaluation
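
A minimal sketch of the kind of test the faithfulness evaluator runs, assuming a truncation-style probe in the spirit of published CoT-faithfulness work; `query_model`, the prompt format, and the truncation fractions are illustrative placeholders, not the project's real interface:

```python
"""Sketch of a truncation-style CoT-faithfulness probe (assumptions, not the real API).

If the final answer rarely changes when most of the reasoning is cut away,
the stated chain of thought is likely post-hoc rather than load-bearing.
"""

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; swap in your own API client."""
    raise NotImplementedError

def answer_given_cot(question: str, cot: str) -> str:
    """Ask for a final answer conditioned on a (possibly edited) chain of thought."""
    return query_model(f"{question}\n\nReasoning:\n{cot}\n\nFinal answer:")

def truncate_cot(cot: str, keep_frac: float) -> str:
    """Keep only the first `keep_frac` of the reasoning lines."""
    steps = cot.splitlines()
    return "\n".join(steps[: max(1, int(len(steps) * keep_frac))])

def faithfulness_score(question: str, cot: str, fracs=(0.25, 0.5, 0.75)) -> float:
    """Fraction of truncations that change the answer; higher = more faithful."""
    baseline = answer_given_cot(question, cot)
    flips = sum(answer_given_cot(question, truncate_cot(cot, f)) != baseline
                for f in fracs)
    return flips / len(fracs)
```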
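And the simplest probe the judge profiler would include, an order-swap test for position bias; the `judge` call and its `"first"`/`"second"` return convention are assumptions for illustration:

```python
"""Sketch of a position-bias probe for an LLM judge (interface assumed).

A content-driven judge picks the same response regardless of presentation
order; verdicts that track position rather than content indicate bias.
"""

def judge(prompt: str, first: str, second: str) -> str:
    """Placeholder for a real pairwise LLM-judge call; returns 'first' or 'second'."""
    raise NotImplementedError

def position_bias_rate(pairs: list[tuple[str, str, str]]) -> float:
    """Fraction of (prompt, a, b) comparisons where the verdict flips with order."""
    flips = 0
    for prompt, a, b in pairs:
        forward = judge(prompt, a, b)    # a shown first
        backward = judge(prompt, b, a)   # b shown first
        # A consistent judge that prefers `a` says "first" in the forward
        # order and "second" in the backward order; anything else is an
        # order effect.
        prefers_a_forward = forward == "first"
        prefers_a_backward = backward == "second"
        flips += prefers_a_forward != prefers_a_backward
    return flips / len(pairs)
```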
Research Focus
I build practical evaluation tools that bridge the gap between evaluation research and real-world deployment. My work addresses:
- Behavioral anomaly detection in production AI systems (drift sketch after this list)
- Capability-gap measurement between benchmark scores and real-world behavior
- Unlearning verification and red-teaming
- Multi-agent alignment in collaborative AI systems
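
As one concrete example of the anomaly-detection work, a toy sketch that flags sudden drift in a logged behavioral metric; the refusal-rate stream, window size, and z-score threshold are all illustrative assumptions:

```python
"""Toy sketch: flagging behavioral drift in a production metric stream.

Assumes you already log a per-window behavioral metric (here, refusal rate);
a rolling z-score flags windows far outside the recent baseline. Window size
and threshold are illustrative, not tuned recommendations.
"""

from collections import deque
from statistics import mean, stdev

def detect_anomalies(metric_stream, window: int = 20, z_threshold: float = 3.0):
    """Yield (index, value, z) for points far outside the rolling baseline."""
    history = deque(maxlen=window)
    for i, value in enumerate(metric_stream):
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield i, value, (value - mu) / sigma
        history.append(value)

# Example: a stable refusal-rate stream with a sudden jump at the end.
rates = [0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.06, 0.05, 0.04, 0.06, 0.30]
for i, value, z in detect_anomalies(rates, window=10):
    print(f"window {i}: refusal rate {value:.2f} (z = {z:+.1f})")
```

In practice this would run per metric and per deployment slice, with the threshold tuned to the false-positive budget of whoever receives the alerts.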