Arun Baby

330 technical articles · 18 peer-reviewed publications · Samsung Research · IIT Madras

Deep-Dives by Topic

AI Agents

85 articles

Autonomous systems, multi-agent orchestration, tool use, memory, and production deployment.

AI Security

36 articles

Red teaming, prompt injection, agent attack surfaces, adversarial audio, and governance.

Speech Tech

70 articles

ASR, TTS, speech separation, voice enhancement, and real-time audio processing.

ML System Design

79 articles

Production ML architecture, serving, feature stores, evaluation, and inference patterns.

DSA

60 articles

Data structures and algorithms with detailed solutions and complexity analysis.

Thoughts

26 posts

Short reflections on health, longevity, focus, and the science of living well.

---

Greatest Hits

Hand-picked articles across speech tech, AI agents, and ML systems.

Paperclip: the org chart your AI agents are missing

AI Agents

Multi-agent coordination fails without governance. Paperclip models agent systems as companies with org charts, budgets, and audit trails.

TraceR1: planning before moving

AI Agents

Adobe's RL framework trains agents to forecast the full trajectory before taking the first action -- 8-40% gains over reactive baselines on long-horizon tasks.

Prompt injection is a structural attack

AI Security

Prompt injection isn't a content moderation problem. It's a structural consequence of how LLMs process tokens. Here's what actually works.

Gemma 4: three architectural decisions that changed what a small model can do

ML System Design

Hybrid attention, native multimodality, and a 26B MoE variant at 12% compute cost -- the decisions behind Gemma 4's architecture.

VibeVoice: how ultra-low frame rate tokenization solved 90-minute TTS

Speech Tech

7.5 Hz tokenization, 80x compression over Encodec, four distinct voices, 90 minutes. Why frame rate is the core constraint in long-form speech synthesis.

---

Building something with AI agents or speech tech?

I help startups and teams ship production AI systems -- from architecture reviews and advisory engagements to hands-on fractional CTO work. I also help businesses go AI-native with agentic workflows and agent orchestration.

Discuss Your Project

---

Research Publications

Peer-reviewed work at international speech technology venues. Google Scholar profile →

---

Books

Long-form writing on building production AI systems.

AI Agent Orchestration

Architecture, patterns, and production lessons for multi-agent systems -- beyond single-agent demos.

All Books →

---

Thoughts

Short reflections on experience, perspective, and learning.

---

Research Publications

Multilingual ASR with Improved Language Identification for Indic Languages

Speaker Personalization for ASR using Weight-Decomposed Low-Rank Adaptation

Deep Learning for Phonetic Segmentation in Indian Language TTS

Robust Speaker Personalisation Using Generalized Low-Rank Adaptation for ASR

Thoughts

ClawGuard: moving prompt injection defense from the prompt layer to the runtime

The dark factory era: what engineering looks like when 95% of code is not typed

Engineering Mental Health: Evidence Over Intuition
