OpenTelemetry | Chandu's Canvas

When Your System Lies in Complete Sentences: Observing LLMs in Production

Picture this scenario. Your AI-powered feature is running. Every metric you watch looks clean. Error rate: flat. P99 latency: within SLO. HTTP 200s across the board. Your on-call engineer has nothing to page about. But for the last three days, the model has been producing subtly wrong output. Not wrong in a way that crashes anything. Not wrong in a way that fires a single alert. The responses are fluent, confident, and structurally perfect. They just happen to be incorrect in

Chandra Sekar Reddy

May 24 · 8 min read