Medium LLM Jun 29, 2026

I Spent 3 Months Using Extended Thinking in Production. Here’s When It Actually Helps.

reasoning

What happened

A retrospective on three months of running extended-thinking (reasoning) models, such as OpenAI's o1 and o3-mini series, in production. It explores where the extra token cost is justified and where extended thinking degrades performance. (Note: the source content is a short stub.)

Why it matters

Helps builders avoid wasting token budget on tasks where standard models perform as well as, or better than, reasoning models.

The take

Reasoning models are transforming agentic workflows, but they are expensive and slow. Knowing where extended thinking stops helping, or starts hurting, is highly valuable for PMs shipping these models to production.

Do this

Benchmark reasoning models against standard models on your specific task, and confirm that the latency and cost of extended thinking buy a statistically significant accuracy gain.
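
One way to check for a statistically significant accuracy gain is a paired test over the same evaluation examples. The sketch below is an illustration, not something taken from the source article: it assumes you already have per-example pass/fail results for both models, and it uses an exact McNemar test (a two-sided binomial test on the examples where the two models disagree). The function name `mcnemar_exact` and the toy data are hypothetical.

```python
from math import comb


def mcnemar_exact(baseline_correct, reasoning_correct):
    """Exact two-sided McNemar test on paired per-example correctness.

    Both arguments are equal-length sequences of booleans, one entry per
    evaluation example, in the same order for both models.
    Returns (only_reasoning, only_baseline, p_value).
    """
    if len(baseline_correct) != len(reasoning_correct):
        raise ValueError("both models must be scored on the same examples")

    pairs = list(zip(baseline_correct, reasoning_correct))
    # Discordant pairs: examples that exactly one of the two models got right.
    only_reasoning = sum(1 for b, r in pairs if r and not b)
    only_baseline = sum(1 for b, r in pairs if b and not r)

    n = only_reasoning + only_baseline
    if n == 0:
        # The models agree on every example, so there is no evidence of a gap.
        return only_reasoning, only_baseline, 1.0

    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    k = min(only_reasoning, only_baseline)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_reasoning, only_baseline, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy, made-up results for illustration only (not measurements).
    baseline = [False] * 10 + [True] * 5
    reasoning = [True] * 10 + [True] * 5
    wins, losses, p = mcnemar_exact(baseline, reasoning)
    print(f"reasoning-only wins={wins}, baseline-only wins={losses}, p={p:.4f}")
```

A small p-value only says the accuracy difference is unlikely to be noise. Whether the gain is worth the added latency and token cost remains a separate judgment for your task.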
