The Bug That Took a Week to Find (Fixed by Changing One Number)

1 AM on a Tuesday. I’m staring at a blank response for the forty-seventh time.

My AI model runs fine on easy questions. “Summarize this” — perfect. “Format this list” — no problem. But the moment I ask it to actually think — compare two options, weigh trade-offs, recommend something — it hands me nothing. Empty string. Zero characters.

No error message. No crash. The API returns HTTP 200: “success.” Content field: empty.

I’ve been debugging this for five days.

The rabbit hole

Day one: I assumed my prompt was broken. Rewrote it. Simplified it. Tried ten versions. Made it longer, shorter, more explicit, less explicit. All blank.

Day two: I thought the model was corrupted. Reinstalled it. Restarted everything. Cleared caches. Prayed to no one in particular.

Day three: I sent the same prompt to a cloud model. Worked perfectly. Sent it back to my local model. Blank.

So the prompt was fine. The model should have been capable. Something in between was broken.

Day four: I read every open GitHub issue for the model, the framework, and the API wrapper. Nothing matched my symptoms. A friend asked what I was working on. “The same bug,” I said. “For the fourth day.”

Day five. Past midnight. The screen is too bright and my eyes hurt. I’m scrolling through documentation I’ve already read three times, the kind of slow, desperate reading where you’re hoping the words rearrange themselves into something new. One sentence catches my eye: “Reasoning models consume tokens during chain-of-thought processing.”

Wait. What?

The debugging timeline: Day 1 rewrote prompts, Day 2 reinstalled, Day 3 cloud test, Day 4 GitHub issues, Day 5 found it.

The $10 grocery trip

Here’s what was happening.

Newer AI models — the “reasoning” ones like qwen3 — don’t just spit out answers. They think first. Like a person who pauses, considers options, reasons through the problem internally before speaking.

That internal thinking generates hidden text you never see, but it counts against the same word budget as the answer.

I’d given the model a budget of 200 words. It was using all 200 on thinking. Zero left for the actual answer.

Token budget: with 200 tokens, thinking consumes everything and the answer is empty. With 800, plenty left for the answer.

It’s like giving someone $10 to buy groceries, but they spend $10 on gas getting to the store. They arrived. Mission accomplished. Cart is empty.

The fix

I changed one number. 200 → 800.

The fix: max_tokens 200 crossed out, replaced with 800.

Everything worked. Immediately. Every prompt that had been returning blank for five days now came back with clear, detailed answers.

Five days. Rewritten prompts. Reinstalled software. A full evening reading GitHub issues. Midnight documentation sessions. All because one configuration value was too small.

Why it hides so well

This bug is evil because the system doesn’t know anything is wrong.

The API returns “success.” Because technically, it is. The model received the prompt, processed it within the budget, and returned the result. The result just happened to be empty.

And it’s intermittent. Easy questions use maybe 50 tokens thinking, leaving 150 for an answer — you get a short but working response. Hard questions use all 200 on thinking, leaving zero — blank.

So the bug appears to depend on the question, which sends you chasing the wrong problem entirely.

Easy question: 50 tokens thinking, 150 left for answer — works. Hard question: 200 tokens thinking, 0 left — blank.

I was debugging prompts for days when the prompts were never the problem.

The pattern

I’ve learned something about the worst bugs. They’re never the complicated ones. The complicated ones announce themselves — stack traces, error messages, crashes. You know something is broken and roughly where to look.

The worst bugs are the ones where everything says “fine.” Green lights. Success codes. Clean logs. And somewhere, one number is slightly too small, and the system quietly produces nothing instead of something, and tells you it did a great job.

The system never showed an error because technically there wasn’t one.

Green lights, empty cart: dashboard showing all green checks next to an empty output box.

Now, any time I get a blank response from an AI model, I check the token budget before I touch anything else. It’s a thirty-second check. It would have saved me a week.

Ever spent days on a bug that turned out to be one setting? I collect these stories. Share yours — mo@fadaly.net.