
Hi, my name is Ilia. I'm a solopreneur building apps for the Microsoft Teams Marketplace — the biggest one is Perfect Wiki, a knowledge base that lives inside Teams. The whole portfolio sits at almost $400K ARR, and the team is just two people: me on product and code, and a customer success manager handling support.
Working alone with AI agents like Claude Code is amazing — until you try to ship something serious in a niche ecosystem. Microsoft Teams is not the most popular environment in the world. The SDKs I touch every day (TeamsJS, the Microsoft Graph API, the Teams Toolkit) are not the kind of thing an LLM has seen a million times on GitHub. The agent guesses. It hallucinates. It "fixes" things that weren't broken. You end up babysitting it more than you'd like.
The old TDD I abandoned
Years ago, before LLMs ate the world, I got excited about Test-Driven Development. The pitch sounded perfect for a small team: define your business entities upfront, write tests for them, then build a clean app that's automatically covered with tests.
I tried. And honestly? It was boring. Writing a wall of tests before any code exists is not fun. There's no dopamine. The product isn't moving. I gave up after a few weeks and only used TDD for occasional gnarly modules — pricing logic, permission checks, that kind of thing.
What changed when agents arrived
Prompting an AI agent well is hard. You think you've described the task precisely — the agent disagrees, asks three clarifying questions, then "fixes" something else along the way. Every interruption is a context switch. Context switches kill solo founders.
So I asked myself a simple question: what if I could define the input, the output, and the definition of "done" so tightly that the agent has nowhere to drift?
That's TDD. Tests are exactly that contract.
My flow with Claude
Here's the loop I use now, almost daily:
- Write the prompt as a list of tests. I describe every edge case I can think of in plain language, then I ask the agent: "What other edge cases should I cover for this feature? List them before writing anything."
- Ask the agent to implement only the tests. No production code yet. I read every test carefully and tweak the assertions. This is the only step where I actually need to think hard. If the tests are right, the rest is mechanical.
- Tell the agent: "Now write the code to make all tests green." Then I let it run. For hours, sometimes.
What I get back is code I can actually trust. Reviewing it is fast — the tests are the spec, and they're already passing. There are no surprises hiding three folders deep.
It's not just for backends
The biggest surprise: this works just as well on the frontend. With Playwright, my "tests" become end-to-end user flows — and those flows are a fantastic abstraction for telling an agent what the app should actually do.
An example prompt I'd write for a Teams page:
Write a Playwright test for the "share knowledge base" flow:
- User opens the wiki inside MS Teams
- Clicks "Share" in the top right
- A modal appears with a public link toggle
- Toggling it on reveals a copy-to-clipboard button
- Clicking copy puts a valid https://read.perfectwiki.xyz/... URL in the clipboard
- Closing the modal and reopening it remembers the toggle state

That's a complete spec. The agent has nowhere to be creative. And once those tests pass, I have a real, browser-verified feature — not a vibe.
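From a prompt like that, the agent's output might look something like the sketch below. The selectors, the `/wiki` route, and the modal labels are all assumptions for illustration — the real suite runs against my own Teams-like harness:

```typescript
// Sketch only: selectors, routes, and accessible names are assumptions,
// not the real Perfect Wiki test suite.
import { test, expect } from "@playwright/test";

test("share knowledge base via public link", async ({ page, context }) => {
  await context.grantPermissions(["clipboard-read", "clipboard-write"]);
  await page.goto("/wiki"); // assumes a harness serving the Teams tab locally

  await page.getByRole("button", { name: "Share" }).click();
  const modal = page.getByRole("dialog");
  await expect(modal).toBeVisible();

  const toggle = modal.getByRole("switch", { name: "Public link" });
  await toggle.click();

  const copyButton = modal.getByRole("button", { name: "Copy link" });
  await expect(copyButton).toBeVisible();
  await copyButton.click();

  const copied = await page.evaluate(() => navigator.clipboard.readText());
  expect(copied).toMatch(/^https:\/\/read\.perfectwiki\.xyz\//);

  // Closing and reopening the modal should remember the toggle state.
  await modal.getByRole("button", { name: "Close" }).click();
  await page.getByRole("button", { name: "Share" }).click();
  await expect(
    page.getByRole("dialog").getByRole("switch", { name: "Public link" })
  ).toBeChecked();
});
```

Each bullet in the prompt maps to one or two lines of the test, which is exactly why the flow makes such a good spec.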
The bonus: a regression net you didn't have to budget for
Six months in, my Teams app has more test coverage than anything I've ever shipped solo. Adding a new feature is no longer scary because I trust the existing surface. Refactoring is no longer a Friday-afternoon mistake.
I didn't sit down to "invest in test coverage." I just used TDD as a prompting technique, and the coverage was a free side effect.
The cons (because there are some)
The biggest one: test infrastructure has a real upfront cost. Auth flows, database fixtures, mocking the Microsoft Graph API, spinning up a Teams-like environment for Playwright — none of that is free. You'll spend a weekend on it before you see any payoff.
But once it's done, it's done. Every future feature pays you back. And the agent itself can help you build the test infrastructure — same loop, applied to the harness.
Wrapping up
LLM-assisted coding is genuinely a different craft from what we had two years ago. But some old ideas — ideas that were "correct but not fun" back then — get a second life now that there's a tireless agent willing to do the boring half.
TDD was always the right answer. We just needed a coworker who actually enjoyed writing the implementation.
If you're a solo dev or a tiny team trying to ship in a niche stack with AI agents — try this loop for one feature this week. Write the tests in the prompt. Let the agent build them. Then let it loose on the implementation. I think you'll be surprised how much trust you get back.