In the fast-moving world of software development, teams constantly chase the next breakthrough while keeping their feet planted in realistic, working practices. This tension between innovation and practicality is especially sharp when teams adopt an AI-first approach to delivering software. One experienced practitioner, a technical principal at Thoughtworks currently working with a large U.S. state government, recently shared how his team balances these forces. They are building a knowledge graph from regulations using a deep research agent, leveraging tools like Claude Sonnet 4.5 and Cursor. The project reveals five concrete ways that AI-first software delivery can harmonize ambitious experiments with the steady discipline required for production systems. Below, we explore each approach with real-world nuance and actionable takeaways.

Way 1: Align Your AI Approach with Code Longevity and Verification Capabilities
Not every software project is the same. Some codebases are written once and rarely touched again. Others evolve every week for years. Yet many teams treat AI-assisted coding as a one-size-fits-all decision. That is a mistake. The first way to balance innovation and practice is to choose your AI strategy based on two critical axes: how long the code is expected to live and how easily you can verify its correctness automatically.
Let us examine these axes more closely. Code longevity refers to whether the code you are writing is temporary — perhaps for a prototype, a one-time migration script, or a quick experiment — or whether it will be maintained by multiple developers over months or years. Automated verification describes the level of testing, linting, and contract checks you have in place. If you have robust automated checks, you can trust AI agents to generate more code with less supervision because mistakes are caught quickly. If verification is weak, you need tighter human oversight.
The Thoughtworks team working on the state knowledge graph project treats these axes as a two-by-two matrix. When code longevity is low and automated verification is high, they might let an AI agent write larger blocks of code with minimal human intervention. When longevity is high but verification is still developing, they keep the human developer deeply engaged in every AI suggestion. This matrix prevents both reckless autonomy and overcautious micromanagement. According to Wes Reisz, the technical principal on the project, this framework helps him answer a question he gets constantly: “Why aren’t you using a full multi-agentic unsupervised approach?” The answer is that this particular project requires a supervised approach because the code must be maintainable for years, and the existing test suite, while good, is not yet comprehensive enough to catch every AI-generated mistake.
This selection process is the essence of balancing innovation (letting AI write code quickly) with practice (ensuring that code is safe and sustainable). The same AI-first software delivery principle applies to any team. Before adopting any coding agent, evaluate the lifespan of the code you are generating and the strength of your automated checks. Then decide how much freedom to grant the tool.
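The two-by-two matrix can be sketched as a small decision helper. The oversight labels below are illustrative stand-ins, not the team's actual terminology:

```python
from enum import Enum

class Oversight(Enum):
    HIGH_AUTONOMY = "let the agent generate large blocks, spot-check the result"
    SUPERVISED = "human reviews every AI suggestion before merge"
    PAIRED = "human and agent work in tight lockstep"
    MANUAL_FIRST = "keep AI to boilerplate and suggestions only"

def choose_oversight(long_lived: bool, strong_verification: bool) -> Oversight:
    """Map the two axes (code longevity, automated verification) to an
    oversight level. The mapping is one plausible reading of the matrix."""
    if not long_lived and strong_verification:
        return Oversight.HIGH_AUTONOMY   # throwaway code, mistakes caught fast
    if not long_lived and not strong_verification:
        return Oversight.PAIRED          # short-lived but unchecked: stay close
    if long_lived and strong_verification:
        return Oversight.SUPERVISED      # durable code: review everything
    return Oversight.MANUAL_FIRST        # durable and unchecked: minimal AI
```

Under this mapping, the state government project (long-lived code, still-maturing verification) lands in the most conservative quadrant, which matches the team's supervised approach.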
Way 2: Partner with LLMs Using Structured Frameworks Like RIPER-5
Innovative AI tools often fail in practice because developers treat them as black boxes. They prompt once, accept the output, and move on. That is a recipe for inconsistent results. The second way to balance innovation and practice is to use a structured partnership framework with large language models (LLMs). One such framework is RIPER-5, a method developed within Thoughtworks that assigns the LLM a specific role in a conversation rather than treating it as an oracle.
RIPER-5 takes its name from its five phases: Research, Investigate, Plan, Execute, and Review. In practice, the team applying AI-first software delivery on the knowledge graph project does not ask Claude Sonnet to write code in one shot. Instead, they first ask it to research the problem, then investigate possible solutions, then plan the implementation, then execute the code, and finally review the result together with the human developer. Each phase uses a focused prompt that puts the LLM in a “partnering mode” rather than a “delegate mode.”
Here is how it works on a concrete level. When the team needs to implement a new module for ingesting a regulation rule, they start with the Research phase. They ask the LLM: “Explain the typical pattern for parsing regulatory text in this domain, including edge cases like exceptions and appendices.” Once the LLM provides that research, the human developer and the AI discuss it together, often in a pair programming session. The human may refine the prompts or point out missing constraints. Then they move to Investigate, where the LLM outlines two or three different implementation strategies with trade-offs. The Plan phase produces a step-by-step sequence of changes. Execute generates the actual code. Review checks the code against the original requirements and the project’s coding standards.
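The cycle described above can be sketched as a simple loop. The prompt templates and the `ask_llm` callable are hypothetical placeholders, since the team's actual prompts are not published:

```python
RIPER_PHASES = ["Research", "Investigate", "Plan", "Execute", "Review"]

# Hypothetical prompt templates for each phase, loosely modeled on the
# examples in the text; real prompts would be far more specific.
PROMPTS = {
    "Research": "Explain the typical patterns for {task}, including edge cases.",
    "Investigate": "Outline two or three implementation strategies for {task}, with trade-offs.",
    "Plan": "Produce a step-by-step sequence of changes for {task}.",
    "Execute": "Write the code implementing the plan above for {task}.",
    "Review": "Check the code against the requirements and coding standards for {task}.",
}

def riper5_session(task: str, ask_llm):
    """Run one RIPER-5 cycle; `ask_llm` is any callable taking a prompt
    string and returning the model's response."""
    transcript = []
    for phase in RIPER_PHASES:
        prompt = PROMPTS[phase].format(task=task)
        transcript.append((phase, ask_llm(prompt)))
        # In practice the human developer reviews each phase's output here,
        # refining prompts or adding constraints, before moving on.
    return transcript
```

The point of the structure is the forced checkpoint after every phase: the human stays in the loop five times per feature, not once at pull request time.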
This structured collaboration prevents the common trap of fully automating code generation and only checking it at pull request time. By working side by side with the LLM in real time, the team catches conceptual errors early and retains deep understanding of the codebase. The innovation of using a powerful model like Claude Sonnet 4.5 is balanced by the practical discipline of a repeatable process. Anyone can adopt RIPER-5 or a similar framework to make their AI-first software delivery more deliberate and effective.
Way 3: Amplify Engineering Discipline — Do Not Replace It
A persistent myth is that AI will eventually make software engineers obsolete. The opposite is true in practice. AI does not replace engineering discipline; it amplifies it — for better or worse. If your team has weak testing practices, unclear requirements, or sloppy coding standards, AI will magnify those weaknesses. It will generate code that looks plausible but fails in subtle ways, and without good verification, those failures will reach production faster. Conversely, if your team has strong practices, AI will help you produce high-quality code more rapidly.
This is the third way to balance innovation and practice: invest in engineering rigor before you invest in AI tooling. On the state government project, the team practiced continuous delivery, pair programming, and thorough code reviews long before they added Cursor and Claude Sonnet. Those habits did not go away when AI entered the picture. Instead, the AI became an extra pair of hands that could write boilerplate, suggest tests, and catch typos instantly. The human developers still own the design decisions, the architecture, and the verification of correctness.
Consider a specific challenge: building a knowledge graph from complex state regulations. The rules are heterogeneous, sometimes contradictory, and updated irregularly. An AI might attempt to map relationships incorrectly. But because the team had already established a pattern of breaking work into small, verifiable increments and writing integration tests for each piece, they could safely let the AI generate candidate code and then quickly verify it against a known set of test cases. The discipline was already there; AI just made it faster.
Industry data supports this perspective. According to IDC, by 2029, 26% of worldwide IT spend (roughly $1.3 trillion) will be on agentic AI. That staggering investment will only yield returns if organizations combine it with solid engineering fundamentals. The most innovative AI adoption strategies are those that first shore up testing, deployment, and monitoring pipelines. Without those foundations, the promise of AI-first software delivery turns into a liability.
Way 4: Shift AI Left Across the Entire Software Delivery Lifecycle
Many teams limit AI usage to code generation. They prompt an agent to write a function, then proceed with the rest of their workflow manually. That is a missed opportunity. The fourth way to balance innovation and practice is to “shift AI left” — that is, apply AI as early as possible in every stage of the software delivery lifecycle (SDLC), from planning and design through testing and deployment. Doing so maximizes the innovative impact while keeping practical constraints visible from the start.
On the knowledge graph project, the team applies AI not just to coding but also to requirements analysis, test case generation, and documentation. When they ingest a new set of regulations, they first use the LLM to extract the key entities and relationships, producing a draft knowledge graph schema. That schema is reviewed by domain experts before any code is written. Then, as they code, the AI suggests test cases based on the same regulatory text. This early and continuous involvement of AI means that potential design flaws are caught before they are baked into hundreds of lines of code.
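A draft schema of the kind described above might be modeled like this. The entity and relationship shapes, and the `dangling_relationships` check, are assumptions for illustration, not the project's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str             # e.g. "Licensee", "Permit" (hypothetical examples)
    source_section: str   # regulation section the entity was extracted from

@dataclass
class Relationship:
    subject: str          # entity name
    predicate: str        # e.g. "must_hold", "is_issued_by"
    obj: str              # entity name

@dataclass
class DraftSchema:
    """An LLM-produced first draft, reviewed by domain experts
    before any ingestion code is written."""
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def entity_names(self) -> set[str]:
        return {e.name for e in self.entities}

    def dangling_relationships(self) -> list[Relationship]:
        """Flag relationships that reference entities the LLM never
        extracted: a cheap automated check reviewers can run before sign-off."""
        names = self.entity_names()
        return [r for r in self.relationships
                if r.subject not in names or r.obj not in names]
```

Even a draft this simple gives the domain experts something concrete to critique, which is the whole point of shifting AI left: flaws surface in a schema review, not in production code.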
Shifting left also changes the role of the developer. Instead of being a code monkey, the developer becomes a curator and critic of AI-generated artifacts. That is a more innovative, high-value role. But it only works because the team has practical guardrails: they always run automated regression tests after any AI-generated change, and they never merge code without a human approving both the logic and the test coverage.
This approach also addresses a hidden challenge: domain knowledge. The team is working with a state client’s infrastructure and regulations, which they did not write. Their understanding of the domain grew as the project progressed. By using AI to generate early drafts of schemas, tests, and even deployment scripts, they could learn the domain faster and reduce costly rework. The innovation of AI-assisted exploration was balanced by the practical step of always keeping a human in the loop for domain-critical decisions.
Way 5: Prefer Supervised Coding Agents Over Fully Unsupervised Multi-Agent Systems
The allure of a fully autonomous multi-agent system — where multiple AI agents chat with each other to design, code, and test without human oversight — is strong. It promises extreme innovation and speed. But in most real-world enterprise settings, it is impractical. The fifth way to balance innovation and practice is to use supervised coding agents, at least until your team has built up trust and verification capabilities. On the state government project, the team explicitly chose a supervised approach with a single coding agent (Claude Sonnet 4.5 via Cursor) rather than an unsupervised multi-agent swarm.
Why? Because the cost of errors in a regulated environment is high. A misinterpreted regulation could lead to incorrect system behavior affecting citizens. A fully unsupervised agent might generate code that looks correct but subtly misrepresents the legal text. With a supervised agent, the human developer is present in every prompt and every output. They can catch misunderstandings immediately, redirect the agent, and ensure the final code aligns with the actual regulatory intent.
This does not mean multi-agent systems are never useful. They shine in well-understood domains with strong automated verification, such as generating unit tests for a mature codebase or proposing deployment configurations. But on this project, where domain knowledge was still being built and verification was still maturing, supervision was the responsible choice. As the team’s automated test suite grows and their familiarity with the domain deepens, they may gradually allow more autonomy. But they start with a supervised model to keep practical control.
This balanced approach is supported by industry trends. The same IDC report projects that agentic AI will account for $1.3 trillion by 2029, but the most successful implementations will be those that phase in autonomy gradually. A supervised coding agent allows teams to learn the quirks and failure modes of LLMs without exposing production systems to unacceptable risk. Innovation happens through controlled experiments, not by handing over the keys.
Putting It All Together: A Practical Checklist
These five ways are not theoretical. They are the result of real decisions made by a team of 16 developers building real software under real constraints. If you are starting your own AI-first software delivery journey, consider this summary checklist:
- Map your project on the axes of code longevity and automated verification to decide how much autonomy to grant AI agents.
- Adopt a structured partnership framework like RIPER-5 — research, investigate, plan, execute, review — to keep the LLM in a collaborative rather than a delegated role.
- Strengthen your testing, coding standards, and deployment pipelines before introducing AI tools, because AI will amplify whatever practices you already have.
- Shift AI left into planning, design, and testing phases, not just coding, to catch issues early and accelerate learning.
- Prefer supervised coding agents when domain knowledge is still growing or verification is incomplete; scale autonomy only as confidence and safeguards increase.
Balancing innovation and practice is never a one-time decision. It is a continuous adjustment as your team, your tools, and your understanding of the problem evolve. By applying these five ways, you can enjoy the speed and creativity of AI while retaining the control and reliability that software delivery demands.





