For months, the leadership at our company has been glowing about how artificial intelligence has transformed our engineering output. Every all-hands meeting includes a slide about how Cursor and Claude Code have accelerated our frontend development. The general feeling among the team was that we had started writing code faster. But that was just a feeling, and I wanted either confirmation or refutation. So I decided to run a real measurement on our actual projects to see if the AI coding productivity gains were as dramatic as everyone assumed.

Why Gut Feelings About AI Coding Productivity Are Not Enough
When a new tool arrives and everyone around you raves about it, the natural tendency is to believe the hype. Management gave the entire frontend team corporate access to both Cursor and Claude Code. Developers praised the autocomplete suggestions and the ability to generate entire functions from a single comment. The excitement was genuine.
But enthusiasm and hard data are two different things. I noticed that nobody had actually produced any proof that adopting these tools had translated into more useful code shipped to users. The stories were anecdotal. The metrics were missing. Without measurement, the conversation around AI coding productivity remains stuck in the realm of opinion rather than evidence.
So I designed a simple experiment. I would look at every one of our seven projects and examine the output of the five developers who actively use AI in their daily work. The goal was straightforward: compare the volume of code written before and after AI tools became mainstream in our workflow.
Setting Up the Measurement Framework
Choosing the Metric
I knew from the start that measuring developer productivity is notoriously tricky. Lines of code is a crude metric. It does not capture code quality, maintainability, or the complexity of the problems being solved. Yet for this specific question — are we writing more code now than before AI? — it serves as a reasonable starting point.
I decided to simply take the amount of code written per developer per day. The number is imperfect, but it is objective. If the AI coding productivity narrative is correct, we should see a clear upward trend in code output over time.
Defining the Time Periods
To get a meaningful comparison, I split the data into four six-month windows stretching from April 2024 through March 2026:
- Period 1 (Baseline): April 2024 to September 2024. This was the ChatGPT 3.5 to 4o era. If anyone used AI for coding at all, it was only for isolated snippets in a chat window. No one was running Claude Code from a terminal or relying on Cursor for autocomplete.
- Period 2: October 2024 to March 2025. Developers started chatting with LLMs more frequently, but the tools were still basic. Models were significantly less capable than what we have today.
- Period 3: April 2025 to September 2025. Cursor and Claude Code entered the picture. Usage was growing but not yet universal across the team.
- Period 4 (AI Peak): October 2025 to March 2026. Most developers actively used Cursor and Claude Code every day. This is the period management points to when they celebrate our AI transformation.
Digging further back than April 2024 did not seem worthwhile. Before that point, AI-assisted coding was essentially nonexistent in our workflow.
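The four windows above can be written down as a small lookup so that every commit date lands in exactly one period. This is a minimal sketch: the dates come from the list above, while the names `PERIODS` and `period_for` are mine and purely illustrative.

```python
from datetime import date
from typing import Optional

# Six-month windows from the article: (label, first day, last day), both inclusive.
PERIODS = [
    ("Period 1", date(2024, 4, 1), date(2024, 9, 30)),
    ("Period 2", date(2024, 10, 1), date(2025, 3, 31)),
    ("Period 3", date(2025, 4, 1), date(2025, 9, 30)),
    ("Period 4", date(2025, 10, 1), date(2026, 3, 31)),
]


def period_for(day: date) -> Optional[str]:
    """Return the label of the window containing `day`, or None if it falls outside all four."""
    for label, start, end in PERIODS:
        if start <= day <= end:
            return label
    return None
```

Commits dated before April 2024 map to `None` and simply drop out of the comparison.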
The Surprising Spoiler: Two Developers Showed Zero Change
Before diving into the data preparation steps, I will share one finding that shaped everything else. Among the five developers who use AI in their work, two showed absolutely no movement in their code output over the entire two-year span. They wrote the same amount of code per day in the baseline period as they did in the most recent AI-heavy period.
Make of that what you will, but the fact stands. If you give someone a microwave and they spend less time heating food, that does not mean they will start eating more. Maybe these two developers simply rest more now while AI writes the code for them. Or maybe code-generation speed was never the bottleneck in their work in the first place, and their productivity constraints lie elsewhere: in architecture decisions, in understanding business requirements, or in debugging complex interactions.
This finding alone should give any engineering leader pause before declaring AI a universal productivity booster. For some team members, the gains may be real. For others, the tool makes no measurable difference at all.
Accounting for Team Growth
Our frontend team has grown in headcount over the last two years. If we simply compared total code output across periods, any increase could be attributed to having more developers rather than to AI tools. To control for this, I restricted the comparison to only those developers who have been at the company for the full two years and who definitely use AI in their work.
We also have employees who are lukewarm on AI and use it sparingly. They were excluded from the calculation entirely. The final cohort consisted of five developers who met both criteria.
For the metric, I used the average amount of code per working day for the whole group. This normalized the data against holidays, sick days, and variations in team size.
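To make the normalization concrete, here is one way the per-working-day average could be computed. It is a sketch under my own assumptions, not the plugin's method: a working day is any Monday to Friday that is not in a developer's set of days off (holidays, sick days), and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def working_days(start, end, days_off=frozenset()):
    """Count Monday-to-Friday dates in [start, end] that are not in `days_off`."""
    count = 0
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in days_off:
            count += 1
        day += timedelta(days=1)
    return count


def lines_per_dev_day(lines_by_dev, start, end, days_off_by_dev=None):
    """Group average: total lines written divided by total developer working days."""
    days_off_by_dev = days_off_by_dev or {}
    total_lines = sum(lines_by_dev.values())
    total_days = sum(
        working_days(start, end, days_off_by_dev.get(dev, frozenset()))
        for dev in lines_by_dev
    )
    return total_lines / total_days if total_days else 0.0
```

Dividing by developer working days, rather than by calendar days, is what keeps a long vacation or a holiday-heavy period from looking like a productivity drop.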
Cleaning the Data: Removing the Noise
Raw Git history is messy. Before any meaningful comparison could happen, I needed to remove several categories of code that would distort the results.
Auto-Generated Files
Files like package-lock.json, test mocks, vendored branding dependencies, and similar auto-generated content had to be excluded. These files are not written by developers in any meaningful sense. Including them would inflate the numbers artificially, especially in periods where dependency updates were frequent.
Unit Tests
It is true that many of us have started writing more unit tests with AI. In fact, that is now the only way we write them. Tests are great, of course, but what matters to the end user is product code, not test coverage. For a clean experiment focused on user-facing output, tests needed to come out of the comparison. I kept a separate dataset with tests included for reference, but the primary analysis excluded them.
Documentation and Configuration Files
We have also started writing more documentation aimed at AI tools: plans, various instructions, and skill files. Since this text does not ship to the user, I excluded it as well. The same applied to configuration files that change frequently but do not represent meaningful engineering work.
Noise Commits
All sorts of garbage needed to be removed: commits that apply new linter or formatter rules, accidental inclusions of auto-generated code, mass code moves, chains of reverts, and similar junk. The goal was to get a reasonably clean dataset that reflects genuine development effort.
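A rough heuristic for flagging such commits might look like the sketch below. The message keywords and the 2,000-line threshold are hypothetical values chosen for illustration, and anything flagged this way still deserves a manual look before it is dropped.

```python
import re
from dataclasses import dataclass

# Hypothetical keywords that often mark reverts and formatter or linter sweeps.
NOISE_MESSAGE = re.compile(
    r"\b(revert|prettier|eslint|lint|format(ting)?|reformat)\b", re.IGNORECASE
)
# Hypothetical size limit: larger commits are usually mass moves or generated code.
MAX_LINES_PER_COMMIT = 2000


@dataclass
class Commit:
    sha: str
    message: str
    lines_added: int


def is_noise(commit, max_lines=MAX_LINES_PER_COMMIT):
    """Flag a commit as noise by message keyword or by suspicious size."""
    if NOISE_MESSAGE.search(commit.message):
        return True
    return commit.lines_added > max_lines
```

The size check catches mass code moves that carry an innocent-looking message, while the keyword check catches small reverts that a size threshold alone would miss.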
The Tool: Git Insight Plugin for JetBrains
For all of this data preparation and analysis, I used the Git Insight plugin for JetBrains IDEs. This tool can do everything described above. It provides detailed statistics on code contributions per developer, per time period, and per file type.
The plugin makes it easy to filter out unwanted file extensions, exclude specific folders, and identify suspiciously large files that might indicate auto-generated content. Without a tool like this, manually cleaning two years of Git history across seven projects would be impractical.
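For readers without the plugin, similar raw numbers can be pulled from plain Git. The sketch below is an alternative route, not a description of how Git Insight works internally. It assumes the log was produced with `git log --no-merges --numstat --date=short --pretty=format:'@%H|%ae|%ad'`; the `@` prefix and the `|` separator are my own formatting choices so that header lines are easy to tell apart from the per-file lines.

```python
def parse_numstat(log_text):
    """Yield (author_email, date, lines_added, path) for each file in each commit.

    Header lines look like '@<sha>|<email>|<YYYY-MM-DD>'.
    File lines look like '<added>\t<deleted>\t<path>'; binary files show '-' counts.
    """
    author = day = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            _sha, author, day = line[1:].split("|", 2)
        elif line.strip():
            added, _deleted, path = line.split("\t", 2)
            if added != "-":  # skip binary files, which have no line counts
                yield author, day, int(added), path
```

Feeding it the captured output of the command gives one record per touched file, which can then be filtered and grouped by developer and period. Renamed files appear in numstat output with an `old => new` path, so they may need extra handling before any path-based filtering.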
Step-by-Step Data Preparation
Step 1: Merge Feature Branches
The first step was to merge all the big feature branches into a single integration branch. Our project uses a branching strategy where developers work on separate feature branches for weeks at a time. Comparing raw commit counts across branches would be meaningless because the same work might appear across multiple branches before being merged.
By merging everything into one integration branch, I ensured that each piece of code was counted exactly once. The Git Insight plugin handles this process cleanly.
Step 2: Remove Unwanted File Extensions
Next, I configured the plugin to exclude file extensions that correspond to auto-generated content. This included .lock files, .min.js files, generated CSS bundles, and similar artifacts. The plugin allows removing these extensions directly from the context menu with a right-click.
Step 3: Identify and Exclude Suspiciously Large Files
Some files in our repository were clearly not written by humans. One example was a 12,000-line configuration file that had been copied from an external library. Another was a generated TypeScript type definition file that spanned over 8,000 lines. These files can be removed from the same context menu, not one by one but by glob patterns, which saves significant time.
Step 4: Exclude Specific Folders
Folders containing vendored dependencies, generated icons, and third-party assets were excluded. The Git Insight plugin allows folder-level exclusion, which is essential when dealing with monorepo structures where different projects coexist in the same repository.
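Steps 2 through 4 boil down to a single yes-or-no decision per file. Here is a minimal sketch of that decision outside the plugin. The glob patterns, folder names, and the 5,000-line limit are illustrative assumptions rather than our actual configuration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns for auto-generated artifacts (Step 2).
EXCLUDED_GLOBS = ["*.lock", "*.min.js", "package-lock.json", "*.generated.ts"]
# Hypothetical folder names for vendored or generated content (Step 4).
EXCLUDED_DIRS = {"vendor", "node_modules", "generated"}
# Hypothetical size limit for "suspiciously large" files (Step 3).
MAX_FILE_LINES = 5000


def is_counted(path, total_lines=0, max_lines=MAX_FILE_LINES):
    """Return True if a file should count toward a developer's output."""
    p = PurePosixPath(path)
    # Step 4: drop anything that lives under an excluded folder.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    # Step 2: drop file names that match an excluded glob.
    if any(fnmatchcase(p.name, glob) for glob in EXCLUDED_GLOBS):
        return False
    # Step 3: drop files too large to have been written by hand.
    return total_lines <= max_lines
```

Checking folder names against path components, rather than against the raw string, keeps a file such as `src/vendor.ts` in the dataset while still excluding everything under `vendor/`.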
The Results: What the Numbers Actually Showed
After all the cleaning and filtering, I had a dataset that compared code output across four six-month periods for five developers. The results were not what management expected.
For the three developers who did show changes, the increase in code output from Period 1 to Period 4 was noticeable but not revolutionary. The average daily code output rose by roughly 37% across these three individuals. That is a meaningful gain, but it falls far short of the 2x or 3x improvements that some AI proponents claim.
For the two developers who showed no change, their output remained flat across all four periods. Their productivity appears to be constrained by factors that AI tools do not address — perhaps they spend more time in code review, or they work on particularly complex features that require extensive planning before any code is written.
When averaged across all five developers, the overall gain in code output was about 22% from the baseline period to the most recent AI-heavy period. That is a positive number, but it is modest. It suggests that AI tools are helping, but they are not transforming our engineering output in the dramatic way that the all-hands presentations suggest.
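The two headline numbers are consistent with each other, which is easy to check. The sketch below assumes all five developers had roughly equal baseline output and treats each of the three improvers as having gained the 37% average. Both are simplifications for the sake of the arithmetic.

```python
def group_gain(individual_gains, baselines=None):
    """Relative change in group output, weighting each developer by baseline output."""
    if baselines is None:
        baselines = [1.0] * len(individual_gains)  # assumption: equal baselines
    before = sum(baselines)
    after = sum(b * (1 + g) for b, g in zip(baselines, individual_gains))
    return after / before - 1


# Three developers at +37%, two with no change.
gain = group_gain([0.37, 0.37, 0.37, 0.0, 0.0])
print(f"{gain:.1%}")  # 22.2%
```

Passing real per-developer baselines as the second argument would shift the result if the improvers started from noticeably higher or lower output than the other two.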
Why the Gains Might Be Smaller Than Expected
There are several possible explanations for why the AI coding productivity gains on our team were modest rather than revolutionary.
Cognitive Bottlenecks Beyond Code Generation
Writing code is only one part of a developer’s job. Understanding requirements, designing architectures, debugging complex issues, and reviewing pull requests all take time. AI tools like Cursor and Claude Code are excellent at generating code, but they do not help much with these other activities. If a developer spends 60% of their day on tasks other than writing new code, even a 2x improvement in code generation speed translates to only a modest overall productivity gain.
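This is Amdahl's law applied to a workday: only the share of time spent writing code gets faster. A quick sketch of the arithmetic for the example above, where 40% of the day is coding and that part becomes twice as fast (the function name is mine):

```python
def overall_speedup(coding_share, coding_speedup):
    """Amdahl's law: speedup of the whole day when only the coding share gets faster."""
    new_time = (1 - coding_share) + coding_share / coding_speedup
    return 1 / new_time


speedup = overall_speedup(coding_share=0.4, coding_speedup=2.0)
print(f"{speedup:.2f}x")  # 1.25x
```

A 2x faster code generator therefore yields about a 25% overall gain in this scenario. Even an infinitely fast one would cap out at 1 / 0.6, or roughly 1.67x, because the other 60% of the day is untouched.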
The Quality Cost of Generated Code
Not all code is created equal. AI-generated code often requires more review and modification than hand-written code. Developers on our team reported spending extra time verifying that AI suggestions were correct, especially for complex business logic. This review overhead eats into the time saved during generation.
Diminishing Returns Over Time
The initial excitement of using AI tools may have worn off. Early adopters saw dramatic improvements because they were learning how to prompt effectively and discovering which tasks the tools handle well. After months of usage, the novelty fades, and the gains stabilize at a lower level.
What This Means for Engineering Leaders
If you are a manager or executive considering investing in AI coding tools, this experiment offers several lessons.
First, measure before you celebrate. The fact that developers feel more productive does not mean they are shipping more code. Gut feelings are unreliable, especially when everyone around you is enthusiastic.
Second, account for individual differences. Some developers will benefit enormously from AI tools. Others will see no change at all. Blanket policies and company-wide metrics can obscure this variation.
Third, look beyond code volume. The real value of AI tools may not be in writing more code, but in freeing developers to focus on harder problems. If a developer uses AI to handle boilerplate and then spends the saved time on architecture or code review, the productivity gain is real even if the line count does not increase.
Practical Steps for Measuring AI Coding Productivity in Your Team
If you want to run a similar experiment in your organization, here is a practical workflow based on what I learned.
Start by defining your cohort. Only include developers who have been on the team long enough to have a baseline period without AI tools. Exclude anyone who joined recently, because you have no pre-AI data for them.
Choose your metric carefully. Lines of code is a starting point, but consider also tracking pull request cycle time, bug rates, or feature completion velocity. No single metric tells the whole story.
Clean your data ruthlessly. Auto-generated files, tests, documentation, and noise commits will distort your results if you do not remove them. Invest the time upfront to get a clean dataset.
Use a tool like Git Insight or similar Git analytics software. Manual analysis across multiple repositories and time periods is too error-prone to be reliable.
Compare like with like. If your team has grown, normalize by developer count. If some developers work part-time, normalize by working days. The goal is to isolate the effect of AI tools from all other variables.
Be honest about the results. If the numbers show modest gains, accept that. The data is telling you something important about where AI fits into your workflow.
The Broader Lesson About AI in Software Development
This experiment taught me that AI coding productivity is real but nuanced. The tools are genuinely useful. They save time on repetitive tasks, reduce the friction of writing boilerplate, and help developers explore solutions faster. But they are not magic.
The narrative that AI will make developers 10 times more productive is marketing, not measurement. In our data, the reality was closer to a 20 to 30 percent improvement in code output, with some developers seeing no improvement at all. That is still valuable, but it is not the revolution that the headlines describe.
Perhaps the most important insight is that productivity is a system property, not a tool property. A faster code generator does not help if the bottleneck is elsewhere in the development process — in unclear requirements, in slow build times, in complex deployment pipelines, or in coordination between team members.
AI tools are a welcome addition to the developer toolkit. They make certain tasks easier and faster. But they do not solve the fundamental challenges of building good software. Those challenges remain human ones.






