Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Joel Becker, Nate Rush, Elizabeth Barnes, David Rein
Stop extrapolating from student studies or greenfield tasks. Demand productivity metrics from developers with multi-year tenure on the same codebase. Use this RCT design—random assignment within familiar projects—as your benchmark for evaluating AI tooling ROI.
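To make the recommendation concrete, here is a minimal sketch of the within-developer randomization such an evaluation would use: each developer's own backlog is split at random into AI-allowed and AI-blocked tasks, so skill and repo familiarity are held constant across arms. All names (`Task`, `assign_arms`) are hypothetical illustrations, not the authors' code, and the balanced split is an assumption of this sketch.

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    developer: str
    ai_allowed: bool = False

def assign_arms(tasks: list[Task], seed: int = 0) -> list[Task]:
    """Randomly assign each task to the AI-allowed or AI-blocked arm,
    balancing the split within each developer so differences in skill
    and project familiarity cannot explain the measured effect."""
    rng = random.Random(seed)
    by_dev: dict[str, list[Task]] = {}
    for t in tasks:
        by_dev.setdefault(t.developer, []).append(t)
    for dev_tasks in by_dev.values():
        rng.shuffle(dev_tasks)
        half = len(dev_tasks) // 2
        for i, t in enumerate(dev_tasks):
            t.ai_allowed = i < half
    return tasks
```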
AI coding assistants promise productivity gains, but evidence from experienced developers working on real codebases—not toy problems—remains scarce.
Method: 16 developers, averaging 5 years of experience on their projects, completed 246 tasks in mature open-source repos; each task was randomly assigned to allow or block early-2025 AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet). This is the first RCT to measure AI impact on developers working in their own production codebases rather than synthetic benchmarks, and because assignment is randomized within each developer's own repository, the design isolates the tool effect from developer skill and project familiarity.
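One simple way to summarize data from such a design is to compare completion times across arms on a log scale, so the estimate reads as a multiplicative slowdown or speedup. The sketch below is an illustrative analysis under assumed record fields (`hours`, `ai_allowed`), not the paper's exact estimator.

```python
import math
from statistics import mean

def estimated_time_ratio(records: list[dict]) -> float:
    """records: one dict per completed task with keys
    'hours' (float) and 'ai_allowed' (bool).
    Returns the ratio of geometric-mean completion time for
    AI-allowed vs AI-blocked tasks: < 1.0 means AI-allowed tasks
    finished faster, > 1.0 means they took longer."""
    log_ai = [math.log(r["hours"]) for r in records if r["ai_allowed"]]
    log_no = [math.log(r["hours"]) for r in records if not r["ai_allowed"]]
    return math.exp(mean(log_ai) - mean(log_no))

# Toy example with two tasks per arm (fabricated numbers for illustration only).
tasks = [
    {"hours": 3.0, "ai_allowed": True},
    {"hours": 2.5, "ai_allowed": True},
    {"hours": 2.0, "ai_allowed": False},
    {"hours": 2.2, "ai_allowed": False},
]
print(f"time ratio (AI / no AI): {estimated_time_ratio(tasks):.2f}")
```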
Caveats: Only 16 developers, all open-source contributors. Enterprise codebases with stricter review processes may show different patterns.
Reflections: How does AI impact vary between bug fixes, feature additions, and refactoring tasks in mature codebases? · Do productivity gains persist after 6+ months of continuous AI tool usage? · What's the effect on code review burden when AI-assisted code enters production?