The Death of Code Review (Again)
Why do we need Code Review?
Code review has many roles in modern software engineering, to name a few:
- Quality: It's a peer review process (engineer A reviews the code of engineer B), which increases quality (built-in quality, a lean principle).
- Consistency: Consistency comes from following a uniform architecture vision, common standards, and team agreements; it can be a collective call or centralized in an architect.
- Confidence: Code review increases our confidence that we can release software without breaking the system, service, app, or whatever we are building.
- Look for Architecture / Design Gaps
- Look for missing functionality or business requirements gaps
- Look for corner cases
- Look for missing Observability
- Look for poor error handling
- Look for poor coding practices and anti-patterns
- Look for missing tests or not enough testing diversity
Branches and Fake CI
Since 2009, I've been a strong advocate against branches. Branches are bad because they kill real continuous integration. I talked about that in another post from 2023: The Death of CI/CD. The logic is very simple: if all engineers are working on separate branches, then when you run a CI job in Jenkins, your code is not there, because it has not been merged into develop or main yet. As I mentioned in the 2023 post, release trains are also a very bad practice. Only when the release cycle happens does the code get fully merged and integrated - and that's where all the problems appear.
Problems get swept under the rug and stay hidden until you try to release, and then the release is problematic and never goes well. Why? Because of the branches and the lack of real CI. That effect also makes code review weaker, because you are not reviewing the whole story, only small bits at a time. I wrote about that as well in a 2022 post: Beyond Code Deltas, where I argued for doing code review out of cycle, not tied to a Pull Request, but instead reading the whole code base so I can grasp the big picture.
AI Disruption Force: From Copilot to Coding Agents
Back to 2026. AI is disrupting software engineering like we have never seen before. When Copilot appeared, it was a great innovation, but it did not disrupt the software engineering process much; it was just a better tool, a better autocomplete, that saved us from doing searches on Google or StackOverflow.
However, since the rise of coding agents like Claude Code, Codex, Gemini CLI, Copilot CLI, and many others, software engineering has started a much deeper disruption process, because engineers now spend much less time in traditional IDEs like IntelliJ and VSCode. In the post De-Risking, I explained that Claude Code is the new IDE; it's where you spend all or most of your time now. Claude Code is not a traditional IDE nor a code editor, but I'm pretty sure you get my point.
Coding agents significantly speed up the engineering process. For instance, Boris Cherny, the creator of Claude Code, is doing 50-100 PRs per week, which is a lot. He also shared that he is using Claude Code to build Claude Code. That's impressive, but we need to remember that engineering tools and infrastructure solutions don't require as much product/UX discovery as commercial or consumer software.
Claude Code is an engineering solution made by an engineer for engineers, using AI, of course. But it's not a baking app, it's not an e-commerce platform, it's not Netflix, it's not sausage factory management software, and it's not a health care system. All the software I just mentioned is fundamentally different, because:
- The consumers are not only engineers (yes, engineers also watch Netflix)
- You must have a much stronger UX structure
- You must have Business Analysts / Product Managers
- You must deal with one or multiple regulations, depending on the industry (Health Care, for example)
- You have Legal and Public Relations concerns
- There is a real need for the involvement of many more people
So what happens when coding agents crank out far more PRs than humans can carefully review? There are a few options:
- Don't pay much attention and just LGTM
- Get more people to help in code reviews
- Create or use a code review agent like Greptile, CodeRabbit, or GitHub Copilot Code Review
- A code review agent could also be a sub-agent or a custom command in Claude Code, which is just a markdown file in a local folder on your machine (e.g., under .claude/commands/ in your repo)
- Find other ways to increase quality and depend less on code review
- Keep doing what we always do (but there will be a bottleneck)
Another decision is how much human review you still want. The level of human involvement could be:
- low: just AI
- medium: AI, and humans sometimes (maybe sampling something like 1 in 5 PRs - see the sketch after this list)
- high: AI coding agents + humans
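As a rough illustration of the medium option, here is a minimal TypeScript sketch of deterministic PR sampling. The 1-in-5 ratio comes from the list above, but the hashing approach, the environment variable name, and the CI wiring are assumptions, not a prescription:

```typescript
// sample-review.ts - decide whether a PR needs a human reviewer on top of the AI review.
// Hashing the PR number keeps the decision stable across re-runs of the pipeline.
import { createHash } from "node:crypto";

const SAMPLING_RATE = 5; // roughly 1 out of every 5 PRs gets a human reviewer

function needsHumanReview(prNumber: number): boolean {
  // Hash the PR number so the sample is spread evenly and is reproducible.
  const digest = createHash("sha256").update(String(prNumber)).digest();
  return digest.readUInt32BE(0) % SAMPLING_RATE === 0;
}

const pr = Number(process.env.PR_NUMBER ?? "0"); // hypothetical variable set by your CI
if (needsHumanReview(pr)) {
  console.log(`PR #${pr}: AI review + human reviewer required`);
  process.exitCode = 1; // example convention: fail the check until a human approves
} else {
  console.log(`PR #${pr}: AI review is enough for this one`);
}
```

Run as a required check in CI, something like this turns the "medium" option into an enforceable policy instead of a vague intention.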
At the same time, there are reasons why humans cannot be removed from the loop entirely:
- Some projects cannot fail under any circumstances (critical business rules, for instance)
- How can you tell the architecture and the design are right? (you need to review the code, maybe not every delta, but 1x per month?)
- Security: we know LLMs suck at security, and we can't ignore that, so we need to look at what the code is doing - a scanner or an agent can help (see the sketch below), but we would still need to read
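On the security point, a guardrail can at least catch the obvious problems before any human or agent reads the diff. Here is a minimal sketch, assuming a Node/TypeScript project and npm's built-in audit; the JSON shape is what recent npm versions emit, so treat it as an assumption and adapt it to whatever scanner you actually use (Snyk, Trivy, Semgrep, etc.):

```typescript
// audit-gate.ts - block the pipeline when the dependency scanner reports serious issues.
import { execSync } from "node:child_process";

interface AuditSummary {
  metadata?: { vulnerabilities?: Record<string, number> };
}

let report: AuditSummary = {};
try {
  report = JSON.parse(execSync("npm audit --json", { encoding: "utf8" }));
} catch (err: unknown) {
  // npm audit exits non-zero when it finds vulnerabilities; the JSON is still on stdout.
  const stdout = (err as { stdout?: string }).stdout;
  if (!stdout) throw err;
  report = JSON.parse(stdout);
}

const vulns = report.metadata?.vulnerabilities ?? {};
const serious = (vulns.high ?? 0) + (vulns.critical ?? 0);

if (serious > 0) {
  console.error(`Found ${serious} high/critical vulnerabilities - a human needs to look.`);
  process.exit(1);
}
console.log("No high/critical vulnerabilities reported.");
```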
More powerful Guardrails
Code review is a manual process, and everything manual is error-prone. Software engineering is all about reliability and consistency. LLMs are not reliable; they are slot machines. Engineering, however, is reliable. So we can add reliable guardrails that serve as compensating controls for doing less code review, for instance:
- Increasing Testing Coverage
- Increasing Testing Diversity (Unit Tests, Integration Tests, Chaos Testing, Stress Testing, etc.)
- Having more comprehensive linters (in the case of TypeScript, for instance)
- Leveraging strongly typed languages like Scala 3 and Rust.
- Having better observability on the Code
- Leveraging containers, K8s, and progressive rollout patterns like traffic splitting
- Beta Users Programs
- Code review outside of the delta (PR cycles) - maybe 1x per month?
- Leveraging code as policy and having more automated checks in the infrastructure - on Terraform, K8s, AWS resources, and anywhere else you can use code to enforce policies (see the sketch after this list)
- Real CI/CD with small deltas and constant deploys (not constant releases)
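To make the code-as-policy item concrete, here is a minimal TypeScript sketch, assuming the plan was exported with `terraform show -json plan.out > plan.json`; the S3 ACL rule and the field names checked are illustrative, not a complete policy:

```typescript
// policy-check.ts - a tiny code-as-policy gate over a Terraform plan export.
// Assumes: terraform plan -out=plan.out && terraform show -json plan.out > plan.json
// The S3 rule below is just an example; encode whatever policies your team agreed on.
import { readFileSync } from "node:fs";

interface ResourceChange {
  type: string;
  name: string;
  change: { after?: Record<string, unknown> | null };
}

const plan = JSON.parse(readFileSync("plan.json", "utf8")) as {
  resource_changes?: ResourceChange[];
};

const violations: string[] = [];

for (const rc of plan.resource_changes ?? []) {
  // Example policy: no S3 bucket may be created or updated with a public ACL.
  if (rc.type === "aws_s3_bucket" && rc.change.after?.acl === "public-read") {
    violations.push(`S3 bucket "${rc.name}" would be public-read`);
  }
}

if (violations.length > 0) {
  console.error("Policy violations:\n" + violations.map((v) => ` - ${v}`).join("\n"));
  process.exit(1);
}
console.log("All policies passed.");
```

The point is not this specific rule; it's that every policy you can express as code is one less thing a reviewer has to catch by eye.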
Signals
A good system has signals that tell us what's going on. It's important to have lead time metrics, but we also need other signals - that's how we tell whether things are okay or not. Here are some examples of signals (there is a small sketch after the list):
- Number of incidents in production
- Number of bugs in production
- Number of support calls
- Number of negative reviews in the Apple and Google app stores
- Site Traffic
- Revenue
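Most of these signals don't need fancy tooling to get started. Here is a minimal TypeScript sketch computing two delivery signals (median lead time and change failure rate); the deployment record shape is hypothetical and would come from your CI/CD and incident tooling:

```typescript
// signals.ts - compute two basic delivery signals from deployment records.
interface Deployment {
  commitTimestamp: Date;   // when the change was committed
  deployTimestamp: Date;   // when it reached production
  causedIncident: boolean; // did it trigger an incident or rollback?
}

function leadTimeHours(d: Deployment): number {
  return (d.deployTimestamp.getTime() - d.commitTimestamp.getTime()) / 3_600_000;
}

function report(deploys: Deployment[]): void {
  const leadTimes = deploys.map(leadTimeHours).sort((a, b) => a - b);
  const median = leadTimes[Math.floor(leadTimes.length / 2)] ?? 0;
  const failures = deploys.filter((d) => d.causedIncident).length;
  const failureRate = deploys.length === 0 ? 0 : (failures / deploys.length) * 100;

  console.log(`Median lead time: ${median.toFixed(1)}h`);
  console.log(`Change failure rate: ${failureRate.toFixed(1)}%`);
}

// Hypothetical data just to show the output format.
report([
  { commitTimestamp: new Date("2026-01-05T10:00:00Z"), deployTimestamp: new Date("2026-01-05T14:30:00Z"), causedIncident: false },
  { commitTimestamp: new Date("2026-01-06T09:00:00Z"), deployTimestamp: new Date("2026-01-07T09:00:00Z"), causedIncident: true },
]);
```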
How to make it better?
It's hard to say what software engineering will look like in 2, 5, or even 10 years, but here are some things we can do that will help:
- Add more guardrails:
  - Increasing Testing Coverage
  - Increasing Testing Diversity (Unit Tests, Integration Tests, Chaos Testing, Stress Testing, etc.)
  - Having more comprehensive linters (in the case of TypeScript, for instance)
  - Leveraging strongly typed languages like Scala 3 and Rust
  - Having better observability on the code
  - Leveraging containers, K8s, and progressive rollout patterns like traffic splitting
  - Beta user programs
  - Code review outside of the delta (PR cycles) - maybe 1x per month?
  - Leveraging code as policy and having more automated checks in the infrastructure - on Terraform, K8s, AWS resources, and anywhere else you can use code to enforce policies
  - Real CI/CD with small deltas and constant deploys (not constant releases)
- Consider critically whether to go with more or fewer reviews:
  - low: just AI
  - medium: AI, and humans sometimes (maybe sampling something like 1 in 5 PRs)
  - high: AI coding agents + humans
- Consider doing code reviews outside of PR cycles (like 1x per month)
- Add proper observability with the right signals, like:
  - Number of incidents in production
  - Number of bugs in production
  - Number of support calls
  - Number of negative reviews in the Apple and Google app stores
  - Site traffic
  - Revenue
- Evaluate code review agents like Greptile, CodeRabbit, or GitHub Copilot Code Review, but still review outside of the PR cycles.
- Understand that if engineering can produce code faster via agents, we can also fix problems faster with the same agents; bugs or bad behavior would not take long to be noticed, assuming proper testing is in place.
Cheers,
Diego Pacheco
