Scaling Code Reviews: Adapting to a Surge in AI-Generated Code
Salesforce faced challenges in code review as a significant increase in AI-generated code made traditional file-by-file review ineffective and risked eroding trust and quality. To address this, they developed Prizm, a system that reconstructs developer intent and incorporates contextual information to improve review efficiency while maintaining rigorous standards. This approach scales human judgment alongside AI contributions without sacrificing security or developer intent, allowing it to work across Salesforce's large, complex codebases. Teams can adopt similar strategies to strengthen their own code review processes under AI-driven change volume.
- Rebuild code review workflows to focus on developer intent, not just diffs.
- Integrate contextual data from work items and code history into reviews.
- Use asynchronous, semantic analysis to scale review without blocking developers.
- Apply AI-driven feedback early in the IDE and PR to catch issues sooner.
- Preserve human judgment and transparency with clear reasoning traces.
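To make the second strategy concrete, here is a minimal sketch of what "integrating contextual data from work items and code history into reviews" might look like. This is illustrative only, not Prizm's actual implementation; all names, structures, and fields (`ReviewContext`, `build_review_context`, the work-item and history shapes) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewContext:
    """Bundle of signals a reviewer (or AI assistant) sees alongside the diff.

    Hypothetical structure for illustration; field names are assumptions.
    """
    pr_title: str
    work_item_summary: str           # intent from the linked work item
    recent_commit_messages: list     # history of the files being changed
    changed_files: list

    def intent_summary(self) -> str:
        # Frame the review around "what was this change trying to do?"
        # rather than the raw diff alone.
        return f"{self.pr_title}: {self.work_item_summary}"


def build_review_context(pr: dict, work_items: dict, history: list) -> ReviewContext:
    """Assemble contextual data for a pull request before review begins."""
    # Pull intent from the linked work item, if one exists.
    item = work_items.get(pr["work_item_id"], {})
    # Narrow the commit history to the files this PR actually touches.
    commits = [h["message"] for h in history if h["file"] in pr["files"]]
    return ReviewContext(
        pr_title=pr["title"],
        work_item_summary=item.get("summary", "(no linked work item)"),
        recent_commit_messages=commits,
        changed_files=pr["files"],
    )


# Example usage with made-up data:
pr = {"title": "Fix auth timeout", "work_item_id": "W-1", "files": ["auth.py"]}
work_items = {"W-1": {"summary": "Sessions expire too early"}}
history = [
    {"file": "auth.py", "message": "Increase session TTL"},
    {"file": "ui.py", "message": "Tweak CSS"},
]
ctx = build_review_context(pr, work_items, history)
```

The key design idea this sketch captures is that review context is assembled *before* the reviewer opens the diff, so intent and history arrive alongside the changed files rather than being hunted down afterward.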
By Shan Appajodu and Ravi Boyapati.

For years, software engineering treated code review as a bounded, human-scalable activity. Pull requests grew gradually, reviewers relied on local context, and file-by-file diffs were enough to reason about changes. This model fractured when AI-assisted coding tools altered the economics of code creation faster than review workflows could adapt.

At Salesforce, internal signals made this shift impossible to ignore. Code volume increased by approximately 30%, and pull requests regularly expanded beyond 20 files and 1,000 lines of change. Review latency rose quarter over quarter. More concerning, review time for the largest pull requests began to plateau, or even decline, indicating that reviewers were no longer meaningfully engaging with changes. Sustaining our core values of trust, safety, and technical rigor was becoming increasingly difficult to guarantee at scale.