
How a Mock LLM Service Cut $500K in AI Benchmarking Costs and Boosted Developer Productivity

By Sandeep Bansal and Seetharaman Gudetee · Salesforce Engineering Blog · Advanced · Developer · 12 min read
Summary

Salesforce’s AI Cloud Platform team developed an internal mock Large Language Model (LLM) service that simulates AI provider responses, cutting benchmarking costs by over $500K annually. The mock service provides deterministic latency and configurable failure scenarios, allowing teams to benchmark performance, reliability, and scale without depending on costly and variable live LLM endpoints. This approach accelerates developer velocity by enabling consistent, repeatable tests, including failover simulations, so teams can validate performance with confidence before large-scale production rollouts. Salesforce teams can adopt similar mock simulation layers to reduce AI benchmarking costs while boosting iteration speed and system resilience.

Takeaways
  • Implement mock LLM services to simulate AI responses and reduce benchmarking costs.
  • Enforce deterministic latency to stabilize performance benchmarks and isolate internal changes.
  • Use software-driven failure injection to simulate outages and test failover logic dynamically.
  • Leverage scalable mock services to validate production readiness under high traffic loads.
  • Centralize mock service as a shared platform to streamline engineering collaboration and accelerate iteration.

In our Engineering Energizers Q&A series, we spotlight the engineering minds driving innovation across Salesforce. Today’s edition features Sandeep Bansal, a senior software engineer on the AI Cloud Platform Engineering team, whose internal LLM mock service validates performance, reliability, and cost efficiency at scale, supporting production-readiness benchmarks beyond 24,000 requests per minute while significantly reducing LLM model costs during benchmarking. Explore how the team saved more than $500K annually in token-based costs by replacing live LLM dependencies with a controllable simulation layer, how it enforced deterministic latency to accelerate performance validation, and how it enabled rapid scale and failover benchmarking by simulating high-volume traffic and controlled outages without relying on external provider infrastructure.
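The scale benchmarking described above can be sketched as a small load harness that fires concurrent requests at a mock endpoint and reports throughput in requests per minute. This is a simplified illustration under stated assumptions; `run_one` is a hypothetical stand-in for one mock LLM call, and the thread-pool harness is not Salesforce's actual benchmarking tool.

```python
import concurrent.futures
import time

def run_one(latency_s: float = 0.001) -> bool:
    """Hypothetical single mock-LLM call with a fixed, deterministic delay."""
    time.sleep(latency_s)
    return True

def benchmark(n_requests: int = 200, workers: int = 20, latency_s: float = 0.001) -> dict:
    """Drive n_requests concurrent mock calls and report throughput."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(lambda _: run_one(latency_s), range(n_requests)))
    elapsed = time.perf_counter() - start
    # Extrapolate requests/minute from the measured window.
    return {"ok": sum(results), "rpm": n_requests / elapsed * 60}
```

Because the mock's latency is fixed, repeated runs of such a harness produce comparable numbers, which is what makes high-volume production-readiness targets (such as the 24,000 requests-per-minute figure above) verifiable without paying for live provider tokens.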

Performance & Limits · Artificial Intelligence