Building AWS Bedrock Model Availability: Slashing AI Routing Discovery From Days to Minutes
Salesforce’s AI Infrastructure team, led by Scott Chang, developed an automated system to dramatically reduce the time it takes to discover and route AWS Bedrock AI model endpoints across global regions. This solution replaces a slow, manual tracking process with real-time availability detection using AWS Lambda and APIs, cutting discovery time from days to minutes. It ensures reliable and compliant routing with fallback mechanisms to handle endpoint failures and strict data residency requirements. Salesforce teams working with AI and multi-region deployments can adopt this approach to improve operational resilience, speed up model adoption, and maintain compliance across complex cloud environments.
- Automate AI model endpoint discovery using AWS Lambda and APIs for real-time data.
- Implement deterministic fallback routing to ensure compliance and availability.
- Use layered routing controls to honor latency, capacity, and data residency needs.
- Centralize model availability monitoring with tools like Grafana for quick insights.
- Transform manual routing configuration into a scalable infrastructure capability.
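To make the fallback and residency takeaways concrete, here is a minimal Python sketch of deterministic, residency-aware routing. It is an illustration under stated assumptions, not the team's implementation: the zone table, region ordering, model IDs, and the names `RESIDENCY_ZONES`, `Route`, and `pick_route` are all hypothetical. The `availability` map stands in for whatever the discovery job produces (for example, per-region results from the Bedrock `ListFoundationModels` API).

```python
from dataclasses import dataclass

# Hypothetical residency zones: the regions each data boundary may use,
# listed in preference order (for example, by latency). Not from the article.
RESIDENCY_ZONES = {
    "us": ["us-east-1", "us-west-2"],
    "eu": ["eu-central-1", "eu-west-1"],
}


@dataclass(frozen=True)
class Route:
    region: str
    model_id: str


def pick_route(model_id, zone, availability, unhealthy=frozenset()):
    """Return the first permitted region serving model_id, or None.

    availability: dict mapping region -> set of model IDs discovered there.
    unhealthy: regions currently failing health checks (skipped).
    """
    for region in RESIDENCY_ZONES.get(zone, []):
        if region in unhealthy:
            continue  # fall through to the next region in the fixed order
        if model_id in availability.get(region, set()):
            return Route(region, model_id)
    # No compliant region serves the model: refuse rather than leave the zone.
    return None


# Placeholder availability data, as a discovery job might report it.
availability = {
    "us-east-1": {"example.model-v1"},
    "us-west-2": {"example.model-v1", "example.model-v2"},
    "eu-central-1": {"example.model-v1"},
}

primary = pick_route("example.model-v1", "us", availability)
fallback = pick_route("example.model-v1", "us", availability, unhealthy={"us-east-1"})
blocked = pick_route("example.model-v2", "eu", availability)
```

Because the candidate list is a fixed, ordered sequence, the same inputs always yield the same route, and a request is never sent outside its residency zone even when the model exists elsewhere.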
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Scott Chang, Principal Engineer on the AI Infrastructure team, who builds the robust, secure, and resilient core that drives Agentforce 360.

Scaling Agentforce across a vast global network of AWS accounts and regions turned model tracking and routing into a complex puzzle. This insightful Q&A explains how Salesforce’s automated solution slashed Amazon Bedrock discovery times from three days to mere minutes. Explore how Scott and his team leveraged AWS APIs for instant endpoint detection, removing manual hurdles and strengthening compliance through dependable fallback routing.

What is your team’s mission in building Bedrock Model Availability for Agentforce?

The team’s core mission involves delivering stable, resilient, and secure infrastructure for the AI services behind Agentforce 360, with a sharp focus on operational excellence.