How Data 360 Optimized Kubernetes Scheduling Architecture, Delivering 13% Cost Savings
Padma Aradhyula’s team at Salesforce Data 360 tackled inefficient Kubernetes scheduling that caused node fragmentation and drove up costs for the millions of Spark jobs the platform runs daily. They replaced the scheduler’s default LeastAllocated scoring strategy with a proactive MostAllocated approach that packs executor pods densely, reducing wasted idle capacity and minimizing disruptive node evictions. This optimization improved resource utilization by 15%, cut compute costs by 13%, and halved node disruptions, significantly improving reliability. Salesforce teams managing large-scale Spark workloads can adopt similar custom scheduling logic to boost efficiency and stability in their Kubernetes environments.
- Replace the kube-scheduler's default LeastAllocated scoring strategy with MostAllocated for Spark workloads (see the configuration sketch after this list).
- Proactively pack executor pods on fewer nodes to reduce fragmentation and idle capacity.
- Avoid reactive autoscaler-driven consolidation to prevent costly executor evictions.
- Monitor workload stability when increasing node utilization to ensure job SLA compliance.
- Embed workload-aware placement logic to co-locate pods belonging to the same Spark job.
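In upstream Kubernetes, the switch from spreading to bin-packing is expressed through the NodeResourcesFit plugin's scoring strategy. Below is a minimal KubeSchedulerConfiguration sketch of that change; the profile name spark-bin-packing and the resource weights are illustrative assumptions, not Data 360's actual configuration.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Hypothetical profile name; Spark pods opt in via spec.schedulerName.
  - schedulerName: spark-bin-packing
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # The default is LeastAllocated, which spreads pods across
            # nodes; MostAllocated scores the fullest feasible nodes
            # highest, packing pods densely and reducing fragmentation.
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

Spark executor pods would then opt into this behavior by setting spec.schedulerName to the profile's name in their pod spec or pod template.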
By Padma Aradhyula, Dongwei Feng, Siddharth Sharma, and Anuja Gore. In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Padma Aradhyula, Senior Director of Software Engineering on the Data 360 Compute Fabric team, who manages a large-scale platform orchestrating four million Spark applications daily, nearly two million of them on Kubernetes. Explore how Padma’s team optimized infrastructure cost at global scale by evolving Kubernetes scheduler behavior to eliminate node fragmentation under bursty Spark workloads. The team redesigned placement logic to proactively consolidate executor pods onto fewer nodes, embedding efficiency directly into the scheduling layer and resolving the reliability tension created by reactive, autoscaler-driven node churn.
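One concrete way to express the workload-aware co-location described above: Spark on Kubernetes labels every pod of an application with spark-app-selector, so a preferred pod affinity in the executor pod template nudges the scheduler to place a job's executors together. The sketch below is illustrative rather than the team's actual placement logic; the application ID shown is an example value, since the real ID is known only at submission time and the template would be rendered per job.

```yaml
# Executor pod template fragment (e.g., supplied via
# spark.kubernetes.executor.podTemplateFile).
apiVersion: v1
kind: Pod
spec:
  affinity:
    podAffinity:
      # Soft preference: pack this executor next to pods of the same
      # Spark application, but fall back to any feasible node if no
      # co-located node has room, so jobs are never blocked.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                # spark-app-selector is the label Spark sets on every
                # pod of an application; example value shown here.
                spark-app-selector: spark-application-1700000000000
            topologyKey: kubernetes.io/hostname
```

Using a preferred (rather than required) affinity term keeps co-location a best-effort hint, which complements MostAllocated packing without risking unschedulable executors during bursts.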