Cut EKS cluster costs by 34% without touching capacity

Identified and fixed a cluster cost problem caused by over-provisioned node groups and unset resource requests — without reducing actual workload capacity or changing application behaviour.

Tags: Infrastructure, AWS, EKS, Cost, Kubernetes

Date: 2026-04-01
Category: Infrastructure
Proof Type: Production outcome

Impact

34% reduction in EC2 spend for the cluster. Freed up budget that funded two additional environments for the team.

The problem

The EKS cluster was running significantly more EC2 capacity than the workloads needed. Node groups were sized for peak load assumptions that never materialised, and most pods had no resource requests set — which meant the scheduler couldn't bin-pack effectively and nodes ran at 20–30% utilisation.

What I changed

Audited actual resource usage using kubectl top and CloudWatch Container Insights to get real CPU and memory baselines per workload.
Set resource requests on all workloads — this alone changed the scheduler's bin-packing behaviour significantly.
Resized node groups from m5.xlarge to m5.large for the majority of workloads, keeping xlarge only for the two services that genuinely needed it.
Enabled Cluster Autoscaler with tighter scale-down thresholds so idle nodes were removed more aggressively during off-peak hours.
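
The audit-then-set-requests steps above can be sketched as a small calculation. This is a minimal illustration, not the method used in the audit: it assumes usage samples have already been exported (for example from kubectl top or Container Insights), and the function name suggest_request, the 95th-percentile choice, the 20% headroom, and the sample numbers are all hypothetical.

```python
def suggest_request(samples, percentile=95, headroom_pct=120):
    """Suggest a resource request from observed usage samples.

    Takes the nearest-rank percentile of the samples and scales it by a
    headroom percentage. Integer arithmetic keeps the result exact.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(percentile / 100 * n), computed with integers.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Apply headroom and round up to a whole unit.
    return (base * headroom_pct + 99) // 100


# Hypothetical CPU usage samples in millicores for one workload.
cpu_millicores = [120, 135, 150, 160, 180, 190, 200, 205, 210, 400]

cpu_request = suggest_request(cpu_millicores)
print(f"suggested CPU request: {cpu_request}m")  # prints: suggested CPU request: 480m
```

A high percentile lets a single spike dominate the request (here the 400m sample). Whether that is right depends on the workload: a lower percentile such as 90 gives 252m for the same data, at the cost of more CPU throttling risk during spikes.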

Why it worked

The core issue was that without resource requests, the Kubernetes scheduler has no CPU or memory figure to fit against node capacity, so it effectively treats every pod as zero-cost and spreads pods loosely across nodes. Setting accurate requests lets the scheduler pack pods onto fewer nodes, which in turn lets the autoscaler actually scale down.

YAML

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
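
To make the packing argument concrete, here is a toy model of how requests translate into node counts. It is a sketch under stated assumptions, not the real kube-scheduler algorithm: it uses a simple first-fit decreasing heuristic, it uses raw instance capacity (2 vCPU / 8 GiB for m5.large, 4 vCPU / 16 GiB for m5.xlarge) rather than the lower allocatable figure a real node reports, and the 20-pod workload is hypothetical. The per-pod requests are the ones from the manifest above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    cpu_m: int   # CPU request in millicores
    mem_mi: int  # memory request in MiB


def nodes_needed(pods, node_cpu_m, node_mem_mi):
    """Count nodes required using first-fit decreasing on the pods' requests."""
    free = []  # remaining [cpu_m, mem_mi] for each node opened so far
    for pod in sorted(pods, key=lambda p: (p.cpu_m, p.mem_mi), reverse=True):
        if pod.cpu_m > node_cpu_m or pod.mem_mi > node_mem_mi:
            raise ValueError(f"{pod} does not fit on an empty node")
        for slot in free:
            if slot[0] >= pod.cpu_m and slot[1] >= pod.mem_mi:
                slot[0] -= pod.cpu_m
                slot[1] -= pod.mem_mi
                break
        else:
            # No open node has room, so open a new one.
            free.append([node_cpu_m - pod.cpu_m, node_mem_mi - pod.mem_mi])
    return len(free)


# Hypothetical workload: 20 pods, each requesting 250m CPU and 256Mi memory.
pods = [Pod(cpu_m=250, mem_mi=256)] * 20

M5_LARGE = (2000, 8192)    # raw capacity; real allocatable is lower
M5_XLARGE = (4000, 16384)  # raw capacity; real allocatable is lower

large_nodes = nodes_needed(pods, *M5_LARGE)
xlarge_nodes = nodes_needed(pods, *M5_XLARGE)

# Requested CPU as a percentage of provisioned CPU on the m5.large fleet.
requested_cpu_pct = sum(p.cpu_m for p in pods) * 100 // (large_nodes * M5_LARGE[0])

print(large_nodes, xlarge_nodes, requested_cpu_pct)  # prints: 3 2 83
```

In this toy case CPU is the binding dimension: eight such pods fit per m5.large, so 20 pods need three nodes. Once requests exist, this kind of calculation is possible at all, and Cluster Autoscaler can likewise compare the sum of requests on a node against its allocatable capacity when deciding whether the node can be removed.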

Result

34% reduction in EC2 spend with no change to application behaviour, capacity, or SLAs. The freed budget funded two new environments.

Reusable lesson

EKS cost problems are usually scheduling problems in disguise. Fix the resource requests first — everything else follows from that.
