Featured Win

Tightening EKS ingress health checks to make rollouts boring again

Reduced deployment friction by aligning ALB health checks, readiness behavior, and ingress expectations so rollout failures became faster to diagnose and less disruptive.

Debugging Stories
AWS
Kubernetes
EKS
Debugging

Date: 2026-04-21
Category: Debugging Stories
Role: Cloud / DevOps Engineer
Proof Type: Production incident fix

Impact

Turned a recurring ingress failure into a repeatable, low-drama diagnosis path with clearer rollout signals and faster recovery.

Situation

A service looked healthy from inside the cluster, but the ALB kept marking its targets unhealthy during deployment windows. The issue was not a single broken setting. It was drift between ingress assumptions, readiness behavior, and what the load balancer actually expected.

What I changed

I traced the request path from Ingress to target group behavior, then tightened the interfaces between Kubernetes and AWS:

- aligned the health check path with what the application really exposed
- checked service-to-pod port mapping against ingress expectations
- made readiness behavior reflect external availability more accurately
- verified controller-created target group behavior instead of only reading manifests
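
The first two checks in that list can be sketched as a small cross-check over the manifests. This is an illustration under assumptions, not the tooling used in the incident. The function name, the sample manifests, and the paths `/healthz` and `/ready` are hypothetical. The annotation key is the one the AWS Load Balancer Controller reads for the health check path, and the `/` fallback is its documented default for HTTP as I understand it. Treating a path difference between the ALB check and the readiness probe as a problem is a deliberately strict choice: the two are allowed to differ, but a difference is worth a second look.

```python
# Annotation read by the AWS Load Balancer Controller for the ALB health check path.
HEALTHCHECK_PATH = "alb.ingress.kubernetes.io/healthcheck-path"


def find_mismatches(ingress, service, container):
    """Return human-readable mismatches between three manifest fragments.

    All three arguments are plain dicts shaped like parsed Kubernetes YAML.
    """
    problems = []

    annotations = ingress.get("metadata", {}).get("annotations", {})
    # Assumption: the controller falls back to "/" for HTTP when the annotation is unset.
    alb_path = annotations.get(HEALTHCHECK_PATH, "/")

    probe_path = container.get("readinessProbe", {}).get("httpGet", {}).get("path")
    if probe_path is None:
        problems.append("container has no HTTP readiness probe")
    elif probe_path != alb_path:
        problems.append(
            f"ALB checks {alb_path!r} but readiness probe checks {probe_path!r}"
        )

    # Collect declared container ports by number and by name, because a
    # Service targetPort may reference either.
    declared = set()
    for port in container.get("ports", []):
        declared.add(port["containerPort"])
        if "name" in port:
            declared.add(port["name"])

    for svc_port in service.get("spec", {}).get("ports", []):
        # targetPort defaults to the Service port when omitted.
        target = svc_port.get("targetPort", svc_port["port"])
        # containerPort declarations are informational in Kubernetes, so a
        # missing declaration is a hint to investigate, not proof of a bug.
        if target not in declared:
            problems.append(
                f"service port {svc_port['port']} targets {target!r}, "
                "which the container does not declare"
            )

    return problems


# Hypothetical manifests with two deliberate mismatches.
INGRESS = {"metadata": {"annotations": {HEALTHCHECK_PATH: "/healthz"}}}
SERVICE = {"spec": {"ports": [{"port": 80, "targetPort": 8080}]}}
CONTAINER = {
    "ports": [{"containerPort": 8000}],
    "readinessProbe": {"httpGet": {"path": "/ready", "port": 8000}},
}

if __name__ == "__main__":
    for problem in find_mismatches(INGRESS, SERVICE, CONTAINER):
        print(problem)
```

A check like this only compares declared intent. It does not replace looking at the target group the controller actually created.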

Why it worked

The problem stopped looking like "ALB is failing" and started looking like an interface mismatch between declared routing intent and runtime health semantics. Once those boundaries were made explicit, diagnosis got much faster and the rollout path became more predictable.

Operational result

The win was not only the fix itself but also a calmer release path, with fewer ambiguous failures during deployments.

Reusable lesson

In EKS, ingress failures are often coordination failures across multiple control loops rather than one bad YAML field. The fastest path is usually to verify each boundary directly: Ingress, target group behavior, service mapping, pod readiness, and application path exposure.
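
One way to make "verify each boundary directly" concrete is to run the checks in a fixed order and stop at the first boundary that reports a problem. The sketch below is hypothetical. `unhealthy_targets` reads a dict shaped like the response of the ELBv2 `DescribeTargetHealth` API (for example, what boto3's `describe_target_health` returns), and `first_failing_boundary` is an illustrative helper, not part of any library. The sample response and the stubbed checks are made up for the example.

```python
def unhealthy_targets(response):
    """List targets whose state is not 'healthy'.

    `response` is a dict shaped like an ELBv2 DescribeTargetHealth response.
    Returns tuples of (target id, port, state, reason).
    """
    bad = []
    for description in response.get("TargetHealthDescriptions", []):
        health = description.get("TargetHealth", {})
        if health.get("State") != "healthy":
            target = description.get("Target", {})
            bad.append(
                (
                    target.get("Id"),
                    target.get("Port"),
                    health.get("State"),
                    health.get("Reason"),
                )
            )
    return bad


def first_failing_boundary(checks):
    """Run (name, check) pairs in order and return the first that finds problems.

    Each check is a zero-argument callable returning a list of problems.
    Returns (None, []) when every boundary looks clean.
    """
    for name, check in checks:
        problems = check()
        if problems:
            return name, problems
    return None, []


# Made-up response: one healthy target, one failing its health check.
SAMPLE_RESPONSE = {
    "TargetHealthDescriptions": [
        {
            "Target": {"Id": "10.0.1.12", "Port": 8080},
            "TargetHealth": {"State": "healthy"},
        },
        {
            "Target": {"Id": "10.0.1.13", "Port": 8080},
            "TargetHealth": {
                "State": "unhealthy",
                "Reason": "Target.ResponseCodeMismatch",
            },
        },
    ]
}

if __name__ == "__main__":
    # Boundaries in the order listed above; all but one are stubbed as clean.
    checks = [
        ("ingress", lambda: []),
        ("target group", lambda: unhealthy_targets(SAMPLE_RESPONSE)),
        ("service mapping", lambda: []),
        ("pod readiness", lambda: []),
        ("application path", lambda: []),
    ]
    boundary, problems = first_failing_boundary(checks)
    print(boundary, problems)
```

Stopping at the first failing boundary keeps the output small during an incident. Collecting results from every boundary instead would be a reasonable variation when several mismatches are suspected at once.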

Related Wins

Additional wins that show adjacent production improvements, design calls, and debugging work.

Cut EKS cluster costs by 34% without touching capacity

Identified and fixed a cluster cost problem caused by over-provisioned node groups and unset resource requests — without reducing actual workload capacity or changing application behaviour.

Infrastructure
Production outcome
AWS
EKS


Team offsite to Cape Coast — and why it mattered more than I expected

A two-day trip with the Scratchcode team to Cape Coast turned into one of the most useful alignment sessions we've had. A mix of work, history, and real conversations about where we're headed.

Team Moments
Team milestone
Team
Culture


Spoke at the Accra cloud community meetup

Presented a session on Kubernetes scheduling and node failure recovery to a room of about 40 engineers and students in Accra. First time speaking on infrastructure topics to a local audience.

Community
Speaking engagement
Speaking
Kubernetes
