Skip to content

I'm currently populating my catalog on the site. Pardon the prefilled data. The entries are actively being updated and cleaned up.

Previous website
Capstone Project

Enterprise AWS Platform Engineering

A production-grade cloud platform built on AWS EKS with Terraform environment isolation, polyglot CI/CD across Node.js, Java, and Rust, GitOps via ArgoCD, and a layered security scanning strategy — end to end.

Role: Platform / DevOps EngineerTimeline: 10 weeksCategory: Platform Engineering
  • AWS EKS
  • Terraform
  • GitHub Actions
  • ArgoCD
  • Helm
  • Docker
  • ECR
  • Prometheus
  • Grafana

Overview#

This project is a full-stack platform engineering build covering the complete delivery lifecycle on AWS — from infrastructure provisioning to production deployments, with security and observability woven in throughout.

The goal was to build the kind of platform a growing engineering team could actually run in production: environment-isolated Terraform state, a polyglot CI/CD pipeline handling Node.js, Java, and Rust services in parallel, GitOps-driven deployments via ArgoCD, and environment-aware security scanning that keeps developers fast without sacrificing rigour on the way to production.

Scope

This covers the full platform stack: infrastructure, containers, CI/CD, GitOps, and security — not just one layer.

Problem#

Most infrastructure tutorials stop at getting something running. Production platforms have harder constraints: multiple environments that can't interfere with each other, build pipelines that need to stay fast as the service count grows, and security controls that don't slow down every commit.

The challenge here was designing a system that held together across all of these dimensions at once — not just in one isolated area.

Solution#

I structured the build into six progressive phases:

  • OIDC authentication — GitHub Actions authenticates to AWS via OpenID Connect. No long-lived access keys, no rotation burden. Credentials expire in one hour.
  • Terraform environment isolation — Dev, staging, and production each have completely separate remote state (S3 + DynamoDB locking). Destroying dev cannot touch production.
  • EKS with ECR — A Terraform-provisioned EKS cluster per environment, with ECR repositories for each service, image lifecycle policies, and IRSA for pod-level IAM.
  • Advanced Kubernetes — HPA, Pod Disruption Budgets, Network Policies, RBAC, and StatefulSets. Built with Helm charts that carry production-safe defaults.
  • Polyglot CI/CD — Matrix builds run Node.js, Java, and Rust services in parallel. Each language gets its own caching strategy (npm, Maven, Cargo) and optimised Dockerfile.
  • Environment-aware security scanning — Fast scans on feature branches, comprehensive SAST and SCA on staging, full container and IaC scanning with manual approval gates on production.

Architecture#

Multi-environment platform with isolated Terraform state, EKS clusters, and a GitOps-driven promotion flow from dev through to production.

The architecture runs three fully isolated environments — each with its own VPC, EKS cluster, and Terraform state. GitOps via ArgoCD drives deployments: automated to dev, one approval for staging, two approvals and a manual gate for production.

The EKS cluster provisioning is covered in more depth in a companion project:

Project

Kubernetes Workload Platform on AWS EKS

Provisioned a production-grade EKS cluster on AWS with Terraform, deployed containerised workloads using Helm, and automated the full delivery pipeline with Jenkins.

AWS EKSTerraformJenkinsHelm
Read →

Tech Stack#

  • AWS EKS
  • Terraform
  • GitHub Actions
  • ArgoCD
  • Helm
  • Docker
  • ECR
  • Prometheus
  • Grafana
  • OIDC
Matrix build — three languages in parallel
strategy:matrix:  service:    - name: node-api      cache: npm      build: npm ci && npm run build    - name: java-service      cache: maven      build: mvn clean package -DskipTests    - name: rust-processor      cache: cargo      build: cargo build --release # All three run concurrently.# Total build time: ~4 min (not 9 min sequential).

The full design and trade-off analysis for this delivery system is documented in the case study:

Case Study

Designing a secure internal delivery platform on AWS and Kubernetes

A deep technical breakdown of how infrastructure baselines, GitOps delivery, and observability defaults came together as a reusable internal platform.

AWSKubernetesTerraformArgoCD
Read →

Key Features#

  • GitHub OIDC → AWS: no stored access keys, temporary credentials only.
  • Complete Terraform state isolation per environment with S3 + DynamoDB.
  • EKS cluster autoscaling, IRSA for pod IAM, ECR with lifecycle policies.
  • Helm charts with HPA, PDB, Network Policies, and zero-downtime rolling updates.
  • Parallel matrix builds for Node.js, Java, and Rust with language-specific caching.
  • Tiered security scanning: fast on dev, comprehensive on staging, full suite on production.
  • ArgoCD GitOps with automated dev sync and manual approval gates for production.
  • Prometheus + Grafana observability baked into the platform defaults.

Media#

The promotion workflow: automated dev deployments, one-approval staging, two-approval production with blue/green rollout and smoke tests.

Results#

  • Build times reduced from ~9 minutes (sequential) to ~4 minutes (parallel matrix builds).
  • Zero long-lived AWS credentials — full OIDC adoption across all environments.
  • Production deployments require 2 approvals and pass a full security suite before shipping.
  • Terraform destroy of dev cannot affect staging or production state.
  • Platform provides a consistent baseline any new service can land on from day one.

One concrete operational outcome from tightening the EKS delivery configuration:

Win

Tightening EKS ingress health checks to make rollouts boring again

Reduced deployment friction by aligning ALB health checks, readiness behavior, and ingress expectations so rollout failures became faster to diagnose and less disruptive.

AWSKubernetesEKSDebugging
Read →

What this demonstrates

End-to-end platform engineering thinking: security, delivery, infrastructure, and operations handled as one coherent system rather than separate concerns bolted together.

Lessons Learned#

The biggest lesson was that platform work is primarily a coordination problem. The technology choices (EKS, ArgoCD, Terraform) are well understood. The harder part is designing the system so that environment isolation, security controls, and developer experience reinforce each other rather than pulling in different directions.

Building the tiered security scanning was a good example of this. Full scanning on every commit sounds rigorous, but it slows developers down and trains them to ignore CI. Environment-aware scanning — fast on dev, comprehensive on production — achieves better security outcomes by not making developers work around the system.

For deeper context on the Kubernetes delivery patterns used here, this post breaks down the internals:

Blog

Kubernetes Internals Notes: API Server, RBAC, Scheduling, and Controllers

A practical, student-friendly guide to the Kubernetes request flow, authentication vs authorization, controllers, scheduler behavior, rolling updates, and workload resilience.

KubernetesPlatform EngineeringAWS
Read →

Related Projects

A few additional builds that connect to the same infrastructure, delivery, and reliability themes.

Kubernetes Workload Platform on AWS EKS

Provisioned a production-grade EKS cluster on AWS with Terraform, deployed containerised workloads using Helm, and automated the full delivery pipeline with Jenkins.

  • AWS EKS
  • Terraform
  • Jenkins
  • Helm
  • Docker
  • ECR

Containerised Application Delivery on AWS ECS

End-to-end deployment of a Node.js frontend and backend on AWS ECS Fargate, automated through a Jenkins CI/CD pipeline and provisioned with Terraform.

  • AWS ECS Fargate
  • Jenkins
  • Terraform
  • Docker
  • ECR

Multi-Region Platform Resilience

A disaster recovery and high-availability architecture across two AWS regions using Route 53 failover, cross-region RDS replication, and automated runbooks for recovery validation.

  • AWS Route 53
  • RDS Multi-Region
  • Terraform
  • CloudWatch
  • Lambda
  • S3