I am a Machine Learning/DevOps engineer with extensive experience building reliable, highly available platforms that scale and deliver the uptime customers depend on.
I am knowledgeable about the CAP and PACELC theorems, advanced systems design, data partitioning, application- and infrastructure-level monitoring, and reducing request latency. One of my proudest achievements was raising an infrastructure's uptime from 79% to 99.99%.
I am skilled in Kubernetes, Python, Terraform, Jenkins, AWS, GCP, Ansible, and Docker, and I consistently apply security best practices such as preventing privilege escalation, scanning images before deployment, and handling secrets correctly. I am also a Certified Kubernetes Application Developer (CKAD).
In addition, I am a senior full-stack developer: I have developed multiple APIs with TypeScript and NestJS, built web applications with React, and built mobile applications with Flutter and React Native.
● Created dedicated IAM roles, custom policies, and KMS keys for all applications and microservices via IaC2 (custom parameterized Terragrunt and Go-based Pulumi scripts for both infrastructure and Kubernetes), enforcing least-privilege access and improving resource isolation by 40%.
● Refactored and converted critical Terraform code to IaC2 (Pulumi/Terragrunt), boosting infrastructure consistency across environments and speeding up deployments by 60%.
● Added Kubernetes manifest validation to CI pipelines, reducing production misconfigurations by over 80% and cutting incident resolution times in half.
● Optimized API test builds through concurrent execution and Docker image caching, cutting build durations from 20 minutes to under 10 minutes (a reduction of over 50%).
● Orchestrated multi-account, multi-region disaster recovery (DR): identified critical systems, established a dedicated AWS recovery account, built a proof of concept with Arpio, then implemented the solution with IaC2, AWS Backup, and cross-region and cross-account replication.
● Created a DR runbook and ran successful failover tests with near-zero data loss.
● Developed a modern VPC module (multi-AZ subnets, NAT gateways) in Go-based Pulumi, enhancing scalability and cutting environment setup times by 70%.
● Drove cost-optimization strategies, reducing monthly infrastructure expenses by 20%.
● Mentored engineers on secure infrastructure design, championed best practices in GitOps, and promoted a collaborative culture through Slack Donut meetups and team-building sessions.