Personal details

Williams M. - Remote DevOps engineer

Senior Platform Engineer
Based in: 🇬🇧 United Kingdom
Timezone: Edinburgh (UTC+1)

About

I am a Machine Learning/DevOps engineer with extensive experience building reliable, highly available platforms that scale and deliver the uptime needed for customer satisfaction.

I am well versed in the CAP theorem, the PACELC theorem, advanced systems design, data partitioning, application- and infrastructure-level monitoring, and reducing request latency. One of my proudest achievements was improving the uptime of an infrastructure from 79% to 99.99%.

I am skilled in Kubernetes, Python, Terraform, Jenkins, AWS, GCP, Ansible, and Docker, and I always apply best practices such as preventing privilege escalation, scanning images before deployment, and handling secrets correctly. I am also a Certified Kubernetes Application Developer (CKAD).

I am also a Senior Full-Stack Developer. I have developed multiple APIs with TypeScript and NestJS, and I have built web applications with React and mobile applications with Flutter and React Native.

Work Experience

Senior Platform Engineer
Attest | Sep 2024 - Present
Python
Bash
Google Cloud Platform
Incident management
Disaster recovery planning
Terraform
Grafana
Prometheus
Golang
Istio
GitHub Actions
Argo CD
AWS

● Created dedicated IAM roles, custom policies, and KMS keys for all applications and microservices via IaC2 (custom parameterized Terragrunt and Golang-based Pulumi scripts for both infrastructure and Kubernetes), enforcing least-privilege access and improving resource isolation by 40%.

● Refactored and converted critical Terraform code to IaC2 (Pulumi/Terragrunt), boosting infrastructure consistency across environments and speeding up deployments by 60%.

● Added Kubernetes manifest validation to CI pipelines, reducing production misconfigurations by over 80% and cutting incident resolution times in half.

● Optimized API test builds through concurrency and Docker image caching, shrinking build durations from 20 minutes to under 10 (50% improvement).

● Orchestrated multi-account, multi-region Disaster Recovery (DR) by identifying critical systems, establishing a recovery AWS account, building a proof of concept with Arpio, and then implementing it with IaC2, AWS backups, and cross-region and cross-account replication.

● Created DR runbook and achieved successful failover tests with near-zero data loss.

● Developed a modern VPC module (multi-AZ subnets, NAT gateways) in Go-based Pulumi, enhancing scalability and cutting environment setup times by 70%.

● Drove cost-optimization strategies, reducing monthly infrastructure expenses by 20%.

● Mentored engineers on secure infrastructure design, championed best practices in GitOps, and promoted a collaborative culture through Slack Donut meetups and team-building sessions.

AWS Senior DevOps Engineer
The Weather Company, An IBM business | Sep 2023 - Feb 2025
Python
Bash
Jenkins
Terraform
Grafana
Prometheus
Helm
AWS EKS
Docker & Kubernetes
Argo CD
Notion
AWS
  • Directed the design and implementation of scalable cloud infrastructure on AWS and GCP for a high-traffic platform, serving over 1 million monthly users with 99.99% uptime.
  • Implemented infrastructure-as-code practices using Terraform and Argo CD, reducing deployment times by over 50% and enhancing team productivity.
  • Pioneered the migration from EC2 to AWS EKS as one of the first DevOps engineers on the team, cutting infrastructure costs by 40%, boosting deployment speed by 35%, and enhancing system resilience by 50%.
  • Authored and maintained 200+ pages of technical documentation on Notion, significantly improving the onboarding process for new team members and external collaborators.
  • Orchestrated the deployment of a new Jenkins server, enhancing CI/CD pipelines with Kubernetes pods; achieved a 50% increase in build process efficiency for 30+ active development teams.
  • Integrated AWS Secrets Manager, streamlining secret management for 100+ applications; improved security posture by 40% through robust IAM roles and security group configurations.
  • Led a critical migration project, transferring 20+ CI builds to the new Jenkins platform within a tight 1-month deadline, ensuring zero disruption to ongoing development activities.
  • Spearheaded the introduction of Docker-in-Docker (DinD) servers for Jenkins builds, reducing build times by 25% and enhancing the pipeline's reliability for containerized applications.
  • Initiated and successfully completed the migration of Helm charts to AWS ECR, facilitating a smoother deployment process and improving deployment efficiency by 30%.
  • Established a comprehensive monitoring solution using Prometheus and Grafana for real-time visibility into the health and performance of Linux servers and applications, enabling proactive issue resolution and 99.9% uptime.

Projects

FoodCourt Web App
React
Tailwind CSS

Education

University of Strathclyde
Master's degree, Machine Learning and Deep Learning
Oct 2022 - Sep 2023