SRE Lead
Zomato | Jul 2017 - Feb 2023
PHP
Docker
DynamoDB
Go (Golang)
AWS (Amazon Web Services)
As a Site Reliability Engineering (SRE) Lead at Zomato, I have been working for the company since 2017 and currently manage a team of sixteen Site Reliability Engineers. My role includes leading postmortem incident reviews to identify the root cause and all contributing causes, and to ensure each one is remediated. I also started, and continue to run, a weekly tech learning session on postmortems so that the entire tech team understands past incidents and works towards preventing similar ones in every microservice.
I actively lead the entire incident management lifecycle, from triaging and commanding an incident, through getting the mitigation deployed, to the postmortem and follow-ups. I also fill gaps across Engineering wherever they arise, including Cost Optimization, Continuous Integration, Continuous Release, Outages, Debugging, Observability, Architecture, and Building Platform Microservices.
After a series of incidents, I led an effort with other VPs to increase resiliency by designing fallbacks and redundancies throughout the business-critical flows. I also designed and implemented the tech team's first structured peak event preparation plan, and ensured a seamless event in which load increased by almost 75% compared to the rest of the year.