Cloud Operations Engineer
Clarivate | Mar 2020 - Present
Git
Jenkins
Ansible
Datadog
Terraform
Packer
Fluentd
Elastic Stack
Splunk SignalFx
Technical lead for the main logging platform (Fluentd, Open Distro for
Elasticsearch, Kibana) in AWS, with automation built using Docker, Git,
Python, Terraform, JSON, YAML, Ansible, Rundeck, and Jenkins.
• Onboarding and supporting new ingestions from internal business units:
SSO, ISM policies, indices, monitoring, aggregators, certificates, LB, DNS,
Fluentd configs...
• Leading the migration from the custom Elasticsearch cluster to AWS
Elasticsearch Service.
• Troubleshooting production issues.
• On-call 24x7.
• Monitoring in SignalFx and Datadog (integration, configuration, management,
and dashboard creation).
• Helping other teams resolve blockers they run into in AWS.
• Resolving AWS issues, tracked as Jira tickets (EC2, VPC, Route 53,
LB, SG, TGW, IAM, S3, WorkSpaces...).
• AMI management using Packer and Jenkins.
• Improving existing automation (Ansible, Jenkins, Git, Packer, EC2...).
• Governance tasks.
Site Reliability Engineer
Dealfront | Jan 2023 - Mar 2024
Elasticsearch
Kubernetes
Grafana
PagerDuty
Prometheus
Helm
GitLab CI/CD
Helm Charts
Argo CD
- Installed and configured tools and services (e.g., Vault, cert-manager, OpenSearch, Sentry, MinIO, Qdrant) using Helm, Ansible, and Argo CD to support the data-engineering and data-science teams.
- Supported developers in creating roles and tokens in Vault for secure access to production databases.
- Set up VictoriaMetrics alerting to route critical alerts to PagerDuty and verified that it worked end to end.
- Established a logging strategy in the Kubernetes cluster using OpenSearch for effective troubleshooting.
- Collaborated with cross-functional teams to maintain the Kubernetes cluster's smooth operation.
- Monitored and managed the Kubernetes cluster using VictoriaMetrics and Grafana, creating custom dashboards.
- Developed and maintained comprehensive documentation for the Kubernetes cluster and associated processes.
- Kept up to date with new technologies and continuously improved the cluster infrastructure.
- Installed, configured, and maintained production Kafka clusters and Postgres databases, including monitoring and customizations for efficiency and reliability.
- Integrated various tools with Okta for SSO and OpenID.
- Designed and implemented GitLab CI pipelines for automating operational tasks.