We are AP Consulting. We don't just look for candidates; we look for passionate and talented individuals who will transform teams and take companies to the next level. We are a consultancy committed to connecting the best talent with exceptional opportunities.
Each position we offer is a gateway to new possibilities, and each candidate is the protagonist of their own professional success story. From technology innovators to strategic business leaders, we are here to match unique skills with extraordinary opportunities.
Discover the potential of your career with us! Explore our job opportunities and take the first step towards an exciting professional future.
Site Reliability Engineer, Cloud and Infrastructure
What can you expect from this position?
- Participate in initiatives involving system design and provisioning, reliability, observability and monitoring, self-service tool development, cost optimization, incident response, chaos engineering, and build and release
- Hands-on design, analysis, development and troubleshooting of large-scale distributed systems
- Build tools and automation to eliminate repetitive tasks, minimize downtime, achieve human-free operations, and provide self-service solutions to product development teams
- Work to improve the observability and monitoring of our systems. Proactively monitor capacity, performance, and cost metrics to ensure quality and identify opportunities for improvement
- Share an on-call rotation with your team where you will respond to incidents, lead triage efforts, and conduct blameless postmortems
- Partner with engineering, security, and product teams to keep our services reliable, available, fast, and cost efficient
- Be a champion of the customer’s voice and ensure our solutions are built with customer empathy at the forefront
- Promote SRE best practices within your team to ensure quality, stability, performance, resiliency, and maintainability of your solutions
- Explore new technologies and solutions to push our capabilities forward
What can you bring to the role?
- 4+ years of combined experience as a Software Engineer, Site Reliability Engineer, or DevOps Engineer
- Proven technical abilities in the areas of reliability, monitoring, self-service tool development, incident response, and build and release
- Experience with at least one of these languages: Python, Go, or Java; prior software development experience preferred
- Strong experience with Linux environments
- Demonstrated expertise designing, building, and triaging highly scaled production infrastructure in AWS
- Experience with infrastructure automation technologies like Terraform
- Experience with containers and container orchestration technologies such as AWS ECS or EKS
- Approach your job with an automation and software engineering mindset
- Passion for uptime, observability, and full stack monitoring
- Experience participating in a team’s 24x7 incident response efforts
- Experience building CI/CD pipelines that are fast and informative, drive quality, and achieve zero-downtime releases
- Ability to work across functional and domain boundaries to improve system reliability and deliver solutions on time and with quality
Common technologies in our ecosystem include:
- Java, Go, Node, PHP
- Databricks (experience would be ideal)
- Linux-based, some Windows
- Apache HTTP Server, Nginx, IIS, Apache Tomcat, Jetty
- Docker, AWS ECS, AWS EKS, and home-grown Kubernetes
- ELB, CloudFront, S3, EC2, RDS, IAM, SQS, SES, SNS, Lambda, API Gateway, Kinesis, ElastiCache, Elasticsearch, SSM, Control Tower, and much more
- MySQL, Oracle, PostgreSQL, SQL Server
- Artifactory, GitHub Enterprise, CircleCI, Jenkins, GitHub Actions, SonarQube, JFrog Xray
- Terraform (preferred), CloudFormation
- Packer, Puppet, Ansible
- New Relic, CloudWatch, PagerDuty
TOP REQUIREMENTS:
- Kubernetes (EKS)
- Terraform
- Intermediate to Advanced AWS
- GitHub Actions
- New Relic/Datadog
- Containers (Docker)
- Ideally, some form of developer background, with an eye for automating recurring tasks
- Excellent communicator
- Above all, someone who takes ownership, is a self-starter, loves learning new skills, and has a growth mindset
As an education innovation company, we're proud to play our part by inspiring learners around the world. If you bring your curiosity, we'll help you grow in a collaborative environment where everyone shares a passion for success.
Benefits and Perks
- Statutory benefits
- 100% remote work
- Additional days off after 3 months
- Major Medical Expenses Insurance, including direct family members
- Funeral Expenses Insurance
- Loyalty bonus for an additional 15 days
- Flexible working hours
- 4% Savings Fund
- Profit Sharing
- Training Programs
- English classes
- Work tools