Are you driven to build bulletproof infrastructure that keeps high-performing engineering teams moving at full speed? Do you get excited about tackling complex challenges, automating away repetitive work, and keeping large-scale systems running reliably? We’re seeking a hands-on, forward-thinking Site Reliability Engineer (SRE) to join our infrastructure team and work closely with our backend software engineers to design, operate, and scale systems that are fast, reliable, and built to last. In this role, you’ll be equal parts software engineer, systems architect, incident responder, and problem-solver, keeping our services running 24/7 while pushing the limits of automation, observability, and operational excellence.

Required qualifications (at least 4 of the following):

Linux Expertise: Hands-on experience administering, configuring, and troubleshooting Linux systems in production environments, including performance tuning, process management, and networking.
Programming & Automation: Proficiency in Python (preferred) and at least one additional scripting language (e.g., Bash, Go, Ruby) to automate deployments, monitoring, and incident response.
Infrastructure as Code (IaC): Experience with tools such as Terraform, Ansible, or CloudFormation to provision and manage infrastructure at scale.
Cloud Infrastructure: Proven track record deploying and managing production workloads in AWS, GCP, or Azure, including compute, storage, and networking components.
Monitoring & Observability: Skilled in using Prometheus, Grafana, Datadog, Splunk, or similar for metrics, logging, and tracing. Experience instrumenting applications for observability in collaboration with software engineering teams.
APIs & Service Integration: Experience consuming, building, and troubleshooting REST/gRPC APIs, including handling rate limits, retries, and error recovery.
Incident Response & Troubleshooting: Ability to diagnose and resolve complex production issues under time pressure, perform root cause analysis, and lead postmortems.
Capacity Planning & Performance Engineering: Experience forecasting resource needs, load testing, and ensuring systems can handle growth and high traffic.
Change Management: Familiarity with safe deployment practices, CI/CD pipelines, and risk assessment for infrastructure changes.
SLAs, SLOs, and Error Budgets: Understanding of how to define, monitor, and maintain service reliability targets and make trade-offs between reliability and feature velocity.

Nice to have (at least 2 of the following):

Collaboration: Proven ability to work closely with backend software engineers, product managers, and other stakeholders to design and operate reliable systems.
Automation-First Mentality: Strong belief in reducing manual toil and building self-healing systems.
Calm Under Pressure: Ability to think clearly and act decisively in high-stakes situations.
Continuous Improvement: Always looking for ways to optimize systems, processes, and workflows.
Analytical Thinking: Uses data and metrics to guide decision-making and evaluate success.
Adaptability: Comfortable learning new tools, frameworks, and approaches to meet evolving business and technical needs.

About Astreya

Company size: 501-1000 employees

Location: San Jose, CA

Astreya Service

How does Astreya work?

We enable businesses to make better decisions, achieve operational efficiency, and gain a competitive edge. The Astreya advantage is centered on focus and clear vision, world-class talent, and innovative technology: creativity is in our DNA.

Company culture

Make an impact

At Astreya, you have a chance to create change and build a new, better ‘normal’ based on how people really function and what allows us to thrive. We encourage our employees to explore the positive changes they can make in their personal and professional lives.

Work alongside data-driven technologists

As part of the Astreya team, you’ll have the opportunity to work with some of the largest and most innovative companies in the world. You’ll gain exposure to empowering technologies as well as the newest and most efficient ways of working.
