Wilson Bittencourt, Recruiter, is in direct contact with the company and can answer any questions you may have.
Role Overview
The Site Reliability Engineer (SRE) plays a pivotal role on Pearl’s technology platform, building the infrastructure and automation needed to scale systems securely and efficiently. Reporting to the VP of Technical Operations, you will strengthen the engineering team by eliminating manual bottlenecks and keeping production environments resilient, observable, and automated. Your primary goals are to lead infrastructure consolidation and achieve the "Zero-Manual Deployment" objective while maintaining SOC 2 Type 2 compliance.
Responsibilities
- Infrastructure as Code (IaC) & Automation: Design and maintain Terraform configurations for AWS and Snowflake. Spearhead the "Pearl Cloud Environment Creator" initiative to enable rapid, auditable provisioning of Dev, Staging, and Prod environments via GitOps.
- CI/CD Pipeline Development: Develop automated pipelines that eliminate manual intervention, integrating security scanning and compliance checks into every workflow.
- Observability & Monitoring: Build modern logging and alerting stacks. Establish SLIs, SLOs, and error budgets that provide deep visibility into system health and performance.
- Disaster Recovery (DR) & Reliability: Architect DR plans for internal and customer-facing services and lead regular drills to ensure system resilience.
- Data Infrastructure Support: Collaborate with Data Engineering to migrate Snowflake pipelines to dbt and implement robust data quality testing frameworks.
- Incident Response: Participate in on-call rotations, lead troubleshooting efforts, and conduct blameless post-mortems for continuous system improvement.
Secondary Functions
- Mentorship: Guide engineering team members on reliability principles and infrastructure best practices.
- Documentation: Maintain detailed runbooks, architecture diagrams, and operational procedures.
- Security Support: Work with the InfoSec team to harden IAM roles and manage secrets for SOC 2 compliance.
Required Skills
- IaC Expertise: Expert-level proficiency with Terraform (or Pulumi/CloudFormation) across multi-account environments.
- Cloud Platforms: Extensive hands-on experience with AWS (EC2, RDS, S3, IAM, VPC, EKS). Experience with Snowflake is highly advantageous.
- CI/CD & DevOps: Proficiency with GitHub Actions and with building multi-stage deployment pipelines.
- Programming: Strong Python skills for tooling; familiarity with Bash. Experience with Clojure is a plus.
- Observability Tools: Experience with Datadog, Prometheus/Grafana, ELK, or CloudWatch.
- Systems Knowledge: Strong Linux administration and networking fundamentals (DNS, TCP/IP, load balancing).
- SRE Mindset: Solid understanding of error budgets and chaos engineering, plus a pragmatic approach to problem-solving in ambiguous situations.
Nice to Have
- On-call: Participation in an on-call rotation following initial training.
- Maintenance Windows: Availability for occasional, pre-coordinated after-hours maintenance windows.
- Communication: Excellent English communication skills for effective collaboration with technical and non-technical stakeholders.