ClearScale, a leading AWS Premier Consulting Partner, empowers businesses to unlock the full potential of the cloud through a wide range of services, including cloud consulting, architecture design, migration, automation, application development, and managed services. We help Fortune 500 enterprises, mid-sized businesses, and startups across diverse industries like Healthcare, Finance, and Technology succeed with ambitious and transformative cloud projects. Our expertise lies in architecting, developing, and launching innovative and sophisticated solutions using the latest cloud technologies. Due to our continued growth and the increasing demand for our modernization and cloud-native development capabilities, we are seeking a talented and experienced AWS Hosted/Modernization Software Engineer to join our dynamic team. If you are passionate about building and modernizing applications on the AWS platform, tackling complex engineering challenges, and working with a team of top-tier cloud experts, this is your opportunity to make a significant impact.

What You'll Do:

Execute on Observability Strategy
Define and document standards for logging, tracing and SLO definitions for engineering teams to follow
Propose effective ways to manage dashboards, traces, monitors, metrics and logs in Datadog
Integrate Datadog with incident management tools and Slack
Establish comprehensive monitoring using Datadog
Centralize logging and develop mechanisms for efficient debugging
Implement systems for distributed tracing visualization
Adopt OpenTelemetry standards across microservices
Roll out observability to development and production environments in close collaboration with engineering and operations teams
Define training practices for engineering teams to adopt observability standards and operational practices for healthy and sustainable incident management processes
Implement POCs and demonstrate them to engineering teams
Introduce engineering practices for healthy alerting mechanisms, dashboard definitions and blind-spot elimination, with a focus on eliminating alert fatigue
Establish near-real-time reporting to minimize MTTA and MTTR and improve developer experience

What You'll Bring:

Extensive experience with AWS infrastructure at scale
Experience working in SRE, DevOps or Developer Experience teams in engineering organizations is a must
Deep knowledge of observability tooling (Datadog, Grafana, Splunk, OTEL) and hands-on experience developing, extending and operating these tools across different environments, including high-load production systems
Expert knowledge of Terraform
Ability to propose solutions that scale across engineering teams and balance speed of response against cognitive load
Experience leading incident responses utilizing operational tools including logging, tracing, SLO patterns and synthetics
Experience establishing technical roadmaps from operational strategies for SRE, DevOps or Developer Experience teams in mid- to large-sized organizations, and the ability to drive their adoption across engineering teams
Experience applying analytical practices to define SLAs in close coordination with engineering teams and stakeholders
Deep understanding and experience advocating for and rolling out SRE best practices and standards for engineering teams
Mindset of "minimal tooling for maximum impact"
Experience with on-call rotations, creating and executing scalable practices in engineering teams
Experience with integrating observability tooling with Teams and Slack
Leadership skills to drive alignment between different departments and get buy-in from different stakeholders
Exemplary oral and written communication skills for technical and non-technical stakeholders
AWS certifications are a plus

Our Commitment to Your Growth and Well-being:

Competitive salary
Exceptional opportunities for career growth and leadership development within a leading AWS Premier Consulting Partner.
A collaborative, high-energy, and fully remote work culture that fosters connection and innovation.
Continuous learning and development opportunities, including access to training and certifications.
The flexibility and convenience of a 100% distributed workforce – work from the location that suits you best!

About ClearScale

👥201-500

📍San Francisco, CA

ClearScale Service

How does ClearScale work?

ClearScale designs, builds, integrates, and manages complex infrastructures and applications exclusively on AWS. ClearScale has successfully delivered more than 1,000 cloud projects for clients ranging from startups to large enterprises and public sector organizations. Our core competency is delivering custom cloud projects and services for clients who have limited cloud experience on staff or who need additional resources. We leverage the best cloud technology available to provide a solution that is unique to your project requirements. Whether this is your first project in the cloud or one of many, ClearScale has the expertise to handle your most complex requirements.