We are Ipsos Askia, and we are passionate about technology and insights. Ipsos Askia's ambition is to power Ipsos with tailored solutions that generate actionable insights and unlock business growth.

We are a team of 65+ people working across France, Germany, Romania and the UK. What truly sets us apart is our culture of empowerment and ownership, in which we work together to create value. Our continuous quest for innovative solutions is aimed at proactively meeting the needs of the business.

About the Ipsos Askia product – NGIE:

Our goal is to make NGIE, our in-house platform, the primary survey data collection tool at Ipsos, replacing the current third-party solution. To achieve this, we are providing tools that collect information from respondents and address research objectives. We are expanding NGIE's capabilities in a scalable way to support business growth. Through daily collaboration and innovation, we steadily build confidence and fully unlock NGIE's potential to meet our clients' needs.

Role summary:

We are seeking a Site Reliability Engineer to join our Ipsos Askia team. This role is ideal for someone who is passionate about creating and implementing automated solutions that enhance software deployment and operational efficiency across cloud platforms. As an SRE, you will be responsible for developing and maintaining the infrastructure, tools, and automation needed to ensure the reliability, scalability, and performance of our systems.

Responsibilities:

Design, implement, and maintain fully automated infrastructure as code that integrates into a CI/CD pipeline.
Be part of a 24/7 on-call SRE team.
Maintain platforms after launch by measuring and monitoring availability, performance, and overall system status.
Run the production environment by monitoring data availability, latency, and quality, taking a holistic view of system health;
Develop the product with emphasis on the link between the end user and the Product Owner;
Troubleshoot, assess, and resolve operational challenges and support escalations. Recover platforms during production incidents to meet target SLOs, with end users as the priority;
Write blameless postmortems focused on improving future performance;
Work with the Product Owner on the error budget to implement the needed features;
Perform regular maintenance and updates of the operational landscape (software versions, patches, support for the provider's software/hardware maintenance activities, etc.);
Proactively identify and implement operational improvements for processes, performance, and reliability;
Handle operations related to monitoring and alerting for SLA-critical production platforms, including issue resolution and manual intervention (in close cooperation with the software development and Ipsos Dev teams);
Ensure reliable operation of applications for projects across the organization, and help shape our delivery and mindset so that IT and the business are aligned;
Provide active support and guidance during the design and development of new services for our platforms. Work as a senior member of a Scrum team;
Participate in the development and execution of the SRE strategy;
Apply the mindset and practices of a software engineer.

Requirements:

4+ years of experience, including DevOps, software engineering, Site Reliability Engineering (SRE), and on-call rotations, working on highly scalable distributed systems
1+ years of experience with PowerShell scripting and Python
Bachelor’s or master’s degree in computer science or a related field, or equivalent experience
Experience working with monitoring systems such as Zabbix, Prometheus, or similar
Experience using version control systems such as Git
Experience with orchestration/automation tools (Ansible, Terraform, Packer, etc.)
Experience with container technologies (Docker, Kubernetes)
Experience deploying code using CI/CD tools like GitHub within change management procedures
Experience working with Jira
Experience with cloud-native application development and cloud technologies on GCP, AWS, or Azure
Strong understanding of cloud concepts, on-premises infrastructure, and platforms;
Proven ability to quickly learn new technical domains and then train others;
Great verbal and written communication skills;
Knowledge of web security, networking, and application architecture.