Job Title: Senior Software Engineer
Reports to: Software Development Manager
Location: Ireland/EU - Home/Office, flexible / USA
Purpose:
The Senior Software Engineer will be responsible for building and deploying cutting-edge AI solutions into real-time user applications and high-volume pipelines. Our infrastructure combines serverless and other AWS services, is fully defined as code, and is continuously integrated and deployed.
We are early adopters of agentic coding tools to drive both velocity and engineering excellence. We deliver value through modern agile DevOps principles – continuous delivery, metric tracking, and rapid user feedback.
We believe long-term success comes from establishing a sustainable pace of development, offering flexible working hours, and setting aside enough time to take care of your physical and emotional well-being.
Key Responsibilities:
- Design and build production-grade AI/LLM processing pipelines (document ingestion, extraction, classification, summarization, RAG) with a focus on reliability, throughput, and unit economics.
- Own observability for AI workloads end-to-end — structured logs, metrics, traces, prompt/response capture, token and cost attribution, and quality evals — integrated with Datadog.
- Build and maintain the deployment and runtime infrastructure for these pipelines on AWS (EKS, Lambda, Step Functions, SQS/Kinesis, S3) using Infrastructure-as-Code (Terraform / CDK) with reusable, reviewable modules.
- Establish CI/CD for AI services and models: automated testing, regression evals, safe rollouts (canary / blue-green), and rollback paths for both code and prompt/model changes.
- Drive engineering excellence in the AI team — pipeline architecture patterns, versioning of prompts/models/datasets, reproducibility, and separation of batch vs. real-time paths.
- Partner with SRE, Platform, and Product Engineering to harden shared services (secrets, networking, identity, data access) and ensure AI workloads meet security and compliance requirements.
- Mentor engineers on building maintainable, testable AI systems; raise the bar on code review, design documents, and operational readiness reviews.
- Embed SOC 2 security and compliance practices into design and implementation decisions, proactively raising gaps in access control, logging, and data protection.
Experience & Qualifications:
- 7+ years building production distributed systems, including at least 2 years operating AI/ML or LLM-based pipelines at scale.
- Deep experience with AWS (EKS/Kubernetes, Lambda, Step Functions, event-driven patterns) and Terraform or AWS CDK as a primary delivery mechanism.
- Strong proficiency in Python (and ideally TypeScript/Node) with production patterns for async processing, backpressure, retries, and idempotency.
- Proven track record with observability stacks (especially LGTM) applied specifically to probabilistic/LLM systems — evals, drift, hallucination detection, cost/latency SLOs.
- Experience integrating with foundation model providers (Anthropic, OpenAI, Gemini) and/or self-hosted inference, including prompt management, caching, and guardrails.
- Experience with vector stores, embedding pipelines, and retrieval evaluation.
- Experience with workflow tools such as Argo or Dagster is helpful.