Job Summary:
Grafana Labs is a remote-first, open-source company known for its visualization tool, which is used by millions globally. It is seeking a Senior Engineer to design and implement evaluation frameworks for Generative AI and Large Language Models, with a focus on automated evaluation processes and metrics development.
Responsibilities:
• Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification.
• Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors.
• Define and refine metrics for both structure and semantics, ensuring alignment with realistic use cases and operational constraints.
• Lead the development of dataset management processes and guide teams across Grafana in best practices for GenAI evaluation.
Qualifications:
Required:
• Experience designing and implementing evaluation frameworks for AI/ML systems.
• Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems.
• Ability to work with high autonomy, collaborating with teams to translate their goals into clear, testable criteria supported by effective tooling.
Preferred:
• Experience working in environments with rapid iteration and experimental development.
• A pragmatic mindset that values reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems.
• A passion for minimizing human toil and building AI systems that actively support engineers.
Company:
Grafana Labs develops an open-source software platform for monitoring, visualization, and metric analytics. Founded in 2014, the company is headquartered in New York, New York, USA, and has 1,001-5,000 employees. It is currently a late-stage company. Grafana Labs has a track record of offering H-1B sponsorships.