Job Title: Sr. Software Engineer, LLM Evaluation & Repository Validation
Duration: Part-time, short-term
Remote: Canada
Full stack (Backend: Java, Go, Node.js, Python, C++;
Frontend: TypeScript, JavaScript, jQuery, React, Vue, Angular).
Role Overview — What Does a Typical Day Look Like?
Work across multiple projects to improve LLM performance on code. Sample projects and tasks include:
- Lead and deliver end-to-end agent use cases such as home automation agents, coding copilots, or creative design assistants.
- Collaborate with the team to identify edge cases and ambiguities in model behavior.
- Review and compare 3–4 model-generated code responses per task using a structured ranking system.
- Evaluate code diffs for correctness, code quality, style, and efficiency. Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
Required Skills & Experience
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
- Proven ability to review code diffs and evaluate correctness, maintainability, and efficiency.
- Excellent oral and written communication skills, with the ability to write clear, structured evaluation rationales.