AI Evaluation Engineer (Software Engineering / Code)

IT Jobs. Gramian Consultancy Jobs

Responsibilities

  • Design and build multi-agent benchmark tasks based on real-world code changes (bug fixes, migrations, refactors)
  • Work with the Harbor evaluation framework to run and validate tasks in containerized environments
  • Write clear, precise task instructions (file paths, function signatures, expected behavior, constraints)
  • Develop Python-based verification scripts to validate correctness of code changes
  • Define task decomposition strategies across multiple specialized agents
  • Analyze and navigate large open-source codebases to extract realistic task scenarios
  • Run, debug, and refine tasks in Docker environments to ensure reproducibility
  • Improve task quality, clarity, and difficulty based on evaluation results

Requirements

  • 5+ years of experience in software development (Python and JavaScript)
  • Strong experience working with large codebases (e.g., Django, Flask, FastAPI, Node.js or similar)
  • Familiarity with Git workflows (pull requests, diffs, commits, cherry-picking)
  • Experience writing tests or validation scripts (pytest, unittest, or similar)
  • Ability to write clear, precise technical specifications
  • Familiarity with AI coding benchmarks or evaluation frameworks (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)
