Evaluation Scenario Writer - AI Agent Testing Specialist

Remote
Part Time | Mid Level | 5+ years

Posted 1 week ago

About This Role

Create challenging, comprehensive coding test cases that evaluate the capabilities of AI coding systems. Analyze AI failures to understand model strengths and weaknesses, and iterate based on expert feedback.

Responsibilities

  • Review and refine realistic coding tasks based on provided production codebases
  • Write comprehensive functional tests that validate end-to-end behavior and edge cases
  • Craft "fair but hard" challenges requiring complex reasoning with scattered information
  • Analyze AI failures to understand where models struggle and where they succeed
  • Iterate based on feedback from expert QA reviewers

Requirements

  • 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
  • Background in full-stack development (React-based interfaces and robust back-end systems)
  • Experience writing tests (functional, integration)
  • Experience with Docker containers (running evaluations locally)
  • Understanding of CI/CD (GitHub Actions as a user)
  • English proficiency - B2

Qualifications

  • Degree in Computer Science, Software Engineering or related fields
  • 5+ years in software development, primarily Python

Skills

  • Python *
  • Docker *
  • React *
  • CI/CD *
  • GitHub Actions *
  • pytest *
  • async/await *
  • subprocess *
  • file operations *

* Required skills

About Mindrift

Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
