Evaluation Scenario Writer - AI Agent Testing Specialist
Part Time
Mid Level
5+ years
Posted 1 week ago
About This Role
Create challenging and realistic coding test cases to evaluate and improve AI coding systems' capabilities. Focus on designing comprehensive functional tests that validate end-to-end behavior and analyze AI failures to understand model strengths and weaknesses.
Responsibilities
- Review and refine realistic coding tasks based on provided production codebases
- Write comprehensive functional tests that validate actual end-to-end behavior and edge cases
- Craft "fair but hard" challenges for AI systems, requiring complex reasoning and information retrieval across scattered data
- Analyze AI failures to distinguish where models struggle from where they perform reliably
- Iterate based on feedback from expert QA reviewers
Requirements
- Degree in Computer Science, Software Engineering or related fields
- 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
- Background in full-stack development (React-based interfaces and robust back-end systems)
- Experience writing tests (functional, integration)
- Experience with Docker (running evaluations locally in containers)
- CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
- English proficiency - B2
Skills
- Python *
- Docker *
- React *
- CI/CD *
- GitHub Actions *
- Pytest *
* Required skills
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.