Evaluation Scenario Writer - AI Agent Testing Specialist
Remote | Posted 1 month ago | Expired
About This Role
Create challenging coding test cases and refine realistic coding tasks for AI coding systems, writing comprehensive functional tests that validate end-to-end behavior and edge cases. Analyze AI failures to help improve model performance, and iterate based on expert feedback.
Responsibilities
- Review and refine realistic coding tasks built on provided production codebases, with realistic scope, requirements, and information sources
- Write comprehensive functional tests that validate actual end-to-end behavior and edge cases
- Craft "fair but hard" challenges where the AI has all the context it needs, but has to work for it
- Analyze AI failures to understand what the model struggles with vs. what it masters
- Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria
Requirements
- Degree in Computer Science, Software Engineering, or a related field
- 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
- Background in full-stack development, with equal focus on building React-based interfaces and robust back-end systems
- Experience writing tests (functional, integration)
- Experience with Docker (running evaluations locally in containers)
- Understanding of CI/CD (using GitHub Actions: triggers, labels, reading results)
- English proficiency at B2 level
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.