Freelance Agent Evaluation Engineer
Remote
About This Role
Join Mindrift as a Freelance Agent Evaluation Engineer, working on project-based AI engagements for leading tech companies to test, evaluate, and improve their AI systems. The role involves creating structured test cases and analyzing the behavior of AI agents.
Responsibilities
- Create structured test cases simulating complex human workflows
- Define gold-standard behavior and scoring logic for agent evaluation
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure scenarios are production-ready, easy to run, and reusable
Requirements
- 3+ years of software development experience with strong Python focus
- Experience with Git and code repositories
- Comfortable with structured formats like JSON/YAML for scenario description
- Understanding of core LLM limitations (hallucinations, bias, context limits) and how they affect evaluation design
- Familiarity with Docker
- English proficiency at B2 level
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.