Freelance Agent Evaluation Engineer
Remote
Posted 2 months ago (expired)
About This Role
Mindrift is seeking a Freelance Agent Evaluation Engineer for project-based AI work with leading tech companies, focused on testing, evaluating, and improving AI systems. The role involves creating structured test cases, defining gold-standard behavior, analyzing agent logs, and ensuring that scenarios are production-ready.
Responsibilities
- Create structured test cases simulating complex human workflows
- Define gold-standard behavior and scoring logic to evaluate agent actions
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure scenarios are production-ready, easy to run, and reusable
Requirements
- 3+ years of software development experience with strong Python focus
- Experience with Git and code repositories
- Comfort with structured formats such as JSON and YAML for describing scenarios
- Understanding of core LLM limitations and their effect on evaluation design
- Familiarity with Docker
- English proficiency: B2
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.