About This Role
Join Mindrift as a Freelance Agent Evaluation Engineer and work on project-based AI assignments for leading tech companies, helping them test, evaluate, and improve AI systems. The role involves creating structured test cases and analyzing AI agent behavior.
Responsibilities
- Create structured test cases simulating complex human workflows
- Define gold-standard behavior and scoring logic for agent evaluation
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure scenarios are production-ready, easy to run, and reusable
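As a rough illustration of the test-case and scoring work described above, here is a minimal sketch that stores a scenario as JSON with a gold-standard expectation and scores an agent run reconstructed from its log. The scenario schema, the tool names (`lookup_order`, `issue_refund`, `delete_account`), the weights, and the `AgentRun` / `score_run` helpers are all hypothetical. They are not a Mindrift or client format.

```python
import json
from dataclasses import dataclass

# Hypothetical scenario description; field names are illustrative only.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "instruction": "A customer asks for a refund on order 1234. Process it if eligible.",
  "gold": {
    "required_tools": ["lookup_order", "issue_refund"],
    "forbidden_tools": ["delete_account"],
    "final_answer_contains": ["refund", "1234"]
  },
  "weights": {"required_tools": 0.5, "forbidden_tools": 0.3, "final_answer": 0.2}
}
"""


@dataclass
class AgentRun:
    tool_calls: list[str]  # tool names in call order, as extracted from an agent log
    final_answer: str


def score_run(scenario: dict, run: AgentRun) -> float:
    """Weighted score in [0, 1] comparing one agent run to the scenario's gold standard."""
    gold = scenario["gold"]
    weights = scenario["weights"]
    called = set(run.tool_calls)

    # Fraction of required tools the agent actually called.
    required = gold["required_tools"]
    req_score = sum(tool in called for tool in required) / len(required)

    # All-or-nothing: any forbidden tool call zeroes this component.
    forb_score = 0.0 if called & set(gold["forbidden_tools"]) else 1.0

    # Fraction of expected key phrases present in the final answer (case-insensitive).
    answer = run.final_answer.lower()
    keys = gold["final_answer_contains"]
    ans_score = sum(key.lower() in answer for key in keys) / len(keys)

    total = (
        weights["required_tools"] * req_score
        + weights["forbidden_tools"] * forb_score
        + weights["final_answer"] * ans_score
    )
    return round(total, 3)


scenario = json.loads(SCENARIO_JSON)
print(score_run(scenario, AgentRun(["lookup_order", "issue_refund"], "Refund issued for order 1234.")))  # 1.0
print(score_run(scenario, AgentRun(["lookup_order"], "Order 1234 is not eligible.")))  # 0.65
```

Keeping the gold standard in data rather than in code is one way to make a scenario reusable and easy to rerun. A real evaluation would likely also check tool arguments and call order, which this sketch ignores.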
Requirements
- 3+ years of software development experience with strong Python focus
- Experience with Git and code repositories
- Comfortable with structured formats like JSON/YAML for scenario description
- Understanding of core LLM limitations (hallucinations, bias, context limits) and how they affect evaluation design
- Familiarity with Docker
- English proficiency - B2
Qualifications
- 3+ years of software development experience
Skills
- Python*
- Docker*
- YAML*
- JSON*
- Git*
* Required skills
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
Technology