Freelance Agent Evaluation Engineer
Remote
Mindrift
Up to $80
Part Time
Mid Level
3+ years
Posted 3 weeks ago
About This Role
Mindrift is seeking a Freelance Agent Evaluation Engineer for project-based AI work with leading tech companies, focused on testing, evaluating, and improving AI systems. The role involves creating structured test cases, defining gold-standard behavior, and analyzing agent logs so that evaluation scenarios are production-ready.
Responsibilities
- Create structured test cases simulating complex human workflows
- Define gold-standard behavior and scoring logic to evaluate agent actions
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure scenarios are production-ready, easy to run, and reusable
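As a rough illustration of what a structured test case with gold-standard behavior and scoring logic might look like, here is a minimal Python sketch. The scenario fields (`gold_actions`, `forbidden_actions`), the action names, and the `score_run` function are all hypothetical and are not taken from Mindrift's actual tooling; real projects will define their own schema and scoring rules.

```python
import json

# Hypothetical scenario description; every field name here is illustrative.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "instruction": "Refund order 1234 and notify the customer.",
  "gold_actions": ["lookup_order", "issue_refund", "send_email"],
  "forbidden_actions": ["delete_order"]
}
"""


def score_run(scenario: dict, agent_actions: list[str]) -> float:
    """Score one agent run against the scenario's gold-standard actions.

    Returns the fraction of gold actions that appear in the agent's log
    in the expected order, or 0.0 if any forbidden action was taken.
    Assumes the scenario lists at least one gold action.
    """
    if any(a in scenario["forbidden_actions"] for a in agent_actions):
        return 0.0
    gold = scenario["gold_actions"]
    matched = 0
    for action in agent_actions:
        # Advance only when the agent performs the next expected step.
        if matched < len(gold) and action == gold[matched]:
            matched += 1
    return matched / len(gold)


scenario = json.loads(SCENARIO_JSON)
full_run = ["lookup_order", "issue_refund", "send_email"]
partial_run = ["lookup_order", "send_email"]  # skips the refund step
unsafe_run = ["lookup_order", "delete_order"]  # takes a forbidden action
```

Calling `score_run(scenario, full_run)` gives full credit, the partial run earns credit only for the steps completed in order, and the unsafe run scores zero. Keeping the scenario in JSON and the scoring in a small pure function is one way to keep scenarios easy to run and reuse.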
Requirements
- 3+ years of software development experience with strong Python focus
- Experience with Git and code repositories
- Comfort with structured formats such as JSON and YAML for describing scenarios
- Understanding of core LLM limitations and their effect on evaluation design
- Familiarity with Docker
- English proficiency - B2
Skills
Python *
Docker *
YAML *
JSON *
Git *
LLM *
* Required skills
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
Technology