About This Role
This freelance, project-based role involves evaluating and improving AI systems for leading tech companies by creating structured test cases, defining gold-standard behavior, and analyzing agent performance.
Responsibilities
- Create structured test cases simulating complex human workflows
- Define gold-standard behavior and scoring logic for agent actions
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure scenarios are production-ready, easy to run, and reusable
Requirements
- 3+ years of software development experience
- Strong Python focus
- Experience with Git and code repositories
- Comfortable with structured formats like JSON/YAML for scenario description
- Understanding of core LLM limitations (hallucinations, bias, context limits)
- English proficiency at B2 level
Qualifications
- 3+ years of software development experience with a strong Python focus
Nice to Have
- Familiarity with Docker
Skills
- Python *
- Docker *
- YAML *
- JSON *
- Git *
* Required skills
About Mindrift
Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.