Freelance Agent Evaluation Engineer

Remote

Mindrift Up to $80

Part Time Mid Level 3+ years

Posted 3 months ago Expired

About This Role

Mindrift is seeking a Freelance Agent Evaluation Engineer to join project-based AI opportunities for leading tech companies, focusing on testing, evaluating, and improving AI systems. The role involves creating structured test cases, defining gold-standard behavior, and analyzing agent logs so that evaluation scenarios are production-ready.

Responsibilities

Create structured test cases simulating complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate scenarios
Iterate on prompts, instructions, and test cases to improve clarity and difficulty
Ensure scenarios are production-ready, easy to run, and reusable

Requirements

3+ years of software development experience with strong Python focus
Experience with Git and code repositories
Comfortable with structured formats like JSON/YAML for scenario description
Understanding of core LLM limitations and their effect on evaluation design
Familiarity with Docker
English proficiency - B2

Skills

Python * Docker * YAML * JSON * Git * LLM *

* Required skills

About Mindrift

Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
