Freelance Agent Evaluation Engineer

Remote
Part Time | Mid Level | 3+ years

Posted 3 weeks ago

About This Role

Join Mindrift as a Freelance Agent Evaluation Engineer and work on project-based engagements for leading tech companies, testing, evaluating, and improving AI systems. The role involves creating structured test cases and analyzing AI agent behavior.

Responsibilities

  • Create structured test cases simulating complex human workflows
  • Define gold-standard behavior and scoring logic for agent evaluation
  • Analyze agent logs, failure modes, and decision paths
  • Work with code repositories and test frameworks to validate scenarios
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty
  • Ensure scenarios are production-ready, easy to run, and reusable

Requirements

  • 3+ years of software development experience with strong Python focus
  • Experience with Git and code repositories
  • Comfortable with structured formats like JSON/YAML for scenario description
  • Understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
  • Familiarity with Docker
  • English proficiency - B2

Qualifications

  • 3+ years of software development experience

Skills

Python*, Docker*, YAML*, JSON*, Git*

* Required skills

About Mindrift

Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
