Freelance Agent Evaluation Engineer

Remote
Part Time, Mid Level, 5+ years

Posted 1 week ago

About This Role

This role involves creating challenging coding test cases to evaluate and improve AI coding systems for leading tech companies. The Freelance Agent Evaluation Engineer will review and refine coding tasks and write comprehensive functional tests that expose the AI's strengths and weaknesses.

Responsibilities

  • Review and refine realistic coding tasks based on provided production codebases
  • Write comprehensive functional tests that validate actual end-to-end behavior and edge cases
  • Craft "fair but hard" challenges where the AI has all the context it needs but requires complex reasoning
  • Analyze AI failures to understand where the model struggles and where it succeeds
  • Iterate based on feedback from expert QA reviewers who score work on quality criteria

Requirements

  • Degree in Computer Science, Software Engineering, or a related field
  • 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with equal focus on React-based interfaces and robust back-end systems
  • Experience writing tests (functional, integration)
  • Experience with Docker containers (running evaluations locally)
  • CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
  • English proficiency - B2

Skills

Python * Docker * React * CI/CD * GitHub Actions * pytest *

* Required skills

About Mindrift

Mindrift connects specialists with AI projects from major tech innovators, unlocking the potential of Generative AI by tapping into real-world expertise from across the globe.
