About This Role
Contribute to well-known open-source Python repositories in support of AI benchmarking projects. Responsibilities include evaluating the performance of coding agents, assessing their outputs against technical criteria, and documenting findings.
Responsibilities
- Work with well-known, well-documented open-source Python repositories to support AI benchmarking projects
- Evaluate the performance of coding agents across open-ended software engineering tasks
- Assess agent outputs against predefined technical and quality criteria
- Apply real-world open-source development judgment to identify strengths, weaknesses, and edge cases in AI-generated code
- Document findings clearly to support model evaluation and improvement
- Collaborate asynchronously with research and engineering teams throughout the project lifecycle
Requirements
- Strong experience contributing to or maintaining open-source Python repositories
- Deep proficiency in Python
- Familiarity with common open-source development workflows
- Ability to evaluate code quality, correctness, and maintainability
- Strong analytical skills with high attention to detail
- Clear written communication for documenting evaluations and insights
- Comfortable working independently in a remote, project-based environment
Skills
- Python (required)