AI Evaluation Lead (JetBrains AI)

JetBrains

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Tech stack

Artificial Intelligence
Data analysis
Programming Tools
Python
Machine Learning
Open Source Technology
Large Language Models
Kotlin
Crowdsourcing

Job description

At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the world's most robust and effective developer tools. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.

The JetBrains AI team is focused on bringing advanced AI capabilities to JetBrains products, which includes supporting the internal AI platform used across JetBrains and conducting long-term R&D in AI and machine learning. We collaborate closely with product teams to brainstorm and prioritize AI-driven features, as well as support product marketing and release planning. Our team includes about 50 people working on everything from classical ML algorithms and code completion to agents, retrieval-augmented generation, and more.

We're looking to strengthen our team with an AI Evaluation Lead who will help define and execute our strategy for evaluating AI-powered features and LLMs. In this role, you will be instrumental in ensuring our models deliver meaningful value to users by shaping evaluation pipelines, influencing model development, collaborating with product and research teams across the company, and publishing your work as open source.

We value engineers who:

  • Plan their work and make decisions independently, consulting with others if needed.
  • Follow the latest advances in AI and ML, think long-term, and take ownership of their scope of work.
  • Prefer simplicity, opting for sound, robust, and efficient solutions.

In this role, you will:

  • Design and develop rigorous offline and online evaluation benchmarks for AI features and LLMs.
  • Manage the team, prioritize tasks, and mentor teammates.
  • Define evaluation methodology and benchmarks for our open-source models and public releases.
  • Communicate your findings and best practices across the organization.

Requirements

  • Expertise in evaluating generative AI methods.
  • A strong understanding of statistics and data analysis.
  • Excellent management and communication skills.
  • Solid practical experience with Python and evaluation frameworks.
  • Attention to detail in everything you do.

We'd be especially thrilled if you have experience with:

  • Preparing public evaluation reports for feature or model releases.
  • Managing data annotation efforts, including crowdsourcing and in-house labeling.
  • CI systems, workflow automation, and experiment tracking.
  • The Kotlin programming language.
