AI Systems Engineer, Codex Agents

OpenAI Inc.
San Francisco, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$230K – $385K

Job location

San Francisco, United States of America

Tech stack

API
Software Debugging
Programming Tools
Distributed Systems
Python
Open Source Technology
Virtualization Technology
Graphics Processing Unit (GPU)
Large Language Models
Backend
Build Management
Machine Learning Operations

Job description

The Codex Core Agents team builds the agent harness that turns model capability into real-world action. We own the systems around the model: prompting and interpreting model outputs, executing actions safely in real environments, and feeding production experience back into better models and better agent behavior. This team sits close to research and works across the stack: harness, model interaction, inference, sandboxed execution, orchestration, evals, production reliability, and the performance envelope around tokens, latency, cost, capacity, and quality. The harness is open source and increasingly part of how models are trained and evaluated, making this one of the highest-leverage layers in Codex.

About The Role

We're looking for engineers to build the AI systems that make Codex agents dependable in production. The ideal candidate is an agent-systems builder: hands-on across low-level systems and ML workflows, and able to debug Codex behavior end to end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface.

You'll work with research, infrastructure, and product to design agent harness capabilities, run experiments and ablations across the model + system prompt + harness stack, build frameworks for assessing production agent performance, and turn messy failures into durable improvements.

What You'll Do

  • Design and build the core agent harness and execution loop that lets Codex agents interpret model outputs, use tools, execute code, and complete long-horizon tasks safely.

  • Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents operating in real development environments.

  • Develop evaluation, experimentation, and debugging systems that distinguish harness issues, model behavior, inference/runtime issues, and product failures.

  • Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost.

  • Improve observability, profiling, and diagnostics across the agent stack, from backend systems to inference, GPUs, and fleet capacity.

  • Work closely with research to make the harness trainable, measurable, and useful for improving frontier agentic models.

  • Build shared primitives that make Codex faster, safer, more reliable, and easier for other teams and open-source users to build on.

You Might Be A Good Fit If You

  • Have built or operated production systems in distributed systems, infrastructure, developer tooling, sandboxing, virtualization, cloud platforms, or ML systems.

  • Enjoy working across layers: Rust systems code, Python configuration layers, APIs, agent orchestration, evals, logs/traces, inference behavior, runtime constraints, and user outcomes.

Requirements

  • Have hands-on experience with LLM applications, coding agents, evals, model deployment, inference, compiler/runtime performance, or developer platforms.
  • Care deeply about reliability, safety, performance, debuggability, and clean abstractions.
  • Can debug from evidence and move quickly from ambiguous production failures to practical, durable fixes.
  • Want to work close to research while still shipping changes to production.
  • Still write meaningful code, show strong ownership, and can lead scoped or multi-team AI systems work.

Bonus Points

  • Deep Rust, systems, sandboxing, isolation, or low-level platform experience.
  • Experience with coding agents, agent harnesses, tool-using LLM systems, model evals, or post-training feedback loops.
  • Background in compilers, kernels, runtimes, inference optimization, GPU systems, benchmarking, profiling, or performance engineering.
  • Experience building production infrastructure used by many engineers or users under demanding reliability and security constraints.
  • Open-source infrastructure or developer-platform work with strong taste for APIs and usability.

Benefits & conditions

Compensation: $230K – $385K
Benefits: medical insurance, dental insurance, vision insurance, parental leave, paid time off, paid holidays, 401(k) retirement plan
Location: San Francisco, California, United States
May 15, 2026

About the company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
