Full Stack Software Engineer - ML Compute Capacity

Apple Inc.

Santa Clara, United States of America

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Santa Clara, United States of America

Tech stack

API

Cloud Storage

Information Engineering

Software Debugging

Distributed Systems

Elasticsearch

Monitoring of Systems

Python

PostgreSQL

Machine Learning

Prometheus

Web Application Frameworks

React

Grafana

Backend

Kubernetes

Optimization Algorithms

REST

Data Pipelines

Job description

As a senior engineer on the ML Compute Capacity team, you will design, build, and operate the production systems that ensure compute resources are optimally distributed throughout the company. You'll work across the stack - from data pipelines and backend services to APIs and interactive frontends - developing telemetry systems, optimization algorithms, policies, and intuitive tools for managing demand and improving efficiency across Apple's largest accelerator fleet. Our small, nimble team works in a high-autonomy, fast-paced environment, and we're passionate about digging into data patterns, laying out the performance characteristics of an entire distributed system, and knowledge sharing. If the opportunity to own and operate services that scale, stay highly available, and "just work" excites you, then please reach out to us!

Requirements

5+ years of experience in relevant areas
Proficiency in Python for production backend and data engineering work
Experience building data pipelines and crafting robust queries over large-scale, multi-source data (e.g., Trino, PostgreSQL, Elasticsearch)
Experience designing and building RESTful APIs and working with cloud storage technologies
Experience with modern web frameworks like React
Experience with observability tools (e.g., Prometheus, Grafana) or equivalent monitoring systems
Excellent problem-framing and problem-solving skills
Strong CS fundamentals
Bachelor's degree or higher in Engineering, Mathematics, Economics, or a related quantitative field
Experience operating Kubernetes at production scale - including scheduling, resource management, and cluster debugging
Familiarity with accelerator utilization patterns across ML training and inference
Strong interest in capacity planning, cost attribution, or FinOps systems

About the company

Scaling machine learning workloads across thousands of accelerators creates challenges that few engineers ever encounter. In Apple's Machine Learning Platform Technologies organization, we build the infrastructure that powers large-scale ML training and inference workloads, bringing together expertise in distributed systems, machine learning infrastructure, and high-performance computing.
