Staff Machine Learning Engineer, Apple Services Engineering

Apple Inc.

Seattle, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Seattle, United States of America

Tech stack

Podcasting

Software Quality

ControlNet

Software Debugging

Python

Machine Learning

Software Deployment

Software Engineering

PyTorch

Large Language Models

Deep Learning

Generative AI

Information Technology

Machine Learning Operations

Stable Diffusion

Job description

The Staff Machine Learning Engineer - Multimodal Generation & Post-Training will be a senior individual contributor on a small, applied ML team focused on production multimodal systems. The role will lead fine-tuning and adaptation of diffusion and emerging video models, as well as post-training of small and medium LLMs for captioning, moderation, and retrieval-friendly descriptions.

The engineer will design data and evaluation workflows that use our large archive of weakly labeled music, podcast, film, TV, and short-form content to drive measurable quality and efficiency improvements. The role includes close collaboration with partner infra teams for model serving and with adjacent product and research groups to bring new capabilities into production.

Requirements

Master's degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent practical experience.
5+ years of hands-on industry experience building and shipping machine learning systems to production.
Proven experience training and fine-tuning diffusion or other image/video generative models, including adapter-based methods such as LoRA.
Proficiency in Python and at least one major deep learning framework such as PyTorch.
Experience designing and operating ML pipelines for noisy or weakly labeled data, including offline evaluation and monitoring in production.
Strong software engineering skills, including code quality, experimentation discipline, and debugging/profiling of model performance.

PhD in Computer Science, Machine Learning, or a related technical field.
8+ years of industry experience with production multimodal systems spanning image, audio, and/or video.
Deep expertise with diffusion and video generation techniques (e.g., ControlNet/IP-Adapter, temporal consistency methods, sampling and latency optimization).
Experience with PEFT/QLoRA and post-training approaches such as DPO or related preference-based methods for small and mid-sized LLMs.
Background in ASR/VAD/diarization, OCR, multimodal retrieval, or face recognition with fine-grained temporal alignment.
Experience collaborating with infra/platform teams on model serving (e.g., batching strategies, quantization, observability) and translating requirements into reliable production deployments.
Demonstrated ability to define metrics, build evaluation harnesses, and communicate results clearly to cross-functional partners.
Track record of publications, patents, or open-source contributions in relevant areas of machine learning or multimodal modeling.