Principal Engineer, Search and AI Infrastructure, Machine Learning Platform & Infrastructure
Job description
We are seeking a Principal Engineer to lead the building and evolution of next-generation AI infrastructure for search and other product needs at Apple. In this role, you will shape the architecture and long-term technical strategy for large-scale inference systems that handle both internal workloads and production traffic, integrate and evolve web-scale search systems, and work at the intersection of product innovation, AI research, and large-scale distributed systems.
We design, build, and maintain infrastructure that supports features used by billions of Apple users. We take full end-to-end ownership of our services and drive them meticulously through every stage: conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. On this team, you'll have the opportunity to work on highly complex, large-scale systems with trillions of records and petabytes of data, work alongside other teams to optimize inference for cutting-edge model architectures, and build production-grade solutions that serve millions of customers in real time.
Requirements
- Bachelor's degree in Computer Science, relevant technical field, or equivalent practical experience
- Strong background in computer science: algorithms, data structures, and system design
- 15+ years of experience in large-scale distributed system design, operation, and optimization, including over 10 years of leading teams
- Experience managing work across a large organization, a demonstrated ability to develop strong leaders, and a consistent track record of executional excellence
- Excellent collaboration skills, with strength in both high-level thinking and execution, and the ability to influence and inspire others to achieve a common goal
Preferred qualifications
- Master's degree or PhD in Computer Science or a related technical field
- Experience supporting distributed training and inference workloads in production, and with ML systems performance profiling, debugging, and optimization
- Proficiency in cloud-native architectures and orchestration platforms (e.g., Kubernetes)
- Familiarity with fundamental deep learning architectures such as Transformers and encoder/decoder models
- Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
- Hands-on experience working with ML accelerators such as GPUs and TPUs