Site Reliability Engineer, Apple Data Platform

Apple Inc.
Austin, United States of America
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Austin, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Cloud Computing
Configuration Management
Data Infrastructure
Linux
Disaster Recovery
Distributed Data Store
Distributed Systems
Hadoop
Hadoop Distributed File System
Hive
Python
Network Protocols
Open Source Technology
Performance Tuning
Reliability Engineering
Software Engineering
Virtualization Technology
Data Processing
Cloud Platform System
Performance Testing
System Availability
Siri
Kubernetes
Information Technology
Druid
Apache Flink
Bare Metal
Data Analytics

Job description

People at Apple don't just build products - they craft the kinds of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.

Apple Services Engineering (ASE) is responsible for designing and maintaining the systems, platforms, and infrastructure that support Apple's global services, such as Apple Music, iCloud, Siri, Maps, and many more. Our work forms the foundation upon which our world-class software developers build the products our customers love. We are seeking innovative and dedicated Site Reliability Engineers to help us sustain our mission of providing the highest quality experience for our customers. ASE services must scale globally and remain highly available and consistently performant. If you are passionate about designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

Apple Services infrastructure is planetary scale. Our Data Platform Site Reliability Engineering team manages the infrastructure and applications on bare-metal and cloud computing platforms to deliver data processing, governance, and storage for many of Apple's global products and organizations. Our platform teams work with exabytes of data, terabytes of memory, and hundreds of thousands of jobs running millions of executors to support predictable and performant data analytics. Our platform enables key features in Apple Music, TV, Maps, News, and other world-class products. Ensuring that all of these technologies in geographically distributed data centers work together in harmony presents unique challenges.

Responsibilities

You'll need to solve problems that arise using empirical data, teamwork, and your own unique expertise.

Data Platform Services SREs work directly with our partner engineering teams, tightly collaborating with the software developers to deliver seamless experiences for our customers.

We run a mix of open-source, vendor-licensed, and proprietary tools, which you will use and have opportunities to improve.

The cross-functional team collaborates to ensure we apply a consistent incident management process across all data platform services and provide user-journey-based SLOs derived from exhaustive observability metrics, high-availability architecture, and automation for deployments.

Requirements

Do you have experience in virtualization?

Do you have a Bachelor's degree?

Proficiency with the architecture, deployment, performance tuning, and troubleshooting of open-source data analytics or governance technologies such as Flink, Hive, Hadoop/HDFS, Trino, and/or Druid.

Proficiency in managing applications and infrastructure on AWS, GCP, and Ali Cloud.

The successful candidate is frustrated with toil and has an acute drive to both automate manual operations and evolve them into automatic processes.

Minimum Qualifications

BS/MS in Computer Science or equivalent

5+ years of software development or production operations experience in a large-scale environment

Proficiency in authoring and releasing code in Go, Python, or Java using common configuration management and software delivery platforms

Experience operating production applications at scale, including well-designed performance testing, HA and disaster recovery concepts, capacity planning, and managing distributed systems on internal and public cloud infrastructure, principally Kubernetes

Understanding of the Linux Operating System, containers and virtualization, standard networking protocols, and components

Strong sense of ownership and integrity demonstrated through clear communication and collaboration

Demonstrates excellent troubleshooting and problem-solving skills using the scientific method
