Site Reliability Engineer, Apple Data Platform

Apple Inc.

Austin, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Austin, United States of America

Tech stack

Java

Amazon Web Services (AWS)

Cloud Computing

Configuration Management

Data Infrastructure

Linux

Disaster Recovery

Distributed Data Store

Distributed Systems

Hadoop

Hadoop Distributed File System

Hive

Python

Network Protocols

Open Source Technology

Performance Tuning

Reliability Engineering

Software Engineering

Virtualization Technology

Data Processing

Cloud Platform System

Performance Testing

System Availability

Siri

Kubernetes

Information Technology

Druid

Apache Flink

Bare Metal

Data Analytics

Job description

People at Apple don't just build products - they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.

Apple Services Engineering (ASE) is responsible for designing and maintaining the systems, platforms, and infrastructure that support Apple's global services, such as Apple Music, iCloud, Siri, Maps, and many more. Our work forms the foundation upon which our world-class software developers build the products our customers love. We are seeking innovative and dedicated Site Reliability Engineers to help us sustain our mission of providing the highest quality experience for our customers. ASE services must scale globally and remain highly available and consistently performant. If you are passionate about designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

Apple Services infrastructure is planetary scale. Our Data Platform Site Reliability Engineering team manages the infrastructure and applications on bare-metal and cloud computing platforms to deliver data processing, governance, and storage for many of Apple's global products and organizations. Our platform teams work with exabytes of data, terabytes of memory, and hundreds of thousands of jobs running millions of executors to support predictable and performant data analytics. Our platform enables key features in Apple Music, TV, Maps, News, and other world-class products. Ensuring all of these technologies in geographically distributed data centers work together in harmony presents unique challenges.

You'll need to solve problems that arise using empirical data, teamwork, and your own unique expertise.

Data Platform Services SREs work directly with our partner engineering teams, tightly collaborating with the software developers to deliver seamless experiences for our customers.

We run a mix of open source, vendor licensed, and proprietary tools which you will use and have opportunities to improve upon.

The cross-functional team collaborates to ensure we apply a consistent incident management process across all data platform services and provide user-journey-based SLOs derived from exhaustive observability metrics, high-availability architecture, and automation for deployments.

Requirements

Do you have experience in virtualization?

Do you have a Bachelor's degree?

Proficiency with the architecture, deployment, performance tuning, and troubleshooting of open source data analytics or governance technologies such as Flink, Hive, Hadoop/HDFS, Trino, and/or Druid.

Proficiency in managing applications and infrastructure on AWS, GCP, and Ali Cloud.

The successful candidate is frustrated with toil and has an acute drive to both automate manual operations and evolve them into automatic processes.

Minimum Qualifications

BS/MS in Computer Science or equivalent

5+ years of software development or production operations experience in a large-scale environment

Proficiency in authoring and releasing code in Go, Python, or Java using common configuration management and software delivery platforms

Experience operating production applications at scale, including well-designed performance testing, HA and disaster recovery concepts, capacity planning, and managing distributed systems on internal and public cloud infrastructure, principally Kubernetes

Understanding of the Linux Operating System, containers and virtualization, standard networking protocols, and components

Strong sense of ownership and integrity demonstrated through clear communication and collaboration

Demonstrates excellent troubleshooting and problem solving skills using the scientific method
