Senior Site Reliability Engineer
Job description
Apple is looking for a Senior Engineer with systems and software engineering experience to join our Satellite Communications Group SRE team. The SRE team builds, monitors, and maintains large-scale, highly resilient systems that enable our customers to access communications services via satellite. You'll contribute to distributed systems, architecture design, and cloud infrastructure (as code!) for critical and unique customer-facing Apple services. This is a rare opportunity to build and control the entire end-to-end infrastructure from the beginning, along with all supporting components such as provisioning, monitoring, deployment, and software platforms, within a team with a no-ops culture.
Requirements
Do you have experience in Terraform?
Deep understanding of distributed systems principles, including consistency, fault tolerance, and scalability
Strong familiarity with consensus algorithms (e.g., Raft, Paxos, Zab)
Experience building and operating multi-cluster, highly available services
Experience with Temporal/Cadence/Windmill or other durable execution platforms
Understanding of zero-trust application architecture
Proven experience building and optimizing real-time and batch data processing pipelines using technologies such as Kafka, Spark, Flink, and Beam
Kubernetes experience, including cluster management as well as application deployment and configuration
Experience with IoT/Edge device compute and infrastructure
Experience or interest in RF, cellular, and satellite communications (Bluetooth, GPS, WiFi, LTE/5G)
Experience with modern web-scale services, including servers, VIPs, load balancers, and proxies
Experience working with monitoring and metrics platforms like Splunk and Prometheus
Education: Engineering or technical BS is a positive but not required
Significant software engineering or SRE/DevOps experience
Strong experience with large-scale distributed systems (replication, high availability, data processing/streaming)
Strong experience with Linux/UNIX administration, configuration, and monitoring
Proficient in at least one of these languages: Python, Go, Rust, C++
Have written or contributed to a batch or real-time processing system
Experience with cloud environments (AWS, GCP, Azure): identity and credential management, pub/sub, message queuing
Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Puppet, Flux, Ansible)
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
Experience with zero-downtime deployments, job scheduling systems, and event-based messaging systems
Able to quickly learn and adapt to new technologies
Strong operational and troubleshooting skills