Senior Software Engineer (Platform Data Reliability & Automation)
Job description
This role focuses on the reliability and automation of NoSQL, Streaming, and Caching services across AWS and GCP environments. You'll design robust automation frameworks, ensure high availability, and partner with product and platform teams to deliver resilient, highly available infrastructure supporting billions of transactions and millions of players globally.
By embracing Development & DBRE principles, driving automation-first practices, and applying AI/ML where applicable, you'll enhance system uptime, reduce manual toil, and enable velocity for engineering teams across PlayStation.
You'll work closely with platform and product teams to ensure seamless integration and delivery of high-performance, scalable solutions across PlayStation's global ecosystem. Your contributions will directly support the reliability, scalability, and operational excellence of our data platform powering millions of players worldwide.
- Design and implement Infrastructure as Code (IaC) and automate the provisioning, monitoring, scaling, and lifecycle management of NoSQL, Streaming, and Caching platforms (e.g., Cassandra, Aerospike, Kafka, Redis).
- Drive end-to-end automation to enable repeatable, reliable, and self-service deployment of data services across cloud and hybrid environments.
- Ensure high availability, scalability, and resiliency of the platform data solutions.
- Define and enforce SLIs, SLOs, and error budgets for data platforms to drive reliability engineering practices.
- Build highly performant, self-healing systems, automated failover, and auto scaling solutions for databases and streaming platforms.
- Develop observability solutions (metrics, logging, tracing) for Cassandra, Aerospike, Redis, and Kafka/MSK to ensure proactive issue detection.
- Partner with engineering and platform teams to provide reliable, scalable, and performant data services.
- Lead incident response for critical database/caching/streaming issues and drive root cause analysis with permanent automated fixes.
- Explore and apply AI-driven approaches to automation (e.g., anomaly detection, predictive scaling, automated remediation) to enhance operational efficiency.
- Drive and implement best practices, procedures, and operational playbooks to facilitate knowledge sharing and support continuous improvement across global teams.
- Mentor junior engineers and influence best practices in automation, distributed systems, and database reliability.
Requirements
- Bachelor's or Master's degree in Computer Science or a related field
- 6+ years of software development and DBRE experience, including at least 3 years focused on Go and Infrastructure as Code with an emphasis on automation.
- Deep proficiency in Go (Golang), with the ability to write performant, idiomatic, and maintainable code for production-scale systems
- Proven experience designing modular, domain-driven architectures in Go, supporting large and complex backend services
- Expertise with infrastructure-as-code tools such as Terraform and Ansible.
- Deep expertise operating large-scale NoSQL, caching, and streaming platforms (Apache Kafka, Redis, AWS MSK, etc.), including tuning, compaction strategies, repair operations, backup/recovery, and performance optimization.
- Solid understanding of Linux internals, networking, and storage systems.
- Experience building, deploying and operating stateful workloads on Kubernetes, including automation and lifecycle management of database and streaming platforms.
- Hands-on experience with AWS and/or GCP, including managed services such as MSK, DynamoDB, ElastiCache, or equivalent technologies.
- Strong problem-solving and analytical skills, with a passion for automation and distributed systems reliability.
- Excellent communication and collaboration skills, with experience mentoring and influencing peers across diverse teams
- Experience building internal developer platforms, self-service infrastructure, or platform engineering solutions that improve developer productivity and operational efficiency.
- Prior use of Go for infrastructure automation, control plane services, or SRE-focused tooling is a plus
- Experience leveraging AI/ML and Generative AI technologies to improve infrastructure automation, operational workflows, incident management, observability, or developer productivity is a huge plus.
- Certification in relevant technologies (e.g., AWS Certified Database - Specialty) is a plus
Benefits & conditions
Please note that the base pay range may vary in line with our hybrid working policy, and individual base pay will be determined based on job-related factors, which may include knowledge, skills, experience, and location. In addition, this role is eligible for SIE's top-tier benefits package that includes medical, dental, vision, matching 401(k), paid time off, wellness program, and coveted employee discounts for Sony products. This role also may be eligible for a bonus package. Click here to learn more. The estimated base pay range for this role is $177,300 - $265,900 USD.
Please note, Sony Interactive Entertainment conducts background checks at the offer stage for all new employees (which may include criminal background checks for some roles) and will need to process personal information to support these checks.