Cloud Storage Expert
Job description
As a Senior Architect & Developer for Rook-Ceph, you will design and develop large-scale, production-grade storage platforms powering mission-critical workloads in Kubernetes and OpenStack environments. This role offers the opportunity to join the team responsible for the end-to-end architecture of Ceph (Rook) clusters, ensuring high availability, scalability, and performance across multi-tenant, high-demand systems.
The role includes solution design for object (RGW/S3), block (RBD), and file (CephFS) services, and defining best practices for data durability, replication, lifecycle management, and cost optimization (e.g., NVMe and HDD tiering strategies). You will work closely with platform, cloud, and application teams to integrate Ceph seamlessly into OpenStack and cloud-native ecosystems.
In addition, you will contribute to development and automation, including enhancing deployment pipelines, extending Rook operators, and building tooling around observability, lifecycle policies, and data management. You will troubleshoot complex distributed system issues, optimize performance at scale, and drive continuous improvements in reliability and efficiency.
This role requires deep expertise in distributed storage, Kubernetes, and Ceph internals, along with a strong ability to architect resilient systems and influence platform direction in a highly critical, large-scale environment.
Requirements
- Deep expertise in Ceph (RBD, RGW, CephFS) and Rook in large-scale, production Kubernetes environments. Strong experience designing and operating highly available, distributed storage systems for critical workloads, preferably within OpenStack ecosystems.
- Proven ability to architect scalable solutions, including data durability, replication, performance tuning, and cost optimization (e.g., tiering strategies).
- Solid development and automation skills (e.g., Go, Python, C++, CI/CD pipelines), with experience extending operators or building platform tooling.
- Strong troubleshooting skills across complex distributed systems, networking, the Linux operating system, and storage layers.
- Excellent collaboration and communication skills, with the ability to work across platform, cloud, and application teams.