Senior software engineer
Job description
We're looking for an experienced Platform Engineer to build and operate the hybrid infrastructure foundation for our advanced AI/ML research and product development. You'll architect, build, and run our platforms spanning AWS and on-premise deployments, empowering our teams to train and deploy complex models at scale. This role is focused on creating a robust, self-service environment using Kubernetes, AWS, and Infrastructure-as-Code (Terraform), and orchestrating high-demand GPU workloads.
- Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications, research initiatives, and Sanas services.
- Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
- Lead comprehensive observability across the organization (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
- Design and build low-latency, scalable, and reliable infrastructure that serves model inference and training for our cutting-edge speech AI models.
- Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate and support their development cycle.
- You'll have significant autonomy to shape our product infrastructure and directly impact how cutting-edge AI is applied across various devices and applications in speech.
Requirements
- 5+ years of Software Engineering experience, preferably in Platform Engineering or Site Reliability.
- Strong software engineering fundamentals with a focus on writing clean, maintainable code.
- Strong proficiency in shell scripting (Bash), Python, or Rust.
- Experience building large-scale distributed systems with high demands on model inference, performance, reliability, and observability.
- Experience with high-performance compute (HPC) schedulers, capacity planning, and containerized deployments, and familiarity with managing GPU-intensive AI workloads.
- Strong communication skills with the ability to own large-scope projects by working cross-functionally across Engineering, AI, Product, Research, and Business stakeholders.
- Experience working with AWS (preferred), GCP, or Azure, and with Kubernetes (e.g., EKS).
- Deep curiosity about the state of agentic coding tools and how to optimize agent-assisted workflows.
Bonus
- Familiarity with real-time streaming protocols such as WebTransport and SIP/SRTP.
- Bachelor's degree in Computer Science or a related field.