Senior Software Engineer
Job description
Nscale is looking for a Senior Software Engineer to build and scale the control/data plane systems and application services that power our GenAI cloud. You'll work alongside domain experts and experienced engineers across our infrastructure, platform, and product teams to build the foundational systems that enable thousands of AI workloads to run reliably at scale. This is a high-impact role where you'll have significant ownership and the opportunity to shape how we build and operate critical platform services.
What You'll Be Doing
- Build and own the control plane and data plane services that power our cloud platform. You'll contribute to APIs and SDKs for platform consumption, implement reliable distributed state management and storage systems, and create services that coordinate workload scheduling and orchestration across multiple regions.
- Engineer the infrastructure for processing and managing high-throughput workloads and distributed data flows. You'll solve complex challenges around data capture, storage, and accessibility for AI/ML training and inference.
- Drive technical decisions for your systems and champion engineering best practices across the team. You will uphold high standards for reliability, testing, monitoring, and CI/CD in a fast-paced, research-driven environment, and provide technical mentorship to engineers on your team.
- Own the operational health of your systems in production. You'll implement observability, respond to incidents, optimise performance, and continuously improve reliability based on production feedback and metrics.
- You'll have the opportunity to develop entirely new platform services and methods, leveraging cloud-native technologies and AI to create novel platform and product capabilities.
Requirements
- You have extensive hands-on experience designing, building, and operating scalable production systems on or for a major cloud provider (e.g., AWS, GCP), including data-intensive distributed workflows, backend services, and APIs.
- You use AI tools like Claude, Cursor, or similar as a core part of your development workflow - not as a novelty, but as a fundamental multiplier of what you can build. Whether you're already using AI to rapidly prototype complex distributed systems, explore unfamiliar codebases, and architect solutions across new domains, or you're excited to push your AI-assisted development skills to that level, you understand the potential and are committed to mastering how to effectively collaborate with AI while maintaining high code quality and architectural coherence.
- You believe in using the right tool for the job and have strong proficiency with typed languages. Our primary stack is built with Go, with some services in Rust and Python. You're comfortable working across different languages and applying various technical approaches to find the best solution.
- You have taken multi-service distributed systems from ambiguous requirements to widely adopted production systems, with hands-on experience in day-2 operations including monitoring, alerting, incident response, and performance optimisation at scale.
- You thrive in an ambiguous, fast-paced environment where you are given high levels of agency and ownership. You are a pragmatic problem-solver who is biased towards action and impact.
Nice to Have
- Experience designing developer-friendly APIs, SDKs, or platform services that customers and other teams depend on
- Experience with Kubernetes, infrastructure-as-code (Terraform, Pulumi), event-driven architectures, and message queues (NATS, Kafka, RabbitMQ)
- Experience with GPU orchestration and workload scheduling for AI/ML inference and training workloads
- Contributions to open-source projects in the cloud-native ecosystem
- Comfort contributing across the stack, including frontend, to accelerate delivery when needed
Benefits & conditions
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup. This is your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
- Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you wherever you work.