Principal Software Engineer: Cloud Platform
Job description
As a Principal Software Engineer at Radiant, you'll be a hands-on technical leader, driving the design and development of our most critical systems, including our GPU orchestration layer, inference-serving infrastructure, and core platform APIs.
You'll set the standard for engineering excellence across the team, working closely with other principal engineers to align on technical strategy. You'll partner with peers in infrastructure and machine learning to deliver high-performance AI systems, and mentor engineers across multiple squads. While you won't have direct reports, you'll play a pivotal role in shaping engineering direction, design decisions, and cross-team collaboration.
What You'll Do:
- Design and develop core services for the Radiant AI Cloud
- Architect and build core backend services in Go that power AI infrastructure, networking, inference orchestration, and model lifecycle management
- Lead engineering design efforts across the backend domain, including architecture reviews and system evolution strategies
- Collaborate closely with platform and infrastructure teams to deliver reliable, secure, production-grade services
- Drive technical standards and mentorship, upholding code quality, observability, and maintainability across teams
- Proactively identify and resolve architectural bottlenecks, performance issues, or scaling challenges
Requirements
- Proven expertise in Golang and cloud-native backend development
- Deep experience designing and deploying distributed systems and APIs (gRPC, REST) in production environments
- Strong knowledge of Kubernetes internals, with a deep understanding of primitives, deployments, and reconciliation loops
- Experience with PostgreSQL or equivalent high-performance data stores
- Experience with CI/CD workflows, observability, security, and automation best practices
- Demonstrated ability to lead technically: mentor others, drive architectural thinking, and influence across teams
- A bias for action, accountability, and leading by example
Preferred Skills (Nice to Have)
- Experience with AI/ML infrastructure tooling (e.g. vLLM, KServe, Ray, Triton)
- Familiarity with Python, especially ML libraries and model interfaces
- Familiarity with GPU scheduling, inference pipelines, or resource-constrained workload orchestration
- Exposure to GPU orchestration frameworks or building services for model training/inference
- Understanding of multi-tenant systems, isolation strategies, and secure infrastructure for AI workloads
How You Work:
- You approach problems with a systems mindset - balancing practical execution with long-term scalability
- You elevate the team, setting high standards for technical quality and engineering excellence
- You hold yourself and others accountable - giving direct feedback and expecting the same
- You take initiative, owning challenges end-to-end and proactively driving solutions
- You invest in others, mentoring to build both capability and confidence
- You communicate clearly - translating complexity into clarity across engineering and business audiences