Staff Platform Engineer - Remote (EMEA)
Job description
As a Staff Platform Engineer, you are the owner of the mission-critical cloud infrastructure that powers the WunderGraph Cosmo platform for our enterprise customers. Your primary responsibility is ensuring the reliability, performance, and scalability of this core platform by defining and meeting stringent SLOs. This role blends deep operational leadership with product-focused infrastructure engineering in Go. You will not only architect our internal systems for scale but also build and operate key product infrastructure, including our customer-facing telemetry pipeline (built on OpenTelemetry and ClickHouse) and the AI pipeline that empowers our products. We are looking for a hands-on technical leader, driven by the challenge of solving ambiguous, 'eBay-scale' problems, whether that means diagnosing complex issues in a distributed system, tuning network performance, or optimizing our infrastructure for cost-efficiency through advanced automation.
- TEAM INTEGRATION
- You align with the Head of Engineering.
- You collaborate closely with the engineering team and customers.
- ROLE OBJECTIVES: You are successful if you:
- Enable our engineering teams to ship features for WunderGraph Cosmo fast, reliably, and with confidence through a world-class Internal Developer Platform (IDP).
- Take full ownership of our core platform infrastructure and services, from architecture to operation.
- Drive the architectural vision for our platform, making key decisions on technologies like Kubernetes, Infrastructure as Code, and our observability stack.
- Bring deep platform expertise to the table, leveling up the entire team through mentorship, architectural guidance, and by championing best practices.
- Grow with WunderGraph as we scale, expanding your influence across the product and organization while helping us build a world-class engineering team.
- ROLE TASKS: The role focuses on, but is not limited to:
- Architecting, building, and operating the core cloud-native infrastructure for WunderGraph Cosmo and Hub, primarily using Go and Kubernetes.
- Owning and evolving our observability stack (OpenTelemetry, Prometheus, ClickHouse) and the infrastructure supporting our AI-driven features to ensure deep, actionable insights into our systems.
- Building and optimizing CI/CD pipelines to improve build times, automate quality and security gates, and create a seamless path to production for our engineers.
- Championing and implementing Infrastructure as Code (IaC) best practices using tools like Terraform, building reusable and maintainable modules for our teams.
- Embedding security best practices into the platform by designing and implementing network policies, RBAC, and automated checks to meet enterprise and SOC 2 compliance standards.
- Mentoring other engineers, providing insightful code and design reviews, and documenting platform features and architectural decisions to foster a culture of collaboration and knowledge sharing.
- WunderGraph's engineering teams are highly productive, shipping features faster and with more confidence because the internal platform you've built is reliable, self-service, and provides an exceptional developer experience (DX).
- Our platform infrastructure scales seamlessly and reliably to meet the demands of our largest enterprise customers, like eBay, solidifying Cosmo's reputation for performance and stability.
- You are recognized as a key technical leader and architect within the engineering organization, sought out for your expertise and guidance on our most complex infrastructure challenges.
- Your architectural decisions and mentorship have measurably improved the team's skills, our system's reliability, and our overall engineering culture, embodying our value of "Engineering Excellence".
- You continue to grow with us, expanding your influence across the product and organization and helping to define the future of platform engineering at WunderGraph.
Requirements
- Proven experience architecting and operating scalable, highly available, and secure cloud-native platforms in production, with strong proficiency in Go and deep expertise in Kubernetes.
- You thrive in the dynamic environment of a scaling, remote-first company that has successfully navigated strategic pivots and is on a rapid growth trajectory.
- Deep expertise in a major cloud provider (AWS, GCP, Azure) and Infrastructure as Code tools (e.g., Terraform, Pulumi).
- A strong understanding of system architecture, distributed systems, and the challenges of running high-performance API gateways. Familiarity with GraphQL Federation is a significant plus.
- Experience building or managing modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, ClickHouse).
- A self-starter attitude and a leader's mindset: you are comfortable with ambiguity, can identify and solve ill-defined problems, and don't need hand-holding.
- Excellent written and verbal communication skills, with the ability to articulate complex technical concepts clearly in design documents, RFCs, and asynchronous discussions.
Benefits & conditions
- Work from wherever you thrive: we're fully remote and globally distributed. We provide co-working space options worldwide if needed.
- Pick your preferred work hardware
- We focus on getting stuff done, and on having fun whilst doing so: work hard, play hard!
- You can make a real difference and find lots of opportunities to grow together with us
- Discretionary PTO: take the time you need to recharge
- Competitive compensation
- Depending on location, we offer healthcare benefits according to local standards
- Team retreats across the globe.
Note: This is a full-time, fully remote position. We are looking for someone who is available to work during European (CET) business hours.