Principal Platform Engineer
Job description
You will work on a modern, cloud-native data platform built on Microsoft Azure, and you'll have substantial input into how the platform evolves over time. The environment supports large-scale, business-critical workloads and is designed for both real-time and batch processing. The platform makes extensive use of Azure Databricks, Apache Spark, and Delta Lake to support high-throughput data pipelines.

Kafka and Azure Event Hubs are used to implement reliable, event-driven architectures, with schema management and stream processing forming core parts of the system. Datasets are stored and served primarily using ADLS Gen2 and Delta tables, with additional technologies such as Azure Data Explorer, Cosmos DB, or Synapse used where they are the right fit. Engineers work closely with SQL, Python, and Scala to model data, optimise query performance, and support high-QPS analytical use cases.

The platform places a strong emphasis on reliability, observability, and secure delivery. Teams use Datadog and Azure Monitor, follow OpenTelemetry standards, and manage environments using Git-based workflows and Infrastructure as Code. Automated testing and well-defined release strategies are central to how changes are built and shipped.
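To make the stack above concrete, here is a minimal sketch of the kind of Structured Streaming job that moves events into a Delta table. It assumes a Kafka-compatible source (Event Hubs exposes one), and the broker address, topic, schema, and storage paths are hypothetical placeholders, not details from this role:

```python
# Illustrative sketch only: stream events from Kafka into a Delta table.
# Broker, topic, schema, and paths below are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

# Assumed event schema; in practice this would come from a schema registry.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
    StructField("payload", StringType()),
])

# Read the raw event stream from the Kafka-compatible endpoint.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "events")                     # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON value column into typed fields.
events = raw.select(
    from_json(col("value").cast("string"), event_schema).alias("e")
).select("e.*")

# Append into a Delta table on ADLS Gen2; the checkpoint gives the
# sink exactly-once semantics across restarts.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "abfss://lake@account.dfs.core.windows.net/_chk/events")
    .outputMode("append")
    .start("abfss://lake@account.dfs.core.windows.net/bronze/events")
)
query.awaitTermination()
```

The checkpointed Delta sink is what carries the reliability emphasis described above; the same pattern scales from a single topic to the platform's high-throughput pipelines.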
Requirements
- You will bring 5-10+ years of experience building and operating highly scalable systems in production, ideally within data-intensive or analytics-heavy environments.
- You've worked hands-on with Azure and have experience operating Databricks platforms supporting both streaming and batch workloads.
- You have practical experience with Kafka or Azure Event Hubs, including designing partitions, managing consumer groups, and understanding delivery semantics (see the sketch after this list).
- You're confident working with SQL, have experience with data modelling, and know how to diagnose and improve query performance on large datasets.
- You have a proven understanding of distributed systems fundamentals, such as fault tolerance, stateful processing, and system resilience.
- You've worked with event-driven architectures and have supported systems that handle high request volumes and low-latency analytical workloads.
- You're comfortable working in at least one data-focused programming language, such as Python or Scala, and you bring a strong sense of ownership.
- You enjoy influencing technical direction, mentoring other specialists, and helping teams converge on shared standards and guidelines.

Desirable (not required, but advantageous):
- Experience with Delta Live Tables, Unity Catalog, or broader Lakehouse governance approaches.
- Exposure to additional data and streaming technologies such as Flink, Azure Data Explorer (Kusto), or Cosmos DB.
- Experience crafting serving layers, stream-out patterns, or feature stores for analytics or machine learning.
- Familiarity with SRE practices, such as defining SLOs and managing error budgets in data-intensive systems.
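As a hedged illustration of the delivery-semantics point above, the sketch below shows at-least-once consumption within a consumer group, where partitions are balanced across group members and offsets are committed only after processing. Broker, topic, and group names are placeholders, and process() is a hypothetical downstream handler:

```python
# Illustrative sketch only: at-least-once consumption with a consumer group.
# Broker, topic, and group names are assumed placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",  # placeholder
    "group.id": "analytics-loader",      # partitions are balanced across group members
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,         # commit manually, only after processing succeeds
})
consumer.subscribe(["events"])           # placeholder topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        process(msg.value())             # hypothetical downstream handler
        # Committing after processing gives at-least-once delivery:
        # a crash between process() and commit() replays the message.
        consumer.commit(msg)
finally:
    consumer.close()
```

Moving the commit before process() would flip this to at-most-once; exactly-once requires an idempotent or transactional sink, which is the trade-off the requirement asks candidates to reason about.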