Principal Data Engineer
Job description
The role will be involved in all aspects of the software delivery lifecycle, including the creation and elaboration of business requirements, functional and technical design specifications, development and maintenance of our software (including prototyping), and driving innovation into our product suite. You will be responsible for ensuring that the development and maintenance of IDBS's software platforms adheres to IDBS's architecture vision.
- Design, develop, and maintain scalable data pipelines using Databricks and Apache Spark (PySpark) to support analytics and other data-driven initiatives.
- Support the elaboration of requirements, the formulation of the technical implementation plan, and backlog refinement. Provide a technical perspective on product enhancements and new requirements activities.
- Optimize Spark-based workflows for performance, scalability, and data integrity, ensuring alignment with GxP and other regulatory standards.
- Research and promote new technologies, design patterns, approaches, tools, and methodologies that could optimize and accelerate development.
- Apply strong software engineering practices including version control (Git), CI/CD pipelines, unit testing, and code reviews to ensure maintainable and production-grade code.
Here Is What Success In This Role Looks Like
- Delivered reliable, scalable data pipelines that process clinical and pharmaceutical data efficiently, reducing data latency and improving time-to-insight for research and regulatory teams.
- Enabled regulatory compliance by implementing secure, auditable, and GxP-aligned data workflows with robust access controls.
- Improved system performance and cost-efficiency by optimizing Spark jobs and Databricks clusters, leading to measurable reductions in compute costs and processing times.
- Fostered cross-functional collaboration by building reusable, testable, well-documented Databricks notebooks and APIs that empower data scientists, analysts, and other stakeholders to build out our product suite.
- Contributed to a culture of engineering excellence through code reviews, CI/CD automation, and mentoring, resulting in higher code quality, faster deployments, and increased team productivity.
Requirements
- Deployment of Databricks functionality in a SaaS environment (via infrastructure as code), with experience in Spark, Python, and a breadth of database technologies
- Event-driven and distributed systems, using messaging technologies such as Kafka and AWS SNS/SQS, and languages such as Java and Python
- Data-centric architectures, including experience with data governance and data management practices and Data Lakehouse / Data Intelligence platforms. Experience with AI software delivery and AI data preparation would also be an advantage