Data Engineer - Data Pipelines

AstraZeneca
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Tech stack

Amazon Web Services (AWS)
Business Analytics Applications
Automation of Tests
Bash
Code Review
Continuous Integration
Information Engineering
Data Warehousing
Software Design Patterns
Python
Prometheus
Cloud Platform System
Snowflake
Grafana
CloudFormation
Containerization
Kubernetes
Terraform
Software Version Control
Data Pipelines
Docker
Redshift
Databricks

Requirements

Workflow Orchestration: Build reproducible pipelines using frameworks such as Nextflow (preferred) or Snakemake; integrate with schedulers and HPC/cloud resources.
Data Platforms: Develop data models, warehousing layers, and metadata/lineage; ensure data quality, reliability, and governance.
Scalability and Performance: Optimize pipelines for throughput and cost across Unix/Linux HPC and cloud environments (AWS preferred); implement observability and reliability practices.
Collaboration: Translate scientific and business requirements into technical designs; partner with CPSS stakeholders, R&D IT, and DS&AI to co-create solutions.
Engineering Excellence: Establish and maintain version control, CI/CD, automated testing, code review, and design patterns to ensure maintainability and compliance.
Enablement: Produce documentation and reusable components; mentor peers and promote best practices in data engineering and scientific computing.

Essential Skills/Experience:
Pipeline engineering: Design, implement, and operate fit-for-purpose data pipelines for bioinformatics and scientific data, from ingestion to consumption.
Workflow orchestration: Build reproducible pipelines using frameworks such as Nextflow (preferred) or Snakemake; integrate with schedulers and HPC/cloud resources.
Data platforms: Develop data models, warehousing layers, and metadata/lineage; ensure data quality, reliability, and governance.
Scalability and performance: Optimize pipelines for throughput and cost across Unix/Linux HPC and cloud environments (AWS preferred); implement observability and reliability practices.
Collaboration: Translate scientific and business requirements into technical designs; partner with CPSS stakeholders, R&D IT, and DS&AI to co-create solutions.
Engineering excellence: Establish and maintain version control, CI/CD, automated testing, code review, and design patterns to ensure maintainability and compliance.
Enablement: Produce documentation and reusable components; mentor peers and promote best practices in data engineering and scientific computing.

Desirable Skills/Experience:
Strong programming skills in Python and Bash for workflow development and scientific computing.
Experience with containerization and packaging (Docker, Singularity, Conda) for reproducible pipelines.
Familiarity with data warehousing and analytics platforms (e.g., Redshift, Snowflake, Databricks) and data catalog/lineage tools.
Experience with observability and reliability tooling (Prometheus/Grafana, ELK, tracing) in HPC and cloud contexts.
Knowledge of infrastructure as code and cloud orchestration (Terraform, CloudFormation, Kubernetes).
Understanding of FAIR data principles and domain-specific bioinformatics formats and standards.
Track record of mentoring engineers and enabling cross-functional teams with reusable components and documentation.
Experience optimizing performance and cost on AWS, including spot strategies, autoscaling, and storage tiers.

When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of in-office work with respect for individual flexibility. Join us in our unique and ambitious world.

Why AstraZeneca:
Your engineering craft will fuel science at the crossroads of biology, data, and technology. You will collaborate with researchers, data scientists, and technologists to tackle complex diseases, using modern platforms and inclusive ways of working to turn uncertainty into insight. We value kindness alongside ambition, nurture resilience and curiosity, and pair the resources of a global leader with the agility to move at pace, from hands-on experimentation to shared learning and tangible impact for patients.

Call to Action:
Step into this role and start building the data pipelines that turn