Data Engineer

The Rolewe
Belfast, United Kingdom

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English

Job location

Remote
Belfast, United Kingdom

Tech stack

Airflow
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Continuous Delivery
Continuous Integration
Directed Acyclic Graph (Directed Graphs)
Data Architecture
Data Governance
ETL
Data Transformation
Data Security
Data Systems
Data Warehousing
DevOps
Distributed Computing Environment
GitHub
Python
Machine Learning
Performance Tuning
Cloud Services
Standard SQL
Data Logging
Spark
Electronic Medical Records
Infrastructure as Code (IaC)
Git
PySpark
Data Management
Functional Programming
CloudWatch
API Gateway
Terraform
Stream Processing
Data Pipelines
Redshift

Job description

About the Role

We are seeking an experienced and highly motivated Data Engineer to join our growing team. In this role, you will be responsible for designing, developing, and maintaining scalable data platforms and pipelines that support business intelligence, analytics, machine learning, and operational reporting initiatives.

You will work closely with data analysts, software engineers, architects, and business stakeholders to deliver robust, high-performance data solutions in a cloud-native AWS environment. The ideal candidate has strong expertise in PySpark, Python, Apache Airflow, AWS services, Terraform, and modern DevOps practices.

Key Responsibilities

Data Engineering & Pipeline Development
Design, develop, and maintain scalable, reliable, and efficient data pipelines using PySpark and Python.
Build high-volume batch and real-time data processing solutions capable of handling large-scale datasets.
Develop, optimize, and monitor ETL/ELT workflows to ensure data quality, consistency, and availability.
Implement data transformation, cleansing, enrichment, and validation processes.
Troubleshoot and resolve data pipeline failures, bottlenecks, and performance issues.

Workflow Orchestration
Design and manage complex workflows using Apache Airflow.
Create and maintain DAGs with robust scheduling, dependency management, alerting, and recovery mechanisms.
Monitor workflow execution and proactively address failures or performance concerns.
Implement workflow best practices to ensure reliability and maintainability.

Cloud Data Architecture (AWS)
Architect and implement cloud-native data solutions on AWS.
Develop scalable and secure data platforms leveraging Amazon S3, Amazon Redshift, AWS Glue, AWS Lambda, Amazon EMR, API Gateway, Amazon CloudWatch, and AWS IAM.
Ensure adherence to security, governance, and compliance standards.
Optimise cloud resources for performance and cost efficiency.

Infrastructure as Code
Provision and manage AWS infrastructure using Terraform.
Develop reusable Terraform modules and templates.
Implement infrastructure automation to support development, testing, and production environments.
Maintain version-controlled infrastructure and deployment processes.

DevOps & CI/CD
Design and maintain CI/CD pipelines using GitHub Actions.
Automate testing, deployment, monitoring, and infrastructure updates.
Support continuous integration and continuous delivery best practices.
Collaborate with engineering teams to improve deployment reliability and efficiency.

Performance Optimisation
Optimise Spark applications for scalability and efficiency.
Conduct performance tuning of distributed data processing jobs.
Identify and resolve resource utilisation issues across cloud and distributed environments.
Implement monitoring and logging strategies to improve observability.

Collaboration & Data Governance
Partner with business stakeholders, analysts, and engineering teams to understand data requirements.
Contribute to data architecture decisions and long-term platform strategy.
Establish and promote data governance, quality, and security best practices.
Document systems, processes, and technical solutions to support maintainability and knowledge sharing.

Requirements

Required Skills & Experience
Strong experience with Python and PySpark.
Hands-on expertise with Apache Airflow.
Extensive experience working with AWS cloud services.
Strong knowledge of Amazon Redshift, AWS Glue, S3, Lambda, EMR, API Gateway, CloudWatch, and IAM.
Experience with Terraform and Infrastructure as Code (IaC).
Proficiency with Git, GitHub Actions, and CI/CD pipelines.
Solid understanding of distributed data processing and Spark optimization.
Experience designing scalable data architectures and data models.
Strong SQL skills and understanding of data warehousing concepts.
Excellent troubleshooting, analytical, and problem-solving abilities.
Strong communication and collaboration skills.
