Data Engineer

The Rolewe

Belfast, United Kingdom

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Belfast, United Kingdom

Tech stack

Airflow

Amazon Web Services (AWS)

Data analysis

Apache HTTP Server

Continuous Delivery

Continuous Integration

Directed Acyclic Graph (Directed Graphs)

Data Architecture

Data Governance

ETL

Data Transformation

Data Security

Data Systems

Data Warehousing

DevOps

Distributed Computing Environment

GitHub

Python

Machine Learning

Performance Tuning

Cloud Services

Standard SQL

Data Logging

Spark

Electronic Medical Records

Infrastructure as Code (IaC)

Git

PySpark

Data Management

Functional Programming

CloudWatch

API Gateway

Terraform

Stream Processing

Data Pipelines

Redshift

Job description

About the Role

We are seeking an experienced and highly motivated Data Engineer to join our growing team. In this role, you will be responsible for designing, developing, and maintaining scalable data platforms and pipelines that support business intelligence, analytics, machine learning, and operational reporting initiatives.

You will work closely with data analysts, software engineers, architects, and business stakeholders to deliver robust, high-performance data solutions in a cloud-native AWS environment. The ideal candidate has strong expertise in PySpark, Python, Apache Airflow, AWS services, Terraform, and modern DevOps practices.

Key Responsibilities

Data Engineering & Pipeline Development

Design, develop, and maintain scalable, reliable, and efficient data pipelines using PySpark and Python.
Build high-volume batch and real-time data processing solutions capable of handling large-scale datasets.
Develop, optimize, and monitor ETL/ELT workflows to ensure data quality, consistency, and availability.
Implement data transformation, cleansing, enrichment, and validation processes.
Troubleshoot and resolve data pipeline failures, bottlenecks, and performance issues.

Workflow Orchestration

Design and manage complex workflows using Apache Airflow.
Create and maintain DAGs with robust scheduling, dependency management, alerting, and recovery mechanisms.
Monitor workflow execution and proactively address failures or performance concerns.
Implement workflow best practices to ensure reliability and maintainability.

Cloud Data Architecture (AWS)

Architect and implement cloud-native data solutions on AWS.
Develop scalable and secure data platforms leveraging:
Amazon S3
Amazon Redshift
AWS Glue
AWS Lambda
Amazon EMR
API Gateway
Amazon CloudWatch
AWS IAM
Ensure adherence to security, governance, and compliance standards.
Optimise cloud resources for performance and cost efficiency.

Infrastructure as Code

Provision and manage AWS infrastructure using Terraform.
Develop reusable Terraform modules and templates.
Implement infrastructure automation to support development, testing, and production environments.
Maintain version-controlled infrastructure and deployment processes.

DevOps & CI/CD

Design and maintain CI/CD pipelines using GitHub Actions.
Automate testing, deployment, monitoring, and infrastructure updates.
Support continuous integration and continuous delivery best practices.
Collaborate with engineering teams to improve deployment reliability and efficiency.

Performance Optimisation

Optimise Spark applications for scalability and efficiency.
Conduct performance tuning of distributed data processing jobs.
Identify and resolve resource utilisation issues across cloud and distributed environments.
Implement monitoring and logging strategies to improve observability.

Collaboration & Data Governance

Partner with business stakeholders, analysts, and engineering teams to understand data requirements.
Contribute to data architecture decisions and long-term platform strategy.
Establish and promote data governance, quality, and security best practices.
Document systems, processes, and technical solutions to support maintainability and knowledge sharing.

Requirements

Required Skills & Experience

Strong experience with Python and PySpark.
Hands-on expertise with Apache Airflow.
Extensive experience working with AWS cloud services.
Strong knowledge of Amazon Redshift, AWS Glue, S3, Lambda, EMR, API Gateway, CloudWatch, and IAM.
Experience with Terraform and Infrastructure as Code (IaC).
Proficiency with Git, GitHub Actions, and CI/CD pipelines.
Solid understanding of distributed data processing and Spark optimization.
Experience designing scalable data architectures and data models.
Strong SQL skills and understanding of data warehousing concepts.
Excellent troubleshooting, analytical, and problem-solving abilities.
Strong communication and collaboration skills.
