Senior Data Engineer

Holcim
Municipality of Madrid, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Municipality of Madrid, Spain

Tech stack

Query Performance
Java
Agile Methodologies
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Automation of Tests
Big Data
Cloud Computing
Program Optimization
Software Quality
Continuous Integration
Data Infrastructure
ETL
Data Systems
Data Visualization
Data Warehousing
DevOps
Distributed Systems
GitHub
Identity and Access Management
Python
OpenShift
Scrum
Software Deployment
Software Engineering
SQL Databases
Data Streaming
Parquet
QlikSense
Data Storage Technologies
System Availability
Snowflake
Spark
Git
CloudFormation
Data Lake
Information Technology
Low Latency
Apache Flink
Avro
Data Analytics
QlikView
Real Time Data
Kafka
Terraform
Data Pipelines
Serverless Computing
Docker
Jenkins
Redshift
Databricks

Job description

We are seeking a seasoned Senior Data Engineer to design, build, and optimize our next-generation data platform. You will be responsible for architecting scalable data pipelines, managing large-scale distributed systems, and ensuring our data infrastructure in AWS and Databricks is robust and efficient. The ideal candidate is a Spark expert with a deep understanding of the AWS ecosystem and a passion for automation.

  • Pipeline Architecture: Design and implement complex batch and streaming ETL/ELT pipelines using Python, SQL, and Spark to process massive datasets (see the sketch after this list).
  • Cloud Infrastructure: Leverage AWS Data Analytics services to build scalable, secure, and cost-effective data solutions.
  • Orchestration & DevOps: Manage and automate data workflows using Airflow, while utilizing Docker and ECS for containerized application deployment.
  • System Optimization: Monitor and tune the performance of distributed systems (Spark Cluster) to ensure high availability and low latency.
  • Infrastructure as Code: Utilize AWS CloudFormation or Terraform to manage data infrastructure, ensuring repeatable and version-controlled environments.
  • Cost Optimization: Monitor and optimize AWS spend by selecting appropriate instance types (Spot vs. On-Demand) and refining data storage strategies.
  • Security & Compliance: Implement IAM roles, bucket policies, and encryption (KMS) to ensure data is secure at rest and in transit.
  • Collaboration: Work within an Agile framework to deliver iterative value, collaborating closely with Data Scientists and Stakeholders to translate business needs into technical reality.
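
As a rough illustration of the pipeline work described above, here is a minimal batch ETL sketch in PySpark. The bucket names, paths, and columns (a hypothetical "orders" dataset) are assumptions for illustration only and are not taken from this posting.

# Minimal PySpark batch ETL sketch; all paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders_daily_etl")
    .getOrCreate()
)

# Read raw CSV files landed by an upstream system (illustrative S3 path).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-raw-bucket/orders/2024-01-01/")
)

# Basic cleansing and enrichment: drop malformed rows, derive a partition column.
cleaned = (
    raw.dropna(subset=["order_id", "order_ts"])
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount_eur", F.col("amount").cast("double"))
)

# Write partitioned Parquet back to the curated zone of the data lake
# for downstream consumers (Athena, Redshift Spectrum, Databricks).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")
)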

JOB DIMENSIONS

List of direct reports:

  • Up to 2 direct reports and around 15 externals

Key interfaces, stakeholders and relationships:

  • Internal:
      • GDS: product manager, application manager, data & analytics & AI team
      • Country business stakeholders
  • External: 3rd-party vendors

AWS Data Analytics services:

  • Amazon S3: Implementing data lake best practices, including partitioning, compression (Parquet/Avro), and lifecycle policies.
  • Amazon Redshift: Designing star/snowflake schemas and optimizing query performance for high-volume data warehousing.
  • Amazon Athena: Performing ad-hoc SQL analysis directly on S3 data.
  • Experience with open table formats (Iceberg/Delta).
  • Orchestration & Integration: Amazon MWAA (Managed Workflows for Apache Airflow) for deploying and scaling Airflow environments (a minimal DAG sketch follows this list).
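
For illustration of the orchestration side, a minimal Airflow DAG of the kind that might run on Amazon MWAA is sketched below. The DAG id, schedule, and task are hypothetical placeholders, not part of the actual environment.

# Minimal Airflow DAG sketch; DAG id, task, and schedule are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_orders_etl(**context):
    # Placeholder for triggering the actual Spark or Glue job.
    print(f"Running orders ETL for {context['ds']}")


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="orders_daily_etl",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_orders_etl",
        python_callable=run_orders_etl,
    )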

Requirements

  • Experience: Minimum of 4 years of hands-on experience in active Big Data environments and 2+ years specializing in Data Analytics within AWS.
  • Compute & Processing:
      • Amazon EMR: Architecting and managing Spark clusters for large-scale distributed processing.
      • AWS Glue: Developing serverless ETL jobs, managing the Data Catalog, and implementing Glue Crawlers.
      • Streaming (advantage): Amazon Kinesis or MSK (Managed Streaming for Apache Kafka) for real-time data ingestion.
  • Core Engineering: Expert-level proficiency in Spark, Python, and SQL.
  • Infrastructure & Tooling: Proven experience with Airflow for orchestration and Docker/ECS for containerization.
  • Good knowledge of Databricks and data mesh architectures, and a good understanding of how to implement and maintain Lakehouse data models (bronze/silver/gold layers) using Delta Lake for reliability, ACID transactions, time travel, and schema evolution (see the Delta Lake sketch after this list).
  • Solid software engineering practices: Git, CI/CD for data pipelines, automated testing, code quality and documentation.
  • Communication: Excellent written and oral English communication skills, with the ability to explain complex technical concepts to non-technical audiences.
  • Degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent practical experience.
  • Real-time Processing: Experience with streaming and distributed messaging applications such as Flink and Kafka.
  • Core Tech: Java programming.
  • Experience industrialising ML use cases.
  • Data Visualization: Experience with QlikView or QlikSense to support BI initiatives.
  • Agile: Experience working in a fast-paced Scrum or Kanban environment.
  • Certifications: AWS Certified Data Engineer (Associate/Professional) or AWS Certified Solutions Architect; Databricks Certified Data Engineer (Associate/Professional).
  • DevOps: Experience with OpenShift, GitHub Actions, or Jenkins for CI/CD of data workflows.
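
For illustration of the Lakehouse requirement above, a minimal sketch of bronze and silver Delta Lake layers follows. It assumes a Databricks (or Delta-enabled Spark) runtime; the paths, columns, and transformations are hypothetical and not the platform's actual design.

# Minimal Delta Lake medallion sketch (bronze -> silver); assumes a
# Databricks or Delta-enabled Spark runtime. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_medallion").getOrCreate()

bronze_path = "s3://example-lakehouse/bronze/orders/"
silver_path = "s3://example-lakehouse/silver/orders/"

# Bronze: append raw records as-is, adding ingestion metadata.
raw = spark.read.json("s3://example-raw-bucket/orders/")
(
    raw.withColumn("_ingested_at", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .save(bronze_path)
)

# Silver: deduplicate and conform types on top of bronze, relying on
# Delta's ACID guarantees; a gold layer would aggregate for consumption.
bronze = spark.read.format("delta").load(bronze_path)
silver = (
    bronze.dropDuplicates(["order_id"])
    .withColumn("amount_eur", F.col("amount").cast("double"))
)
silver.write.format("delta").mode("overwrite").save(silver_path)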

Apply for this position