Data Integration Engineer

Boehringer Ingelheim España, S.A.
Sant Cugat del Vallès, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Sant Cugat del Vallès, Spain

Tech stack

Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Apache HTTP Server
Big Data
Cloud Computing
Code Review
Data as a Service
Data Architecture
Information Engineering
Data Governance
Data Infrastructure
Data Integration
ETL
Data Systems
Data Visualization
Data Warehousing
DevOps
Document Management Systems
Python
Machine Learning
NoSQL
Performance Tuning
Power BI
SQL Databases
Data Streaming
Tableau
Parquet
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Storage Technologies
Snowflake
Spark
CloudFormation
Containerization
Data Lake
Kubernetes
Information Technology
Data Analytics
Kafka
Data Delivery
Data Pipelines
Docker
Jenkins
Databricks

Job description

We are seeking a skilled and motivated Data Engineer to join the IT RDM CI Data Excellence team. In this role, you will play a pivotal part in enhancing our data infrastructure, optimizing data flows, and ensuring the availability and quality of strategically critical data assets from both internal and external providers.

You will enable fast and reliable data delivery to the right environments, supporting cutting-edge use cases in Artificial Intelligence (AI) and Machine Learning (ML). Collaboration is key: you'll work closely with researchers, data scientists, and analysts to build a consistent and scalable data ecosystem across multiple analytics and AI initiatives.

Tasks and responsibilities

  • Design, develop, and maintain scalable data pipelines and ETL processes to support data integration and analytics.
  • Collaborate with data architects, modelers, and IT team members to help define and evolve the overall cloud-based data architecture strategy, including data warehousing, data lakes, streaming analytics, and data governance frameworks.
  • Collaborate with integration engineers, analysts, and other business stakeholders to understand data requirements and deliver solutions.
  • Optimize and manage data storage solutions and data integrations (e.g., S3, Snowflake, dbt, Snaplogic) ensuring data quality, integrity, security, and accessibility.
  • Leverage Databricks for scalable data processing, analytics, and advanced transformations.
  • Implement data quality and validation processes to ensure data accuracy and reliability.
  • Develop and maintain documentation for data processes, architecture, and workflows.
  • Participate in code reviews and contribute to best practices for data engineering.
  • Monitor and troubleshoot data pipeline performance and resolve issues promptly.
  • Consulting and Analysis: Meet with designated stakeholders to understand and analyze their processes and needs, and determine requirements in order to propose solutions or improvements.
  • Technology Evaluation: Stay updated with the latest trends in data engineering, cloud technologies, and big data platforms.
  • Expert Communities: Engage actively in internal expert groups to exchange knowledge, mentor junior colleagues, and contribute to improving inefficient processes.
  • Cloud-Based Data Solutions: Utilize AWS cloud services (e.g., S3, Lambda, Step Functions, KMS, …) to support data engineering workflows, and develop infrastructure as code for data pipelines using tools such as Jenkins and AWS CloudFormation.
  • Performance Optimization: Monitor and optimize data pipelines for performance, scalability, and cost efficiency and troubleshoot and resolve data-related issues in a timely manner.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

  • Proficiency with the Apache ecosystem (Parquet, Iceberg, Spark, Kafka, Airflow).
  • Strong hands-on experience with AWS data services (Kinesis, Glue, AppFlow, Lambda, S3).
  • Demonstrated experience with Snowflake and dbt (dbt Labs) for building and modeling data pipelines.
  • Strong analytical skills working with unstructured datasets.
  • Experience with relational SQL and NoSQL databases, preferably Snowflake and/or Databricks.
  • Familiarity with data pipeline and workflow orchestration tools.
  • Strong project management and organizational skills.
  • Excellent English written and verbal communication skills.
  • Snaplogic knowledge is a plus.
Preferred skills:
  • Proficiency in scripting languages such as Python or Scala.
  • Familiarity with data visualization tools (e.g., Tableau, Power BI, QuickSight).
  • AWS Cloud Practitioner, Architecture, Big Data or Data Analytics certification.
  • Nice-to-have qualifications: AWS Certified Big Data or AWS Certified Solutions Architect certification; experience with Databricks and Snowflake; knowledge of containerization technologies such as Docker and Kubernetes; and experience with CI/CD pipelines and DevOps best practices.

Benefits & conditions

We are continuously working to design the best experience for you. Here are some examples of how we will take care of you:

  • Flexible working conditions
  • Life and accident insurance
  • Health insurance at a competitive price
  • Investment in your learning and development
  • Gym membership discounts

About the company

At Boehringer Ingelheim, we believe that Data & AI have the power to transform healthcare and improve the lives of millions of patients and animals. As a key member of the IT Research Development and Medicine - Computational Innovation team, you will join a passionate group where you will meet and collaborate with like-minded people dedicated to fostering a strong data and AI culture, delivering key transformation initiatives, and shaping the future of data-driven decision-making across our global organization. Your work will empower our researchers to achieve breakthrough therapies for our patients.

Apply for this position