DATA ENGINEER (Data Science & Big Data Analytics)
Job description
You will join the Big Data & Data Science unit, a diverse team covering areas as varied as Computational Social Science, Cognitive Neuroscience, and Trustworthy AI. We are looking for an intelligent and curious data engineer to help us translate applied research into tangible products and prototypes, working on real European research projects alongside researchers, software engineers, and project managers.
Responsibilities
- Design, build, and maintain data pipelines (batch and streaming) that ingest data from heterogeneous sources into data lakes and warehouses, including metadata and lineage tracking.
- Contribute to the development of federated query and discovery systems over distributed datasets (UNCAN.eu), working with engines such as Trino and integrating query optimizers compliant with privacy requirements.
- Contribute to the deployment of European data spaces (DeployEMDS) using standard building blocks from IDSA, Gaia-X, and FIWARE, including data catalogues, brokers, and connectors.
- Build and maintain orchestration workflows using Airflow or Dagster, following software engineering best practices (tests, code review, CI/CD).
- Package and deploy services using Docker and Docker Compose or similar tooling.
- Support Machine Learning projects with data storage, serving, and versioning infrastructure (object storage, SQL/NoSQL databases, feature stores).
- Collaborate on multi-cloud and on-premise deployments (e.g. Hetzner, Azure, bare metal) and contribute to infrastructure-as-code practices.
- Support the preparation of technical sections in EU-funded project proposals (Horizon Europe and similar), and contribute to scientific dissemination (papers, prototypes, demos).
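In practice, the packaging and deployment work above often starts from a small Compose file. A minimal, purely illustrative sketch of the kind of local stack involved (service names, images, and credentials are assumptions, not project specifics):

```yaml
# Illustrative only: a pipeline service with its metadata database
# and an S3-compatible object store for local development.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # use proper secrets outside local dev
  minio:
    image: minio/minio
    command: server /data
    ports:
      - "9000:9000"
  pipeline:
    build: .
    depends_on:
      - postgres
      - minio
```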
Requirements
MSc in Computer Science, Data Engineering, Mathematics, Physics, or related technical field. A PhD or specialised Master's will be highly valued.
Experience
At least 2 years of professional experience as a Data Engineer or in a closely related role.
Technical skills
- Strong Python proficiency, including modern tooling for clean code (type hints, linters/formatters such as Ruff, testing with pytest).
- Solid SQL skills and experience with relational databases (PostgreSQL, MySQL).
- Experience with at least one NoSQL or document database (Redis, Elasticsearch, or similar).
- Experience building ETL/ELT data pipelines (Airflow, Dagster, or similar).
- Working knowledge of object storage (S3, MinIO) and common serialization formats (Parquet, JSONL, Avro, BSON).
- Comfort on Linux and with the command line.
- Docker and Docker Compose for packaging and local development.
- Git and CI/CD workflows (GitHub Actions, GitLab CI, or similar).
- Understanding of batch vs. streaming paradigms and event-driven architectures.
- Understanding of the difference between Data Lake and Data Warehouse architectures, and when to use each.
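To illustrate the Python standards listed above (type hints, clean code, testable functions), here is a toy batch transform over JSONL records; the record fields and aggregation are hypothetical, not part of the role:

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """One record from a hypothetical JSONL source."""
    user_id: str
    value: float


def extract(jsonl: str) -> list[Event]:
    """Parse newline-delimited JSON into typed records."""
    return [Event(**json.loads(line)) for line in jsonl.splitlines() if line.strip()]


def transform(events: list[Event]) -> dict[str, float]:
    """Aggregate values per user (a toy batch step)."""
    totals: dict[str, float] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0.0) + e.value
    return totals


raw = '{"user_id": "a", "value": 1.5}\n{"user_id": "a", "value": 2.5}\n{"user_id": "b", "value": 3.0}'
totals = transform(extract(raw))
print(totals)  # {'a': 4.0, 'b': 3.0}
```

Functions shaped like this are straightforward to cover with pytest and to check with type-aware linters such as Ruff.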
Languages
- Excellent written and spoken English
- Knowledge of Catalan and/or Spanish is a plus.
Desirable skills
- Experience with distributed query engines (Trino, Presto, Dremio) and the concept of federated queries over heterogeneous data sources.
- Familiarity with European data spaces initiatives: IDSA, Gaia-X, FIWARE, DSSC, Eclipse Dataspace Components; data catalogues (CKAN), brokers, and connectors.
- Big Data ecosystem: Apache Spark, Flink, Kafka, RabbitMQ, Hadoop.
- Kubernetes and Helm for production deployments.
- Infrastructure as Code with Terraform, Ansible, or similar.
- Observability stacks: OpenTelemetry, Prometheus + Grafana, Loki, or equivalents.
- Experience with cloud providers (Azure, AWS, GCP, Hetzner): serverless functions, managed storage, IAM.
- Graph databases (Neo4j) or time-series databases.
- Machine Learning fundamentals and familiarity with ML lifecycle tooling (MLflow, feature stores, model versioning).
- Concurrency and backend knowledge: async programming, multithreading, actor model, message-driven systems.
- Additional programming languages: Java, Scala, Go, or Rust.
- Participation in EU-funded research projects (Horizon Europe, Digital Europe) or scientific publications / conference presentations.
- Relevant certifications (cloud providers, Kubernetes CKA/CKAD, data platforms).
Benefits & conditions
- Hybrid work (home office / in-office).
- Flexible schedule.
- Shorter workday on Fridays and a summer schedule.
- Flexible remuneration package (health insurance, transport, lunch, training, and kindergarten).
- Eurecat employees can join the Eurecat Academy courses.
- Language courses (English, Catalan and Spanish).