Data engineer

Big Data & Data Science
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English, Spanish, Catalan
Experience level
Intermediate

Job location

Barcelona, Spain

Tech stack

Clean Code Principles
Airflow
Amazon Web Services (AWS)
Azure
Big Data
Code Review
Computer Programming
Databases
Information Engineering
ETL
Serialization
Dataspaces
Data Warehousing
Relational Databases
Linux
Document-Oriented Databases
Eclipse
Elasticsearch
Python
PostgreSQL
Machine Learning
MySQL
Neo4j
NoSQL
RabbitMQ
Redis
Ansible
Prometheus
Standard SQL
Software Construction
SQL Databases
Management of Software Versions
Parquet
Multithreading
Google Cloud Platform
Data Storage Technologies
BSON
Spark
Multi-Cloud
Backend
GitLab
Git
Pytest
Data Lake
Kubernetes
Information Technology
Apache Flink
Avro
Bare Metal
Kafka
Data Management
Machine Learning Operations
Presto
Terraform
Software Version Control
Data Pipelines
Serverless Computing
Workday
Docker

Job description

You will join the Big Data & Data Science unit, a diverse team covering areas as varied as Computational Social Science, Cognitive Neuroscience, and Trustworthy AI. We are looking for an intelligent and curious data engineer to help us translate applied research into tangible products and prototypes, working on real European research projects alongside researchers, software engineers, and project managers.

FUNCTIONS AND RESPONSIBILITIES OF THE JOB

Design, build, and maintain data pipelines (batch and streaming) that ingest data from heterogeneous sources into data lakes and warehouses, including metadata and lineage tracking.
Contribute to the development of federated query and discovery systems over distributed datasets (UNCAN.eu), working with engines such as Trino and integrating query optimizers compliant with privacy requirements.
Contribute to the deployment of European data spaces (Deploy EMDS) using standard building blocks from IDSA, Gaia-X, and FIWARE, including data catalogues, brokers, and connectors.
Build and maintain orchestration workflows using Airflow or Dagster, following software engineering best practices (tests, code review, CI/CD).
Package and deploy services using Docker and Docker Compose or similar.
Support Machine Learning projects with data storage, serving, and versioning infrastructure (object storage, SQL/NoSQL databases, feature stores).
Collaborate on multi-cloud and on-premise deployments (e.g. Hetzner, Azure, bare metal) and contribute to infrastructure-as-code practices.
Support the preparation of technical sections in EU-funded project proposals (Horizon Europe and similar), and contribute to scientific dissemination (papers, prototypes, demos).

Requirements

Studies
MSc in Computer Science, Data Engineering, Mathematics, Physics, or related technical field. A PhD or specialised Master's will be highly valued.

Experience
At least 2 years of professional experience as a Data Engineer or in a closely related role.

Technical Skills
Strong Python proficiency, including modern tooling for clean code (type hints, linters/formatters such as Ruff, testing with pytest).
Solid SQL skills and experience with relational databases (PostgreSQL, MySQL).
Experience with at least one NoSQL or document database (Redis, Elasticsearch, or similar).
Experience building ETL/ELT data pipelines (Airflow, Dagster or similar).
Working knowledge of object storage (S3, MinIO) and common serialization formats (Parquet, JSONL, Avro, BSON).
Comfort on Linux and with the command line.
Docker and Docker Compose for packaging and local development.
Git and CI/CD workflows (GitHub Actions, GitLab CI, or similar).
Understanding of batch vs. streaming paradigms and event-driven architectures.
Understanding of the difference between Data Lake and Data Warehouse architectures, and when to use each.

Languages
Excellent written and spoken English.
Knowledge of Catalan and/or Spanish is a plus.

Nice-to-have
Experience with distributed query engines (Trino, Presto, Dremio) and the concept of federated queries over heterogeneous data sources.
Familiarity with European data spaces initiatives: IDSA, Gaia-X, FIWARE, DSSC, Eclipse Dataspace Components; data catalogues (CKAN), brokers, and connectors.
Big Data ecosystem: Apache Spark, Flink, Kafka, RabbitMQ, Hadoop.
Kubernetes and Helm for production deployments.
Infrastructure as Code with Terraform, Ansible, or similar.
Observability stacks: OpenTelemetry, Prometheus + Grafana, Loki, or equivalents.
Experience with cloud providers (Azure, AWS, GCP, Hetzner): serverless functions, managed storage, IAM.
Graph databases (Neo4j) or time-series databases.
Machine Learning fundamentals and familiarity with ML lifecycle tooling (MLflow, feature stores, model versioning).
Concurrency and backend knowledge: async programming, multithreading, actor model, message-driven systems.
Additional programming languages: Java, Scala, Go, or Rust.
Participation in EU-funded research projects (Horizon Europe, Digital Europe) or scientific publications / conference presentations.
Relevant certifications (cloud providers, Kubernetes CKA/CKAD, data platforms).

WHAT CAN EURECAT OFFER YOU?

Permanent contract.
Hybrid work (home office / work in the office).
Flexible schedule.
Shorter workday on Friday and summer schedule.
Flexible remuneration package (health insurance, transport, lunch, studies - training and kindergarten).
Eurecat employees can join the Eurecat Academy courses.
Language courses (English, Catalan and Spanish).

About the company

Hiring organization
Eurecat - Technology Centre
Job title
Data engineer (data science & big data analytics)
Address
Barcelona, Barcelona, ES
Salary
EUR 30,000 to 50,000 per year
Date posted
2026-04-29
Valid through
2026-07-17
