DATA ENGINEER (Data Science & Big Data Analytics)

Eurecat
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English, Spanish, Catalan

Tech stack

Java
Airflow
Amazon Web Services (AWS)
Azure
Big Data
C++
Cloud Computing
Databases
Data Mining
Data Sharing
Dataspaces
Data Warehousing
Linux
DevOps
Programming Tools
Eclipse
Elasticsearch
Hadoop
Python
PostgreSQL
Machine Learning
MongoDB
MySQL
NoSQL
Queue Management Systems
Redis
Software Tools
Ansible
Scala
SQL Databases
Data Streaming
Management of Software Versions
Google Cloud Platform
GIT
Virtual Computing
Kubernetes
Information Technology
Apache Flink
Cassandra
Real Time Data
Kafka
Spark Streaming
Data Pipelines
Serverless Computing
Docker
Ambari

Job description

· Design and deploy data pipelines that bring data from different source systems into a data warehousing system and control metadata lineage, including queue management and real-time data processing.

· Architect reusable software practices using state-of-the-art orchestration systems such as Airflow or Dagster and a containerization stack such as Docker or Kubernetes.

· Deploy existing data-sharing and cataloguing software tools to enable Data Spaces through standard building blocks from IDSA, Gaia-X and Fiware.

· Contribute to Machine Learning projects by adopting technologies for data storage, serving and versioning, from blob storage to traditional SQL and NoSQL solutions.

· Assist the unit in multi-cloud deployments (Amazon Web Services, Azure and Google Cloud Platform).

· When applicable, design and deploy Big Data architectures covering both batch and streaming paradigms, with technologies such as Flink, Spark Structured Streaming, Kafka and the Hadoop ecosystem.

· Project management and technical leadership in EU-funded and private projects, including writing proposals for Horizon Europe calls (excellence, task descriptions…).

Requirements

· MSc in Computer Science. Other technical backgrounds (Engineering, Mathematics, Physics, etc.) will also be considered, and a PhD or Master's in the field will be highly valued.

Experience

· Database systems (MySQL, PostgreSQL, MongoDB, Elasticsearch, Cassandra, Redis)

· Implementation of data catalogues (CKAN), data brokers and data connectors (DSSC, Eclipse, Fiware)

· Airflow- and Python-based ETL/ELT data pipelines.

· Cloud providers' technology stacks (serverless functions, virtual computing resources, storage).

· Software pattern best practices, as well as knowledge of Data Mining and Machine Learning.

· Languages and programming tools (Python, Java, Scala, SQL, C/C++, Git, Docker)

· DevOps knowledge and experience with Linux are a big plus, especially deploying and maintaining big data infrastructures using Ambari or Ansible.

· Interest in Research, Innovation & Patent Publication, Conference Presentations, Prototyping, Full Lifecycle Product Development.

Languages

· Excellent written and oral communication skills in English.

· Catalan and/or Spanish would be desirable.

Benefits & conditions

  • Hybrid work (home office / office work).
  • Flexible schedule.
  • Shorter workday on Fridays and a summer schedule.
  • Flexible remuneration package (health insurance, transport, lunch, studies/training and kindergarten).
  • Eurecat employees can join the Eurecat Academy courses.
  • Language courses (English, Catalan and Spanish).
