Data Engineer

ThetaRay
Municipality of Madrid, Spain
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, Spanish
Experience level
Intermediate

Job location

Municipality of Madrid, Spain

Tech stack

Java
Artificial Intelligence
Airflow
Data analysis
Big Data
Cloudera Impala
Information Systems
Customer Data Management
Data Transformation
Data Structures
Linux
Elasticsearch
Hadoop
Hadoop Distributed File System
Hive
Python
Machine Learning
Metadata
SQL Databases
Sqoop
Data Streaming
Feature Engineering
Spark
Jupyter
GIT
Pandas
PySpark
Kubernetes
Information Technology
Data Pipelines
Docker
Jenkins
Microservices

Job description

ThetaRay is a trailblazer in AI-powered Anti-Money Laundering (AML) solutions, offering cutting-edge technology to fintechs, banks, and regulatory bodies worldwide. Our mission is to enhance trust in financial transactions, ensuring compliant and innovative business growth. Our technology empowers customers to expand into new markets and introduce groundbreaking products.

Why Join ThetaRay?

At ThetaRay, you'll be part of a dynamic global team committed to redefining the financial services sector through technological innovation. You will contribute to creating safer financial environments and have the opportunity to work with some of the brightest minds in AI, ML, and financial technology. We offer a collaborative, inclusive, and forward-thinking work environment where your ideas and contributions are valued and encouraged. Join us in our mission to revolutionize the financial world, making it safer and more trustworthy for millions worldwide. Explore exciting career opportunities at ThetaRay, where innovation meets purpose.

We are looking for a Data Engineer to join our growing team of data experts. As a Data Engineer, you will be responsible for designing, implementing, and optimizing data pipeline flows within the ThetaRay system. You will support our data scientists by implementing the relevant data flows based on their feature designs and by constructing complex rules to detect money-laundering activity. The ideal candidate has experience building data pipelines and data transformations and enjoys optimizing data flows and building them from the ground up. They must be self-directed and comfortable supporting multiple production implementations for various use cases.

Responsibilities

  • Implement and maintain data pipeline flows in production within the ThetaRay system based on the data scientist's design
  • Design and implement solution-based data flows for specific use cases, enabling the applicability of implementations within the ThetaRay product
  • Build Machine Learning data pipelines
  • Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader
  • Work with product, R&D, data, and analytics experts to strive for greater functionality in our systems
  • Train customer data scientists and engineers to maintain and amend data pipelines within the product
  • Travel to customer locations both domestically and abroad
  • Build and manage technical relationships with customers and partners

Requirements

  • 2+ years of hands-on experience working with Apache Spark (must)
  • Hands-on experience with SQL
  • Hands-on experience with version-control tools such as GIT
  • Hands-on experience with the Apache Hadoop ecosystem, including Hive, Impala, Hue, HDFS, Sqoop, etc.
  • Experience with Python (Pandas)
  • Experience with PySpark/Scala/Java/R
  • Hands-on experience with data transformation, validations, cleansing, and ML feature engineering
  • BSc degree or higher in Computer Science, Statistics, Informatics, Information Systems, Engineering, or another quantitative field
  • Experience working with and optimizing big data pipelines, architectures, and data sets - an advantage
  • Strong analytic skills related to working with structured and semi-structured datasets
  • Experience building processes supporting data transformation, data structures, metadata, dependency, and workload management
  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
  • Business-oriented and able to work with external customers and cross-functional teams
  • Fluent in English & Spanish both written and spoken

Nice to have

  • Experience with Linux
  • Experience building Machine Learning pipelines
  • Experience with Elasticsearch
  • Experience with Zeppelin/Jupyter
  • Experience with workflow automation platforms such as Jenkins or Apache Airflow
  • Experience with microservices architecture components, including Docker and Kubernetes

Apply for this position