AI Data Engineer

The Rose

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Tech stack

Java

API

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Azure

Big Data

Google BigQuery

Databases

Data Validation

Data Cleansing

Information Engineering

Data Infrastructure

ETL

Data Structures

Data Warehousing

Database Queries

Hadoop

Python

PostgreSQL

MongoDB

MySQL

Standard SQL

Data Streaming

Unstructured Data

Workflow Management Systems

Data Processing

Data Storage Management

Google Cloud Platform

Snowflake

Spark

Data Lake

Apache Flink

Kafka

Data Management

Machine Learning Operations

Stream Processing

Data Pipelines

Docker

Redshift

Job description

An AI Data Engineer is responsible for designing, building, and managing data infrastructure that supports AI and machine learning systems. This role focuses on creating scalable data pipelines, preparing high-quality datasets, and enabling efficient model training and deployment.

Build and maintain data pipelines for AI/ML workflows
Collect, clean, and preprocess structured and unstructured data
Design and manage data lakes and data warehouses
Develop ETL/ELT processes for large-scale data processing
Optimize data storage and retrieval for AI model performance
Integrate data from multiple sources (APIs, databases, streaming systems)
Collaborate with data scientists and AI engineers to provide training datasets
Implement data validation, quality checks, and governance policies
Work with real-time and batch data processing systems
Monitor and troubleshoot data pipeline performance issues

Languages: Python, Java, SQL
Big Data: Apache Spark, Hadoop
Databases: MySQL, PostgreSQL, MongoDB
Data Warehouses: Redshift, BigQuery, Snowflake
Streaming: Kafka, Flink
Orchestration: Apache Airflow
Platforms: AWS, Azure, Google Cloud Platform

Requirements

Strong knowledge of SQL for data querying and transformation
Proficiency in Python or Java for data processing
Understanding of data engineering concepts (ETL, data pipelines)
Familiarity with databases (MySQL, PostgreSQL, MongoDB)
Knowledge of data structures and algorithms
Understanding of data preprocessing techniques for ML
Problem-solving and analytical thinking

Experience with big data tools (Apache Spark, Hadoop)
Familiarity with AI/ML workflows and data requirements
Knowledge of data warehouse tools (Amazon Redshift, Google BigQuery, Snowflake)
Experience with streaming tools (Kafka, Flink)
Understanding of MLOps practices
Experience with cloud platforms like Amazon Web Services, Microsoft Azure, or Google Cloud Platform
Familiarity with workflow orchestration tools (Apache Airflow)
Basic knowledge of Docker and Kubernetes