Data Engineer - Python, AI/ML

Motion Recruitment

Warren, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (more than 32 hours per week)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Warren, United States of America

Tech stack

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Automation of Tests

Azure

Code Review

Databases

Continuous Integration

Data Validation

Data Deduplication

Python

PostgreSQL

Microsoft SQL Server

MySQL

NumPy

Power BI

SciPy

SQL Databases

Tableau

Workflow Management Systems

Cloud Platform System

Data Classification

Feature Engineering

Large Language Models

Spark

Git

Pandas

PySpark

Semi-structured Data

Scikit-Learn

Information Technology

Data Lineage

Data Management

Looker Analytics

Data Pipelines

Job description

Build and maintain Python and SQL pipelines for governance-related ingestion, cleaning, transformation, and validation of structured and semi-structured data.
Implement and operate data quality checks, schema validation, and integrity rules across pipelines; investigate and resolve quality issues.
Contribute to master data workflows: standardization, deduplication, and consolidation of data from heterogeneous sources into consistent reference and golden-record datasets.
Instrument pipelines to capture data lineage and metadata and to integrate with catalog tooling.
Develop pipelines that feed governance dashboards and reporting in Tableau, Power BI, or Looker.
Build reproducible, well-documented pipelines for compliance and audit reporting.
Contribute to AI/ML-assisted governance use cases: embedding-based data classification, anomaly detection on quality metrics, LLM-assisted catalog search, and exposure of governed datasets to AI assistants via MCP (Model Context Protocol).
Partner with team leads, data stewards, and stakeholders to translate governance requirements into engineering work.
Follow team engineering practices: Git, code review, modular pipeline design, automated testing, CI/CD.

Requirements

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, Statistics, or a related field.
2+ years building data pipelines in Python (Pandas, NumPy, SciPy) and SQL.
Working experience with Apache Spark or PySpark and workflow orchestration (Apache Airflow).
Experience with schema design across relational (PostgreSQL, MySQL, SQL Server) and analytical databases, including standardization across heterogeneous sources.
Experience implementing data quality validation, exploratory data analysis (EDA), and integrity enforcement on production datasets.
Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
Working familiarity with Python ML libraries (Scikit-Learn) for feature engineering and exploratory analysis.
Experience producing analytics-ready datasets for BI tools (Tableau, Power BI, or Looker).
Experience with Git, code review, and CI/CD practices.
Clear technical communication and collaborative working style.
