INTL Senior Data Engineer - AOR

Insight Global

Woonsocket, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Woonsocket, United States of America

Tech stack

Adobe Analytics

Java

Adobe Experience Manager

Artificial Intelligence

Big Data

Google BigQuery

Computer Programming

Information Engineering

Data Infrastructure

Disaster Recovery

Python

Machine Learning

Node.js

Performance Tuning

Cloudera

SQL Databases

Data Streaming

Systems Integration

Transaction Data

Management of Software Versions

Feature Engineering

Spark

Core Api

Data Layers

PySpark

Kafka

Database Replication

Machine Learning Operations

Data Pipelines

Job description

An employer is seeking a Data Engineer II (ML Training & Multi-Source Integration) to join a large healthcare client supporting the AI Insight & Next Best Action platform. The project focuses on building and scaling the data layer that powers machine learning models, including integrating multiple data sources, developing feature pipelines, and enabling high-quality, production-ready ML datasets.

Responsibilities will include:

Build and maintain Feature Store pipelines that ingest and process behavioral, clinical, engagement, and Rx data signals

Design and develop ML training datasets, including batch and real-time feature pipelines, dataset versioning, and training/evaluation splits

Integrate and normalize multi-source data such as Kafka event streams, Adobe Analytics data, and healthcare datasets

Develop and optimize large-scale data processing jobs using Apache Spark (Dataproc) for feature engineering and model input preparation

Monitor and improve data quality for ML models, including tracking feature freshness, identifying data drift, and ensuring pipeline reliability

Partner with engineering teams to define data schemas and event structures that support downstream machine learning workflows

Ensure secure and compliant handling of sensitive data, including masking, de-identification, and maintaining auditability within data pipelines

Support data resiliency efforts, including disaster recovery planning, data replication strategies, and dataset lifecycle management

Maintain clear documentation of data pipelines, feature definitions, and lineage to support model transparency and operational efficiency

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Requirements

5-7 years of data or ML data engineering experience (production environment, ideally GCP)
Strong programming experience in Python, Java, or Node.js for building data pipelines and feature engineering
Hands-on experience building ML training pipelines and Feature Stores (GCP Feature Store preferred)
Deep experience with Apache Spark (PySpark/DataSpark) for large-scale data processing and feature engineering
Strong experience working with BigQuery (complex SQL, data modeling, performance optimization)
Experience with Kafka (streaming ingestion / event-driven pipelines)
Experience working with multi-source feature stores (behavioral, clinical, transactional data)

Knowledge of healthcare data domains (Rx, clinical, benefits)

Experience integrating Adobe Analytics or similar behavioral data platforms

Familiarity with Adobe Experience Platform APIs

Exposure to NIST / HITRUST frameworks in regulated data environments

Experience with GCP encryption (CMEK) for secure datasets
