INTL Senior Data Engineer - AOR

Insight Global
Woonsocket, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Woonsocket, United States of America

Tech stack

Adobe Analytics
Java
Adobe Experience Manager
Artificial Intelligence
Big Data
Google BigQuery
Computer Programming
Information Engineering
Data Infrastructure
Disaster Recovery
Python
Machine Learning
Node.js
Performance Tuning
Cloudera
SQL Databases
Data Streaming
Systems Integration
Transaction Data
Management of Software Versions
Feature Engineering
Spark
Core API
Data Layers
PySpark
Kafka
Database Replication
Machine Learning Operations
Data Pipelines

Job description

An employer is seeking a Data Engineer II (ML Training & Multi-Source Integration) to join a large healthcare client supporting the AI Insight & Next Best Action platform. The project focuses on building and scaling the data layer that powers machine learning models, including integrating multiple data sources, developing feature pipelines, and enabling high-quality, production-ready ML datasets.

Responsibilities will include:

Build and maintain Feature Store pipelines that ingest and process behavioral, clinical, engagement, and Rx data signals

Design and develop ML training datasets, including batch and real-time feature pipelines, dataset versioning, and training/evaluation splits

Integrate and normalize multi-source data such as Kafka event streams, Adobe Analytics data, and healthcare datasets

Develop and optimize large-scale data processing jobs using Apache Spark (Dataproc) for feature engineering and model input preparation

Monitor and improve data quality for ML models, including tracking feature freshness, identifying data drift, and ensuring pipeline reliability

Partner with engineering teams to define data schemas and event structures that support downstream machine learning workflows

Ensure secure and compliant handling of sensitive data, including masking, de-identification, and maintaining auditability within data pipelines

Support data resiliency efforts, including disaster recovery planning, data replication strategies, and dataset lifecycle management

Maintain clear documentation of data pipelines, feature definitions, and lineage to support model transparency and operational efficiency

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.

If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Requirements

  • 5-7 years of data engineering or ML data engineering experience in production environments, ideally on GCP

  • Strong programming experience in Python, Java, or Node.js for building data pipelines and feature engineering

  • Hands-on experience building ML training pipelines and Feature Stores (GCP Feature Store preferred)

  • Deep experience with Apache Spark (PySpark/Dataproc) for large-scale data processing and feature engineering

  • Strong experience working with BigQuery (complex SQL, data modeling, performance optimization)

  • Experience with Kafka (streaming ingestion / event-driven pipelines)

  • Experience working with multi-source feature stores (behavioral, clinical, transactional data)

  • Knowledge of healthcare data domains (Rx, clinical, benefits)

  • Experience integrating Adobe Analytics or similar behavioral data platforms

  • Familiarity with Adobe Experience Platform APIs

  • Exposure to NIST / HITRUST frameworks in regulated data environments

  • Experience with GCP encryption (CMEK) for secure datasets
