Remote GCP Data Engineer - Java focused

Insight Global
Hartford, United States of America
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$104K

Job location

Hartford, United States of America

Tech stack

Java
JavaScript
Airflow
Test Automation
Google BigQuery
Clinical Data Repository
Cloud Computing
Cloud Storage
Code Review
Computer Programming
Continuous Integration
Directed Acyclic Graphs (DAGs)
Data Validation
Data Governance
ETL
DevOps
Fault Tolerance
Data Flow Control
Identity and Access Management
Networking Basics
Performance Tuning
Data Logging
Data Processing
Google Cloud Platform
Software Version Control
Data Pipelines
Apache Beam

Job description

In this role, you will build and enhance GCP data pipelines that support a healthcare client's clinical delivery initiatives, ensuring data is timely, accurate, and fit for downstream analytics and operational needs. You'll primarily develop Dataflow (Apache Beam) pipelines in Java, implementing transformations, enrichment logic, validation checks, and fault-tolerant handling for both batch and streaming workloads. You will orchestrate end-to-end workflows using Cloud Composer (Airflow), creating DAGs, scheduling dependencies, and managing retries, alerting, and backfills when needed.

A typical day includes collaborating with clinical and technical stakeholders to clarify requirements, translate them into pipeline designs, and align on data definitions and acceptance criteria. You'll monitor pipeline health, investigate failures or performance bottlenecks, and apply optimizations to improve throughput, cost efficiency, and reliability.

You will also contribute to DevOps practices, including version control, code reviews, automated testing, CI/CD deployments, and environment promotion, while keeping documentation and runbooks current. Throughout, you'll operate with a strong quality and compliance mindset, helping ensure data handling aligns with healthcare governance expectations and supports dependable clinical delivery outcomes.
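For candidates gauging fit, the sketch below illustrates the kind of Beam work the description refers to: a Java Dataflow pipeline that validates records and routes failures to a dead-letter output instead of failing the whole job. The bucket paths, the five-field record rule, and the class name are hypothetical placeholders, not details from this client's pipelines.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ValidateFeed {

  // Tags for the main (valid) output and the dead-letter side output.
  static final TupleTag<String> VALID = new TupleTag<String>() {};
  static final TupleTag<String> INVALID = new TupleTag<String>() {};

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    PCollectionTuple results =
        p.apply("ReadInput", TextIO.read().from("gs://example-bucket/input/*.csv"))
         .apply("Validate", ParDo.of(new DoFn<String, String>() {
               @ProcessElement
               public void processElement(@Element String line, MultiOutputReceiver out) {
                 // Hypothetical rule: a well-formed record has exactly five fields.
                 if (line.split(",", -1).length == 5) {
                   out.get(VALID).output(line.trim());   // stand-in transformation
                 } else {
                   out.get(INVALID).output(line);        // route to dead-letter output
                 }
               }
             }).withOutputTags(VALID, TupleTagList.of(INVALID)));

    results.get(VALID).apply("WriteValid", TextIO.write().to("gs://example-bucket/output/valid"));
    results.get(INVALID).apply("WriteErrors", TextIO.write().to("gs://example-bucket/output/errors"));

    p.run();
  }
}

Routing bad records to a side output is a common fault-tolerance pattern in Beam: the main path stays clean, and the dead-letter output can be inspected or replayed during backfills, which is the kind of reliability work the description emphasizes.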

Requirements

3-6 years of hands-on Google Cloud Platform (GCP) experience, with deep, production-level work in Cloud Dataflow (Apache Beam) and Cloud Composer (managed Apache Airflow) to build and orchestrate data pipelines.

Strong Java programming experience is required (e.g., building Dataflow/Beam pipelines, developing reusable pipeline components, writing robust transformations, and supporting CI/CD for Java-based services); see the sketch below this list.

Demonstrated ability to design and maintain reliable ETL/ELT workflows (batch and/or streaming), including schema management, data validation, error handling, observability, and performance tuning.

Familiarity with adjacent GCP services and best practices (e.g., IAM, networking basics, monitoring/logging, secret management, and integration patterns with BigQuery/Cloud Storage/Pub/Sub/Dataproc as applicable), plus strong documentation and cross-team communication skills.
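To make the "reusable pipeline components" requirement concrete, here is a minimal sketch of a validation step packaged as a Beam PTransform so it can be shared across pipelines and unit-tested in CI. The class name and the blank-line rule are hypothetical illustrations, not components from any real codebase.

import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical reusable component: a validation step packaged as a
// PTransform so multiple pipelines can apply it with a single line.
public class DropBlankRecords extends PTransform<PCollection<String>, PCollection<String>> {
  @Override
  public PCollection<String> expand(PCollection<String> input) {
    // Keep only non-blank lines; a real component would check schemas here.
    return input.apply("DropBlank", Filter.by(line -> !line.trim().isEmpty()));
  }
}

A pipeline then applies it as records.apply(new DropBlankRecords()), and Beam's TestPipeline and PAssert utilities can exercise it in isolation as part of an automated test suite.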

Nice to Have Skills & Experience

Experience supporting healthcare data use cases: comfort working with sensitive clinical data, aligning to data governance practices, and collaborating with stakeholders supporting clinical delivery initiatives (e.g., quality measures, clinical operations reporting, care management analytics)

Benefits & conditions

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.
