Data Engineer - IAM Data Lake (Google Cloud Platform)

THE JUDGE GROUP, INC.

Irving, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Junior

Compensation

$ 119K

Job location

Irving, United States of America

Tech stack

API

Airflow

Big Data

Cloud Storage

Continuous Integration

Data as a Service

Information Engineering

Data Infrastructure

Data Security

Data Systems

Distributed Data Store

Data Flow Control

Hadoop

Hadoop Distributed File System

Identity and Access Management

Python

Data Streaming

Parquet

Google Cloud Platform

Data Ingestion

Change Data Capture

Build Management

Data Lake

PySpark

Semi-structured Data

Avro

Data Management

API Design

Data Pipelines

Job description

We are seeking a Data Engineer to support the development and evolution of an Identity and Access Management (IAM) Data Lake on Google Cloud Platform (GCP). In this role, you will design, build, and maintain scalable data pipelines and architectures that support security engineering and analytics initiatives.

You will work cross-functionally with Information Security Engineering teams to deliver reliable, compliant, and high-quality data solutions that enable informed decision-making at scale.

What you'll do

Design and build scalable data lake architectures using Google Cloud Platform services.
Develop and maintain batch and streaming data pipelines using Google Cloud Platform-native tools (e.g., Pub/Sub, Dataflow).
Implement data ingestion solutions, including Change Data Capture (CDC) and incremental loading strategies.
Define and enforce data modeling, storage, and lifecycle management best practices.
Work with structured and semi-structured data formats (Parquet, Avro, ORC), optimizing storage and performance.
Collaborate with Information Security Engineering teams to deliver secure and compliant data solutions.
Build and manage curated datasets, APIs, and data access layers for downstream consumption.
Contribute to CI/CD pipelines, automation, and operational excellence for data systems.
Participate in architecture reviews and contribute to large-scale data platform planning.

Multiple roles report to the same hiring manager; candidate submissions will be evaluated across requisitions.
Preference for candidates located in the Dallas, TX area; Columbus, OH is also acceptable.

By providing your phone number, you consent to: (1) receive automated text messages and calls from the Judge Group, Inc. and its affiliates (collectively "Judge") to such phone number regarding job opportunities, your job application, and for other related purposes. Message & data rates apply and message frequency may vary. Consistent with Judge's Privacy Policy, information obtained from your consent will not be shared with third parties for marketing/promotional purposes. Reply STOP to opt out of receiving telephone calls and text messages from Judge and HELP for help.

Requirements

4+ years of experience in Data Engineering, Information Security Engineering, or a related field.
Hands-on experience with Google Cloud Platform (GCP) data services.
Experience building and maintaining data pipelines (batch and/or streaming).
Proficiency in Python/PySpark for large-scale data processing.
Experience working with APIs and distributed data systems.

Preferred qualifications

Strong understanding of Google Cloud Platform architecture, including:

Cloud Storage (bucket design, naming conventions, lifecycle policies)
Identity and access management controls

Experience with streaming architectures using Pub/Sub, including schema design and evolution.
Familiarity with Hadoop/HDFS and the broader Hadoop ecosystem.
Experience with workflow orchestration tools such as Apache Airflow.
Knowledge of CI/CD practices and data platform automation.
Experience with data modeling and building analytical datasets.
Familiarity with columnar and row-based data formats and compression techniques (Parquet, ORC, Avro).
Understanding of data consumption patterns, including APIs and data services.

Technical experience (typical range)

Google Cloud Platform: 4-6 years
Data pipelines & processing: 4-6 years
PySpark: 4-6 years
APIs: 4-6 years
Airflow: 2-4 years
CI/CD: 2-4 years
Data modeling: 2-4 years
Hadoop ecosystem: 1-2 years

Nice to have

Experience supporting IAM or security-focused data platforms.
Exposure to large-scale enterprise data environments.
Ability to analyze moderately complex problems and independently deliver solutions.
