Data Engineer - IAM Data Lake (Google Cloud Platform)

THE JUDGE GROUP, INC.
Irving, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
$ 119K

Job location

Irving, United States of America

Tech stack

API
Airflow
Big Data
Cloud Storage
Continuous Integration
Data as a Service
Information Engineering
Data Infrastructure
Data Security
Data Systems
Distributed Data Store
Data Flow Control
Hadoop
Hadoop Distributed File System
Identity and Access Management
Python
Data Streaming
Parquet
Google Cloud Platform
Data Ingestion
Change Data Capture
Build Management
Data Lake
PySpark
Semi-structured Data
Avro
Data Management
API Design
Data Pipelines

Job description

We are seeking a Data Engineer to support the development and evolution of an Identity and Access Management (IAM) Data Lake on Google Cloud Platform (GCP). In this role, you will design, build, and maintain scalable data pipelines and architectures that support security engineering and analytics initiatives.

You will work cross-functionally with Information Security Engineering teams to deliver reliable, compliant, and high-quality data solutions that enable informed decision-making at scale.

What you'll do

  • Design and build scalable data lake architectures using Google Cloud Platform services.
  • Develop and maintain batch and streaming data pipelines using Google Cloud Platform-native tools (e.g., Pub/Sub, Dataflow).
  • Implement data ingestion solutions, including Change Data Capture (CDC) and incremental loading strategies.
  • Define and enforce data modeling, storage, and lifecycle management best practices.
  • Work with structured and semi-structured data formats (Parquet, Avro, ORC), optimizing storage and performance.
  • Collaborate with Information Security Engineering teams to deliver secure and compliant data solutions.
  • Build and manage curated datasets, APIs, and data access layers for downstream consumption.
  • Contribute to CI/CD pipelines, automation, and operational excellence for data systems.
  • Participate in architecture reviews and contribute to large-scale data platform planning.

  • Multiple roles report to the same hiring manager; candidate submissions will be evaluated across requisitions.
  • Preference for candidates located in the Dallas, TX area; Columbus, OH is also acceptable.

Requirements

  • 4+ years of experience in Data Engineering, Information Security Engineering, or a related field.
  • Hands-on experience with Google Cloud Platform (GCP) data services.
  • Experience building and maintaining data pipelines (batch and/or streaming).
  • Proficiency in Python/PySpark for large-scale data processing.
  • Experience working with APIs and distributed data systems.

Preferred qualifications

  • Strong understanding of Google Cloud Platform architecture, including:
      • Cloud Storage (bucket design, naming conventions, lifecycle policies)
      • Identity and access management controls
  • Experience with streaming architectures using Pub/Sub, including schema design and evolution.
  • Familiarity with Hadoop/HDFS and the broader Hadoop ecosystem.
  • Experience with workflow orchestration tools such as Apache Airflow.
  • Knowledge of CI/CD practices and data platform automation.
  • Experience with data modeling and building analytical datasets.
  • Familiarity with columnar and row-based data formats and compression techniques (Parquet and ORC are columnar; Avro is row-based).
  • Understanding of data consumption patterns, including APIs and data services.

Technical experience (typical range)

  • Google Cloud Platform: 4-6 years
  • Data pipelines & processing: 4-6 years
  • PySpark: 4-6 years
  • APIs: 4-6 years
  • Airflow: 2-4 years
  • CI/CD: 2-4 years
  • Data modeling: 2-4 years
  • Hadoop ecosystem: 1-2 years

Nice to know

  • Experience supporting IAM or security-focused data platforms.
  • Exposure to large-scale enterprise data environments.
  • Ability to analyze moderately complex problems and independently deliver solutions.
