Junior Data Engineer

Guidehouse Inc.
Arlington, United States of America

Role details

Contract type
Internship / Graduate position
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior

Job location

Arlington, United States of America

Tech stack

API
Information Systems
Information Engineering
ETL
Issue Tracking Systems
Python
Metadata
Meta-Data Management
SQL Databases
SC Clearance
Information Technology
Collibra
Amazon Web Services (AWS)
Data Pipelines

Job description

The Junior Data Engineer supports system-specific implementations that enable enterprise ingestion, assisting in the design, build, and operation of automated metadata harvesting pipelines that populate DHA-EDC content on reliable schedules.

This role focuses on hands-on engineering using SQL/Python/R to develop connectors and extraction scripts, implement validation rules, and configure refresh frequencies to support completeness, accuracy, and timeliness of harvested metadata.

Working closely (in-person and virtually) with source system SMEs, the Junior Data Engineer helps safely extract metadata from authorized sources and escalates tool constraints and design decisions to Architects within the approved architecture.

In support of automated metadata harvesting and documentation, this role contributes to building and orchestrating metadata extraction pipelines to ensure timely population of DHA-EDC content.

Key Responsibilities

  • Assists in engineering metadata harvesting pipelines (SQL/Python/R), connectors, validation rules, and refresh frequencies for system-specific implementations supporting enterprise ingestion.

  • Collaborates in-person/virtually with source system SMEs to safely extract metadata from relevant sources and validate results during review cycles.

  • Supports implementation and testing of automated metadata harvesting tools and ETL/ingestion workflows to enable real-time or scheduled extraction.

  • Implements validation checks to help measure metadata completeness, accuracy, and timeliness, including coverage reporting and issue flagging for remediation.

  • Participates in defect triage and troubleshooting for failed harvest runs, refresh delays, or validation errors; documents findings and supports re-runs after fixes.

  • Co-develops the metadata harvesting plan with Architects and program stakeholders; translates requirements into build tasks and technical documentation.

  • Produces and maintains technical documentation for pipelines, connector configurations, validation logic, and refresh schedules to support sustainment and knowledge transfer.

Requirements

  • Must be able to obtain and maintain a Federal or DoD Public Trust. Candidates with an active Secret clearance, Public Trust, or suitability determination are preferred. Must be able to meet the project’s client security and access requirements.

  • Bachelor’s degree

  • 3 years of experience (or strong internship/co-op experience) in data engineering, ETL/ELT, or pipeline development using SQL and Python.

  • Working proficiency in SQL and Python; familiarity with R is a plus for statistical validation and profiling.

  • Familiarity with building or supporting connectors/extractors (e.g., relational sources, APIs, file-based repositories) and running scheduled jobs.

  • Ability to collaborate with technical SMEs and follow structured validation and documentation processes.

  • Once onboarded with Guidehouse, the new hire must be able to obtain and maintain a Federal or DoD Secret security clearance.

  • Strong written and verbal communication skills; ability to document technical work clearly and escalate issues appropriately.

What Would Be Nice to Have

  • Bachelor’s degree in computer science, Information Systems, Engineering, or a related field (or equivalent practical experience).

  • Exposure to metadata/catalog patterns (e.g., metadata harvesting, catalog population concepts, lineage/semantic discovery concepts).

  • Hands-on data engineering experience supporting metadata harvesting efforts.

  • Hands-on experience with leading data cataloging solutions such as Collibra, Alation, Unity Catalog, Azure Purview, or AWS Glue.

  • Experience supporting testing/validation workflows (e.g., rule-based checks, completeness/accuracy/timeliness metrics, defect tracking).

  • Familiarity working in Agile delivery teams and using backlog/issue tracking tools.

  • Prior experience in federal health or healthcare.

Benefits & conditions

Guidehouse offers a comprehensive, total rewards package that includes competitive compensation and a flexible benefits package that reflects our commitment to creating a diverse and supportive workplace.

Benefits include:

  • Medical, Rx, Dental & Vision Insurance

  • Personal and Family Sick Time & Company Paid Holidays

  • Position may be eligible for a discretionary variable incentive bonus

  • Parental Leave and Adoption Assistance

  • 401(k) Retirement Plan

  • Basic Life & Supplemental Life

  • Health Savings Account, Dental/Vision & Dependent Care Flexible Spending Accounts
