Data Engineer II
Job description
SteerBridge seeks a highly skilled and motivated Data Engineer II to join our team supporting the F-35 AI/ML Spares Project. As the most advanced fighter jet in the world, the F-35 strengthens national security, enhances global partnerships, and powers economic growth. Our F-35 AI/ML Spares Project applies advanced computational analytics to revolutionize supply chain management in the aerospace industry - harnessing the power of AI/ML to increase parts availability, reduce maintenance wait times, and maximize aircraft availability. In collaboration with the National Center for Manufacturing Sciences (NCMS), we are on a mission to deliver exceptional solutions that will redefine operational readiness for the F-35 program and beyond.
In this mid-level role, you will build and maintain AWS-based ETL/ELT pipelines, curated analytical datasets, and reporting workflows that support operational decision-making. You will design scalable data infrastructure that powers business intelligence, machine learning, and operational analytics - performing data engineering tasks on-site within the existing systems of record with multiple databases.
This position also involves mentoring and collaborating with Marines at the squadron level, requiring a deep understanding of squadron-specific operations and a commitment to improving data entry and indexing practices. You will be a crucial link between our existing systems and data development, collaborating closely with data scientists, analysts, and operational partners.
- Lead end-to-end data pipeline operations: design, develop, and maintain robust ETL/ELT pipelines on AWS (AWS Glue, Amazon Redshift, Amazon S3) using modern orchestration tools such as Apache Airflow, dbt, or Prefect.
- Use Azure Databricks (Spark) and Azure Data Factory to manage and schedule data pipelines and workflows.
- Build and optimize data models in cloud data warehouses (Snowflake, BigQuery, or Redshift); maintain data in S3 buckets and Blob storage.
- Integrate data from diverse sources including REST/SOAP APIs, event streams (Kafka), relational and non-relational databases, and SaaS platforms.
- Create, index, query, and update SQL tables/servers; run and update Python and/or JavaScript code to parse data.
- Monitor pipeline health, enforce data quality controls (schema validation, null checks, duplicate detection), troubleshoot data issues, and implement alerting and observability best practices.
- Develop and implement data acquisition, quality assurance, and management protocols; document all data collection, cleaning, and analyses for internal and external users.
- Use schedulers and APIs to obtain near real-time data; automate workflows and processes using Python or other scripting languages.
- Partner with data scientists and analysts to deliver clean, well-documented datasets and data products; collaborate with and support the data science team to produce deliverables.
- Contribute to data governance standards including lineage, cataloging, and access controls.
- Mentor and collaborate with Marines at the squadron level to improve data entry and indexing practices.
- Assist with maintenance and development of internal analytics data architecture.
- Design, write, and disseminate innovative and visually appealing reports for diverse audiences.
- Participate in code reviews, architecture discussions, and cross-functional planning sessions.
- Evaluate and recommend new tools and technologies to improve the data platform.
Requirements
- 3-5 years of professional experience in data engineering or a closely related role.
- Bachelor's Degree in Computer Science or related field; three (3) years of additional relevant experience may substitute for education (minimum six years total experience without degree).
- U.S. Citizenship required.
- Active security clearance or the ability to obtain one is required.
Technical Skills
- Strong proficiency in Python (PySpark, pandas) and SQL for data processing and pipeline development; SQL fluency including CTEs, window functions, and complex joins.
- Hands-on experience with at least one cloud data warehouse (Snowflake, BigQuery, or Redshift).
- Experience configuring or monitoring data pipelines in cloud platforms (AWS preferred; Oracle, Azure, Google also considered).
- Familiarity with analytics deployment architectures including Python, containerized Docker, and Kubernetes.
- Experience with Spark/Databricks for streaming data analytics, with preferred experience in graph data, machine learning, and AI applications.
- Experience using Azure Data Factory to schedule pipelines and manage data flows.
- Ability to connect and work with APIs (REST, SOAP, HTTP methods).
- Experience with workflow orchestration tools such as Apache Airflow, dbt, or Prefect.
- Solid understanding of data modeling concepts: star schema, data vault, medallion architecture, data lineage, and source-to-target mapping.
- Familiarity with data visualization tools including Tableau, Power BI, Elasticsearch/Kibana, R, or Alteryx.
- Experience with version control (Git), CI/CD practices, and production deployment workflows.
- Experience integrating data; familiarity with cleaning, merging, standardizing, documenting, and securing data.
Interpersonal & Professional Skills
- Strong communication skills with the ability to translate technical concepts for non-technical and operational stakeholders.
- Ability to develop relationships with collaborators, program providers, community partners, and military personnel.
- Able to successfully prioritize and manage multiple critical projects simultaneously with a high degree of accuracy.
- Experience working on applied data projects involving diverse organizations to collect, analyze, and interpret data.
Preferred Qualifications
- AWS Professional or Specialty Certification, or the ability to obtain one (highly preferred).
- Experience supporting DoD and/or VA missions.
- At least two (2) years using SQL professionally, with proficiency in R or Python.
- Cloud project experience using AWS, Google, Oracle, and/or Azure.
- Experience with ML/NLP/AI including Neural Networks and Supervised/Unsupervised algorithms for anomaly detection, forecasting, and modeling.
- Experience with streaming data technologies such as Apache Kafka or AWS Kinesis.
- Proficiency in HTTP Methods, Postman development/testing of REST and/or SOAP APIs, and CRUD actions.
- Familiarity with data catalog and lineage tools such as DataHub, Alation, or Monte Carlo.
- Knowledge of infrastructure-as-code tools such as Terraform or Pulumi.
- Exposure to ML pipelines and feature stores.
- Demonstrated high proficiency in statistical analysis software: Power BI, Tableau, Elasticsearch/Kibana, Alteryx, Python, or R.
- Deep understanding of data quality issues with applied experience in quality assurance.
- Proficiency in each phase of the software development lifecycle.
- Master's degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent experience).
- Excellent writing, presentation, and research design skills; track record of communicating complex concepts to diverse audiences.
Benefits & conditions
- Health insurance
- Dental insurance
- Vision insurance
- Life Insurance
- 401(k) Retirement Plan with matching
- Paid Time Off
- Paid Federal Holidays