Data & Software Engineer

GRVTY, LLC
McLean, United States of America
27 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
McLean, United States of America

Tech stack

Java
Airflow
Amazon Web Services (AWS)
Apache HTTP Server
Bash
Big Data
Computer Programming
System Configuration
Information Engineering
ETL
Data Security
Software Debugging
Software Design Patterns
Amazon DynamoDB
Python
PostgreSQL
Metadata Repositories
MySQL
NoSQL
NumPy
Operational Databases
Performance Tuning
PostGIS
Query Optimization
Azure
Software Deployment
SQL Databases
Data Streaming
Systems Integration
Data Processing
Cloud Platform System
Spark
Git
CloudFormation
Pandas
Containerization
PySpark
Data Lineage
Terraform
Data Pipelines
Docker

Job description

  • Work with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
  • Leverage strong problem-solving and debugging skills for data quality issues, pipeline failures, and performance bottlenecks (a minimal data-quality sketch follows this list)
  • Apply experience from large-scale data migration or platform modernization efforts
  • Contribute to data engineering documentation, best practices, and design patterns
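
As context for the data-quality responsibility above, here is a minimal sketch of the kind of defensive quality gate a pipeline step might apply before loading. The column names and schema are illustrative assumptions for the example, not details from this posting.

    # Hedged sketch: quarantine rows with null keys instead of failing the batch.
    import logging

    import pandas as pd

    logger = logging.getLogger("pipeline.quality")

    REQUIRED_COLUMNS = {"event_id", "event_time", "payload"}  # hypothetical schema


    def validate_batch(df: pd.DataFrame) -> pd.DataFrame:
        """Fail fast on structural problems; drop and log rows with null keys."""
        missing = REQUIRED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"Batch is missing required columns: {sorted(missing)}")
        bad = df["event_id"].isna() | df["event_time"].isna()
        if bad.any():
            logger.warning("Quarantining %d rows with null keys", int(bad.sum()))
        return df[~bad]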

Requirements

GRVTY is seeking a Data & Software Engineer with a TS/SCI + Poly clearance (applicable to this customer) to join one of our top projects in McLean, VA. The Data & Software Engineer works with a small team to build complex data flows for a custom application. The successful candidate will have advanced Python programming skills, familiarity with Java, an understanding of data security, privacy, governance, and compliance principles, and a demonstrated history of building production data pipelines and ETL workflows at scale.

Candidate must have experience:

  • Building end-to-end data pipelines leveraging Python (see the PySpark sketch after this list)
  • Using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs (see the Airflow sketch after this list)
  • Containerizing and deploying applications in cloud environments like AWS
  • Working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads
  • Leveraging industry-standard tools for version control and infrastructure as code (Git, IaC tooling, etc.)
  • Working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial
  • Using Bash scripting for automation and data processing tasks
  • Integrating AI/ML services and models
  • Active TS/SCI with Polygraph clearance
  • Minimum of 5 years' experience with:
  • Apache Spark & PySpark
  • Advanced Python skills (including Pandas & NumPy)
  • Docker, Podman
  • AWS S3, Lambda & Step Functions
  • Apache Iceberg, Airflow, etc.
  • SQL (with Trino)
  • NoSQL, DynamoDB
  • Unity Catalog OSS, Apache Polaris
  • Apache Superset
  • Terraform or CloudFormation
  • OpenLineage
  • H3, PostGIS
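
For illustration, a minimal PySpark sketch of the kind of end-to-end transform step the first bullet describes: read raw data, clean it, and write partitioned output. The bucket, paths, and column names are assumptions for the example, not project specifics.

    # Hedged sketch of a batch transform step; paths and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-transform").getOrCreate()

    # Read raw JSON events (hypothetical location).
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Drop records without a key and derive a partition column.
    cleaned = (
        raw.filter(F.col("event_id").isNotNull())
        .withColumn("event_date", F.to_date("event_time"))
    )

    # Write partitioned Parquet for downstream analytical queries.
    (
        cleaned.write.mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/events/")
    )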
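
And a minimal Airflow-style orchestration sketch showing how such a Spark job might be scheduled and followed by a validation step. The DAG id, schedule, and script paths are assumptions; the posting does not specify the project's actual orchestration layout.

    # Hedged sketch: Airflow 2.x DAG that submits a Spark job, then validates output.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_ingest",          # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",              # Airflow 2.4+ argument name
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            # Assumes spark-submit is on PATH and the job script exists.
            bash_command="spark-submit --master yarn /opt/jobs/transform.py",
        )
        validate_output = BashOperator(
            task_id="validate_output",
            bash_command="python /opt/jobs/validate.py",  # hypothetical check
        )
        run_spark_job >> validate_output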

About the company

GRVTY's team provides tactical data engineering solutions. We embed skilled Data Engineers, Data Scientists, and ETL Developers directly into intelligence analyst groups to be their go-to data wranglers. We develop new tools, code, and services to execute data engineering activities. Our engineers work to collect, process, and feed analytic tools, turning data into intelligence in response to immediate mission needs, with direct impact on real-world situations.

You will see your work used here on a daily basis, and you'll have the opportunity to support a variety of Sponsor mission organizations and mission partner organizations. This is a time of development and growth on the program, with an increasing number of missions being supported. The work is high impact and important, and the customer moves quickly. The environment is fast-paced, flexible, and open to innovation - you'll have more latitude here in choosing how to achieve results than on many other projects. The customer cares more about what you can do than about your years of experience, and work hours are typically quite flexible - roll up your sleeves, get things done, and no one cares much about the specific hours that you work. The work space itself is also quite nice, and there is an excellent cafeteria!

The tech stack on this team is rather large and includes Python (Pandas, NumPy, SciPy, scikit-learn, standard libraries, etc.), Python packages that wrap machine learning (packages for NLP, object detection, etc.), Linux, AWS/C2S, Apache NiFi, Spark, PySpark, Hadoop, Kafka, Elasticsearch, Solr, Kibana, Neo4j, MariaDB, Postgres, Docker, Puppet, and many others.

Work on this program takes place in McLean, VA and in various field offices throughout Northern VA (we cannot support remote work) and requires a TS/SCI + Polygraph clearance (acceptable to this customer).
