Data Engineer

Bespoke Technologies, Inc.
McLean, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

McLean, United States of America

Tech stack

Testing (Software)
Java
JavaScript
Geographic Information Systems
Agile Methodologies
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Apache HTTP Server
Applications Architecture
Automation of Tests
Bash
Big Data
Cloud Computing
Software Quality
Computer Programming
Databases
System Configuration
Continuous Integration
Information Engineering
Data Governance
Data Infrastructure
Data Integration
ETL
Data Security
Data Visualization
Software Debugging
Software Design Patterns
Linux
DevOps
Distributed Computing Environment
Amazon DynamoDB
Issue Tracking Systems
PostgreSQL
Project Management Software
Metadata Repositories
Microsoft SQL Server
MySQL
NoSQL
NumPy
Open Source Technology
Operational Databases
Performance Tuning
PostGIS
Scrum
Software Maintenance
Systems Development Life Cycle
Query Optimization
Cloud Services
DataOps
Azure
Web Application Security
Selenium
Server Administration
Shell Script
Software Engineering
Software Systems
SQL Databases
Strategies of Testing
TypeScript
Speech Recognition
Web Applications
Web Services
Software Repository
Data Processing
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
React
Retrieval-Augmented Generation
Large Language Models
Spark
Topic Modeling
GIT
Cloudformation
Vue.js
Pandas
Containerization
Angular
PySpark
Data Lineage
Data Lakehouse
Terraform
Software Version Control
Data Pipelines
Docker
Vulnerability Analysis

Job description

The Sponsor requires technical resources that can work in a fast-paced, dynamic, agile software development environment. The multi-disciplinary project team (including a Data Engineer, a DevOps Engineer, a Web Applications Developer, and Software Test Engineers) works together on multiple projects that include automating the processing of large forensic images, extracting and enriching metadata, and displaying the resulting information in meaningful ways for analysts to conduct assessments. Team members use a mix of COTS and GOTS tools and technologies, and build integrations with a variety of external partner applications. Most solutions are cloud-based.

The Sponsor adheres to Agile Scrum development methodology best practices and has 2-week sprint cycles.

Technical Requirements

The Candidate team shall:

  1. Ensure that all development and modifications to existing Sponsor applications comply with Sponsor's security and architectural policies and regulations.
  2. Work with a variety of individuals, including key stakeholders and other development teams in adjacent organizations. However, the Sponsor Project Manager (PM) will manage priorities.
  3. Communicate technical concepts to non-technical audiences.
  4. Be responsible for identifying, documenting, and communicating risks and mitigations across the designed system.
  5. Review the existing and designed system to identify gaps and minimize overlaps, analyze project needs for alternative solutions, and identify needed layers and modules and how they work together for the system.
  6. Review technical implementation strategies and plans, identify inconsistencies and opportunities for improvement, and communicate those results to the Sponsor PM.
  7. Document and maintain code and workflows using tools and practices such as version control systems/code repositories, task management tools, issue tracking, and open source-style contribution models.
  8. Manage, with Sponsor oversight, the developed systems' lifecycle (to include operating system upgrades, updates, patches, security scans, and configuration changes) and other duties requiring in-depth knowledge of server hardware and software technologies.
  9. Be responsible for setting user permissions (including roles) and troubleshooting permission issues.
  10. Provide operations and maintenance of applications in the cloud infrastructure.
  11. Support Sponsor's representatives in cloud environment provisioning, engineering, and architecture related activities as applicable to applications.
  12. With Sponsor concurrence, implement changes to current applications and data pipelines based on Sponsor requirements.
  13. Provide support to manage the Sponsor Software Development Lifecycle (SDLC) process.
  14. Continuously evaluate progress and incoming information to ensure that schedules are met.
  15. Assist the Sponsor's product owner with the assessment of program requirements.
  16. Assist the Sponsor's product owner with mitigating or avoiding risks.
  17. Use a Sponsor-approved tracking system for issues and problems.
  18. Assist the Sponsor's product owner in alleviating program issues.
  19. Interface with stakeholders to ensure user stories are recorded.
  20. Coordinate stakeholder engagements, such as user requirements sessions and training.
  21. Prepare meeting agendas and meeting minutes.
  22. Track work requirements and action items to meet program deadlines.
  23. Support tasks requiring collecting, compiling, evaluating, and publishing information and statistical data included in documents, records, forms, reports, plans, policies, and regulations.
  24. Deliver development requirements as requested by the Sponsor.
  25. Provide applications development and programming support to develop software to implement system-level requirements.
  26. Maintain and enhance software solutions based on changing partner requirements, legislation, and policy.
  27. Stay current with emerging technologies and industry best practices.
  28. Conduct functionality testing for existing applications.
  29. Create and maintain an automated test environment in the Sponsor's AWS environment using technologies such as Selenium or comparable testing tools.
  30. Assist in the technical documentation of and design for meeting security requirements, ensuring compliance with Sponsor A&A processes.
  31. Develop applications and modifications to existing and new Sponsor applications in compliance with the Sponsor's architectural guidelines and Authorization and Accreditation (A&A) process.

Critical Core Competencies

The Candidate shall provide the following Critical Core Competencies and ensure they are not single-threaded by an individual contributor for the duration of this contract. These competencies represent specialized skills, experience, and institutional knowledge deemed vital to executing contract requirements.

o Deploys, operates, and maintains web services within the Sponsor's AWS cloud environment (and potentially other cloud service providers in the future).

Requirements

o Demonstrated experience with infrastructure as code (IaC) technologies, including AWS CloudFormation.

  • Web Application Programming
    o Strong problem-solving skills, including debugging, testing, and troubleshooting complex web applications.
    o Develops modern web applications using JavaScript/TypeScript frameworks (Angular, React, or Vue).
    o Strong understanding of web application security best practices and usability principles.

  • Scripting
    o Shell scripting such as Bash.

  • Database Technologies
    o Using SQL and database technologies such as MySQL, DynamoDB, or SQL Server.

  • Operating Systems
    o Experienced with Linux operating systems.

Required Skills and Demonstrated Experience

  • The Candidate shall ensure, for the duration of the contract, that Candidate personnel assigned to work under this contract maintain the institutional knowledge and competency level necessary for all required skills, to include demonstrated on-the-job experience.

  • The Candidate team shall possess and provide the following required skills and demonstrated experience:
    o Demonstrated experience with Agile/Scrum development methodologies in a fast-paced, collaborative team environment.
    o Demonstrated experience working effectively in high-performing, cross-functional teams with multiple concurrent projects.
    o Demonstrated experience working directly with stakeholders to gather requirements, understand needs, and translate them into technical solutions with minimal oversight.
    o Demonstrated experience in self-directed work with a strong ownership mentality and commitment to code quality, testing, and documentation.
    o Demonstrated experience context-switching between projects and systems as priorities demand.

  • The Candidate shall possess and provide the following required skills and demonstrated experience:

Data Engineering

  • Demonstrated experience building production data pipelines and ETL/ELT workflows at scale.
  • Demonstrated experience with Apache Spark and PySpark for distributed data processing.
  • Demonstrated experience with advanced Python programming, including data manipulation libraries (Pandas, NumPy) and data engineering best practices.
  • Demonstrated understanding of data security, privacy, governance, and compliance principles.
  • Demonstrated experience with workflow orchestration tools such as Step Functions and Airflow.
  • Demonstrated experience with containerization such as Docker or Podman, and deploying data applications in cloud environments.
  • Demonstrated experience with AWS services (in particular S3, Lambda, and Step Functions).
  • Demonstrated experience with PostgreSQL and MySQL in production environments, including performance tuning and schema design.
  • Demonstrated experience with SQL and query optimization for complex analytical workloads.
  • Demonstrated experience with version control (Git) and CI/CD practices for data pipelines.
  • Demonstrated experience working with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight.
  • Demonstrated problem-solving and debugging skills for data quality issues, pipeline failures, and performance bottlenecks.

Desired Skills and Demonstrated Experience

Data Engineering

  • Demonstrated experience with data lakehouse architectures using Apache Iceberg.

  • Demonstrated experience configuring, deploying, and integrating data platform components:
    o Apache Ranger (access control and data governance)
    o Trino (distributed SQL query engine)
    o Data catalogs (Unity Catalog OSS, Apache Polaris, etc.)
    o Apache Superset (data visualization and dashboarding)

  • Demonstrated experience with Bash scripting for automation and data processing tasks.

  • Demonstrated experience with Infrastructure as Code (Terraform or CloudFormation) for data infrastructure.

  • Demonstrated experience with tracking data lineage and associated tooling such as OpenLineage.

  • Demonstrated experience with Java.

  • Demonstrated experience with data quality frameworks, testing methodologies, and validation strategies.

  • Demonstrated experience or background with large-scale data migrations or platform modernization efforts.

  • Demonstrated experience integrating AI/ML services and models (translation, OCR, speech-to-text, NLP, language detection, topic modeling), LLMs, and RAG (retrieval-augmented generation) pipelines.

  • Demonstrated experience with geospatial data processing (H3, PostGIS, or similar).

  • Demonstrated experience contributing to data engineering documentation, best practices, or design patterns.

  • Demonstrated experience with NoSQL databases (DynamoDB, etc.).

  • Excellent written and verbal communication skills with both technical and non-technical audiences.
