Sr. Data Engineer - MUST BE US CITIZEN
SHARKFORCE CONSULTING LLC
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $198K
Job location:
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Confluence
JIRA
Unit Testing
Information Engineering
ETL
Data Systems
Python
Machine Learning
Scrum
Software Engineering
SQL Databases
Unstructured Data
Data Processing
Feature Engineering
Spark
AWS Lambda
Information Technology
Spark Streaming
Data Pipelines
Databricks
Job description
We are seeking a Senior Data Engineer who is a U.S. citizen and passionate about building innovative data solutions. In this role, you will design and implement scalable data pipelines and AI/ML capabilities to improve entity resolution, enhance probabilistic matching, reduce duplicate records, and strengthen data quality across PCIS systems.
You will work closely with cross-functional teams to understand requirements and build reliable data solutions using technologies such as Databricks, Python, SQL, and AWS.
Duties
- Design, build, and maintain scalable data pipelines to ingest, process, and transform large volumes of structured and unstructured data across PCIS systems
- Develop and optimize ETL/ELT workflows using Databricks, Python, SQL, and AWS services (e.g., S3, Lambda)
- Ensure high data quality, consistency, and reliability through validation, monitoring, and automated data checks
- Support the design and implementation of AI/ML solutions for entity resolution, probabilistic matching, and de-duplication
- Develop and integrate data features and pipelines that enable accurate identity matching across multiple data sources
- Collaborate with data scientists and architects to operationalize ML models into production environments
- Continuously improve matching accuracy and reduce false positives/negatives through data tuning and feedback loops
- Enhance and maintain Python-based data processing scripts to ensure performance, reliability, and scalability
- Identify and resolve data bottlenecks, optimizing performance and cost through tuning and automation
- Provide day-to-day support for data and ML pipeline operations, troubleshooting issues and ensuring system stability
- Support leadership and stakeholders by communicating technical concepts and results clearly to both technical and non-technical audiences
Requirements
- U.S. citizenship required and must be eligible to obtain a Public Trust Clearance.
- Bachelor's degree in Computer Science or a related field.
- Minimum 5 years of experience in software engineering with an emphasis on data engineering.
- Minimum 7 years of experience in software development focused on data.
- Ability to support AI/ML teams by enhancing feature engineering code.
- Skilled in creating, managing, and optimizing Spark Structured Streaming jobs.
- Experience maintaining and updating Python-based data processing scripts executed on AWS Lambda.
- Commitment to conducting unit tests for all Spark, Python data processing, and Lambda code.
- Strong understanding of Agile Scrum methodology and related tools (e.g., Jira, Confluence).
Benefits & conditions
- 401(k)
- 401(k) matching
- Dental insurance
- Employee assistance program
- Employee discount
- Flexible schedule
- Flexible spending account
- Health insurance
- Health savings account
- Life insurance
- Paid time off
- Parental leave
- Professional development assistance
- Referral program
- Retirement plan
- Tuition reimbursement
- Vision insurance
Application Question(s):
- Are you a U.S. citizen and able to obtain a Public Trust clearance?