Sr. Data Engineer - MUST BE US CITIZEN
SHARKFORCE CONSULTING LLC
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $198K
Job location:
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Confluence
JIRA
Unit Testing
Information Engineering
ETL
Data Systems
Python
Machine Learning
Scrum
Software Engineering
SQL Databases
Unstructured Data
Data Processing
Feature Engineering
Spark
AWS Lambda
Information Technology
Spark Streaming
Data Pipelines
Databricks
Job description
We are seeking a Senior Data Engineer who is a U.S. citizen and passionate about building innovative data solutions. In this role, you will design and implement scalable data pipelines and AI/ML capabilities to improve entity resolution, enhance probabilistic matching, reduce duplicate records, and strengthen data quality across PCIS systems.
You will work closely with cross-functional teams to understand requirements and build reliable data solutions using technologies such as Databricks, Python, SQL, and AWS.
Duties
- Design, build, and maintain scalable data pipelines to ingest, process, and transform large volumes of structured and unstructured data across PCIS systems
- Develop and optimize ETL/ELT workflows using Databricks, Python, SQL, and AWS services (e.g., S3, Lambda)
- Ensure high data quality, consistency, and reliability through validation, monitoring, and automated data checks
- Support the design and implementation of AI/ML solutions for entity resolution, probabilistic matching, and de-duplication
- Develop and integrate data features and pipelines that enable accurate identity matching across multiple data sources
- Collaborate with data scientists and architects to operationalize ML models into production environments
- Continuously improve matching accuracy and reduce false positives/negatives through data tuning and feedback loops
- Enhance and maintain Python-based data processing scripts to ensure performance, reliability, and scalability
- Identify and resolve data bottlenecks, optimizing performance and cost through tuning and automation
- Provide day-to-day support for data and ML pipeline operations, troubleshooting issues and ensuring system stability
- Support leadership and stakeholders by communicating technical concepts and results clearly to both technical and non-technical audiences
Requirements
- U.S. citizenship required and must be eligible to obtain a Public Trust Clearance.
- Bachelor's degree in Computer Science or a related field.
- Minimum 5 years of experience in software engineering with an emphasis on data engineering.
- Minimum 7 years of experience in software development focused on data.
- Ability to support AI/ML teams by enhancing feature engineering code.
- Skilled in creating, managing, and optimizing Spark Structured Streaming jobs.
- Experience maintaining and updating Python-based data processing scripts executed on AWS Lambda.
- Commitment to conducting unit tests for all Spark, Python data processing, and Lambda code.
- Strong understanding of Agile Scrum methodology and related tools (e.g., Jira, Confluence).
Benefits & conditions
- 401(k)
- 401(k) matching
- Dental insurance
- Employee assistance program
- Employee discount
- Flexible schedule
- Flexible spending account
- Health insurance
- Health savings account
- Life insurance
- Paid time off
- Parental leave
- Professional development assistance
- Referral program
- Retirement plan
- Tuition reimbursement
- Vision insurance
Application Question(s):
- Are you a U.S. citizen and able to obtain a Public Trust clearance?