Apache Spark Developer
Job description
We are actively hiring a TS/SCI-cleared Apache Spark Developer to support NGA's Data Modernization Services (DMS) mission by building and optimizing large-scale data processing pipelines. This role focuses on developing high-performance Spark applications within a containerized, Kubernetes-based environment, supporting mission analytics, data exploitation, and AI/ML integration. The ideal candidate thrives in distributed data environments, understands performance tuning deeply, and can operate effectively in secure, air-gapped systems.
This role is on-site with flexible hours in Herndon, VA; Springfield, VA; St. Louis, MO; or Aurora, CO.
Clearance Required for this role: TS/SCI eligibility with willingness/ability to obtain CI polygraph.
Core Technology Stack
Data / Processing
- Apache Spark (PySpark, Scala)
- Delta Lake, Parquet
- Structured Streaming
Infrastructure
- Kubernetes (execution environment)
- Docker
Storage / Cloud (Abstracted)
- S3 / object storage
- AWS / GCP / Azure (environment-dependent)
DevOps (Exposure Level)
- Git, Jenkins (CI/CD)
Languages
- Python (PySpark)
- Scala (preferred)
- Bash / scripting
Responsibilities
- Design, develop, and maintain Apache Spark pipelines (batch and streaming) using PySpark and/or Scala
- Process and transform large-scale datasets using modern data lake architectures (Delta Lake, Parquet)
- Optimize Spark jobs for performance, including:
o Partitioning strategies
o Shuffle optimization
o Memory tuning
o File sizing and storage efficiency
- Implement Structured Streaming pipelines for near real-time data processing
- Develop and deploy Spark applications within containerized environments (Docker)
- Execute workloads in Kubernetes clusters, supporting scalable and distributed processing
- Integrate Spark pipelines with downstream systems, including:
o Analytics platforms (SQL, notebooks)
o AI/ML workflows and feature engineering pipelines
- Support data ingestion and storage in object-based systems (e.g., S3-compatible storage)
- Troubleshoot data pipeline failures and ensure reliability in mission-critical environments
- Operate within secure, air-gapped environments
Requirements
- TS/SCI (eligibility) with ability/willingness to obtain/maintain counterintelligence polygraph
- Bachelor's degree plus 5 years' experience in data engineering or Spark development (additional years of experience may be substituted for the degree)
- Strong hands-on experience with:
o Apache Spark (mandatory)
o Python (PySpark)
o Data processing at scale
- Experience working with:
o Parquet and/or Delta Lake
o Distributed data systems
- Familiarity with:
o Docker / containerization
o Kubernetes (basic to intermediate experience)
- Experience with object storage systems (e.g., S3 or equivalent)
- Strong troubleshooting and performance tuning skills
- Proficiency in Bash or scripting
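The file-sizing and tuning skills called out above often start from a common community rule of thumb: target roughly 128 MB per Parquet file/partition on object storage. A hypothetical helper (the function name and default are illustrative; the right target is workload-dependent and should be validated against the Spark UI):

```python
def target_partitions(total_bytes: int, target_bytes: int = 128 * 1024 * 1024) -> int:
    """Suggest a partition count aiming at ~target_bytes per output file.

    128 MB is a widely used default for Parquet on object storage; tune it
    per workload rather than treating it as a hard requirement.
    """
    if total_bytes <= 0:
        return 1
    # Ceiling division so the last partial chunk still gets a partition.
    return max(1, (total_bytes + target_bytes - 1) // target_bytes)

print(target_partitions(10 * 1024**3))  # 10 GiB -> 80 partitions
```

A value like this would typically feed `DataFrame.repartition(n)` or `coalesce(n)` before a write to avoid the small-files problem.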
Preferred Qualifications:
- Experience with Scala for Spark development
- Experience with Structured Streaming in production environments
- Familiarity with Iceberg or lakehouse architectures
- Experience with CI/CD pipelines (Jenkins, Git)
- Exposure to Terraform or Infrastructure as Code
- Experience supporting AI/ML data pipelines
- Prior experience supporting NGA, IC, or DoD programs
Benefits & conditions
Some of our benefits include:
- Generous PTO plus 11 Federal Holidays
- Retirement Planning - 401k Fully Vested with Match
- Tuition Assistance Program - Annual contributions to help you pay down your loans
- Annual Health and Wellness Allowance - buy an Apple Watch, a treadmill, or hit the gym on us
- Career Development - Annual Funds to spend on Education and Training
- Volunteer Time Off - Annually, all employees can spend 8 hours directly supporting a charity of choice
- Charitable Match - ABSC matches an employee's donation to a qualifying charity
- Referral Program - We pay for internal and external referrals!
- LOV Awards - Earn bonus awards throughout the year from our Living Our Values awards program