Part-Time Data Ingest Engineer (contractor)
Digital Public Library of America, Inc.
Boston, United States of America
7 days ago
Role details
Contract type: Contract
Employment type: Part-time (≤ 32 hours)
Working hours: Shift work
Languages: English
Compensation: $312K
Job location: Boston, United States of America
Tech stack
API
Amazon Web Services (AWS)
Apache HTTP Server
Continuous Integration
Elasticsearch
GitHub
JSON
PostgreSQL
Metadata
Metadata Standards
Scala
Scripting (Bash/Python/Go/Ruby)
Delivery Pipeline
Spark
Amazon EMR
Backend
Avro
REST
Terraform
Docker
Job description
DPLA is looking for a part-time contractor to coordinate and maintain metadata ingest operations. The position is directly involved in maintaining DPLA's ingestion process: harvesting, mapping, enriching, and indexing metadata from contributing partners.
What you'll be doing
- Running monthly ingest cycles across active partner contributions (harvesting, mapping, enrichment, indexing)
- Coordinating with DPLA staff on metadata mapping and delivery
- Monitoring pipeline reliability and addressing bottlenecks or single points of failure
- Troubleshooting ingestion errors and coordinating resolution with DPLA staff
- Supporting deployments and maintaining CI/CD pipeline health
- Providing regular status updates to DPLA staff
Technical environment
- Pipeline: Scala, Apache Spark, Amazon EC2 and EMR, AWS S3, Apache Avro, Python scripts
- Metadata: JSON-LD via DPLA MAP
- APIs: Scala-based RESTful API on Elasticsearch 7, PostgreSQL auth backend
- CI/CD: GitHub Actions, Docker, Terraform, AWS CodePipeline
Contract details
- 10-20 hours/week, flexible scheduling
- $75 - $150 hourly rate (commensurate with experience)
- An initial 3-6 month fixed-term contract, commencing April 1, with the possibility of extension.
- Independent contractor arrangement (W-9/1099)
- Must be legally authorized to work in the United States without company sponsorship
Requirements
- Hands-on experience with Spark/Scala pipelines and AWS (EC2, EMR, S3)
- Familiarity with cultural heritage metadata standards (RDF, JSON-LD, Dublin Core, MODS, or similar) and DAMS (CONTENTdm, etc.)
- Experience working across metadata quality, pipeline ops, and infrastructure
- Familiarity with GitHub-based collaborative workflows
- Self-directed