Senior Software Engineer, Ingestion Team

Pryon Inc.
Boston, United States of America
12 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$200K

Job location

Remote
Boston, United States of America

Tech stack

HTML
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Systems Engineering
Azure
ETL
Distributed Systems
Python
Message Broker
Microsoft Office
RabbitMQ
Software Engineering
Data Streaming
Google Cloud Platform
Semi-structured Data
Kubernetes
Kafka
Terraform
Data Pipelines
Docker
Legacy Systems

Job description

The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population - across every file format and content type our customers throw at us.

We're in the middle of a significant architectural evolution - migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages: intake, transformation, enrichment, and indexing. The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.

This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.

  • Design and build pipeline stages for our modern ingestion architecture - from document intake through embedding generation and index writing
  • Contribute to the design of next-generation pipeline architecture as the system evolves
  • Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
  • Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
  • Handle the realities of document diversity: PDFs, HTML, Office formats, images, structured and semi-structured data - all flowing through the same pipeline
  • Collaborate with the Connectors team (upstream) and Retrieval team (downstream) to ensure data flows cleanly across system boundaries
  • Participate in the ongoing migration from legacy systems, balancing new development with operational stability

Requirements

  • Is self-driven and comfortable operating with autonomy inside a structured team
  • Gets energized by architectural challenges, not just feature work
  • Has the patience and discipline to improve existing systems while building new ones
  • Understands that pipeline engineering is about handling the 10,000 edge cases, not just the happy path
  • Is motivated by the mission: building the processing backbone that makes enterprise AI accurate and reliable
  • Communicates well in a remote-first environment and collaborates naturally across team boundaries

  • 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
  • Strong proficiency in Python and/or Go
  • Experience with workflow orchestration tools - Temporal, Airflow, Prefect, Step Functions, or similar
  • Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, retry strategies
  • Hands-on experience with Kubernetes, Docker, Terraform, and Helm
  • Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
  • Comfort working across cloud providers (AWS, Azure, GCP)

Benefits & conditions

401(k), Health insurance, Vision insurance, Dental insurance, Life insurance, Unlimited paid time off, Disability insurance

  • Remote-first organization
  • 100% company-paid Health/Dental/Vision benefits for you and your dependents
  • Life Insurance, Short-term and Long-term Disability
  • 401(k)
  • Unlimited PTO
