Senior Software Engineer, Ingestion Team
Job description
The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population - across every file format and content type our customers throw at us.
We're in the middle of a significant architectural evolution - migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages: intake, transformation, enrichment, and indexing. The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.
This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.
Responsibilities
- Design and build pipeline stages for our modern ingestion architecture, from document intake through embedding generation and index writing
- Contribute to the design of next-generation pipeline architecture as the system evolves
- Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
- Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
- Handle the realities of document diversity: PDFs, HTML, Office formats, images, structured and semi-structured data - all flowing through the same pipeline
- Collaborate with the Connectors team (upstream) and Retrieval team (downstream) to ensure data flows cleanly across system boundaries
- Participate in the ongoing migration from legacy systems, balancing new development with operational stability
Requirements
We're looking for someone who:
- Is self-driven and comfortable operating with autonomy inside a structured team
- Gets energized by architectural challenges, not just feature work
- Has the patience and discipline to improve existing systems while building new ones
- Understands that pipeline engineering is about handling the 10,000 edge cases, not just the happy path
- Is motivated by the mission: building the processing backbone that makes enterprise AI accurate and reliable
- Communicates well in a remote-first environment and collaborates naturally across team boundaries
Qualifications:
- 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
- Strong proficiency in Python and/or Go
- Experience with workflow orchestration tools - Temporal, Airflow, Prefect, Step Functions, or similar
- Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, retry strategies
- Hands-on experience with Kubernetes, Docker, Terraform, and Helm
- Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
- Comfort working across cloud providers (AWS, Azure, GCP)
Benefits & conditions
- Remote-first organization
- 100% company-paid health, dental, and vision benefits for you and your dependents
- Life insurance, short-term and long-term disability
- 401(k)
- Unlimited PTO