Backend Infrastructure & Agentic AI Platforms Engineer
Role details
Job location
Tech stack
Job description
Blu Omega is seeking a Backend Infrastructure & Agentic AI Platforms Engineer to support a federal program focused on next-generation autonomous AI capabilities. This role operates within a remote environment and is responsible for building and maintaining backend infrastructure and agentic AI systems supporting ARPA-H's mission. The position requires experience supporting backend infrastructure, AI platform engineering, and distributed systems within a mission-driven environment., * Own the end-to-end backend infrastructure for GRACE on Microsoft Azure, including Azure Functions, API Management, Container Apps, and Azure OpenAI Service
- Manage data storage, retrieval pipelines, vector databases, and document indexing for internal knowledge search
- Implement and maintain infrastructure as code for all environments
- Develop and manage CI/CD pipelines, deployment automation, and release processes
- Ensure monitoring, alerting, logging, distributed tracing, and incident response are in place for all systems
- Manage secrets, API keys, and credential rotation
- Track and optimize cost and token economics across LLM providers
- Develop and maintain backend implementation of MCP, including server hosting, tool registration, and versioning
- Design and evolve communication patterns for agent interoperability and external integrations
- Build and operate RAG pipelines for document ingestion, embedding, and semantic search
- Implement fallback, retry, and degradation patterns for AI service dependencies
- Manage tool-calling infrastructure, including registration, execution, and observability
- Build and maintain observability for agent workflows, including latency, throughput, and error rates
- Implement evaluation pipelines for safety, regression, and grounding assessment
- Define and enforce system-level SLOs and alerting procedures
- Establish and improve coding standards, design reviews, and testing practices
- Mentor team members and communicate technical decisions clearly
- Ensure privacy, security, and compliance in all systems and data handling Required
Requirements
- 7+ years of professional software engineering experience building and operating production systems
- Proven experience in high-velocity environments with end-to-end product ownership
- Strong proficiency in Python and at least one other backend language
- Experience with distributed systems, APIs, data pipelines, and software design patterns
- Hands-on experience with Microsoft Azure: Azure Functions, API Management, Container Apps, and Azure OpenAI Service
- Experience with containerization, CI/CD, and infrastructure as code
- Knowledge of authentication and identity systems (OAuth2, OIDC, Azure Entra ID)
- Demonstrated ability to own production systems, including on-call support and incident debugging, * Experience with AI/LLM platform engineering and orchestration
- Familiarity w:ith vector search, RAG pipelines, and semantic search infrastructure
- Strong understanding of security, privacy, and compliance standards
Benefits & conditions
- Salary Range: $100,000 - $180,000
Final compensation is based on technical skills, experience, and education.
Blu Omega Benefits & Perks:
- Medical, Dental, and Vision coverage through national providers
- 401(k) with company match (eligible after 6 months; vesting applies)
- Company-paid Life and AD&D insurance with additional voluntary options
- Short-term and long-term disability options
- Employee Assistance Program (EAP) with 24/7 confidential support and mental health resources
- Telehealth and virtual care options
- Pet insurance, legal services, and identity theft protection options
- Paid Time Off (PTO) and 11 paid federal holidays
- Access to wellness programs, discounts, and lifestyle benefits