DevOps Engineer
Requirements
- A senior engineer who can own production systems end-to-end
- Strong problem-solver with the ability to debug complex, non-deterministic AI systems
- Comfortable working in a rapidly evolving GenAI and agentic architecture
- Pragmatic mindset balancing performance, cost, and reliability
- High ownership and ability to work independently
Benefits & conditions
Cellebrite's (Nasdaq: CLBT) mission is to enable its global customers to protect and save lives by enhancing digital investigations and intelligence gathering to accelerate justice in communities around the world. Cellebrite's AI-powered Digital Investigation Platform enables customers to lawfully access, collect, analyze and share digital evidence in legally sanctioned investigations while preserving data privacy. Thousands of public safety organizations, intelligence agencies and businesses rely on Cellebrite's digital forensic and investigative solutions, available via cloud, on-premises and hybrid deployments, to close cases faster and safeguard communities. To learn more, visit us at www.cellebrite.com and https://investors.cellebrite.com/investors, and find us on social media @Cellebrite.

About the Role

We are building a rapidly scaling GenAI-powered SaaS platform that enables investigators to interact with complex case data through a conversational AI interface. Our system leverages RAG architecture and agentic GenAI workflows to deliver advanced AI capabilities in production.

We are looking for a Senior DevOps / Cloud Engineer to own our application services, cloud infrastructure, deployment pipelines, and production reliability in this dynamic AI environment.

This is a hands-on role focused on serverless architecture, LLM-based systems, and agentic workflows, working closely with Engineering and Customer Success to ensure the platform is reliable, scalable, and cost-efficient.

Key Responsibilities
- Own and manage application services running on GCP infrastructure, including serverless and managed services
- Design and maintain robust CI/CD pipelines for rapid, safe deployments
- Operate and optimize GenAI/LLM workloads in production, including RAG pipelines and agentic workflows
- Monitor and improve latency, cost, and reliability of AI-driven systems
- Troubleshoot complex production issues across application, data, and infrastructure layers
- Work with and optimize BigQuery-based data workflows, queries, and performance
- Support and debug multi-step AI pipelines and agent orchestration flows
- Implement and maintain observability (logging, metrics, tracing, alerting), including for AI pipelines
- Collaborate with engineering teams on architecture improvements for evolving GenAI systems
- Partner with Customer Success to investigate and resolve customer-impacting issues (minimal direct customer interaction)
- Enforce security and best practices in a sensitive data environment