Cloud Operations Engineer
Role details
Job location
Tech stack
Network Security, Docker (Software), Incident Response, Disaster Recovery, Amazon CloudWatch, Cloud Technologies, AWS CloudFormation, Amazon Web Services, Business Continuity, Enterprise Security, Cloud Infrastructure, Serverless Computing, Prometheus (Software), Information Technology, Digital Transformation, Hybrid Cloud Computing, Public Trust Clearance, Artificial Intelligence, Large Language Modeling, Application Development, Google Cloud Platform (GCP), Infrastructure as Code (IaC), Cloud Computing Architecture, Microsoft Azure Expressroute, Peering (Computer Networking), Google Kubernetes Engine (GKE), Retrieval Augmented Generation, Generative Artificial Intelligence, Application Programming Interface (API), Payment Card Industry (PCI) Data Security Standards, Microsoft Certified: Azure Solutions Architect Expert, Health Insurance Portability And Accountability Act (HIPAA) Compliance
Job description
- Drive the enterprise technology vision by defining long-term strategic blueprints and engineering standards that support the implementation of cloud technologies such as:
  - Generative AI applications and production-ready RAG pipelines.
  - Scalable vector database indexing, clustering, and retrieval pipelines to optimize vector search.
  - Scalable GPU/TPU compute clusters, managed container services, and auto-scaling for LLM workloads.
  - Private cloud networks, secure API gateways, and dedicated endpoints to isolate AI traffic from the public internet.
  - Applications leveraging modern microservices and APIs.
- Design and implement scalable, secure, and resilient hybrid cloud architectures using cloud-agnostic tools and processes.
- Architect and manage core cloud networking components (VPCs, VNets, subnets, transit gateways/peering, firewalls, private links, Direct Connect/ExpressRoute, etc.).
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, Bicep, ARM, and Ansible to automate provisioning and configuration management.
- Implement comprehensive cloud security controls, compliance frameworks (SOC 2, ISO 27001, HIPAA, PCI-DSS, CMMC, etc.), encryption, security groups, WAF, and monitoring for threats and misconfigurations.
- Develop and implement processes to optimize cost and utilization through resource tagging, rightsizing, reserved instances/savings plans, and continuous cost analysis.
- Set up and maintain monitoring, logging, and observability solutions using tools like CloudWatch, Azure Monitor, Google Operations, Datadog, Prometheus/Grafana, or ELK stack.
- Support disaster recovery, high availability, and business continuity strategies, including backup/restore, failover architectures, and regular DR testing.
- Collaborate with Network, Security, Application Development, and Operations teams on cloud-related projects, incident response, and architecture reviews.

All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.
Requirements
Ansible, Grafana, FedRAMP, Firewall, Equities, Failover, Pinecone, Weaviate, Terraform, Pipelines, Operations, Automation, Innovation, Resilience, Kubernetes, Encryption, Subnetwork, Scalability, Market Data, Multi-Cloud, Autoscaling, Communication, Microservices, Observability, Private Cloud, ISO/IEC 27001, Azure Monitor, Cloud Security, Microsoft Azure, Vector Database, Ancient History, Managed Services, Computer Science
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field and 12+ years of relevant experience, or a Master's degree with 10+ years of relevant experience. Additional years of experience will be considered/accepted in lieu of a degree.
- 12+ years of progressive experience in cloud architecture, engineering, and operations, with at least 4 years in a senior or lead Cloud Architect role.
- Strong experience designing and implementing hybrid/multi-cloud environments (AWS, Azure, Google Cloud) at enterprise scale.
- Proven expertise with Generative AI infrastructure, including:
  - Production-ready Retrieval-Augmented Generation (RAG) pipelines.
  - Scalable vector databases (e.g., Pinecone, Weaviate, Milvus, FAISS, or managed services) with indexing, clustering, and optimized retrieval.
  - GPU/TPU compute clusters, managed container orchestration (Kubernetes/EKS/AKS/GKE), and auto-scaling for large language model (LLM) workloads.
- Deep hands-on experience with Infrastructure as Code (IaC) tools: Terraform (preferred), AWS CloudFormation, Azure Bicep/ARM, and Ansible.
- Strong background in cloud networking (VPCs, VNets, subnets, transit gateways, peering, private links, Direct Connect/ExpressRoute, firewalls, security groups, WAF, and private API gateways).
- Demonstrated experience implementing enterprise security and compliance frameworks (SOC 2, ISO 27001, HIPAA, PCI-DSS, CMMC, FedRAMP, etc.), including encryption, threat monitoring, and zero-trust architectures.
- Experience with observability and monitoring stacks (CloudWatch, Azure Monitor, Google Operations, Datadog, Prometheus/Grafana, ELK/EFK stack).
- Proven track record in cost optimization (tagging, rightsizing, Reserved Instances, Savings Plans) and disaster recovery/business continuity planning.
- Excellent collaboration and communication skills; ability to work with cross-functional teams (Network, Security, Dev, Ops) in a fast-paced environment.
- U.S. Citizenship required.
- Ability to obtain and maintain a Public Trust security clearance.

- Experience leading cloud adoption and defining long-term strategic technology roadmaps.
- Hands-on work with microservices architectures, modern APIs, and secure private cloud networking for AI workloads.
- Familiarity with containerization (Docker, Kubernetes) and serverless technologies.
Desired Certifications (one or more preferred; multiple strongly encouraged):
- Microsoft Certified: Azure Solutions Architect Expert.
- HashiCorp Certified: Terraform Associate.
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
- Azure Security Engineer Associate.
Please note: The program budget salary for this role could fall anywhere between the mid $150,000s and the low-to-mid $170,000s, with slight wiggle room (no guarantees) based on relevant experience and assessment. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.
Benefits & conditions
Pay and benefits are fundamental to any career decision. That's why we craft compensation packages that reflect the importance of the work we do for our customers. Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave and Retirement. More details are available at www.leidos.com/careers/pay-benefits.