AI DevOps and Cloud Infrastructure Engineer
Job description
The AI DevOps and Cloud Infrastructure Engineer I (Senior Staff) designs, builds, and operates scalable, secure, and highly automated cloud environments that support the training, deployment, monitoring, and continuous delivery of AI and machine learning systems. This role serves as a subject-matter expert in infrastructure automation, distributed compute orchestration, and cloud platform operations, ensuring AI workloads perform reliably across development, staging, and production environments.
The engineer collaborates closely with AI engineering, MLOps, data engineering, platform, and security teams to define infrastructure requirements, improve observability, and support the performance demands of predictive and generative AI workloads. As a senior staff-level contributor, the role establishes best practices, evaluates emerging cloud and AI infrastructure tooling, and mentors junior engineers to advance DevOps maturity, reliability, and cost efficiency across the organization.
Key responsibilities
- Architecting and maintaining cloud infrastructure for AI model training, inference services, and distributed compute workloads.
- Implementing infrastructure-as-code (IaC) to automate provisioning, configuration, scaling, and lifecycle management of cloud resources.
- Designing and operating CI/CD pipelines for automated model training, testing, and deployment of AI-enabled applications.
- Optimizing Kubernetes clusters, GPU utilization, and compute scaling strategies to balance performance, reliability, and cost.
- Integrating AI models, inference endpoints, and data pipelines into cloud-native platforms.
- Developing monitoring, logging, alerting, and observability solutions using modern telemetry and tracing tools.
- Troubleshooting issues across networking, containers, compute, storage, and model-serving layers.
- Leading performance benchmarking, load testing, and reliability validation for AI systems.
- Documenting infrastructure architectures, operational runbooks, and engineering standards.
- Supporting automation for dataset ingestion, model versioning, artifact management, and ML testing.
- Ensuring compliance with cloud security, identity management, encryption, and responsible AI guidelines.
- Partnering with security teams to implement secure networking, IAM policies, and secrets management.
- Providing technical mentorship, design reviews, and cloud best-practice guidance to junior engineers.
- Evaluating new cloud services, platform capabilities, and AI infrastructure tooling for adoption.
Requirements
- 4+ years of experience in DevOps, cloud engineering, platform engineering, or infrastructure engineering.
- Strong proficiency with Kubernetes, Docker, and cloud orchestration platforms.
- Deep experience with CI/CD systems and deployment automation.
- Demonstrated ability to debug distributed systems and cloud networking issues.
- Proficiency in Python, Bash, or other automation/scripting languages.
- Strong communication skills and ability to collaborate across engineering and security teams.
- Willingness to travel occasionally for cross-functional planning and collaboration.
- Bachelor's degree in Computer Science, Cloud Engineering, Information Systems, or a related technical field, or equivalent experience.
- Master's degree in a technical discipline (preferred).
- Experience enabling ML or AI workloads at scale in production environments.
- Cloud and platform certifications, including Azure (AZ-900, AZ-104, AZ-305, AZ-700, AI-102) or equivalent AWS/GCP certifications.
- Advanced experience with AWS (e.g., EKS, EC2, IAM, Lambda, SageMaker) and/or Azure (e.g., AKS, VMSS, Azure ML).
- Experience with GPU orchestration and scaling strategies for AI workloads.
- Expertise with Terraform or other infrastructure-as-code frameworks.
- Hands-on experience with observability stacks such as Prometheus, Grafana, CloudWatch, and OpenTelemetry.
- Experience deploying and operating generative AI workloads, including LLM inference autoscaling and RAG architectures.
- Familiarity with vector database hosting (e.g., Pinecone, Weaviate, FAISS) and model-serving frameworks (e.g., Hugging Face TGI, vLLM, custom inference containers).
- Experience building CI/CD pipelines for LLM fine-tuning workflows (e.g., LoRA, QLoRA, PEFT) and monitoring generative AI performance metrics such as latency, throughput, and hallucination rates.
Benefits & conditions
The wage range for this role takes into account the wide range of factors considered in making compensation decisions, including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Crowe, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $74,100.00 - $147,800.00 per year.
Our Benefits: Your exceptional people experience starts here. At Crowe, we know that great people are what makes a great firm. We care about our people and offer employees a comprehensive total rewards package. Learn more about what working at Crowe can mean for you!
How You Can Grow: We will nurture your talent in an inclusive culture that values diversity. You will have the chance to meet regularly with your Career Coach, who will guide you in your career goals and aspirations. Learn more about where talent can prosper!