Infrastructure Engineer
Advanced
Jackson Township, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Jackson Township, United States of America
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Build Automation
Cloud Computing
DevOps
Python
Machine Learning
Reliability Engineering
Software Tools
Prometheus
Software Deployment
Software Engineering
AI Infrastructure
Data Logging
Pulumi
GitHub Copilot
Grafana
Infrastructure as Code (IaC)
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Data Management
Machine Learning Operations
Cloud Migration
Terraform
Data Pipelines
Job description
- Design, build, automate, and support large-scale, highly available cloud infrastructure environments
- Manage and optimize containerized production platforms and orchestration environments
- Develop and maintain Infrastructure as Code (IaC) solutions using tools such as Terraform or Pulumi
- Build automation tooling, operational utilities, and platform enhancements using Python or Go
- Drive infrastructure reliability, scalability, observability, and resiliency initiatives
- Partner closely with engineering, product, security, AI/ML, and platform teams to support enterprise-wide initiatives
- Implement and maintain monitoring, logging, alerting, and performance management solutions
- Troubleshoot complex production issues and proactively identify systemic risks or operational weaknesses
- Lead infrastructure improvements with a focus on reversibility, risk mitigation, and minimizing production blast radius
- Create operational standards, automation frameworks, and deployment strategies that improve engineering velocity and reliability
- Support AI-driven infrastructure operations, intelligent automation initiatives, and AI-assisted engineering workflows
- Evaluate and implement emerging AI-enabled operational tooling to improve efficiency, incident response, automation, and developer productivity
- Collaborate with engineering teams supporting AI/ML workloads, data platforms, and model deployment pipelines
- Own infrastructure initiatives end-to-end, including architecture, implementation, rollout, rollback planning, and operational support
Requirements
We are seeking a highly skilled Infrastructure Engineer to help design, build, automate, and operate scalable, high-availability production infrastructure in a fast-paced enterprise technology environment. This individual will play a key role in driving reliability, automation, cloud infrastructure strategy, operational excellence, and AI-enabled engineering practices across mission-critical systems.
- 5 years of experience in Infrastructure Engineering, DevOps, Site Reliability Engineering, or similar roles supporting large-scale production environments
- Hands-on experience operating containerized production environments and orchestration platforms in enterprise or high-growth environments
- Strong experience with Kubernetes, Helm, and Infrastructure as Code tools such as Terraform or Pulumi
- Experience supporting cloud infrastructure environments, preferably AWS
- Proficiency in Python or Go for automation, tooling, and infrastructure development
- Strong experience with monitoring, observability, and logging platforms such as Prometheus, Grafana, ELK, or equivalent technologies
- Experience implementing resilient infrastructure designs focused on scalability, reliability, rollback strategies, and operational safety
- Strong understanding of infrastructure tradeoffs involving reliability, cost optimization, deployment velocity, and operational risk
- Demonstrated experience leveraging AI-assisted engineering tools and agentic AI workflows within day-to-day development and operational practices
- Experience utilizing AI-enabled platforms such as Claude Code, Codex, GitHub Copilot, or similar tools to improve automation, troubleshooting, deployment efficiency, and operational workflows
- Familiarity with infrastructure requirements for AI/ML environments, including compute scalability, data processing pipelines, model deployment, or GPU-enabled workloads, is highly desirable
- Excellent communication and cross-functional collaboration skills
- Strong analytical and problem-solving capabilities
- Ability to challenge assumptions, identify operational gaps, and recommend innovative infrastructure solutions
- Proven ownership mindset with experience leading infrastructure initiatives from concept through production deployment
- Strong organizational skills with the ability to prioritize and execute in fast-paced environments
- Passion for continuous improvement, emerging technologies, and modern AI-enabled operational practices
Preferred skills
- Software engineering background with experience building and maintaining production-grade applications, services, libraries, or internal frameworks
- Ability to read, troubleshoot, and modify application codebases supporting infrastructure platforms
- Experience bridging infrastructure engineering and software development practices
- Experience building reusable platform tooling, developer enablement frameworks, or internal infrastructure products
- Experience supporting enterprise-scale cloud transformation or modernization initiatives
- Exposure to MLOps, AI infrastructure, vector databases, model serving frameworks, or intelligent automation platforms
- Experience supporting AI/ML engineering teams through scalable infrastructure and deployment automation