DevOps Engineer

Ingram Barge Company
Nashville, United States of America
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior

Job location

Nashville, United States of America

Tech stack

.NET
Application Performance Management
Azure
Configuration Management
Computer Networks
System Configuration
Continuous Integration
Data Visualization
DevOps
Disaster Recovery
Monitoring of Systems
Nagios
Performance Tuning
Reliability Engineering
Site Reliability Engineering Practices
Ansible
Prometheus
Message Oriented Middleware
Software Deployment
Software Engineering
Systems Integration
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Autoscaling
Delivery Pipeline
Grafana
Reliability of Systems
GitLab
Angular
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Performance Monitor
Front End Software Development
3-tier Architectures
Terraform
DevSecOps
PagerDuty

Job description

Ingram Marine Group is seeking a DevOps Engineer to join our dynamic DevSecOps Team in the Nashville, TN area. This person will work alongside our Systems Architect, Application Development Architect, and Security Engineer, and will focus on operationalizing our cloud-native infrastructure, enhancing CI/CD pipelines, ensuring system reliability and resilience, and providing 24x7 operational support.

What you will be doing:

Pipeline & Automation

  • Designing and implementing advanced CI/CD pipeline features using GitLab
  • Developing and maintaining Terraform modules for infrastructure provisioning
  • Creating and optimizing Ansible playbooks for configuration management and deployment automation
  • Integrating security scanning and compliance checks into deployment pipelines

Container & Kubernetes Operations

  • Building, configuring, and maintaining Azure Kubernetes Service (AKS) clusters
  • Developing and optimizing Helm charts for application deployments
  • Implementing and managing GitOps workflows
  • Monitoring and troubleshooting containerized applications and cluster performance

Infrastructure & Reliability

  • Implementing Infrastructure as Code best practices using Terraform and Ansible
  • Designing and executing disaster recovery procedures and business continuity plans
  • Performing system patching, upgrades, and maintenance activities
  • Establishing and maintaining comprehensive monitoring, alerting, and observability solutions using Prometheus and Grafana

Cost Optimization & Resource Management

  • Monitoring and analyzing Azure cloud spending patterns and resource utilization
  • Implementing cost optimization strategies including right-sizing, reserved instances, and auto-scaling policies
  • Developing dashboards and reports for cost tracking and forecasting
  • Collaborating with teams to optimize resource allocation and eliminate waste

Monitoring & Observability

  • Designing and implementing comprehensive monitoring solutions using Prometheus for metrics collection
  • Building and maintaining Grafana dashboards for infrastructure, application, and business metrics
  • Configuring intelligent alerting rules and escalation procedures
  • Establishing SLIs, SLOs, and error budgets for critical services

24x7 Support & Incident Response

  • Participating in on-call rotation for 24x7 production support
  • Leading Tier 3 incident response efforts for production outages and system issues
  • Performing root cause analysis and implementing preventive measures
  • Collaborating with development teams on performance optimization and troubleshooting
  • Maintaining runbooks and documentation for operational procedures

Requirements

Technical Expertise (5+ years)

  • Strong experience with Kubernetes (AKS preferred) and container orchestration
  • Proficiency in Infrastructure as Code: Terraform and Ansible
  • Advanced GitLab CI/CD pipeline development and optimization
  • Experience with GitOps methodologies and leading toolsets like Helm, Flux and/or ArgoCD
  • Python scripting for automation and pipeline tasks
  • Azure cloud services and networking concepts

Monitoring & Cost Management

  • Hands-on experience with Prometheus for metrics collection and alerting
  • Proficiency in Grafana for dashboard creation and data visualization
  • Experience with Azure Cost Management tools and FinOps practices
  • Knowledge of resource optimization techniques and auto-scaling strategies
  • Understanding of cloud pricing models and cost allocation methods

DevOps & SRE Practices

  • Incident management and post-mortem processes
  • 24x7 on-call experience with escalation procedures
  • Disaster recovery planning and implementation
  • Security best practices in CI/CD and infrastructure
  • Experience with chaos engineering and resilience testing

Collaborative Skills

  • Experience working with cross-functional teams
  • Strong troubleshooting and problem-solving abilities under pressure
  • Documentation and knowledge sharing practices
  • Comfortable with 24x7 on-call rotation responsibilities

  • Azure certifications (AZ-104, AZ-400, or AKS-related)
  • Experience with message bus systems (Azure Service Bus)
  • Knowledge of .NET applications and Angular frontend deployments
  • Familiarity with secret management solutions (Delinea or similar)
  • Experience with additional monitoring tools (Azure Monitor, Application Insights)
  • FinOps certification or cost optimization experience
  • Experience with alerting tools and PagerDuty integration
