DevOps Engineer

AMERICAN AI LOGISTICS, LLC
Washington, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$ 170K

Job location

Remote
Washington, United States of America

Tech stack

Microsoft Access
Amazon Web Services (AWS)
Audit Trail
Automation of Tests
Bash
Cloud Computing
Configuration Management
Databases
System Configuration
Continuous Integration
Customer Data Management
Data Transmissions
Software Debugging
Desktop Virtualization
DevOps
Fault Tolerance
Identity and Access Management
Subnetting
Python
Key Management
PostgreSQL
Message Broker
Network Architecture
Operational Databases
RabbitMQ
Ansible
Service Discovery
Session Manager SubSystems
Systems Integration
Tripwire
Software Vulnerability Management
Datadog
S3 Bucket
Cloud Platform System
Autoscaling
Grafana
Caching
Database Migration
Kubernetes
Deployment Automation
Performance Monitor
Bitbucket
Functional Programming
Cloudwatch
Terraform
SentinelOne Expertise
Serverless Computing
Docker
Pagerduty
Static Application Security Testing

Job description

We're looking for a DevOps / Infrastructure Engineer to own the cloud platform that powers a B2B SaaS product serving U.S. government procurement. Our application runs on AWS and processes sensitive customer data under multiple compliance frameworks. Reliability, security, and auditability aren't aspirational goals; they're contractual obligations.

You'll be responsible for the full infrastructure lifecycle: provisioning and hardening cloud resources, building and maintaining CI/CD pipelines, managing container deployments, and keeping production systems available and secure. This is a small team; you won't be filing tickets for someone else to execute. You'll design it, build it, ship it, and carry the pager for it.

  • Designing, provisioning, and managing AWS infrastructure using Terraform, including resources such as VPCs, subnets, security groups, ALBs, ECS clusters, RDS instances, S3 buckets, KMS keys, and IAM policies
  • Managing multi-region AWS environments with environment promotion patterns (dev → stage → production) and strict resource isolation between environments
  • Maintaining Terraform state management, module organization, and CI-driven plan/apply workflows with proper review gates
  • Designing network architecture - public/private subnet segmentation, NAT gateways, security group rules following least-privilege principles, and WAF configurations

Containers & Compute

  • Building and optimizing Docker images for production services running on ECS Fargate, including multi-stage builds, image size reduction, and layer caching strategies
  • Managing ECR container registries with lifecycle policies, image scanning (Trivy, AWS Inspector), and automated vulnerability remediation workflows
  • Configuring ECS service definitions, task placement, auto-scaling policies, and multi-AZ deployment for fault tolerance
  • Managing EC2 instances where needed (e.g., virtual desktop infrastructure, bastion access, specialized compute) using Ansible for configuration management and patching

CI/CD & Deployment

  • Building and maintaining CI/CD pipelines (Bitbucket Pipelines or equivalent) with linting, SAST, dependency scanning, IaC scanning, and automated tests as CI gates
  • Implementing deployment strategies - rolling deployments, blue/green, canary releases - with automated rollback capabilities
  • Managing branch-promotion deployment models with required approvals, environment-specific configurations, and secrets injection at deploy time
  • Automating database migration execution as part of the deployment pipeline with safety checks and rollback procedures

Monitoring, Observability & Incident Response

  • Building and maintaining the monitoring stack - CloudWatch metrics, alarms, dashboards, and log aggregation pipelines (CloudWatch Logs → Kinesis Firehose → S3 for long-term retention)
  • Configuring alert routing by severity - PagerDuty for critical/major incidents, Slack for informational notifications - with on-call escalation policies
  • Triaging GuardDuty security findings and AWS Inspector vulnerability reports, converting them into actionable remediation tickets with severity-based SLA targets
  • Participating in on-call rotation, incident response, and post-incident reviews. Carrying the pager and owning the resolution.

Security, Secrets & Encryption

  • Managing IAM policies, resource-based policies, service control policies, permissions boundaries, roles, and trust relationships following least-privilege principles for human users, CI/CD pipelines, and service-to-service access
  • Administering secrets management using AWS Secrets Manager and SSM Parameter Store, including rotation policies, access controls, and runtime injection patterns
  • Managing KMS customer-managed keys (CMKs) for encryption at rest across S3, RDS, and other services, including key rotation, key policies, and documented exceptions
  • Implementing and maintaining just-in-time privileged access mechanisms (e.g., SSM Session Manager) for production database and infrastructure access with full audit logging
  • Managing TLS certificates via ACM, reviewing supported cipher suites and protocols, and hardening edge configurations

Database & Messaging Infrastructure

  • Managing RDS PostgreSQL instances - provisioning, parameter tuning, backup configuration, point-in-time recovery, and semi-annual restore testing
  • Managing message broker infrastructure (Amazon MQ / RabbitMQ) - cluster configuration, queue topology, monitoring for consumer lag and dead-letter queues
  • Capacity planning and cost optimization across compute, storage, and data transfer

Compliance & Audit

  • Supporting SOC 2 compliance requirements (and others such as ISO 27001 and FedRAMP) by implementing and evidencing controls for security, availability, and confidentiality
  • Maintaining CloudTrail organization trails, log retention policies, and audit trail integrity for compliance evidence
  • Operating vulnerability management automation - scanning, ticket creation with severity-based remediation SLAs, and exception handling
  • Supporting quarterly access reviews, documenting infrastructure changes, and maintaining runbooks for operational procedures

Requirements

Required

  • 3+ years of experience operating production infrastructure on AWS in a security-conscious environment
  • Strong Terraform skills - module design, state management, workspace/environment patterns, and CI-driven apply workflows. You should be writing Terraform daily, not occasionally.
  • Deep familiarity with ECS Fargate (or equivalent container orchestration) - task definitions, service discovery, scaling, health checks, and deployment strategies
  • Solid Docker experience - writing production Dockerfiles, optimizing builds, debugging container runtime issues, and managing image registries
  • Strong understanding of AWS networking - VPC design, subnet architecture, security groups, NACLs, ALB/NLB configuration, WAF rules, and NAT gateways
  • Proficiency in IAM - policies, roles, trust relationships, service-linked roles, and the principle of least privilege applied rigorously, not aspirationally
  • Experience with RDS PostgreSQL - provisioning, parameter groups, backup/restore, read replicas, and performance monitoring
  • Comfortable with Bash scripting and general-purpose automation (Python is a plus)
  • Experience building and maintaining CI/CD pipelines with quality gates, artifact management, and environment promotion
  • Understanding of deployment strategies (rolling, blue/green, canary) and when to use each
  • Familiarity with monitoring and alerting - CloudWatch (or Datadog/Grafana), log aggregation, and on-call incident response workflows
  • Experience with SOC 2 or equivalent compliance frameworks; you've implemented controls, gathered evidence, and worked with auditors, not just read about it

Preferred

  • Experience with Ansible for EC2 configuration management, patching, and fleet operations
  • Familiarity with Lambda and serverless patterns - event-driven automation, monitoring integrations, and cost-effective background processing
  • Experience managing Amazon MQ, RabbitMQ, or similar message broker infrastructure in production
  • Familiarity with KMS key management, Secrets Manager rotation, and encryption-at-rest strategies across AWS services
  • Experience with AWS Inspector, Trivy, Checkov, or similar vulnerability and IaC scanning tools integrated into CI pipelines
  • Familiarity with SSM Session Manager or similar just-in-time access patterns for production environments
  • Experience with AWS Organizations, Control Tower, or multi-account strategies
  • Exposure to FedRAMP, ISO 27001, NIST 800-53, or CMMC compliance frameworks
  • Experience managing virtual desktop infrastructure (AWS WorkSpaces or similar) for contractor access in regulated environments
  • Familiarity with cost optimization - reserved instances, savings plans, right-sizing, and tagging strategies for chargeback/allocation
  • Experience with GitOps workflows and infrastructure drift detection

Tech Environment

Category     Technologies
Cloud        AWS (ECS Fargate, EC2, S3, RDS, Lambda, KMS, CloudWatch, WAF, ALB)
IaC          Terraform (Terraform Cloud + S3 state backends)
Containers   Docker, ECR, ECS Fargate
CI/CD        Bitbucket Pipelines (SAST, dependency scanning, IaC scanning)
Config Mgmt  Ansible (EC2 fleet), SSM Parameter Store
Networking   VPC, Security Groups, NAT, ALB, WAF, ACM (TLS)
Secrets      AWS Secrets Manager, SSM Parameter Store, KMS CMKs
Monitoring   CloudWatch, Kinesis Firehose, SNS, PagerDuty, GuardDuty
Security     IAM, AWS Inspector, Trivy, Checkov, SentinelOne
Database     RDS PostgreSQL (multi-AZ, 365-day backup retention)
Messaging    Amazon MQ (RabbitMQ)
Access       SSM Session Manager (JIT), Tailscale, AWS WorkSpaces

Benefits & conditions

  • 401(k)
  • Health insurance
  • Vision insurance
  • Dental insurance

What You Won't Find Here

  • A ticket-taking role where someone else designs the architecture. You'll own infrastructure end-to-end, from design through production operation.
  • Kubernetes. We run on ECS Fargate deliberately: less operational overhead, right-sized for our scale.
  • A pure ops role with no development. You'll write Terraform modules, CI/CD pipelines, automation scripts, and Lambda functions. Infrastructure is code here, not clickops.
  • A large platform team. We're small. You'll work directly with engineering, security, and leadership. Low bureaucracy, high ownership.

Pay: $145,000.00 - $170,000.00 per year

About the company

American AI Logistics (AAIL) is a defense technology company scaling rapidly. We are at the most consequential moment in our short history - growing from a startup into a full-scale defense technology platform. We move fast, win big, and are looking for driven individuals who want to be part of something that matters.
