DevOps Engineer

AMERICAN AI LOGISTICS, LLC

Washington, United States of America

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$145K - $170K

Job location

Remote

Washington, United States of America

Tech stack

Microsoft Access

Amazon Web Services (AWS)

Audit Trail

Automation of Tests

Bash

Cloud Computing

Configuration Management

Databases

System Configuration

Continuous Integration

Customer Data Management

Data Transmissions

Software Debugging

Desktop Virtualization

DevOps

Fault Tolerance

Identity and Access Management

Subnetting

Python

Key Management

PostgreSQL

Message Broker

Network Architecture

Operational Databases

RabbitMQ

Ansible

Service Discovery

Session Manager SubSystems

Systems Integration

Tripwire

Software Vulnerability Management

Datadog

S3 Bucket

Cloud Platform System

Autoscaling

Grafana

Caching

Database Migration

Kubernetes

Deployment Automation

Performance Monitor

Bitbucket

Functional Programming

Cloudwatch

Terraform

SentinelOne Expertise

Serverless Computing

Docker

Pagerduty

Static Application Security Testing

Job description

We're looking for a DevOps / Infrastructure Engineer to own the cloud platform that powers a B2B SaaS product serving U.S. government procurement. Our application runs on AWS and processes sensitive customer data under multiple compliance frameworks, so reliability, security, and auditability aren't aspirational goals; they're contractual obligations.

You'll be responsible for the full infrastructure lifecycle: provisioning and hardening cloud resources, building and maintaining CI/CD pipelines, managing container deployments, and keeping production systems available and secure. This is a small team; you won't be filing tickets for someone else to execute. You'll design it, build it, ship it, and carry the pager for it.

Designing, provisioning, and managing AWS infrastructure using Terraform, including resources such as VPCs, subnets, security groups, ALBs, ECS clusters, RDS instances, S3 buckets, KMS keys, and IAM policies
Managing multi-region AWS environments with environment promotion patterns (dev → stage → production) and strict resource isolation between environments
Maintaining Terraform state management, module organization, and CI-driven plan/apply workflows with proper review gates
Designing network architecture - public/private subnet segmentation, NAT gateways, security group rules following least-privilege principles, and WAF configurations

Containers & Compute

Building and optimizing Docker images for production services running on ECS Fargate, including multi-stage builds, image size reduction, and layer caching strategies
Managing ECR container registries with lifecycle policies, image scanning (Trivy, AWS Inspector), and automated vulnerability remediation workflows
Configuring ECS service definitions, task placement, auto-scaling policies, and multi-AZ deployment for fault tolerance
Managing EC2 instances where needed (e.g., virtual desktop infrastructure, bastion access, specialized compute) using Ansible for configuration management and patching

CI/CD & Deployment

Building and maintaining CI/CD pipelines (Bitbucket Pipelines or equivalent) with linting, SAST, dependency scanning, IaC scanning, and automated tests as CI gates
Implementing deployment strategies - rolling deployments, blue/green, canary releases - with automated rollback capabilities
Managing branch-promotion deployment models with required approvals, environment-specific configurations, and secrets injection at deploy time
Automating database migration execution as part of the deployment pipeline with safety checks and rollback procedures

Monitoring, Observability & Incident Response

Building and maintaining the monitoring stack - CloudWatch metrics, alarms, dashboards, and log aggregation pipelines (CloudWatch Logs → Kinesis Firehose → S3 for long-term retention)
Configuring alert routing by severity - PagerDuty for critical/major incidents, Slack for informational notifications - with on-call escalation policies
Triaging GuardDuty security findings and AWS Inspector vulnerability reports, converting them into actionable remediation tickets with severity-based SLA targets
Participating in on-call rotation, incident response, and post-incident reviews. Carrying the pager and owning the resolution.

Security, Secrets & Encryption

Managing IAM policies, resource-based policies, service control policies, permissions boundaries, roles, and trust relationships following least-privilege principles for human users, CI/CD pipelines, and service-to-service access
Administering secrets management using AWS Secrets Manager and SSM Parameter Store, including rotation policies, access controls, and runtime injection patterns
Managing KMS customer-managed keys (CMKs) for encryption at rest across S3, RDS, and other services, including key rotation, key policies, and documentation of exceptions
Implementing and maintaining just-in-time privileged access mechanisms (e.g., SSM Session Manager) for production database and infrastructure access with full audit logging
Managing TLS certificates via ACM, reviewing supported cipher suites and protocols, and hardening edge configurations

Database & Messaging Infrastructure

Managing RDS PostgreSQL instances - provisioning, parameter tuning, backup configuration, point-in-time recovery, and semi-annual restore testing
Managing message broker infrastructure (Amazon MQ / RabbitMQ) - cluster configuration, queue topology, monitoring for consumer lag and dead-letter queues
Capacity planning and cost optimization across compute, storage, and data transfer

Compliance & Audit

Supporting SOC 2 compliance requirements (and other frameworks such as ISO 27001 and FedRAMP), implementing and evidencing controls for security, availability, and confidentiality
Maintaining CloudTrail organization trails, log retention policies, and audit trail integrity for compliance evidence
Operating vulnerability management automation - scanning, ticket creation with severity-based remediation SLAs, and exception handling
Supporting quarterly access reviews, documenting infrastructure changes, and maintaining runbooks for operational procedures

Requirements

Screening question (required): Do you have experience with WAF?

3+ years of experience operating production infrastructure on AWS in a security-conscious environment
Strong Terraform skills - module design, state management, workspace/environment patterns, and CI-driven apply workflows. You should be writing Terraform daily, not occasionally.
Deep familiarity with ECS Fargate (or equivalent container orchestration) - task definitions, service discovery, scaling, health checks, and deployment strategies
Solid Docker experience - writing production Dockerfiles, optimizing builds, debugging container runtime issues, and managing image registries
Strong understanding of AWS networking - VPC design, subnet architecture, security groups, NACLs, ALB/NLB configuration, WAF rules, and NAT gateways
Proficiency in IAM - policies, roles, trust relationships, service-linked roles, and the principle of least privilege applied rigorously, not aspirationally
Experience with RDS PostgreSQL - provisioning, parameter groups, backup/restore, read replicas, and performance monitoring
Comfortable with Bash scripting and general-purpose automation (Python is a plus)
Experience building and maintaining CI/CD pipelines with quality gates, artifact management, and environment promotion
Understanding of deployment strategies (rolling, blue/green, canary) and when to use each
Familiarity with monitoring and alerting - CloudWatch (or Datadog/Grafana), log aggregation, and on-call incident response workflows
Experience with SOC 2 or equivalent compliance frameworks; you've implemented controls, gathered evidence, and worked with auditors, not just read about it

Preferred

Experience with Ansible for EC2 configuration management, patching, and fleet operations
Familiarity with Lambda and serverless patterns - event-driven automation, monitoring integrations, and cost-effective background processing
Experience managing Amazon MQ, RabbitMQ, or similar message broker infrastructure in production
Familiarity with KMS key management, Secrets Manager rotation, and encryption-at-rest strategies across AWS services
Experience with AWS Inspector, Trivy, Checkov, or similar vulnerability and IaC scanning tools integrated into CI pipelines
Familiarity with SSM Session Manager or similar just-in-time access patterns for production environments
Experience with AWS Organizations, Control Tower, or multi-account strategies
Exposure to FedRAMP, ISO 27001, NIST 800-53, or CMMC compliance frameworks
Experience managing virtual desktop infrastructure (AWS WorkSpaces or similar) for contractor access in regulated environments
Familiarity with cost optimization - reserved instances, savings plans, right-sizing, and tagging strategies for chargeback/allocation
Experience with GitOps workflows and infrastructure drift detection

Tech Environment

Category: Technologies

Cloud: AWS (ECS Fargate, EC2, S3, RDS, Lambda, KMS, CloudWatch, WAF, ALB)

IaC: Terraform (Terraform Cloud + S3 state backends)

Containers: Docker, ECR, ECS Fargate

CI/CD: Bitbucket Pipelines (SAST, dependency scanning, IaC scanning)

Config Mgmt: Ansible (EC2 fleet), SSM Parameter Store

Networking: VPC, Security Groups, NAT, ALB, WAF, ACM (TLS)

Secrets: AWS Secrets Manager, SSM Parameter Store, KMS CMKs

Monitoring: CloudWatch, Kinesis Firehose, SNS, PagerDuty, GuardDuty

Security: IAM, AWS Inspector, Trivy, Checkov, SentinelOne

Database: RDS PostgreSQL (multi-AZ, 365-day backup retention)

Messaging: Amazon MQ (RabbitMQ)

Access: SSM Session Manager (JIT), Tailscale, AWS WorkSpaces

Benefits & conditions

401(k)
Health insurance
Vision insurance
Dental insurance

What You Won't Find Here
A ticket-taking role where someone else designs the architecture. You'll own infrastructure end-to-end, from design through production operation.
Kubernetes. We run on ECS Fargate deliberately: it has less operational overhead and is right-sized for our scale.
A pure ops role with no development. You'll write Terraform modules, CI/CD pipelines, automation scripts, and Lambda functions. Infrastructure is code here, not clickops.
A large platform team. We're small. You'll work directly with engineering, security, and leadership. Low bureaucracy, high ownership.

Pay: $145,000 - $170,000 per year

About the company

American AI Logistics (AAIL) is a defense technology company scaling rapidly. We are at the most consequential moment in our short history - growing from a startup into a full-scale defense technology platform. We move fast, win big, and are looking for driven individuals who want to be part of something that matters.
