Platform Engineer
Job description
Apex Systems is seeking an experienced Platform Engineer to help build, automate, and operate the cloud platform foundation supporting a major migration from on-premises systems to AWS. This role focuses on delivering secure, reliable, and scalable platform services (including networking, compute, storage, container platforms, and automation tooling) while enabling application teams through self-service, Infrastructure as Code, and modern cloud-native patterns. You will work closely with Cloud Architecture, DevSecOps, Networking, and Security teams to ensure all platform components meet organizational security requirements.
Cloud Platform Engineering & Core Infrastructure
Build and operate foundational AWS services, including VPCs, Transit Gateway, Direct Connect/VPN, IAM, KMS, CloudWatch, and ECS/EKS.
Implement secure, scalable compute, storage, and networking patterns aligned with Landing Zone architecture.
Deploy and manage container platforms (ECS Fargate, EKS, or Kubernetes variants) supporting modernization of legacy middleware.
Build platform services such as service mesh, API gateways, logging pipelines, and centralized monitoring.
Implement secrets management, encryption, and secure service-to-service communication.
Support migration of VMware, Windows, Linux, and middleware workloads into AWS using standardized platform patterns.
Ensure platform components meet organizational and Zero Trust requirements for authentication, authorization, logging, and auditability.
Apply AI-assisted observability, anomaly detection, and predictive alerting to improve platform reliability.
Infrastructure as Code, Automation & Self-Service Enablement
Build and maintain IaC using Terraform, CloudFormation, and Ansible for AWS and hybrid environments.
Develop reusable Terraform modules, CloudFormation templates, and platform blueprints for consistent provisioning.
Implement Git-based IaC workflows with automated plan/apply pipelines (GitHub, GitLab, Azure Repos).
Automate provisioning of accounts, networks, compute, and platform services using AWS Service Catalog, Account Factory for Terraform (AFT), or custom automation.
Implement drift detection and automated remediation using Terraform Cloud/Enterprise, Atlantis, or AWS native tools.
Build runbook automation using AWS Systems Manager or Ansible Automation Platform.
Enable self-service provisioning for application teams through templates, catalogs, and automation workflows.
Use generative AI to accelerate IaC creation, documentation, and operational runbook generation.
Observability, Operations & Reliability Engineering
Build centralized logging, metrics, and tracing pipelines using CloudWatch, OpenTelemetry, Prometheus/Grafana, Elastic Stack, or Datadog.
Implement alerting, incident response workflows, and operational dashboards for platform services.
Support SRE practices including SLOs/SLIs, error budgets, and blameless incident reviews.
Implement automated health checks, scaling policies, and resilience patterns for platform workloads.
Integrate platform services with CI/CD pipelines to ensure consistent deployment and operational readiness.
Apply AI/ML for log correlation, predictive scaling, and automated incident triage.
Maintain platform runbooks, operational standards, and architecture decision records.
Requirements
Hands-on experience building and operating AWS infrastructure (networking, compute, storage, IAM, monitoring).
Strong proficiency with Terraform, CloudFormation, and Ansible.
Experience with container platforms (ECS, EKS, or Kubernetes).
Experience automating infrastructure provisioning and configuration.
Familiarity with hybrid networking (Direct Connect, VPN, Transit Gateway).
Experience with centralized logging, monitoring, and observability tooling.
Understanding of security controls, secrets management, and compliance frameworks.
Experience supporting application teams through platform services or self-service tooling.
Preferred
Experience with a broad range of AWS services, including CloudFront, S3, Cloud Map, DataSync, CloudTrail, App Mesh, SQS, GuardDuty, Amazon Inspector, Route 53, Security Groups, Subnets, Network ACLs, WAF, IAM, and VPC Endpoints.
Experience migrating legacy middleware (JBoss/WebLogic) to containers or AWS services.
Experience with service mesh, API gateways, or event-driven architectures.
Experience with AFT or AWS Service Catalog.
Experience with VMware-to-AWS migration patterns.
Background in SRE, reliability engineering, or platform operations.
Relevant certifications such as AWS Solutions Architect, SysOps Administrator, HashiCorp Terraform Associate, Kubernetes (CKA/CKAD), CompTIA Security+, or equivalent.