Platform Engineer
IBM
Catonsville, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Catonsville, United States of America
Tech stack
Kubernetes Security
Amazon Web Services (AWS)
Azure
Computer Security
System Configuration
Continuous Integration
Data Infrastructure
Elasticsearch
Network Topologies
Python
Network Control
Network File Systems
Network Interface
Octopus Deploy
OpenShift
Red Hat Enterprise Linux - RHEL
Ansible
Prometheus
Ceph
Cloud Platform System
Fluentd
Grafana
Infrastructure as Code (IaC)
Containerization
Kubernetes
Rancher
Kibana
Terraform
Network Server
DevSecOps
Docker
Service Stack
Job description
- Collaborate with Government engineers to design, develop, and maintain the AECC Kubernetes-based platform architecture.
- Develop, deploy, and maintain Infrastructure as Code (IaC) to support the deployment, operation, and sustainment of the AECC Kubernetes platform.
- Provision customer namespaces and containerized workloads using IaC as applications are onboarded to the AECC platform.
- Perform day-to-day administration of the Kubernetes platform, including patching and upgrading core components such as the control plane, worker nodes, CNI plugins, CSI drivers, and CRI runtimes.
- Configure and manage Container Network Interface (CNI) plugins (e.g., Calico, Cilium, OpenShift SDN) to ensure secure, resilient, and high-performance networking for Kubernetes workloads.
- Deploy and manage Container Storage Interface (CSI) drivers to enable dynamic provisioning of persistent storage for containerized workloads (e.g., OpenShift Data Foundation, Ceph).
- Monitor platform and cluster resource utilization and proactively notify the AECC Lead when additional capacity or scaling is required.
- Harden the Kubernetes platform in accordance with cloud-native best practices and applicable government cybersecurity and compliance requirements.
- Troubleshoot platform issues and perform root cause analysis across Kubernetes networking, containerized workloads, persistent storage, and runtime components.
- Develop and maintain platform documentation, including system diagrams, network topology, Kubernetes configurations, CSI and CRI configurations, and standard operating procedures.
- Provide on-call support for triage and resolution of after-hours production incidents.
- Recommend and implement improvements to platform security, scalability, reliability, and performance across containerized compute, network, and storage services.
- Build, configure, and administer Kubernetes clusters in support of mission-critical workloads.
- Implement and manage container registries (e.g., Docker Hub, Harbor, Red Hat Quay) to enable secure container image storage and distribution.
- Deploy and manage persistent storage solutions for containerized workloads (e.g., OpenShift Data Foundation, Ceph, NFS).
Requirements
Required technical and professional expertise
- Senior-level experience with Kubernetes-based platforms, including Red Hat OpenShift, Rancher, or upstream (vanilla) Kubernetes.
- Experience with Container Storage Interface (CSI) drivers for dynamic provisioning of persistent storage (e.g., OpenShift Data Foundation, Ceph, AWS EBS, Azure Disks).
- Strong Infrastructure as Code (IaC) experience using tools such as Helm, Kustomize, and Terraform, with GitOps workflows using Argo CD or Flux.
- Strong troubleshooting skills across the full technology stack, including Kubernetes networking, storage, servers, and containerized applications.
- Familiarity with persistent storage solutions for Kubernetes workloads (e.g., OpenShift Data Foundation, Ceph, NFS).
- Experience with Kubernetes monitoring and observability tools, such as Prometheus, Grafana, Elasticsearch, Fluentd, and Kibana.
- Security+ or equivalent DoD 8570.01-M Information Assurance Technical Level II certification.
- Must hold and maintain an active DoD Secret security clearance.
- Must possess, or obtain within six (6) months of hire, a Computing Environment certification in a related field, such as:
  - Certified Kubernetes Administrator (CKA)
  - Red Hat Certified Specialist in OpenShift Administration
  - Certified Kubernetes Security Specialist (CKS)
  - Other equivalent Kubernetes or container-related certifications
Preferred technical and professional experience
- Proficiency in building, configuring, and administering Kubernetes clusters in enterprise environments.
- Experience with container runtime and orchestration tools and technologies, including Docker, Podman, and Kubernetes.
- Experience managing container registries and container image lifecycles (e.g., Docker Hub, Harbor, Red Hat Quay).
- Strong automation and Infrastructure as Code (IaC) skills using tools such as Ansible, Terraform, and Python.
- Familiarity with DevSecOps practices, including CI/CD pipeline integration for containerized workloads.