L 5/DOD Site Reliability Engineer in Annapolis Junction
Job description
About Tensley
Tensley Consulting is a Service-Disabled Veteran-Owned Small Business focused on mission engineering in support of the United States Intelligence Community and the Department of Defense. Our team consists of System Engineers, Software Engineers, Test Engineers, and Signals Analysts performing work throughout the Continental United States (CONUS) and Outside the Continental United States (OCONUS).
Equal Opportunity
We aim to build a team that represents a variety of backgrounds, perspectives, and skills. We embrace and ensure equal employment opportunity without discrimination or harassment based on childbirth or related medical conditions, marital or domestic/civil partnership status, genetic information, citizenship status, military or veteran status, or any other personal characteristic.
Requirements
- Command of core cloud infrastructure deployment patterns for workload environments, including storage buckets, managed database services, and cloud data processing services, and their integration with application workloads using workload identity, targeted IAM role bindings, resource hierarchies aligned with zero trust principles, and organizational policies for secure and compliant resource access
- Understanding of application maturation requirements for cloud deployments, and the ability to effectively articulate workload transformation or maturation requirements to stakeholders
- Expertise in Terraform state management, including remote state storage and locking
- Design and implement standardized runtime environments in Kubernetes based ecosystems, leveraging GKE Enterprise, EKS, or AKS Fleet Management features for security and scalability
- Configuration and management of service mesh (Istio) for traffic management, security, and observability within Kubernetes runtime environments
- Ensure the security of the runtime environments by implementing appropriate security policies, network controls, and access management
- Integrate GitLab CI pipelines with CD tooling (e.g., Flux CD) for automated deployments and rollbacks within Kubernetes runtime deployments
- Deep understanding of VPC networking, security, and management services
- Optimization of runtime environments for performance, scalability, and cost-efficiency, collaborating with application teams to understand their needs, including capacity planning and resource management
- Expertise in the development of cost-effective SLAs, SLOs, and SLIs in collaboration with workload owners
- Demonstrate readability and accountability via code reviews (performing as a co-maintainer) within assigned areas of the codebase, ensuring all code is written in a clear, consistent, and idiomatic style in alignment with all governance and compliance requirements
- Working knowledge of advanced zero-trust concepts and their implementation in Azure, GCP, or AWS
Education/Experience:
Bachelor's degree and at least 5 years of relevant experience, including 2 years in reliability, distributed systems, or platform development; OR 8 years of relevant experience without a degree; OR an equivalent combination of education and experience.
Salary Range: $140,000-$160,000. This represents the typical salary range for this position but is not guaranteed. Salary is based on experience, location, and contractual requirements, and could fall outside the range listed.
Required Skills
5 years of experience in the domain items listed below:
Cloud Architecture - Deep expertise in distributed systems design, scalability, and fault tolerance, including multi-region, multi-cloud deployments with high availability.
Cloud Infrastructure Management - Mastery of Infrastructure as Code (Terraform, DSC, ARM) for complex environments. Building self-healing systems and advanced automation pipelines to eliminate toil. Expertise in CI/CD pipelines and GitOps workflows (e.g., Flux).
Application Modernization - Advanced experience with cloud platforms (AWS, GCP, Azure) and advanced proficiency in container orchestration and tooling (Kubernetes, Docker, Helm) and service mesh (Istio/Anthos).
Monitoring Response Experience - Designing organization-wide monitoring strategies, leading incident response frameworks and ensuring rapid recovery across critical services, driving adoption of SLOs, SLIs, and error budgets as business-aligned reliability metrics
Zero Trust - Comprehensive understanding of Zero Trust strategic architecture principles
Desired Skills
Google Cloud Certified Professional Cloud Architect and/or Azure Solutions Architect Expert.
Benefits & conditions
- 100% paid medical coverage with HSA and company contribution
- 100% paid vision, dental, short-term, and long-term premium
- 12% 401(k) contribution (not a match)
- Education and training budget
- 6 weeks and 3 days of PTO
And much more!