Site Reliability Engineer
STEAMPUNK INC.
McLean, United States of America
29 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Compensation: $200K
Job location: McLean, United States of America
Tech stack
Java
JavaScript
Amazon Web Services (AWS)
Automation of Tests
Azure
Cloud Computing
DevOps
Programming Tools
DNS
Drupal
GitHub
Python
Routing
Performance Tuning
Reliability Engineering
Prometheus
Selenium
Software Engineering
Datadog
Data Logging
Pulumi
Transport Layer Security
Google Cloud Platform
Load Balancing
Autoscaling
Grafana
GitLab
Git
CloudFormation
GitLab CI
Kubernetes
Bitbucket
CloudWatch
Terraform
New Relic (SaaS)
Jenkins
Go
Job description
- Establish development tools and infrastructure for automation.
- Understand the needs of stakeholders and convey them to developers.
- Automate and improve development, testing, deployment, and release processes.
- Test and examine code written by others and analyze the results.
- Own and improve the reliability, availability, and performance of production systems and services.
- Define, implement, and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Perform capacity planning, scalability analysis, and performance tuning for applications and infrastructure.
- Participate in on-call rotations, incident response, and post-incident reviews to drive long-term improvements.
- Design and implement infrastructure-as-code (IaC) to provision and manage cloud resources (e.g., AWS, Azure, GCP).
- Build and maintain CI/CD pipelines to ensure reliable, repeatable delivery of application and infrastructure changes.
- Engineer resilient architectures using concepts such as auto-scaling, blue/green deployments, canary releases, and self-healing patterns.
- Collaborate with security and platform teams to ensure infrastructure adheres to compliance, security, and governance requirements.
- Collaborate with application development teams to design reliable, observable, and operable services from the outset.
- Contribute to application code, tooling, and frameworks that enhance reliability, resilience, and performance.
- Act as an individual contributor and mentor more junior team members.
- Present regular status updates and provide cross-training to other DevOps team members.
Requirements
- Ability to obtain a U.S. government Security Clearance.
- BS degree in an IT field with 10 years of experience OR BS degree in a non-IT field with 12 years of related IT experience.
- 3 years of experience with one or more cloud platforms (e.g., AWS, Azure, or GCP).
- 3 years of experience with Git SCM providers such as GitHub, GitLab, Bitbucket.
- 3 years of experience with at least one programming language (e.g., Python, Go, Java, or JavaScript) for tooling, automation, or application development.
- Hands-on experience working with AWS in production environments.
- Hands-on experience designing, deploying, and operating Kubernetes-based systems (e.g., EKS, AKS, GKE).
- Experience with DevOps practices and tools, including CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins, Azure DevOps).
- Hands-on experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) to manage cloud resources.
- Experience configuring and managing containerization and orchestration platforms.
- Experience implementing monitoring, logging, and tracing solutions (e.g., CloudWatch, Prometheus, Grafana, Datadog, New Relic, Elastic, OpenTelemetry).
- Familiarity with networking fundamentals (DNS, load balancing, routing, TLS) and their impact on reliability and performance.
- Experience with incident management, on-call operations, and production support practices.
- Certification(s) such as:
  - Cloud certifications (e.g., AWS DevOps Engineer, AWS SysOps Administrator, Azure Administrator/DevOps Engineer, GCP Professional Cloud DevOps Engineer).
  - Kubernetes certifications (e.g., CKA, CKAD).
Preferred
- Hands-on experience with Drupal and Azure.
- Experience implementing automated testing frameworks, including Selenium.
- Excellent written and verbal communication, interpersonal, and collaborative skills.
- Experience documenting the as-is state of an environment, performing a gap analysis, and producing artifacts that articulate options and recommendations.
- Experience designing and implementing SLOs, SLIs, and error budgets in production environments.
- Experience with chaos engineering, game days, and resilience testing.
- Local to Washington, DC metro area and available to be onsite 2 days a week.
- NIH experience.
Benefits & conditions
Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $125,000 to $200,000. The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk's total compensation package for employees.
About the company
Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers, and on rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit http://www.steampunk.com.