Senior DevOps/Platform Engineer III - Richland, WA

Pacific Northwest National Laboratory
Richland, United States of America
5 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$133K

Job location

Richland, United States of America

Tech stack

Adobe Analytics
.NET
Adobe InDesign
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Data analysis
Azure
Big Data
C Sharp (Programming Language)
C++
Cloud Computing
Software Quality
Code Reuse
Computer Programming
Computer Networks
Data Fusion
Data Infrastructure
ETL
Data Systems
DevOps
Programming Tools
Disaster Recovery
Distributed Computing Environment
Distributed Systems
Github
Information Lifecycle Management
Python
Key Management
PostgreSQL
Machine Learning
Enterprise Messaging Systems
MongoDB
Open Source Technology
Performance Tuning
Reliability Engineering
Cloud Services
Prometheus
Runbook
Service Discovery
Software Engineering
Software Systems
Policy as Code
Data Logging
Pulumi
Data Processing
Load Balancing
GitHub Copilot
Istio
System Availability
Delivery Pipeline
Large Language Models
Snowflake
Spark
Multi-Cloud
Reliability of Systems
HybridCloud
GIT
Cloudformation
Data Layers
Event Driven Architecture
Containerization
Data Lake
Gitlab-ci
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Apache Flink
Kafka
Linkerd (Service Mesh)
Machine Learning Operations
Cloudwatch
Api Gateway
Terraform
Splunk
Software Version Control
Data Pipelines
Dynatrace
Devsecops
Serverless Computing
Docker
ELK
Jenkins
Redshift
Databricks
Vulnerability Analysis
Go
Microservices

Job description

We are seeking a Senior DevOps/Platform Engineer to join PNNL's advanced AI engineering initiatives, contributing to next-generation systems spanning agentic AI platforms, large-scale data orchestration, and real-time intelligence processing. In this role, you'll apply your expertise in scalable system design and AI/ML engineering to build mission-critical capabilities while developing your technical leadership and establishing yourself as a key contributor to our engineering community.

  • Build components of LLM orchestration frameworks using LangChain, LlamaIndex, and emerging platforms
  • Contribute to MLOps platforms including experiment tracking, model versioning, and deployment pipelines
  • Create developer tooling, utilities, and interfaces for AI-native frameworks
  • Integrate multi-modal data sources into cohesive processing pipelines

Scalable Infrastructure & Data Systems

  • Develop microservices within distributed architectures handling high-throughput workloads
  • Build components of real-time streaming platforms and event-driven systems
  • Implement data pipelines for large-scale ETL, data processing, and analytics
  • Deploy containerized applications using Kubernetes and support CI/CD pipelines
  • Contribute to systems deployed in secure and edge environments

Mission-Critical Production Systems

  • Deploy AI systems with appropriate monitoring, logging, and observability
  • Ensure code quality, security best practices, and compliance standards
  • Build geospatial processing, time-series, and data fusion capabilities
  • Support system performance optimization and troubleshooting

Technical Leadership

  • Lead technical components of projects and tasks
  • Mentor junior staff and contribute to team knowledge sharing
  • Participate in design discussions and contribute to architectural decisions
  • Support proposal development with technical content and scoping
  • Build effective collaborations across teams and S&E domains

  • Detect and prevent smuggling of drugs and contraband at ports of entry [Link (https://www.pnnl.gov/sites/default/files/media/file/NII%20Capabilities%20072621_0.pdf)]
  • Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link (https://www.pnnl.gov/sites/default/files/media/file/PNNL_Treasury_AWS%20collab%201121.pdf)]
  • Apply big data solutions to national security problems [Link (https://www.pnnl.gov/news-media/science-front-line-ralph-perko)]
  • Apply image classification to nuclear forensics analysis [Link (https://www.pnnl.gov/sites/default/files/media/file/NSD_1259_FLYER_SharkzorHighlights_FINAL_0.pdf)]
  • Develop capabilities for scalable geospatial analytics [Link (https://www.pnnl.gov/sites/default/files/media/file/GeoBOSS%20Open-Source%20Geospatial%20Analytics%20at%20Scale.pdf)]

This position is based in Richland, WA and requires an onsite presence Monday through Thursday, with Friday as required by business needs.

Requirements

You're an accomplished engineer with strong foundations in DevOps, scalable system design, AI/ML development, and production software engineering. You're ready to take on increasing technical responsibility, leading components of complex systems while mentoring junior team members. You excel at translating technical requirements into working solutions, selecting appropriate approaches for challenging problems, and contributing meaningfully to technical direction and project success.

  • Demonstrated proficiency in Python and working knowledge of at least one additional language (C#/.NET, Go, C++) for infrastructure automation and tooling development
  • Knowledge of Infrastructure as Code principles and tools including Terraform, CloudFormation, Pulumi, or ARM templates with emphasis on modular, reusable code patterns
  • Ability to design, implement, and maintain sophisticated CI/CD pipelines across multiple environments using tools such as Jenkins, GitLab CI, GitHub Actions, or Azure DevOps
  • Proficiency with version control workflows (Git), GitOps methodologies, automated testing frameworks for infrastructure code, and policy-as-code practices with consistent use of AI assist tools (e.g., Claude, GitHub Copilot) to accelerate automation and troubleshooting

Cloud & Container Orchestration

  • Demonstrated experience designing and managing infrastructure across cloud platforms (AWS, Azure, or GCP) with multi-cloud experience highly valued
  • Strong expertise with containerization technologies (Docker) and container orchestration platforms (Kubernetes, EKS, AKS, or GKE) including advanced concepts like operators, custom resources, and cluster management
  • Ability to design and implement event-driven architectures using cloud-native services (AWS EventBridge, Azure Event Grid, Pub/Sub) and messaging systems with understanding of service mesh technologies (Istio, Linkerd) and API gateway patterns
  • Knowledge of networking concepts in cloud and containerized environments including CNI plugins, ingress controllers, load balancing, and service discovery with familiarity in edge computing deployments and hybrid cloud architectures

Observability, Reliability & Security

  • Ability to implement comprehensive observability solutions including metrics collection (Prometheus, CloudWatch), distributed tracing (Jaeger, Tempo), and centralized logging (ELK Stack, Loki, Splunk)
  • Understanding of Site Reliability Engineering (SRE) principles including SLOs, SLIs, error budgets, and incident response with ability to design and implement chaos engineering practices to improve system resilience
  • Experience implementing security best practices including secrets management (Vault, AWS Secrets Manager), vulnerability scanning, and DevSecOps tooling
  • Knowledge of disaster recovery strategies, backup automation, and business continuity planning with understanding of compliance frameworks and ability to implement automated compliance controls

Data Platform Operations & ML Infrastructure

  • Understanding of cloud-native data pipeline architectures and ETL/ELT orchestration (AWS Glue, Azure Data Factory, Airflow, Prefect) with ability to build and maintain infrastructure supporting ML pipelines, model training workflows, and MLOps practices
  • Knowledge of deploying and operating cloud-based data storage systems and platforms (S3, Redshift, Delta Lake, PostgreSQL, MongoDB, OpenSearch, Snowflake)
  • Understanding of distributed data processing frameworks (Spark/Databricks, Kafka, Flink) with experience operating Kubernetes-based platforms for data workloads including Spark on K8s, Ray clusters, or Kubeflow
  • Ability to implement infrastructure supporting large-scale data systems with appropriate monitoring, cost optimization, and performance tuning including storage tiering, data lifecycle management, and compute resource optimization

Collaboration & Operations

  • Strong problem-solving abilities with experience troubleshooting complex distributed systems spanning applications, infrastructure, and data layers
  • Excellent communication skills to collaborate effectively with software engineers, data scientists, security teams, and business stakeholders with ability to create clear, comprehensive documentation for infrastructure designs, runbooks, and disaster recovery procedures
  • Demonstrated capacity to manage multiple infrastructure initiatives simultaneously while maintaining high availability and reliability standards with proven ability to mentor team members on DevOps practices and operational excellence
  • Experience participating in on-call rotations, incident response, and post-mortem processes with ability to balance tactical operational needs with strategic infrastructure improvements

  • PhD and 1 year of Software Engineering experience -OR-
  • MS/MA and 3 years of Software Engineering experience -OR-
  • BS/BA and 5 years of Software Engineering experience -OR-
  • AA and 14 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development -OR-
  • HS/GED and 16 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development

  • Degree in computer science, software engineering, or related field.
  • 3-5 years of hands-on DevOps, Platform Engineering, Site Reliability Engineering, or Infrastructure Engineering experience.
  • Experience in contributing to technical direction and independently structuring complex problems into actionable work, in collaboration with senior engineers and cross-functional teams.
  • Expertise in Python and proficiency in at least one other language (C#/.NET, C++, Go).
  • Contributions to open-source infrastructure projects or active participation in DevOps communities.

Hazardous Working Conditions/Environment

Not applicable.

Additional Information

This position requires the ability to obtain and maintain a federal security clearance.

A security clearance background investigation includes review of your employment, education, financial, and criminal history, as well as interviews with you and your personal references, neighbors, and co-workers to determine trustworthiness, reliability, and loyalty to the United States. The investigation also examines your foreign connections, drug and alcohol use, foreign influence, and overall conduct.

  • U.S. Citizenship
  • Background Investigation: Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements for access to classified matter in accordance with 10 CFR 710, Appendix B.
  • Drug Testing: All Security Clearance positions are Testing Designated Positions, which means that the applicant selected for hire is subject to pre-employment drug testing, and post-employment random drug testing. In addition, applicants must be able to demonstrate non-use of illegal drugs, including marijuana, for the 12 consecutive months preceding completion of the requisite Questionnaire for National Security Positions (QNSP).

Note: Applicants will be considered ineligible for security clearance processing by the U.S. Department of Energy if non-use of illegal drugs, including marijuana, for 12 months cannot be demonstrated.

Benefits & conditions

PNNL lists the full pay range for the position in the job posting. Starting pay is calculated from the minimum of the pay range and actual placement in the range is determined based on an individual's relevant job-related skills, qualifications, and experience. This approach is applicable to all positions, with the exception of positions governed by collective bargaining agreements and certain limited-term positions which have specific pay rules.

As part of our commitment to fair compensation practices, we do not ask for or consider current or past salaries in making compensation offers at hire. Instead, our compensation offers are determined by the specific requirements of the position, prevailing market trends, applicable collective bargaining agreements, pay equity for the position type, and individual qualifications and skills relevant to the performance of the position.

Minimum Salary

USD $133,100.00/Yr.

Maximum Salary

USD $210,400.00/Yr.

About the company

At PNNL, our core capabilities are divided among major departments that we refer to as Directorates within the Lab, each focused on a specific area of scientific research or other function, with its own leadership team and dedicated budget. Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, we have an Environmental Molecular Sciences Laboratory, a Department of Energy, Office of Science user facility housed on the PNNL campus.

The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world. The AI and Data Analytics Division, part of NSD, combines profound domain expertise and creative integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.

Pacific Northwest National Laboratory (PNNL) is a world-class research institution powered by a highly educated, diverse workforce committed to the values of Integrity, Creativity, Collaboration, Impact, and Courage. Every year, scores of dynamic, driven people come to PNNL to work with renowned researchers on meaningful science, innovations and outcomes for the U.S. Department of Energy and other sponsors; here is your chance to be one of them! At PNNL, you will find an exciting research environment and excellent benefits including health insurance and flexible work schedules. PNNL is located in eastern Washington State, the dry side of Washington, known for its stellar outdoor recreation and affordable cost of living. The Lab's campus is only a 45-minute flight (or ~3 hour drive) from Seattle or Portland, and is serviced by the convenient PSC airport, connected to 8 major hubs.

Please be aware that the Department of Energy (DOE) prohibits DOE employees and contractors from having any affiliation with the foreign government of a country DOE has identified as a "country of risk" without explicit approval by DOE and Battelle. If you are offered a position at PNNL and currently have any affiliation with the government of one of these countries, you will be required to disclose this information and recuse yourself of that affiliation or receive approval from DOE and Battelle prior to your first day of employment.
