Senior Site Reliability Engineer

Elsevier group

Philadelphia, United States of America

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Philadelphia, United States of America

Tech stack

Amazon Web Services (AWS)

Cloud Computing

Cloud Engineering

Computer Networks

Continuous Delivery

Continuous Integration

Fault Tolerance

GitHub

Monitoring of Systems

Role-Based Access Control

Reliability Engineering

New Relic

Data Processing

Scripting (Bash/Python/Go/Ruby)

System Availability

Delivery Pipeline

Infrastructure as Code (IaC)

Kubernetes

Deployment Automation

Job description

About the team: This diverse team of engineers assists multiple product teams as we continue to innovate across all of our products within our global AWS cloud landscape.

* Designing, deploying, and maintaining highly available, scalable Kubernetes clusters on AWS EKS, as well as the supporting ecosystem.
* Managing and optimizing cross-portfolio cloud infrastructure, leveraging AWS services and supported organizational tooling.
* Developing and maintaining Infrastructure as Code (IaC) solutions to automate provisioning and management of cloud and Kubernetes resources.
* Writing automation processes to streamline operational workflows, incident response, and infrastructure management.
* Implementing CI/CD pipelines to facilitate deployments, testing, and validation.
* Supporting multi-regional critical infrastructure, ensuring high availability and rapid incident resolution.
* Monitoring system health, instrumenting system components, troubleshooting issues, and performing root cause analysis.
* Managing and supporting a complex cross-portfolio environment, coordinating across teams to ensure consistency and reliability.
* Maintaining comprehensive documentation and best practice guides for solutions, ensuring users have clear instructions and support to effectively implement and operate their systems.
* Mentoring junior team members and promoting best practices in SRE, automation, and cloud architecture.

Requirements

About the role: We are looking to immediately hire a highly skilled and proactive Senior SRE to join our dynamic team. You will combine software thinking and service operations to enable and run Elsevier's large-scale, 24x7, distributed and fault-tolerant systems within agreed reliability objectives, whilst enabling the fast flow of feature and service updates. The successful candidate will possess deep expertise in cloud-native architectures, along with strong automation skills.

* Extensive experience deploying, managing, and troubleshooting containerised applications.
* Deep understanding of Kubernetes architecture, networking, security, storage, and operational best practices.
* Proven expertise with AWS services and architectural principles.
* Extensive knowledge of AWS security, compliance, and best practices.
* Advanced skills in writing modular, reusable IaC components.
* Strong Python scripting skills for automation, tooling, and data processing.
* Ability to develop custom solutions for monitoring, automation, and incident management.
* Experience designing and maintaining CI/CD workflows using GitHub Actions.
* Current experience automating deployment pipelines, testing, and validation processes.
* Familiarity with monitoring tools such as New Relic.
* Knowledge of security best practices, network policies, and enterprise-grade RBAC policies.

About the company

Elsevier employs 10,000 people worldwide, including over 2,500 technologists. We have supported the work of our research and health partners for more than 140 years. Growing from our roots in publishing, we offer knowledge and valuable analytics that help our users make breakthroughs and drive societal progress. Digital solutions such as ScienceDirect, Scopus, SciVal, ClinicalKey and Sherpath support strategic research management, R&D performance, clinical decision support, medical education, and nursing education. Researchers and healthcare professionals rely on over 2,800 journals, including The Lancet and Cell; 46,000+ eBook titles; and iconic reference works, such as Gray's Anatomy. With the Elsevier Foundation and our external Inclusion & Diversity Advisory Board, we work in partnership with diverse stakeholders to advance inclusion and diversity in science, research and healthcare in developing countries and around the world. Elsevier is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers.
