Site Reliability Developer 3

Oracle
Austin, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English

Job location

Seattle, United States of America

Tech stack

Java
JavaScript
Build Automation
Bash
Unix
Cloud Computing
Configuration Management
Continuous Availability
Continuous Delivery
Continuous Integration
Dynamic Host Configuration Protocol
Linux
Distributed Systems
DNS
Perl
Hypertext Transfer Protocol (HTTP)
Python
Oracle Applications
Performance Tuning
Reliability Engineering
Cloud Services
Ruby
Software Deployment
Transmission Control Protocol (TCP)
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Kubernetes
Puppet
Terraform
Software Performance
Oracle Cloud Infrastructure
Docker
Go

Job description

At Oracle Cloud Infrastructure (OCI), we build the more intelligent future of cloud. OCI Sovereign Cloud is a team of smart, motivated, and diverse people who are focused on bringing the world's most important work to OCI. We build and operate our government, classified, and sovereign cloud regions to be reliable and high-performance, just like our public cloud. Our customers and their mission are at the center of what we do. We strive to deepen our understanding of the challenges our customers face, use that understanding to enhance our cloud capabilities, and work together to deliver their mission.

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

You will provide cloud operations for Oracle National Security Realms. You'll be part of a dynamic team with a broad knowledge of how Oracle's cloud platform works. You'll partner with customer support, service owners, and engineering teams around the globe to ensure high-quality service for customers.

Note: this is not a Monday-to-Friday, core-hours role. It involves working a 24/7 shift rotation with on-call duties, including nights, weekends, and public holidays.

  • Serve as an escalation point for junior Service Operations Engineers (SOEs) during complex or high-impact incidents.
  • Manage and execute complex manual Change Management tickets, working closely with the service teams to ensure safety and minimal disruption to services.
  • Support the on-boarding of new services and tools, ensuring they are operationally ready and properly integrated.
  • Provide mentorship and training to SOEs, helping build team capability and confidence.
  • Create and maintain clear, useful documentation for operational processes and system support.
  • Identify areas of manual work and drive automation to reduce toil and improve efficiency.
  • Automate tasks to enable continuous delivery and ensure continuous availability with minimal human overhead
  • Recognize unsafe or inefficient practices and work with teams to design safer, more effective solutions.
  • Complete change requests to enable new functionality and maintain realm compliance
  • Ensure timely resolution of incidents, service requests, and change requests
  • Collaborate with global service and engineering teams
  • Define and drive change management, continuous integration, and deployment best practices
  • Help design and maintain real-world production architectures, with attention to scalability and system design
  • Use a methodical approach to troubleshoot large, complex, interconnected systems

Requirements

  • Linux and Unix operating systems
  • Docker, Kubernetes, and Terraform
  • Scripting and programming languages such as Shell, Perl, Python, Java, and Go
  • Citizenship and location requirements: U.S. citizenship; must possess and maintain a TS/SCI with polygraph security clearance; must reside in Seattle, WA
  • Technology-related bachelor's degree and/or equivalent work experience
  • A desire to learn and keep up with modern technologies
  • Proficiency in writing services and task automation in Python, Bash, Ruby, Perl, JavaScript, or Java
  • Familiarity with core protocols (DNS, DHCP, HTTP, TCP)
  • Deep knowledge of Linux internals and host-based networking
  • Knowledge of Linux and/or Unix operating systems
  • Familiarity with configuration management solutions such as Chef or Puppet
  • Experience devising, managing, and extending monitoring solutions for large-scale environments
  • Knowledge of cloud computing concepts
  • Experience working in a mission-critical environment (Operations, Technical Support, NOC, etc.)
  • Strong communication skills (writing, organization, learning exchange)
  • Experience executing tasks under change management procedures
  • Experience resolving auto-cut and manual alarms following runbooks
  • A focus on customer satisfaction

About the company

Oracle offers integrated suites of applications plus secure, autonomous infrastructure in the Oracle Cloud. For more information about Oracle (NYSE: ORCL), please visit us at www.oracle.com.

Our mission is to help people see data in new ways, discover insights, and unlock endless possibilities.
