Software Developer 5

Oracle
Austin, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Part-time (≤ 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$306K

Job location

Pierre, United States of America

Tech stack

Java
API
Artificial Intelligence
Amazon Web Services (AWS)
Build Automation
Azure
Border Gateway Protocol
Cloud Computing
Cloud Engineering
Cloud Storage
Computer Clusters
Code Coverage
Software Quality
Code Review
Nvidia CUDA
Computer Networks
Software Debugging
Linux
Distributed Computing Environment
Distributed Systems
Monitoring of Systems
Identity and Access Management
InfiniBand
Virtual Private Networks (VPN)
Network Control
Node.js
Open Source Technology
Operational Databases
Remote Direct Memory Access
Cloud Services
Service Discovery
Software Engineering
Google Cloud Platform
Load Balancing
Computer Network Technologies
Autoscaling
Istio
Build Management
Kubernetes
Storage Technologies
Machine Learning Operations
Terraform
Oracle Cloud Infrastructure
Go

Job description

You will work on core OKE platform capabilities including cluster lifecycle management, orchestration, scalability, reliability, performance, automation, observability, security, and integration with OCI infrastructure services. The ideal candidate has hands-on experience designing, building, operating, or deeply debugging production cloud services, infrastructure platforms, or Kubernetes-based systems at meaningful scale.

This role requires advanced Kubernetes experience, including Kubernetes control plane behavior, controllers and operators, scheduling, autoscaling, networking, storage, service discovery, container runtimes, node lifecycle, Kubernetes APIs, and etcd. Experience with Kubernetes networking and storage technologies such as CNI, Cilium, Calico, Flannel, other container networking implementations, CSI drivers, and cloud provider integrations is highly relevant.

OKE is also expanding to support demanding AI and accelerated computing use cases. Experience with AI/ML infrastructure, multi-node GPU clusters, accelerated compute, model training or inference platforms, GPU scheduling, device plugins, Karpenter, cluster autoscaling, CUDA, NCCL, RoCE, InfiniBand, RDMA, SmartNIC/DPU offload, or high-performance AI/HPC networking is a significant plus.

This role also requires an engineer who is ready to use modern agentic engineering practices responsibly. We expect senior engineers to apply AI-assisted and agentic workflows to accelerate design exploration, implementation, testing, debugging, documentation, operational analysis, and developer productivity while maintaining strong ownership, security judgment, code quality, and production accountability.

Responsibilities

As a member of the software engineering division, you will take an active role in defining and evolving standard practices and procedures. You will define specifications for significant new projects and specify, design, develop, troubleshoot, and debug software for OCI's managed Kubernetes service.

Responsibilities include:

  • Provide technical leadership for major OKE platform initiatives from architecture through implementation, launch, and production operation.

  • Design and build distributed systems that create, update, scale, repair, and operate Kubernetes clusters across OCI regions.

  • Improve OKE reliability, scalability, performance, upgrade safety, lifecycle management, observability, automation, and operational tooling.

  • Work deeply with Kubernetes technologies, including control plane components, controllers/operators, scheduling, autoscaling, Kubernetes APIs, container runtimes, node behavior, and etcd.

  • Design, debug, and improve Kubernetes networking and storage integrations, including CNI-based networking, Cilium, Calico, Flannel, other container networking implementations, CSI drivers, and OCI infrastructure integrations.

  • Build automation for cluster validation, health checks, readiness testing, failure detection, remote recovery, and reduction of post-deployment operational issues.

  • Lead technical design reviews, code reviews, incident reviews, and production readiness reviews for complex service changes.

  • Debug difficult production issues across service boundaries, including Kubernetes, Linux, networking, compute, storage, identity, telemetry, and OCI infrastructure dependencies.

  • Apply performance engineering practices including profiling, tracing, latency analysis, throughput optimization, and production diagnostics across distributed systems.

  • Build automation that reduces manual operations, improves fleet health, accelerates diagnosis, and raises the quality bar for OKE engineering.

  • Partner with OCI service teams to deliver end-to-end platform capabilities regardless of organizational boundaries.

  • Apply AI-assisted and agentic engineering workflows to improve engineering velocity, test coverage, debugging, operational analysis, and documentation while ensuring correctness, security, and maintainability.

  • Mentor engineers, influence technical direction, and help establish patterns that scale across the OKE organization.

  • Participate in operating a 24x7 cloud service and use customer feedback, production data, and operational experience to prioritize improvements.

Requirements

We are looking for a senior IC5 software engineer with deep Kubernetes expertise, required cloud infrastructure experience, and a strong distributed systems background. This is a high-impact technical leadership role for an engineer who can define architecture, drive cross-team execution, solve ambiguous production and platform problems, and deliver durable systems that improve both customer experience and operational excellence.

  • 10+ years of software engineering experience, or equivalent experience building and operating production software systems.

  • Hands-on cloud infrastructure experience is required, ideally designing, building, operating, or debugging production services or platforms on OCI, AWS, Azure, GCP, or a large-scale private cloud.

  • Strong hands-on Kubernetes expertise is required, including Kubernetes architecture, APIs, control plane behavior, controllers/operators, scheduling, autoscaling, networking, storage, nodes, cluster lifecycle management, or production cluster operations.

  • Advanced Kubernetes knowledge, including CNI, CSI, etcd, service discovery, container runtimes, node lifecycle, and Kubernetes failure modes.

  • Experience with Kubernetes networking technologies such as Cilium, Calico, Flannel, or other CNI implementations.

  • Experience with Kubernetes storage integrations, including CSI drivers or cloud storage integrations.

  • Strong distributed systems fundamentals, including availability, failure handling, performance, scalability, and operational tradeoffs.

  • Experience building highly available infrastructure services, platform services, or cloud native systems used in production.

  • Strong development experience in both Go/Golang and Java is required.

  • Strong Linux, networking, debugging, and production operations skills.

  • Demonstrated ability to lead ambiguous technical projects, influence across teams, and deliver through other engineers without relying on formal authority.

  • Strong communication skills, ownership, judgment, and ability to make pragmatic tradeoffs in production systems.

Preferred qualifications:

  • Experience with AI/ML infrastructure, GPU workloads, multi-node GPU clusters, accelerated compute, model training or inference platforms, GPU scheduling, device plugins, Karpenter, cluster autoscaling, CUDA, NCCL, high-performance networking, or distributed training systems.

  • Experience with eBPF-based networking, Kubernetes network policy, service mesh, ingress, load balancing, overlays/underlays, BGP, VXLAN, SmartNIC/DPU offload, RoCE, InfiniBand, RDMA, or multi-cluster networking.

  • Experience with infrastructure as code and cloud provisioning tools such as Terraform, Packer, cloud-init, IAM, VCN/VPC networking, VPN, FastConnect/Direct Connect, or equivalent cloud primitives.

  • Experience building developer productivity, operational automation, or responsible AI-assisted and agentic engineering workflows.

  • Experience with observability systems, incident response, safe deployment practices, canary analysis, rollback strategies, service health automation, and large fleet operations.

  • Open-source or upstream contribution experience in Kubernetes, cloud native infrastructure, observability, networking, or related systems.

Benefits & conditions

US: Hiring Range in USD from: $96,800 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  1. Medical, dental, and vision insurance, including expert medical opinion

  2. Short term disability and long term disability

  3. Life insurance and AD&D

  4. Supplemental life insurance (Employee/Spouse/Child)

  5. Health care and dependent care Flexible Spending Accounts

  6. Pre-tax commuter and parking benefits

  7. 401(k) Savings and Investment Plan with company match

  8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  9. 11 paid holidays

  10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  11. Paid parental leave

About the company

Oracle offers integrated suites of applications plus secure, autonomous infrastructure in the Oracle Cloud. For more information about Oracle (NYSE: ORCL), please visit us at www.oracle.com.

Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.
