Kubernetes Security Engineer

OpenKyber LLC

Morton Township, United States of America

2 months ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$119K

Job location

Remote

Morton Township, United States of America

Tech stack

Kubernetes Security

API

Artificial Intelligence

Amazon Web Services (AWS)

JIRA

Azure

Bash

Cloud Computing

Databases

Linux

DevOps

Monitoring of Systems

Information Technology Operations

Python

Knowledge Management

Network Security

NoSQL

PowerShell

Cloud Services

Prometheus

Runbook

SQL Databases

Datadog

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Enterprise Software Applications

Load Balancing

Cloud Platform System

Chatbots

Grafana

Apigee

Kubernetes

Kafka

Hardware Infrastructure

Splunk

ServiceNow

Requirements

Do you have experience in technical troubleshooting support?

Description of Services: SRE Operations Engineer

The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms such as Kubernetes, APIs, WAF, databases, API proxies (Gloo, Apigee), Kafka, and cloud (AWS/Azure/Google Cloud Platform). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.

Skills

Mandatory Skills (Must-Have):

System & Infrastructure Monitoring. Expectation: Ability to use monitoring dashboards (e.g., Grafana, Datadog, Splunk, Argos, AIOps) to identify anomalies, follow alert workflows, and escalate when thresholds are breached.
Runbook Execution. Expectation: Strictly follow documented steps to resolve standard incidents; escalate when steps do not apply or fail.
Incident Triage & Communication. Expectation: Perform first-line triage of alerts, gather logs/metrics, categorize severity, and notify stakeholders in clear, concise language.
Kubernetes (Cloud or on-prem) operations knowledge. Expectation: Ability to check pod status, understand logs, and verify service endpoints using kubectl and monitoring tools.
Scripting (Python, Bash, PowerShell). Expectation: Able to read and make small edits to scripts to automate repetitive checks.
Networking & Security Awareness. Expectation: Understand basic network troubleshooting tools (ping, netstat, curl, traceroute) and know when issues may be related to firewall, WAF, or proxy.
Documentation & Knowledge Capture. Expectation: Accurately record steps taken during incidents; suggest runbook updates where gaps exist.
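
As an illustration of the kind of small script the Kubernetes and Scripting expectations above refer to, here is a minimal Python sketch that flags pods needing attention. It is not part of the employer's tooling: the function name, the restart threshold, and the sample data are hypothetical, and the input is assumed to have the shape of `kubectl get pods -o json` output (an `items` list whose entries carry `metadata.name`, `status.phase`, and `status.containerStatuses`).

```python
def find_unhealthy_pods(pod_list, max_restarts=3):
    """Return (pod_name, reason) tuples for pods that look unhealthy.

    pod_list is assumed to be the parsed JSON from `kubectl get pods -o json`.
    max_restarts is an illustrative threshold, not a value from any runbook.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed pods (e.g. finished Jobs) are not a problem
        if phase != "Running":
            flagged.append((name, f"phase={phase}"))
            continue
        # Pod is Running: check each container for readiness and restarts.
        for cs in status.get("containerStatuses", []):
            if not cs.get("ready", False):
                flagged.append((name, f"container {cs['name']} not ready"))
                break
            if cs.get("restartCount", 0) > max_restarts:
                flagged.append(
                    (name, f"container {cs['name']} restarted {cs['restartCount']} times")
                )
                break
    return flagged


# Hypothetical sample data, trimmed to the fields the function reads.
SAMPLE_POD_LIST = {
    "items": [
        {"metadata": {"name": "web-1"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"name": "app", "ready": True, "restartCount": 0}]}},
        {"metadata": {"name": "web-2"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"name": "app", "ready": False, "restartCount": 1}]}},
        {"metadata": {"name": "batch-1"}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "api-1"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"name": "app", "ready": True, "restartCount": 5}]}},
    ]
}

for pod_name, reason in find_unhealthy_pods(SAMPLE_POD_LIST):
    print(f"{pod_name}: {reason}")
```

In practice the dictionary would come from `json.load` on the output of `kubectl get pods -n <namespace> -o json`, and anything flagged would be handled according to the relevant runbook or escalated.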

Preferred Skills (Nice-to-Have):

Cloud Platform Familiarity (AWS, Azure, Google Cloud Platform). Expectation: Understand basics of cloud services (VMs, load balancers, storage) and how to navigate a cloud console.
Database Basics (SQL/NoSQL). Expectation: Run simple queries to validate DB connectivity and health.
Automation & Self-Service Mindset. Expectation: Identify repetitive manual steps and propose candidates for automation.
Exposure to Incident Management Tools (xMatters, ServiceNow, Jira, etc.). Expectation: Comfortable working within ITSM/incident workflows.
AI/Chatbot-Assisted Ops (emerging skill). Expectation: Use AI assistants to search runbooks or suggest remediation steps.
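
For the Database Basics item above, a connectivity check can be as small as the following Python sketch. It assumes a driver that follows the Python DB-API 2.0 (`cursor()`, `execute()`, `fetchone()`); the standard-library `sqlite3` module with an in-memory database stands in for a real server here, and the function name and the `SELECT 1` probe query are illustrative choices rather than anything prescribed by the role.

```python
import sqlite3
import time


def check_db_health(conn, probe_query="SELECT 1"):
    """Run a trivial query and report success, latency, and any error text."""
    start = time.perf_counter()
    try:
        cur = conn.cursor()
        cur.execute(probe_query)
        ok = cur.fetchone() is not None
        error = None
    except Exception as exc:  # broad on purpose: any failure means "unhealthy"
        ok, error = False, str(exc)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"ok": ok, "latency_ms": latency_ms, "error": error}


# Stand-in connection; a real check would connect to the actual database.
conn = sqlite3.connect(":memory:")
result = check_db_health(conn)
print(result["ok"], result["error"])
```

A failed probe returns `ok` as `False` along with the driver's error message, which can be copied into the incident record before escalating.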

Qualifications:

2-5 years in an IT operations, NOC, or SRE/DevOps engineer role.
Kubernetes 101, Linux 101, Networking 101.
Understanding of cloud-ready applications.
Understanding of observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
Strong troubleshooting mindset and ability to follow structured workflows, e.g. 5 Whys and Fishbone.

Deliverables:

Monitor system health, alerts, dashboards, and logs across cloud and on-prem infrastructure.
Isolate functional issues with the application versus the platform.
Execute standardized runbooks for incident resolution, deployments, and routine tasks.
Perform initial triage of incidents and escalate to L2/L2+ as needed to mitigate the issue and get to bypass.
Document new issues, gaps in runbooks, and automation opportunities.
Provide excellent communication to stakeholders during incidents.
Support onboarding of new applications into the operations framework.
