TechOps / Support Engineer

The Maven

Seattle, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Seattle, United States of America

Tech stack

API

Amazon Web Services (AWS)

JIRA

Azure

Bash

Business Software

Unix

Cloud Computing

Cloud Engineering

Databases

Continuous Integration

Linux

DevOps

Monitoring of Systems

Information Technology Operations

Python

PowerShell

Reliability Engineering

Prometheus

Datadog

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Cloud Platform System

Grafana

Information Technology

Splunk

New Relic (SaaS)

Dynatrace

ServiceNow

Microservices

Job description

We are seeking a highly motivated TechOps / Support Engineer to join our Technology Operations team. The role is responsible for maintaining platform reliability, managing production incidents, coordinating Major Incident Management (MIM), driving Root Cause Analysis (RCA), and ensuring timely resolution of Severity (Sev) incidents. The engineer will participate in an active on-call rotation and work closely with engineering, infrastructure, product, and business teams to minimize service disruptions and improve operational excellence.

Incident Management & Operations

Participate in a 24x7 on-call rotation and provide production support for critical business applications and services.
Act as Incident Commander or coordinator during Severity (Sev) incidents and major outages.
Lead Major Incident Management (MIM) activities, including stakeholder communication, bridge coordination, escalation management, and service restoration.
Drive incidents through to resolution while ensuring adherence to defined SLAs and operational procedures.
Monitor application, infrastructure, and platform health using observability and monitoring tools.
Perform proactive issue detection, troubleshooting, and remediation.

Root Cause Analysis (RCA)

Lead and coordinate post-incident reviews and Root Cause Analysis (RCA) activities.
Identify underlying causes of recurring issues and collaborate with engineering teams to implement permanent fixes.
Track corrective and preventive actions to closure.
Maintain detailed incident documentation, timelines, and lessons learned.

Problem Management & Continuous Improvement

Analyze incident trends and recommend operational improvements.
Develop and enhance runbooks, knowledge base articles, and operational procedures.
Drive automation initiatives to reduce manual effort and improve response times.
Contribute to operational readiness reviews for new releases and platform changes.

Stakeholder Communication

Provide timely updates to internal stakeholders during critical incidents.
Coordinate across engineering, infrastructure, cloud, security, and vendor teams during issue resolution.
Ensure clear communication throughout the incident lifecycle.

Requirements

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
3-7+ years of experience in Technical Operations, Production Support, Site Reliability Engineering (SRE), or IT Operations.
Strong experience managing production incidents and Sev1/Sev2 issues.
Hands-on experience with Major Incident Management (MIM) processes.
Proven experience conducting Root Cause Analysis (RCA) and driving corrective actions.
Strong troubleshooting skills across applications, APIs, databases, and infrastructure.
Experience with monitoring and observability tools such as Splunk, Datadog, Dynatrace, New Relic, Grafana, Prometheus, or similar.
Knowledge of Linux/Unix systems and cloud environments (AWS, Azure, or Google Cloud Platform).
Familiarity with ticketing and ITSM platforms such as ServiceNow, Jira Service Management, or similar.
Excellent communication and stakeholder management skills.

Experience supporting cloud-native and microservices-based architectures.
Knowledge of DevOps, CI/CD pipelines, and automation scripting (Python, Shell, PowerShell, etc.).
ITIL Foundation or relevant operational certifications.
Experience working in high-availability, mission-critical production environments.

Incident Leadership
Major Incident Management (MIM)
Root Cause Analysis (RCA)
Production Support
Operational Excellence
Problem Solving
Stakeholder Communication
Escalation Management
Automation Mindset
Team Collaboration
