Platform Engineer

Blackhawk Network

Pleasanton, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Junior

Compensation

$85K

Job location

Remote

Pleasanton, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Bash

Cloud Computing

Continuous Integration

Cursor (AI code editor)

Linux

DevOps

Distributed Systems

Monitoring of Systems

Python

Operational Data Store

Reliability Engineering

Software Tools

Prometheus

Software Engineering

Scripting (Bash/Python/Go/Ruby)

Cloud Platform System

GitHub Copilot

System Availability

Grafana

Git

Kubernetes

Information Technology

Splunk

New Relic (SaaS)

Docker

Jenkins

ServiceNow

Job description

You'll split your time between engineering solutions and operating our production platforms: maintaining the health of BHN's production services while building the automation, observability, and AI-driven capabilities that make incidents less frequent, easier to diagnose, and faster to resolve.

As part of the Operations Command Centre (OCC), you'll play an active role in Major Incident Management, partnering with engineering teams to diagnose and restore production services during critical incidents. Outside of incident response, you'll build dashboards, improve monitoring, develop automation, analyse operational data, and engineer intelligent tooling that continuously improves platform reliability.

This role provides exceptional exposure to large-scale distributed systems, cloud infrastructure, Kubernetes, CI/CD, observability platforms, automation, AI-assisted software development, and production engineering.

If you're naturally curious, enjoy solving complex technical problems, and want to accelerate your engineering career, we'd love to hear from you.

Responsibilities:

Major Incident Response & Production Operations

Participate in the 24×7 on-call rotation supporting BHN's production platforms.
Monitor production health using modern observability platforms.
Lead or support Major Incident bridges, coordinating technical teams during high-severity production incidents.
Perform technical triage, identify probable causes, and drive rapid service restoration.
Communicate clearly with engineers, leadership, and business stakeholders throughout incidents.
Lead post-incident reviews focused on learning and continuous improvement.
Identify recurring operational pain points and engineer permanent solutions.

Platform Engineering & Automation

Develop automation that reduces manual operational effort.
Build internal engineering tools that improve developer productivity and platform reliability.
Create dashboards, alerts, health scores, and operational insights.
Improve CI/CD pipelines and deployment safety.
Automate operational workflows and repetitive tasks.
Build self-service capabilities for engineering teams.
Develop auto-remediation and self-healing capabilities.
Continuously improve platform reliability through engineering rather than manual intervention.

Observability & Reliability Engineering

Design alerts that detect customer-impacting issues early while minimising alert fatigue.
Improve platform visibility through metrics, logs, traces, and dashboards.
Analyse production behaviour to identify reliability improvements.
Develop operational KPIs and engineering health metrics.
Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Use operational data to drive engineering decisions and improve platform resilience.

AI Engineering & Intelligent Operations

Use AI-assisted software development tools to improve engineering productivity.
Develop AI-powered incident summarisation and communication capabilities.
Build intelligent root cause analysis and diagnostic tooling.
Create operational copilots and engineering assistants.
Enhance alerts with contextual intelligence.
Automate diagnostics and operational workflows.
Build AI-driven orchestration and auto-remediation capabilities.
Develop engineering knowledge systems that improve troubleshooting and accelerate learning.

You'll Gain Experience Across

Cloud Infrastructure (AWS)
Kubernetes
CI/CD Engineering
Infrastructure as Code
Observability & Monitoring
Major Incident Management
Production Operations
Reliability Engineering (SRE)
Automation Engineering
AI Engineering
Large-scale Distributed Systems

Kubernetes
Docker
Jenkins
Splunk
New Relic
Prometheus
Grafana
OpenTelemetry
ServiceNow
Python automation
Infrastructure as Code (Terraform, CloudFormation, etc.)
CI/CD engineering
Distributed systems
FinTech, payments, or other high-availability production environments

We seek candidates who not only demonstrate curiosity and adaptability in emerging technologies but have also successfully implemented and utilized AI tools to enhance their work, improve processes, or deliver measurable results. Our teams embrace continuous learning and the thoughtful integration of AI to create meaningful impact - for our employees and the future of work.

Requirements

Our Operations Command Centre (OCC) is looking for a Platform Engineer with strong technical foundations, exceptional problem-solving ability, and a passion for building reliable systems.

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Experience in Platform Engineering, DevOps, Site Reliability Engineering (SRE), Infrastructure Engineering, Technical Operations, or a similar technical role.
Strong Linux fundamentals.
Experience with AWS or another major cloud platform.
Experience with Git and modern software development workflows.
Basic scripting experience using Python, Bash, or a similar language.
Strong analytical and troubleshooting skills, with exposure to production support or Major Incident Management.
Excellent written and verbal communication skills.
Strong ownership mindset with a passion for continuous improvement.
Experience using modern AI engineering tools such as GitHub Copilot, Cursor, Claude, or similar AI-assisted development platforms.

Benefits & conditions

Non-exempt, Hourly Rate for California Residents Only: USD $40.67/Hr

Non-exempt, Hourly Rate for Illinois Residents Only: USD $32.09/Hr

Pay is based on several factors including but not limited to education, work experience, and certifications. In addition to your salary, Blackhawk Network offers benefits including 401k with employer match, medical, dental, vision, 12 paid holidays throughout the year, sick pay accrual according to state law, parental leave, life insurance, disability insurance, accident and illness insurance, health and dependent care flexible spending accounts, wellness benefits, and paid time off for all full-time employees.

About the company

Today, through BHN's single global platform, businesses of all kinds can tap into the world's largest network of branded payment solutions. BHN helps businesses grow revenue, increase loyalty, motivate and reward their teams, disburse funds and engage consumers. Branded payment solutions include the issuance and distribution of gift cards, egifts, corporate payouts and rewards, along with the technology to deliver these products in seamless, integrated ways. BHN's network spans the globe with more than 400,000 consumer touchpoints. Learn more at BHN.com.

Hybrid flexibility: At Blackhawk Network, you'll enjoy the best of both worlds: focused remote work plus in-person collaboration on Tuesdays and Wednesdays, our regular in-office days at our Pleasanton headquarters. This rhythm gives you the tools, connection, and autonomy you need to make a real impact.
