Platform Engineer

Blackhawk Network
Pleasanton, United States of America
2 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Junior
Compensation: $85K

Job location

Hybrid (remote, with in-office days on Tuesdays and Wednesdays)
Pleasanton, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Bash
Cloud Computing
Continuous Integration
Cursor (AI code editor)
Linux
DevOps
Distributed Systems
Monitoring of Systems
Python
Operational Data Store
Reliability Engineering
Software Tools
Prometheus
Software Engineering
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
GitHub Copilot
System Availability
Grafana
Git
Kubernetes
Information Technology
Splunk
New Relic (SaaS)
Docker
Jenkins
ServiceNow

Job description

You'll split your time between engineering solutions and operating our production platforms: maintaining the health of BHN's production services while building the automation, observability, and AI-driven capabilities that make incidents less frequent, easier to diagnose, and faster to resolve.

As part of the Operations Command Centre (OCC), you'll play an active role in Major Incident Management, partnering with engineering teams to diagnose problems and restore production services during critical incidents. Outside of incident response, you'll build dashboards, improve monitoring, develop automation, analyse operational data, and engineer intelligent tooling that continuously improves platform reliability.

This role provides exceptional exposure to large-scale distributed systems, cloud infrastructure, Kubernetes, CI/CD, observability platforms, automation, AI-assisted software development, and production engineering.

If you're naturally curious, enjoy solving complex technical problems, and want to accelerate your engineering career, we'd love to hear from you.

Responsibilities:

Major Incident Response & Production Operations

  • Participate in the 24×7 on-call rotation supporting BHN's production platforms.
  • Monitor production health using modern observability platforms.
  • Lead or support Major Incident bridges, coordinating technical teams during high-severity production incidents.
  • Perform technical triage, identify probable causes, and drive rapid service restoration.
  • Communicate clearly with engineers, leadership, and business stakeholders throughout incidents.
  • Lead post-incident reviews focused on learning and continuous improvement.
  • Identify recurring operational pain points and engineer permanent solutions.

Platform Engineering & Automation

  • Develop automation that reduces manual operational effort.
  • Build internal engineering tools that improve developer productivity and platform reliability.
  • Create dashboards, alerts, health scores, and operational insights.
  • Improve CI/CD pipelines and deployment safety.
  • Automate operational workflows and repetitive tasks.
  • Build self-service capabilities for engineering teams.
  • Develop auto-remediation and self-healing capabilities.
  • Continuously improve platform reliability through engineering rather than manual intervention.

Observability & Reliability Engineering

  • Design alerts that detect customer-impacting issues early while minimising alert fatigue.
  • Improve platform visibility through metrics, logs, traces, and dashboards.
  • Analyse production behaviour to identify reliability improvements.
  • Develop operational KPIs and engineering health metrics.
  • Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  • Use operational data to drive engineering decisions and improve platform resilience.

AI Engineering & Intelligent Operations

  • Use AI-assisted software development tools to improve engineering productivity.
  • Develop AI-powered incident summarisation and communication capabilities.
  • Build intelligent root cause analysis and diagnostic tooling.
  • Create operational copilots and engineering assistants.
  • Enhance alerts with contextual intelligence.
  • Automate diagnostics and operational workflows.
  • Build AI-driven orchestration and auto-remediation capabilities.
  • Develop engineering knowledge systems that improve troubleshooting and accelerate learning.

You'll Gain Experience Across

  • Cloud Infrastructure (AWS)
  • Kubernetes
  • CI/CD Engineering
  • Infrastructure as Code
  • Observability & Monitoring
  • Major Incident Management
  • Production Operations
  • Reliability Engineering (SRE)
  • Automation Engineering
  • AI Engineering
  • Large-scale Distributed Systems
  • Docker
  • Jenkins
  • Splunk
  • New Relic
  • Prometheus
  • Grafana
  • OpenTelemetry
  • ServiceNow
  • Python automation
  • Infrastructure as Code (Terraform, CloudFormation, etc.)
  • FinTech, payments, or other high-availability production environments

We seek candidates who not only demonstrate curiosity and adaptability in emerging technologies but have also successfully implemented and used AI tools to enhance their work, improve processes, or deliver measurable results. Our teams embrace continuous learning and the thoughtful integration of AI to create meaningful impact for our employees and for the future of work.

Requirements

Our Operations Command Centre (OCC) is looking for a Platform Engineer with strong technical foundations, exceptional problem-solving ability, and a passion for building reliable systems.

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Experience in Platform Engineering, DevOps, Site Reliability Engineering (SRE), Infrastructure Engineering, Technical Operations, or a similar technical role.
  • Strong Linux fundamentals.
  • Experience with AWS or another major cloud platform.
  • Experience with Git and modern software development workflows.
  • Basic scripting experience using Python, Bash, or a similar language.
  • Strong analytical and troubleshooting skills, with exposure to production support or Major Incident Management.
  • Excellent written and verbal communication skills.
  • Strong ownership mindset with a passion for continuous improvement.
  • Experience using modern AI engineering tools such as GitHub Copilot, Cursor, Claude, or similar AI-assisted development platforms.

Benefits & conditions

Non-exempt, Hourly Rate for California Residents Only: USD $40.67/Hr

Non-exempt, Hourly Rate for Illinois Residents Only: USD $32.09/Hr

Pay is based on several factors, including education, work experience, and certifications. In addition to your salary, Blackhawk Network offers benefits including a 401(k) with employer match, medical, dental, and vision coverage, 12 paid holidays throughout the year, sick pay accrual according to state law, parental leave, life insurance, disability insurance, accident and illness insurance, health and dependent care flexible spending accounts, wellness benefits, and paid time off for all full-time employees.

About the company

Today, through BHN's single global platform, businesses of all kinds can tap into the world's largest network of branded payment solutions. BHN helps businesses grow revenue, increase loyalty, motivate and reward their teams, disburse funds, and engage consumers. Branded payment solutions include the issuance and distribution of gift cards, eGifts, corporate payouts, and rewards, along with the technology to deliver these products in seamless, integrated ways. BHN's network spans the globe with more than 400,000 consumer touchpoints. Learn more at BHN.com.

Hybrid flexibility: At Blackhawk Network, you'll enjoy the best of both worlds: focused remote work plus in-person collaboration on Tuesdays and Wednesdays, our regular in-office days at our Pleasanton headquarters. This rhythm gives you the tools, connection, and autonomy you need to make a real impact.
