Platform Engineer

The Hershey Company
Hershey, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Hershey, United States of America

Tech stack

API
Azure
Cloud Computing
Information Systems
Computer Programming
Continuous Delivery
Continuous Integration
Data Infrastructure
DevOps
Python
Key Management
SQL Databases
Management of Software Versions
Data Logging
Scripting (Bash/Python/Go/Ruby)
MTTR
Data Strategy
Git
Core Data
Information Technology
Data Analytics
Machine Learning Operations
Terraform
Software Version Control
Databricks

Job description

The Platform Engineer, Data & Analytics Platforms runs and continuously improves the enterprise data and analytics platforms that power Hershey's data products. This role focuses on platform operations and enablement: standardizing environments, automating delivery, improving reliability and observability, and reducing time-to-value for Data Product teams across development, test, and production.

The Platform Engineer partners with Senior Data Engineers, Solution Architects, the Cloud COE, and Security to define guardrails and operating standards, deliver self-service tooling, and keep the platform cost-effective, secure, observable, reliable, and scalable.

What We Are Building for Hershey

This role supports Hershey's enterprise data strategy by operating and enabling a trusted, governed data platform at scale. The platform team turns one-off solutions into reusable templates, guardrails, and automated workflows, improving reliability, cost transparency, and developer experience so Data Product teams can deliver high-quality data products faster.

Major Duties & Responsibilities

  1. Data Platform Components
  • Own and operate core data platform components (with emphasis on Databricks and supporting Azure services) across development, test, and production.
  • Build and maintain CI/CD and environment standardization using Azure DevOps and infrastructure-as-code (e.g., Terraform) to improve consistency, security, and delivery speed.
  • Implement observability (logging, monitoring, alerting, dashboards) and maintain operational runbooks to enable proactive detection and faster recovery.
  • Implement identity/access controls, secrets management, and configuration standards in partnership with the Cloud COE and Security.
  • Plan and execute platform releases and upgrades (libraries, runtimes, clusters/pools) and coordinate change communications to minimize disruption for Data Product teams.
  2. Machine Learning Operations (MLOps)
  • Enable MLOps capabilities (e.g., MLflow standards, deployment patterns, automation) in partnership with Data Science and engineering teams.
  3. Governance, Quality & Operations
  • Implement governance, security, and compliance standards through platform guardrails (policies, templates, controls) and clear documentation.
  • Support FinOps by monitoring usage, identifying optimization opportunities (clusters, jobs, storage), and improving cost transparency (e.g., tagging and showback/chargeback inputs).
  • Monitor platform health, resolve incidents, perform root-cause analysis, and drive problem management to improve stability and meet agreed service levels.
  • Define, track, and report operational KPIs (availability, performance, deployment frequency, MTTR) and drive continuous improvement through automation and standardization.
  • Provide operational support during standard business hours, with planned maintenance windows and documented support processes (no on-call rotation).
  4. Collaboration Across Domains
  • Enable Data Product teams with self-service tooling, reusable patterns/templates, and onboarding/training; manage a clear intake and prioritization process; and partner on platform performance and operational readiness.

Requirements

  • Cloud & Platforms: Hands-on administration and operations for Databricks and Azure data platform services. Strong understanding of environment provisioning, secrets management, identity/access, and networking patterns. Infrastructure-as-code experience (e.g., Terraform) is strongly preferred.
  • Programming & Development: Proficient in Python and SQL for automation and troubleshooting; experience with modular coding, APIs, scripting, and source control (Git).
  • DevOps: Experience implementing CI/CD in Azure DevOps, managing releases, and improving deployment safety through testing, approvals, and consistent branching/versioning practices.
  • MLOps (nice to have): Familiarity with MLflow and operational patterns for model lifecycle management.
  • Operations & Observability: Experience implementing monitoring/alerting and using operational metrics to drive reliability improvements; familiarity with incident management and root-cause analysis.
  • Collaboration & Communication: Communicate best practices and technical solutions effectively across teams.
  • Bachelor's degree in Computer Science, Engineering, Information Systems, Data Science, or related field.
  • 2-5 years in platform engineering roles.
