Sr. Reliability Engineer, Digital Marketing

Skechers
Hawthorne, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Hawthorne, United States of America

Tech stack

JavaScript
API
Artificial Intelligence
Data analysis
Bash
Software as a Service
Customer Data Management
Data Validation
Information Engineering
Cursor (AI-assisted code editor)
Database Queries
Google Analytics
Monitoring of Systems
Python
Reliability Engineering
Software Tools
Runbook
Salesforce
Systems Integration
Web Analytics
Scripting (Bash/Python/Go/Ruby)
Data Ingestion
Information Technology
Performance Monitor
GPT

Job description

The Sr. Reliability Engineer, Digital Marketing is responsible for ensuring the reliability, observability, and continuous improvement of Skechers' global marketing technology ecosystem. This role owns end-to-end reliability across customer data, audiences, campaigns, journeys, loyalty events, and digital experience signals - building a more intelligent, resilient, and increasingly self-healing marketing stack.

Working closely with analytics, data engineering, and service provider teams, this engineer prevents issues before they impact customers, reduces operational toil through automation, and continuously monitors and validates the accuracy, completeness, and timeliness of customer data across marketing and loyalty platforms.

  • Own service reliability across the full marketing platforms flow, including data ingestion, identity resolution, audience creation, segmentation, activation, campaign and journey execution, triggered and scheduled communications, loyalty program events, and downstream measurement.
  • Ensure Salesforce Data Cloud audiences are accurate, timely, and operationally dependable, with strong controls around data freshness, segmentation quality, publish success, and downstream activation.
  • Ensure Salesforce Marketing Cloud campaigns, journeys, automations, and email operations execute as designed, with clear operational thresholds, monitoring, and recovery playbooks.
  • Ensure Salesforce Loyalty Management processes including member activity, accrual, redemption, promotions, and related integration points are reliable, traceable, and aligned with customer experience expectations.

Observability & Monitoring

  • Build and maintain observability across platform health, business process health, and customer-impact signals, including dashboards, alerts, trend reporting, and escalation paths.
  • Leverage Google Analytics and Quantum Metric to connect technical incidents to customer and business impact, including conversion degradation, journey drop-off, landing page friction, loyalty enrollment issues, and campaign experience problems.
  • Define and manage SLIs, SLOs, and operational thresholds for business-critical marketing services, with proactive detection of platform and data issues before they impact customers or campaigns.

Incident Management & Operational Readiness

  • Own end-to-end operational response for priority incidents, including detection, triage, severity assessment, stakeholder communication, vendor engagement, mitigation, recovery, and post-incident review - translating technical issues into clear business impact across audience activation, journeys, loyalty activity, deliverability, and digital experience.
  • Lead incident coordination including bridge calls, cross-functional alignment, vendor escalation, recovery communication, and closure messaging, with defined update cadences for marketing operations, CRM, loyalty, analytics, leadership, and Salesforce support.
  • Own release readiness and go/no-go support for all product changes, including risk assessment, dependency checks, and rollback readiness.
  • Maintain steady-state operational risk reporting covering platform health trends, recurring failure patterns, deliverability risks, and proactive recommendations.

Automation & Reliability Engineering

  • Design and implement automation that reduces manual work, speeds recovery, and enables safer scale, including AI-assisted alert enrichment, knowledge retrieval, incident summarization, runbook execution, and low-risk self-healing patterns under human oversight.
  • Define and maintain operational standards including runbooks, change controls, release readiness checks, and problem management processes for business-critical marketing services.
  • Drive adoption of reliability engineering best practices across delivery and marketing technology teams.

Cross-Functional Collaboration

  • Partner with marketing operations, CRM, data engineering, eCommerce, loyalty, analytics, and vendor teams to ensure reliability considerations are built into new initiatives from the start, serving as a reliability advocate during architecture design and solution reviews.
  • Collaborate with data engineering on proactive monitoring and validation of data accuracy, completeness, timeliness, and consistency across ingestion, identity resolution, transformation, and activation layers.
  • Serve as the primary engineering partner for Salesforce Signature Success, incorporating Proactive Monitoring alerts and recommendations, managing escalations, and converting vendor insights into permanent improvements.
  • Participate in a global support and escalation model while continuously reducing after-hours operational load through better monitoring, smarter automation, and stronger engineering discipline.

Requirements

  • Hands-on experience supporting complex SaaS platforms in production, ideally including Salesforce Data Cloud, Salesforce Marketing Cloud, and/or enterprise CRM or marketing technology platforms with high business criticality.
  • Strong understanding of customer data flows, segmentation, audience activation, marketing journeys, campaign operations, and loyalty-related business processes.
  • Experience with Google Analytics, Quantum Metric, or similar digital analytics platforms used to diagnose customer and business impact.
  • Strong troubleshooting skills across data, integrations, APIs, workflows, and application behavior, with hands-on experience building and operating monitoring, alerting, dashboards, runbooks, and incident management processes.
  • Strong SQL skills and working knowledge of scripting or automation languages such as Python, JavaScript, or Bash, with experience leveraging AI-assisted engineering tools such as Claude Code, ChatGPT Codex, or Cursor to improve operational efficiency and automation.
  • Strong understanding of email deliverability, including operational drivers of inbox placement, sender health, and remediation practices.
  • Ability to communicate clearly with both technical and business stakeholders during normal operations and high-severity incidents, with a demonstrated ability to identify repetitive manual work and replace it with durable engineering solutions.
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
  • 7+ years of experience in reliability engineering, site reliability engineering, platform engineering, production engineering, application support engineering, or marketing technology operations.
  • This is a hybrid role based in Manhattan Beach, CA, requiring a minimum of 3 days onsite per week.

About the company

Headquartered in Southern California, Skechers, the Comfort Technology Company®, has spent over 30 years helping men, women, and kids everywhere look and feel good. Comfort innovation is at the core of everything we do, driving the development of stylish, high-quality products at a great value. From our diverse footwear collections to our expanding range of apparel and accessories, Skechers is a complete lifestyle brand.
