Software Engineering Manager 1 - Streaming & Cloud Platform Reliability

Hewlett Packard Enterprise
Cupertino, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$155.5K - $315K

Job location

Cupertino, United States of America

Tech stack

Java
API
Agile Methodologies
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Data analysis
Big Data
Continuous Integration
Data Control
ETL
Software Debugging
DevOps
Distributed Data Store
Distributed Systems
DNS
Elasticsearch
Python
PostgreSQL
Scrum
Redis
Regression Testing
Prometheus
Service Design
Software Engineering
Data Streaming
Management of Software Versions
WebSocket
SSL Certificate Management
Data Logging
Cloud Platform System
Datastax
Snowflake
Grafana
Spark
Multi-Cloud
Backend
Event Driven Architecture
Integration Tests
Kubernetes
Apache Flink
Cassandra
Production Code
Kafka
Cloud Migration
Kibana
REST
Webhooks
Microservices

Job description

We're looking for a hands-on Software Engineering Manager to lead a small team (2-4 developers) focused on improving the reliability of Mist's cloud platform by driving concrete postmortem action items from our incident management process.

This team owns follow-ups from production incidents, especially those involving our streaming data pipelines (Kafka / Flink / Storm) and core APIs. You'll work closely with senior engineers to turn incident learnings into durable engineering improvements.

This is a hybrid role requiring on-site collaboration multiple days per week in Cupertino, California. Due to the requirements of this position, this role requires a US Citizen or Green Card holder.

What You'll Do

  • Own and drive post-incident follow-ups from our Incident Management process, turning incident reports into design and implementation work.
  • Lead, mentor, and grow a 2-4 person engineering team, while contributing hands-on code in production services.
  • Design, implement, and harden streaming topologies using Kafka, Storm, and/or Flink (e.g., stats, telemetry, alerts, pcaps).
  • Improve reliability of core APIs (REST API, WebSocket, Webhooks, etc.), including auth, rate limiting, and DR-sensitive flows.
  • Enhance observability and runbooks: add metrics/alerts, define SLOs, and codify playbooks for recurring incident patterns.
  • Collaborate with SRE, Platform, and Data teams on DR, multi-region, and multi-cloud behavior (AWS, GCP, DR regions).
  • Ensure robust testing and deployment practices (unit/integration tests, regression tests for past incidents, safe rollout/rollback).

  • Direct, visible impact on the stability and reliability of Mist's cloud platform and AI-driven networking products.
  • A focused charter with real, concrete backlogs driven by incidents, not vague "platform work."
  • Close collaboration with strong senior engineers and SREs, with room to shape both technical direction and team culture.

Additional Skills: Accountability, Action Planning, Active Learning, Active Listening, Agile Methodology, Agile Scrum Development, Analytical Thinking, Bias, Coaching, Creativity, Critical Thinking, Cross-Functional Teamwork, Data Analysis Management, Data Collection Management, Data Controls, Design, Design Thinking, Empathy, Follow-Through, Group Problem Solving, Growth Mindset, Intellectual Curiosity, Long Term Planning, Managing Ambiguity

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Requirements

  • 7+ years total professional software engineering experience.
  • 2+ years in a team lead role (mentoring, performance feedback, prioritization) while remaining hands-on technically.
  • 5+ years building backend or distributed systems in Python, Go, or Java, with enough proficiency in at least one of these languages to lead design reviews and contribute production code.
  • 3+ years designing, implementing, and operating distributed, event-driven systems using Kafka and at least one of Flink or Storm, or a comparable streaming framework.
  • 3+ years building and operating RESTful APIs (service design, auth, rate limiting, client IP handling, versioning).
  • 3+ years working with cloud-native infrastructure: Kubernetes, containerized microservices, CI/CD pipelines.
  • 3+ years with production datastores such as Redis, Postgres, Cassandra/Datastax, S3/GCS, or similar distributed storage systems.
  • 2+ years directly involved in production incident response: on-call participation, postmortems, and driving remediation work through to completion.
  • Proven ability to debug latency, throughput, data correctness, and availability issues in streaming pipelines and/or APIs.
  • Experience adding or improving metrics, logging, tracing, and alerts for production services.

Preferred Qualifications

  • 2+ years working with big-data / analytics or ETL systems (e.g., Apache Spark, Airflow, Snowflake, or similar).

  • Experience with webhook or event-delivery systems (idempotency, retries, ordering, DLQs).

  • Exposure to multi-region / DR design: cross-cloud migrations, DNS and certificate management, environment-driven configuration.

  • Familiarity with DevOps practices, CI/CD automation, and service ownership.

  • Experience with observability stacks such as Prometheus, Grafana, Kibana/Elasticsearch.

Benefits & conditions

The expected salary/wage range for this position is provided below. Actual offer may vary from this range based upon geographic location, work experience, education/training, and/or skill level.

  • United States of America: Annual Salary USD 155,500 - 315,000 in California. The listed salary range reflects base salary. Variable incentives may also be offered.

About the company

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world. Our culture thrives on finding new and better ways to accelerate what's next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.
