Director, Systems Engineering
Job description
Our Global Technology and Innovation team for Merchant Operations is looking for a technology leader who will be accountable for day-to-day Production Operations. This leader is accountable for the long-term reliability, scalability, and operability of the Merchant product portfolio through modern SRE and Operations Engineering practices, with a primary focus on reducing customer impact via improved detection and recovery (MTTD/MTTR). This leader will be responsible for shaping, delivering, and operating our cloud-native, platform-based Merchant services teams, leveraging SRE practices and AI-driven operational intelligence to improve availability, security, scalability, and customer experience.
This role will help the organization evolve from traditional cloud operations to internal productized platforms, enabling engineering teams to deploy, operate, and scale services safely and efficiently in a regulated financial services environment.
Our ideal candidate will thrive in fast-paced environments and be action-oriented and results-driven, with a focus on scalable processes and continuous improvement. You are passionate, with a strong work ethic, and able to develop partnerships with both business and technical counterparts. You are comfortable leading and working as part of a geographically dispersed team, and cross-functionally across a global Cloud Hosting organization. You can navigate when the path is not clear, collaborate when faced with challenges, and develop procedures and flows that are transparent, scalable, and able to be implemented successfully across many functions and locations.
Responsibilities:
- Be a strong people Leader - inspire, mentor, advocate for, and develop your team to drive change and innovation in partnership with other business and operations leaders
- Own service reliability outcomes for the Merchant portfolio including availability, MTTD, MTTR, and customer impact metrics. Establish and operationalize SLOs, SLIs, and detection SLOs in partnership with Product and Engineering
- Accountable for day-to-day Service Delivery of the Merchant SaaS Portfolio to our Customers at the highest levels of quality
- Lead proactive resilience and observability strategies leveraging unified telemetry, AI-driven anomaly detection, synthetic transactions, and end-to-end traceability
- Sponsor and scale AI-assisted operations including automated incident triage, diagnostics, correlation, and executive communication for the Merchant product portfolio
- Strengthen change governance, release controls, and UAT reliability across Merchant platforms to reduce change-driven incidents and customer impact
Requirements
- Bachelor's degree in Computer Science, Information Systems Management or related field; equivalent experience (5+ years); or an equivalent combination of education and experience
- Experience leading DevOps, SRE, or Platform Engineering teams in large-scale cloud environments, with demonstrated success driving automation-first operational models in regulated or mission-critical systems
- Demonstrated experience owning SLOs that include detection metrics and reducing client-detected incidents at scale
- Experience scaling observability platforms and AI-assisted operational capabilities
- Proven ability to scale judgment through leaders and represent operational risk and trade-offs crisply at the executive level
- Proven track record of coaching, mentoring and managing a team with strong workload management and process development skills
- Excellent verbal and written communication skills, with the ability to communicate, connect with, and engage executive stakeholders and team members at all levels, both internally and in customer-facing settings
- Proven skills in the areas of budgeting, project structuring, vendor/partner management, staff structuring, and negotiations
- 15% travel, which may be domestic or international. More travel may be required during initial onboarding
- Ability to support weekend and off-hours activities as required
Highly Desired:
- 5+ years leading mission-critical applications and/or platforms in Azure, ideally in the Financial or Payments industry, including related compliance activities
- Technical background with a proven ability to apply AI/ML-driven insights (AIOps) to incident management, capacity planning, anomaly detection, and operational optimization
- Experience architecting, building and maintaining Application CI/CD frameworks in a Financial Services setting
- Experience running SRE teams for a modern technology stack using cloud-native technologies, with a focus on improving systems availability, performance, and resiliency
- Strong advocate for DevOps culture, including shared ownership, automation, and continuous improvement across engineering and operations teams
Applicants must be currently authorized to work in the United States on a full-time basis. This position does not offer sponsorship for employment visa status or work permit now or in the future.
Benefits & conditions
In return for your expertise, we offer opportunities for growth, career development, and a competitive compensation and benefits package, all within an innovative and collaborative work environment.