Slack Proactive Monitoring Engineer
Job description
- Continuously monitor dashboards, alerting systems, and telemetry data (error rates, latency spikes, API failures, deployment anomalies) for early signals of degradation.
- Triage and correlate alerts from multiple sources (Splunk, internal tools, etc.) to identify patterns before customers report issues.
- Actively monitor Slack platform health dashboards, network latency signals, message delivery queues, and database capacities for high-frequency workspaces.
- Monitor critical custom automations, Slack Workflow Builder runs, Enterprise Key Management (EKM) operations, and Identity Provider (IDP) authentication syncs.
- Identify customers potentially affected by degraded service conditions and coordinate proactive outreach with Customer Success and Support teams.
- Partner with the Incident Management team to escalate signals that meet incident-threshold criteria.
- Technical Advisory: Partner with Customer Success Managers and Success Architects to deliver annual technical health check reviews, assessing platform metrics, configuration limits, and custom integration health.
- Perform root cause analysis (RCA) on proactively detected issues, documenting findings in internal case and incident management systems.
- Work closely with Engineering and SRE teams to drive rapid remediation of identified issues.
- Intervene in low-risk system exceptions (e.g., advising clients on misconfigured Slack Webhooks, API rate limit exhaustion, or broken Salesforce-Slack app connections) before they trigger widespread downtime.
- Build and maintain Slack-based automations and workflows to streamline proactive monitoring operations.
Requirements
- 2+ years of experience in technical support, site reliability engineering, or a related operations role.
- Hands-on experience with observability and monitoring tools (e.g., Grafana, Splunk, Datadog, PagerDuty, or equivalent).
- Strong understanding of cloud-based SaaS architecture, APIs, and common failure modes.
- Proficiency in reading and analyzing logs, metrics, and traces.
- Excellent written and verbal communication skills; ability to clearly convey technical findings to both technical and non-technical audiences.
- Demonstrated ability to leverage modern AI tools to optimize workflows, conduct research, and enhance daily productivity.
Preferred Requirements
- Experience working with the Slack platform (Slack API, Slack workflows, Bolt framework).
- Familiarity with Salesforce Service Cloud / OrgCS case management.
- Scripting or automation experience (Python, JavaScript, Bash).
- Experience in a customer-facing support engineering or reliability role at a SaaS company.
- ITIL, SRE, or similar certification.
Benefits & conditions
benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.
In the United States, compensation offered will be determined by factors such as location, job level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits. Salesforce offers a variety of benefits to help you live well including: time off programs, medical, dental, vision, mental health support, paid parental leave, life and disability insurance, 401(k), and an employee stock purchasing program. More details about company benefits can be found at the following link: https://www.salesforcebenefits.com.
At Salesforce, we believe in equitable compensation practices that reflect the dynamic nature of labor markets across various regions. The typical base salary range for this position is $75,000 - $113,500 annually. The range represents base salary only, and does not include company bonus, incentive for sales roles, equity or benefits, as applicable.