Senior AI/ML Observability Engineer

Dexian DISYS
Tampa, United States of America
4 days ago

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Tampa, United States of America

Tech stack

Artificial Intelligence
Monitoring of Systems
Python
Machine Learning
Azure
Cloud Platform System
Grafana
Reliability of Systems
Splunk
Dynatrace

Job description

  • Design, build, and deploy AI/ML models for anomaly detection across telemetry data (logs, metrics, traces, KPIs)
  • Translate early-stage use cases into generalized, reusable observability solutions
  • Modify and extend models to support multiple applications and teams
  • Apply ML techniques to predict system anomalies before production impact

Telemetry & System Monitoring

  • Analyze and correlate logs, metrics, traces, and system KPIs
  • Identify early warning signals of instability or degradation
  • Build dashboards and alerts using observability platforms

Collaboration & Strategy

  • Work closely with Infrastructure, SRE, Developers, and Architects
  • Contribute to enterprise observability strategy
  • Act as a subject matter expert for AI-driven observability
  • Operate independently within a small, high-impact team

Requirements

We are seeking a Senior AI/ML Observability Engineer to join a strategic observability team focused on building reusable, enterprise-wide anomaly detection solutions. This role blends hands-on AI/ML engineering, observability expertise, and automation to proactively detect system issues and improve production reliability.

The ideal candidate has strong Python-based ML experience, a solid grasp of observability principles (logs, metrics, traces), and has worked closely with Infrastructure, SRE, and Engineering teams to implement scalable observability solutions across complex systems.

This is a senior individual contributor role requiring independence, initiative, and subject matter expertise.

  • 6+ years of experience in AI/ML engineering, SRE, or observability-focused roles
  • Strong expertise in Python for data processing and ML development
  • Hands-on experience building ML models for anomaly detection
  • Solid understanding of observability principles (logs, metrics, traces)
  • Experience with observability tools such as:
      • Grafana (preferred)
      • Splunk
      • Dynatrace
  • Familiarity with OpenTelemetry
  • Strong automation skills (pipelines, workflows, reusable components)
  • Experience working in cloud environments
  • Excellent problem-solving and communication skills

  • Experience designing predictive models for system reliability
  • Background supporting production systems in large-scale environments
  • Experience building reusable ML platforms or shared services
  • Exposure to enterprise-wide monitoring or observability programs

  • Senior-level, hands-on engineer
  • Strong ownership mindset; able to drive work end-to-end
  • Comfortable operating with limited supervision
  • Strategic thinker with pragmatic execution skills
  • Passionate about reliability, automation, and proactive problem detection

About the company

Dexian stands at the forefront of Talent + Technology solutions with a presence spanning more than 70 locations worldwide and a team exceeding 10,000 professionals. As one of the largest technology and professional staffing companies and one of the largest minority-owned staffing companies in the United States, Dexian combines over 30 years of industry expertise with cutting-edge technologies to deliver comprehensive global services and support.
