Senior AI/ML Observability Engineer

Dexian DISYS

Tampa, United States of America

4 days ago

Role details

Contract type

Temporary to permanent

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Tampa, United States of America

Tech stack

Artificial Intelligence

Monitoring of Systems

Python

Machine Learning

Azure

Cloud Platform System

Grafana

Reliability of Systems

Splunk

Dynatrace

Job description

Design, build, and deploy AI/ML models for anomaly detection across telemetry data (logs, metrics, traces, KPIs)
Translate early-stage use cases into generalized, reusable observability solutions
Modify and extend models to support multiple applications and teams
Apply ML techniques to predict system anomalies before production impact

Telemetry & System Monitoring

Analyze and correlate logs, metrics, traces, and system KPIs
Identify early warning signals of instability or degradation
Build dashboards and alerts using observability platforms

Collaboration & Strategy

Work closely with Infrastructure, SRE, Developers, and Architects
Contribute to enterprise observability strategy
Act as a subject matter expert for AI-driven observability
Operate independently within a small, high-impact team

Requirements

We are seeking a Senior AI/ML Observability Engineer to join a strategic observability team focused on building reusable, enterprise-wide anomaly detection solutions. This role blends hands-on AI/ML engineering, observability expertise, and automation to proactively detect system issues and improve production reliability.

The ideal candidate has strong Python-based ML experience, a solid grasp of observability principles (logs, metrics, traces), and has worked closely with Infrastructure, SRE, and Engineering teams to implement scalable observability solutions across complex systems.

This is a senior individual contributor role requiring independence, initiative, and subject matter expertise.

6+ years of experience in AI/ML engineering, SRE, or observability-focused roles
Strong expertise in Python for data processing and ML development
Hands-on experience building ML models for anomaly detection
Solid understanding of observability principles (logs, metrics, traces)
Experience with observability tools such as:

Grafana (preferred)
Splunk
Dynatrace

Familiarity with OpenTelemetry
Strong automation skills (pipelines, workflows, reusable components)
Experience working in cloud environments
Excellent problem-solving and communication skills

Experience designing predictive models for system reliability
Background supporting production systems in large-scale environments
Experience building reusable ML platforms or shared services
Exposure to enterprise-wide monitoring or observability programs

Senior-level, hands-on engineer
Strong ownership mindset; able to drive work end to end
Comfortable operating with limited supervision
Strategic thinker with pragmatic execution skills
Passionate about reliability, automation, and proactive problem detection

About the company

Dexian stands at the forefront of Talent + Technology solutions with a presence spanning more than 70 locations worldwide and a team exceeding 10,000 professionals. As one of the largest technology and professional staffing companies and one of the largest minority-owned staffing companies in the United States, Dexian combines over 30 years of industry expertise with cutting-edge technologies to deliver comprehensive global services and support.