Gen AI Engineer - Dallas/Tampa 991765
Job description
AI/ML & Observability Engineering
- Design, build, and deploy AI/ML models for anomaly detection across telemetry data (logs, metrics, traces, KPIs)
- Translate early-stage use cases into generalized, reusable observability solutions
- Modify and extend models to support multiple applications and teams
- Apply ML techniques to predict system anomalies before production impact
Telemetry & System Monitoring
- Analyze and correlate logs, metrics, traces, and system KPIs
- Identify early warning signals of instability or degradation
- Build dashboards and alerts using observability platforms
Collaboration & Strategy
- Work closely with Infrastructure, SRE, Developers, and Architects
- Contribute to enterprise observability strategy
- Act as a subject-matter expert for AI-driven observability
- Operate independently within a small, high-impact team
Automation & Cloud
- Develop automation to support end-to-end observability workflows
- Deploy solutions in cloud environments
- Leverage OpenTelemetry standards for instrumentation and data collection
Requirements
We are seeking a Senior AI/ML Observability Engineer to join a strategic observability team focused on building reusable, enterprise-wide anomaly detection solutions. This role blends hands-on AI/ML engineering, observability expertise, and automation to proactively detect system issues and improve production reliability.
The ideal candidate has strong Python-based ML experience, a solid grasp of observability principles (logs, metrics, traces), and has worked closely with Infrastructure, SRE, and Engineering teams to implement scalable observability solutions across complex systems.
This is a senior individual contributor role requiring independence, initiative, and subject-matter expertise.
- 6+ years of experience in AI/ML engineering, SRE, or observability-focused roles
- Strong expertise in Python for data processing and ML development
- Hands-on experience building ML models for anomaly detection
- Solid understanding of observability principles (logs, metrics, traces)
- Experience with observability tools such as:
  - Grafana (preferred)
  - Splunk
  - Dynatrace
- Familiarity with OpenTelemetry
- Strong automation skills (pipelines, workflows, reusable components)
- Experience working in cloud environments
- Excellent problem-solving and communication skills
Preferred Qualifications
- Experience designing predictive models for system reliability
- Background supporting production systems in large-scale environments
- Experience building reusable ML platforms or shared services
- Exposure to enterprise-wide monitoring or observability programs
- Senior-level, hands-on engineer
- Strong ownership mindset; able to drive work end-to-end
- Comfortable operating with limited supervision
- Strategic thinker with pragmatic execution skills
- Passionate about reliability, automation, and proactive problem detection