Enterprise Observability & AIOps Architect

Triunity Software
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Application Performance Management
Azure
Cloud Computing
Configuration Management Databases
Collaborative Software
Databases
Noise Reduction
Distributed Systems
Middleware
Monitoring of Systems
Log Analysis
Reliability Engineering
Systems Integration
Datadog
Cloud Monitoring
MTTR
Hybrid Cloud
Kubernetes
Enterprise Integration
ArcSight Event Correlation
Splunk
New Relic (SaaS)
Dynatrace
PagerDuty
Legacy Systems
ServiceNow
ManageEngine
Microservices

Job description

Enterprise Observability Architecture

· Lead enterprise-wide observability assessments across applications, infrastructure, cloud, databases, and operational workflows.

· Define current-state and target-state observability architecture.

· Develop monitoring rationalization and consolidation strategies across enterprise toolsets.

· Establish standards for telemetry, tagging, service identity, alerting, dashboards, and governance.

· Define scalable operating models aligned to SRE, ITSM, and platform engineering practices.

Application Observability

· Architect observability solutions across:

APM | Distributed tracing | Logs & metrics | RUM & synthetics

· Define SLI/SLO-driven monitoring and alerting strategies.

· Improve service dependency visibility, transaction tracing, and telemetry quality.

· Design monitoring patterns for microservices, APIs, Kubernetes, Azure-native, and legacy applications.

Infrastructure & Platform Observability

· Design observability solutions for cloud infrastructure, middleware, databases, platform services, and batch ecosystems.

· Assess alert quality, duplication, routing inefficiencies, and monitoring overlaps.

· Define event correlation, severity models, enrichment standards, and operational ownership structures.

AIOps & Intelligent Operations

· Design AIOps capabilities including:

o Event correlation

o Noise reduction

o Intelligent alert prioritization

o Anomaly detection

o Predictive insights

o Root-cause contextualization

· Define AI-assisted operational workflows for incident reduction, MTTR optimization, and automated remediation.

ITSM & Operational Integration

· Integrate observability platforms with ServiceNow, incident workflows, CMDB, and collaboration tools.

· Define monitoring-to-incident operational workflows and governance standards.

· Establish KPI-driven operational maturity frameworks.

Governance & Blueprinting

· Develop enterprise standards, onboarding blueprints, engineering playbooks, and reusable observability patterns.

· Create reference architectures, dashboard standards, and operational governance frameworks.

· Define "Day-1 Observability" onboarding models for new services.

Requirements

We are seeking a highly experienced Enterprise Observability & AIOps Architect with 15+ years of experience in designing and modernizing enterprise-scale observability ecosystems across applications, infrastructure, cloud platforms, databases, integrations, and operational workflows.

The ideal candidate should possess strong expertise in:

· AIOps & Event Correlation

· ITSM Integration

· Telemetry Governance

· SRE & Operational Excellence

· Enterprise Monitoring Rationalization

· AI-driven Operational Transformation

This role requires both strategic architecture leadership and strong hands-on expertise across modern observability and AIOps platforms in large enterprise environments.

· 15+ years of experience in observability, infrastructure, SRE, production operations, platform engineering, or AIOps architecture.

· Strong experience in enterprise-scale hybrid cloud and distributed environments.

· Proven experience leading observability transformation and monitoring rationalization initiatives.

· Experience working with executive leadership, enterprise architects, platform teams, and operations organizations.

· Strong understanding of enterprise operational workflows, incident management, and reliability engineering.

Required Technical Expertise

Observability Platforms

Strong hands-on expertise in:

Dynatrace | Azure Monitor | Azure Application Insights | Azure Log Analytics | LogicMonitor | ManageEngine

Preferred:

Splunk | ELK/OpenSearch | Prometheus/Grafana | Datadog | New Relic | BigPanda | PagerDuty

Core Skills

· Event correlation & alert engineering

· Distributed tracing & topology mapping

· AIOps & intelligent operations

· Cloud monitoring & telemetry

· Kubernetes & microservices observability

· ITIL / ITSM integration

· SRE principles & operational governance

Cloud & Platform Experience

Azure | AWS | Kubernetes | APIs & integrations | Middleware & distributed systems

Preferred Qualifications

· Experience defining enterprise observability standards and governance models.

· Experience with operational transformation initiatives involving AI/AIOps.

· Strong workshop facilitation, stakeholder management, and executive presentation skills.

· Certifications in Cloud, Observability, ITIL, SRE, or AIOps preferred.

Success Criteria

· Establish a unified enterprise observability architecture.

· Reduce alert noise and operational inefficiencies.

· Improve telemetry quality, service visibility, and incident response.

· Enable scalable AIOps-driven operational workflows.

· Deliver standardized onboarding, governance, and engineering blueprints.

· Improve operational maturity, reliability, and service resilience.
