Solution Architect - Agentic AI and Observability

Tata Consultancy Services Limited
Milford, United States of America
11 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$182K

Job location

Milford, United States of America

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Data analysis
Cloud Computing
DevOps
Information Technology Operations
Machine Learning
Reliability Engineering
Wide Area Networks
Datadog
Scripting (Bash/Python/Go/Ruby)
Large Language Models
Grafana
SolarWinds (Software)
Virtual Agents
Splunk
Dynatrace
Microservices

Job description

TCS Cloud Unit is looking for an experienced Observability and AI Solution Architect to design and implement enterprise-grade observability and AI solutions that provide deep visibility into infrastructure, applications, and networks, and that transform IT operations.

  • Provide observability strategies for infrastructure (servers, storage, cloud), applications (microservices, APIs), and networks (LAN/WAN, SD-WAN). Collaborate with DevOps, SRE, and IT operations teams to ensure end-to-end visibility and reliability.
  • Design and architect end-to-end AIOps and observability solutions covering telemetry collection, ingestion, correlation, analytics, dashboards, and operational workflows.
  • Design and architect AIOps solutions using industry-leading platforms like OpenAI, AWS Bedrock, Google Gemini, Anthropic, and similar technologies. Develop predictive analytics and anomaly detection models to proactively identify and resolve operational issues.
  • Guide onshore and offshore solution teams through requirements understanding, solution creation, estimation, and operating model definition; support RFP solutioning with architecture, roadmaps, and estimates.
  • Design and recommend integrations between monitoring/observability platforms and ITSM tools using APIs and service interfaces.
  • Integrate observability tools with ITSM platforms and automation workflows. Enable automated root cause analysis and remediation using AI/ML models. Define self-healing and runbook automation.
  • Collaborate with business and IT teams to identify key metrics and integrate them into dashboards and alerting systems.
  • Establish observability standards, KPIs, and SLAs for performance and availability. Ensure compliance with security and regulatory requirements in monitoring solutions.

Requirements

This role requires expertise in leading observability platforms and hands-on experience in IT operations, combined with the ability to integrate AI-driven solutions for IT Operations (AIOps) using technologies such as LLMs, agentic frameworks, and industry-leading platforms like Anthropic, OpenAI, Bedrock, Gemini, and others. It also requires strong customer-facing capabilities, including architecture defense, RFP solutioning, and leadership of onshore and offshore solution and delivery teams.

  • 10+ years of experience in IT operations, managed services, infrastructure, cloud and application support, or transformation roles with significant architecture responsibility.
  • Strong hands-on experience implementing AIOps and observability solutions across multiple observability platforms (e.g., Grafana, Datadog, Splunk, Dynatrace, ScienceLogic, SolarWinds).
  • Strong experience integrating monitoring, event management, and automation with ITSM platforms, and in dashboard development.
  • Strong experience in AI-driven automation and orchestration, including solutioning with scripting techniques.
  • Experience in solution design for GenAI and agentic AI use cases in IT operations, e.g., automated triage, knowledge generation, runbook generation, and AI assistance.
  • Strong communication, offshore management, and stakeholder management skills, with the ability to present, influence, and defend technical solutions before customers and executive audiences.
