Senior Engineer, AIOps

Royal Caribbean International
Miami, United States of America
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Miami, United States of America

Tech stack

API
Artificial Intelligence
Azure
Cloud Engineering
Computer Networks
Continuous Integration
Data Infrastructure
Data Systems
DevOps
Data Intelligence
Python
Key Management
Role-Based Access Control
Reliability Engineering
Search Technologies
SQL Databases
Enterprise Data Management
Cloud Platform System
Cloud Monitoring
System Availability
Large Language Models
Generative AI
Data Lake
AI Platforms
Information Technology
Data Analytics
Data Management
Machine Learning Operations
Cloud Optimization
ServiceNow
Databricks

Job description

The Royal Caribbean Group's AI & Analytics Team has an exciting career opportunity for a full-time Senior Engineer, AIOps, reporting to the Senior Manager, Data Intelligence Operations.

The Senior Engineer, AIOps serves as a technical anchor for the reliability, scalability, and continuous improvement of Royal Caribbean Group's enterprise AI, Generative AI (GenAI), and modern data platforms. This senior-level role leads incident response, drives operational maturity, mentors junior team members, and partners with platform engineering and data science teams to shape how AI and data systems are built, deployed, and maintained at scale. The ideal candidate brings deep expertise in Microsoft Azure and Databricks, strong command of LLM and GenAI tooling, and the judgment to make sound architectural and operational decisions independently.

  • Leads the operational health and reliability of enterprise AI, GenAI, and data platforms, ensuring high availability and performance.
  • Serves as the senior technical escalation point for L2/L3 production issues across AI and GenAI-enabled applications, including LLM-based services and RAG pipelines.
  • Designs and owns observability strategies for AI platform health, covering availability, latency, throughput, cost attribution, and model behavior drift.
  • Leads root cause analysis for complex AI inference failures and drives permanent remediation across engineering and product teams.
  • Evaluates, onboards, and operationalizes new GenAI capabilities, including Azure OpenAI Service, Foundation Model APIs, and vector store solutions.
  • Defines operational standards, SLAs, and runbooks for AI platform services, championing a proactive operations culture.
  • Builds and operates AIOps pipelines that leverage GenAI to analyze incidents, identify failure causes, and recommend remediation actions.
  • Integrates AIOps insights into CI/CD pipelines, validating deployments against known failure patterns and implementing AI-driven quality gates.
  • Owns the operational health of enterprise data platforms built on Azure and Databricks, including governance, table management, and job orchestration.
  • Leads cloud cost governance efforts for Databricks and Azure services, partnering with FinOps to optimize spend.
  • Enforces and continuously improves platform security posture, including RBAC, managed identity, network policies, and secrets management.
  • Leads major incident response for platform outages, produces high-quality RCAs, and drives post-incident improvements.
  • Mentors and guides junior engineers, contributing to hiring, onboarding, and skills development within the AIOps team.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field required; Master's degree preferred.

  • 7+ years of experience in platform operations, cloud engineering, AI/data platform support, or site reliability engineering in enterprise environments.

  • Deep hands-on experience with Microsoft Azure, including Azure OpenAI Service, Azure AI Search, Azure Data Factory, Azure Monitor, and related data and AI services.

  • Expert-level experience with Databricks, including Unity Catalog administration, cluster and pool management, Delta Lake operations, and job orchestration at scale.

  • Strong command of LLM and GenAI concepts, including inference architectures, RAG pipelines, embeddings, vector databases, and model serving patterns.

  • Proficiency in Python and SQL, with experience automating operational tasks and reviewing pipeline and application code.

  • Demonstrated ability to lead incident response independently, produce high-quality RCAs, and drive cross-functional remediation.

  • Experience with ITSM platforms (ServiceNow preferred) and formal incident and change management processes.

  • Strong communication skills, able to translate complex technical issues into clear, actionable updates for both technical and non-technical stakeholders.

  • Expertise in AI and data platform operations, observability, and incident management.

  • Proficiency in cloud cost optimization and FinOps practices.

  • Experience with CI/CD pipelines, DevOps practices, and automation tools.

  • Strong understanding of platform security, governance, and compliance requirements.

  • Demonstrated ability to mentor and guide junior engineers.

  • Strong organizational, analytical, and problem-solving skills.

  • Ability to foster a culture of operational excellence and continuous improvement.

  • Effective collaborator with cross-functional teams and external partners.

About the company

Journey with us! Combine your career goals and sense of adventure by joining our exciting team of employees. Royal Caribbean Group is pleased to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world. We are proud to be the vacation-industry leader with global brands - including Royal Caribbean International, Celebrity Cruises and Silversea Cruises - the most innovative fleet and private destinations, and the best people. Together, we are dedicated to turning the vacation of a lifetime into a lifetime of vacations for our guests.

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.
