AI Integration Engineer (Java + AI)

OpenKyber LLC
4 days ago

Role details

Contract type: Temporary to permanent
Employment type: Full-time (> 32 hours)
Working hours: Shift work
Languages: English
Experience level: Senior
Compensation: $137K

Tech stack

Java
JavaScript
Artificial Intelligence
Amazon Web Services (AWS)
Business Analytics Applications
Application Integration Architecture
Systems Engineering
CA Workload Automation AE
Azure
Bootstrap
Cluster Analysis
Computer Programming
Continuous Integration
IBM DB2
Database Queries
Database Schema
Linux
Java Platform Enterprise Edition (J2EE)
Hadoop
Tivoli Management Framework
Python
Machine Learning
Microsoft SQL Server
MongoDB
OpenShift
Oracle
Reliability Engineering
Site Reliability Engineering Practices
Cloud Services
Ansible
Prometheus
Cloudera
Software Engineering
SonarQube
Teradata
Unstructured Data
Data Logging
Google Cloud Platform
React
System Availability
Grafana
Spark
Model Validation
GitLab
Kubernetes
Kafka
REST
Splunk
AppDynamics
Jenkins
ServiceNow
Artifactory
Microservices

Job description

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support key Shared Services Operations Technology platforms, including Payment Evaluations, Regulatory Operations, Financial Crimes, and Business & Real Estate Evaluation. You will be part of a team responsible for maintaining availability, performance, and reliability across ~85 applications that support KYC, AML, and other critical financial-crimes-related workloads. This role blends software engineering, systems operations, and cloud-native reliability practices to drive automation, enhance resilience, and support modernization across a large enterprise ecosystem. You will also help evolve AIOps capabilities, including predictive alerting, self-healing workflows, and AI/ML-driven incident analysis. Occasional weekend work or overtime may be required for critical system support.

What You'll Do

Site Reliability & Operations

  • Lead SRE practices that enhance system availability, performance, and scalability across multi-cloud environments.
  • Support and improve critical applications and customer journeys; lead incident response and blameless postmortems.
  • Conduct root-cause analysis and drive long-term remediation of recurrent issues.
  • Define and enforce operational readiness and Non-Functional Requirements (NFRs) during platform modernization.

Automation & Tooling

  • Design and implement automation to eliminate operational toil and improve service reliability.
  • Build frameworks for automated SLO/SLI tracking, availability metrics, error budgeting, and customer impact analysis.
  • Implement self-healing and autonomic systems using AI/ML, RPA, and intelligent monitoring.

Monitoring, Observability & AIOps

  • Develop and enhance monitoring, alerting, and observability capabilities.
  • Drive adoption of AIOps platforms to support anomaly detection, predictive alerting, and automated incident resolution.

Collaboration & Leadership

  • Collaborate with platform teams, product owners, and technology partners across the COO Technology organization.
  • Mentor peers and champion SRE best practices across engineering teams.
  • Identify process gaps across domains and recommend scalable, long-term improvements.

Requirements

  • 5+ years in Systems Engineering, Site Reliability Engineering, Technology Architecture, or related fields (or equivalent military/training/education experience).
  • 2+ years performing as part of an SRE team.
  • Strong written and verbal communication skills.

Technical Skills

Software Development

  • Proficiency in Python and/or Java/J2EE.
  • Experience with REST APIs, microservices, Kafka/MQ, and modern integration patterns.
  • Familiarity with JavaScript frameworks (React, Bootstrap).
  • Strong SQL skills and database schema design experience.

Infrastructure & Cloud

  • Expertise with Linux and container orchestration (Kubernetes, OpenShift/OCP strongly preferred).
  • Experience with PCF, AWS, Google Cloud Platform, or Azure environments.
  • CI/CD & Automation Tools: Jenkins, GitLab, SonarQube, Artifactory, Ansible.
  • Observability & AIOps Tools: Grafana, Prometheus, Splunk/ELK, AppDynamics, Elastic, ThousandEyes, Aternity, Google Cloud Logging.
  • AIOps Platforms: Moogsoft, AI/ML-based analytics frameworks.

Operations & Data

  • ITSM Tools: ServiceNow, Remedy, IBM Netcool.
  • Databases: Oracle, DB2, SQL Server, MongoDB, Hadoop/Cloudera, Spark, Teradata.

Foundational AI Knowledge

  • Understanding of common AI/ML concepts (classification, regression, clustering, anomaly detection).
  • Ability to work with structured/unstructured data for model evaluation.
  • Awareness of ethical/operational considerations in AI systems.
  • Experience integrating AI into automation workflows is a plus.
  • Experience with AutoSys.
  • Prior experience in corporate banking or financial services.
  • Strong interest in AI-driven operations and AIOps.
