AI Integration Engineer (Java + AI)

OpenKyber LLC

4 days ago

Role details

Contract type

Temporary to permanent

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Senior

Compensation

$137K

Job location

Tech stack

Java

JavaScript

Artificial Intelligence

Amazon Web Services (AWS)

Business Analytics Applications

Application Integration Architecture

Systems Engineering

CA Workload Automation AE

Azure

Bootstrap

Cluster Analysis

Computer Programming

Continuous Integration

IBM DB2

Database Queries

Database Schema

Linux

Java Platform Enterprise Edition (J2EE)

Hadoop

Tivoli Management Framework

Python

Machine Learning

Microsoft SQL Server

MongoDB

OpenShift

Oracle

Reliability Engineering

Site Reliability Engineering Practices

Cloud Services

Ansible

Prometheus

Cloudera

Software Engineering

SonarQube

Teradata

Unstructured Data

Data Logging

Google Cloud Platform

React

System Availability

Grafana

Spark

Model Validation

GitLab

Kubernetes

Kafka

REST

Splunk

AppDynamics

Jenkins

ServiceNow

Artifactory

Microservices

Job description

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support key Shared Services Operations Technology platforms, including Payment Evaluations, Regulatory Operations, Financial Crimes, and Business & Real Estate Evaluation. You will join a team responsible for the availability, performance, and reliability of ~85 applications that support KYC, AML, and other critical financial-crimes workloads. The role blends software engineering, systems operations, and cloud-native reliability practices to drive automation, improve resilience, and support modernization across a large enterprise ecosystem. You will also help evolve AIOps capabilities, including predictive alerting, self-healing workflows, and AI/ML-driven incident analysis. Occasional weekend work or overtime may be required for critical system support.

What You'll Do

Site Reliability & Operations

Lead SRE practices that enhance system availability, performance, and scalability across multi-cloud environments.
Support and improve critical applications and customer journeys; lead incident response and blameless postmortems.
Conduct root-cause analysis and drive long-term remediation of recurrent issues.
Define and enforce operational readiness and Non-Functional Requirements (NFRs) during platform modernization.

Automation & Tooling

Design and implement automation to eliminate operational toil and improve service reliability.
Build frameworks for automated SLO/SLI tracking, availability metrics, error budgeting, and customer impact analysis.
Implement self-healing and autonomic systems using AI/ML, RPA, and intelligent monitoring.

Monitoring, Observability & AIOps

Develop and enhance monitoring, alerting, and observability capabilities.
Drive adoption of AIOps platforms to support anomaly detection, predictive alerting, and automated incident resolution.

Collaboration & Leadership

Collaborate with platform teams, product owners, and technology partners across the COO Technology organization.
Mentor peers and champion SRE best practices across engineering teams.
Identify process gaps across domains and recommend scalable, long-term improvements.

Requirements

5+ years in Systems Engineering, Site Reliability Engineering, Technology Architecture, or related fields (or equivalent military/training/education experience).
2+ years performing as part of an SRE team.
Strong written and verbal communication skills.

Technical Skills

Software Development

Proficiency in Python and/or Java/J2EE.
Experience with REST APIs, microservices, Kafka/MQ, and modern integration patterns.
Familiarity with JavaScript frameworks (React, Bootstrap).
Strong SQL skills and database schema design experience.

Infrastructure & Cloud

Expertise with Linux and container orchestration (Kubernetes, OpenShift/OCP strongly preferred).
Experience with PCF, AWS, Google Cloud Platform, or Azure environments.
CI/CD & Automation Tools: Jenkins, GitLab, SonarQube, Artifactory, Ansible.
Observability & AIOps Tools: Grafana, Prometheus, Splunk/ELK, AppDynamics, Elastic, ThousandEyes, Aternity, Google Cloud Logging.
AIOps Platforms: Moogsoft, AI/ML-based analytics frameworks.

Operations & Data

ITSM Tools: ServiceNow, Remedy, IBM Netcool.
Databases: Oracle, DB2, SQL Server, MongoDB, Hadoop/Cloudera, Spark, Teradata.

Foundational AI Knowledge

Understanding of common AI/ML concepts (classification, regression, clustering, anomaly detection).
Ability to work with structured/unstructured data for model evaluation.
Awareness of ethical/operational considerations in AI systems.
Experience integrating AI into automation workflows is a plus.
Experience with AutoSys.
Prior experience in corporate banking or financial services.
Strong interest in AI-driven operations and AIOps.