Site Reliability Engineer

TechSpace Solutions Inc.

Cincinnati, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Cincinnati, United States of America

Tech stack

Java

API

Amazon Web Services (AWS)

Cloud Computing

Databases

Data Systems

Software Debugging

Distributed Systems

Payment Systems

Performance Tuning

Ruby on Rails

Reliability Engineering

Runbook

Software Engineering

SQL Databases

Datadog

Grafana

Information Technology

Splunk

New Relic (SaaS)

Service Stack

Microservices

Job description

The client is an enterprise-grade embedded finance platform that enables organizations to build, launch, and scale compliant banking, payments, and lending solutions.
We are seeking a Principal Software Engineer to join our Production Engineering team. This is a hands-on technical leadership role focused on operating, debugging, and improving highly distributed, mission-critical payment systems. The ideal candidate thrives in complex production environments and enjoys solving deep technical challenges across applications, infrastructure, and data systems.

Lead production triage and incident response across APIs, payment systems, distributed services, infrastructure, and databases.
Diagnose and resolve complex production issues spanning code, infrastructure, data, and third-party dependencies.
Partner with engineering teams to implement permanent fixes and improve platform reliability.
Design and implement monitoring, alerting, automation, and operational tooling.
Improve system observability, resiliency, and debuggability.
Work across a mixed technology stack including Ruby on Rails, Java, AWS, APIs, and SQL databases.
Develop runbooks and diagnostic workflows for operational excellence.
Mentor engineers and influence best practices across engineering and SRE teams.
Participate in architectural discussions to build highly reliable and scalable systems.

Requirements

10+ years of experience in Software Engineering, Production Engineering, SRE, or Distributed Systems.
Strong experience debugging production issues end-to-end (application, infrastructure, data, and dependencies).

Hands-on experience with:

AWS and cloud-native environments
Ruby on Rails and/or Java
APIs, Microservices, and Distributed Systems
SQL and database troubleshooting
Observability tools such as Splunk, Datadog, New Relic, etc.

Deep understanding of:

System behavior in production
Fault isolation and troubleshooting
Performance optimization and resiliency patterns

Excellent communication and stakeholder management skills.
Ability to work effectively during incidents and high-pressure situations.

Preferred Qualifications:

Experience in Payments, FinTech, Banking, or other regulated environments.
Experience building and operating large-scale, high-availability platforms.
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
