Platform Reliability Engineer, Azure job in Irving

Wellfit Technologies Inc.
Irving, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$150K

Job location

Irving, United States of America

Tech stack

.NET
API
Application Layers
Application Performance Management
Application Services
Azure
Command-Line Interface
Cloud Computing
DevOps
Powershell
Reliability Engineering
Cloud Services
Prometheus
Runbook
Software Deployment
Software Engineering
SQL Databases
Systems Integration
Datadog
Scripting (Bash/Python/Go/Ruby)
Microsoft Power Automate
Cloud Monitoring
System Availability
Grafana
Software Troubleshooting
Angular
Google Cloud Functions
Webhooks
Dynatrace

Job description

We are seeking a hands-on Platform Reliability Engineer, Azure to help strengthen the reliability, visibility, and operational maturity of our Azure-based platforms.

This role is ideal for someone who enjoys working directly in Azure, improving production systems, troubleshooting issues across infrastructure and application layers, and building practical monitoring and alerting solutions that help teams respond faster and operate more confidently.

You do not need to be an expert in every part of the stack on day one. We are looking for someone with strong Azure experience, solid troubleshooting instincts, a DevOps/reliability mindset, and the ability to collaborate closely with engineering teams across systems, services, and applications.

What You'll Do

• Own and improve monitoring, alerting, and observability across Azure-based production systems.
• Work directly in Azure Monitor, Application Insights, App Services, logs, metrics, traces, and related Azure tooling to troubleshoot reliability and performance issues.
• Build and refine practical alerting workflows, including Sev0/Sev1 alert routing, escalation paths, and runbook integration.
• Create and maintain clear, actionable runbooks that help on-call engineers respond confidently to production incidents.
• Partner with engineering teams to investigate issues across infrastructure, configuration, deployments, services, and application behavior.
• Support release readiness by improving visibility into critical Azure resources before, during, and after production deployments.
• Build and maintain dashboards in tools such as Grafana, Azure Monitor, Application Insights, or similar observability platforms.
• Help configure incident routing integrations, including Slack/webhook-based alert delivery to the appropriate team channels.
• Automate repeatable operational tasks using PowerShell, Logic Apps, Azure tooling, or similar workflow automation methods.
• Contribute to RCA documentation, incident follow-up, reliability improvements, and operational playbook development.

• You understand how production systems are monitored, where alerting gaps exist, and how to improve them.
• You can work directly in Azure to investigate issues, improve visibility, and support reliable operations.
• You build practical runbooks, dashboards, and alerting workflows that teams actually use.
• You collaborate well with engineers, ask strong troubleshooting questions, and help drive issues to resolution.
• You bring ownership, curiosity, and a builder mindset to a growing platform environment.

Requirements

• Hands-on experience supporting production systems in Azure.
• Strong working knowledge of Azure App Services, Azure Monitor, Application Insights, and Azure production troubleshooting.
• Experience with DevOps, cloud operations, site reliability, platform engineering, or production support in a hands-on environment.
• Strong troubleshooting instincts and the ability to work through ambiguous production issues.
• Comfort working across logs, metrics, traces, alerts, configurations, deployments, and service dependencies.
• Ability to collaborate with software engineering teams across the stack, including .NET, Angular, SQL, APIs, and cloud services.
• Experience building or improving dashboards, alerts, runbooks, incident workflows, or operational playbooks.
• Working knowledge of scripting or automation, preferably with PowerShell, Logic Apps, CLI tooling, or similar technologies.
• Clear communication skills with the ability to document findings, explain issues, and drive follow-through after incidents.
• A high-ownership mindset with the ability to create structure, improve processes, and operate effectively in a fast-moving environment.

Preferred Experience

• Azure certifications.
• Grafana, Prometheus, Datadog, Dynatrace, or similar observability/APM tools.
• Slack integrations, webhooks, Logic Apps, or incident routing workflows.
• Azure Front Door, CDN, Function Apps, WebJobs, Service Bus, Event Hub, Event Grid, SQL Pools, App Service Plans, or related Azure services.
• Experience in healthcare, fintech, payments, or other high-availability environments.
• Experience in startup, SMB, or scale-up environments where ownership is broad and hands-on.

Benefits & conditions

• Make an Impact: Your work will directly strengthen the reliability of a fast-growing healthcare fintech platform supporting 1,100+ offices and $1.5B+ in annual transactions.
• Build and Own: This is a high-impact role where you will help shape how we monitor, operate, and scale production systems.
• Work Flexibly: Hybrid model based in Dallas with 3 days per week in office.
• Comprehensive Benefits: Full medical, dental, vision, generous PTO, bonus eligibility, and 401(k) matching.
• Fast-Growth Environment: A rare opportunity to grow with a profitable startup on a national trajectory.

$130,000 - $150,000 a year

Alongside a competitive annual bonus, we offer a 401(k) with up to a 4% match, generous paid time off, and comprehensive healthcare benefits.

About the company

Wellfit is the dental industry's fintech solution, breaking down financial barriers so patients, providers, employers, and payors can all access better care. As a healthcare fintech innovator, we're transforming the patient journey and redefining what's possible in dental care.

Today, Wellfit supports a growing production platform serving 1,100+ offices and processing $1.5B+ in annual transactions. As we continue to scale, reliability, observability, alerting, and production readiness are critical to how we support our customers and deliver with confidence.
