Data Reliability Engineer - Remote

Insight Global
Zionsville, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$ 156K

Job location

Zionsville, United States of America

Tech stack

Artificial Intelligence
Cloud Database
Data Auditing
Data Validation
Information Engineering
Data Infrastructure
Data Integrity
Data Systems
DevOps
Python
Machine Learning
Uptime
Reliability Engineering
Cloud Services
Software Deployment
SQL Databases
Snowflake
Infrastructure Automation Frameworks
Performance Monitor
Machine Learning Operations
Data Delivery
Terraform
Databricks

Job description

  • Reliability Engineering for Data Systems: Design and implement reliability practices within the data platform, focusing on uptime and performance of platforms like Snowflake, dashboards, Daxter, and overall platform stability.
  • Monitoring & Automation: Develop automated monitoring solutions to assess system health and performance for machine learning workloads, data quality checks, and uptime, continuously improving operational processes.
  • ML Model & AI Workloads Support: Build, deploy, and monitor AI/ML models and workloads, ensuring scalability and stability while evaluating new capabilities to enhance data delivery.
  • Data Quality & Compliance: Implement and scale data quality checks across systems, utilizing automation to ensure consistent data integrity and reliability.
  • Infrastructure Management: Work on infrastructure setup and code deployments using Terraform, Python, and SQL to enhance and maintain data pipelines and ensure platform scalability.
  • Collaboration & Responsiveness: Act as the primary point of support for reliability issues, collaborating closely with the data platform team and other engineering teams to proactively resolve incidents and continuously improve processes.

Pay rate range: $65-75/hr

Requirements

We are looking for a highly motivated Data Reliability Engineer (Data SRE) to join the data platform team of one of our clients in the insurance industry. This individual will bring an SRE mindset to the data space, focusing on building, monitoring, and enhancing the stability and performance of the client's data infrastructure. As part of a 6-person data platform team, this role will work closely with cross-functional teams to ensure the reliability, scalability, and quality of the client's data and machine learning systems.

The ideal candidate will have a solid technical foundation in data engineering, machine learning, and infrastructure management, with experience in cloud data platforms like Snowflake and Databricks. Strong skills in Python, SQL, and Terraform are essential, along with a deep understanding of automation, monitoring, and data quality assurance.

  • Previous experience in Site Reliability Engineering (SRE) or DevOps roles, preferably within a data-focused or machine learning environment.
  • Proven experience in maintaining and monitoring data systems, with exposure to machine learning infrastructure.
  • Strong hands-on experience with cloud data warehouses, such as Snowflake or Databricks, including data quality and system performance monitoring.
  • Proficiency in Python (for automation), SQL, and infrastructure-as-code tools like Terraform.
  • Ability to assess system performance and design scalable solutions for data quality checks, uptime, and reliability.
