Senior Technical Lead - DevOps, Python, Kubernetes

HCL America Inc.

Santa Clara, United States of America

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$134K

Job location

Santa Clara, United States of America

Tech stack

Amazon Web Services (AWS)

Bash

Configuration Management

Data as a Service

DevOps

Disaster Recovery

Distributed Data Store

Distributed Systems

Python

Lightweight Directory Access Protocol (LDAP)

PostgreSQL

Linux System Administration

Performance Tuning

Prometheus

Service Discovery

Software Vulnerability Management

Apache ZooKeeper

Scripting (Bash/Python/Go/Ruby)

Cloud Monitoring

Grafana

Apigee

Data Layers

Kubernetes

Infrastructure Automation Frameworks

Cassandra

Terraform

Job description

We are seeking an experienced Data Services Lead Engineer to own the technical direction, architecture, and operational excellence of our data platform. This role requires deep expertise in Cassandra, ZooKeeper, and Consul operations, strong leadership skills, and a passion for building robust, scalable distributed data systems. You will guide the team on best practices, lead complex technical projects, and act as the primary escalation point for data-platform issues. Beyond Cassandra, the team is also responsible for ZooKeeper, Consul, LDAP, PostgreSQL, and Qpid.

Lead the design, architecture, and implementation of highly available, scalable, and performant distributed data stores (including Cassandra and PostgreSQL) across cloud and on-premises environments.

Define and drive the technical roadmap and strategy for the persistence services layer within Apigee Edge Data Services.

Lead incident response and incident management, communicating clearly throughout.

Lead comprehensive post-mortem analyses of production incidents to identify root causes, document findings, and drive preventative measures across the data platform.

Lead vulnerability management initiatives, including regular version and security upgrades for all supported data services.

Establish and enforce best practices for distributed-systems data modeling, capacity planning, performance tuning, security, and disaster recovery.

Develop and improve automation for cluster provisioning, configuration management, and upgrades.

Serve as the primary technical escalation point for complex production issues, including root cause analysis.

Mentor and provide technical guidance to other engineers across the organization.

Collaborate with Engineering, SRE, and Support teams to align the data layer with platform requirements.

Drive continuous improvement initiatives to enhance reliability and maintainability.

Participate in the team's on-call rotation for production support.

Requirements

7+ years of experience managing large-scale, mission-critical distributed data systems (e.g., Cassandra, ZooKeeper) in a production environment.

Understanding of Consul for service discovery and configuration management.

Deep understanding of distributed system architectures, data modeling, internals, and performance tuning.

Proficiency in Linux environments and scripting languages (e.g., Python, Bash).

Experience with infrastructure-as-code tools (e.g., Terraform).

Experience with monitoring and alerting systems (e.g., Prometheus, Grafana, Cloud Monitoring).

Experience working in cloud environments (GCP, AWS, etc.).