Data Platform Infrastructure Engineer

RIVAGO INFOTECH INC.
Scottsdale, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Scottsdale, United States of America

Tech stack

Amazon Web Services (AWS)
Configuration Management
Computer Networks
Data Systems
DevOps
Distributed Data Store
Distributed Systems
Hadoop
Hadoop Distributed File System
Hive
Identity and Access Management
Performance Tuning
Ansible
Cloudera
Cloud Platform System
Apache YARN
System Availability
Spark
Reliability of Systems
Electronic Medical Records
Containerization
Kubernetes
Infrastructure Automation Frameworks
Kafka
Data Management
Terraform
Data Pipelines
Docker
Databricks

Job description

  • Design, deploy, and manage data platform infrastructure across on-prem (Cloudera) and cloud (AWS, Databricks) environments
  • Build and maintain distributed data clusters ensuring high availability, scalability, and performance
  • Automate infrastructure provisioning using Terraform and Ansible
  • Manage and optimize Cloudera Hadoop ecosystems (HDFS, Hive, Spark, YARN, etc.)
  • Deploy and manage Databricks workspaces, clusters, and integrations on AWS
  • Implement infrastructure-as-code (IaC) and configuration management best practices
  • Monitor cluster performance, troubleshoot issues, and ensure system reliability
  • Collaborate with data engineers, architects, and DevOps teams to support data pipelines and analytics workloads
  • Ensure security, compliance, and governance across data platforms
  • Support migration from on-prem to cloud-based data platforms

Requirements

  • Strong experience in Cloudera (CDH/CDP) cluster setup and administration
  • Hands-on experience with Databricks (cluster management, jobs, notebooks)
  • Strong exposure to AWS (EC2, S3, IAM, VPC, EMR, networking concepts)

Infrastructure & Automation

  • Expertise in Terraform (mandatory) for infrastructure provisioning
  • Proficiency in Ansible for configuration management and automation
  • Experience with CI/CD pipelines for infrastructure deployments

Cluster & Data Technologies

  • Experience managing distributed systems / cluster technologies
  • Strong understanding of:
      • Hadoop ecosystem (HDFS, Hive, Spark, Kafka, etc.)
      • Spark performance tuning and cluster optimization
  • Knowledge of containerization (Docker/Kubernetes) is a plus