Data Platform Infrastructure Engineer
RIVAGO INFOTECH INC.
Scottsdale, United States of America
yesterday
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Job location
Scottsdale, United States of America
Tech stack
Amazon Web Services (AWS)
Configuration Management
Computer Networks
Data Systems
DevOps
Distributed Data Store
Distributed Systems
Hadoop
Hadoop Distributed File System
Hive
Identity and Access Management
Performance Tuning
Ansible
Cloudera
Cloud Platform System
Apache YARN
System Availability
Spark
Reliability of Systems
Electronic Medical Records
Containerization
Kubernetes
Infrastructure Automation Frameworks
Kafka
Data Management
Terraform
Data Pipelines
Docker
Databricks
Job description
- Design, deploy, and manage data platform infrastructure across on-prem (Cloudera) and cloud (AWS, Databricks) environments
- Build and maintain distributed data clusters ensuring high availability, scalability, and performance
- Automate infrastructure provisioning using Terraform and Ansible
- Manage and optimize Cloudera Hadoop ecosystems (HDFS, Hive, Spark, YARN, etc.)
- Deploy and manage Databricks workspaces, clusters, and integrations on AWS
- Implement infrastructure-as-code (IaC) and configuration management best practices
- Monitor cluster performance, troubleshoot issues, and ensure system reliability
- Collaborate with data engineers, architects, and DevOps teams to support data pipelines and analytics workloads
- Ensure security, compliance, and governance across data platforms
- Support migration from on-prem to cloud-based data platforms
Requirements
- Strong experience in Cloudera (CDH/CDP) cluster setup and administration
- Hands-on experience with Databricks (cluster management, jobs, notebooks)
- Strong exposure to AWS (EC2, S3, IAM, VPC, EMR, networking concepts)
Infrastructure & Automation
- Expertise in Terraform (mandatory) for infrastructure provisioning
- Proficiency in Ansible for configuration management and automation
- Experience with CI/CD pipelines for infrastructure deployments
Cluster & Data Technologies
- Experience managing distributed systems / cluster technologies
- Strong understanding of:
  - Hadoop ecosystem (HDFS, Hive, Spark, Kafka, etc.)
  - Spark performance tuning and cluster optimization
- Knowledge of containerization (Docker/Kubernetes) is a plus