Data Engineer-6

Realign Llc
New York, United States of America
28 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$159K

Job location

New York, United States of America

Tech stack

Agile Methodologies
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Cloud Computing
Code Review
Continuous Integration
Data Architecture
ETL
Software Debugging
DevOps
Amazon DynamoDB
System Monitoring
Identity and Access Management
Python
Performance Tuning
SQL Databases
Data Processing
Data Ingestion
System Availability
Spark
Gitlab
Data Lake
PySpark
Kubernetes
Deployment Automation
Terraform
Data Pipelines
Docker
Databricks
Programming Languages

Job description

  • Work on migrating applications from on-premises locations to cloud service providers.
  • Develop products and services on the latest technologies through contributions in development, enhancement, testing, and implementation.
  • Develop, modify, and extend code for building cloud infrastructure, and automate it using CI/CD pipelines.
  • Partner with business and peers in pursuit of solutions that achieve business goals through an agile software development methodology.
  • Perform problem analysis, data analysis, reporting, and communication.
  • Work with peers across the system to define and implement best practices and standards.
  • Assess applications and help determine the appropriate application infrastructure patterns.
  • Use best practices and knowledge of internal and external drivers to improve products or services.

Requirements

Must Have Technical/Functional Skills

  • Hands-on experience building ETL using Databricks SaaS infrastructure.
  • Experience developing data pipeline solutions to ingest and exploit new and existing data sources.
  • Expertise in leveraging SQL, programming languages such as Python, and ETL tools such as Databricks.
  • Perform code reviews to ensure requirements are met, execution patterns are optimal, and established standards are followed.
  • Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), and AWS Data Integration (Glue).
  • Advanced understanding of container orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services.
  • Good understanding of AWS Identity and Access Management (IAM), AWS networking, and AWS monitoring tools.
  • Proficiency in CI/CD and deployment automation using GitLab pipelines.
  • Proficiency in cloud infrastructure provisioning tools, e.g., Terraform.
  • Proficiency in one or more programming languages, e.g., Python, Scala.
  • Experience with Starburst, Trino, and building SQL queries in a federated architecture.
  • Good knowledge of Lakehouse architecture.
  • Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
  • Build data ingestion workflows from various sources (structured, semi-structured, and unstructured).

  • Develop reusable components and frameworks for efficient data processing.
  • Implement best practices for data quality, validation, and governance.
  • Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
  • Tune Spark jobs for performance and scalability in a cloud-based environment.
  • Maintain a robust data lake or Lakehouse architecture.

Apply for this position