Data Engineer-6

Realign Llc
New York, United States of America
28 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$159K

Job location

New York, United States of America

Tech stack

Agile Methodologies
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Cloud Computing
Code Review
Continuous Integration
Data Architecture
ETL
Software Debugging
DevOps
Amazon DynamoDB
System Monitoring
Identity and Access Management
Python
Performance Tuning
SQL Databases
Data Processing
Data Ingestion
System Availability
Spark
Gitlab
Data Lake
PySpark
Kubernetes
Deployment Automation
Terraform
Data Pipelines
Docker
Databricks
Programming Languages

Job description

  • Work on migrating applications from on-premises locations to cloud service providers.
  • Develop products and services on the latest technologies through contributions in development, enhancement, testing, and implementation.
  • Develop, modify, and extend code for building cloud infrastructure, and automate it using CI/CD pipelines.
  • Partner with business and peers in pursuit of solutions that achieve business goals through an agile software development methodology.
  • Perform problem analysis, data analysis, reporting, and communication.
  • Work with peers across the system to define and implement best practices and standards.
  • Assess applications and help determine the appropriate application infrastructure patterns.
  • Use best practices and knowledge of internal and external drivers to improve products or services.

Requirements

Must Have Technical/Functional Skills

  • Hands-on experience building ETL using Databricks SaaS infrastructure.
  • Experience developing data pipeline solutions to ingest and exploit new and existing data sources.
  • Expertise in leveraging SQL, programming languages such as Python, and ETL tools such as Databricks.
  • Perform code reviews to ensure requirements are met, execution patterns are optimal, and established standards are followed.
  • Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), and AWS Data Integration (Glue).
  • Advanced understanding of container orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services.
  • Good understanding of AWS Identity and Access Management (IAM), AWS networking, and AWS monitoring tools.
  • Proficiency in CI/CD and deployment automation using GitLab pipelines.
  • Proficiency in cloud infrastructure provisioning tools, e.g., Terraform.
  • Proficiency in one or more programming languages, e.g., Python, Scala.
  • Experience with Starburst, Trino, and building SQL queries in a federated architecture.
  • Good knowledge of Lakehouse architecture.
  • Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
  • Build data ingestion workflows from various sources (structured, semi-structured, and unstructured).

  • Develop reusable components and frameworks for efficient data processing.
  • Implement best practices for data quality, validation, and governance.
  • Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
  • Tune Spark jobs for performance and scalability in a cloud-based environment.
  • Maintain a robust data lake or Lakehouse architecture.

Apply for this position