Data Engineer

Ampcus Inc

New York, United States of America

3 days ago

Role details

Contract type

Temporary to permanent

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

New York, United States of America

Tech stack

Java

Airflow

Amazon Web Services (AWS)

Big Data

Computer Programming

Information Engineering

ETL

Data Transformation

Data Warehousing

Distributed Computing Environment

Performance Tuning

Data Processing

Data Ingestion

System Availability

Snowflake

Spark

Electronic Medical Records

Data Lake

PySpark

Data Pipelines

Job description

We are seeking a skilled Data Engineer to design, build, and manage scalable ETL pipelines supporting a centralized data lake and Snowflake data warehouse. The role focuses on automating data ingestion, transformation, and aggregation workflows to enable reliable analytics and data-driven decision-making.

Design, develop, and maintain robust ETL pipelines for ingesting data into the enterprise data lake and Snowflake environment.

Automate data processing, aggregation, and analytical workflows to improve data availability and performance.
Implement and manage orchestration and scheduling of data pipelines using Control-M and Apache Airflow.
Develop scalable data transformation logic using PySpark and Apache Spark (Java).
Work with large structured and semi-structured datasets on AWS infrastructure.
Ensure data quality, integrity, and reliability across data pipelines.
Optimize data pipelines for performance, cost, and scalability.
Collaborate with analytics, data science, and business teams to understand data requirements.
Monitor, troubleshoot, and resolve pipeline failures and performance bottlenecks.
Follow best practices for data engineering, security, and documentation.

Requirements

Strong experience with data lake architectures and large-scale data processing.
Hands-on experience with AWS services (e.g., S3, EC2, EMR, Glue, or related).
Proven expertise in building ETL pipelines for analytics and reporting use cases.
Solid working knowledge of Snowflake, including data loading, transformations, and performance optimization.
Experience with workflow automation and scheduling tools such as Control-M and Apache Airflow.
Proficiency in PySpark for distributed data processing.
Strong programming experience with Apache Spark using Java.
Good understanding of data modeling, partitioning, and performance tuning concepts.
