Data Engineer

Tata Consultancy Services Limited

Philadelphia, United States of America

2 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

$100K - $120K per year

Job location

Philadelphia, United States of America

Tech stack

API

Amazon Web Services (AWS)

Big Data

Databases

Data as a Service

Data Architecture

Data Validation

Data Governance

Data Infrastructure

ETL

Data Sharing

Data Warehousing

Relational Databases

Fault Tolerance

Python

Microsoft SQL Server

Performance Tuning

Query Optimization

Data Streaming

Data Processing

File Transfer Protocol (FTP)

Spark

AWS Lambda

Event Driven Architecture

Data Lake

PySpark

Semi-structured Data

Information Technology

Low Latency

Kafka

Data Management

Tools for Reporting

CloudWatch

Data Pipelines

Redshift

Job description

Design and implement a scalable AWS-based data platform using Amazon S3, AWS Glue (PySpark), Amazon Redshift, AWS Lambda, and supporting AWS services.
Build and maintain event-driven data ingestion pipelines, leveraging Amazon EventBridge, AWS Step Functions, and S3 triggers to orchestrate end-to-end workflows for ingestion, transformation, validation, and loading.
Develop ETL/ELT pipelines using AWS Glue, PySpark, and Python to process structured and semi-structured data from SFTP, APIs, relational databases, and flat files.
Implement batch and streaming data pipelines using Amazon MSK (Kafka), Apache Spark, and AWS Glue for high-volume, low-latency, fault-tolerant data processing.
Lead development of Master Data Management (MDM) solutions, including survivorship logic, golden record creation, and Slowly Changing Dimension (SCD Type 2) strategies for enterprise domains such as client and policy data.
Design and optimize data models (dimensional and normalized) in Amazon Redshift, including fact and dimension tables and Analysis Ready Datasets (ARDs).
Implement Redshift performance optimization techniques such as DISTKEY/SORTKEY design, partitioning strategies, workload tuning, and query optimization.
Build and manage external tables using Redshift Spectrum to enable cost-effective querying of S3-based data lake content.
Enable cross-cluster data sharing using Amazon Redshift Data Sharing to support downstream analytics and reporting platforms.
Perform large-scale data migrations from on-prem databases (e.g., SQL Server) to Amazon Redshift using AWS DMS, including schema mapping, full-load ingestion, and tuning.
Develop data validation, cleansing, and data quality frameworks using AWS Glue and Python to ensure accuracy, consistency, and governance compliance.
Create reusable Python-based ingestion and transformation frameworks, standardizing ETL processes across multiple pipelines.
Build automated data orchestration workflows using AWS Step Functions and integrate monitoring and alerting with Amazon CloudWatch.
Conduct proofs of concept (PoCs) for new tools, architectures, and delivery strategies to support evolving business needs.
Collaborate with quality engineers, analysts, and business stakeholders to design scalable data solutions aligned with analytics and reporting requirements.
Continuously improve data reliability, pipeline efficiency, and operational stability.
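The SCD Type 2 strategy named in the responsibilities above preserves history: when a tracked attribute of a dimension record changes, the current row is expired and a new current version is inserted. The following is a minimal, illustrative Python sketch of that idea under stated assumptions, not the employer's actual implementation. The `DimRow` record, the `apply_scd2` function, the `9999-12-31` open-ended end date, the exclusive `valid_to` convention, and the rule that keys missing from an incoming batch are treated as unchanged (an incremental feed, not a full snapshot) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date

# Hypothetical open-ended end date marking the current version of a record.
HIGH_DATE = date(9999, 12, 31)


@dataclass(frozen=True)
class DimRow:
    client_id: str               # natural (business) key, e.g. a client ID
    attrs: tuple                 # tracked attributes, e.g. (name, city)
    valid_from: date             # first day this version is in effect
    valid_to: date = HIGH_DATE   # exclusive end date of this version
    is_current: bool = True


def apply_scd2(dim: list[DimRow], incoming: dict[str, tuple], as_of: date) -> list[DimRow]:
    """Return a new dimension list with SCD Type 2 changes applied.

    - Changed key: expire the current row, append a new current version.
    - New key: append a new current row.
    - Unchanged key, or key absent from `incoming`: keep the row as-is
      (assumes an incremental feed rather than a full snapshot).
    """
    result: list[DimRow] = []
    current: dict[str, DimRow] = {}
    for row in dim:
        if row.is_current:
            current[row.client_id] = row
        else:
            result.append(row)  # historical versions are never modified

    for key, row in current.items():
        new_attrs = incoming.get(key)
        if new_attrs is not None and new_attrs != row.attrs:
            # Close out the old version and open a new one on the same date.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(key, new_attrs, valid_from=as_of))
        else:
            result.append(row)

    for key, attrs in incoming.items():
        if key not in current:
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result


# Usage: client C1 moves city, client C2 is new.
dim = [DimRow("C1", ("Alice", "Philadelphia"), date(2023, 1, 1))]
updated = apply_scd2(
    dim,
    {"C1": ("Alice", "Pittsburgh"), "C2": ("Bob", "Camden")},
    date(2024, 6, 1),
)
# Re-applying an unchanged batch adds no new versions.
rerun = apply_scd2(updated, {"C2": ("Bob", "Camden")}, date(2024, 7, 1))
```

In a Redshift or Glue pipeline the same expire-and-insert logic would typically run as a set-based merge rather than a Python loop; the in-memory version only makes the versioning rule easy to follow.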

Requirements

Do you have experience in Spark implementation?
Do you have a Bachelor's degree?

Must-Have Technical/Functional Skills

Strong experience with AWS data services: S3, Glue, Redshift, Lambda, DMS, EventBridge, Step Functions, CloudWatch.
Hands-on expertise in PySpark, Apache Spark, and Python for large-scale data processing.
Experience building ETL/ELT pipelines and data lakes on AWS.
Strong understanding of Kafka / Amazon MSK for streaming and event-driven architectures.
Proven experience in Amazon Redshift data modeling, performance optimization, and query tuning.
Experience with MDM concepts, SCDs, and enterprise data governance.
Solid understanding of data warehousing principles and analytics-ready data design.
Experience working with structured, semi-structured, and large-scale enterprise datasets.
Strong communication skills and the ability to work with cross-functional technical and business teams.

Qualifications: Bachelor of Computer Science

Benefits & conditions

Tata Consultancy Services (part of Tata group), Philadelphia, PA: $100,000 - $120,000 a year

Pet insurance
Health insurance
Vision insurance
Dental insurance
Commuter assistance

Discretionary Annual Incentive
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans
Family Support: Maternal & Parental Leave
Insurance Options: Auto & Home Insurance, Identity Theft Protection
Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement
Time Off: Vacation, Time Off, Sick Leave & Holidays
Legal & Financial Assistance: Legal Assistance, 401(k) Plan, Performance Bonus, College Fund, Student Loan Refinancing
Salary Range: $100,000 - $120,000 a year