Data Engineer

Tata Consultancy Services Limited
Philadelphia, United States of America
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$ 120K

Job location

Philadelphia, United States of America

Tech stack

API
Amazon Web Services (AWS)
Big Data
Databases
Data as a Service
Data Architecture
Data Validation
Data Governance
Data Infrastructure
ETL
Data Sharing
Data Warehousing
Relational Databases
Fault Tolerance
Python
Microsoft SQL Server
Performance Tuning
Query Optimization
Data Streaming
Data Processing
File Transfer Protocol (FTP)
Spark
AWS Lambda
Event Driven Architecture
Data Lake
PySpark
Semi-structured Data
Information Technology
Low Latency
Kafka
Data Management
Tools for Reporting
Cloudwatch
Data Pipelines
Redshift

Job description

  • Design and implement a scalable AWS-based data platform using Amazon S3, AWS Glue (PySpark), Amazon Redshift, AWS Lambda, and supporting AWS services.

  • Build and maintain event-driven data ingestion pipelines, leveraging Amazon EventBridge, AWS Step Functions, and S3 triggers to orchestrate end-to-end workflows for ingestion, transformation, validation, and loading.

  • Develop ETL/ELT pipelines using AWS Glue, PySpark, and Python to process structured and semi-structured data from SFTP, APIs, relational databases, and flat files.

  • Implement batch and streaming data pipelines using Amazon MSK (Kafka), Apache Spark, and AWS Glue for high-volume, low-latency, fault-tolerant data processing.

  • Lead development of Master Data Management (MDM) solutions, including survivorship logic, golden record creation, and Slowly Changing Dimension (SCD Type 2) strategies for enterprise domains such as client and policy data.

  • Design and optimize data models (dimensional and normalized) in Amazon Redshift, including fact and dimension tables and Analysis Ready Datasets (ARDs).

  • Implement Redshift performance optimization techniques such as DISTKEY/SORTKEY design, partitioning strategies, workload tuning, and query optimization.

  • Build and manage external tables using Redshift Spectrum to enable cost-effective querying of S3-based data lake content.

  • Enable cross-cluster data sharing using Amazon Redshift Data Sharing to support downstream analytics and reporting platforms.

  • Perform large-scale data migrations from on-prem databases (e.g., SQL Server) to Amazon Redshift using AWS DMS, including schema mapping, full-load ingestion, and tuning.

  • Develop data validation, cleansing, and data quality frameworks using AWS Glue and Python to ensure accuracy, consistency, and governance compliance.

  • Create reusable Python-based ingestion and transformation frameworks, standardizing ETL processes across multiple pipelines.

  • Build automated data orchestration workflows using AWS Step Functions and integrate monitoring and alerting with Amazon CloudWatch.

  • Conduct proofs of concept (PoCs) for new tools, architectures, and delivery strategies to support evolving business needs.

  • Collaborate with quality engineers, analysts, and business stakeholders to design scalable data solutions aligned with analytics and reporting requirements.

  • Continuously improve data reliability, pipeline efficiency, and operational stability.

Requirements

Do you have experience in Spark implementation?
Do you have a Bachelor's degree?

Must-have technical/functional skills

  • Strong experience with AWS data services: S3, Glue, Redshift, Lambda, DMS, EventBridge, Step Functions, CloudWatch.

  • Hands-on expertise in PySpark, Apache Spark, and Python for large-scale data processing.

  • Experience building ETL/ELT pipelines and data lakes on AWS.

  • Strong understanding of Kafka / Amazon MSK for streaming and event-driven architectures.

  • Proven experience in Amazon Redshift data modeling, performance optimization, and query tuning.

  • Experience with MDM concepts, SCDs, and enterprise data governance.

  • Solid understanding of data warehousing principles and analytics-ready data design.

  • Experience working with structured, semi-structured, and large-scale enterprise datasets.

  • Strong communication skills and the ability to work with cross-functional technical and business teams.

Qualifications: Bachelor of Computer Science

Benefits & conditions

(Part of Tata group) 3.9 out of 5 stars. Philadelphia, PA. $100,000 - $120,000 a year.

  • Pet insurance
  • Health insurance
  • Vision insurance
  • Dental insurance
  • Commuter assistance

  • Discretionary Annual Incentive
  • Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans
  • Family Support: Maternal & Parental Leaves
  • Insurance Options: Auto & Home Insurance, Identity Theft Protection
  • Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement
  • Time Off: Vacation, Time Off, Sick Leave & Holidays
  • Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing

Salary Range: $100,000 - $120,000 a year
