Associate Principal - Data Engineering
Job description
PySpark Development (Primary Focus)
- Design and develop production-grade PySpark applications for large-scale batch and streaming data processing
- Implement advanced PySpark DataFrame API operations:
  - Complex transformations: window functions, pivot/unpivot, and nested struct handling
  - Multi-dataset joins: broadcast joins, sort-merge joins, and skew-handling strategies
  - Custom UDFs (User Defined Functions) and Pandas UDFs (vectorized UDFs) for performance-critical transformations
  - Aggregations and GroupBy operations optimized for large FMCG datasets
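As a concept check for the windowed-aggregation skills above, the sketch below computes a per-partition running total in plain Python, mirroring what PySpark's `F.sum(...).over(Window.partitionBy(...).orderBy(...))` produces; the column names are hypothetical:

```python
from itertools import groupby
from operator import itemgetter

def running_total(rows, partition_key, order_key, value_key):
    """Per-partition cumulative sum, analogous to a PySpark window
    aggregation with partitionBy + orderBy (illustrative only)."""
    out = []
    rows_sorted = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(rows_sorted, key=itemgetter(partition_key)):
        acc = 0
        for row in group:
            acc += row[value_key]
            out.append({**row, "running_total": acc})
    return out

sales = [
    {"store": "A", "day": 1, "units": 5},
    {"store": "A", "day": 2, "units": 3},
    {"store": "B", "day": 1, "units": 7},
]
result = running_total(sales, "store", "day", "units")
# result[1]["running_total"] -> 8 (store A, day 2)
```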
- Implement PySpark Structured Streaming for real-time data processing:
  - Kafka, Azure Event Hubs, and GCP Pub/Sub as streaming sources
  - Watermarking and windowing strategies for late-arriving data
  - Stateful streaming operations using mapGroupsWithState
  - Exactly-once and at-least-once delivery semantics
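The watermarking idea listed above can be illustrated without Spark: track the maximum event time seen, and drop events older than that maximum minus an allowed delay, assigning the rest to tumbling windows. This is a plain-Python analogue of `withWatermark` plus a window aggregation, not Spark itself; the delay and window sizes are arbitrary assumptions:

```python
from datetime import datetime, timedelta

def assign_window(event_time, window_minutes=10):
    """Floor an event timestamp to the start of its tumbling window."""
    minute = (event_time.minute // window_minutes) * window_minutes
    return event_time.replace(minute=minute, second=0, microsecond=0)

def process(events, watermark_delay=timedelta(minutes=15)):
    """Count events per window, discarding events that arrive later
    than the watermark (max event time seen minus the allowed delay)."""
    max_event_time = datetime.min
    counts, dropped = {}, []
    for e in events:  # events arrive in processing-time order
        max_event_time = max(max_event_time, e["event_time"])
        watermark = max_event_time - watermark_delay
        if e["event_time"] < watermark:
            dropped.append(e)      # too late: beyond the watermark
            continue
        w = assign_window(e["event_time"])
        counts[w] = counts.get(w, 0) + 1
    return counts, dropped

events = [
    {"event_time": datetime(2024, 1, 1, 12, 0)},
    {"event_time": datetime(2024, 1, 1, 12, 30)},
    {"event_time": datetime(2024, 1, 1, 12, 5)},  # straggler, past watermark
]
counts, dropped = process(events)
```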
- Apply advanced Spark performance tuning techniques:
  - Partition optimization: repartition vs. coalesce strategies
  - Handling data skew using salting and custom partitioners
  - Broadcast variable management and accumulator usage
  - Catalyst optimizer hints and AQE (Adaptive Query Execution) tuning
  - Executor sizing, memory fractions, and parallelism configuration
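The salting technique named above spreads a hot join key across several synthetic sub-keys. A minimal plain-Python sketch of the key rewrite (the hot-key set and bucket count are assumptions; in Spark this would be applied to both sides of the join as column expressions):

```python
import random

HOT_KEYS = {"BIG_CUSTOMER"}   # keys known to dominate the distribution (assumed)
SALT_BUCKETS = 4

def salt_key(key, rng=random):
    """On the large (fact) side, append a random salt suffix to hot keys
    so their rows spread across SALT_BUCKETS partitions instead of one."""
    if key in HOT_KEYS:
        return f"{key}#{rng.randrange(SALT_BUCKETS)}"
    return key

def explode_dim_key(key):
    """On the small (dimension) side, emit every salted variant of a hot
    key so the salted join still matches every fact row."""
    if key in HOT_KEYS:
        return [f"{key}#{i}" for i in range(SALT_BUCKETS)]
    return [key]
```

Non-hot keys pass through unchanged, so only the skewed portion of the data pays the duplication cost on the dimension side.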
- Develop and maintain reusable PySpark libraries for shared data processing capabilities
Python Engineering (Primary Focus)
- Build Python-based data services, automation scripts, and utility frameworks supporting the data platform
- Develop REST API integrations using Python (requests, httpx) for consuming SAP OData, Salesforce, and third-party FMCG APIs
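A typical shape for such integrations is following server-driven pagination until the feed is exhausted. The sketch below assumes an OData-style payload (`value` list plus a `nextLink`) and injects the page fetcher so it can be stubbed; in production `fetch_page` would wrap `requests.get(...).json()` or an httpx client:

```python
def fetch_all(fetch_page, first_url):
    """Follow pagination links, accumulating records from each page.
    fetch_page(url) must return a dict shaped like an OData response
    (an assumption for this sketch)."""
    records, url = [], first_url
    while url:
        page = fetch_page(url)
        records.extend(page.get("value", []))   # OData-style record list
        url = page.get("nextLink")              # absent link ends the loop
    return records

# Usage with a stubbed fetcher standing in for a real HTTP call:
PAGES = {
    "/orders?page=1": {"value": [1, 2], "nextLink": "/orders?page=2"},
    "/orders?page=2": {"value": [3]},
}
all_orders = fetch_all(PAGES.__getitem__, "/orders?page=1")  # -> [1, 2, 3]
```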
- Implement data validation and reconciliation frameworks using Python (Great Expectations, Pandera)
- Build Python-based orchestration scripts and helper utilities for Airflow DAGs and Databricks Workflows
- Apply software engineering best practices:
  - Unit testing with pytest and integration testing with Testcontainers
  - Type hints, docstrings, and modular design patterns
  - Virtual environments, dependency management (Poetry/pip), and packaging
- Implement Python-based data quality checks: completeness, consistency, and conformity validations
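Two of the checks named above, completeness and conformity, reduce to simple row predicates. A dependency-free sketch (field names and sample rows are hypothetical; a framework like Great Expectations would wrap the same logic in declarative expectations):

```python
def completeness(rows, required):
    """Fraction of rows where every required field is present and non-null."""
    if not rows:
        return 1.0
    ok = sum(1 for r in rows if all(r.get(f) is not None for f in required))
    return ok / len(rows)

def conformity(rows, field, allowed):
    """Rows whose value for `field` falls outside the allowed domain."""
    return [r for r in rows if r.get(field) not in allowed]

rows = [
    {"sku": "A1", "qty": 5, "uom": "EA"},
    {"sku": "A2", "qty": None, "uom": "CS"},
    {"sku": "A3", "qty": 2, "uom": "??"},
]
score = completeness(rows, ["sku", "qty"])        # 2 of 3 rows complete
bad_uom = conformity(rows, "uom", {"EA", "CS"})   # the "??" row
```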
Data Lakehouse & Cloud Platform (Primary Focus)
- Build and manage Data Lakehouse architectures on hyperscaler platforms:
  - Azure Databricks, GCP Dataproc, or AWS EMR for Spark cluster management
  - Delta Lake, Apache Iceberg, or Apache Hudi for ACID-compliant data lake storage
  - Medallion Architecture (Bronze/Silver/Gold) for progressive data refinement
- Implement Delta Lake features:
  - ACID transactions and schema enforcement
  - Time Travel for data versioning and rollback
  - Delta Live Tables (DLT) for declarative pipeline development
  - OPTIMIZE and Z-Ordering for query performance acceleration
  - Change Data Feed (CDF) for incremental data propagation
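Time Travel, listed above, means reads can target any committed version of a table. The toy class below is a conceptual analogue only (it is not Delta Lake, and real Delta stores transaction logs rather than full snapshots); it loosely mirrors reading with a `versionAsOf` option:

```python
class VersionedTable:
    """Toy illustration of time-travel semantics: each commit stores a
    snapshot, and reads can target any historical version."""

    def __init__(self):
        self._versions = []              # snapshot list; index = version

    def commit(self, rows):
        self._versions.append(list(rows))
        return len(self._versions) - 1   # version number of this commit

    def read(self, version_as_of=None):
        if not self._versions:
            return []
        v = len(self._versions) - 1 if version_as_of is None else version_as_of
        return list(self._versions[v])

t = VersionedTable()
v0 = t.commit([{"id": 1, "qty": 10}])
v1 = t.commit([{"id": 1, "qty": 10}, {"id": 2, "qty": 4}])
latest = t.read()                     # current state: two rows
rollback = t.read(version_as_of=v0)   # historical read: one row
```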
- Manage Databricks Workflows and Job Clusters for production pipeline execution
- Implement Databricks Auto Loader for incremental, scalable data ingestion from cloud storage
- Utilize Unity Catalog for data governance, lineage, and access control
Data Ingestion & Integration
- Build data ingestion pipelines from diverse FMCG data sources:
  - SAP S/4HANA: OData APIs, BAPI extracts, and IDoc-based feeds
  - Salesforce: REST API, Bulk API, and Platform Events
  - Operational databases: Oracle, Cloud SQL, Azure SQL, and Cloud Spanner
  - Streaming sources: Apache Kafka, Azure Event Hubs, and GCP Pub/Sub
  - File-based sources: SFTP, Azure Blob, GCS, and S3 (CSV, Parquet, Avro, JSON)
- Implement Change Data Capture (CDC) patterns for real-time database synchronization
- Design schema evolution strategies to handle upstream data model changes gracefully
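At its core, a CDC pattern replays an ordered stream of insert/update/delete events against a keyed target. The sketch below applies such events to an in-memory dict and takes one permissive stance on schema evolution (new fields in later events are merged in); the event shape and key column are assumptions:

```python
def apply_cdc(table, events):
    """Apply ordered CDC events to a table keyed by 'id'.
    Each event is {'op': 'insert'|'update'|'delete', 'row': {...}}."""
    for e in events:
        op, row = e["op"], e["row"]
        key = row["id"]
        if op == "delete":
            table.pop(key, None)
        elif op == "insert":
            table[key] = dict(row)
        elif op == "update":
            # merge, so columns added upstream appear without a schema change
            table.setdefault(key, {}).update(row)
    return table

state = apply_cdc({}, [
    {"op": "insert", "row": {"id": 1, "qty": 5}},
    {"op": "update", "row": {"id": 1, "qty": 7, "channel": "retail"}},
    {"op": "insert", "row": {"id": 2, "qty": 1}},
    {"op": "delete", "row": {"id": 2}},
])
```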
- Publish processed data to downstream consumers:
  - BigQuery, Azure Synapse, or Snowflake for BI and analytics
  - Feature stores (Feast, Databricks) for AI/ML model training
  - Power BI or Looker for business reporting
SQL & Data Modeling
- Write and optimize complex SQL queries for data extraction, transformation, and validation
- Design data warehouse schemas (Star and Snowflake models) for FMCG analytics domains
- Implement Spark SQL for large-scale analytical query processing
- Develop data quality SQL checks and reconciliation frameworks
- Optimize SQL performance: query plans, partition pruning, and predicate pushdown
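A minimal example of the star-schema modeling mentioned above, using SQLite for portability (a warehouse engine such as BigQuery or Synapse would be used in practice; table and column names are hypothetical):

```python
import sqlite3

# Minimal star schema: one fact table keyed to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, units INTEGER,
                              FOREIGN KEY (product_id)
                                  REFERENCES dim_product (product_id));
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "beverages"), (2, "snacks")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 10), (1, 5), (2, 3)])

# Typical analytical rollup: join fact to dimension, aggregate by attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.units) AS total_units
    FROM fact_sales f
    JOIN dim_product d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
# rows -> [('beverages', 15), ('snacks', 3)]
```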
Requirements
Do you have experience in unit testing?
Benefits & conditions
LTIMindtree (part of Larsen and Toubro (L&T)) · Cincinnati, OH · $100,000 - $120,000 a year
Benefits/perks listed below may vary depending on the nature of your employment with LTIMindtree ("LTIM"):
Benefits and Perks:
- Comprehensive Medical Plan Covering Medical, Dental, Vision
- Short Term and Long-Term Disability Coverage
- 401(k) Plan with Company match
- Life Insurance
- Vacation Time, Sick Leave, Paid Holidays
- Paid Paternity and Maternity Leave
The range displayed on each job posting reflects the minimum and maximum salary target for the position across all US locations. Within the range, individual pay is determined by work location and job level, as well as additional factors including job-related skills, experience, and relevant education or training. Depending on the position offered, other forms of compensation may be provided as part of overall compensation, such as an annual performance-based bonus, sales incentive pay, and other forms of bonus or variable compensation.
Compensation range: $100,000.00 to $120,000.00 per year