Associate Principal - Data Engineering

LTM Inc
Cincinnati, United States of America
yesterday

Role details

Contract type
Internship / Graduate position
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$120K

Job location

Cincinnati, United States of America

Tech stack

Query Performance
API
Airflow
Amazon Web Services (AWS)
Apache HTTP Server
Unit Testing
Cloud Computing
Cloud Storage
Databases
Data as a Service
Directed Acyclic Graphs (DAGs)
Data Validation
Information Engineering
Data Governance
Data Infrastructure
Data Transformation
Data Sharing
Data Warehousing
Hive
JSON
Python
SQL Azure
Modular Design
Open Data Protocol
Performance Tuning
User Defined Functions
Cloudera
Salesforce
SAP Applications
Software Construction
SQL Databases
Data Streaming
Management of Software Versions
Parquet
Data Processing
Freeform SQL
File Transfer Protocol (FTP)
Cloud Platform System
Data Ingestion
Snowflake
Spark
SAP BAPI
Change Data Capture
Pandas
Pytest
Data Lake
PySpark
Integration Tests
Avro
SAP S/4HANA
Kafka
Data Lakehouse
REST
Oracle Cloud Infrastructure
Azure
AI/ML (Artificial Intelligence / Machine Learning)
Looker Analytics
Data Pipelines
SQL Tuning
Databricks

Job description

PySpark Development (Primary Focus)

Design and develop production-grade PySpark applications for large-scale batch and streaming data processing

Implement advanced PySpark DataFrame API operations:

  • Complex transformations: window functions, pivot/unpivot, and nested struct handling

  • Multi-dataset joins: broadcast joins, sort-merge joins, and skew-handling strategies

  • Custom UDFs (User-Defined Functions) and Pandas UDFs (vectorized UDFs) for performance-critical transformations

  • Aggregations and GroupBy operations optimized for large FMCG datasets
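Since Pandas also appears in the posting's stack, the flavor of these DataFrame operations can be sketched with a small pandas analogue (data, column names, and logic below are purely illustrative; in PySpark the equivalents are groupBy/agg, Window.partitionBy, and pivot):

```python
import pandas as pd

# Hypothetical FMCG sales data; column names are invented for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sku": ["s1", "s2", "s1", "s1", "s2"],
    "units": [10, 5, 7, 3, 8],
})

# GroupBy aggregation: total units per store.
totals = sales.groupby("store", as_index=False)["units"].sum()

# Window-style operation: rank SKUs within each store by units sold,
# analogous to a PySpark Window.partitionBy("store").orderBy(...) rank.
sales["rank_in_store"] = (
    sales.groupby("store")["units"]
    .rank(method="first", ascending=False)
    .astype(int)
)

# Pivot: stores as rows, SKUs as columns (cf. PySpark's pivot()).
pivoted = sales.pivot_table(index="store", columns="sku",
                            values="units", aggfunc="sum")
```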

Implement PySpark Structured Streaming for real-time data processing:

  • Kafka, Azure Event Hubs, and GCP Pub/Sub as streaming sources

  • Watermarking and windowing strategies for late-arriving data

  • Stateful streaming operations using mapGroupsWithState

  • Exactly-once and at-least-once delivery semantics
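A minimal pure-Python sketch of the watermarking idea behind late-data handling (PySpark's withWatermark and window functions do this natively on a cluster; the class, window size, and lateness threshold below are invented for illustration):

```python
from collections import defaultdict

class WindowedCounter:
    """Toy event-time aggregator: counts events per tumbling window and
    drops events that arrive after the watermark has closed their window."""

    def __init__(self, window_seconds=60, max_lateness_seconds=30):
        self.window = window_seconds
        self.lateness = max_lateness_seconds
        self.max_event_time = 0          # highest event time seen so far
        self.counts = defaultdict(int)   # window start -> event count

    def watermark(self):
        # Watermark trails the max observed event time by the allowed lateness.
        return self.max_event_time - self.lateness

    def add(self, event_time):
        self.max_event_time = max(self.max_event_time, event_time)
        window_start = (event_time // self.window) * self.window
        # A window is closed once the watermark passes its end.
        if window_start + self.window <= self.watermark():
            return False  # too late: event is dropped
        self.counts[window_start] += 1
        return True

agg = WindowedCounter(window_seconds=60, max_lateness_seconds=30)
agg.add(10)             # on time, lands in window [0, 60)
agg.add(200)            # advances the watermark to 170
accepted = agg.add(50)  # window [0, 60) already closed -> dropped
```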

Apply advanced Spark performance tuning techniques:

  • Partition optimization: repartition vs. coalesce strategies

  • Handling data skew using salting and custom partitioners

  • Broadcast variable management and accumulator usage

  • Catalyst optimizer hints and AQE (Adaptive Query Execution) tuning

  • Executor sizing, memory fractions, and parallelism configuration
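The salting technique mentioned above can be illustrated in plain Python (the fan-out factor, key format, and hash-partitioner stand-in are all assumptions; in PySpark the salted key would feed the actual shuffle or join, with the small side of a join exploded to every salt variant):

```python
import hashlib

NUM_SALTS = 8  # fan-out factor; tuned to the observed skew (assumption)

def salt_key(key: str, row_id: int) -> str:
    """Spread a hot key across NUM_SALTS synthetic keys."""
    salt = row_id % NUM_SALTS
    return f"{key}#{salt}"

def partition_of(salted_key: str, num_partitions: int) -> int:
    """Deterministic partition assignment (stand-in for Spark's hash partitioner)."""
    digest = hashlib.md5(salted_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# A skewed dataset: one hot key dominates.
rows = [("HOT_SKU", i) for i in range(100)] + [("rare_sku", 100)]

# Without salting, every HOT_SKU row hashes to the same partition;
# with salting, the hot key is spread over several partitions.
partitions = {partition_of(salt_key(k, i), 16) for k, i in rows}
```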

Develop and maintain reusable PySpark libraries for shared data processing capabilities

Python Engineering (Primary Focus)

Build Python-based data services, automation scripts, and utility frameworks supporting the data platform

Develop REST API integrations using Python (requests, httpx) for consuming SAP OData, Salesforce, and third-party FMCG APIs

Implement data validation and reconciliation frameworks using Python (Great Expectations, Pandera)

Build Python-based orchestration scripts and helper utilities for Airflow DAGs and Databricks Workflows

Apply software engineering best practices:

  • Unit testing with pytest and integration testing with Testcontainers

  • Type hints, docstrings, and modular design patterns

  • Virtual environments, dependency management (Poetry/pip), and packaging

Implement Python-based data quality checks: completeness, consistency, and conformity validations
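A minimal sketch of what such completeness/conformity/consistency checks might look like in plain Python (this is not the Great Expectations or Pandera API; the rule names, record shape, and thresholds are invented):

```python
def validate_records(records, required_fields, allowed_channels):
    """Return per-rule failure counts for a batch of dict records.

    Rules (illustrative):
      completeness - every required field is present and non-null
      conformity   - 'channel' must come from a known code list
      consistency  - 'net_price' must not exceed 'gross_price'
    """
    failures = {"completeness": 0, "conformity": 0, "consistency": 0}
    for rec in records:
        if any(rec.get(f) is None for f in required_fields):
            failures["completeness"] += 1
        if rec.get("channel") not in allowed_channels:
            failures["conformity"] += 1
        if (rec.get("net_price") or 0) > (rec.get("gross_price") or 0):
            failures["consistency"] += 1
    return failures

batch = [
    {"sku": "s1", "channel": "RETAIL", "net_price": 9.0, "gross_price": 10.0},
    {"sku": None, "channel": "RETAIL", "net_price": 9.0, "gross_price": 10.0},
    {"sku": "s3", "channel": "???", "net_price": 12.0, "gross_price": 10.0},
]
report = validate_records(batch, required_fields=["sku"],
                          allowed_channels={"RETAIL", "ECOM"})
```

In a real framework each rule would be declarative and the failure counts would feed a reconciliation report or block the pipeline.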

Data Lakehouse & Cloud Platform (Primary Focus)

Build and manage Data Lakehouse architectures on hyperscaler platforms:

  • Azure Databricks, GCP Dataproc, or AWS EMR for Spark cluster management

  • Delta Lake, Apache Iceberg, or Apache Hudi for ACID-compliant data lake storage

  • Medallion Architecture (Bronze/Silver/Gold) for progressive data refinement

Implement Delta Lake features:

  • ACID transactions and schema enforcement

  • Time Travel for data versioning and rollback

  • Delta Live Tables (DLT) for declarative pipeline development

  • OPTIMIZE and Z-ORDER for query performance acceleration

  • Change Data Feed (CDF) for incremental data propagation

Manage Databricks Workflows and Job Clusters for production pipeline execution

Implement Databricks Auto Loader for incremental, scalable data ingestion from cloud storage

Utilize Unity Catalog for data governance, lineage, and access control

Data Ingestion & Integration

Build data ingestion pipelines from diverse FMCG data sources:

  • SAP S/4HANA: OData APIs, BAPI extracts, and IDoc-based feeds

  • Salesforce: REST API, Bulk API, and Platform Events

  • Operational databases: Oracle Cloud, Azure SQL, and Cloud Spanner

  • Streaming sources: Apache Kafka, Azure Event Hubs, and GCP Pub/Sub

  • File-based sources: SFTP, Azure Blob, GCS, and S3 (CSV, Parquet, Avro, JSON)

Implement Change Data Capture (CDC) patterns for real-time database synchronization
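At its core, applying a CDC feed reduces to replaying ordered upsert/delete events against the target table. A toy in-memory sketch (the event shape loosely mirrors Debezium-style operation codes, but is invented here):

```python
def apply_cdc(table, events):
    """Apply a stream of CDC events to an in-memory 'table' (dict keyed by PK).

    Each event is (op, key, row) with op in {"insert", "update", "delete"}.
    Events must be applied in commit order for the replica to stay correct.
    """
    for op, key, row in events:
        if op == "delete":
            table.pop(key, None)
        else:
            # insert or update: upsert semantics, merging changed columns.
            table[key] = {**table.get(key, {}), **row}
    return table

customers = {}
events = [
    ("insert", 1, {"name": "Acme", "tier": "gold"}),
    ("update", 1, {"tier": "platinum"}),
    ("insert", 2, {"name": "Globex", "tier": "silver"}),
    ("delete", 2, None),
]
apply_cdc(customers, events)
```

A production CDC pipeline adds the hard parts this sketch omits: ordering guarantees across partitions, schema evolution, and exactly-once delivery into the lakehouse (e.g. via Delta's MERGE).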

Design schema evolution strategies to handle upstream data model changes gracefully

Publish processed data to downstream consumers:

  • BigQuery, Azure Synapse, or Snowflake for BI and analytics

  • Feature stores (Feast, Databricks) for AI/ML model training

  • Power BI and Looker for business reporting

SQL & Data Modeling

Write and optimize complex SQL queries for data extraction, transformation, and validation

Design data warehouse schemas (Star and Snowflake models) for FMCG analytics domains

Implement Spark SQL for large-scale analytical query processing

Develop data quality SQL checks and reconciliation frameworks

Optimize SQL performance: query plans, partition pruning, and predicate pushdown
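A compact, self-contained illustration of the star-schema join-and-aggregate pattern, using SQLite via Python as a stand-in for the warehouse engine (schema and data are invented; in a partitioned warehouse the `sale_date` predicate is exactly where partition pruning and predicate pushdown pay off):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal star schema: one fact table plus a product dimension (names invented).
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_date  TEXT,
    units      INTEGER
);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "beverages"), (2, "snacks")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, "2024-01-01", 10), (1, "2024-02-01", 5),
                 (2, "2024-01-15", 7)])

# Typical analytical query: filter the fact table early, then join the
# dimension and aggregate by its attribute.
cur.execute("""
SELECT p.category, SUM(f.units) AS total_units
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
WHERE f.sale_date < '2024-02-01'
GROUP BY p.category
ORDER BY p.category
""")
rows = cur.fetchall()
```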

Requirements

Do you have experience in unit testing?

Benefits & conditions

LTM Inc (part of Larsen and Toubro (L&T)), Cincinnati, OH: $100,000 - $120,000 a year

Benefits/perks listed below may vary depending on the nature of your employment with LTIMindtree ("LTIM"):

Benefits and Perks:

  • Comprehensive Medical Plan Covering Medical, Dental, Vision
  • Short Term and Long-Term Disability Coverage
  • 401(k) Plan with Company match
  • Life Insurance
  • Vacation Time, Sick Leave, Paid Holidays
  • Paid Paternity and Maternity Leave

The range displayed on each job posting reflects the minimum and maximum salary target for the position across all US locations. Within the range, individual pay is determined by work location and job level and additional factors including job-related skills, experience, and relevant education or training. Depending on the position offered, other forms of compensation may be provided as part of overall compensation, like an annual performance-based bonus, sales incentive pay, and other forms of bonus or variable compensation. Compensation range: $100,000.00 to $120,000.00 per year.

About the company

LTM is an AI-centric global technology services company and the Business Creativity partner to the world's largest and most disruptive enterprises. We bring human insights and intelligent systems together to help clients create greater value at the intersection of technology and domain expertise. Our capabilities span integrated operations, transformation, and business AI - enabling new ways of working, new productivity paradigms, and new roads to value. Together with over 87,000 employees across 40 countries and our global network of partners, LTM - a Larsen & Toubro company - owns business outcomes for our clients, helping them not just outperform the market, but to Outcreate it.

Please also note that neither LTM nor any of its authorized recruitment agencies/partners charge any candidate registration fee or any other fees from talent (candidates) towards appearing for an interview or securing employment/internship. Candidates shall be solely responsible for

Apply for this position