Sr Principal Data Analyst

Horizontal Talent
Hopkins, United States of America
27 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Hopkins, United States of America

Tech stack

Unity
Adobe InDesign
API
Airflow
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Software Applications
Big Data
Cloud Database
Code Review
Computer Programming
Continuous Integration
Information Engineering
Data Governance
ETL
Data Transformation
Data Systems
Data Visualization
Data Warehousing
Relational Databases
Database Queries
Hive
Python
Meta-Data Management
Microsoft SQL Server
Operational Data Store
Oracle Applications
Standard SQL
Azure
SAS (Software)
Shell Script
SQL Databases
Teradata
Unstructured Data
Data Logging
Macros
Delivery Pipeline
Snowflake
Spark
Caching
GIT
Data Lake
PySpark
Information Technology
Data Analytics
Software Version Control
Data Pipelines
Databricks

Job description

Maintain existing applications and develop new applications supported by FDM on FIN360 platforms for various Finance & Accounting business teams. The team has around 25 developers (onshore and offshore) primarily working on developing and maintaining finance and accounting data processing and reporting applications and automation for UHC controllership. The primary skills in the team are SAS, Python, and SQL for relational databases including Snowflake, Oracle, Teradata, and SQL Server.

  • Support the design, development, testing, and deployment of data analytics programs and processes supporting various operational data stores using SAS and relational databases.

  • Collect, interpret, and aggregate data from traditional and non-traditional data sources to support programs and applications for various data analytics purposes.

  • Understand data requirements and business needs to develop data tools such as dashboards and data visualizations.

  • Use business intelligence, data visualization, query, analytic, and statistical software to build solutions, perform analysis, and interpret data.

  • Solve moderately complex problems and translate concepts into practice.

  • Recognize problems and make recommendations for solutions.

  • Work under minimal guidance and within tight deadlines for deliverables.

  • Demonstrate an exploratory mindset.

  • Interact effectively with business users on new projects, enhancement projects, and issue resolution, including addressing reported issues with data and existing applications.

  • Adopt a structured approach focused on understanding the needs of business users, documenting requirements, and clarifying expectations. This involves active listening, using various techniques to gather information, and communicating clearly to resolve any uncertainties or inconsistencies.

  • Create and document high-level design, detailed design, implementation, and standard operating procedure guides.

  • Perform production support tasks including job monitoring, addressing production failures, data analysis, root cause analysis, and issue resolution.

  • Design, develop, and maintain scalable ETL/ELT pipelines using ADF, Python, Apache Spark, and PySpark in Databricks.

  • Develop batch and incremental data processing pipelines handling large-scale structured, semi-structured and unstructured datasets. Implement optimized data transformation logic using Spark SQL and DataFrame APIs.

  • Ensure pipelines follow enterprise data engineering best practices for performance, scalability, and maintainability.

  • Implement reusable ingestion patterns and transformation templates aligned with enterprise architecture standards.

  • Ensure compliance with enterprise metadata management, monitoring, and operational standards.

  • Design and manage datasets stored in Apache Iceberg and Delta Lake. Implement schema evolution, partitioning strategies, and version control for large datasets.

  • Optimize data lake storage structures in Azure Data Lake Storage (ADLS) or AWS S3. Develop scalable pipelines using Databricks notebooks, jobs, and clusters.

  • Manage dataset governance and access controls using Unity Catalog. Optimize Spark performance through partitioning, caching, and cluster tuning.

  • Develop and schedule ETL pipelines using Apache Airflow. Implement dependency management, monitoring, alerting, and failure recovery mechanisms.

  • Build pipelines that integrate with Snowflake data warehouse.

  • Optimize transformations and data loading using Snowflake SQL and staging techniques.

  • Design efficient data models for analytics and reporting.

  • Support migration of legacy SAS pipelines to modern Spark-based frameworks and Databricks where applicable.

  • Use Unix/Linux commands for common tasks and shell scripting to automate data engineering workflows.

  • Support CI/CD deployment processes for ETL pipelines.

  • Implement logging, auditing, and monitoring for production pipelines. Work with data architects, analysts, and business stakeholders to gather requirements and deliver data solutions.

  • Participate in design reviews, architecture discussions, and code reviews.

  • Mentor junior data engineers and provide technical guidance.

  • Be the SME for Databricks (DBX) and conduct knowledge training for the team.

Requirements

The ideal candidate will have strong experience working with Databricks (Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, Unity Catalog, Delta Sharing, Notebooks, SQL, Git), PySpark, Python, Snowflake, and ADF frameworks. This role focuses on building and optimizing large-scale data pipelines using ADF, Apache Iceberg, Delta Lake, cloud data lakes (ADLS/S3), and workflow orchestration tools like Airflow. The analyst will work closely with data architects and platform teams to build reliable and governed data solutions aligned with enterprise standards.

Must Have:

  • Bachelor's degree in Computer Science, Computer Applications, Analytics, Data Science, or Information Technology.
  • 8+ years of experience in ETL / Data Engineering
  • 8+ years of experience with programming using Python
  • 8+ years of experience working in Unix/Linux environments
  • 8+ years of experience writing Shell scripts
  • 6+ years of experience with the Databricks ecosystem, including Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, PySpark, Unity Catalog, Delta Sharing, Notebooks, SQL, and Git
  • 6+ years of experience with ADF
  • 6+ years of experience working with large enterprise datasets
  • 4+ years of experience with Snowflake
  • Strong analytical and troubleshooting skills
  • Excellent communication and collaboration abilities
  • Ability to work independently and mentor junior analysts
  • Strong documentation and design skills
  • Strong SQL skills
  • Experience implementing governance using Unity Catalog
  • Experience working with Apache Iceberg or other open table formats
  • Experience working with Azure Data Lake Storage (ADLS) or AWS S3
  • Understanding of cloud data lake architecture
  • Hands-on experience with Apache Airflow
  • Experience developing pipelines for Snowflake
  • Strong understanding of SAS programming, including the SAS DATA step, SAS macros, and PROC SQL

Nice To Have:

  • Experience migrating SAS ETL pipelines to Spark and Databricks
  • Knowledge of data governance frameworks
  • Healthcare experience

Apply for this position