Sr Principal Data Analyst

Horizontal Talent
Hopkins, United States of America
27 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Hopkins, United States of America

Tech stack

Unity
Adobe InDesign
API
Airflow
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Software Applications
Big Data
Cloud Database
Code Review
Computer Programming
Continuous Integration
Information Engineering
Data Governance
ETL
Data Transformation
Data Systems
Data Visualization
Data Warehousing
Relational Databases
Database Queries
Hive
Python
Meta-Data Management
Microsoft SQL Server
Operational Data Store
Oracle Applications
Standard SQL
Azure
SAS (Software)
Shell Script
SQL Databases
Teradata
Unstructured Data
Data Logging
Macros
Delivery Pipeline
Snowflake
Spark
Caching
GIT
Data Lake
PySpark
Information Technology
Data Analytics
Software Version Control
Data Pipelines
Databricks

Job description

Maintain existing applications and develop new applications supported by FDM on FIN360 platforms for various Finance & Accounting business teams. The team has around 25 developers (onshore and offshore) primarily working on developing and maintaining finance and accounting data processing and reporting applications and automation for UHC controllership. The primary skills in the team are SAS, Python, and SQL for relational databases including Snowflake, Oracle, Teradata, and SQL Server.

  • Support the design, development, testing, and deployment of data analytics programs and processes supporting various operational data stores using SAS and relational databases.

  • Collect, interpret, and aggregate data from traditional and non-traditional data sources to support programs and applications for various data analytics purposes.

  • Understand data requirements and business needs to develop data tools such as dashboards and data visualizations.

  • Use business intelligence, data visualization, query, analytic, and statistical software to build solutions, perform analysis, and interpret data.

  • Solve moderately complex problems and translate concepts into practice.

  • Recognize problems and make recommendations for solutions.

  • Work under minimal guidance and within tight deadlines for deliverables.

  • Demonstrate an exploratory mindset.

  • Interact effectively with business users on new projects, enhancement projects, and issue resolution, including addressing reported issues with data and existing applications.

  • Adopt a structured approach focused on understanding the needs of business users, documenting requirements, and clarifying expectations. This involves active listening, using various techniques to gather information, and communicating clearly to resolve any uncertainties or inconsistencies.

  • Create and document high-level design, detailed design, implementation, and standard operating procedure guides.

  • Perform production support tasks including job monitoring, addressing production failures, data analysis, root cause analysis, and issue resolution.

  • Design, develop, and maintain scalable ETL/ELT pipelines using ADF, Python, Apache Spark, and PySpark in Databricks.

  • Develop batch and incremental data processing pipelines handling large-scale structured, semi-structured and unstructured datasets. Implement optimized data transformation logic using Spark SQL and DataFrame APIs.

  • Ensure pipelines follow enterprise data engineering best practices for performance, scalability, and maintainability.

  • Implement reusable ingestion patterns and transformation templates aligned with enterprise architecture standards.

  • Ensure compliance with enterprise metadata management, monitoring, and operational standards.

  • Design and manage datasets stored in Apache Iceberg and Delta Lake. Implement schema evolution, partitioning strategies, and version control for large datasets.

  • Optimize data lake storage structures in Azure Data Lake Storage (ADLS) or AWS S3. Develop scalable pipelines using Databricks notebooks, jobs, and clusters.

  • Manage dataset governance and access controls using Unity Catalog. Optimize Spark performance through partitioning, caching, and cluster tuning.

  • Develop and schedule ETL pipelines using Apache Airflow. Implement dependency management, monitoring, alerting, and failure recovery mechanisms.

  • Build pipelines that integrate with Snowflake data warehouse.

  • Optimize transformations and data loading using Snowflake SQL and staging techniques.

  • Design efficient data models for analytics and reporting.

  • Support migration of legacy SAS pipelines to modern Spark-based frameworks and Databricks where applicable.

  • Use Unix/Linux commands for common tasks and shell scripting to automate data engineering workflows.

  • Support CI/CD deployment processes for ETL pipelines.

  • Implement logging, auditing, and monitoring for production pipelines. Work with data architects, analysts, and business stakeholders to gather requirements and deliver data solutions.

  • Participate in design reviews, architecture discussions, and code reviews.

  • Mentor junior data engineers and provide technical guidance.

  • Be the SME for Databricks (DBX) and conduct knowledge training for the team.

Requirements

The ideal candidate will have strong experience working with Databricks (Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, Unity Catalog, Delta Sharing, Notebooks, SQL, Git), PySpark, Python, Snowflake, and ADF frameworks. This role focuses on building and optimizing large-scale data pipelines using ADF, Apache Iceberg, Delta Lake, cloud data lakes (ADLS/S3), and workflow orchestration tools like Airflow. The analyst will work closely with data architects and platform teams to build reliable and governed data solutions aligned with enterprise standards.

Must Have:

  • Bachelor's degree in Computer Science, Computer Applications, Analytics, Data Science, or Information Technology.
  • 8+ years of experience in ETL / Data Engineering
  • 8+ years of experience with programming using Python
  • 8+ years of experience working in Unix/Linux environments
  • 8+ years of experience writing Shell scripts
  • 6+ years of experience with the Databricks ecosystem, including Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, PySpark, Unity Catalog, Delta Sharing, Notebooks, SQL, and Git
  • 6+ years of experience with ADF
  • 6+ years of experience working with large enterprise datasets
  • 4+ years of experience with Snowflake
  • Strong analytical and troubleshooting skills
  • Excellent communication and collaboration abilities
  • Ability to work independently and mentor junior analysts
  • Strong documentation and design skills
  • Strong SQL skills
  • Experience implementing governance using Unity Catalog
  • Experience working with Apache Iceberg or other open table formats
  • Experience working with Azure Data Lake Storage (ADLS) or AWS S3
  • Understanding of cloud data lake architecture
  • Hands-on experience with Apache Airflow
  • Experience developing pipelines for Snowflake
  • Strong understanding of SAS programming, including the SAS DATA step, SAS macros, and PROC SQL

Nice To Have:

  • Experience migrating SAS ETL pipelines to Spark and Databricks
  • Knowledge of data governance frameworks
  • Healthcare experience

Apply for this position