Data Platform Manager - Hybrid (candidates must reside in Pittsburgh, PA)

ACCAVALLO & COMPANY LLC
Pittsburgh, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$155K

Job location

Pittsburgh, United States of America

Tech stack

API
Airflow
Azure
Big Data
Code Review
Computer Programming
Databases
Data Architecture
Data Validation
Information Engineering
Data Governance
Data Infrastructure
Data Integrity
ETL
Data Transformation
Data Security
Data Warehousing
Distributed Systems
Hadoop
Identity and Access Management
Python
Machine Learning
NoSQL
Performance Tuning
Powershell
Query Optimization
Role-Based Access Control
Scala
Simple Data Format
SQL Databases
Systems Integration
Parquet
Data Processing
Data Ingestion
Database Optimization
Spark
Data Lake
PySpark
Information Technology
Data Lineage
Avro
Kafka
Data Management
Machine Learning Operations
Presto
Data Pipelines
Key Vault
Databricks

Job description

The A.C.Coy company has an immediate opening for a Data Platform Manager. This role will be responsible for designing, building, and optimizing enterprise-wide data platforms within the Data Warehouse.

Responsibilities

  • Lead and mentor a team of data engineers, conducting code reviews and ensuring adherence to development standards

  • Support troubleshooting and incident management for data-related issues in production

  • Collaborate with business stakeholders, data scientists, and other team members to gather requirements and translate them into technical specifications

  • Lead the design, development, and deployment of scalable, high-performance data pipelines using Azure Databricks, ensuring data integrity, availability, and the efficient extraction, transformation, and loading of data from various sources into the Azure Databricks Data Warehouse

  • Collaborate with data scientists, analysts, and other engineering teams to deliver business-critical insights. Optimize pipeline performance, cost, and scalability in the Azure cloud environment

  • Define best practices for data ingestion, processing, storage, and governance. Implement data quality checks and validation procedures to ensure the accuracy and integrity of data across various sources, including APIs, databases, and streaming platforms

  • Collaborate with data scientists and analysts to operationalize and deploy machine learning models

  • Architecture Design:

  • Define the end-to-end Lakehouse architecture using Delta Lake, implementing medallion architecture (Bronze, Silver, Gold layers) for robust data processing

  • Apply data modeling and schema design principles

  • Pipeline Engineering:

  • Oversee the development of robust, scalable batch and streaming ETL/ELT pipelines with minimal latency using PySpark, Scala, and SQL

  • Implement data transformations, enrichment, and quality checks using PySpark/Scala within the Databricks environment

  • Integrate real-time and batch data sources using Apache Kafka and ADF

  • Support large-scale data pipelines using Apache Spark on Databricks, Kafka, Stelo, and Azure Data Factory (ADF)

  • Data Governance & Security:

  • Implement Unity Catalog for unified governance, data security, fine-grained access control (RBAC), privacy measures, and data lineage tracking

  • Performance Optimization & Tuning:

  • Tune Spark jobs and Databricks clusters to maximize throughput while maintaining cost efficiency through auto-scaling and cluster policies

  • Expertise in indexing strategies, query optimization, execution plans, and partitioning/sharding

  • Platform Integration:

  • Orchestrate workflows by integrating Databricks with other Azure services such as Azure Data Factory (ADF), Azure Data Lake Storage (ADLS Gen2), and Azure DevOps for CI/CD pipelines

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 5-7+ years of hands-on data engineering or architecture experience, with at least 2-4 years focused specifically on Azure Databricks, including Azure cloud technologies
  • 2-5 years of experience managing a team of data engineers, data scientists, and/or analysts is preferred
  • Certifications (Preferred): Microsoft Certified: Azure Data Engineer Associate (DP-203), Databricks Certified Data Engineer Professional, or Azure Solutions Architect Expert
  • Database Architecture: Proficiency in both relational (SQL) and NoSQL (Document, Key-Value, Graph, Columnar) databases; ability to develop and maintain data models and schemas to support data analysis and reporting requirements
  • Distributed Systems: Knowledge of frameworks like Apache Hadoop, Spark, or Presto/Trino for optimizing and handling massive data volumes and retrieval mechanisms, ensuring the efficient processing of large datasets
  • Storage Optimization: Understanding of file formats such as Parquet, Avro, or ORC, and of compression techniques
  • Deep proficiency in programming languages: Python (specifically PySpark), SQL, PowerShell, and Scala
  • Infrastructure: Hands-on experience with Azure Cloud infrastructure, including Networking (VNETs), Key Vault, and Identity Management
  • Big Data Tools: Deep knowledge of Apache Spark runtime internals, MLflow for MLOps, and orchestration tools like Airflow
