Data Engineer / Architect

VDart, Inc.
Parsippany-Troy Hills, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$210K

Job location

Parsippany-Troy Hills, United States of America

Tech stack

Airflow
Amazon Web Services (AWS)
Big Data
Clinical Trial Management Systems
Cloud Computing
Cloud Database
Cloud Engineering
Continuous Integration
Directed Acyclic Graphs (DAGs)
Data Governance
Data Integrity
ETL
Data Systems
Fault Tolerance
PostgreSQL
Regression Analysis
SQL Databases
Enterprise Data Management
Data Processing
Feature Engineering
Spark
Model Validation
GIT
Data Lake
Data Analytics
Star Schema
Data Management
Machine Learning Operations
Software Version Control
Data Pipelines
Databricks

Job description

  • Design and develop scalable data pipelines using dbt Cloud, Databricks and Apache Airflow to support enterprise analytics and reporting
  • Build and optimize Delta Lake-based data models to enable analytics-ready datasets
  • Implement advanced data modeling techniques including star schema, fact/dimension design, and SCD Type 1 & Type 2
  • Develop modular, reusable, and testable SQL-based transformations using dbt models, macros, and packages
  • Design and manage incremental data loading strategies, ensuring efficient processing of large-scale datasets
  • Leverage Databricks SQL, Spark, and Delta Lake capabilities for high-performance data processing and optimization
  • Implement robust data quality checks and testing frameworks using dbt tests (e.g., not null, unique, referential integrity)
  • Collaborate with cross-functional teams including data engineers, data scientists, and BI teams to deliver business-driven data solutions
  • Integrate dbt pipelines with CI/CD workflows using Git-based version control and orchestrate jobs via Databricks Workflows or external schedulers
  • Ensure adherence to data governance, security, and compliance standards, leveraging tools like Unity Catalog and enterprise policies
  • Orchestrate end-to-end workflows using Airflow DAGs, ensuring dependency management, scheduling, retries, and fault tolerance
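As a concrete illustration of the SCD Type 2 responsibility above, the pattern can be sketched in plain Python (a minimal sketch; the column names `site_id`, `is_current`, `effective_from`, and `effective_to` are hypothetical, and in practice this logic would live in a dbt snapshot or a Delta Lake MERGE):

```python
from datetime import date

def apply_scd2(dimension, incoming, key, today=None):
    """Slowly Changing Dimension Type 2: close out changed rows and
    append new versions instead of overwriting history in place."""
    today = today or date.today().isoformat()
    incoming_by_key = {row[key]: row for row in incoming}
    result = []
    for row in dimension:
        new = incoming_by_key.get(row[key])
        if row["is_current"] and new and any(
            row[c] != new[c] for c in new if c != key
        ):
            # An attribute changed: expire the current version of the row.
            result.append({**row, "is_current": False, "effective_to": today})
        else:
            result.append(row)
    for k, new in incoming_by_key.items():
        current = [r for r in dimension if r[key] == k and r["is_current"]]
        changed = not current or any(
            current[0][c] != new[c] for c in new if c != key
        )
        if changed:
            # Append a fresh "current" row carrying the new attribute values.
            result.append({**new, "is_current": True,
                           "effective_from": today, "effective_to": None})
    return result
```

Closing the old row and appending a new current one is what preserves full attribute history; Type 1, by contrast, simply overwrites in place.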

Technical Expertise

  • AWS & Cloud Architecture: Expert-level experience with AWS services (S3, RDS, Bedrock agents), PostgreSQL, and cloud-based data governance
  • Advanced Analytics: Regression analysis, time-series forecasting, multivariate analysis, and classification models
  • MLOps & Deployment: Design and maintain model deployment, monitoring, and automated retraining pipelines
  • Simulation & Forecasting: Agent-based simulation for trial enrollment forecasting and scenario planning
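A toy version of the agent-based enrollment forecasting named above can be sketched as follows (the site count, per-week enrollment probability, and one-patient-per-site-per-week assumption are all illustrative, not from the posting):

```python
import random
from statistics import mean

def simulate_enrollment(n_sites, p_enroll, weeks, n_runs=500, seed=7):
    """Monte Carlo toy model: each site independently enrolls at most one
    patient per week with probability p_enroll. Returns the mean total
    enrollment and an empirical 90% interval across simulation runs."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.random() < p_enroll       # one Bernoulli draw per site-week
            for _ in range(n_sites)
            for _ in range(weeks))
        for _ in range(n_runs)
    )
    lo = totals[int(0.05 * n_runs)]
    hi = totals[int(0.95 * n_runs)]
    return mean(totals), lo, hi
```

Scenario planning then amounts to re-running the simulation under different site counts or enrollment rates and comparing the resulting intervals.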

Data & Analytics Capabilities

  • Feature Engineering: Extract insights from site performance, historical enrollment, and competitive landscape data
  • Model Evaluation: Build evaluation frameworks (AUC, precision/recall) and optimize model granularity across disease/geography
  • Enterprise Data Integration: Merge internal (CTMS, performance data) and external sources (Citeline, epidemiological data)
  • Master Data Management: Create Golden ID datasets with data quality monitoring and continuous refresh capabilities
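The evaluation metrics listed above (AUC, precision/recall) reduce to short, dependency-free computations; the following is an illustrative sketch under binary labels (1 = enrolling site), not the team's actual framework:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for hard binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, y_score):
    """ROC AUC via the rank statistic: the probability that a random
    positive example is scored above a random negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Computing these per disease area or geography slice is one way to "optimize model granularity" as the bullet above describes.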

Requirements

  • 5+ years in pharmaceutical/clinical trial analytics
  • Focus on site selection and non-enrollment prediction
  • Proven track record with clinical operations data systems

Benefits & conditions

  • $108,000-120,000 per year
