Data Engineer / Architect
VDart, Inc.
Parsippany-Troy Hills, United States of America
7 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $210K
Job location: Parsippany-Troy Hills, United States of America
Tech stack
Airflow
Amazon Web Services (AWS)
Big Data
Clinical Trial Management Systems
Cloud Computing
Cloud Database
Cloud Engineering
Continuous Integration
Directed Acyclic Graphs (DAGs)
Data Governance
Data Integrity
ETL
Data Systems
Fault Tolerance
PostgreSQL
Regression Analysis
SQL Databases
Enterprise Data Management
Data Processing
Feature Engineering
Spark
Model Validation
Git
Data Lake
Data Analytics
Star Schema
Data Management
Machine Learning Operations
Software Version Control
Data Pipelines
Databricks
Job description
- Design and develop scalable data pipelines using dbt Cloud, Databricks and Apache Airflow to support enterprise analytics and reporting
- Build and optimize Delta Lake-based data models to enable analytics-ready datasets
- Implement advanced data modeling techniques including star schema, fact/dimension design, and SCD Type 1 & Type 2
- Develop modular, reusable, and testable SQL-based transformations using dbt models, macros, and packages
- Design and manage incremental data loading strategies, ensuring efficient processing of large-scale datasets
- Leverage Databricks SQL, Spark, and Delta Lake capabilities for high-performance data processing and optimization
- Implement robust data quality checks and testing frameworks using dbt tests (e.g., not null, unique, referential integrity)
- Collaborate with cross-functional teams including data engineers, data scientists, and BI teams to deliver business-driven data solutions
- Integrate dbt pipelines with CI/CD workflows using Git-based version control and orchestrate jobs via Databricks Workflows or external schedulers
- Ensure adherence to data governance, security, and compliance standards, leveraging tools like Unity Catalog and enterprise policies
- Orchestrate end-to-end workflows using Airflow DAGs, ensuring dependency management, scheduling, retries, and fault tolerance
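As a sketch of the dbt data quality duties above, a minimal schema.yml with not-null, unique, and referential-integrity tests might look like the following (model and column names are hypothetical, not taken from this posting):

```yaml
# models/schema.yml — hypothetical model and column names
version: 2

models:
  - name: dim_site            # hypothetical dimension model
    columns:
      - name: site_id
        tests:
          - not_null
          - unique
  - name: fct_enrollment      # hypothetical fact model
    columns:
      - name: site_id
        tests:
          - not_null
          - relationships:    # referential-integrity check against the dimension
              to: ref('dim_site')
              field: site_id
```

Running `dbt test` would then compile each entry into a SQL assertion against the warehouse.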
Technical Expertise
- AWS & Cloud Architecture: Expert-level experience with AWS services (S3, RDS, Bedrock agents), PostgreSQL, and cloud-based data governance
- Advanced Analytics: Regression analysis, time-series forecasting, multivariate analysis, and classification models
- MLOps & Deployment: Design and maintain model deployment, monitoring, and automated retraining pipelines
- Simulation & Forecasting: Agent-based simulation for trial enrollment forecasting and scenario planning
Data & Analytics Capabilities
- Feature Engineering: Extract insights from site performance, historical enrollment, and competitive landscape data
- Model Evaluation: Build evaluation frameworks (AUC, precision/recall) and optimize model granularity across disease/geography
- Enterprise Data Integration: Merge internal (CTMS, performance data) and external sources (Citeline, epidemiological data)
- Master Data Management: Create Golden ID datasets with data quality monitoring and continuous refresh capabilities
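The model-evaluation duties above mention precision/recall; as a minimal, dependency-free illustration (the labels below are made-up toy data, not project data):

```python
# Minimal precision/recall computation for binary 0/1 labels (no sklearn).

def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: 4 true positives exist; the model finds 3 plus 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(round(p, 2), round(r, 2))  # 0.75 0.75
```

The same confusion-matrix counts feed threshold sweeps for AUC-style evaluation.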
Requirements
- 5+ years in pharmaceutical/clinical trial analytics
- Focus on site selection and non-enrollment prediction
- Proven track record with clinical operations data systems
Benefits & conditions
- $108,000-120,000 per year