Data Science Engineer
Job description
The Department of Biomedical Informatics at Columbia University is seeking a highly motivated data science engineer to support large-scale observational research within the OHDSI (Observational Health Data Sciences and Informatics) network. This role will focus on the design, implementation, and execution of distributed network studies using electronic health record (EHR) and administrative claims data to generate real-world evidence.
The successful candidate will contribute to characterization, population-level estimation (causal inference), and patient-level prediction analyses across multi-institutional data networks. This position offers a unique opportunity to work at the intersection of biomedical informatics, data science, and clinical research within a leading academic medical center.
This is a full-time, two-year position with the possibility of extension, contingent on available funding.
Responsibilities
- Design and implement observational network studies using distributed EHR and administrative claims data
- Conduct large-scale characterization, comparative effectiveness and safety estimation, and patient-level prediction analyses
- Develop reproducible analytic pipelines using R and SQL in relational database environments
- Apply and evaluate methods from causal inference (e.g., confounding control, bias assessment, sensitivity analyses)
- Apply machine learning approaches for predictive modeling using high-dimensional healthcare data
- Work with standardized data representations, including the OMOP Common Data Model and standardized clinical vocabularies for conditions, drugs, procedures, and measurements
- Collaborate with interdisciplinary teams including clinicians, statisticians, data engineers, and informaticians
- Contribute to scholarly outputs including manuscripts, presentations, and open-source analytic tools
- Support transparent, reproducible, and scalable research practices across distributed data networks
Requirements
- Master's degree in biostatistics, public health, epidemiology, informatics, computer science, or related field, and/or equivalent in education and experience, with at least 2 years' related experience
- At least 1 year of relevant prior work experience in the healthcare industry within a health system, a pharmaceutical company, or an insurer
- Strong programming experience in R and SQL
- Experience working with relational databases and large-scale healthcare datasets
- Demonstrated interest in observational research using real-world clinical or claims data
- Ability to design, implement, and document reproducible analytic workflows
- Strong written and verbal communication skills
Preferred qualifications
- PhD degree in biostatistics, public health, epidemiology, informatics, computer science, or related field, and/or equivalent in education and experience, with at least 1 year of related work experience
- Familiarity with the OMOP Common Data Model and standardized vocabularies (e.g., ICD, NDC, SNOMED, MedDRA, LOINC, CPT)
- Knowledge of causal inference methods for observational studies
- Experience with machine learning techniques for patient-level prediction
- Prior experience working in distributed or federated data networks
- Familiarity with open-source research ecosystems and collaborative scientific communities