Data Engineer

Exel Inc.
Rockville, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Rockville, United States of America

Tech stack

API
Amazon Web Services (AWS)
Business Analytics Applications
Data analysis
Audit Trail
Automation of Tests
Bioinformatics
Health Informatics
Clinical Data Management
Cloud Computing
Cloud Database
Cloud Storage
Software Documentation
Code Review
Information Systems
Databases
Data Validation
Information Engineering
Data Governance
Data Infrastructure
Data Integration
ETL
Data Mapping
Data Security
Data Structures
Data Systems
Data Warehousing
Relational Databases
Database Theory
Python
PostgreSQL
Metadata
Meta-Data Management
Operational Data Store
DataOps
Scientific Computing
Software Deployment
Software Engineering
SQL Databases
Data Streaming
Systems Integration
Unstructured Data
Web Applications
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Ingestion
Fast Healthcare Interoperability Resources
Snowflake
Spark
Jupyter
Git
Data Lake
PySpark
Information Technology
Data Lineage
Health Level Seven International
Data Management
Tools for Reporting
API Design
Streamlit Framework
Data Pipelines
Databricks

Job description

  • Data Pipeline Development: Design, build, test, and maintain data pipelines to ingest, transform, harmonize, and integrate diverse biomedical and research data sources, including clinical, genomic, experimental, imaging, biospecimen, operational, and other scientific datasets. Develop reusable transformation logic and curated datasets that support analytics, reporting, dashboards, applications, APIs, and downstream research workflows.

  • Data Integration and Lifecycle Support: Support the full research data lifecycle by enabling reliable data movement from source systems and storage environments into structured, analysis-ready formats. Assist with data ingestion, curation, metadata capture, data refreshes, source-to-target mapping, schema management, and long-term maintainability of data products and workflows.

  • Collaboration: Work closely with data scientists, bioinformaticians, researchers, application developers, project managers, and government stakeholders to gather requirements and deliver practical data solutions. Translate scientific and operational data needs into technical specifications, data models, transformation logic, and reusable datasets that accelerate biomedical research workflows and support informed decision-making.

  • Quality & Governance: Implement data validation checks, reconciliation routines, testing practices, and monitoring processes to ensure data accuracy, completeness, consistency, and integrity. Follow data governance and security best practices, including documentation of transformations, lineage, assumptions, access requirements, and compliance considerations related to sensitive, regulated, de-identified, or access-controlled research data.

  • Dashboarding & Integration: Create or support interactive dashboards, reporting layers, APIs, and application-ready datasets that allow researchers and stakeholders to visualize, explore, and analyze data. Support integration between data pipelines, databases, cloud platforms, analytics environments, and approved application platforms to enable scalable and secure data access.

  • Operational Support and Modernization: Troubleshoot data pipeline failures, source system inconsistencies, data quality issues, schema changes, access issues, and performance bottlenecks. Contribute to modernization efforts by improving automation, documentation, scalability, reproducibility, and platform readiness across environments.

Requirements

The ideal candidate will have strong experience with Python, SQL, ETL/ELT development, data modeling, data quality practices, and research data lifecycle support. This role requires the ability to work with complex multi-source datasets, support analytics and application-facing data products, and contribute to scalable, well-governed data solutions that align with the Data Science Client Services branch priorities for data accessibility, interoperability, reproducibility, modernization, and secure research enablement.

  • Education & Background: Bachelor's degree in Computer Science, Data Science, Bioinformatics, Biomedical Informatics, Information Systems, Engineering, or a related field, or equivalent practical experience. Proven experience as a Data Engineer, Analytics Engineer, Data Integration Developer, Bioinformatics Engineer, or similar data-intensive role, preferably supporting analytics, biomedical research, healthcare, scientific computing, or research data teams.

  • Data Engineering Expertise: Strong proficiency in Python and SQL for data manipulation, transformation, scripting, automation, and analysis. Hands-on experience building ETL/ELT processes and data pipelines to support large, complex, multi-source datasets. Familiarity with scalable data processing approaches, including Spark/PySpark or similar frameworks, for high-volume or complex transformations is required.

  • Analytical Skills: Solid understanding of data modeling, relational databases, data warehouses, data lakes, metadata, and database concepts. Ability to work with complex, multi-modal datasets, including structured, semi-structured, and unstructured data, and optimize data workflows for reliability, performance, usability, and long-term maintainability.

  • Best Practices: Knowledge of software engineering and data engineering best practices, including version control using Git, code review, automated testing, documentation, peer review, and change management. Experience ensuring data quality and using lineage, provenance tracking, audit trails, or documentation practices to support transparency, reproducibility, and data flow traceability.

  • Collaboration & Communication: Excellent problem-solving skills and the ability to communicate effectively with both technical and non-technical stakeholders. Comfortable working in an interdisciplinary environment with biomedical researchers, analysts, developers, and project teams. Capable of translating domain-specific needs into technical solutions and explaining technical risks, limitations, and dependencies in clear stakeholder-focused language.

  • Domain Alignment: Strong interest in biomedical science, clinical research, healthcare data, and scientific discovery. Ability to quickly learn domain-specific concepts, data structures, terminology, and research workflows. Demonstrated awareness of sensitive data handling, privacy, access control, data governance, and regulatory or compliance expectations associated with biomedical and clinical research data.

Preferred Qualifications (Plus Skills)

  • Platform-as-a-Service and Data Platform Experience: Hands-on experience building data solutions in modern data platforms or platform-as-a-service environments such as Snowflake, Databricks, Palantir, cloud data warehouses, data lakes, or similar platforms. Experience supporting integrations across databases, cloud storage, APIs, analytics platforms, dashboards, and application environments is preferred.

  • Research and Application Enablement: Experience preparing curated datasets for dashboards, APIs, web applications, reporting tools, notebooks, or scientific computing environments. Familiarity with research-facing tools and platforms such as Posit Connect, R/Shiny, Streamlit, Jupyter, Galaxy, Code Ocean, or similar analytics and application delivery environments is a plus.

  • Cloud, Storage, and Automation Experience: Experience working with cloud or hybrid data environments, object storage such as S3, relational databases such as Postgres, automated data refreshes, scheduled jobs, API-based integrations, and secure data movement across controlled environments.

  • Biomedical Domain Knowledge: Previous experience in biomedical research, healthcare analytics, clinical research, public health, pharmaceutical research and development, or scientific data management. Familiarity with biomedical data standards or datasets, such as clinical trial data, clinical imaging, laboratory data, biospecimen data, transcriptomics/genomic data, HL7/FHIR, CDISC, OMOP, or related standards, and an understanding of the scientific research process will help you excel in this role.

  • Governance and Reproducibility: Experience supporting data governance, metadata management, data lineage, reproducible workflows, documentation standards, and secure handling of de-identified, sensitive, or access-controlled research datasets.

Disclaimer: The above description is meant to illustrate the general nature of work and level of effort being performed by individuals assigned to this position or job description. It is not intended to be a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.

Benefits & conditions

Benefits We Offer:

  • 100% Medical, Dental & Vision Coverage for Employees
  • Paid Time Off and Paid Holidays
  • 401K match up to 5%
  • Educational Benefits for Career Growth
  • Employee Referral Bonus
  • Flexible Spending Accounts:
      • Healthcare (FSA)
      • Parking Reimbursement Account (PRK)
      • Dependent Care Assistance Program (DCAP)
      • Transportation Reimbursement Account (TRN)

This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate's experience, qualifications, skills, and location.

About the company

Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country including multiple institutes at the National Institutes of Health (NIH).
