Data Engineer
Job description
- Build and operate data ingestion pipelines from multiple sources.
- Ensure data quality, reliability, and consistency through monitoring.
- Collaborate with stakeholders on data standards and practices.
Skills
Data engineering, SQL, Python, ETL/ELT pipelines, data modeling, data quality assurance, collaboration, monitoring, CI/CD
Tools
Airflow, dbt, Snowflake, Power BI, Git
Job description
About AI Labs at Insud Pharma
AI Labs is Insud Pharma's transversal team for Artificial Intelligence, Data Science, and Machine Learning, working across the group to deliver production-ready data and AI solutions with real impact. The team operates across a wide range of areas, including R&D and clinical data, general health and epidemiology, manufacturing and quality, supply chain and operations, and business analytics, partnering closely with business units and external organizations. AI Labs combines strong engineering standards with pragmatic execution, focusing on building scalable AI-enabled solutions that move from experimentation to real-world adoption.
Role Context
This role will focus primarily on projects linked to Fundación Mundo Sano, an international organization dedicated to improving health and quality of life for vulnerable communities through research, innovation, and international cooperation (e.g. neglected diseases such as Chagas). The goal of this position is to ensure that data coming from multiple sources becomes available, consistent, reliable, and reusable, enabling dashboards, reporting, and AI/ML use cases. Due to the international nature of the projects, the role may involve occasional travel to Latin America, Africa, or other regions to work closely with local teams and better understand data generation on the ground.
Role Objective
Build and operate a coherent, well-structured data foundation for Fundación Mundo Sano projects by owning the data engineering layer end-to-end: ingestion, modeling, data quality, availability, monitoring, and data delivery for dashboards and AI enablement.
- Build and operate data ingestion pipelines (ETL/ELT) from multiple sources (field programs, research datasets, epidemiological surveillance systems, partners, files, APIs).
- Design and maintain data models and curated datasets that standardize entities, metrics, and definitions across projects.
- Ensure data quality, reliability, and consistency through automated checks, monitoring, and basic observability.
- Decide how data is structured, stored, and versioned to enable long-term reuse and scalability.
- Make data available and easy to consume for dashboards, reporting, and AI/ML use cases.
- Proactively guide business and project teams on data best practices, setting standards, shaping requirements, and influencing how data should be collected, structured, and used.
- Collaborate closely with stakeholders to translate needs into scalable, maintainable data foundations.
Technologies (examples - adapt to actual stack)
- Languages: SQL, Python
- Pipelines / orchestration: Airflow, Prefect, Dagster or similar
- Transformations: dbt or equivalent
- Storage: Data warehouse / lakehouse (e.g. Snowflake, BigQuery, Databricks, Synapse)
- Data quality / monitoring: Great Expectations, Soda, or similar
- BI / Dashboards: Power BI, Tableau, Looker or similar
- Engineering basics: Git, CI/CD, basic cloud concepts (AWS / Azure / GCP)
Requirements
The ideal candidate has strong SQL and Python skills and experience managing diverse data sources. Knowledge of ETL/ELT processes and of tools such as Airflow and Snowflake is essential. This position offers a permanent contract and flexible working hours.
- Strong SQL and solid Python skills for building ETL/ELT pipelines.
- Experience integrating diverse data sources including APIs and files.
- A senior, hands-on Data Engineer with a strong ownership mindset, comfortable building and operating core data structures and pipelines.
- 5+ years of experience in Data Engineering or Analytics Engineering roles.
- Strong SQL and solid Python, with hands-on experience building and running ETL/ELT pipelines in production.
- Proven experience integrating heterogeneous data sources (multiple systems, files, APIs, changing schemas, inconsistent identifiers).
- Good understanding of data modeling and analytical data structures, with the ability to standardize entities, metrics, and definitions across projects.
- Experience ensuring data quality, reliability, and monitoring, including automated checks and basic observability.
- Comfortable making data available for dashboards, reporting, and AI/ML use cases through curated, analytics-ready datasets.
- Able to work proactively with business and project teams, shaping requirements and setting data standards rather than waiting for fully specified inputs.
- Spanish as the daily working language; English required for specific projects and international collaboration.
- Pragmatic, ownership-driven mindset, strong communication skills, and motivation to work on social and public-health impact projects.
Benefits & conditions
Our benefits
- Flexible start time, Monday to Friday.
- Permanent contract.