Principal Software Engineer - Data Platforms
Job description
The Principal Data Software Engineer plays a critical role in the hands-on delivery and evolution of IDBS's data platforms. Operating at principal level, the role focuses on building, operating, and continuously improving reliable, scalable, and compliant data capabilities that support AI, analytics, and enterprise workflows. Acting as a senior technical leader, the Principal Engineer influences data engineering decisions through deep practical expertise, balances near-term delivery with long-term maintainability, and ensures data platforms can sustainably support growing product, data, and regulatory demands.
In this role, you will have the opportunity to:
- Lead the hands-on delivery of reliable, scalable data pipelines and datasets that power AI discoverability, analytics, reporting, and workflow automation across scientific, clinical, and enterprise data.
- Build, evolve, and operate data ingestion and processing capabilities for structured, semi-structured, and unstructured data, supporting the transition from early prototypes through early adoption and general release.
- Implement and maintain rich metadata and data quality practices that enable cross-record querying, traceability, and AI-ready data access across experiments, files, inventory, and workflows.
- Partner closely with architects, AI engineers, workflow engineers, and domain experts to ensure data is usable, performant, and trustworthy for downstream GenAI and analytics use cases.
- Act as a technical leader by setting a high bar for data engineering practices, mentoring engineers through code and delivery, and unblocking complex data challenges within the team.
Requirements
- Significant hands-on experience building and operating production data pipelines that support analytics, AI/ML, and enterprise application use cases.
- Strong practical experience working with unstructured and structured data, including ingestion, transformation, enrichment, indexing, and lifecycle management.
- Proven ability to deliver high-quality, production-grade data systems with a focus on data quality, reliability, scalability, observability, and operational support.
- Experience enabling data for downstream AI and reporting use cases, including cross-entity queries, contextual linking, and performant data access patterns.
- Demonstrated principal-level impact through technical delivery, mentorship, and collaboration across teams, rather than through line management or pure architecture ownership.
Desirable (but not essential):
- Experience supporting GenAI, NLP, or AI-driven discovery and reporting use cases through well-designed data pipelines and curated datasets.
- Familiarity with cloud-based data platforms and tooling, including Databricks and AWS, in production enterprise environments.
- Experience working with data in regulated or quality-sensitive domains (e.g. life sciences or GxP-aligned environments), including auditability and traceability considerations.