Data Scientist
Job description
GroundWork seeks a Data Scientist to design, build, and maintain the data infrastructure, access software, and web applications that make lab and field measurement data reliably available to our parent company and internal stakeholders. This role sits at the intersection of database engineering, full-stack software development, data quality assurance, and AI-assisted tooling, ensuring that high-integrity datasets are collected, managed, and surfaced through modern, scalable systems. The ideal candidate combines strong technical skills in database management and software development with familiarity with solar energy measurement, accredited laboratory environments, and regulatory data standards.
As a Data Scientist, you will work closely with GroundWork's engineering, laboratory, and operations teams to prioritize, design, and deliver robust data systems that meet the needs of both internal users and our parent company. You will leverage AI-assisted development tools to accelerate the delivery of web applications and data pipelines, while maintaining the rigor and traceability required in a laboratory and regulatory context. This role requires a technically versatile individual who can work across disciplines to drive data reliability, accessibility, and operational excellence.
Key Responsibilities
- Technical Subject Matter Expertise: Design, implement, and maintain relational and time-series databases for lab instrument data, environmental measurements, and operational records. Develop and manage ETL/ELT pipelines to ingest, transform, and store data from IoT sensors, measurement hardware, and remote sensing platforms. Build and deploy data access APIs and web applications using modern tools (e.g., Streamlit, FastAPI, React, or similar frameworks) to enable parent company analysts and stakeholders to query, visualize, and export lab data. Apply AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude) to accelerate software delivery while maintaining code quality and auditability appropriate for a laboratory environment.
- Data Quality Assurance & Control: Develop and enforce QA/QC protocols to validate incoming data from lab instruments and field sensors in accordance with applicable regulatory and accreditation standards (e.g., ISO/IEC 17025 or similar). Implement automated checks, flagging routines, statistical validation, and audit trails to detect anomalies, missing data, and calibration drift. Maintain defensible data records that satisfy chain-of-custody and traceability requirements. Ensure data integrity from acquisition through delivery to downstream consumers.
- Database Architecture & Optimization: Architect and optimize database schemas for performance, scalability, and ease of access. Evaluate and recommend appropriate database technologies (SQL, NoSQL, time-series) based on data volume, query patterns, and reporting requirements of the lab and parent company.
- Stakeholder Collaboration: Partner with lab scientists, operations, and parent company data teams to understand data access requirements and translate them into technical solutions. Serve as the primary point of contact for data availability and reporting needs.
- Web Application Development: Design and build web-based data access tools, dashboards, and reporting interfaces using modern full-stack frameworks (e.g., React, FastAPI, Streamlit, Plotly Dash). Leverage AI-assisted development environments (e.g., GitHub Copilot, Cursor, Claude Code, or similar) to accelerate development cycles while ensuring maintainability, security, and compliance with lab data governance requirements. Enable non-technical users at the parent company to explore, filter, and export lab datasets through intuitive interfaces without requiring direct database access.
- Data Governance & Documentation: Maintain comprehensive data dictionaries, schema documentation, and data lineage records consistent with laboratory quality management systems. Contribute to laboratory SOPs and data management plans. Stay current with emerging data engineering technologies, AI tooling, and laboratory informatics practices to continuously improve the lab's data infrastructure.
Requirements
- Experience: Minimum of 2 years of experience in database engineering, data software development, or a related technical discipline, preferably in a laboratory, scientific, or renewable energy context. Experience in photovoltaic (PV) testing, solar energy measurement, or a physical laboratory environment is highly preferred.
- Education: Bachelor's degree in computer science, software engineering, information systems, data science, or a related field; advanced degree or relevant certifications preferred.
- Skills:
  - Proficiency in SQL and experience with relational databases (PostgreSQL, MySQL, or similar); familiarity with time-series or NoSQL databases a plus.
  - Proficiency in Python (pandas, SQLAlchemy, FastAPI, or similar) for data engineering, scripting, and backend service development.
  - Experience building web applications or data dashboards using tools such as Streamlit, Dash, FastAPI, or React; ability to deliver functional, user-facing tools rapidly using AI pair-programming workflows (e.g., GitHub Copilot, Cursor, Claude Code).
  - Experience implementing QA/QC workflows for scientific or sensor data, including anomaly detection, validation rules, statistical flagging, and audit logging; familiarity with laboratory quality management standards (e.g., ISO/IEC 17025, GLP, or similar regulatory frameworks) is a strong plus.
  - Excellent communication skills; ability to translate complex technical data concepts for non-technical stakeholders including lab scientists and business analysts.
  - Familiarity with version control (Git), CI/CD practices, and cloud data platforms (AWS, Azure, or GCP); experience with containerization (Docker) is a plus.
  - Demonstrated experience using AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code, or similar) to write, debug, and refactor code; comfort evaluating AI-generated outputs for correctness, security, and suitability in a regulated laboratory data environment.
  - Understanding of laboratory informatics concepts and data management in accredited or regulated settings; experience with LIMS (Laboratory Information Management Systems) or similar platforms is a plus.
Approach to Work
- Aligns with our values: Trustworthy, Caring, Knowledgeable, Trailblazing, Nimble, and Meticulous.
- Works collaboratively and directly with remote multi-functional teams and clients.
- Presents a positive, 'can-do' attitude while working in a multi-project work environment.
- Self-motivated, punctual, organized, and able to perform work with limited supervision.
- Able to solve practical problems and deal with a variety of concrete variables in situations where only limited standardization exists.
- Able to communicate verbally and in writing in a clear, concise, and professional manner.
Benefits & conditions
- 401(k)
- Dental insurance
- Flexible spending account
- Health insurance
- Health savings account
- Life insurance
- Paid time off
- Parental leave
- Professional development assistance
- Vision insurance