Data Science (Talent Network)
Job description
Focused on turning challenging, real-world industrial data into clean, analysis-ready datasets. You'll spend much of your time wrangling inspection records, anomaly registers and engineering exports, and producing early-stage analysis that feeds into larger client deliverables. It's a hands-on learning role with a clear pathway for progression into a Data Scientist role.
- Clean, validate and structure raw datasets from a wide variety of sources (Excel, CSV, PDF exports, database extracts).
- Perform exploratory data analysis to surface patterns, outliers and data-quality issues.
- Produce first-pass summaries, tables and simple visualisations to support senior team members.
- Write reusable, well-documented Python for repeatable data-cleaning tasks.
- Support data validation and QA on deliverables before they reach clients.

- Design, build and maintain ETL/ELT pipelines on AWS.
- Manage relational, NoSQL and vector databases, and choose the right store for the job.
- Build and operate data services in containers, with appropriate orchestration and scheduling.
- Implement monitoring, logging and CI/CD so pipelines are observable and reproducible.
- Work with the data science and ML teams to provision data and embeddings for downstream use.

- Design and build interactive dashboards and reports for internal and client use.
- Perform statistical analysis (trends, correlations, anomaly/outlier detection) and translate findings into clear recommendations.
- Work directly with stakeholders to understand requirements and iterate on deliverables.
- Build self-contained, branded dashboards where a BI tool isn't the right fit.
- Ensure analyses are validated, reproducible and well-documented.

- Train, fine-tune and evaluate ML and LLM-based models for real business problems.
- Build retrieval-augmented (RAG) and semantic-search systems over large document collections.
- Deploy and serve models on local GPU hardware as well as cloud where appropriate.
- Prototype rapidly, scope new use cases and demonstrate value before productionising.
- Work with the data engineering team on embeddings, pipelines and serving infrastructure.
Requirements
Do you have a Master's degree?
We're keen to hear from individuals at all stages of their data careers, from aspiring Data Analysts and Junior Data Scientists looking to develop their skills, through to experienced Data Engineers, Data Scientists and Machine Learning Engineers. We value people who are curious, collaborative and passionate about using data to solve problems, whether that's transforming messy industrial datasets into actionable insights, building robust cloud-based data platforms, creating compelling visualisations for decision-makers, or developing innovative AI and machine learning solutions. Experience in engineering, energy, infrastructure or other data-rich environments is particularly welcome, but above all we're looking for people who bring fresh thinking, technical excellence and a desire to continuously learn and innovate alongside our existing team.
- Degree in a quantitative or technical discipline (data science, engineering, maths, physics, computing or similar).
- Working knowledge of Python with pandas and numpy.
- Comfortable with spreadsheets and basic SQL.
- Strong attention to detail and a methodical approach to messy data.
- Clear written communication and willingness to learn.
Desirable
- Exposure to version control (Git).
- Experience with engineering, energy or other heavily regulated industrial data.
- Basic familiarity with a visualisation tool (Power BI, matplotlib, plotly).

- Strong Python and SQL.
- Hands-on AWS (e.g. S3, RDS, App Runner / ECS, Secrets Manager, IAM).
- Relational databases (PostgreSQL) and at least one NoSQL store (e.g. MongoDB).
- Containerisation with Docker / Docker Compose.
- Experience with task queues / orchestration (Celery, Airflow or similar).
Desirable
- Vector databases (Milvus, or alternatives such as Pinecone / pgvector) and an understanding of embedding-based retrieval.
- CI/CD pipelines and infrastructure-as-code.
- Experience handling unstructured data (documents, drawings, scanned records) at scale.
Data Scientist (Visualisation & Analysis)
Experience: Mid Level
The bridge between data and decision-makers. You'll turn complex datasets into clear, client-facing dashboards and analysis that drive operational and integrity decisions. This role blends solid statistical analysis with strong storytelling through visualisation.
- Strong Power BI (including DAX) and/or Tableau.
- Python for analysis and visualisation (pandas, plotly/matplotlib).
- SQL and comfort working with multiple data sources.
- Solid grounding in applied statistics.
- Excellent communication - able to present technical results to non-technical audiences.
Desirable
- Front-end visualisation skills (HTML/CSS/JS, libraries such as Leaflet, D3 or Chart.js) for bespoke dashboards.
- Experience with KPI/earned-value reporting, S-curves or campaign/programme tracking.
- Background in asset integrity, inspection or other operational engineering data.
Machine Learning Engineer
Experience: Mid-Senior
A founding-style ML role for someone who wants to shape how machine learning is applied across the business rather than maintain an existing stack. The focus areas are document intelligence, semantic search and LLMs, with the freedom to identify and prove new applications. The team runs dedicated on-prem AI hardware (two NVIDIA DGX Spark systems), so experience getting models running efficiently on local GPU infrastructure is a real plus.
- Hands-on experience with transformer models and the Hugging Face ecosystem.
- LLM application experience: fine-tuning, prompting, RAG, and embedding/vector search.
- Understanding of model evaluation, and the practical trade-offs of accuracy vs. cost/latency.
- Comfortable with Git and reproducible ML workflows.
Desirable (and worth shouting about)
- Experience deploying models on NVIDIA GPU hardware - local inference/serving with Ollama, vLLM, TensorRT-LLM or similar, and the NVIDIA software stack (CUDA).
- Model optimisation for constrained hardware: quantisation, LoRA/PEFT fine-tuning, mixture-of-experts.
- OCR / document-layout models (PaddleOCR, DocTR, Surya, CRAFT) and computer vision on technical documents/drawings.