Data Engineer

Tyne & Wear

Boldon Colliery, United Kingdom

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Boldon Colliery, United Kingdom

Tech stack

API

Artificial Intelligence

Amazon Web Services (AWS)

Fluid

Cloud Computing

Information Engineering

Middleware

JSON

Python

Search Technologies

Siemens NX

Parquet

Multi-Cloud

Data Lake

PySpark

Ansys

Job description

Role summary: The overall technical lead and architect. Designs the metadata schema, builds the simulation onboarding pipeline, deploys the metadata embedding pipeline and OpenSearch k-NN vector store, and authors the data export format specification for the AI/ML use case. This role is the deepest technical seat on the engagement.

Key responsibilities

Run the Sprint 1 architecture review of the existing UAT codebase (S3 + Glue + S3 Tables + OpenSearch + Athena) and deliver written gap findings.

Design the metadata schema, taxonomy, and field catalogue (Light, Brain, Power).

Tune data orchestration: Glue jobs, Athena queries, S3 Tables configuration, and scheduling.

Lead the deep-dive technical sessions with analysts on visualization requirements.

Build and validate the simulation data onboarding pipeline against real data, including the 30 GB-per-run acoustic spectra dataset.

Configure and validate the OpenSearch k-NN vector store and the Bedrock embedding pipeline.

Author the AI/ML data export format specification and the AI onboarding pattern document.

Co-design the API middleware blueprint with the Cloud Infrastructure Architect.

Requirements

Must-have

Principal-level, hands-on data engineering on AWS (7+ years).

Deep production experience with S3, S3 Tables, Glue, Athena, and OpenSearch (including k-NN / vector search).

Built and shipped vector embedding workloads.

Strong metadata modelling and data taxonomy design experience for scientific or engineering domains.

Comfort working with Parquet, JSON-LD, and large binary scientific data formats (mesh, time-series, spectra).

Python proficiency; PySpark / Glue job tuning experience.

Nice-to-have / differentiators

Prior simulation / CAE / HPC data lake experience (Ansys, Siemens NX, BETA CAE, OpenFOAM, etc.).

Familiarity with surrogate model training data pipelines.

Experience with SageMaker Unified Studio or comparable governed data-mesh tooling (if integration is required).

Multi-cloud data engineering experience (AWS and GCP).

Published or contributed to AWS data architecture patterns or blueprints.
