GenAI Data Engineer

Postaladdress Uk
Edinburgh, United Kingdom
4 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Edinburgh, United Kingdom

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Information Engineering
ETL
Data Warehousing
Software Debugging
Distributed Computing Environment
Distributed Systems
Amazon DynamoDB
Python
Machine Learning
SQL Databases
Unstructured Data
Parquet
Data Processing
Data Storage Technologies
Large Language Models
Database Optimization
Generative AI
Data Lake
PySpark
Machine Learning Operations
Data Pipelines
Redshift

Job description

6-month contract position

Your responsibilities:
* Design and maintain scalable data pipelines using PySpark, Python, and distributed computing frameworks to support high-volume data processing.
* Architect and optimize AWS-based data and AI infrastructure, ensuring secure, performant, and cost-efficient ingestion, transformation, and storage.
* Develop, fine-tune, benchmark, and evaluate GenAI/LLM models, including custom training and inference optimization.
* Implement and maintain RAG pipelines, vector databases, and document-processing workflows for enterprise GenAI applications.
* Build reusable frameworks for prompt management, evaluation, and GenAI operations.
* Collaborate with cross-functional teams to integrate GenAI capabilities into production systems and ensure high-quality data, governance, and operational reliability.

Your profile

Essential skills/knowledge/experience:
* Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.

Requirements

* Strong SQL expertise, including star/snowflake schema design, indexing strategies, writing optimized queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing.
* Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
* Hands-on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).
* Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
* Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods.
* Experience working with structured and unstructured datasets (documents, logs, text, images).
* Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
* Understanding of model optimization techniques (quantization, distillation, inference optimization).
* Strong capability to debug, tune, and optimize distributed systems and AI pipelines.

Desirable skills/knowledge/experience (as applicable):
* PySpark, Python, SQL, AWS, GenAI
