GenAI Data Engineer
Role details
Job location
Tech stack
Job description
6-month contract position
Your responsibilities:
* Design and maintain scalable data pipelines using PySpark, Python, and distributed computing frameworks to support high-volume data processing.
* Architect and optimize AWS-based data and AI infrastructure, ensuring secure, performant, and cost-efficient ingestion, transformation, and storage.
* Develop, fine-tune, benchmark, and evaluate GenAI/LLM models, including custom training and inference optimization.
* Implement and maintain RAG pipelines, vector databases, and document-processing workflows for enterprise GenAI applications.
* Build reusable frameworks for prompt management, evaluation, and GenAI operations.
* Collaborate with cross-functional teams to integrate GenAI capabilities into production systems and ensure high-quality data, governance, and operational reliability.
Your Profile
Essential skills/knowledge/experience:
* Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.
Requirements
* Strong SQL expertise, including star/snowflake schema design, indexing strategies, writing optimized queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing.
* Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
* Hands-on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).
* Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
* Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods.
* Experience working with structured and unstructured datasets (documents, logs, text, images).
* Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
* Understanding of model optimization techniques (quantization, distillation, inference optimization).
* Strong capability to debug, tune, and optimize distributed systems and AI pipelines.
Desirable skills/knowledge/experience (as applicable):
* PySpark, Python, SQL, AWS, GenAI