AI Sr. Data Engineer
Lenovo
Edinburgh, United Kingdom
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Edinburgh, United Kingdom
Tech stack
Java
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Data analysis
Azure
Big Data
Google BigQuery
Cloud Computing
Computer Programming
Databases
Computer Engineering
Information Engineering
Data Governance
Data Integrity
Data Security
Data Systems
Data Visualization
Data Warehousing
Relational Databases
Hadoop
Python
PostgreSQL
Machine Learning
MongoDB
MySQL
NoSQL
NumPy
Power BI
SQL Databases
Tableau
Data Processing
Google Cloud Platform
Feature Engineering
Snowflake
Spark
Pandas
Information Technology
Data Lineage
Cassandra
Data Management
Machine Learning Operations
Data Pipelines
Redshift
Job description
- Data Creation & Annotation: Design, build, and implement processes for creating task-specific training datasets. This may include data labeling, annotation, and data augmentation techniques.
- Data Pipeline Development: Leverage tools and technologies to accelerate dataset creation and improvement. This includes scripting, automation, and potentially working with data labeling platforms.
- Data Quality & Evaluation: Perform thorough data analysis to assess data quality, identify anomalies, and ensure data integrity. Utilize machine learning tools and techniques to evaluate dataset performance and identify areas for improvement.
- Big Data Technologies: Utilize database systems (SQL and NoSQL) and big data tools (e.g., Spark, Hadoop, cloud-based data warehouses like Snowflake/Redshift/BigQuery) to process, transform, and store large datasets.
- Data Governance & Lineage: Implement and maintain data governance best practices, including data source tracking, data lineage documentation, and license management. Ensure compliance with data privacy regulations.
- Collaboration with Model Developers: Work closely with machine learning engineers and data scientists to understand their data requirements, provide clean and well-documented datasets, and iterate on data solutions based on model performance feedback.
- Documentation: Create and maintain clear and concise documentation for data pipelines, data quality checks, and data governance procedures.
- Stay Current: Keep up-to-date with the latest advancements in data engineering, machine learning, and data governance.
Requirements
- Education: Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, Statistics, Mathematics, or a related field.
- Experience: 15+ years of experience in a data engineering or data science role.
- Programming Skills: Mastery of Python and SQL. Experience with other languages (e.g., Java, Scala) is a plus.
- Database Skills: Strong experience with relational databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra).
- Big Data Tools: Experience with big data technologies such as Spark, Hadoop, or cloud-based data warehousing solutions (Snowflake, Redshift, BigQuery).
- Data Manipulation: Proficiency in data manipulation and cleaning techniques using tools like Pandas, NumPy, and other data processing libraries.
- ML Fundamentals: Solid understanding of machine learning concepts and techniques, including data preprocessing, feature engineering, and model evaluation.
- Data Governance: Understanding of data governance principles and practices, including data lineage, data quality, and data security.
- Communication Skills: Excellent written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical audiences.
- Problem Solving: Strong analytical and problem-solving skills.
Bonus Points:
- Experience with data labeling platforms (e.g., Labelbox, Scale AI, Amazon SageMaker Ground Truth).
- Experience with MLOps practices and tools (e.g., Kubeflow, MLflow).
- Experience with cloud platforms (e.g., AWS, Azure, GCP).
- Experience with data visualization tools (e.g., Tableau, Power BI).
- Experience with building and maintaining data pipelines using orchestration tools (e.g., Airflow, Prefect).
Benefits & conditions
What we offer:
- Opportunities for career advancement and personal development
- Access to a diverse range of training programs
- Performance-based rewards that celebrate your achievements
- Flexibility with a hybrid work model (3:2) that blends home and office life
- Electric car salary sacrifice scheme
- Life insurance
About the company
Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read the latest news via our StoryHub.
Description and Requirements
This role is open for the Edinburgh, Scotland location only. Candidates must be based there, as the position requires working from the office at least three days per week (3:2 hybrid policy).
The Lenovo AI Technology Center (LATC), Lenovo's global AI Center of Excellence, is driving our transformation into an AI-first organization. We are assembling a world-class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI. Lenovo is one of the world's leading computing companies, delivering products across the entire technology spectrum: wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions. This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting-edge foundation models and to enable flexible, hybrid-cloud, and agentic computing across our full product portfolio. To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast-moving AI ecosystem, including novel model and agentic orchestration and collaboration across mobile, edge, and cloud resources. This space is evolving fast, and so are we. If you're ready to shape AI at a truly global scale, with products that touch every corner of life and work, there's no better time to join us.
Lenovo is seeking a talented and motivated Sr. Data Engineer/Scientist to join our growing team. This role is critical to the success of our machine learning initiatives, focusing on the creation, quality control, and governance of the datasets that power our models. You will bridge the gap between raw data and model readiness, working closely with model developers to understand their needs and deliver high-quality, reliable data. This is a hands-on role requiring strong technical skills in data engineering, data analysis, and machine learning fundamentals. If you are passionate about making Smarter Technology for All, come help us realize our Hybrid AI vision!