Machine Learning Engineer

Atos
Posted 2026-05-21

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English

Job location

Spain

Tech stack

Artificial Intelligence
Confluence
JIRA
Distributed Systems
GitHub
Monitoring of Systems
InfiniBand
Python
Linux System Administration
Machine Learning
Scrum
TensorFlow
Prometheus
Graphics Processing Unit (GPU)
High Performance Computing
PyTorch
System Availability
Model Validation
Reliability of Systems
GitLab
Scikit-learn
Kubernetes
Information Technology
Data Analytics
Machine Learning Operations
Software Version Control
Workday

Job description

We are searching for a Machine Learning Engineer to join Bull's innovative R&D team and contribute to AI-driven solutions for infrastructure monitoring, reliability, and cybersecurity in HPC environments. The role focuses on leveraging large-scale operational telemetry, metrics and logs to build predictive capabilities that improve system availability, detect anomalies and support proactive operations. The selected candidate will be responsible for model development, rigorous validation, operationalization and integration within a Kubernetes-based platform.

Responsibilities

- Design and develop ML/DL models for predicting hardware failures and detecting software or behavioral anomalies in HPC systems.
- Apply advanced analytics techniques such as time-series forecasting, anomaly detection, classification and predictive maintenance using large-scale monitoring data.
- Build and maintain data pipelines and features from infrastructure telemetry and logs.
- Perform rigorous model validation to ensure robustness, reliability and production readiness.
- Deploy and operationalize models within a Kubernetes-based environment, including scalable inference services and lifecycle management.
- Contribute to AI-driven cybersecurity use cases, such as detecting abnormal behaviors, potential intrusions or security-related anomalies in infrastructure and system activity.
- Work within an Agile/Scrum environment, participating in sprint planning, stand-ups and retrospectives.
- Collaborate with system administrators, support teams and data engineers to translate operational challenges into data-driven solutions that enhance system reliability and automation.

Requirements

Education

- Master's or PhD in Computer Science, Artificial Intelligence, Data Science, Telecommunications or a related field.

Skills & Competencies

- Strong experience with Machine Learning and Deep Learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
- Experience working with time-series data and anomaly detection models.
- Proficiency in Python and data science ecosystems.
- Experience with Prometheus or similar monitoring/telemetry systems.
- Familiarity with containerization and orchestration technologies, especially Kubernetes.
- Experience building production-grade ML pipelines.
- Experience handling large-scale monitoring and operational datasets.
- Understanding of distributed systems and infrastructure monitoring.
- Knowledge of HPC environments, GPUs and high-speed interconnects (e.g., InfiniBand) is highly desirable.
- Proficiency with Git-based version control systems (GitHub, GitLab).
- Solid experience working in Linux environments.
- Good understanding of Scrum methodology and experience with Jira and Confluence.

Location

Spain

Benefits

- Flexible Work Schedule: half-day Fridays and an intensive summer workday, supporting work-life balance.
- Learning and Growth Opportunities to work with advanced AI technologies in an innovative and supportive R&D environment.

Join us

Here, your ideas, curiosity and technical excellence directly shape the next era of advanced computing, unlocking enterprise value, accelerating scientific progress and driving positive impact for society.

About the company

Bull is a story of over a century of European innovation, focusing on powerful, sustainable, and sovereign digital solutions that let states and industries retain full control over their data and AI. Thousands of engineers, researchers and tech professionals shape the future of high-performance computing, AI and quantum technologies, pushing boundaries from next-generation HPC architectures to exascale supercomputers with world-class R&D and over 1,600 patents.

Hiring organization: Atos
Posted: 2026-05-21 (valid through 2026-05-28)
Listing: trabajo.org, JOB--1355022164510777011
