Datacenter Hardware Engineer, HPC
Job description
-
Direct impact on scale: your work keeps one of France's largest AI clusters healthy as we grow to unprecedented scale.
-
Enable breakthrough AI: you unblock our science & engineering teams so they can deliver groundbreaking AI solutions.
What you will do
-
Diagnose & operate core server/cluster components - Investigate and handle compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect problems (switches, cables, transceivers; Ethernet/InfiniBand). Perform safe interventions (power-off/lockout, ESD) to replace, re-seat, or recable components and restore service.
-
Safety & procedures - Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.
-
First-line diagnostics - Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serials, results); open/update/close tickets with clear notes.
-
Preventive maintenance - Contribute feedback and ideas to improve proactive maintenance, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn ad-hoc checks into SOPs, alerts, and dashboards.
-
Parts & logistics - Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.
-
Collaboration & escalation - Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps crisply.
-
Documentation & quality - Keep SOPs/checklists current; ensure zero undocumented changes and consistent, audit-ready records.
Requirements
-
Hands-on mindset with datacenter/server hardware: you can install, re-seat, and swap GPU/PCIe cards, NICs, PSUs, and drives, and work cleanly in racks (rails, cabling, labeling). We also welcome candidates with strong Linux fundamentals (boot checks, logs) and scripting skills (Python/Bash) who are eager to learn hardware; you'll be trained and mentored by a senior hardware engineer.
-
Disciplined and meticulous: you follow checklists and ESD/LOTO procedures, avoid rough handling, and take care with all high-value server components.
-
Practical electrical basics: power-off, PPE, short-circuit risk awareness.
-
Comfortable in racks: cooling, network, storage, PDU, cable management; can lift/mount safely (within HSE limits).
-
Clear communicator: short factual updates; reliable teammate; punctual and process-minded.
-
Hardware-passionate, professionally grounded: strong curiosity and craft mindset.
Nice to have
-
Experience with HPC/AI/cloud at scale (production environments) and with large-fleet server installation and maintenance in datacenters.
-
Basic networking (Ethernet/InfiniBand) and basic Linux (boot checks; no coding needed).
-
Coding/automation skills (Python/Bash): small tools/scripts to improve checklists, photo/serial capture, inventory sync, or simple monitoring/reporting.
-
Experience with inventory/RMA tools and vendor coordination.
-
Exposure to HPC/research/industrial environments.
Location & Remote
The position is based in our Paris HQ offices, and we encourage coming to the office as much as possible (at least 3 days per week) to build bonds and keep communication smooth. Our remote policy aims to provide flexibility, improve work-life balance and increase productivity. Each manager can decide the number of days worked remotely based on the employee's autonomy and the specific context (e.g. more flexibility may be available during summer). In any case, employees are expected to maintain regular communication with their teams and be available during core working hours.
Benefits & conditions
Competitive salary and equity package
Health insurance
Transportation allowance
Sport allowance
Meal vouchers
Private pension plan
Generous parental leave policy