Data Center Onsite Operations Technician
Job description
We are hiring Onsite Operations Technicians to support a large-scale AI computing cluster project in Paris. This is a hands-on onsite role focused on the operation and maintenance of GPU servers.
This team will provide 24×7 onsite support through rotating shifts. The role includes ticket execution, remote hands support, basic hardware checks, server inspection, hardware replacement, and escalation coordination.
Responsibilities
- Perform onsite GPU server operation and daily inspection.
- Execute work orders according to customer procedures.
- Provide remote hands and eyes support for remote technical teams.
- Perform basic fault checks and initial troubleshooting.
- Check server status, BMC/IPMI status, LED indicators, alarms, power, fans, and cables.
- Perform server reboots, power cycles, and GPU reseats according to procedure.
- Support hardware replacement, including GPUs, SSDs, PSUs, cables, and complete server units, according to procedure.
- Collect basic logs and operational information when required.
- Update ticket status and maintain accurate onsite operation records.
- Escalate complex issues to the customer's higher-level (L2/L3) teams or to OEM/vendor support.
- Support shift handover, daily records, incident updates, and onsite reports.
- Follow data center access, safety, ESD, and operational procedures.
Role Scope
This is an onsite operations role. The focus is onsite execution, hardware handling, basic checks, ticket updates, and escalation.
This role does not include deep GPU engineering, AI platform troubleshooting, firmware debugging, root cause analysis, or deep network architecture troubleshooting. Issues related to Kubernetes, Slurm, CUDA, NCCL, AI frameworks, InfiniBand fabric, BGP, firmware debugging, RCA, or OEM-level analysis will be escalated to L2/L3 or vendor teams.
Requirements
At least part of the L1 team must be able to communicate in Mandarin Chinese, so Mandarin-speaking candidates are strongly encouraged to apply.
- Experience in IT infrastructure, data center operations, server maintenance, hardware support, or basic technical support.
- Hands-on experience with physical servers, server rooms, or data center environments is preferred.
- Basic knowledge of Linux and/or Windows Server.
- Familiarity with BMC/IPMI, server alarms, logs, and hardware status checks is a plus.
- Basic networking knowledge, including IP connectivity, switch ports, cabling, and basic network troubleshooting.
- Ability to follow step-by-step technical procedures accurately.
- Strong attention to detail when handling production servers and hardware components.
- Good documentation and ticket update discipline.
- Willingness to work rotating shifts, including nights, weekends, and public holidays.
- Professional working English is required.
- Mandarin Chinese is a strong advantage.
Nice to Have
- Experience with GPU servers, NVIDIA GPU hardware, AI/HPC, or cloud infrastructure.
- Experience with command-line tools such as nvidia-smi, lspci, dmesg, journalctl, ipmitool, or similar.
- Experience with rack inspection, cable tracing, labeling, asset scanning, or spare parts handling.
- Previous experience working in a controlled-access data center environment.
Benefits & conditions
- Monthly gross salary starting from €3,100, depending on experience, technical background, language skills, and overall match with the role.
- Night shift premium will be paid in accordance with applicable French labor law and collective agreement requirements.
- Sunday and public holiday work compensation will be paid in accordance with applicable French labor law and collective agreement requirements.
- Employer-supported public transportation reimbursement will be provided according to local requirements and internal policy.
Salary: from €3,100.00 per month
Work location: On-site