Systems Engineer (HPC/Server Farm)
Job description
Seeking a Server Farm Engineer to join our team to support, manage, and improve the compute farm environment. The candidate should have hands-on experience with cloud solutions and proven expertise in working directly with R&D software development teams to collaboratively develop solutions that optimize their working environment.
- Supporting multiple geographical locations to serve user communities at sites across North America, Europe, and Asia.
- Focusing on improving R&D productivity and committing to customer success.
- Driving the overall operational strategy for internal High-Performance Compute (HPC) farms in all locations.
- Developing and executing the three-year compute roadmap and planning annual capacity growth for the on-premises server farm in San Jose.
- Operating, managing, and enhancing the internal compute farm and its associated cloud resources (AWS).
- Maintaining, enhancing, and monitoring the farm, and reporting on and improving its efficiency.
Requirements
- 8+ years of technical experience architecting, managing, and improving a compute farm environment running Linux.
- At least 5 years of direct hands-on experience in a global or regional compute farm and/or hybrid cloud environment of 1,000 or more servers, including managing some remote direct reports.
- At least 3 years working in a global group, coordinating support, strategies, projects, and operations across multiple geographies with a team-oriented approach.
- Extensive technical experience managing IBM LSF and RTM, and scripting in Python, shell, Perl, etc., in a farm environment; knowledge of LSF spanning both farm and cloud environments is highly desirable.
- Solid understanding of, and proven operational experience with, compute farms, job submission/management technologies, cloud, and associated management tools.
- Proven experience working directly with R&D software development teams to collaboratively develop solutions that optimize their working environment; direct EDA experience is desired.
- Proven experience in capacity and performance management: optimizing performance, ensuring adequate capacity, working with R&D to optimize their workloads, and developing and maintaining key performance indicators (KPIs).
- A proven process focus, demonstrated through documentation, change management, incident management, and problem-resolution activities.
Education: BS / MS in computer science or related field