Job description
This is an opportunity to work at the frontier of AI for science and the Department of Energy Genesis mission, where large-scale machine learning, scientific data, simulation, and leadership-class supercomputers come together to enable new modes of discovery. You will join the AI group, a highly collaborative, multidisciplinary environment, and work alongside experts in AI, simulation, computer science, applied mathematics, and domain science.
If you are excited by building real systems, improving developer workflows, deploying services at scale, and supporting mission-driven research, this is a unique opportunity to contribute to one of the world's leading computing environments.
In this role you can expect to:
- Design, build, deploy, and maintain AI-enabled software services that support scientific and operational use cases at the Argonne Leadership Computing Facility (ALCF)
- Build and improve CI/CD pipelines, automated testing workflows, and engineering processes for AI software and services
- Support deployment and operations in containerized and orchestration environments, including Kubernetes
- Develop internal tools, APIs, and platform services that improve usability, access, and reproducibility for AI workflows
- Conduct research and development aligned with Argonne's strategic mission in computation, AI, and scientific discovery
- Contribute to a team culture that values scientific excellence, collaboration, innovation, and inclusive professional growth
Skills: Communication, Operations, Workflow Management, Leadership, Teamwork, Automation, Cloud-Native Computing, Software Engineering, Python (Programming Language), Test Automation, Scalability, Node.js (Javascript Library), Innovation, Research, Computer Science, Application Programming Interface (API), CI/CD, Kubernetes, Artificial Intelligence, Authorization (Computing), Machine Learning, Usability, C++ (Programming Language), Gitlab, Github, Applied Mathematics, PyTorch (Machine Learning Library), Supercomputing, Infrastructure Automation
Requirements
- RD2: Bachelor's degree and 5+ years of experience, or Master's degree and 3+ years of experience, or PhD and 0+ years of experience, or equivalent experience
- Educational background in computer science, software engineering, computational science, or a related field
- Experience designing, building, and maintaining production software systems
- Strong programming skills in one or more languages such as Python, C, C++, or Rust
- Experience with one or more AI frameworks such as PyTorch or vLLM
- Experience building or maintaining CI/CD pipelines and automated software delivery workflows
- Good communication skills, both verbal and written
- Ability to model Argonne's core values of impact, safety, respect, integrity, and teamwork
Preferred Qualifications:
- Experience with model serving, inference systems, or ML platform components
- Experience with Kubernetes operators, Helm, Argo, GitLab CI, GitHub Actions, or related tooling
- Experience with cloud-native tools, infrastructure automation, or service observability
- Experience running AI workloads on large-scale systems
- Experience with multi-node or multi-accelerator execution and optimization