Network Architect, AI Infrastructure & Data Center Networks

Cyber 1 Armor
Milpitas, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Milpitas, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Bash
Border Gateway Protocol
Computer Clusters
Computer Programming
Data Centers
Multi-protocol Systems
Python
Linux System Administration
Nagios
Network Architecture
Open Shortest Path First
Performance Tuning
Remote Direct Memory Access
Ansible
Zabbix
AI Infrastructure
Pulumi
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
High Performance Computing
Reliability of Systems
Hybrid Cloud
Juniper
Data Center Networking
Low Latency
Machine Learning Operations
Terraform
Open Network Automation Platform
Cisco networks
Go

Job description

We are seeking a highly experienced Senior Network Architect to lead the design, architecture, and evolution of large-scale AI/ML, data center, and backbone network infrastructure. The ideal candidate will have deep expertise in high-performance networking, multi-terabit WAN architectures, EVPN/VXLAN fabrics, network automation, and cloud-scale infrastructure supporting AI workloads.

Key Responsibilities

Design and architect large-scale AI/ML data center networks and high-capacity WAN infrastructure.
Lead deployment of EVPN/VXLAN fabrics supporting GPU clusters and AI training environments.
Drive network scalability, reliability, performance, and automation initiatives across global infrastructure.
Design and optimize low-latency, high-throughput networks supporting RDMA/RoCE workloads.
Develop network automation solutions using Python, Ansible, Terraform/OpenTofu, and CI/CD pipelines.
Define network standards, operational processes, observability frameworks, and reliability best practices.
Collaborate with infrastructure, cloud, systems, and AI engineering teams on strategic architecture initiatives.
Lead troubleshooting and performance optimization for large-scale production environments.
Mentor engineers and contribute to technical leadership, documentation, and architecture reviews.

Requirements

15+ years of experience in Network Architecture, Network Engineering, or Network Reliability Engineering.
Deep expertise with:
  BGP, OSPF, IS-IS, MPLS
  EVPN/VXLAN
  Data Center Networking
  WAN and Backbone Architecture
  AI/ML Infrastructure Networking
  Network Performance and Capacity Planning
Strong experience with Juniper, Arista, Cisco, and multi-vendor environments.
Hands-on experience with Linux administration and network automation.
Strong scripting/programming skills in Python, Go, Bash, or similar languages.
Experience with Infrastructure-as-Code and automation frameworks (Ansible, Terraform/OpenTofu, Pulumi).
Experience building highly available, scalable cloud and data center networks.

Preferred Qualifications

Experience supporting AI training clusters, GPU fabrics, or HPC environments.
Knowledge of PTP, RDMA, RoCEv2, and low-latency networking technologies.
Experience with network observability platforms such as Kentik, ThousandEyes, Zabbix, Nagios, or similar.
Exposure to AWS, Google Cloud Platform, and hybrid cloud networking architectures.
Experience leading architecture reviews and cross-functional infrastructure programs.

Nice to Have

Experience with large-scale hyperscaler environments.
Participation in industry organizations such as NANOG, RIPE, or Internet Society.
Background supporting multi-terabit AI or research infrastructure environments.
