Network Engineer with SRE
Job description
- We are seeking a Network SRE to ensure the reliability, scalability, and performance of cloud and hybrid network platforms.
- This role applies SRE principles to networking by shifting from manual network operations to automated, observable, and resilient network services.
- The ideal candidate is a network engineer who thinks like a software engineer and SRE.
Key Responsibilities
Network Reliability Engineering
Define SLIs, SLOs, and Error Budgets for network services.
Design networks for:
- High availability
- Fault tolerance
- Low latency
- Predictable performance
Improve network reliability while reducing operational toil.
Cloud & Hybrid Networking
Architect and operate AWS networking:
- VPCs, Subnets, Route Tables
- Transit Gateway
- NAT, IGW
- PrivateLink, VPC Endpoints
Design hybrid connectivity:
- VPN
- Direct Connect
Support multi-account and multi-region architectures.
Network Observability & Monitoring
Build deep network observability using:
- VPC Flow Logs
- CloudWatch
- Datadog
- Prometheus / Grafana
Analyze packet loss, latency, and throughput.
Implement proactive alerting based on SLOs.
Correlate network signals with application performance.
Automation & Infrastructure as Code
Automate network provisioning and changes using:
- Terraform / CloudFormation
Implement CI/CD for network changes.
Reduce manual configuration and human error.
Version-control network definitions.
Incident Response & Troubleshooting
Lead network-related incident response.
Perform deep root-cause analysis for:
- Packet drops
- Routing issues
- DNS failures
- Load balancer degradation
Participate in the on-call rotation and in post-incident reviews.
Drive permanent fixes rather than workarounds.
Security & Traffic Management
Design and enforce:
- Network segmentation
- Zero-Trust principles
- Firewall rules (Security Groups, NACLs)
Implement secure ingress/egress patterns.
Support DDoS protection (AWS Shield, WAF).
Work with Security teams on audits and remediation.
Performance & Capacity Planning
Conduct traffic modeling and capacity forecasting.
Tune load balancers (ALB, NLB).
Optimize routing and failover strategies.
Requirements
- Strong networking fundamentals (TCP/IP, DNS, BGP, routing)
- AWS networking expertise
- SRE concepts & practices
- Network observability & monitoring
- Infrastructure as Code
- Production incident handling experience