GCP Architect
Requirements
- 8-12 years of infrastructure / platform engineering, with 3+ years as a principal-level technical authority on a production cloud platform
- Deep GCP expertise - you have designed GCP organizations, multi-tenant GKE environments, VPC architectures, and IAM models for production workloads; you can defend design decisions in an Org Policy discussion as readily as a Terraform code review
- Terraform mastery - multi-module design patterns, per-tenant factory modules, complex for_each and dynamic blocks, state isolation strategy, module versioning; you have written Terraform that other engineers build on
- ArgoCD at scale - ApplicationSets, multi-cluster agent/pull model, promotion gates, RBAC, HA; you have operated ArgoCD across 20+ clusters, not just installed it
- Multi-tenant networking depth - CIDR management, IPAM tooling, VPC peering/PSC design, CGNAT or equivalent overlapping-address problem solving; you have solved customer CIDR conflicts at scale
- Security architecture - VPC Service Controls, Binary Authorization, Cloud KMS/CMEK, Workload Identity, IAP zero-trust, least-privilege IAM; you have designed the security model for a compliance-audited SaaS platform
- Distributed systems intuition - you can evaluate trade-offs between Consul/Vault on VMs vs. containerized, between Elasticsearch and OpenSearch, between service mesh and no service mesh, and produce a written rationale that holds up under scrutiny
- Strong written communication: architecture documents, decision records, and design reviews are your primary output alongside code
Strong Plus:
- HA VPN / OpenVPN architecture with per-tenant PKI at scale (cert lifecycle, rotation automation, GCP CA Service)
- EU Sovereign Cloud experience: GCP Assured Workloads, AWS EU Sovereign, Azure EU, SecNumCloud, BSI C5, GDPR DPA design
- HYOK/BYOK with external KMS (Thales CipherTrust, HSM) - hands-on architectural experience, not just theoretical
- Temporal.io workflow architecture for multi-step provisioning orchestration
- Experience building agentic or AI-augmented infrastructure pipelines
- SOC2 Type II, ISO 27001, or PCI-DSS architecture-to-controls mapping (you've been in the audit room)
- Elasticsearch / OpenSearch cluster architecture at production scale
- Google Cloud Professional Cloud Architect certification (required within 90 days if not already held)
Benefits & conditions
Own the GCP organization design end-to-end: folder hierarchy (Platform-Infrastructure, Customer-Hosting/Americas/EMEA/APAC, Engineering, OF, SE, PM), project naming conventions, IAM group model (sav-eic-* Google Groups with least-privilege role bindings), and Organization Policy framework (region constraints, external IP restrictions, SA key prevention, domain-restricted sharing, uniform bucket access)
Define and document the per-customer tenant isolation model: dedicated GCP project + VPC + GKE cluster per environment (prod/nonprod), giving full billing, permission, and operational isolation. Own trade-off analysis between this model and namespace-level isolation as customer count grows
Resolve the critical open gaps in the current architecture: IPAM tooling selection, ArgoCD sharding strategy at 50-100+ clusters, PKI strategy for SC2, and Well-Architected Framework compliance gaps between MVP and production paths
Networking & CGNAT Architecture
Own the CGNAT (RFC 6598, 100.64.0.0/10) per-tenant addressing design: /23 CIDR allocation framework (App /24, Web /25, Mgmt /26, GKE Master /28, PSA /24), IPAM tooling selection and integration into the provisioning pipeline
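As a rough illustration of the arithmetic behind this addressing design, the sketch below carves one /23 per tenant out of 100.64.0.0/10 and splits it into the App, Web, Mgmt, and GKE Master subnets. The function names, the sequential tenant index, and the largest-first ordering are assumptions made for the example, not the platform's actual IPAM logic. The listed sizes add up to 720 addresses, more than the 512 in a /23, so the sketch also assumes the PSA /24 is drawn from a separate pool.

```python
import ipaddress

# RFC 6598 shared address space named in the posting.
CGNAT = ipaddress.IPv4Network("100.64.0.0/10")
TENANT_PREFIX = 23

# Subnet sizes from the posting. PSA /24 is left out on the assumption
# that it comes from a separate pool (it would not fit in the same /23).
TENANT_LAYOUT = {"app": 24, "web": 25, "mgmt": 26, "gke_master": 28}


def tenant_block(index: int) -> ipaddress.IPv4Network:
    """Return the index-th /23 inside the CGNAT range (hypothetical scheme)."""
    capacity = 2 ** (TENANT_PREFIX - CGNAT.prefixlen)  # 8192 tenant blocks
    if not 0 <= index < capacity:
        raise ValueError(f"tenant index must be in [0, {capacity})")
    block_size = 2 ** (32 - TENANT_PREFIX)  # 512 addresses per /23
    base = int(CGNAT.network_address) + index * block_size
    return ipaddress.IPv4Network((base, TENANT_PREFIX))


def carve(block: ipaddress.IPv4Network) -> dict:
    """Split a tenant block into subnets, largest first to keep alignment."""
    cursor = int(block.network_address)
    plan = {}
    for name, prefix in sorted(TENANT_LAYOUT.items(), key=lambda kv: kv[1]):
        subnet = ipaddress.IPv4Network((cursor, prefix))
        if not subnet.subnet_of(block):
            raise ValueError(f"{name} /{prefix} does not fit in {block}")
        plan[name] = subnet
        cursor += subnet.num_addresses
    return plan


if __name__ == "__main__":
    for name, subnet in carve(tenant_block(1)).items():
        print(f"{name:<11} {subnet}")
```

Allocating largest-first means every subnet starts on a boundary that matches its own size, so no padding is needed between ranges; in this layout 48 addresses at the top of each /23 stay unallocated.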
Design the full Connect 2.0 (SC2) architecture: HA OpenVPN topology (primary + secondary VM per tenant, different zones, CGNAT side), PKI strategy (GCP CA Service Root CA + Issuing CA), per-tenant certificate lifecycle (generation, rotation, expiry alerting, revocation), Firestore tenant config schema, Cloud Function orchestration (connect-health-probe, connect-failover, connect-failback), and .ovpn dual-endpoint bundle design
Define VPC routing logic: custom node tags - active SC2 VM for RFC-1918 ranges, pod-traf