Lesson 49: Multi-Cluster Security — Identity and Policy Governance Across a Fleet
What We’re Building Today
A multi-cluster RBAC federation model using aggregated ClusterRoles, fleet-wide role bindings, and a centralized identity plane
A policy propagation pipeline that synchronizes OPA/Gatekeeper ConstraintTemplates and Kyverno policies across clusters without drift
A cross-cluster audit trail built on a shared log aggregation layer (Vector for collection, Loki for storage) that captures authorization decisions from every API server’s audit log
A workload identity bridge linking Kubernetes ServiceAccounts to a cloud IAM provider (AWS IRSA / GCP Workload Identity) uniformly across regions
Why This Matters
The moment you move beyond a single cluster, your security posture starts to fragment, usually without anyone noticing. RBAC that was cleanly scoped on one cluster becomes a governance nightmare once it is replicated by hand across twelve. Engineers copy-paste ClusterRoleBindings into new clusters during provisioning, get them wrong at 2 AM during an incident, and production access patterns drift away from what was audited. At Spotify, this problem drove investment in a centralized RBAC reconciliation controller long before that approach became a common pattern in the ecosystem. Netflix’s Paved Road infrastructure mandates that no human modifies cluster-scoped RBAC objects directly: every change flows through a GitOps-controlled pipeline that enforces approval workflows and records audit provenance.
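The pattern we’re implementing today addresses the problems that trip up engineering organizations at the 10–50 cluster scale: identity fragmentation, policy drift, and the lack of a unified authorization audit surface. These aren’t theoretical risks. In the 2019 Capital One breach, an attacker used server-side request forgery against a misconfigured web application firewall to obtain temporary credentials for an over-permissioned IAM role, then used those credentials to copy data out of S3. A single misconfiguration combined with overly broad cloud IAM permissions turned one foothold into wide data access.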
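Policy drift comes down to comparing what the Git source of truth says should exist with what each cluster actually runs. The sketch below illustrates that comparison under simplifying assumptions: policies are plain dicts keyed by name, the manifest bodies are hypothetical placeholders rather than real Gatekeeper or Kyverno schemas, and `fingerprint`, `detect_drift`, `fleet_drift`, and `DriftReport` are hypothetical helpers, not part of any GitOps tool or policy engine.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(manifest: dict) -> str:
    """Hash a manifest after canonicalizing key order, so equal content hashes equally."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class DriftReport:
    missing: list = field(default_factory=list)   # declared in Git, absent from the cluster
    extra: list = field(default_factory=list)     # present on the cluster, not declared in Git
    modified: list = field(default_factory=list)  # present in both, but content differs

    @property
    def in_sync(self) -> bool:
        return not (self.missing or self.extra or self.modified)


def detect_drift(desired: dict, observed: dict) -> DriftReport:
    """Compare one cluster's observed policies against the desired set (hypothetical helper)."""
    report = DriftReport()
    report.missing = sorted(desired.keys() - observed.keys())
    report.extra = sorted(observed.keys() - desired.keys())
    report.modified = sorted(
        name
        for name in desired.keys() & observed.keys()
        if fingerprint(desired[name]) != fingerprint(observed[name])
    )
    return report


def fleet_drift(desired: dict, clusters: dict) -> dict:
    """Run detect_drift for every cluster in the fleet, keyed by cluster name."""
    return {name: detect_drift(desired, observed) for name, observed in clusters.items()}


if __name__ == "__main__":
    # Hypothetical, simplified manifests standing in for real policy objects.
    desired = {
        "k8srequiredlabels": {"kind": "ConstraintTemplate", "spec": {"version": 1}},
        "disallow-privileged": {"kind": "ClusterPolicy", "spec": {"validationFailureAction": "Enforce"}},
    }
    clusters = {
        "us-east-1": dict(desired),
        "eu-west-1": {
            "k8srequiredlabels": {"kind": "ConstraintTemplate", "spec": {"version": 2}},
            "hotfix-allow-all": {"kind": "ClusterPolicy", "spec": {}},
        },
    }
    for cluster, report in fleet_drift(desired, clusters).items():
        print(cluster, report.in_sync, report.missing, report.extra, report.modified)
```

Run as a script, this reports `us-east-1` as in sync, and flags `eu-west-1` as missing `disallow-privileged`, carrying an unmanaged `hotfix-allow-all`, and running a modified `k8srequiredlabels`. Separating "missing", "extra", and "modified" matters because each calls for a different remediation (create, delete, or overwrite). Against a live cluster, server-managed fields such as `metadata.resourceVersion`, `metadata.uid`, and `status` would need to be stripped before hashing, or every object would appear modified.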


