Blue-Green Operator-less Istio Migration
Upon migration to Big Bang v3.0 in SmoothGlue v6.18.0, operatorless Istio becomes mandatory and the migration will happen immediately. If a blue-green migration to operatorless Istio is desired, it should be performed on SmoothGlue v6.17 prior to upgrading to v6.18.
All the information presented in the Operatorless Istio Migration guide is relevant to this guide, so read that guide before attempting a blue-green migration.
At a high level, a blue-green migration to operatorless Istio is intended to minimize the downtime incurred during the migration process and to provide an easier pathway for rolling back the changes if necessary. This is done by deploying a set of new Ingress Gateways and load balancers alongside the existing ones, and ensuring that all Virtual Services support both the old and new Ingress Gateways during the migration. The parallel traffic pathway can then be inspected and verified prior to cutting over production traffic.
Broadly, the following steps should be taken to effect the migration:
- Modify or mutate all Virtual Services to support both the old Ingress Gateways and the new Ingress Gateways.
- Create parallel Ingress Gateways by adding a Helm keep annotation to the existing resources, then deploying the new Ingress Gateways on a new set of Node Ports.
- Create new load balancer target groups. If using Application Load Balancers (ALBs), update listener rules to point to the new target groups. If using Network Load Balancers (NLBs), all traffic must be pointed to the new target group at once.
Virtual Service Mutation
In order to migrate applications between Ingress Gateways in a controlled fashion, all Virtual Services should be configured to support both the old and new Ingress Gateways simultaneously. This poses a problem for both platform tools and for user applications:
- For platform tools, the Big Bang umbrella chart only provides support for configuring platform tools with a single Ingress Gateway. In order to support multiple gateways, overrides must be provided to individual Helm charts.
- In order to update all customer applications, overrides must be set for ArgoCD applications (if the application supports it), or the applications' Helm charts must be updated with the new Ingress Gateway name.
In order to ease the operational burden of updating Virtual Services, a mutating Kyverno Cluster Policy can be created to automatically mutate the ingress gateways on Virtual Services. For example, the following Cluster Policy will update VirtualServices which are configured with only one of the istio-system/public or istio-gateway/public-ingressgateway Gateways to also add the other one. Likewise, for the istio-system/passthrough and istio-gateway/passthrough-ingressgateway Gateways, the policy will add the other if one is missing.
mutate-istio-virtual-service-gateways Cluster Policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-istio-virtual-service-gateways
spec:
  rules:
    - name: mutate-virtual-service-istio-gateway-public-ingressgateway
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-system/public"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-gateway/public-ingressgateway"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-gateway/public-ingressgateway
    - name: mutate-virtual-service-istio-system-public
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-gateway/public-ingressgateway"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-system/public"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-system/public
    - name: mutate-virtual-service-istio-gateway-passthrough-ingressgateway
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-system/passthrough"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-gateway/passthrough-ingressgateway"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-gateway/passthrough-ingressgateway
    - name: mutate-virtual-service-istio-system-passthrough
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-gateway/passthrough-ingressgateway"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-system/passthrough"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-system/passthrough
This manifest can be directly applied to the cluster using kubectl apply.
Extend this Cluster Policy as necessary if using other custom Gateways.
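To make the pairing behaviour of the policy easier to reason about, the following sketch mirrors its logic in plain shell for a list of gateway names. It never contacts a cluster. The function names `paired_gateway` and `mutated_gateways` are hypothetical helpers invented for this illustration, and the order in which entries are appended may differ from the order Kyverno applies its rules.

```shell
#!/bin/bash
# Illustrative only: mirrors the pairing logic of the Cluster Policy above.
# It works on a list of gateway names and does not talk to a cluster.

# Print the paired gateway for a known gateway, or nothing for any other name.
paired_gateway() {
  case "$1" in
    istio-system/public) echo "istio-gateway/public-ingressgateway" ;;
    istio-gateway/public-ingressgateway) echo "istio-system/public" ;;
    istio-system/passthrough) echo "istio-gateway/passthrough-ingressgateway" ;;
    istio-gateway/passthrough-ingressgateway) echo "istio-system/passthrough" ;;
  esac
}

# Print the gateway list (space-separated) as it would look after mutation.
mutated_gateways() {
  local result="$*" gw pair
  for gw in "$@"; do
    pair=$(paired_gateway "$gw")
    # Append the pair only when one exists and it is not already listed.
    if [ -n "$pair" ] && ! printf '%s\n' $result | grep -qxF "$pair"; then
      result="$result $pair"
    fi
  done
  echo "$result"
}
```

For example, `mutated_gateways istio-system/public` prints both public gateways, while a custom gateway such as `my-ns/custom` is returned unchanged, which is why the policy needs extending for custom Gateways.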
The Cluster Policy mutation is only applied when Virtual Services are created or updated. To force the mutation to be applied immediately, existing Virtual Services can be updated in some way, for example with a change that increments their metadata.generation.
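One way to trigger an update on every existing Virtual Service is sketched below. It assumes that any admission-level update, including an annotation change, is enough for the policy to run; verify this on a single Virtual Service first. The annotation key `example.com/gateway-mutation-trigger` and the function `build_touch_commands` are hypothetical names chosen for this illustration. The function only prints commands so they can be reviewed before being run.

```shell
#!/bin/bash
# Sketch: print (not run) one `kubectl annotate` command per VirtualService.
# The annotation key is hypothetical; any harmless update should serve.
TRIGGER_ANNOTATION="example.com/gateway-mutation-trigger"

# Reads "<namespace> <name>" lines on stdin; $1 is the annotation value.
build_touch_commands() {
  local stamp="$1" ns name
  while read -r ns name; do
    # Skip blank lines.
    [ -n "$ns" ] || continue
    printf 'kubectl annotate virtualservices.networking.istio.io -n %s %s --overwrite %s=%s\n' \
      "$ns" "$name" "$TRIGGER_ANNOTATION" "$stamp"
  done
}

# Possible usage against a live cluster (left commented out on purpose):
# kubectl get virtualservices.networking.istio.io -A --no-headers \
#   -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#   | build_touch_commands "$(date +%s)"
```

Printing the commands rather than executing them keeps the step reviewable; the output can be piped to `sh` once it looks correct.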
Parallel Ingress Gateway Creation
In the Istio Operator deployment paradigm, Ingress Gateways are located in the istio-system namespace, but the operatorless Istio migration moves the gateways to individual Helm charts under the istio-gateway namespace. By default, the MIGRATE_ISTIO Zarf flag will remove the existing Istio Gateways, but it is possible to retain the existing Ingress Gateways using Helm annotations. First, identify all the existing Gateways on the cluster using this command:
$ kubectl get gateways.networking.istio.io -A
NAMESPACE      NAME          AGE
istio-system   passthrough   6h
istio-system   public        6h
Additionally, the TLS certificates used by the Ingress Gateways must also be retained. Identify these using the following command:
$ kubectl get gateways.networking.istio.io -oyaml -A | yq '[.items[] | {"name": .metadata.name, "namespace": .metadata.namespace, "secrets": [.spec.servers[].tls.credentialName | select(.)]}]'
- name: passthrough
  namespace: istio-system
  secrets: []
- name: public
  namespace: istio-system
  secrets:
    - public-cert
Use the following script to add the helm.sh/resource-policy=keep annotation to the existing Gateways and TLS Secrets, so that they are not removed when the Istio Controlplane Helm chart is uninstalled. Extend the script as necessary if using any custom Gateways or certificates.
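#!/bin/bash
# bash is required: `read -d` is not available in POSIX sh.

# Gateways to retain, one name per line.
read -r -d '' gateways <<'EOF'
passthrough
public
EOF
for gateway in $gateways; do
  # The fully qualified resource name avoids ambiguity with other Gateway kinds.
  kubectl annotate gateways.networking.istio.io -n istio-system "$gateway" --overwrite helm.sh/resource-policy=keep
done

# TLS Secrets to retain, one name per line.
read -r -d '' certs <<'EOF'
public-cert
EOF
for cert in $certs; do
  kubectl annotate secret -n istio-system "$cert" --overwrite helm.sh/resource-policy=keep
done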
At this point, if the MIGRATE_ISTIO flag is passed to Zarf during the package deploy, the existing Ingress Gateways will not be removed, but the new set of Ingress Gateways will not be created due to conflicting Node Ports.
The new Ingress Gateways can be created on a new set of Node Ports by adding the following values to the bigbang-values.yaml file during the package deploy:
bigbang-values.yaml for public Ingress Gateway NodePort Modification (with NLB)
istioGateway:
  values:
    gateways:
      public:
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 30022 # This is set to port 30021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 30081 # This is set to port 30080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 30444 # This is set to port 30443 by default.
                targetPort: 8443
                name: https
                protocol: TCP
If using an ALB, the bigbang-values.yaml file must also set a wildcard value for .istioGateway.values.gateways.public.gateway.servers[].hosts, as ALBs do not support SNI.
bigbang-values.yaml for public Ingress Gateway NodePort Modification (with ALB)
istioGateway:
  values:
    gateways:
      public:
        # ALBs don't support SNI
        gateway:
          servers:
            - hosts:
                - '*'
              port:
                name: http
                number: 8080
                protocol: HTTP
              tls:
                httpsRedirect: true
            - hosts:
                - '*'
              port:
                name: https
                number: 8443
                protocol: HTTPS
              tls:
                credentialName: public-cert
                mode: SIMPLE
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 30022 # This is set to port 30021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 30081 # This is set to port 30080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 30444 # This is set to port 30443 by default.
                targetPort: 8443
                name: https
                protocol: TCP
In addition, if deploying a passthrough Ingress Gateway, modify its NodePort by adding the following to the bigbang-values.yaml file:
bigbang-values.yaml for passthrough Ingress Gateway NodePort Modification
istioGateway:
  values:
    gateways:
      passthrough:
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 32022 # This is set to port 32021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 32081 # This is set to port 32080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 32444 # This is set to port 32443 by default.
                targetPort: 8443
                name: https
                protocol: TCP
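Because the new Ingress Gateways fail to deploy when their Node Ports collide with the existing ones, a quick sanity check of the chosen ports may be worthwhile before deploying. The sketch below is one way to do that in shell. The function name `check_node_ports` is hypothetical, and the range check assumes the cluster uses the default Kubernetes NodePort range of 30000-32767.

```shell
#!/bin/bash
# Sketch: report any new NodePort that is also an old NodePort, or that falls
# outside the default Kubernetes NodePort range (30000-32767).

# $1: space-separated old ports, $2: space-separated new ports.
# Prints one line per problem and returns non-zero if any problem was found.
check_node_ports() {
  local old_ports="$1" new_ports="$2" port old status=0
  for port in $new_ports; do
    if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
      echo "out of range: $port"
      status=1
    fi
    for old in $old_ports; do
      if [ "$port" -eq "$old" ]; then
        echo "conflict: $port"
        status=1
      fi
    done
  done
  return "$status"
}
```

Calling `check_node_ports "30021 30080 30443 32021 32080 32443" "30022 30081 30444 32022 32081 32444"` with the defaults and the values used in this guide prints nothing and returns success.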
Once the bigbang-values.yaml file has been modified with these values, deploy the SmoothGlue package while passing the BIGBANG_VALUES_FILE and MIGRATE_ISTIO Zarf variables in order to deploy the new Ingress Gateways. See Upgrade SmoothGlue for more information.
Once the new Ingress Gateways have been deployed, test traffic can be sent to them without modifying the load balancer configuration by overriding DNS resolution using curl:
VIRTUAL_SERVICE_HOSTNAME=virtual-service.hostname.domain
GATEWAY_HTTPS_NODEPORT=30444
NODE_IP=$(kubectl get nodes -oyaml | yq '.items[].status.addresses[] | select(.type=="InternalIP") | .address' | head -n 1)
curl -kv https://$VIRTUAL_SERVICE_HOSTNAME:$GATEWAY_HTTPS_NODEPORT --resolve $VIRTUAL_SERVICE_HOSTNAME:$GATEWAY_HTTPS_NODEPORT:$NODE_IP
Run this command from the CLI on one of the cluster's nodes, or modify the Security Group rules for the cluster to allow access to the NodePort directly.
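When several Virtual Services need checking, the same curl invocation can be generated per hostname. The following is a minimal sketch under the assumption that every hostname is served by the same gateway NodePort; `build_curl_checks` is a hypothetical helper, and it only prints commands so that they can be reviewed or piped to `sh`.

```shell
#!/bin/bash
# Sketch: print one curl command per hostname, each pinned to a node IP and
# NodePort with --resolve. Each command reports only the HTTP status code.

# $1: node IP, $2: HTTPS NodePort, remaining arguments: Virtual Service hostnames.
build_curl_checks() {
  local node_ip="$1" node_port="$2" host
  shift 2
  for host in "$@"; do
    printf 'curl -ksS -o /dev/null -w "%%{http_code}\\n" https://%s:%s --resolve %s:%s:%s\n' \
      "$host" "$node_port" "$host" "$node_port" "$node_ip"
  done
}
```

For instance, `build_curl_checks "$NODE_IP" 30444 app-one.example.com app-two.example.com` emits two commands, where both hostnames are placeholders.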
Load Balancer Modification
The load balancers used by the cluster's existing Ingress Gateways will affect migration efforts significantly. Network Load Balancers (NLBs) only deal in TCP level traffic and have no knowledge of HTTP traffic, whereas Application Load Balancers (ALBs) are able to granularly direct HTTP traffic for hosts to different target groups. As such, NLBs are only able to forward incoming TCP traffic on a particular port to a particular target group, meaning that all traffic for the Ingress Gateway must be cut over simultaneously. In contrast, ALBs allow more granular migration of hosts from one Ingress Gateway to another on a per-listener rule level.
In either case, new target groups must be created to point to the new Ingress Gateway, then the security group rules for the cluster nodes must be modified to allow ingress traffic for both the old and new Node Ports. If using the SmoothGlue IaC, it is possible to have the IaC create some of these resources for you, but some aspects of the migration will require manual intervention, such as overriding the cluster security group rules to allow ingress from both sets of Node Ports simultaneously. Modifying the resources manually may therefore be less error prone in this situation.
Once the target groups and security group rules have been configured, for an NLB, the traffic for the entire NLB must be directed to the new target group at once. However, for an ALB, the traffic can be migrated to the new target group on a per-listener rule basis, allowing more gradual migration of individual Virtual Services.
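If the load balancer changes are made manually with the AWS CLI, the commands might look roughly like the ones this sketch prints. It is not the SmoothGlue IaC's method. All function names are hypothetical, every name, ID, and ARN is a caller-supplied placeholder, and the health check settings assume the gateway's status port answers HTTP on /healthz/ready. Target registration is left out because it depends on how the nodes are managed.

```shell
#!/bin/bash
# Sketch only: these functions print AWS CLI commands for review; they do not
# run them. Every name, ID, and ARN passed in is a placeholder.

# New NLB target group for the HTTPS NodePort, health-checked via the status NodePort.
# $1: target group name, $2: VPC ID, $3: HTTPS NodePort, $4: status NodePort
print_create_target_group() {
  printf 'aws elbv2 create-target-group --name %s --protocol TCP --port %s --vpc-id %s --target-type instance --health-check-protocol HTTP --health-check-port %s --health-check-path /healthz/ready\n' \
    "$1" "$3" "$2" "$4"
}

# NLB: repoint the whole listener at once.
# $1: listener ARN, $2: new target group ARN
print_nlb_cutover() {
  printf 'aws elbv2 modify-listener --listener-arn %s --default-actions Type=forward,TargetGroupArn=%s\n' "$1" "$2"
}

# ALB: repoint a single listener rule, leaving the other rules on the old target group.
# $1: listener rule ARN, $2: new target group ARN
print_alb_rule_cutover() {
  printf 'aws elbv2 modify-rule --rule-arn %s --actions Type=forward,TargetGroupArn=%s\n' "$1" "$2"
}
```

The split between `print_nlb_cutover` and `print_alb_rule_cutover` reflects the difference described above: an NLB listener moves as a unit, whereas ALB rules can be moved one at a time.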
Pre-Deploy Checks for SmoothGlue v6.18.0
- Once SmoothGlue v6.18.0 is deployed, rolling back to the Istio Operator is no longer possible, as the Big Bang 3.0.0 values schema validation will fail if the istio or istioOperator top-level keys are present.
  - Ensure that the top-level istio and istioOperator keys have been removed from the bigbang-values.yaml and bigbang-secrets.yaml files after migrating the relevant configuration to the istiod or istioGateway keys.
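The check for leftover top-level keys can be scripted. The sketch below uses a plain-text match rather than a YAML parser, so it assumes the values files are written in block style with top-level keys starting in the first column. `check_removed_keys` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: fail if a values file still has a top-level `istio:` or
# `istioOperator:` key. Matches unindented keys only, so `istiod:` and
# `istioGateway:` are not flagged.

# $1: path to a values file such as bigbang-values.yaml.
check_removed_keys() {
  local file="$1"
  if grep -nE '^(istio|istioOperator):' "$file"; then
    echo "remove the keys listed above from $file" >&2
    return 1
  fi
  return 0
}
```

Running `check_removed_keys bigbang-values.yaml && check_removed_keys bigbang-secrets.yaml` before the v6.18.0 deploy gives an early warning; an encrypted secrets file would need decrypting first.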
Post Migration Steps
- Before removing the Kyverno Cluster Policy mutating Virtual Service Gateways, ensure that all Virtual Services have been updated so that the Cluster Policy is not needed. This can be done by querying the Policy Reports on the cluster using a kubectl command like this:
  $ kubectl get policyreport -A -oyaml | yq '.items[] | select(.scope.kind == "VirtualService") | select(.results[].result != "skip") | {"name": .scope.name, "results": [.results[] | pick(["rule", "result"])]}'
  name: python-example
  results:
    - rule: mutate-virtual-service-istio-gateway-passthrough-ingressgateway
      result: skip
    - rule: mutate-virtual-service-istio-gateway-public-ingressgateway
      result: pass
    - rule: mutate-virtual-service-istio-system-passthrough
      result: skip
    - rule: mutate-virtual-service-istio-system-public
      result: skip
  - The output from this command can be reduced by removing the rules mutate-virtual-service-istio-system-public and mutate-virtual-service-istio-system-passthrough from the Cluster Policy, since these rules should no longer have any effect.
- Remove the Gateways and certificate Secrets in the istio-system namespace.
- Reconcile the SmoothGlue IaC with the modifications made to the load balancer configuration. Ensure that the following values are set in your env.hcl file:
  locals {
    cluster_inputs = {
      passthrough_ingress_gateway_http_port   = 32081
      passthrough_ingress_gateway_https_port  = 32444
      passthrough_ingress_gateway_status_port = 32022
      public_ingress_gateway_http_port        = 30081
      public_ingress_gateway_https_port       = 30444
      public_ingress_gateway_status_port      = 30022
    }
  }
  - Review the terragrunt plan output thoroughly, and import any manually created Terraform resources as appropriate before applying.