Version: 6.20.1

Blue-Green Operator-less Istio Migration

warning

Upon migration to Big Bang v3.0 in SmoothGlue v6.18.0, operatorless Istio becomes mandatory and the migration will happen immediately. If a blue-green migration to operatorless Istio is desired, it should be performed on SmoothGlue v6.17 prior to upgrading to v6.18.

info

All of the information in the Operatorless Istio Migration guide is relevant here, so read that guide before attempting a blue-green migration.

At a high level, a blue-green migration to operatorless Istio minimizes downtime during the migration and provides an easier path to rolling back the changes if necessary. This is done by deploying a new set of Ingress Gateways and load balancers alongside the existing ones and ensuring that all Virtual Services support both the old and new Ingress Gateways during the migration. The parallel traffic path can then be inspected and verified before production traffic is cut over.

Broadly, the following steps should be taken to effect the migration:

  • Modify or mutate all Virtual Services to support both the old Ingress Gateways and the new Ingress Gateways.
  • Create parallel Ingress Gateways by adding a Helm keep annotation to the existing resources, then deploying the new Ingress Gateways on a new set of Node Ports.
  • Create new load balancer target groups. If using Application Load Balancers (ALBs), update listener rules to point to the new target groups. If using Network Load Balancers (NLBs), all traffic must be pointed to the new target group at once.

Virtual Service Mutation

In order to migrate applications between Ingress Gateways in a controlled fashion, all Virtual Services should be configured to support both the old and new Ingress Gateways simultaneously. This poses a problem for both platform tools and for user applications:

  • For platform tools, the Big Bang umbrella chart only supports configuring each tool with a single Ingress Gateway. To support multiple gateways, overrides must be provided to the individual Helm charts.
  • For customer applications, overrides must be set on the ArgoCD applications (if the application supports it), or the applications' Helm charts must be updated with the new Ingress Gateway name.

To ease the operational burden of updating Virtual Services, a mutating Kyverno Cluster Policy can automatically add the missing Ingress Gateway to Virtual Services. For example, the following Cluster Policy updates any VirtualService that is configured with only one of the istio-system/public or istio-gateway/public-ingressgateway Gateways so that it references both. It does the same for the istio-system/passthrough and istio-gateway/passthrough-ingressgateway pair.

mutate-istio-virtual-service-gateways Cluster Policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-istio-virtual-service-gateways
spec:
  rules:
    - name: mutate-virtual-service-istio-gateway-public-ingressgateway
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-system/public"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-gateway/public-ingressgateway"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-gateway/public-ingressgateway
    - name: mutate-virtual-service-istio-system-public
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-gateway/public-ingressgateway"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-system/public"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-system/public
    - name: mutate-virtual-service-istio-gateway-passthrough-ingressgateway
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-system/passthrough"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-gateway/passthrough-ingressgateway"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-gateway/passthrough-ingressgateway
    - name: mutate-virtual-service-istio-system-passthrough
      match:
        all:
          - resources:
              kinds:
                - networking.istio.io/*/VirtualService
      preconditions:
        all:
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AnyIn
            value: ["istio-gateway/passthrough-ingressgateway"]
          - key: "{{ request.object.spec.gateways[] | [] }}"
            operator: AllNotIn
            value: ["istio-system/passthrough"]
      mutate:
        patchesJson6902: |-
          - op: add
            path: /spec/gateways/-
            value: istio-system/passthrough

This manifest can be directly applied to the cluster using kubectl apply. Extend this Cluster Policy as necessary if using other custom Gateways.
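To make the pairing behaviour concrete, the following is a small illustrative model of the four rules, written as plain shell functions that operate on a space-separated list of gateway names. It is only a sketch for reasoning about the expected result: the function names add_partner and mutate_gateways are hypothetical, the code does not contact the cluster, and it is not how Kyverno itself evaluates the policy.

```shell
#!/bin/sh
# Illustrative model only: mirrors the "add the missing partner" logic of the
# Cluster Policy on a space-separated list of gateway names.

# add_partner LIST TRIGGER PARTNER
# Appends PARTNER to LIST when TRIGGER is present and PARTNER is absent.
add_partner() {
  case " $1 " in
    *" $2 "*)
      case " $1 " in
        *" $3 "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1 $3" ;;
      esac
      ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# mutate_gateways GATEWAY...
# Applies the four rules in the same order as the Cluster Policy.
mutate_gateways() {
  list="$*"
  list="$(add_partner "$list" istio-system/public istio-gateway/public-ingressgateway)"
  list="$(add_partner "$list" istio-gateway/public-ingressgateway istio-system/public)"
  list="$(add_partner "$list" istio-system/passthrough istio-gateway/passthrough-ingressgateway)"
  list="$(add_partner "$list" istio-gateway/passthrough-ingressgateway istio-system/passthrough)"
  printf '%s\n' "$list"
}
# Example:
#   mutate_gateways istio-system/public
```

A Virtual Service that already lists both gateways of a pair, or that uses an unrelated gateway, comes out unchanged, which is why the policy can stay in place for the whole migration.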

The Cluster Policy mutation is only applied when Virtual Services are created or updated, so if desired, the metadata.generation can be incremented on existing Virtual Services in order to force the mutation to be applied immediately.
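As one possible way to trigger an update on every existing Virtual Service, the sketch below writes a timestamp annotation to each one. It assumes that an annotation-only update is enough for the admission webhook to re-evaluate the Cluster Policy; the function name touch_virtual_services and the annotation key example.com/gateway-remutate are hypothetical placeholders, not part of SmoothGlue.

```shell
#!/bin/sh
# Sketch: issue a harmless update against every VirtualService so that the
# mutating Cluster Policy is evaluated again for existing resources.
# The default annotation key is a hypothetical placeholder.
touch_virtual_services() { # $1 = annotation key (optional)
  annotation_key="${1:-example.com/gateway-remutate}"
  stamp="$(date +%s)"
  # Print "<namespace> <name>" for each VirtualService, then annotate each one.
  kubectl get virtualservices.networking.istio.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
    while read -r namespace name; do
      [ -n "$name" ] || continue
      kubectl annotate virtualservices.networking.istio.io -n "$namespace" "$name" \
        --overwrite "${annotation_key}=${stamp}"
    done
}
# Example:
#   touch_virtual_services
```

After running it, inspect a few Virtual Services to confirm that spec.gateways now lists both the old and the new Gateway.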

Parallel Ingress Gateway Creation

In the Istio Operator deployment paradigm, Ingress Gateways are located in the istio-system namespace, but the operatorless Istio migration moves the gateways to individual Helm charts under the istio-gateway namespace. By default, the MIGRATE_ISTIO Zarf flag will remove the existing Istio Gateways, but it is possible to retain the existing Ingress Gateways using Helm annotations. First, identify all the existing Gateways on the cluster using this command:

$ kubectl get gateways.networking.istio.io -A
NAMESPACE      NAME          AGE
istio-system   passthrough   6h
istio-system   public        6h

Additionally, the TLS certificates used by the Ingress Gateways must also be retained. Identify these using the following command:

$ kubectl get gateways.networking.istio.io -oyaml -A | yq '[.items[] | {"name": .metadata.name, "namespace": .metadata.namespace, "secrets": [.spec.servers[].tls.credentialName | select(.)]}]'
- name: passthrough
  namespace: istio-system
  secrets: []
- name: public
  namespace: istio-system
  secrets:
    - public-cert

Use the following script to add the helm.sh/resource-policy=keep annotation to the existing Gateways and TLS Secrets, so that they are not removed when the Istio Controlplane Helm chart is uninstalled. Extend the script as necessary if using any custom Gateways or certificates.

#!/bin/bash
# read -d is a bash extension, so run this script with bash rather than sh.

# Gateways to retain, one per line.
read -r -d '' gateways <<'EOF'
passthrough
public
EOF
for gateway in $gateways; do
  kubectl annotate gateways.networking.istio.io -n istio-system "$gateway" --overwrite helm.sh/resource-policy=keep
done

# TLS Secrets referenced by the retained Gateways, one per line.
read -r -d '' certs <<'EOF'
public-cert
EOF
for cert in $certs; do
  kubectl annotate secret -n istio-system "$cert" --overwrite helm.sh/resource-policy=keep
done
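
Before running the migration, it may be worth confirming that the annotation is actually present on each resource. The following is a minimal sketch; the helper names resource_policy and verify_keep are hypothetical, and the resource names in the example are the ones shown in the output above.

```shell
#!/bin/sh
# Prints the helm.sh/resource-policy annotation of a resource in istio-system.
# Empty output means the annotation is missing.
resource_policy() { # $1 = resource kind, $2 = resource name
  kubectl get "$1" -n istio-system "$2" \
    -o jsonpath='{.metadata.annotations.helm\.sh/resource-policy}'
}

# Returns non-zero if any listed resource is not annotated with "keep".
verify_keep() { # $1 = resource kind, remaining arguments = resource names
  kind="$1"
  shift
  status=0
  for name in "$@"; do
    if [ "$(resource_policy "$kind" "$name")" != "keep" ]; then
      echo "missing keep annotation: $kind/$name" >&2
      status=1
    fi
  done
  return "$status"
}
# Example:
#   verify_keep gateways.networking.istio.io passthrough public
#   verify_keep secret public-cert
```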

At this point, if the MIGRATE_ISTIO flag is passed to Zarf during the package deploy, the existing Ingress Gateways will not be removed, but the new set of Ingress Gateways will not be created due to conflicting Node Ports.

The new Ingress Gateways can be created on a new set of Node Ports by adding the following values to the bigbang-values.yaml file during the package deploy:

bigbang-values.yaml for public Ingress Gateway NodePort Modification (with NLB)
istioGateway:
  values:
    gateways:
      public:
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 30022 # This is set to port 30021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 30081 # This is set to port 30080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 30444 # This is set to port 30443 by default.
                targetPort: 8443
                name: https
                protocol: TCP
If using an ALB, the bigbang-values.yaml file must also set a wildcard value for .istioGateway.values.gateways.public.gateway.servers[].hosts, as ALBs do not support SNI.

bigbang-values.yaml for public Ingress Gateway NodePort Modification (with ALB)
istioGateway:
  values:
    gateways:
      public:
        # ALBs don't support SNI
        gateway:
          servers:
            - hosts:
                - '*'
              port:
                name: http
                number: 8080
                protocol: HTTP
              tls:
                httpsRedirect: true
            - hosts:
                - '*'
              port:
                name: https
                number: 8443
                protocol: HTTPS
              tls:
                credentialName: public-cert
                mode: SIMPLE
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 30022 # This is set to port 30021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 30081 # This is set to port 30080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 30444 # This is set to port 30443 by default.
                targetPort: 8443
                name: https
                protocol: TCP

In addition, if deploying a passthrough Ingress Gateway, modify its NodePort by adding the following to the bigbang-values.yaml file:

bigbang-values.yaml for passthrough Ingress Gateway NodePort Modification
istioGateway:
  values:
    gateways:
      passthrough:
        upstream:
          autoscaling:
            minReplicas: 2
            maxReplicas: 5
          service:
            type: "NodePort"
            ports:
              - port: 15021
                nodePort: 32022 # This is set to port 32021 by default.
                targetPort: 15021
                name: status-port
                protocol: TCP
              - port: 80
                nodePort: 32081 # This is set to port 32080 by default.
                targetPort: 8080
                name: http
                protocol: TCP
              - port: 443
                nodePort: 32444 # This is set to port 32443 by default.
                targetPort: 8443
                name: https
                protocol: TCP
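
Because Node Port conflicts prevent the new Ingress Gateways from being created, it can help to check that the replacement Node Ports are not already allocated to another Service. The following is a minimal sketch; the function names list_node_ports and find_port_conflicts are hypothetical helpers, and the port numbers in the example are the ones used in the values above.

```shell
#!/bin/sh
# Prints every NodePort currently allocated to a Service, one per line.
list_node_ports() {
  kubectl get services -A \
    -o jsonpath='{range .items[*]}{range .spec.ports[*]}{.nodePort}{"\n"}{end}{end}' |
    grep -E '^[0-9]+$' | sort -un
}

# Reads allocated ports on stdin and prints each candidate port (given as
# arguments) that is already taken. No output means no conflicts were found.
find_port_conflicts() {
  allocated="$(cat)"
  for port in "$@"; do
    if printf '%s\n' "$allocated" | grep -qx "$port"; then
      printf '%s\n' "$port"
    fi
  done
}
# Example:
#   list_node_ports | find_port_conflicts 30022 30081 30444 32022 32081 32444
```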

Once the bigbang-values.yaml file has been modified with these values, deploy the SmoothGlue package while passing the BIGBANG_VALUES_FILE and MIGRATE_ISTIO Zarf variables in order to deploy the new Ingress Gateways. See Upgrade SmoothGlue for more information.

info

Once the new Ingress Gateways have been deployed, test traffic can be sent to them without modifying the load balancer configuration by overriding DNS resolution using curl:

VIRTUAL_SERVICE_HOSTNAME=virtual-service.hostname.domain
GATEWAY_HTTPS_NODEPORT=30444
NODE_IP=$(kubectl get nodes -oyaml | yq '.items[].status.addresses[] | select(.type=="InternalIP") | .address' | head -n 1)
curl -kv https://$VIRTUAL_SERVICE_HOSTNAME:$GATEWAY_HTTPS_NODEPORT --resolve $VIRTUAL_SERVICE_HOSTNAME:$GATEWAY_HTTPS_NODEPORT:$NODE_IP

Run this command from the CLI on one of the cluster's nodes, or modify the Security Group rules for the cluster to allow direct access to the NodePort.

Load Balancer Modification

The type of load balancer in front of the cluster's existing Ingress Gateways significantly affects the migration. Network Load Balancers (NLBs) operate at the TCP level and have no knowledge of HTTP traffic, whereas Application Load Balancers (ALBs) can direct HTTP traffic for individual hosts to different target groups. As a result, an NLB can only forward incoming TCP traffic on a particular port to a particular target group, so all traffic for the Ingress Gateway must be cut over simultaneously. In contrast, an ALB allows hosts to be migrated from one Ingress Gateway to another on a per-listener-rule basis.

In either case, new target groups must be created that point to the new Ingress Gateways, and the security group rules for the cluster nodes must be modified to allow ingress traffic on both the old and new Node Ports. The SmoothGlue IaC can create some of these resources, but parts of the migration require manual intervention, such as overriding the cluster security group rules to allow ingress from both sets of Node Ports simultaneously. Modifying the resources manually may therefore be less error-prone in this situation.

Once the target groups and security group rules have been configured, for an NLB, the traffic for the entire NLB must be directed to the new target group at once. However, for an ALB, the traffic can be migrated to the new target group on a per-listener rule basis, allowing more gradual migration of individual Virtual Services.
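
If the load balancer changes are made manually with the AWS CLI, they could look roughly like the sketch below. The function names are hypothetical wrappers, and every name, VPC ID, and ARN passed to them is a placeholder to be replaced with values from your environment. Registering the cluster nodes with the new target group and updating the security group rules are not shown.

```shell
#!/bin/sh
# Sketch using the AWS CLI; all names, IDs, and ARNs are placeholders.

# Creates an instance target group for one of the new Node Ports and prints its ARN.
# $1 = name, $2 = VPC ID, $3 = protocol (for example TCP for an NLB), $4 = Node Port
create_target_group() {
  aws elbv2 create-target-group \
    --name "$1" --vpc-id "$2" --protocol "$3" --port "$4" \
    --target-type instance \
    --query 'TargetGroups[0].TargetGroupArn' --output text
}

# NLB: repoint the listener's default action, which cuts over all traffic on
# that listener at once.
# $1 = listener ARN, $2 = new target group ARN
cut_over_listener() {
  aws elbv2 modify-listener --listener-arn "$1" \
    --default-actions "Type=forward,TargetGroupArn=$2"
}

# ALB: repoint a single listener rule, which migrates only the traffic matched
# by that rule.
# $1 = rule ARN, $2 = new target group ARN
migrate_listener_rule() {
  aws elbv2 modify-rule --rule-arn "$1" \
    --actions "Type=forward,TargetGroupArn=$2"
}
```

Rolling back uses the same calls with the ARN of the old target group, which is one reason to leave the old target groups in place until the migration has been verified.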

Pre-Deploy Checks for SmoothGlue v6.18.0

  • Once SmoothGlue v6.18.0 is deployed, rolling back to the Istio Operator is no longer possible, because the Big Bang 3.0.0 values schema validation fails if the istio or istioOperator top-level keys are present.
    • Ensure that the top-level istio and istioOperator keys have been removed from the bigbang-values.yaml and bigbang-secrets.yaml files after migrating the relevant configuration to the istiod or istioGateway keys.
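
A quick way to look for leftover keys is a text search over the values files. The sketch below is only a rough check: it assumes block-style YAML in which top-level keys start in the first column, and the function name find_legacy_istio_keys is a hypothetical helper.

```shell
#!/bin/sh
# Prints (with line numbers) any top-level istio or istioOperator key found in
# the given files. Exit status 0 means a legacy key was found, 1 means none.
# Keys such as istiod and istioGateway are not matched.
find_legacy_istio_keys() {
  grep -nE '^(istio|istioOperator)[[:space:]]*:' "$@"
}
# Example:
#   find_legacy_istio_keys bigbang-values.yaml bigbang-secrets.yaml
```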

Post Migration Steps

  • Before removing the Kyverno Cluster Policy mutating Virtual Service Gateways, ensure that all Virtual Services have been updated so that the Cluster Policy is not needed. This can be done by querying the Policy Reports on the cluster using a kubectl command like this:

    $ kubectl get policyreport -A -oyaml | yq '.items[] | select(.scope.kind == "VirtualService") | select(.results[].result != "skip") | {"name": .scope.name, "results": [.results[] | pick(["rule", "result"])]}'
    name: python-example
    results:
      - rule: mutate-virtual-service-istio-gateway-passthrough-ingressgateway
        result: skip
      - rule: mutate-virtual-service-istio-gateway-public-ingressgateway
        result: pass
      - rule: mutate-virtual-service-istio-system-passthrough
        result: skip
      - rule: mutate-virtual-service-istio-system-public
        result: skip
    • The output from this command can be reduced by removing the rules mutate-virtual-service-istio-system-public and mutate-virtual-service-istio-system-passthrough from the Cluster Policy, since these rules should no longer have any effect.
  • Remove the Gateways and certificate Secrets in the istio-system namespace.

  • Reconcile the SmoothGlue IaC with the modifications made to the load balancer configuration. Ensure that the following values are set in your env.hcl file:

    locals {
      cluster_inputs = {
        passthrough_ingress_gateway_http_port   = 32081
        passthrough_ingress_gateway_https_port  = 32444
        passthrough_ingress_gateway_status_port = 32022
        public_ingress_gateway_http_port        = 30081
        public_ingress_gateway_https_port       = 30444
        public_ingress_gateway_status_port      = 30022
      }
    }
    • Review the terragrunt plan output thoroughly, and import any manually created Terraform resources as appropriate before applying.
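
For the step above that removes the Gateways and certificate Secrets from the istio-system namespace, a minimal sketch follows. The function name remove_retained_resources is a hypothetical helper, and the default resource names are the ones used in the examples in this guide; adjust them if custom Gateways or certificates were retained.

```shell
#!/bin/sh
# Sketch: delete the retained Gateways and TLS Secrets from istio-system.
# $1 = space-separated Gateway names (defaults to the names used in this guide)
# $2 = space-separated Secret names (defaults to the name used in this guide)
remove_retained_resources() {
  for gateway in ${1:-passthrough public}; do
    kubectl delete gateways.networking.istio.io -n istio-system "$gateway"
  done
  for cert in ${2:-public-cert}; do
    kubectl delete secret -n istio-system "$cert"
  done
}
# Example:
#   remove_retained_resources
```

Run this only after all production traffic has been moved to the new Ingress Gateways, since it removes the rollback path.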