Cluster Upgrades

Upgrading Talos

  1. Check the current Talos cluster info and version.

    talosctl get info
    talosctl get version
    
  2. Review the new features, support matrix, and upgrade steps in the Talos Docs.

    The following links are for Talos v1.11:

  3. Locally update the value of local.talos_install_version in platform-layer/terraform/talos-cluster.tf with the appropriate target version.

    locals {
        ...
        talos_install_version = "X.X.X"
        ...
    }
    

    Note

    Talos requires step-upgrading to the latest patch of the current minor release before upgrading to the latest patch of the next minor release. Upgrading from v1.10.2 to v1.11.3, for example, requires first upgrading to v1.10.7 and then to v1.11.3.

  4. Generate and display the new Terraform outputs.

    cd platform-layer/terraform/
    TF_WORKSPACE=homeops-pf-layer terraform init
    TF_WORKSPACE=homeops-pf-layer terraform plan -refresh-only
    
  5. Upgrade one of the Talos nodes with the new installer image from the Terraform output.

    talosctl upgrade \
     --preserve \
     --image 'factory.talos.dev/nocloud-installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.10.7' \
     --nodes '192.168.20.X'
    
  6. Repeat step 5 for each node in the cluster.

  7. Repeat steps 1-6, step-upgrading until the final target version is reached.
  8. Run terraform apply to view and reconcile any changes.

    TF_WORKSPACE=homeops-pf-layer terraform apply
    
  9. Commit, push, and merge the changes.
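The step-upgrade rule and the per-node repetition in steps 5 and 6 can be sketched as a few shell helpers. This is only an illustration under stated assumptions: the function names `minor_of`, `upgrade_kind`, and `upgrade_nodes` and the `TALOSCTL` override variable are hypothetical and not part of this repository, and the helpers cannot know what the latest patch of a minor release is, so they only flag when an upgrade crosses a minor version.

```shell
# Hypothetical helper: print the minor number of a version such as v1.10.2.
# The body is wrapped in ( ) so it runs in a subshell and leaks no variables.
minor_of() (
  v="${1#v}"        # strip a leading "v"
  v="${v#*.}"       # drop the major component
  printf '%s\n' "${v%%.*}"
)

# Hypothetical helper: report whether CURRENT -> TARGET stays within one minor
# release. "crosses-minor" means the note above applies: move to the latest
# patch of the current minor first (this helper does not look that patch up).
upgrade_kind() (
  if [ "$(minor_of "$1")" = "$(minor_of "$2")" ]; then
    echo same-minor
  else
    echo crosses-minor
  fi
)

# Hypothetical helper: upgrade nodes one at a time with the same flags used in
# step 5, stopping at the first failure. Set TALOSCTL=echo for a dry run.
upgrade_nodes() (
  image="$1"
  shift
  for node in "$@"; do
    "${TALOSCTL:-talosctl}" upgrade --preserve --image "$image" --nodes "$node" || exit 1
  done
)
```

For example, `upgrade_kind v1.10.2 v1.11.3` prints `crosses-minor`, and `TALOSCTL=echo upgrade_nodes '<INSTALLER-IMAGE>' 192.168.20.X 192.168.20.Y` prints the commands that would run without touching the cluster. Checking node health between upgrades is left to the operator.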

Replacing nodes

Should the need arise to remove a node from a Talos cluster, the following steps will ensure it is removed cleanly. These steps are also partially described in Scale down a Talos cluster.

  1. Review the current cluster info and nodes.

    talosctl get info
    talosctl etcd status
    kubectl get nodes -o wide
    
  2. Ensure all CNPG clusters have at least one replica on other nodes.

    kubectl get clusters -A
    kubectl get pods -l cnpg.io/cluster -A -o wide
    
  3. Reset the Talos node.

    talosctl -n <NODE-IP-TO-REMOVE> reset
    
  4. Remove the node from the Kubernetes cluster.

    kubectl delete node <NODE-NAME-TO-REMOVE>
    
  5. Validate the node was fully removed.

    talosctl get members
    talosctl etcd members
    kubectl get nodes -o wide
    

    Note

    Since your talosconfig still contains the removed node, expect a connection error for the talosctl commands. However, the output from the other nodes should no longer include the removed node.

  6. Install/provision the replacement Talos node.

  7. Delete the removed node's talos_machine_configuration_apply from Terraform state.

    This is required because the resource only applies machine configurations and records the result in Terraform state; it does not compare the defined configs with the node's current configs.

    cd platform-layer/terraform/
    TF_WORKSPACE=homeops-pf-layer terraform state rm 'module.talos_cluster.talos_machine_configuration_apply.controlplane["<NODE-IP-TO-REMOVE>"]'
    
  8. If needed, modify Terraform to account for the new node's details.

  9. Run terraform apply locally to apply the changes.

    TF_WORKSPACE=homeops-pf-layer terraform init
    TF_WORKSPACE=homeops-pf-layer terraform apply
    
  10. Delete any CNPG instance PVCs waiting on the removed node.

    First, check for failed CNPG cluster instances and PVCs associated with the removed node:

    kubectl get clusters -A -o json | jq '.items[]|{namespace:.metadata.namespace,cluster:.metadata.name,"failed-instances":.status.instancesStatus.failed}'
    kubectl get pvc -l "cnpg.io/cluster" -A -o json | jq '.items[].metadata|{pvc:"\(.namespace)/\(.name)","pvc-node":.annotations."volume.kubernetes.io/selected-node","cnpg-cluster":.labels."cnpg.io/cluster","cnpg-instance":.labels."cnpg.io/instanceName"}'
    

    Then, destroy any cluster instances associated with the removed node:

    kubectl cnpg destroy <CLUSTER> <INSTANCE> -n <NAMESPACE>
    kubectl cnpg destroy <CLUSTER> <INSTANCE> -n <NAMESPACE>
    ...
    kubectl cnpg destroy <CLUSTER> <INSTANCE> -n <NAMESPACE>
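When several instances are affected, the destroy commands in step 10 can be generated from the PVC listing instead of being typed by hand. The following is a minimal sketch, assuming `jq` is installed; the function name `destroy_commands_for_node` is hypothetical, and it reuses the same annotation and labels as the query in step 10. It only prints the commands so they can be reviewed before running, and it passes the `cnpg.io/instanceName` label value as the instance argument, which should be checked against the installed plugin version.

```shell
# Hypothetical helper: read `kubectl get pvc ... -o json` output on stdin and
# print one `kubectl cnpg destroy` command per PVC whose selected-node
# annotation matches the node name given as the first argument.
destroy_commands_for_node() {
  jq -r --arg node "$1" '
    .items[].metadata
    | select(.annotations["volume.kubernetes.io/selected-node"] == $node)
    | "kubectl cnpg destroy \(.labels["cnpg.io/cluster"]) \(.labels["cnpg.io/instanceName"]) -n \(.namespace)"
  '
}
```

A possible invocation is `kubectl get pvc -l "cnpg.io/cluster" -A -o json | destroy_commands_for_node <NODE-NAME-TO-REMOVE>`. Nothing is executed; the printed lines can be inspected and then run individually.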