Kubernetes Guides
- 1: Configuration
- 1.1: Ceph Storage cluster with Rook
- 1.2: Cluster Endpoint
- 1.3: Deploying Metrics Server
- 1.4: Discovery
- 1.5: Pod Security
- 1.6: Storage
- 2: Network
- 2.1: Deploying Cilium CNI
- 2.2: KubeSpan
- 3: Upgrading Kubernetes
1 - Configuration
1.1 - Ceph Storage cluster with Rook
Preparation
Talos Linux reserves an entire disk for the OS installation, so machines with multiple available disks are needed for a reliable Ceph cluster with Rook and Talos Linux.
Rook requires that the block devices or partitions used by Ceph have no partitions or formatted filesystems before use.
Rook also requires a minimum Kubernetes version of v1.16 and Helm v3.0 for installation of charts.
It is highly recommended that the Rook Ceph overview is read and understood before deploying a Ceph cluster with Rook.
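One quick way to check which disks are available on a node (assuming talosctl is already configured to reach it) is to list its block devices:
talosctl -n <node ip> disks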
Installation
Creating a Ceph cluster with Rook requires two steps: first, the Rook Operator needs to be installed, which can be done with a Helm Chart.
The example below installs the Rook Operator into the rook-ceph namespace, which is the default for a Ceph cluster with Rook.
$ helm repo add rook-release https://charts.rook.io/release
"rook-release" has been added to your repositories
$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0327 17:52:44.277830 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0327 17:52:44.612243 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Sun Mar 27 17:52:42 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"
Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters
Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).
Once that is complete, the Ceph cluster can be installed with the official Helm Chart. The Chart can be installed with default values, which will attempt to use all nodes in the Kubernetes cluster and all unused disks on each node for Ceph storage, and will make block storage, object storage, and a shared filesystem available. Generally, more specific node/device/cluster configuration is used, and the Rook documentation explains all the available options in detail. For this example the defaults will be adequate.
$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Sun Mar 27 18:12:46 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
kubectl --namespace rook-ceph get cephcluster
Visit https://rook.github.io/docs/rook/latest/ceph-cluster-crd.html for more information about the Ceph CRD.
Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`
Now that the Ceph cluster configuration has been created, the Rook operator needs time to install the Ceph cluster and bring all the components online. The progression of the Ceph cluster state can be followed with the following command.
$ watch kubectl --namespace rook-ceph get cephcluster rook-ceph
Every 2.0s: kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 57s Progressing Configuring Ceph Mons
Depending on the size of the Ceph cluster and the availability of resources, the Ceph cluster should become available, and with it the storage classes that can be used with Kubernetes Persistent Volumes.
$ kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 40m Ready Cluster created successfully HEALTH_OK
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 77m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 77m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 77m
Talos Linux Considerations
It is important to note that a Rook Ceph cluster saves cluster information directly onto the node (by default dataDirHostPath is set to /var/lib/rook).
If running only a single mon instance, cluster management is a little more involved, as any time a Talos Linux node is reconfigured or upgraded, the partition that stores the /var file system is wiped; the --preserve option of talosctl upgrade will ensure that doesn't happen.
By default, Rook configures Ceph to have 3 mon instances, in which case the data stored in dataDirHostPath can be regenerated from the other mon instances.
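For the single-mon case, a minimal sketch of such an upgrade (the node IP is a placeholder; the installer image matches the one used later in this guide):
talosctl upgrade --nodes <node ip> --image ghcr.io/talos-systems/installer:v0.14.3 --preserve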
So when performing maintenance on a Talos Linux node with a Rook Ceph cluster (e.g. upgrading the Talos Linux version), it is imperative that care be taken to maintain the health of the Ceph cluster.
Before upgrading, you should always check the health status of the Ceph cluster to ensure that it is healthy.
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 98m Ready Cluster created successfully HEALTH_OK
If it is, you can begin the upgrade process for the Talos Linux node, during which time the Ceph cluster will become unhealthy as the node is reconfigured. Before performing any other action on the Talos Linux nodes, the Ceph cluster must return to a healthy status.
$ talosctl upgrade --nodes 172.20.15.5 --image ghcr.io/talos-systems/installer:v0.14.3
NODE ACK STARTED
172.20.15.5 Upgrade request received 2022-03-27 20:29:55.292432887 +0200 CEST m=+10.050399758
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 99m Progressing Configuring Ceph Mgr(s) HEALTH_WARN
$ kubectl --namespace rook-ceph wait --timeout=1800s --for=jsonpath='{.status.ceph.health}=HEALTH_OK' rook-ceph
cephcluster.ceph.rook.io/rook-ceph condition met
The above steps need to be performed for each Talos Linux node undergoing maintenance, one at a time.
Cleaning Up
Rook Ceph Cluster Removal
Removing a Rook Ceph cluster requires a few steps, starting with signalling to Rook that the Ceph cluster is really being destroyed. Then all Persistent Volumes (and Claims) backed by the Ceph cluster must be deleted, followed by the Storage Classes and the Ceph storage types.
$ kubectl --namespace rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
cephcluster.ceph.rook.io/rook-ceph patched
$ kubectl delete storageclasses ceph-block ceph-bucket ceph-filesystem
storageclass.storage.k8s.io "ceph-block" deleted
storageclass.storage.k8s.io "ceph-bucket" deleted
storageclass.storage.k8s.io "ceph-filesystem" deleted
$ kubectl --namespace rook-ceph delete cephblockpools ceph-blockpool
cephblockpool.ceph.rook.io "ceph-blockpool" deleted
$ kubectl --namespace rook-ceph delete cephobjectstore ceph-objectstore
cephobjectstore.ceph.rook.io "ceph-objectstore" deleted
$ kubectl --namespace rook-ceph delete cephfilesystem ceph-filesystem
cephfilesystem.ceph.rook.io "ceph-filesystem" deleted
Once that is complete, the Ceph cluster itself can be removed, along with the Rook Ceph cluster Helm chart installation.
$ kubectl --namespace rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted
$ helm --namespace rook-ceph uninstall rook-ceph-cluster
release "rook-ceph-cluster" uninstalled
If needed, the Rook Operator can also be removed along with all the Custom Resource Definitions that it created.
$ helm --namespace rook-ceph uninstall rook-ceph
W0328 12:41:14.998307 147203 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
These resources were kept due to the resource policy:
[CustomResourceDefinition] cephblockpools.ceph.rook.io
[CustomResourceDefinition] cephbucketnotifications.ceph.rook.io
[CustomResourceDefinition] cephbuckettopics.ceph.rook.io
[CustomResourceDefinition] cephclients.ceph.rook.io
[CustomResourceDefinition] cephclusters.ceph.rook.io
[CustomResourceDefinition] cephfilesystemmirrors.ceph.rook.io
[CustomResourceDefinition] cephfilesystems.ceph.rook.io
[CustomResourceDefinition] cephfilesystemsubvolumegroups.ceph.rook.io
[CustomResourceDefinition] cephnfses.ceph.rook.io
[CustomResourceDefinition] cephobjectrealms.ceph.rook.io
[CustomResourceDefinition] cephobjectstores.ceph.rook.io
[CustomResourceDefinition] cephobjectstoreusers.ceph.rook.io
[CustomResourceDefinition] cephobjectzonegroups.ceph.rook.io
[CustomResourceDefinition] cephobjectzones.ceph.rook.io
[CustomResourceDefinition] cephrbdmirrors.ceph.rook.io
[CustomResourceDefinition] objectbucketclaims.objectbucket.io
[CustomResourceDefinition] objectbuckets.objectbucket.io
release "rook-ceph" uninstalled
$ kubectl delete crds cephblockpools.ceph.rook.io cephbucketnotifications.ceph.rook.io cephbuckettopics.ceph.rook.io \
cephclients.ceph.rook.io cephclusters.ceph.rook.io cephfilesystemmirrors.ceph.rook.io \
cephfilesystems.ceph.rook.io cephfilesystemsubvolumegroups.ceph.rook.io \
cephnfses.ceph.rook.io cephobjectrealms.ceph.rook.io cephobjectstores.ceph.rook.io \
cephobjectstoreusers.ceph.rook.io cephobjectzonegroups.ceph.rook.io cephobjectzones.ceph.rook.io \
cephrbdmirrors.ceph.rook.io objectbucketclaims.objectbucket.io objectbuckets.objectbucket.io
customresourcedefinition.apiextensions.k8s.io "cephblockpools.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbucketnotifications.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbuckettopics.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclients.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclusters.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystems.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemsubvolumegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephnfses.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectrealms.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstores.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstoreusers.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzonegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzones.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephrbdmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbucketclaims.objectbucket.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbuckets.objectbucket.io" deleted
Talos Linux Rook Metadata Removal
If the Rook Operator is cleanly removed following the above process, the node metadata and disks should be clean and ready to be re-used.
In the case of an unclean cluster removal, there may still be a few instances of metadata stored on the system disk, as well as the partition information on the storage disks.
First, the node metadata needs to be removed. Make sure to update nodeName with the actual name of a storage node that needs cleaning, and path with the Rook configuration dataDirHostPath set when installing the chart.
The following will need to be repeated for each node used in the Rook Ceph cluster.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: disk-clean
spec:
  restartPolicy: Never
  nodeName: <storage-node-name>
  volumes:
  - name: rook-data-dir
    hostPath:
      path: <dataDirHostPath>
  containers:
  - name: disk-clean
    image: busybox
    securityContext:
      privileged: true
    volumeMounts:
    - name: rook-data-dir
      mountPath: /node/rook-data
    command: ["/bin/sh", "-c", "rm -rf /node/rook-data/*"]
EOF
pod/disk-clean created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-clean
pod/disk-clean condition met
$ kubectl delete pod disk-clean
pod "disk-clean" deleted
Lastly, the disks themselves need the partition and filesystem data wiped before they can be reused.
Again, the following has to be repeated for each node and disk used in the Rook Ceph cluster, updating nodeName and of= in the command as needed.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: disk-wipe
spec:
  restartPolicy: Never
  nodeName: <storage-node-name>
  containers:
  - name: disk-wipe
    image: busybox
    securityContext:
      privileged: true
    command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=<device>"]
EOF
pod/disk-wipe created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-wipe
pod/disk-wipe condition met
$ kubectl delete pod disk-wipe
pod "disk-wipe" deleted
1.2 - Cluster Endpoint
In this section, we will step through the configuration of a Talos based Kubernetes cluster. There are three major components we will configure:
- apid and talosctl
- the master nodes
- the worker nodes
Talos enforces a high level of security by using mutual TLS for authentication and authorization.
We recommend that the configuration of Talos be performed by a cluster owner. A cluster owner should be a person of authority within an organization, perhaps a director, manager, or senior member of a team. They are responsible for storing the root CA, and distributing the PKI for authorized cluster administrators.
Recommended settings
Talos runs great out of the box, but tweaking a few minor settings will make your life a lot easier in the future. This is not a requirement, but rather an explanation of some key settings.
Endpoint
To configure the talosctl endpoint, it is recommended you use a resolvable DNS name.
This way, if you decide to upgrade to a multi-controlplane cluster, you only have to add the IP address to the hostname configuration.
The configuration can either be done on a load balancer, or simply through DNS.
For example:
This is in the config file for the cluster, e.g. controlplane.yaml and worker.yaml. For more details, please see: v1alpha1 endpoint configuration
.....
cluster:
  controlPlane:
    endpoint: https://endpoint.example.local:6443
.....
If you have a DNS name as the endpoint, you can upgrade your Talos cluster to multiple control planes in the future (if you don't have a multi-controlplane setup from the start). Using a DNS name generates the corresponding certificates (Kubernetes and Talos) for the correct hostname.
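For example, the DNS side could be as simple as round-robin A records pointing at each control plane node (a hypothetical zone snippet; names and addresses are placeholders):
endpoint.example.local.  IN  A  192.168.1.10
endpoint.example.local.  IN  A  192.168.1.11
endpoint.example.local.  IN  A  192.168.1.12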
1.3 - Deploying Metrics Server
Metrics Server enables use of the Horizontal Pod Autoscaler and Vertical Pod Autoscaler. It does this by gathering metrics data from the kubelets in a cluster. By default, the certificates in use by the kubelets will not be recognized by metrics-server. This can be solved by either configuring metrics-server to do no validation of the TLS certificates, or by modifying the kubelet configuration to rotate its certificates and use ones that will be recognized by metrics-server.
Node Configuration
To enable kubelet certificate rotation, all nodes should have the following Machine Config snippet:
machine:
  kubelet:
    extraArgs:
      rotate-server-certificates: true
Install During Bootstrap
We will want to ensure that new certificates for the kubelets are approved automatically. This can easily be done with the Kubelet Serving Certificate Approver, which will automatically approve the Certificate Signing Requests generated by the kubelets.
We can have Kubelet Serving Certificate Approver and metrics-server installed on the cluster automatically during bootstrap by adding the following snippet to the Cluster Config of the node that will be handling the bootstrap process:
cluster:
  extraManifests:
    - https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
    - https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Install After Bootstrap
If you choose not to use extraManifests to install Kubelet Serving Certificate Approver and metrics-server during bootstrap, you can install them once the cluster is online using kubectl:
kubectl apply -f https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
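Once the metrics-server pod is running and has gathered data from the kubelets (this can take a minute or two), a quick sanity check is to query node metrics:
kubectl top nodes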
1.4 - Discovery
Video Walkthrough
To see a live demo of Cluster Discovery, see the video below:
Registries
Peers are aggregated from a number of optional registries.
By default, Talos will use the kubernetes
and service
registries.
Either one can be disabled.
To disable a registry, set disabled to true (this option is the same for all registries).
For example, to disable the service registry:
cluster:
  discovery:
    enabled: true
    registries:
      service:
        disabled: true
Disabling all registries effectively disables member discovery altogether.
Talos supports the
kubernetes
andservice
registries.
Kubernetes
registry uses Kubernetes Node
resource data and additional Talos annotations:
$ kubectl describe node <nodename>
Annotations: cluster.talos.dev/node-id: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
networking.talos.dev/assigned-prefixes: 10.244.0.0/32,10.244.0.1/24
networking.talos.dev/self-ips: 172.20.0.2,fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94
...
Service
registry uses external Discovery Service to exchange encrypted information about cluster members.
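If you run the Discovery Service yourself, the service registry can be pointed at it in the machine configuration. A minimal sketch, assuming the service registry's endpoint field and a placeholder URL (the Sidero Labs hosted service at https://discovery.talos.dev/ is used by default):
cluster:
  discovery:
    enabled: true
    registries:
      service:
        endpoint: https://discovery.example.com/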
Resource Definitions
Talos provides seven resources that can be used to introspect the new discovery and KubeSpan features.
Discovery
Identities
The node’s unique identity (base62 encoded random 32 bytes) can be obtained with:
Note: Using base62 allows the ID to be URL encoded without having to use the ambiguous URL-encoding version of base64.
$ talosctl get identities -o yaml
...
spec:
nodeId: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
Node identity is used as the unique Affiliate
identifier.
Node identity resource is preserved in the STATE partition in node-identity.yaml
file.
Node identity is preserved across reboots and upgrades, but it is regenerated if the node is reset (wiped).
Affiliates
An affiliate is a proposed member: a node which has the same cluster ID and secret.
$ talosctl get affiliates
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 2 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd 4 talos-default-master-1 controlplane ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 2 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
One of the Affiliates, the one with the ID matching the node identity, is populated from the node data; the other Affiliates are pulled from the registries.
Enabled discovery registries run in parallel and discovered data is merged to build the list presented above.
Details about data coming from each registry can be queried from the cluster-raw
namespace:
$ talosctl get affiliates --namespace=cluster-raw
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
k8s/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 3 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
k8s/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
k8s/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
k8s/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 3 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
service/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 23 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
service/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 26 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
service/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 20 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
service/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 14 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
Each Affiliate
ID is prefixed with k8s/
for data coming from the Kubernetes registry and with service/
for data coming from the discovery service.
Members
A member is an affiliate that has been approved to join the cluster. The members of the cluster can be obtained with:
$ talosctl get members
ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
talos-default-master-1 2 talos-default-master-1 controlplane Talos (v1.0.6) ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
talos-default-master-2 1 talos-default-master-2 controlplane Talos (v1.0.6) ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
talos-default-master-3 1 talos-default-master-3 controlplane Talos (v1.0.6) ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
talos-default-worker-1 1 talos-default-worker-1 worker Talos (v1.0.6) ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
talos-default-worker-2 1 talos-default-worker-2 worker Talos (v1.0.6) ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
1.5 - Pod Security
Kubernetes deprecated Pod Security Policy as of v1.21, and it is going to be removed in v1.25. Pod Security Policy was replaced with Pod Security Admission. Pod Security Admission is alpha in v1.22 (requires a feature gate) and beta in v1.23 (enabled by default).
In this guide we are going to enable and configure Pod Security Admission in Talos.
Configuration
Prepare the following machine configuration patch and store it in pod-security-patch.yaml:
- op: add
  path: /cluster/apiServer/admissionControl
  value:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1alpha1
        kind: PodSecurityConfiguration
        defaults:
          enforce: "baseline"
          enforce-version: "latest"
          audit: "restricted"
          audit-version: "latest"
          warn: "restricted"
          warn-version: "latest"
        exemptions:
          usernames: []
          runtimeClasses: []
          namespaces: [kube-system]
This is a cluster-wide configuration for the Pod Security Admission plugin:
- by default, the baseline Pod Security Standard profile is enforced
- the more strict restricted profile is not enforced, but the API server warns about found issues
Generate Talos machine configuration applying the patch above:
talosctl gen config cluster1 https://<IP>:6443/ --config-patch-control-plane @pod-security-patch.yaml
Deploy Talos using the generated machine configuration.
Verify current admission plugin configuration with:
$ talosctl get kubernetescontrolplaneconfigs apiserver-admission-control -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: apiserver-admission-control
version: 1
owner: config.K8sControlPlaneController
phase: running
created: 2022-02-22T20:28:21Z
updated: 2022-02-22T20:28:21Z
spec:
config:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
defaults:
audit: restricted
audit-version: latest
enforce: baseline
enforce-version: latest
warn: restricted
warn-version: latest
exemptions:
namespaces:
- kube-system
runtimeClasses: []
usernames: []
kind: PodSecurityConfiguration
Usage
Create a deployment that satisfies the baseline
policy but gives warnings on restricted
policy:
$ kubectl create deployment nginx --image=nginx
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85b98978db-j68l8 1/1 Running 0 2m3s
Create a daemonset which fails to meet requirements of the baseline
policy:
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: debug-container
name: debug-container
namespace: default
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: debug-container
template:
metadata:
creationTimestamp: null
labels:
app: debug-container
spec:
containers:
- args:
- "360000"
command:
- /bin/sleep
image: ubuntu:latest
imagePullPolicy: IfNotPresent
name: debug-container
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirstWithHostNet
hostIPC: true
hostPID: true
hostNetwork: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
$ kubectl apply -f debug.yaml
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "debug-container" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "debug-container" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "debug-container" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "debug-container" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/debug-container created
Daemonset debug-container
gets created, but no pods are scheduled:
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 0 0 0 0 0 <none> 34s
Pod Security Admission plugin errors are in the daemonset events:
$ kubectl describe ds debug-container
...
Warning FailedCreate 92s daemonset-controller Error creating: pods "debug-container-kwzdj" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true)
Pod Security Admission configuration can also be overridden on a namespace level:
$ kubectl label ns default pod-security.kubernetes.io/enforce=privileged
namespace/default labeled
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 2 2 0 2 0 <none> 4s
As the enforce policy was updated to privileged for the default namespace, debug-container is now successfully running.
1.6 - Storage
In Kubernetes, using storage in the right way is well-facilitated by the API.
However, unless you are running in a major public cloud, that API may not be hooked up to anything.
This frequently sends users down a rabbit hole of researching all the various options for storage backends for their platform, for Kubernetes, and for their workloads.
There are a lot of options out there, and it can be fairly bewildering.
For Talos, we try to limit the options somewhat to make the decision-making easier.
Public Cloud
If you are running on a major public cloud, use their block storage. It is easy and automatic.
Storage Clusters
Sidero Labs recommends having separate disks (apart from the Talos install disk) to be used for storage.
Redundancy, scaling capabilities, reliability, speed, maintenance load, and ease of use are all factors you must consider when managing your own storage.
Running a storage cluster can be a very good choice when managing your own storage, and there are two projects we recommend, depending on your situation.
If you need vast amounts of storage composed of more than a dozen or so disks, we recommend you use Rook to manage Ceph. Also, if you need both mount-once and mount-many capabilities, Ceph is your answer. Ceph also bundles in an S3-compatible object store. The downside of Ceph is that there are a lot of moving parts.
Please note that most people should never use mount-many semantics. NFS is pervasive because it is old and easy, not because it is a good idea. While it may seem like a convenience at first, there are all manner of locking, performance, change control, and reliability concerns inherent in any mount-many situation, so we strongly recommend you avoid this method.
If your storage needs are small enough to not need Ceph, use Mayastor.
Rook/Ceph
Ceph is the grandfather of open source storage clusters. It is big, has a lot of pieces, and will do just about anything. It scales better than almost any other system out there, open source or proprietary, being able to easily add and remove storage over time with no downtime, safely and easily. It comes bundled with RadosGW, an S3-compatible object store; CephFS, a NFS-like clustered filesystem; and RBD, a block storage system.
With the help of Rook, the vast majority of the complexity of Ceph is hidden away by a very robust operator, allowing you to control almost everything about your Ceph cluster from fairly simple Kubernetes CRDs.
So if Ceph is so great, why not use it for everything?
Ceph can be rather slow for small clusters. It relies heavily on CPUs and massive parallelisation to provide good cluster performance, so if you don't have many of those dedicated to Ceph, it is not going to be well-optimised for you. Also, if your cluster is small, just running Ceph may eat up a significant amount of the resources you have available.
Troubleshooting Ceph can be difficult if you do not understand its architecture. There are lots of acronyms and the documentation assumes a fair level of knowledge. There are very good tools for inspection and debugging, but this is still frequently seen as a concern.
Mayastor
Mayastor is an OpenEBS project built in Rust utilising the modern NVMe-oF system. (Despite the name, Mayastor does not require you to have NVMe drives.) It is fast and lean but still cluster-oriented and cloud native. Unlike most of the other OpenEBS projects, it is not built on the ancient iSCSI system.
Unlike Ceph, Mayastor is just a block store. It focuses on block storage and does it well. It is much less complicated to set up than Ceph, but you probably wouldn’t want to use it for more than a few dozen disks.
Mayastor is new, maybe too new. If you’re looking for something well-tested and battle-hardened, this is not it. However, if you’re looking for something lean, future-oriented, and simpler than Ceph, it might be a great choice.
Video Walkthrough
To see a live demo of this section, see the video below:
Prep Nodes
Either during initial cluster creation or on running worker nodes, several machine config values should be edited.
(This information is gathered from the Mayastor documentation.)
We need to set the vm.nr_hugepages
sysctl and add openebs.io/engine=mayastor
labels to the nodes which are meant to be storage nodes.
This can be done with talosctl patch machineconfig
or via config patches during talosctl gen config
.
Some examples are shown below: modify as needed.
Using gen config
talosctl gen config my-cluster https://mycluster.local:6443 --config-patch '[{"op": "add", "path": "/machine/sysctls", "value": {"vm.nr_hugepages": "1024"}}, {"op": "add", "path": "/machine/kubelet/extraArgs", "value": {"node-labels": "openebs.io/engine=mayastor"}}]'
Patching an existing node
talosctl patch --mode=no-reboot machineconfig -n <node ip> --patch '[{"op": "add", "path": "/machine/sysctls", "value": {"vm.nr_hugepages": "1024"}}, {"op": "add", "path": "/machine/kubelet/extraArgs", "value": {"node-labels": "openebs.io/engine=mayastor"}}]'
Note: If you are adding/updating the
vm.nr_hugepages
on a node which already had theopenebs.io/engine=mayastor
label set, you’d need to restart kubelet so that it picks up the new value, by issuing the following command
talosctl -n <node ip> service kubelet restart
Deploy Mayastor
Continue setting up Mayastor using the official documentation.
NFS
NFS is an old pack animal long past its prime. NFS is slow, has all kinds of bottlenecks involving contention, distributed locking, single points of service, and more. However, it is supported by a wide variety of systems. You don’t want to use it unless you have to, but unfortunately, that “have to” is too frequent.
The NFS client is part of the kubelet
image maintained by the Talos team.
This means that the version installed in your running kubelet
is the version of NFS supported by Talos.
You can reduce some of the contention problems by parceling Persistent Volumes from separate underlying directories.
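For example, a minimal sketch of two PersistentVolumes carved from separate exports on a hypothetical NFS server (names, server address, and paths are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-data-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal
    path: /exports/data-1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-data-2
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal
    path: /exports/data-2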
Object storage
Ceph comes with an S3-compatible object store, but there are other options, as well. These can often be built on top of other storage backends. For instance, you may have your block storage running with Mayastor but assign a Pod a large Persistent Volume to serve your object store.
One of the most popular open source add-on object stores is MinIO.
Others (iSCSI)
The most common remaining systems involve iSCSI in one form or another. These include the original OpenEBS, Rancher’s Longhorn, and many proprietary systems. Unfortunately, Talos does not support iSCSI-based systems. iSCSI in Linux is facilitated by open-iscsi. This system was designed long before containers caught on, and it is not well suited to the task, especially when coupled with a read-only host operating system.
One day, we hope to work out a solution for facilitating iSCSI-based systems, but this is not yet available.
2 - Network
2.1 - Deploying Cilium CNI
From v1.9 onwards Cilium no longer provides a one-liner install manifest that can be used to install Cilium on a node via kubectl apply -f or by passing it in as an extra url in the urls part of the Talos machine configuration.
Installing Cilium the new way via the
cilium
cli is broken, so we’ll be usinghelm
to install Cilium. For more information: Install with CLI fails, works with Helm
Refer to Installing with Helm for more information.
First we’ll need to add the helm repo for Cilium.
helm repo add cilium https://helm.cilium.io/
helm repo update
This documentation will outline installing Cilium CNI v1.11.2 on Talos in four different ways. Adhering to Talos principles, we'll deploy Cilium with IPAM mode set to Kubernetes. Each method can install Cilium either with kube-proxy (the default) or without it: Kubernetes Without kube-proxy
Machine config preparation
When generating the machine config for a node, set the CNI to none. For example, using a config patch:
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
Or if you want to deploy Cilium in strict mode without kube-proxy, you also need to disable kube-proxy:
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
Method 1: Helm install
After applying the machine config and bootstrapping, Talos will appear to hang on phase 18/19 with the message: retrying error: node not ready. This happens because nodes in Kubernetes are only marked as ready once the CNI is up. As there is no CNI defined, the boot process is pending and will reboot the node to retry after 10 minutes; this is expected behavior.
During this window you can install Cilium manually by running the following:
helm install cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes
Or if you want to deploy Cilium in strict mode without kube-proxy, also set some extra parameters:
export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443
helm install cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
--set k8sServicePort="${KUBERNETES_API_SERVER_PORT}"
After Cilium is installed the boot process should continue and complete successfully.
Method 2: Helm manifests install
Instead of directly installing Cilium you can instead first generate the manifest and then apply it:
helm template cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes > cilium.yaml
kubectl apply -f cilium.yaml
Without kube-proxy:
export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443
helm template cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
--set k8sServicePort="${KUBERNETES_API_SERVER_PORT}" > cilium.yaml
kubectl apply -f cilium.yaml
Method 3: Helm manifests hosted install
After generating cilium.yaml using helm template, instead of applying this manifest directly during the Talos boot window (before the reboot timeout), you can host this file somewhere and patch the machine config to apply it automatically during bootstrap.
To do this patch your machine configuration to include this config instead of the above:
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "custom", "urls": ["https://server.yourdomain.tld/some/path/cilium.yaml"]}}]'
This results in a config that looks like this:
name: custom # Name of CNI to use.
# URLs containing manifests to apply for the CNI.
urls:
- https://server.yourdomain.tld/some/path/cilium.yaml
However, beware of the fact that the Helm-generated Cilium manifest contains sensitive key material. As such, you should definitely not host this somewhere publicly accessible.
Method 4: Helm manifests inline install
A more secure option would be to include the helm template
output manifest inside the machine configuration.
The machine config should be generated with CNI set to none
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
If deploying Cilium with kube-proxy disabled, you can also include the following:
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
To do so, patch this into your machine configuration:
cluster:
  inlineManifests:
    - name: cilium
      contents: |
        ---
        # Source: cilium/templates/cilium-agent/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: "cilium"
          namespace: kube-system
        ---
        # Source: cilium/templates/cilium-operator/serviceaccount.yaml
        apiVersion: v1
        kind: ServiceAccount
        -> Your cilium.yaml file will be pretty long....
This will install the Cilium manifests at just the right time during bootstrap.
Beware though:
- Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace. As the inline manifest is processed from top to bottom make sure to manually put the namespace yaml at the start of the inline manifest.
- Only add the Cilium inline manifest to the control plane nodes machine configuration.
- Make sure all control plane nodes have an identical configuration.
- If you delete any of the generated resources they will be restored whenever a control plane node reboots.
- As a safety measure, Talos only creates missing resources from inline manifests, it never deletes or updates anything.
- If you need to update a manifest make sure to first edit all control plane machine configurations and then run
talosctl upgrade-k8s
as it will take care of updating inline manifests.
Known issues
Currently there is an interaction between a KubeSpan-enabled Talos cluster and Cilium that results in the cluster going down during bootstrap after applying the Cilium manifests. For more details: Kubespan and Cilium compatiblity: etcd is failing
There are some gotchas when using Talos and Cilium on the Google cloud platform when using internal load balancers. For more details: GCP ILB support / support scope local routes to be configured
Some kernel values changed by kube-proxy are not set to good defaults when running the cilium kernel-proxy alternative. For more details: Kernel default values (sysctl)
Other things to know
Talos has full kernel module support for eBPF, see:
Talos also includes the modules:
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
This allows you to set --set enableXTSocketFallback=false on the helm install/template command, preventing Cilium from disabling the ip_early_demux kernel feature. This will win back some performance.
2.2 - KubeSpan
KubeSpan is a feature of Talos that automates the setup and maintenance of a full mesh WireGuard network for your cluster, giving you the ability to operate hybrid Kubernetes clusters that can span the edge, datacenter, and cloud. Management of keys and discovery of peers can be completely automated for a zero-touch experience that makes it simple and easy to create hybrid clusters.
KubeSpan consists of client code in Talos Linux, as well as a discovery service that enables clients to securely find each other. Sidero Labs operates a free Discovery Service, but the discovery service may be operated by your organization and can be downloaded here.
Video Walkthrough
To learn more about KubeSpan, see the video below:
To see a live demo of KubeSpan, see one of the videos below:
Enabling
Creating a New Cluster
To generate configuration files for a new cluster, we can use the --with-kubespan flag in talosctl gen config.
This will enable peer discovery and KubeSpan.
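For example (the cluster name and endpoint are placeholders, matching the other examples in this guide):
talosctl gen config my-cluster https://mycluster.local:6443 --with-kubespan
The generated machine configuration will then contain sections like the following: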
...
# Provides machine specific network configuration options.
network:
# Configures KubeSpan feature.
kubespan:
enabled: true # Enable the KubeSpan feature.
...
# Configures cluster member discovery.
discovery:
enabled: true # Enable the cluster membership discovery feature.
# Configure registries used for cluster member discovery.
registries:
# Kubernetes registry uses Kubernetes API server to discover cluster members and stores additional information
kubernetes: {}
# Service registry is using an external service to push and pull information about cluster members.
service: {}
...
# Provides cluster specific configuration options.
cluster:
id: yui150Ogam0pdQoNZS2lZR-ihi8EWxNM17bZPktJKKE= # Globally unique identifier for this cluster.
secret: dAmFcyNmDXusqnTSkPJrsgLJ38W8oEEXGZKM0x6Orpc= # Shared secret of cluster.
The default discovery service is an external service hosted for free by Sidero Labs. The default value is
https://discovery.talos.dev/
. Contact Sidero Labs if you need to run this service privately.
Upgrading an Existing Cluster
In order to enable KubeSpan for an existing cluster, upgrade to the latest version of Talos (v1.0.6). Once your cluster is upgraded, the configuration of each node must contain the globally unique cluster identifier and the shared secret for the cluster, and must have KubeSpan and discovery enabled.
Note: Discovery can be used without KubeSpan, but KubeSpan requires at least one discovery registry.
Talos v0.11 or Less
If you are migrating from Talos v0.11 or less, we need to generate a cluster ID and secret.
To generate an id:
$ openssl rand -base64 32
EUsCYz+oHNuBppS51P9aKSIOyYvIPmbZK944PWgiyMQ=
To generate a secret:
$ openssl rand -base64 32
AbdsWjY9i797kGglghKvtGdxCsdllX9CemLq+WGVeaw=
Now, update the configuration of each node in the cluster with the generated id and secret.
You should end up with the addition of something like this (your id
and secret
should be different):
cluster:
  id: EUsCYz+oHNuBppS51P9aKSIOyYvIPmbZK944PWgiyMQ=
  secret: AbdsWjY9i797kGglghKvtGdxCsdllX9CemLq+WGVeaw=
Note: This can be applied in immediate mode (no reboot required).
Talos v0.12 or More
Enable kubespan and discovery.
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true
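As a sketch, the same settings can be applied to a running node with a JSON patch (the node IP is a placeholder, and this assumes the machine.network section already exists in the configuration; talosctl edit machineconfig works just as well):
talosctl -n <node ip> patch machineconfig --mode=no-reboot --patch '[{"op": "add", "path": "/machine/network/kubespan", "value": {"enabled": true}}, {"op": "add", "path": "/cluster/discovery", "value": {"enabled": true}}]'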
Resource Definitions
KubeSpanIdentities
A node’s WireGuard identities can be obtained with:
$ talosctl get kubespanidentities -o yaml
...
spec:
address: fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94/128
subnet: fd83:b1f7:fcb5:2802::/64
privateKey: gNoasoKOJzl+/B+uXhvsBVxv81OcVLrlcmQ5jQwZO08=
publicKey: NzW8oeIH5rJyY5lefD9WRoHWWRr/Q6DwsDjMX+xKjT4=
Talos automatically configures a unique IPv6 address for each node in the cluster-specific IPv6 ULA prefix.
A WireGuard private key is generated for the node; the private key never leaves the node, while the public key is published through cluster discovery.
KubeSpanIdentity is persisted across reboots and upgrades in the STATE partition in the file kubespan-identity.yaml.
KubeSpanPeerSpecs
A node’s WireGuard peers can be obtained with:
$ talosctl get kubespanpeerspecs
ID VERSION LABEL ENDPOINTS
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 2 talos-default-master-2 ["172.20.0.3:51820"]
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 2 talos-default-master-3 ["172.20.0.4:51820"]
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 2 talos-default-worker-2 ["172.20.0.6:51820"]
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 2 talos-default-worker-1 ["172.20.0.5:51820"]
The peer ID is the Wireguard public key.
KubeSpanPeerSpecs
are built from the cluster discovery data.
KubeSpanPeerStatuses
The status of a node’s WireGuard peers can be obtained with:
$ talosctl get kubespanpeerstatuses
ID VERSION LABEL ENDPOINT STATE RX TX
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 63 talos-default-master-2 172.20.0.3:51820 up 15043220 17869488
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 62 talos-default-master-3 172.20.0.4:51820 up 14573208 18157680
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 60 talos-default-worker-2 172.20.0.6:51820 up 130072 46888
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 60 talos-default-worker-1 172.20.0.5:51820 up 130044 46556
KubeSpan peer status includes the following information:
- the actual endpoint used for peer communication
- link state:
unknown
: the endpoint was just changed, link state is not known yetup
: there is a recent handshake from the peerdown
: there is no handshake from the peer
- number of bytes sent/received over the Wireguard link with the peer
If the connection state goes down, Talos will cycle through the available endpoints until it finds one that works.
Peer status information is updated every 30 seconds.
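To observe endpoint cycling and the traffic counters as they change, the resource can simply be polled, for example:
watch -n 5 talosctl get kubespanpeerstatuses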
KubeSpanEndpoints
A node’s WireGuard endpoints (peer addresses) can be obtained with:
$ talosctl get kubespanendpoints
ID VERSION ENDPOINT AFFILIATE ID
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 1 172.20.0.3:51820 2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 1 172.20.0.4:51820 b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 1 172.20.0.6:51820 NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 1 172.20.0.5:51820 6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA
The endpoint ID is the base64 encoded WireGuard public key.
The observed endpoints are submitted back to the discovery service (if enabled) so that other peers can try additional endpoints to establish the connection.
3 - Upgrading Kubernetes
This guide covers upgrading Kubernetes on Talos Linux clusters. For upgrading the Talos Linux operating system, see Upgrading Talos.
Video Walkthrough
To see a demo of this process, watch this video:
Automated Kubernetes Upgrade
The recommended method to upgrade Kubernetes is to use the talosctl upgrade-k8s
command.
This will automatically update the components needed to upgrade Kubernetes safely.
Upgrading Kubernetes is non-disruptive to the cluster workloads.
To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
talosctl --nodes <master node> upgrade-k8s --to 1.23.5
Note that the --nodes
parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
To check what will be upgraded you can run talosctl upgrade-k8s
with the --dry-run
flag:
$ talosctl --nodes <master node> upgrade-k8s --to 1.23.5 --dry-run
WARNING: found resources which are going to be deprecated/migrated in the version 1.23.5
RESOURCE COUNT
validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 4
mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 3
customresourcedefinitions.v1beta1.apiextensions.k8s.io 25
apiservices.v1beta1.apiregistration.k8s.io 54
leases.v1beta1.coordination.k8s.io 4
automatically detected the lowest Kubernetes version 1.23.1
checking for resource APIs to be deprecated in version 1.23.5
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "1.23.5"
> "172.20.0.2": starting update
> update kube-apiserver: v1.23.1 -> 1.23.5
> skipped in dry-run
> "172.20.0.3": starting update
> update kube-apiserver: v1.23.1 -> 1.23.5
> skipped in dry-run
> "172.20.0.4": starting update
> update kube-apiserver: v1.23.1 -> 1.23.5
> skipped in dry-run
updating "kube-controller-manager" to version "1.23.5"
> "172.20.0.2": starting update
> update kube-controller-manager: v1.23.1 -> 1.23.5
> skipped in dry-run
> "172.20.0.3": starting update
<snip>
updating manifests
> apply manifest Secret bootstrap-token-3lb63t
> apply skipped in dry run
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
<snip>
To upgrade Kubernetes from v1.23.1 to v1.23.5 run:
$ talosctl --nodes <master node> upgrade-k8s --to 1.23.5
automatically detected the lowest Kubernetes version 1.23.1
checking for resource APIs to be deprecated in version 1.23.5
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "1.23.5"
> "172.20.0.2": starting update
> update kube-apiserver: v1.23.1 -> 1.23.5
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: v1.23.1 -> 1.23.5
<snip>
This command runs in several phases:
- Every control plane node machine configuration is patched with the new image version for each control plane component. Talos renders new static pod definitions on the configuration update which is picked up by the kubelet. The command waits for the change to propagate to the API server state.
- The command updates the
kube-proxy
daemonset with the new image version. - On every node in the cluster, the
kubelet
version is updated. The command then waits for thekubelet
service to be restarted and become healthy. The update is verified by checking theNode
resource state. - Kubernetes bootstrap manifests are re-applied to the cluster.
Updated bootstrap manifests might come with a new Talos version (e.g. CoreDNS version update), or might be the result of machine configuration change.
Note: The
upgrade-k8s
command never deletes any resources from the cluster: they should be deleted manually.
If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
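After the upgrade completes, a quick way to confirm that every node reports the new kubelet version is:
kubectl get nodes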
Manual Kubernetes Upgrade
Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the talosctl upgrade-k8s
command.
Kubeconfig
In order to edit the control plane, you need a working kubectl
config.
If you don’t already have one, you can get one by running:
talosctl --nodes <master node> kubeconfig
API Server
Patch machine configuration using talosctl patch
command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.23.5"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.apiServer.image key.
The machine configuration can also be edited manually with talosctl -n <IP> edit mc --mode=no-reboot.
Capture the new version of kube-apiserver
config with:
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-apiserver -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-apiserver
version: 5
phase: running
spec:
image: k8s.gcr.io/kube-apiserver:v1.23.5
cloudProvider: ""
controlPlaneEndpoint: https://172.20.0.1:6443
etcdServers:
- https://127.0.0.1:2379
localPort: 6443
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
In this example, the new version is 5.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1
with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-master-1 1/1 Running 0 16m
Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
Controller Manager
Patch machine configuration using talosctl patch
command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "k8s.gcr.io/kube-controller-manager:v1.23.5"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.controllerManager.image key.
Capture the new version of the kube-controller-manager config with:
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-controller-manager -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-controller-manager
version: 3
phase: running
spec:
image: k8s.gcr.io/kube-controller-manager:v1.23.5
cloudProvider: ""
podCIDR: 10.244.0.0/16
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
In this example, the new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1
with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-master-1 1/1 Running 0 35m
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
Scheduler
Patch machine configuration using talosctl patch
command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "k8s.gcr.io/kube-scheduler:v1.23.5"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.scheduler.image key.
Capture the new version of the kube-scheduler config with:
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-scheduler -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-scheduler
version: 3
phase: running
spec:
image: k8s.gcr.io/kube-scheduler:v1.23.5
extraArgs: {}
extraVolumes: []
In this example, the new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1
with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-master-1 1/1 Running 0 39m
Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
Proxy
In the proxy’s DaemonSet
, change:
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: k8s.gcr.io/kube-proxy:v1.23.5
tolerations:
- ...
to:
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: k8s.gcr.io/kube-proxy:v1.23.5
tolerations:
- ...
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
To edit the DaemonSet
, run:
kubectl edit daemonsets -n kube-system kube-proxy
Bootstrap Manifests
Bootstrap manifests can be retrieved in a format which works for kubectl
with the following command:
talosctl -n <master IP> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
Diff the manifests with the cluster:
kubectl diff -f manifests.yaml
Apply the manifests:
kubectl apply -f manifests.yaml
Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.
kubelet
For every node, patch the machine configuration with the new kubelet version and wait for the kubelet to restart with the new version:
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v1.23.5"}]'
patched mc at the node 172.20.0.2
Once kubelet restarts with the new configuration, confirm the upgrade with kubectl get nodes <name>:
$ kubectl get nodes talos-default-master-1
NAME STATUS ROLES AGE VERSION
talos-default-master-1 Ready control-plane,master 123m v1.23.5