If you’re interested in this project and would like to help with engineering efforts, or have general usage questions, we are happy to have you!
We hold a weekly meeting that all audiences are welcome to attend.
You can subscribe to this meeting by joining the community forum above.
Note: You can convert the meeting hours to your local time.
We would appreciate your feedback so that we can make Talos even better!
To do so, you can take our survey.
Enterprise
If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help.
Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.
A quick introduction to what Talos is and why it should be used.
Talos is a container-optimized Linux distribution: a reimagining of Linux for distributed systems such as Kubernetes.
It is designed to be as minimal as possible while still maintaining practicality.
For these reasons, Talos has a number of features unique to it:
it is immutable
it is atomic
it is ephemeral
it is minimal
it is secure by default
it is managed via a single declarative configuration file and gRPC API
Talos can be deployed on container, cloud, virtualized, and bare metal platforms.
Why Talos
In having less, Talos offers more.
Security.
Efficiency.
Resiliency.
Consistency.
All of these areas are improved simply by having less.
1.2 - Quickstart
A short guide on setting up a simple Talos Linux cluster locally with Docker.
Local Docker Cluster
The easiest way to try Talos is to use the CLI (talosctl) to create a cluster on a machine with Docker installed.
Prerequisites
talosctl
Download talosctl (macOS or Linux):
brew install siderolabs/tap/talosctl
kubectl
Download kubectl via one of the methods outlined in the documentation.
Create the Cluster
Now run the following:
talosctl cluster create
Note
If you are using Docker Desktop on macOS and encounter the error "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", you may need to manually create the link for the Docker socket:
sudo ln -s "$HOME/.docker/run/docker.sock" /var/run/docker.sock
You can explore using Talos API commands:
talosctl dashboard --nodes 10.5.0.2
Verify that you can reach Kubernetes:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
talos-default-controlplane-1 Ready master 115s v1.32.0 10.5.0.2 <none> Talos (v1.9.0) <host kernel> containerd://1.5.5
talos-default-worker-1 Ready <none> 115s v1.32.0 10.5.0.3 <none> Talos (v1.9.0) <host kernel> containerd://1.5.5
Destroy the Cluster
When you are all done, remove the cluster:
talosctl cluster destroy
1.3 - Getting Started
A guide to setting up a Talos Linux cluster.
This document will walk you through installing a simple Talos cluster with a single control plane node and one or more worker nodes, explaining some of the concepts along the way.
If this is your first use of Talos Linux, we recommend starting with the Quickstart, which quickly creates a local virtual cluster in containers on your workstation.
For a production cluster, extra steps are needed - see Production Notes.
Regardless of where you run Talos, the steps to create a Kubernetes cluster are:
boot machines off the Talos Linux image
define the endpoint for the Kubernetes API and generate your machine configurations
configure Talos Linux by applying machine configurations to the machines
configure talosctl
bootstrap Kubernetes
Prerequisites
talosctl
talosctl is a CLI tool which interfaces with the Talos API.
Talos Linux has no SSH access: talosctl is the tool you use to interact with the operating system on the machines.
Note: If you boot systems off the ISO, Talos on the ISO image runs in RAM and acts as an installer.
The version of talosctl that is used to create the machine configurations controls the version of Talos Linux that is installed on the machines - NOT the image that the machines are initially booted off.
For example, booting a machine off the Talos 1.3.7 ISO but creating the initial configuration with a talosctl binary of version 1.4.1 will result in a machine running Talos Linux version 1.4.1.
It is advisable to use the same version of talosctl as the version of the boot media used.
Network access
This guide assumes that the systems being installed have outgoing access to the internet, allowing them to pull installer and container images, query NTP, etc.
If needed, see the documentation on registry proxies, local registries, and airgapped installation.
Acquire the Talos Linux image and boot machines
The most general way to install Talos Linux is to use the ISO image.
The latest ISO image can be found on the GitHub Releases page.
When booted from the ISO, Talos will run in RAM and will not install to disk until provided a configuration.
Thus, it is safe to boot any machine from the ISO.
At this point, you should:
boot one machine off the ISO to be the control plane node
boot one or more machines off the same ISO to be the workers
Alternative Booting
For network booting and self-built media, see Production Notes.
There are also installation methods specific to particular platforms, such as pre-built AMIs for AWS; check the specific Installation Guides.
Define the Kubernetes Endpoint
In order to configure Kubernetes, Talos needs to know what the endpoint of the Kubernetes API Server will be.
Because we are only creating a single control plane node in this guide, we can use the control plane node directly as the Kubernetes API endpoint.
Identify the IP address or DNS name of the control plane node that was booted above, and convert it to a fully-qualified HTTPS URL endpoint address for the Kubernetes API Server which (by default) runs on port 6443.
The endpoint should be formatted like:
https://192.168.0.2:6443
https://kube.mycluster.mydomain.com:6443
NOTE: For a production cluster, you should have three control plane nodes, and have the endpoint allocate traffic to all three - see Production Notes.
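The following shell sketch builds an endpoint URL in the format above from an IP address or DNS name. The function name k8s_endpoint is hypothetical. It assumes the default API server port 6443 unless you pass a second argument.

```shell
# Build the Kubernetes API endpoint URL for talosctl gen config from a
# control plane IP address or DNS name (hypothetical helper).
k8s_endpoint() {
  local host="$1"
  local port="${2:-6443}" # the Kubernetes API server listens on 6443 by default
  # IPv6 literals must be wrapped in brackets inside a URL.
  case "$host" in
    *:*) host="[$host]" ;;
  esac
  printf 'https://%s:%s\n' "$host" "$port"
}
```

For example, k8s_endpoint 192.168.0.2 prints https://192.168.0.2:6443. You can pass that value as the <cluster-endpoint> argument of talosctl gen config.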
Configure Talos Linux
When Talos boots without a configuration, such as when booting off the Talos ISO, it enters maintenance mode and waits for a configuration to be provided.
A configuration can be passed in on boot via kernel parameters or metadata servers.
See Production Notes.
Unlike traditional Linux, Talos Linux is not configured by SSHing to the server and issuing commands.
Instead, the entire state of the machine is defined by a machine config file which is passed to the server.
This allows machines to be managed in a declarative way, and lends itself to GitOps and modern operations paradigms.
The state of a machine is completely defined by, and can be reproduced from, the machine configuration file.
To generate the machine configurations for a cluster, run this command on the workstation where you installed talosctl:
talosctl gen config <cluster-name> <cluster-endpoint>
cluster-name is an arbitrary name, used as a label in your local client configuration.
It should be unique in the configuration on your local workstation.
cluster-endpoint is the Kubernetes Endpoint you constructed from the control plane node’s IP address or DNS name above.
It should be a complete URL, with https:// and port.
For example:
$ talosctl gen config mycluster https://192.168.0.2:6443
generating PKI and tokens
created /Users/taloswork/controlplane.yaml
created /Users/taloswork/worker.yaml
created /Users/taloswork/talosconfig
When you run this command, three files are created in your current directory:
controlplane.yaml
worker.yaml
talosconfig
The .yaml files are Machine Configs: they describe everything from what disk Talos should be installed on, to network settings.
The controlplane.yaml file also describes how Talos should form a Kubernetes cluster.
The talosconfig file is your local client configuration file, used to connect to and authenticate access to the cluster.
Controlplane and Worker
The two types of Machine Configs correspond to the two roles of Talos nodes, control plane nodes (which run both the Talos and Kubernetes control planes) and worker nodes (which run the workloads).
The main difference between Controlplane Machine Config files and Worker Machine Config files is that the former contains information about how to form the Kubernetes cluster.
Modifying the Machine configs
The generated Machine Configs have defaults that work for most cases.
They use DHCP for interface configuration, and install to /dev/sda.
Sometimes, you will need to modify the generated files to work with your systems.
A common case is needing to change the installation disk.
If you try to apply the machine config to a node and get an error like the one below, you need to specify a different installation disk:
$ talosctl apply-config --insecure -n 192.168.0.2 --file controlplane.yaml
error applying new configuration: rpc error: code = InvalidArgument desc = configuration validation failed: 1 error occurred:
* specified install disk does not exist: "/dev/sda"
You can verify which disks your nodes have by using the talosctl get disks --insecure command.
Insecure mode is needed at this point as the PKI infrastructure has not yet been set up.
For example, the talosctl get disks command below shows that the system has a vda drive, not an sda:
$ talosctl -n 192.168.0.2 get disks --insecure
DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH
/dev/vda - - HDD - - virtio:d00000002v00001AF4 - 69 GB /pci0000:00/0000:00:06.0/virtio2/
In this case, you would modify the controlplane.yaml and worker.yaml files and edit the line:
install:
  disk: /dev/sda # The disk used for installations.
to reflect vda instead of sda.
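Instead of editing both files by hand, you can express the same change as a machine configuration patch. This is a minimal sketch that assumes the /dev/vda disk from the example output above. The file name install-disk.patch.yaml is arbitrary.

```shell
# Write a patch that overrides the default installation disk (/dev/sda)
# with the disk reported by `talosctl get disks` (here /dev/vda).
cat > install-disk.patch.yaml <<'EOF'
machine:
  install:
    disk: /dev/vda
EOF
```

To apply the patch to both generated files, regenerate them with talosctl gen config mycluster https://192.168.0.2:6443 --config-patch @install-disk.patch.yaml. For this particular setting, talosctl gen config also accepts an --install-disk flag.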
For information on customizing your machine configurations (such as to specify the version of Kubernetes), using machine configuration patches, or customizing configurations for individual machines (such as setting static IP addresses), see the Production Notes.
Accessing the Talos API
Administrative tasks are performed by calling the Talos API (usually with talosctl) on Talos Linux control plane nodes, which may forward the requests to other nodes.
Thus:
ensure your control plane node is directly reachable on TCP port 50000 from the workstation where you run the talosctl client.
until a node is a member of the cluster, it does not have the PKI infrastructure set up, and so will not accept API requests that are proxied through a control plane node.
Thus you will need direct access to the worker nodes on port 50000 from the workstation where you run talosctl in order to apply the initial configuration.
Once the cluster is established, you will no longer need port 50000 access to the workers.
(You can avoid requiring such access by passing in the initial configuration through one of the other methods, such as cloud userdata or the talos.config= kernel argument on a metal platform.)
This may require changing firewall rules or cloud provider access-lists.
Understand how talosctl treats endpoints and nodes
In short: endpoints are where talosctl sends commands to, but the command operates on the specified nodes.
The endpoint will forward the command to the nodes, if needed.
Endpoints
Endpoints are the IP addresses of control plane nodes, to which the talosctl client directly talks.
Endpoints automatically proxy requests destined to another node in the cluster.
This means that you only need access to the control plane nodes in order to manage the rest of the cluster.
You can pass in --endpoints <Control Plane IP Address> or -e <Control Plane IP Address> to the current talosctl command.
In this tutorial setup, the endpoint will always be the single control plane node.
Nodes
Nodes are the target(s) you wish to perform the operation on.
When specifying nodes, the IPs and/or hostnames are as seen by the endpoint servers, not as from the client.
This is because all connections are proxied through the endpoints.
You may provide -n or --nodes to any talosctl command to supply the node or (comma-separated) nodes on which you wish to perform the operation.
For example, to see the containers running on node 192.168.0.200, by routing the containers command through the control plane endpoint 192.168.0.2:
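Here is a minimal sketch of that command, wrapped in a hypothetical shell function named show_containers that takes the endpoint and the node as arguments:

```shell
# Show the containers on a node, routing the request through a control plane
# endpoint. $1 = endpoint (control plane node), $2 = target node as seen by
# the endpoint.
show_containers() {
  talosctl --endpoints "$1" --nodes "$2" containers
}
```

Calling show_containers 192.168.0.2 192.168.0.200 runs the same command as the short form talosctl -e 192.168.0.2 -n 192.168.0.200 containers.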
For a more in-depth discussion of Endpoints and Nodes, please see talosctl.
Apply Configuration
To apply the Machine Configs, you need to know the machines’ IP addresses.
Talos prints the IP addresses of the machines on the console during the boot process:
[4.605369] [talos] task loadConfig (1/1): this machine is reachable at:
[4.607358] [talos] task loadConfig (1/1): 192.168.0.2
If you do not have console access, the IP address may also be discoverable from your DHCP server.
Once you have the IP address, you can then apply the correct configuration.
Apply the controlplane.yaml file to the control plane node, and the worker.yaml file to all the worker node(s).
The --insecure flag is necessary because the PKI infrastructure has not yet been made available to the node.
Note: the connection will be encrypted, but not authenticated.
When using the --insecure flag, you cannot specify an endpoint, and must directly access the node on port 50000.
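You can sketch the two apply steps as a shell function. The function name is hypothetical, and you supply the node IP addresses as arguments, control plane node first, then any workers. The sketch assumes controlplane.yaml and worker.yaml are in the current directory.

```shell
# Apply the generated machine configs to nodes that are still in maintenance
# mode. --insecure is required because the nodes have no PKI yet, so each
# node is contacted directly on port 50000 (no --endpoints).
apply_initial_configs() {
  local cp_ip="$1" worker_ip
  shift
  talosctl apply-config --insecure --nodes "$cp_ip" --file controlplane.yaml || return 1
  for worker_ip in "$@"; do
    talosctl apply-config --insecure --nodes "$worker_ip" --file worker.yaml || return 1
  done
}
```

For example: apply_initial_configs 192.168.0.2 192.168.0.3 192.168.0.4, where the worker addresses are placeholders.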
Default talosconfig configuration file
You specify which configuration file to use with the --talosconfig parameter:
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 -e 192.168.0.2 version
Note that talosctl comes with tooling to help you integrate and merge this configuration into the default talosctl configuration file.
See Production Notes for more information.
While getting started, a common mistake is referencing a configuration context for a different cluster, resulting in authentication or connection failures.
Thus it is recommended to explicitly pass in the configuration file while becoming familiar with Talos Linux.
Kubernetes Bootstrap
Bootstrapping your Kubernetes cluster with Talos is as simple as calling talosctl bootstrap on your control plane node:
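Here is a minimal sketch of that call, wrapped in a hypothetical shell function. It follows the explicit --talosconfig style recommended above, and you supply the control plane IP as an argument.

```shell
# Bootstrap etcd and the Kubernetes control plane.
# Run this only ONCE, against a SINGLE control plane node.
bootstrap_cluster() {
  local cp_ip="$1"
  talosctl --talosconfig=./talosconfig --nodes "$cp_ip" --endpoints "$cp_ip" bootstrap
}
```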
The bootstrap operation should only be called ONCE on a SINGLE control plane node.
(If you have multiple control plane nodes, it doesn’t matter which one you issue the bootstrap command against.)
At this point, Talos will form an etcd cluster, and start the Kubernetes control plane components.
After a few moments, you will be able to download your Kubernetes client configuration and get started:
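The following sketch uses a hypothetical function name. It fetches the admin kubeconfig into a file named kubeconfig in the current directory instead of merging it into your default Kubernetes configuration, and then lists the nodes.

```shell
# Download the Kubernetes client configuration from the control plane node
# and check that the API server answers.
fetch_kubeconfig() {
  local cp_ip="$1"
  talosctl --talosconfig=./talosconfig --nodes "$cp_ip" --endpoints "$cp_ip" \
    kubeconfig ./kubeconfig || return 1
  kubectl --kubeconfig ./kubeconfig get nodes
}
```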
Note that to use alternate booting, there are a number of required kernel parameters.
Please see the kernel docs for more information.
Control plane nodes
For a production, highly available Kubernetes cluster, it is recommended to use three control plane nodes.
Using five nodes can provide greater fault tolerance, but imposes more replication overhead and can result in worse performance.
Boot all three control plane nodes at this point.
They will boot Talos Linux, and come up in maintenance mode, awaiting a configuration.
Decide the Kubernetes Endpoint
The Kubernetes API Server endpoint, in order to be highly available, should be configured in a way that uses all available control plane nodes.
There are three common ways to do this: using a load balancer, using Talos Linux’s built-in VIP functionality, or using multiple DNS records.
Dedicated Load-balancer
If you are using a cloud provider or have your own load balancer (such as HAProxy, Nginx reverse proxy, or an F5 load balancer), a dedicated load balancer is a natural choice.
Create an appropriate frontend for the endpoint, listening on TCP port 6443, and point the backends at the addresses of each of the Talos control plane nodes.
Your Kubernetes endpoint will be the IP address or DNS name of the load balancer front end, with the port appended (e.g. https://myK8s.mydomain.io:6443).
Note: an HTTP load balancer can’t be used, as the Kubernetes API server does TLS termination and mutual TLS authentication.
Layer 2 VIP Shared IP
Talos has integrated support for serving Kubernetes from a shared/virtual IP address.
This requires Layer 2 connectivity between control plane nodes.
Choose an unused IP address on the same subnet as the control plane nodes for the VIP.
For instance, if your control plane node IPs are:
192.168.0.10
192.168.0.11
192.168.0.12
you could choose the IP 192.168.0.15 as your VIP IP address.
(Make sure that 192.168.0.15 is not used by any other machine and is excluded from DHCP ranges.)
Once chosen, form the full HTTPS URL from this IP:
https://192.168.0.15:6443
If you create a DNS record for this IP, note you will need to use the IP address itself, not the DNS name, to configure the shared IP (machine.network.interfaces[].vip.ip) in the Talos configuration.
After the machine configurations are generated, you will want to edit the controlplane.yaml file to activate the VIP:
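Below is a sketch of such a patch for the example VIP 192.168.0.15. The interface name enp2s0 is an assumption. Substitute the name of the interface on your control plane nodes that sits on the shared subnet.

```shell
# Patch enabling the shared VIP on the control plane nodes.
# ASSUMPTION: the interface is named enp2s0 and gets its own address via DHCP.
cat > vip.patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: enp2s0
        dhcp: true
        vip:
          ip: 192.168.0.15
EOF
```

Instead of editing controlplane.yaml by hand, you can apply the patch at generation time with talosctl gen config --config-patch-control-plane @vip.patch.yaml. That way only the control plane configs receive the VIP.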
For more information about using a shared IP, see the related Guide.
DNS records
Add multiple A or AAAA records (one for each control plane node) to a DNS name.
For instance, you could add:
kube.cluster1.mydomain.com IN A 192.168.0.10
kube.cluster1.mydomain.com IN A 192.168.0.11
kube.cluster1.mydomain.com IN A 192.168.0.12
where the IP addresses are those of the control plane nodes.
Then, your endpoint would be:
https://kube.cluster1.mydomain.com:6443
Multihoming
If your machines are multihomed, i.e., they have more than one IPv4 and/or IPv6 address other than loopback, then additional configuration is required.
A point to note is that the machines may become multihomed via privileged workloads.
Multihoming and etcd
The etcd cluster needs to establish a mesh of connections among the members.
It is done using the so-called advertised address - each node learns the others’ addresses as they are advertised.
It is crucial that these IP addresses are stable, i.e., that each node always advertises the same IP address.
Moreover, it is beneficial to control them to establish the correct routes between the members and, e.g., avoid congested paths.
In Talos, these addresses are controlled using the cluster.etcd.advertisedSubnets configuration key.
Multihoming and kubelets
Stable IP addressing for kubelets (i.e., nodeIP) is not strictly necessary but highly recommended as it ensures that, e.g., kube-proxy and CNI routing take the desired routes.
Analogously to etcd, for kubelets this is controlled via machine.kubelet.nodeIP.validSubnets.
Example
Let’s assume that we have a cluster with two networks:
public network
private network 192.168.0.0/16
We want to use the private network for etcd and kubelet communication:
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.0.0/16
#...
cluster:
  etcd:
    advertisedSubnets: # listenSubnets defaults to advertisedSubnets if not set explicitly
      - 192.168.0.0/16
This way we ensure that the etcd cluster will use the private network for communication and the kubelets will use the private network for communication with the control plane.
Load balancing the Talos API
The talosctl tool provides built-in client-side load-balancing across control plane nodes, so usually you do not need to configure a load balancer for the Talos API.
However, if the control plane nodes are not directly reachable from the workstation where you run talosctl, then configure a load balancer to forward TCP port 50000 to the control plane nodes.
Note: because the Talos Linux API uses gRPC and mutual TLS, it cannot be proxied by an HTTP/S proxy, only by a TCP load balancer.
If you create a load balancer to forward the Talos API calls, the load balancer IP or hostname will be used as the endpoint for talosctl.
Add the load balancer IP or hostname to the .machine.certSANs field of the machine configuration file.
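Here is a minimal patch for this. It assumes a hypothetical load balancer reachable as talos-api.example.com and 192.168.0.100; both values are placeholders.

```shell
# Add the Talos API load balancer's name and IP to the certificate SANs
# of the control plane nodes. Both values are placeholders.
cat > certsans.patch.yaml <<'EOF'
machine:
  certSANs:
    - talos-api.example.com
    - 192.168.0.100
EOF
```

Apply it to the control plane configs with --config-patch-control-plane @certsans.patch.yaml. Then point talosctl at the load balancer, for example with talosctl config endpoint talos-api.example.com.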
Do not use Talos Linux’s built-in VIP function for accessing the Talos API.
In the event of an error in etcd, the VIP will not function, and you will not be able to access the Talos API to recover.
Configure Talos
In many installation methods, a configuration can be passed in on boot.
For example, Talos can be booted with the talos.config kernel argument set to an HTTP(S) URL from which it should receive its configuration.
Where a PXE server is available, this is much more efficient than manually configuring each node.
If you do use this method, note that Talos requires a number of other kernel command line parameters.
See required kernel parameters.
Similarly, if you are creating Kubernetes clusters on EC2, you can pass the configuration file as --user-data to the aws ec2 run-instances command.
In general, see the Installation Guide for the platform being deployed.
Separating out secrets
When generating the configuration files for a Talos Linux cluster, it is recommended to start with generating a secrets bundle which should be saved in a secure location.
This bundle can be used to generate machine or client configurations at any time:
talosctl gen secrets -o secrets.yaml
The secrets.yaml file can also be extracted from an existing control plane machine configuration with the following command:
talosctl gen secrets --from-controlplane-config controlplane.yaml -o secrets.yaml
Now, we can generate the machine configuration for each node:
talosctl gen config --with-secrets secrets.yaml <cluster-name> <cluster-endpoint>
Here, cluster-name is an arbitrary name for the cluster, used in your local client configuration as a label.
It should be unique in the configuration on your local workstation.
The cluster-endpoint is the Kubernetes Endpoint you selected from above.
This is the Kubernetes API URL, and it should be a complete URL, with https:// and port.
(The default port is 6443, but you may have configured your load balancer to forward a different port.)
For example:
$ talosctl gen config --with-secrets secrets.yaml my-cluster https://192.168.64.15:6443
generating PKI and tokens
created controlplane.yaml
created worker.yaml
created talosconfig
Customizing Machine Configuration
The generated machine configuration provides sane defaults for most cases, but can be modified to fit specific needs.
Some machine configuration options are available as flags for the talosctl gen config command, for example setting a specific Kubernetes version:
talosctl gen config --with-secrets secrets.yaml --kubernetes-version 1.25.4 my-cluster https://192.168.64.15:6443
Other modifications are done with machine configuration patches.
Machine configuration patches can be applied with the talosctl gen config command:
talosctl gen config --with-secrets secrets.yaml --config-patch-control-plane @cni.patch my-cluster https://192.168.64.15:6443
Note: @cni.patch means that the patch is read from a file named cni.patch.
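What goes in cni.patch depends on what you want to change. As a hypothetical example, you could create a patch that disables the default CNI (Flannel), for instance so you can install a different CNI after bootstrap, like this:

```shell
# Hypothetical cni.patch: tell Talos not to deploy its default CNI.
cat > cni.patch <<'EOF'
cluster:
  network:
    cni:
      name: none
EOF
```

With no CNI deployed, nodes will not report Ready until you install one.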
Machine Configs as Templates
Individual machines may need different settings: for instance, each may have a different static IP address.
When different files are needed for machines of the same type, there are two supported flows:
Use the talosctl gen config command to generate a template, and then patch the template for each machine with talosctl machineconfig patch.
Generate each machine configuration file separately with talosctl gen config while applying patches.
For example, given a machine configuration patch which sets the static machine hostname:
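Here is a sketch of the first flow, using a hypothetical hostname cp1 and hypothetical file and function names:

```shell
# Per-machine patch that sets a static hostname (hypothetical value).
cat > cp1.patch.yaml <<'EOF'
machine:
  network:
    hostname: cp1
EOF

# Render the per-machine config from the generated template.
render_cp1_config() {
  talosctl machineconfig patch controlplane.yaml --patch @cp1.patch.yaml --output cp1.yaml
}
```

After you run render_cp1_config, apply cp1.yaml like any other machine config, for example with talosctl apply-config --insecure --nodes 192.168.0.2 --file cp1.yaml.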
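When applying each configuration, you can optionally pass the node’s certificate fingerprint to talosctl apply-config with the --cert-fingerprint flag. Using the fingerprint allows you to be sure you are sending the configuration to the correct machine, but is completely optional.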
After the configuration is applied to a node, it will reboot.
Repeat this process for each of the nodes in your cluster.
Further details about talosctl, endpoints and nodes
Endpoints
When passed multiple endpoints, talosctl will automatically load balance requests to, and fail over between, all endpoints.
You can pass in --endpoints <IP Address1>,<IP Address2> as a comma separated list of IP/DNS addresses to the current talosctl command.
You can also set the endpoints in your talosconfig, by calling talosctl config endpoint <IP Address1> <IP Address2>.
Note: these are space separated, not comma separated.
As an example, if the IP addresses of our control plane nodes are:
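For illustration, assume control plane nodes at the hypothetical addresses 192.168.0.2, 192.168.0.3 and 192.168.0.4. The sketch below shows both forms: a comma-separated list for the --endpoints flag, and space-separated arguments for talosctl config endpoint. The helper names are hypothetical.

```shell
# Hypothetical control plane addresses; replace with your own.
CONTROL_PLANE_IPS="192.168.0.2 192.168.0.3 192.168.0.4"

# Join the arguments with commas, the format the --endpoints flag expects.
join_commas() {
  local IFS=,
  printf '%s\n' "$*"
}

# Persist the endpoints in the talosconfig (space-separated arguments).
save_endpoints() {
  # Word splitting of the unquoted variable is intentional here.
  talosctl --talosconfig=./talosconfig config endpoint $CONTROL_PLANE_IPS
}
```

For a one-off command, the comma-separated form is passed directly: talosctl --talosconfig=./talosconfig --endpoints "$(join_commas $CONTROL_PLANE_IPS)" --nodes 192.168.0.2 version.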
Nodes
The node is the target you wish to perform the API call on.
It is possible to set a default set of nodes in the talosconfig file, but our recommendation is to explicitly pass in the node or nodes to be operated on with each talosctl command.
For a more in-depth discussion of Endpoints and Nodes, please see talosctl.
Default configuration file
You can reference which configuration file to use directly with the --talosconfig parameter:
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 version
However, talosctl comes with tooling to help you integrate and merge this configuration into the default talosctl configuration file.
This is done with the merge option.
talosctl config merge ./talosconfig
This will merge your new talosconfig into the default configuration file ($XDG_CONFIG_HOME/talos/config.yaml), creating it if necessary.
Like Kubernetes, the talosconfig configuration file has multiple “contexts” which correspond to multiple clusters.
The <cluster-name> you chose above will be used as the context name.
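To avoid accidentally talking to the wrong cluster, list the contexts after merging and select the intended one explicitly. Check the listed names before switching: if a context with the same name already exists, the merged one may be stored under a different name. A sketch with hypothetical function names:

```shell
# Merge ./talosconfig into the default talosconfig, then list all contexts.
merge_talosconfig() {
  talosctl config merge ./talosconfig || return 1
  talosctl config contexts
}

# Make the given context (e.g. the <cluster-name> used above) the current one.
use_context() {
  talosctl config context "$1"
}
```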
Kubernetes Bootstrap
Bootstrap your Kubernetes cluster by calling the bootstrap command against any one of your control plane nodes (or the load balancer, if used for the Talos API endpoint):
talosctl bootstrap --nodes 192.168.0.2
The bootstrap operation should only be called ONCE and only on a SINGLE control plane node!
At this point, Talos will form an etcd cluster, generate all of the core Kubernetes assets, and start the Kubernetes control plane components.
After a few moments, you will be able to download your Kubernetes client configuration and get started:
talosctl kubeconfig
Running this command will add (merge) your new cluster into your local Kubernetes configuration.
If you would prefer the configuration to not be merged into your default Kubernetes configuration file, pass in a filename:
talosctl kubeconfig alternative-kubeconfig
You should now be able to connect to Kubernetes and see your nodes:
kubectl get nodes
And use talosctl to explore your cluster:
talosctl -n <NODEIP> dashboard
For a list of all the commands and operations that talosctl provides, see the CLI reference.
1.5 - System Requirements
Hardware requirements for running Talos Linux.
Minimum Requirements
Role           Memory   Cores   System Disk
Control Plane  2 GiB    2       10 GiB
Worker         1 GiB    1       10 GiB
Recommended
Role           Memory   Cores   System Disk
Control Plane  4 GiB    4       100 GiB
Worker         2 GiB    2       100 GiB
These requirements are similar to those of Kubernetes.
Storage
Talos Linux itself requires less than 100 MB of disk space, but the EPHEMERAL partition is used to store pulled images, container work directories, and so on.
Thus a minimum of 10 GiB of disk space is required, and 100 GiB is recommended.
Note, however, that because Talos Linux assumes complete control of the disk it is installed on, so that it can control the partition table for image based upgrades, you cannot partition the rest of the disk for use by workloads.
Thus it is recommended to install Talos Linux on a small, dedicated disk - using a Terabyte sized SSD for the Talos install disk would be wasteful.
Sidero Labs recommends having separate disks (apart from the Talos install disk) to be used for storage.
Please read this section carefully before upgrading to Talos 1.9.0.
Direct Rendering Manager (DRM)
Starting with Talos 1.9, the i915 and amdgpu DRM drivers have been removed from the Talos base image.
These drivers, along with their firmware, are now included in two new system extensions named i915 and amdgpu.
The previously available extensions i915-ucode and amdgpu-firmware have been retired.
Upgrades via Image Factory or Omni will automatically include the new extensions if the i915-ucode or amdgpu-firmware extensions were previously used.
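If you build images through Image Factory yourself, you request the extensions in a schematic. This minimal sketch assumes the new extensions are published under the official names siderolabs/i915 and siderolabs/amdgpu. Check the Image Factory extension list for the exact names.

```shell
# Image Factory schematic requesting the new GPU driver extension.
# ASSUMPTION: official extension name siderolabs/i915 (use siderolabs/amdgpu
# for AMD GPUs).
cat > schematic.yaml <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
EOF
```

You then submit the schematic to Image Factory, for example with curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics. Image Factory returns a schematic ID that you use in the installer image reference.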
udevd
Talos previously used eudev to provide udevd; it now uses systemd-udevd instead.
systemd-udevd might change the names of network interfaces with predictable names, potentially causing issues with existing configurations.
Image Cache
Talos now supports providing a local Image Cache for container images.
The Image Cache feature can be used to avoid downloading the required images over the network, which can be useful in air-gapped or weak connectivity environments.
Networking
Custom DNS Search Domains
Talos now supports specifying custom search domains for Talos nodes using the new machine configuration field .machine.network.searchDomains.
On the host, the configured search domains are then written to /etc/resolv.conf.
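Here is a minimal patch that sets two hypothetical search domains:

```shell
# Patch adding custom DNS search domains (placeholder values).
cat > search-domains.patch.yaml <<'EOF'
machine:
  network:
    searchDomains:
      - example.com
      - example.net
EOF
```

You can apply it to a running node with talosctl patch machineconfig --nodes <node-ip> --patch @search-domains.patch.yaml.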
Talos now supports matching on permanent hardware (MAC) address of the network interfaces.
This is especially useful for matching bond members, as they change their hardware addresses when they become part of the bond.
Node Address Ordering
Talos supports a new experimental address sort algorithm for NodeAddress resources, which are used to pick default addresses for kubelet, etcd, etc.
It can be enabled with the following config patch:
machine:
  features:
    nodeAddressSortAlgorithm: v2
The new algorithm prefers more specific prefixes, which is especially useful for IPv6 addresses.
Control Groups Analysis
The talosctl cgroups command has been added.
It allows you to view the cgroup resource consumption and limits for a machine, for example:
talosctl cgroups --preset memory
Kubernetes
APIServer Authorization Config
Starting with Talos 1.9, the .cluster.apiServer.authorizationConfig field supports setting Kubernetes API server authorization modes using the --authorization-config flag.
The machine config field supports a list of authorizers.
For instance:
For new clusters, if the Kubernetes API server supports the --authorization-config flag, it will be used by default instead of the --authorization-mode flag.
By default Talos will always add the Node and RBAC authorizers to the list.
When upgrading, if either a user-provided authorization-mode or authorization-webhook-* flag is set via .cluster.apiServer.extraArgs, it will be used instead of the new AuthorizationConfig.
Current authorization config can be viewed by running: talosctl get authorizationconfigs.kubernetes.talos.dev -o yaml.
User Namespaces
Talos Linux now supports running Kubernetes pods with user namespaces enabled.
Please refer to the documentation for more information.
In versions before Talos 1.9, there was a discrepancy between the way Talos itself and the CRI plugin resolve registry mirrors:
Talos would never fall back to the default registry if endpoints were configured, while the CRI plugin would.
Note: Talos Linux pulls images for the installer, kubelet, and etcd, while all workload images are pulled by the CRI plugin.
In Talos 1.9 this was fixed, so that by default an upstream registry is used as a fallback in all cases, while the new registry mirror configuration option .skipFallback can be used to disable this behavior for both Talos and the CRI plugin.
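Here is a sketch of a mirror configuration that opts out of the fallback. The mirror URL https://registry.example.internal:5000 is a placeholder.

```shell
# Patch configuring a docker.io mirror without falling back to the upstream
# registry (for both Talos and the CRI plugin). The endpoint is a placeholder.
cat > registry-mirror.patch.yaml <<'EOF'
machine:
  registries:
    mirrors:
      docker.io:
        endpoints:
          - https://registry.example.internal:5000
        skipFallback: true
EOF
```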
Miscellaneous
auditd
Talos Linux now starts an auditd service by default.
Linux kernel audit logs can be fetched with talosctl logs auditd.
talosctl disks
The command talosctl disks was removed, please use talosctl get disks, talosctl get systemdisk, and talosctl get blockdevices instead.
talosctl wipe
The new command talosctl wipe disk allows wiping a disk or a partition which is not used as a volume.
x86: BIOS, UEFI; arm64: UEFI; boot: ISO, PXE, disk image
- virtualized
VMware, Hyper-V, KVM, Proxmox, Xen
VMware, Hyper-V, KVM, Proxmox, Xen
- SBCs
Banana Pi M64, Jetson Nano, Libre Computer Board ALL-H3-CC, Nano Pi R4S, Pine64, Pine64 Rock64, Radxa ROCK Pi 4c, Radxa Rock4c+, Raspberry Pi 4B, Raspberry Pi Compute Module 4, Turing RK1
Banana Pi M64, Jetson Nano, Libre Computer Board ALL-H3-CC, Nano Pi R4S, Orange Pi R1 Plus LTS, Pine64, Pine64 Rock64, Radxa ROCK Pi 4c, Raspberry Pi 4B, Raspberry Pi Compute Module 4
Tier 2: Tested from time to time, medium-priority bugfixes.
Tier 3: Not tested by core Talos team, community tested.
Tier 1
Metal
AWS
Azure
GCP
Tier 2
Digital Ocean
OpenStack
VMware
Tier 3
Akamai
CloudStack
Exoscale
Hetzner
nocloud
OpenNebula
Oracle Cloud
Scaleway
Vultr
Upcloud
1.8 - Troubleshooting
Troubleshoot control plane and other failures for Talos Linux clusters.
In this guide we assume that Talos is configured with default features enabled, such as Discovery Service and KubePrism.
If these features are disabled, some of the troubleshooting steps may not apply or may need to be adjusted.
This guide is structured so that it can be followed step-by-step; skip sections that are not relevant to your issue.
Network Configuration
As Talos Linux is an API-based operating system, it is important to have networking configured so that the API can be accessed.
Some information can be gathered from the Interactive Dashboard which is available on the machine console.
When running in the cloud, networking should be configured automatically.
When running on bare metal, more specific configuration may be needed; see the networking metal configuration guide.
Talos API
The Talos API runs on port 50000.
Control plane nodes should always serve the Talos API, while worker nodes require access to the control plane nodes to issue TLS certificates for the workers.
Firewall Issues
Make sure that the firewall is not blocking port 50000, and that communication on ports 50000/50001 inside the cluster is allowed.
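A quick way to tell a firewall problem from a service problem is to test TCP reachability of the Talos API ports. The Python sketch below uses only the standard library socket module; port_open is a hypothetical helper name. Port 50001 is expected to be reachable only from inside the cluster, so run the check from a cluster machine when testing that port.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out (often a firewall drop).
        return False


if __name__ == "__main__":
    # Usage: python3 check_ports.py <node IP> [<node IP> ...]
    for node in sys.argv[1:]:
        for port in (50000, 50001):
            state = "open" if port_open(node, port) else "closed or filtered"
            print(f"{node}:{port} {state}")
```

A refused connection usually means nothing is listening, while a timeout more often points at a firewall silently dropping packets.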
Client Configuration Issues
Make sure to use the correct talosconfig client configuration file for your cluster.
See getting started for more information.
The most common issue is that talosctl gen config writes talosconfig to the file in the current directory, while talosctl by default picks up the configuration from the default location (~/.talos/config).
The path to the configuration file can be specified with --talosconfig flag to talosctl.
Conflict on Kubernetes and Host Subnets
If talosctl returns an error saying that certificate IPs are empty, it might be due to a conflict between Kubernetes and host subnets.
The Talos API runs on the host network, but it automatically excludes the Kubernetes pod and service subnets from the usable set of addresses.
The default Talos machine configuration specifies the following Kubernetes pod and service IPv4 CIDRs: 10.244.0.0/16 (pods) and 10.96.0.0/12 (services).
If the host network is configured with one of these subnets, change the machine configuration to use a different subnet.
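Overlap between the host network and the default Kubernetes CIDRs can be checked ahead of time with Python's ipaddress module. This is a sketch: conflicting_cidrs is a hypothetical helper, and it only knows about the two default CIDRs quoted above, so adjust the constants if your machine configuration uses different pod or service subnets.

```python
import ipaddress

# Default Talos Kubernetes CIDRs quoted above.
DEFAULT_POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
DEFAULT_SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")


def conflicting_cidrs(host_cidr: str) -> list[str]:
    """Return the default Kubernetes CIDRs that overlap the given host network."""
    host = ipaddress.ip_network(host_cidr, strict=False)
    return [
        str(net)
        for net in (DEFAULT_POD_CIDR, DEFAULT_SERVICE_CIDR)
        # Only compare networks of the same IP version.
        if host.version == net.version and host.overlaps(net)
    ]


print(conflicting_cidrs("10.100.0.0/16"))  # → ['10.96.0.0/12']
```

An empty list means the host network does not collide with either default range.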
Wrong Endpoints
The talosctl CLI connects to the Talos API via the specified endpoints, which should be a list of control plane machine addresses.
The client will automatically retry on other endpoints if there are unavailable endpoints.
Worker nodes should not be used as endpoints, as they are not able to forward requests to other nodes.
The VIP should never be used as a Talos API endpoint.
TCP Loadbalancer
When using a TCP load balancer, make sure the load balancer endpoint is included in the .machine.certSANs list in the machine configuration.
System Requirements
If minimum system requirements are not met, this might manifest itself in various ways, such as random failures when starting services, or failures to pull images from the container registry.
Running Health Checks
Talos Linux provides a set of basic health checks with talosctl health command which can be used to check the health of the cluster.
In the default mode, talosctl health uses data from the discovery service to determine the cluster members.
This can be overridden with command line flags --control-plane-nodes and --worker-nodes.
Gathering Logs
While the logs and state of the system can be queried via the Talos API, it is often useful to gather the logs from all nodes in the cluster, and analyze them offline.
The talosctl support command can be used to gather logs and other information from the nodes specified with --nodes flag (multiple nodes are supported).
Discovery and Cluster Membership
Talos Linux uses Discovery Service to discover other nodes in the cluster.
The list of members on each machine should be consistent: talosctl -n <IP> get members.
Some Members are Missing
Ensure connectivity to the discovery service (default is discovery.talos.dev:443), and that the discovery registry is not disabled.
Duplicate Members
Don’t use the same base secrets to generate machine configuration for multiple clusters, as some secrets are used to identify members of the same cluster.
If the same machine configuration (or secrets) is used to repeatedly create and destroy clusters, the discovery service will see the same nodes as members of different clusters.
Removed Members are Still Present
Talos Linux removes itself from the discovery service when it is reset.
If the machine was not reset, it might show up as a member of the cluster for the maximum TTL of the discovery service (30 minutes), and after that it will be automatically removed.
etcd Issues
etcd is the distributed key-value store used by Kubernetes to store its state.
Talos Linux provides automation to manage etcd members running on control plane nodes.
If etcd is not healthy, the Kubernetes API server will not be able to function correctly.
It is always recommended to run an odd number of etcd members: with 3 or more members, etcd keeps working as long as a majority (quorum) of members is healthy, and an even member count raises the quorum without tolerating any additional failures.
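To see the numbers behind that recommendation: etcd needs a strict majority, floor(n/2) + 1, of its n members, so it tolerates n minus that many failures. The Python snippet below only tabulates this arithmetic; it does not talk to etcd, and the function names are illustrative.

```python
def etcd_quorum(members: int) -> int:
    """Number of healthy members etcd needs to keep operating (a strict majority)."""
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """Number of members that can fail without losing quorum."""
    return members - etcd_quorum(members)


for n in range(1, 6):
    print(f"{n} member(s): quorum {etcd_quorum(n)}, tolerates {etcd_fault_tolerance(n)} failure(s)")
```

The table shows that 3 and 4 members both tolerate a single failure, while 5 members tolerate two, which is why even member counts add risk without adding resilience.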
Common troubleshooting steps:
check etcd service state with talosctl -n IP service etcd for each control plane node
check etcd membership on each control plane node with talosctl -n IP etcd member list
check etcd logs with talosctl -n IP logs etcd
check etcd alarms with talosctl -n IP etcd alarm list
etcd will only run on control plane nodes.
If a node is designated as a worker node, you should not expect etcd to be running on it.
When a node boots for the first time, the etcd data directory (/var/lib/etcd) is empty, and it will only be populated when etcd is launched.
If the etcd service is crashing and restarting, check its logs with talosctl -n <IP> logs etcd.
The most common reasons for crashes are:
wrong arguments passed via extraArgs in the configuration;
booting Talos on a non-empty disk with an existing Talos installation, where /var/lib/etcd contains data from the old cluster.
kubelet and Kubernetes Node Issues
The kubelet service should be running on all Talos nodes, and it is responsible for running Kubernetes pods,
static pods (including control plane components), and registering the node with the Kubernetes API server.
If the kubelet doesn’t run on a control plane node, it will block the control plane components from starting.
The node will not be registered in Kubernetes until the Kubernetes API server is up and initial Kubernetes manifests are applied.
kubelet is not running
Check that kubelet image is available (talosctl image ls --namespace system).
Check kubelet logs with talosctl -n IP logs kubelet for startup errors:
make sure the Kubernetes version is supported by this Talos release
make sure the kubelet extra arguments and extra configuration supplied with the Talos machine configuration are valid
Talos Complains about Node Not Found
The kubelet hasn’t yet registered the node with the Kubernetes API server. This is expected during initial cluster bootstrap, and the error will go away.
If the message persists, check Kubernetes API health.
The Kubernetes controller manager (kube-controller-manager) is responsible for monitoring the certificate
signing requests (CSRs) and issuing certificates for each of them.
The kubelet is responsible for generating and submitting the CSRs for its
associated node.
The state of any CSRs can be checked with kubectl get csr:
$ kubectl get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                 CONDITION
csr-jcn9j   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-p6b9q   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-sw6rm   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-vlghg   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
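When a node is stuck, the interesting rows are the ones whose CONDITION is Pending. The Python helper below is a minimal sketch that picks those out of the plain-text table; it assumes CONDITION is the last column, as in the output above, and pending_csrs is a hypothetical name. For scripting, kubectl get csr -o json is more robust than parsing the table.

```python
def pending_csrs(kubectl_output: str) -> list[str]:
    """Return the names of CSRs whose CONDITION (last column) is Pending."""
    names = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1] == "Pending":
            names.append(fields[0])
    return names
```

Feed it the captured output of kubectl get csr; a CSR you have verified as legitimate can then be approved with kubectl certificate approve <name>.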
Talos Linux doesn’t manage the external IP; it is managed by the Kubernetes Cloud Controller Manager.
kubectl get nodes Reports Wrong Node Name
By default, the Kubernetes node name is derived from the hostname.
Update the hostname using the machine configuration, cloud configuration, or via DHCP server.
Node Is Not Ready
A Node in Kubernetes is marked as Ready only once its CNI is up.
It takes a minute or two for the CNI images to be pulled and for the CNI to start.
If the node is stuck in this state for too long, check CNI pods and logs with kubectl.
Usually, CNI-related resources are created in the kube-system namespace.
For example, for the default Talos Flannel CNI:
$ kubectl -n kube-system get pods
NAME                 READY   STATUS    RESTARTS   AGE
...
kube-flannel-25drx   1/1     Running   0          23m
kube-flannel-8lmb6   1/1     Running   0          23m
kube-flannel-gl7nx   1/1     Running   0          23m
kube-flannel-jknt9   1/1     Running   0          23m
...
Duplicate/Stale Nodes
Talos Linux doesn’t remove Kubernetes nodes automatically, so if a node is removed from the cluster, it will still be present in Kubernetes.
Remove the node from Kubernetes with kubectl delete node <node-name>.
Talos Complains about Certificate Errors on kubelet API
This error might appear during initial cluster bootstrap, and it will go away once the Kubernetes API server is up and the node is registered.
Example of the Talos logs:
[talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \"https://127.0.0.1:10250/pods/?timeout=30s\": remote error: tls: internal error"}
With the default configuration, the kubelet issues a self-signed server certificate, but when the rotate-server-certificates feature is enabled,
the kubelet obtains its certificate through kube-apiserver.
Make sure the kubelet CSR is approved by the Kubernetes API server.
In either case, this error is not critical, as it only affects reporting of the pod status to Talos Linux.
Kubernetes Control Plane
The Kubernetes control plane consists of the following components:
kube-apiserver - the Kubernetes API server
kube-controller-manager - the Kubernetes controller manager
kube-scheduler - the Kubernetes scheduler
Optionally, kube-proxy runs as a DaemonSet to provide pod-to-service communication.
coredns provides name resolution for the cluster.
CNI is not part of the control plane, but it is required for Kubernetes pods using pod networking.
Troubleshooting should always start with kube-apiserver, and then proceed to other components.
Talos Linux configures kube-apiserver to talk to the etcd running on the same node, so etcd must be healthy before kube-apiserver can start.
The kube-controller-manager and kube-scheduler are configured to talk to the kube-apiserver on the same node, so they will not start until kube-apiserver is healthy.
Control Plane Static Pods
Talos should generate the static pod definitions for the Kubernetes control plane as resources:
$ talosctl -n <IP> get staticpods
NODE         NAMESPACE   TYPE        ID                        VERSION
172.20.0.2   k8s         StaticPod   kube-apiserver            1
172.20.0.2   k8s         StaticPod   kube-controller-manager   1
172.20.0.2   k8s         StaticPod   kube-scheduler            1
Talos should report that the static pod definitions are rendered for the kubelet:
$ talosctl -n <IP> dmesg | grep 'rendered new'
172.20.0.2: user: warning: [2023-04-26T19:17:52.550527204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-apiserver"}
172.20.0.2: user: warning: [2023-04-26T19:17:52.552186204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-controller-manager"}
172.20.0.2: user: warning: [2023-04-26T19:17:52.554607204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-scheduler"}
If the static pod definitions are not rendered, check etcd and kubelet service health (see above)
and the controller runtime logs (talosctl logs controller-runtime).
Control Plane Pod Status
Initially the kube-apiserver component will not be running, and it takes some time before it becomes fully up
during bootstrap (the image must be pulled from the Internet, etc.).
The status of the control plane components on each of the control plane nodes can be checked with talosctl containers -k.
If a control plane component reports an error on startup, check that:
the Kubernetes version is supported by this Talos release
the extra arguments and extra configuration supplied with the Talos machine configuration are valid
Kubernetes Bootstrap Manifests
As part of the bootstrap process, Talos injects bootstrap manifests into the Kubernetes API server.
There are two kinds of these manifests: system manifests built into Talos, and extra manifests that are downloaded (custom CNI, extra manifests in the machine config).
Once the Kubernetes API server is up, issues with other control plane components can be troubleshot with kubectl:
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces
kubectl describe pod -n NAMESPACE POD
kubectl logs -n NAMESPACE POD
Kubernetes API
The Kubernetes API client configuration (kubeconfig) can be retrieved using Talos API with talosctl -n <IP> kubeconfig command.
Talos Linux mostly doesn’t depend on the Kubernetes API endpoint for the cluster, but the Kubernetes API endpoint should be configured
correctly for external access to the cluster.
Kubernetes Control Plane Endpoint
The Kubernetes control plane endpoint is the single canonical URL by which the
Kubernetes API is accessed.
Especially with high-availability (HA) control planes, this endpoint may point to a load balancer or a DNS name which may
have multiple A and AAAA records.
Like Talos’ own API, the Kubernetes API uses mutual TLS, client
certs, and a common Certificate Authority (CA).
Unlike general-purpose websites, there is no need for an upstream CA, so tools
such as cert-manager, services such as Let’s Encrypt, and commercially
validated TLS certificates are not required.
Encryption, however, is required, and hence the URL scheme will always be https://.
By default, the Kubernetes API server in Talos runs on port 6443.
As such, the control plane endpoint URLs for Talos will almost always be of the form
https://endpoint:6443.
(Because the port is not the HTTPS default of 443, it must be specified explicitly.)
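Those two rules, an https scheme and an explicit port, are easy to check mechanically before running talosctl gen config. A minimal Python sketch using urllib.parse; check_endpoint is a hypothetical helper, and it only validates the shape of the URL, not whether the endpoint actually reaches the control plane nodes.

```python
from urllib.parse import urlsplit


def check_endpoint(url: str) -> list[str]:
    """Return problems found in a Kubernetes control plane endpoint URL (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    if not parts.hostname:
        problems.append("missing host")
    try:
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("port must be explicit (usually 6443)")
    return problems


print(check_endpoint("https://endpoint.mydomain.com:6443"))  # → []
```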
The endpoint above may be a DNS name or IP address, but it should be
directed to the set of all control plane nodes, as opposed to a
single one.
As mentioned above, this can be achieved by a number of strategies, including:
BGP peering of a shared IP (such as with kube-vip)
Using a DNS name here is a good idea, since it is compatible with any of the other options while offering
a layer of abstraction.
It allows the underlying IP addresses to change without impacting the
canonical URL.
Unlike most services in Kubernetes, the API server runs with host networking,
meaning that it shares the network namespace with the host.
This means you can use the IP address(es) of the host to refer to the Kubernetes
API server.
For availability of the API, it is important that any load balancer be aware of
the health of the backend API servers, to minimize disruptions during
common node operations like reboots and upgrades.
Miscellaneous
Checking Controller Runtime Logs
Talos runs a set of controllers which operate on resources to build and support machine operations.
Some debugging information can be queried from the controller logs with talosctl logs controller-runtime:
talosctl -n <IP> logs controller-runtime
Controllers continuously run a reconcile loop, so at any time, they may be starting, failing, or restarting.
This is expected behavior.
If there are no new messages in the controller-runtime log, it means that the controllers have successfully finished reconciling, and that the current system state is the desired system state.
2 - Talos Linux Guides
Documentation on how to manage Talos Linux
2.1 - Installation
How to install Talos Linux on various platforms
2.1.1 - Bare Metal Platforms
Installation of Talos Linux on various bare-metal platforms.
2.1.1.1 - Equinix Metal
Creating Talos clusters with Equinix Metal.
You can create a Talos Linux cluster on Equinix Metal in a variety of ways, such as through the EM web UI, or the metal command line tool.
Regardless of the method, the process is:
Create a DNS entry for your Kubernetes endpoint.
Generate the configurations using talosctl.
Provision your machines on Equinix Metal.
Push the configurations to your servers (if not done as part of the machine provisioning).
Configure your Kubernetes endpoint to point to the newly created control plane nodes.
Bootstrap the cluster.
Define the Kubernetes Endpoint
There are a variety of ways to create an HA endpoint for the Kubernetes cluster.
Some of the ways are:
DNS
Load Balancer
BGP
Whatever way is chosen, it should result in an IP address/DNS name that routes traffic to all the control plane nodes.
We do not know the control plane node IP addresses at this stage, but we should define the endpoint DNS entry so that we can use it in creating the cluster configuration.
After the nodes are provisioned, we can use their addresses to create the endpoint A records, or bind them to the load balancer, etc.
Create the Machine Configuration Files
Generating Configurations
Using the DNS name of the load balancer defined above, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-em-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
The port used above should be 6443, unless your load balancer maps a different port to port 6443 on the control plane nodes.
Validate the Configuration Files
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config worker.yaml --mode metal
Note: Validation of the install disk could potentially fail as validation
is performed on your local machine and the specified disk may not exist.
Passing in the configuration as User Data
You can use the metadata service provided by Equinix Metal to pass in the machine configuration.
A shebang must be added to the top of the configuration file.
The convention we use is #!talos.
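Prepending the shebang is easy to forget when the file is regenerated. A small Python sketch that adds the #!talos line only when it is missing; as_user_data is a hypothetical helper, and the two-argument command-line usage and output file name are illustrative.

```python
import sys
from pathlib import Path

SHEBANG = "#!talos"


def as_user_data(config_text: str) -> str:
    """Return the config with a leading #!talos line, adding it only if missing."""
    if config_text.startswith(SHEBANG + "\n"):
        return config_text
    return f"{SHEBANG}\n{config_text}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 add_shebang.py controlplane.yaml controlplane-userdata.yaml
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_text(as_user_data(src.read_text()))
```

Because the helper is idempotent, running it twice on the same file does not stack duplicate shebang lines.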
Provision the machines in Equinix Metal
Talos Linux can be PXE-booted on Equinix Metal using Image Factory with the equinixMetal platform, e.g.
https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.9.0/equinixMetal-amd64 (this URL references the default schematic and amd64 architecture).
Follow the Image Factory guide to create a custom schematic, e.g. with CPU microcode updates.
The PXE boot URL can be used as the iPXE script URL.
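The PXE URL above is built from a schematic ID, a Talos version, the platform name, and the architecture. The Python sketch below assembles URLs with that same shape; factory_pxe_url is a hypothetical helper, and the pattern is inferred from the single example above, so check the Image Factory guide before relying on it for other platforms or custom schematics.

```python
DEFAULT_SCHEMATIC = "376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba"


def factory_pxe_url(schematic: str, version: str,
                    platform: str = "equinixMetal", arch: str = "amd64") -> str:
    """Build a PXE URL with the same shape as the Image Factory example above."""
    if arch not in ("amd64", "arm64"):
        raise ValueError(f"unexpected architecture: {arch}")
    return f"https://pxe.factory.talos.dev/pxe/{schematic}/{version}/{platform}-{arch}"


print(factory_pxe_url(DEFAULT_SCHEMATIC, "v1.9.0"))
```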
Using the Equinix Metal UI
Simply select the location and type of machines in the Equinix Metal web interface.
Select ‘Custom iPXE’ as the Operating System and enter the Image Factory PXE URL as the iPXE script URL, then select the number of servers to create, and name them (lowercase only).
Under optional settings, you can paste in the contents of the controlplane.yaml generated above (ensuring you add a first line of #!talos).
You can repeat this process to create machines of different types for control plane and worker nodes (although you would pass in worker.yaml for the worker nodes, as user data).
If you did not pass in the machine configuration as User Data, you need to provide it to each machine with the following command:
talosctl apply-config --insecure --nodes <Node IP> --file ./controlplane.yaml
Machines can also be created with the metal command line tool, e.g.:
metal device create -p <projectID> -f da11 -O custom_ipxe -P c3.small.x86 -H steve.test.11 --userdata-file ./controlplane.yaml --ipxe-script-url "https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.9.0/equinixMetal-amd64"
Repeat this to create each control plane node desired: there should usually be 3 for a HA cluster.
Update the Kubernetes endpoint
Now that our control plane nodes have been created and we know their IP addresses, we can associate them with the Kubernetes endpoint.
Configure your load balancer to route traffic to these nodes, or add A records to your DNS entry for the endpoint, for each control plane node.
e.g.
host endpoint.mydomain.com
endpoint.mydomain.com has address 145.40.90.201
endpoint.mydomain.com has address 147.75.109.71
endpoint.mydomain.com has address 145.40.90.177
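Before moving on, it can help to confirm that the endpoint name resolves to every control plane node. A Python sketch using socket.getaddrinfo (IPv4 only); missing_endpoints is a hypothetical helper, and the resolver parameter exists so the check can be exercised without real DNS. It does not apply to load balancer setups, where the name resolves to the load balancer rather than to the nodes.

```python
from __future__ import annotations

import socket
from typing import Callable, Iterable


def resolve_ipv4(name: str) -> set[str]:
    """Return the IPv4 addresses the system resolver reports for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def missing_endpoints(name: str, control_plane_ips: Iterable[str],
                      resolver: Callable[[str], set[str]] = resolve_ipv4) -> set[str]:
    """Return the control plane IPs that name does not resolve to (empty set if none)."""
    return set(control_plane_ips) - resolver(name)
```

For the records above, missing_endpoints("endpoint.mydomain.com", ["145.40.90.201", "147.75.109.71", "145.40.90.177"]) should return an empty set once DNS has propagated.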
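Bootstrap etcd
Bootstrap etcd on the first control plane node:
talosctl --talosconfig talosconfig bootstrap --nodes <control plane 1 IP> --endpoints <control plane 1 IP>
This only needs to be issued to one control plane node.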
Retrieve the kubeconfig
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
2.1.1.2 - ISO
Booting Talos on bare-metal with ISO.
Talos can be installed on a bare-metal machine using an ISO image.
ISO images for amd64 and arm64 architectures are available on the Talos releases page.
Talos doesn’t install itself to disk when booted from an ISO until the machine configuration is applied.
Please follow the getting started guide for the generic steps on how to install Talos.
Note: If there is already a Talos installation on the disk, the machine will boot into that installation when booting from a Talos ISO.
The boot order should prefer disk over ISO, or the ISO should be removed after the installation to make Talos boot from disk.
metal-<arch>.iso supports booting on BIOS and UEFI systems on x86, and on UEFI systems only on arm64
metal-<arch>-secureboot.iso supports booting only on UEFI systems in SecureBoot mode (via Image Factory)
2.1.1.3 - Matchbox
In this guide we will create an HA Kubernetes cluster with 3 worker nodes using an existing load balancer and Matchbox deployment.
Creating a Cluster
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing load balancer, Matchbox deployment, and some familiarity with iPXE.
We leave it up to the user to decide whether to use static networking or DHCP.
The setup and configuration of DHCP will not be covered.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Optionally, you can specify --config-patch with an RFC 6902 JSON patch, which will be applied during config generation.
Validate the Configuration Files
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config worker.yaml --mode metal
worker.yaml is valid for metal mode
Publishing the Machine Configuration Files
In bare-metal setups it is up to the user to provide the configuration files over HTTP(S).
A special kernel parameter (talos.config) must be used to inform Talos about where it should retrieve its configuration file.
To keep things simple, we will place controlplane.yaml and worker.yaml into Matchbox’s assets directory.
This directory is automatically served by Matchbox.
Create the Matchbox Configuration Files
The profiles we will create will reference vmlinuz, and initramfs.xz.
Download these files from the release of your choice, and place them in /var/lib/matchbox/assets.
Now that we have our configuration files in place, boot all the machines.
Talos will come up on each machine, grab its configuration file, and bootstrap itself.
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
2.1.1.4 - Network Configuration
In this guide we will describe how networking can be configured on bare-metal platforms.
By default, Talos runs a DHCP client on all interfaces that have a link, which may be enough in most cases.
If some advanced network configuration is required, it can be done via the machine configuration file.
But sometimes it is required to apply network configuration even before the machine configuration can be fetched from the network.
Kernel Command Line
Talos supports some kernel command line parameters to configure the network before the machine configuration is fetched.
Note: Kernel command line parameters are not persisted after Talos installation, so proper network configuration should be done via the machine configuration.
Address, default gateway, and DNS servers can be configured via the ip= kernel command line parameter.
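The ip= parameter uses colon-separated positional fields. Assuming Talos follows the standard Linux kernel field order (client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf:dns0-ip:dns1-ip:ntp0-ip), a value can be assembled as in this Python sketch. ip_kernel_arg is a hypothetical helper, it handles IPv4 only, and which fields Talos honors should be confirmed in the kernel parameters reference.

```python
from __future__ import annotations

import ipaddress


def ip_kernel_arg(address_cidr: str, gateway: str = "", hostname: str = "",
                  device: str = "", dns: tuple[str, ...] = ()) -> str:
    """Build an ip= argument in the Linux kernel field order (IPv4 only)."""
    iface = ipaddress.IPv4Interface(address_cidr)  # e.g. "10.0.0.5/24"
    fields = [
        str(iface.ip),                   # client-ip
        "",                              # server-ip (unused here)
        gateway,                         # gw-ip
        str(iface.netmask),              # netmask derived from the prefix length
        hostname,                        # hostname
        device,                          # device, e.g. eth0
        "off",                           # autoconf: static, no DHCP/BOOTP
        dns[0] if len(dns) > 0 else "",  # dns0-ip
        dns[1] if len(dns) > 1 else "",  # dns1-ip
        "",                              # ntp0-ip
    ]
    # Trailing empty fields can be omitted.
    return "ip=" + ":".join(fields).rstrip(":")


print(ip_kernel_arg("10.0.0.5/24", gateway="10.0.0.1", hostname="node1",
                    device="eth0", dns=("1.1.1.1",)))
# → ip=10.0.0.5::10.0.0.1:255.255.255.0:node1:eth0:off:1.1.1.1
```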
Some platforms (e.g. AWS, Google Cloud, etc.) have their own network configuration mechanisms, which can be used to perform the initial network configuration.
There is no such mechanism for bare-metal platforms, so Talos provides a way to use platform network config on the metal platform to submit the initial network configuration.
The platform network configuration is a YAML document which contains resource specifications for various network resources.
For the metal platform, the interactive dashboard can be used to edit the platform network configuration; the configuration can also be
created manually.
The current value of the platform network configuration can be retrieved using the MetaKeys resource (key 0x0a):
talosctl get meta 0x0a
The platform network configuration can be updated using the talosctl meta command for the running node:
talosctl meta write 0x0a '{"externalIPs": ["1.2.3.4"]}'
talosctl meta delete 0x0a
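The JSON value must reach talosctl as a single shell argument, which is where quoting mistakes tend to happen. The Python sketch below builds the talosctl meta write 0x0a command shown above with shlex.join; meta_write_command is a hypothetical helper, and it only sets externalIPs, the one field shown in the example above.

```python
from __future__ import annotations

import ipaddress
import json
import shlex


def meta_write_command(external_ips: list[str], node: str | None = None) -> str:
    """Build a shell-safe talosctl meta write 0x0a command for the given external IPs."""
    # Validate and normalize the addresses before they are written with talosctl meta.
    ips = [str(ipaddress.ip_address(ip)) for ip in external_ips]
    value = json.dumps({"externalIPs": ips})
    args = ["talosctl"]
    if node:
        args += ["-n", node]
    args += ["meta", "write", "0x0a", value]
    return shlex.join(args)


print(meta_write_command(["1.2.3.4"]))
# → talosctl meta write 0x0a '{"externalIPs": ["1.2.3.4"]}'
```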
The initial platform network configuration for the metal platform can also be included in the generated Talos image:
docker run --rm -i ghcr.io/siderolabs/imager:v1.9.0 iso --arch amd64 --tar-to-stdout --meta 0x0a='{...}' | tar xz
docker run --rm -i --privileged ghcr.io/siderolabs/imager:v1.9.0 image --platform metal --arch amd64 --tar-to-stdout --meta 0x0a='{...}' | tar xz
The platform network configuration gets merged with other sources of network configuration; the details can be found in the network resources guide.
nocloud Network Configuration
Some bare-metal providers provide a way to configure network via the nocloud data source.
Talos Linux can automatically pick up this configuration when nocloud image is used.
2.1.1.5 - PXE
Booting Talos over the network on bare-metal with PXE.
Talos can be installed on bare metal using a PXE service.
There are more detailed guides for PXE booting using Matchbox.
This guide describes generic steps for PXE booting Talos on bare-metal.
First, download the vmlinuz and initramfs assets from the Talos releases page.
Set up the machines to PXE boot from the network (usually by setting the boot order in the BIOS).
There might be options specific to the hardware being used, booting in BIOS or UEFI mode, using iPXE, etc.
Talos requires the following kernel parameters to be set on the initial boot:
talos.platform=metal
slab_nomerge
pti=on
When booted from the network without machine configuration, Talos will start in maintenance mode.
Please follow the getting started guide for the generic steps on how to install Talos.
Note: If there is already a Talos installation on the disk, the machine will boot into that installation when booting from network.
The boot order should prefer disk over network.
Talos can automatically fetch the machine configuration from the network on the initial boot using talos.config kernel parameter.
A metadata service (HTTP service) can be implemented to deliver customized configuration to each node, for example by using the MAC address of the node.
Note: The talos.config kernel parameter supports other substitution variables, see kernel parameters reference for the full list.
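As an illustration, assuming a kernel argument such as talos.config=http://<metadata-server>:8080/config?mac=${mac} (the server name, port, and path are hypothetical), a minimal Python metadata service could map each MAC address to a pre-rendered configuration file. config_name_for, ConfigHandler, CONFIG_DIR, and serve are invented names. The MAC is validated before it becomes part of a file name, which also rejects path traversal attempts.

```python
from __future__ import annotations

import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

CONFIG_DIR = Path("configs")  # hypothetical layout: configs/aa-bb-cc-dd-ee-ff.yaml per node
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}")


def config_name_for(request_path: str) -> str | None:
    """Map /config?mac=AA:BB:... to a file name, or None if the MAC is missing or malformed."""
    mac = parse_qs(urlsplit(request_path).query).get("mac", [""])[0].lower()
    if not MAC_RE.fullmatch(mac):
        return None
    return mac.replace(":", "-") + ".yaml"


class ConfigHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        name = config_name_for(self.path)
        path = CONFIG_DIR / name if name else None
        if path is None or not path.is_file():
            self.send_error(404, "no machine configuration for this node")
            return
        body = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/yaml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve machine configurations until interrupted."""
    HTTPServer(("0.0.0.0", port), ConfigHandler).serve_forever()
```

Calling serve() starts the listener; in practice you would put it behind TLS or restrict it to the provisioning network, since machine configurations contain cluster secrets.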
PXE booting can also be performed via Image Factory.
2.1.1.6 - SecureBoot
Booting Talos in SecureBoot mode on UEFI platforms.
Talos now supports booting on UEFI systems in SecureBoot mode.
When combined with TPM-based disk encryption, this provides a Trusted Boot experience.
Note: SecureBoot is not supported on x86 platforms in BIOS mode.
The implementation uses systemd-boot as the boot menu, while the
Talos kernel, initramfs and cmdline arguments are combined into the Unified Kernel Image (UKI) format.
UEFI firmware loads the systemd-boot bootloader, which then loads the UKI image.
Both systemd-boot and the Talos UKI image are signed with a key that is enrolled into the UEFI firmware.
As Talos Linux is fully contained in the UKI image, the full operating system is verified and booted by the UEFI firmware.
Note: There is no support at the moment to upgrade non-UKI (GRUB-based) Talos installation to use UKI/SecureBoot, so a fresh installation is required.
Note: The SecureBoot images are available for Talos releases starting from v1.5.0.
The easiest way to get started with SecureBoot is to download the ISO, and
boot it on a UEFI-enabled system which has SecureBoot enabled in setup mode.
The ISO bootloader will enroll the keys in the UEFI firmware and boot Talos Linux in SecureBoot mode.
The installation should be performed using the SecureBoot installer image (set it in the Talos machine configuration): factory.talos.dev/installer-secureboot/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba:v1.9.0.
Note: SecureBoot images can also be generated with custom keys.
Booting Talos Linux in SecureBoot Mode
In this guide we will use the ISO image to boot Talos Linux in SecureBoot mode, followed by submitting machine configuration to the machine in maintenance mode.
We will use one of the ways to generate and submit machine configuration to the node; please refer to the Production Notes for the full guide.
First, make sure SecureBoot is enabled in the UEFI firmware.
For the first boot, the UEFI firmware should be in the setup mode, so that the keys can be enrolled into the UEFI firmware automatically.
If the UEFI firmware does not support automatic enrollment, you may need to hit Esc to force the boot menu to appear, and select the Enroll Secure Boot keys: auto option.
Note: There are other ways to enroll the keys into the UEFI firmware, but this is out of scope of this guide.
Once Talos is running in maintenance mode, verify that secure boot is enabled:
$ talosctl -n <IP> get securitystate --insecure
NODE   NAMESPACE   TYPE            ID              VERSION   SECUREBOOT
       runtime     SecurityState   securitystate   1         true
Now we will generate the machine configuration for the node, supplying the installer-secureboot container image and applying the patch to enable TPM-based disk encryption (requires TPM 2.0).
Talos will perform the installation to the disk and reboot the node.
Please make sure that the ISO image is not attached to the node anymore, otherwise the node will boot from the ISO image again.
Once the node is rebooted, verify that the node is running in secure boot mode:
talosctl -n <IP> --talosconfig=talosconfig get securitystate
Upgrading Talos Linux
Any change to the boot asset (kernel, initramfs, kernel command line) requires the UKI to be regenerated and the installer image to be rebuilt.
Follow the steps above to generate a new installer image with updated boot assets: use a new Talos version, add a system extension, or modify the kernel command line.
Once the new installer image is pushed to the registry, upgrade the node using the new installer image.
It is important to preserve the UKI signing key and the PCR signing key, otherwise the node will not be able to boot with the new UKI and unlock the encrypted partitions.
Disk Encryption with TPM
When encrypting the disk partition for the first time, Talos Linux generates a random disk encryption key and seals (encrypts) it with the TPM device.
The TPM unlock policy is configured to trust the expected policy signed by the PCR signing key.
This way TPM unlocking doesn’t depend on the exact PCR measurements, but rather on the expected policy signed by the PCR signing key and the state of SecureBoot (PCR 7 measurement, including secureboot status and the list of enrolled keys).
When the UKI image is generated, the UKI is measured and expected measurements are combined into TPM unlock policy and signed with the PCR signing key.
During the boot process, the systemd-stub component of the UKI measures the UKI sections into the TPM device.
Talos Linux then extends the PCR register with measurements of each boot phase. Once boot reaches the point of mounting the encrypted disk partition,
the signed policy from the UKI is matched against the measured values to unlock the TPM, and the TPM unseals the disk encryption key, which is then used to unlock the disk partition.
During an upgrade, as long as the new UKI contains a PCR policy signed with the same PCR signing key and the SecureBoot state has not changed, the disk partition will be unlocked successfully.
Disk encryption is also tied to the state of PCR register 7, so that it unlocks only if SecureBoot is enabled and the set of enrolled keys hasn’t changed.
Other Boot Options
A Unified Kernel Image (UKI) is a UEFI-bootable image that can be booted directly from the UEFI firmware, bypassing the systemd-boot bootloader.
In network boot mode, the UKI can be used directly as well, as it contains the full set of boot assets required to boot Talos Linux.
When SecureBoot is enabled, the UKI ignores any kernel command line arguments passed to it and instead uses the kernel command line arguments embedded in the UKI itself.
If kernel command line arguments need to be changed, the UKI image needs to be rebuilt with the new kernel command line arguments.
SecureBoot with Custom Keys
Generating the Keys
Talos requires two sets of keys for the SecureBoot process:
The SecureBoot key is used to sign the boot assets, and it is enrolled into the UEFI firmware.
The PCR signing key is used to sign the TPM policy, which is used to seal the disk encryption key.
The same key might be used for both, but it is recommended to use separate keys for each purpose.
Talos provides a utility to generate the keys, but existing PKI infrastructure can be used as well:
The generated certificate and private key are written to disk in PEM-encoded format (RSA 4096-bit key).
The certificate is also written in DER format for systems that expect it in that format.
PCR signing key can be generated with:
$ talosctl gen secureboot pcr
writing _out/pcr-signing-key.pem
The file containing the private key is written to disk in PEM-encoded format (RSA 2048-bit key).
Optionally, UEFI automatic key enrollment database can be generated using the _out/uki-signing-* files as input:
These files can be used to enroll the keys into the UEFI firmware automatically when booting from a SecureBoot ISO while the UEFI firmware is in setup mode.
Generating the SecureBoot Assets
Once the keys are generated, they can be used to sign the Talos boot assets to generate required ISO images, PXE boot assets, disk images, installer containers, etc.
In this guide we will generate a SecureBoot ISO image and an installer image.
The generated ISO and installer images might be further customized with system extensions, extra kernel command line arguments, etc.
2.1.2 - Virtualized Platforms
Installation of Talos Linux for virtualization platforms.
2.1.2.1 - Hyper-V
Creating a Talos Kubernetes cluster using Hyper-V.
Prerequisites
Download the latest metal-amd64.iso from the GitHub releases page.
Create a New-TalosVM folder in any of your PowerShell module path folders (list them with $env:PSModulePath -split ';') and save New-TalosVM.psm1 there.
Plan Overview
Here we will create a basic 3 node cluster with a single control-plane node and two worker nodes.
The only differences between the control plane and worker nodes are the amount of RAM and an additional storage VHD on the workers.
This is personal preference and can be configured to your liking.
We use the VMNamePrefix argument to supply a VM name prefix, not the full hostname.
The command finds any existing VMs with that prefix and increments the highest suffix it finds.
For example, if VMs talos-cp01 and talos-cp02 exist, this will create VMs starting from talos-cp03, depending on the NumberOfVMs argument.
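The naming rule can be illustrated with a short Bash sketch. This is not the New-TalosVM PowerShell code, only the same logic; the function name `next_vm_names` and the two-digit zero padding (inferred from the talos-cp01 example) are assumptions.

```bash
# next_vm_names (hypothetical): print the next <count> VM names for a prefix,
# continuing after the highest numeric suffix among the existing VM names.
# Usage: next_vm_names <prefix> <count> [existing-vm-name...]
next_vm_names() {
  local prefix="$1" count="$2"
  shift 2
  local highest=0 name suffix i
  for name in "$@"; do
    suffix="${name#"$prefix"}"
    # Only names of the form <prefix><digits> count toward the highest suffix.
    if [[ "$name" == "$prefix"* && "$suffix" =~ ^[0-9]+$ ]]; then
      # 10# forces base 10 so that suffixes such as "08" are not read as octal.
      (( 10#$suffix > highest )) && highest=$(( 10#$suffix ))
    fi
  done
  for (( i = 1; i <= count; i++ )); do
    printf '%s%02d\n' "$prefix" $(( highest + i ))
  done
}
```

With talos-cp01 and talos-cp02 present, `next_vm_names talos-cp 2 talos-cp01 talos-cp02` prints talos-cp03 and talos-cp04.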
Setup a Control Plane Node
Use the following command to create a single control plane node:
This will create two VMs, talos-worker01 and talos-worker02, and attach an additional 50GB VHD to each for storage (which in my case will be passed to Mayastor).
Pushing Config to the Nodes
Now that our VMs are ready, find their IP addresses from each VM's console.
With that information, push config to the control plane node with:
# Set the control plane IP variable
$CONTROL_PLANE_IP='10.10.10.x'
# Generate the Talos config
talosctl gen config talos-cluster https://$($CONTROL_PLANE_IP):6443 --output-dir .
# Apply the config to the control plane node
talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file .\controlplane.yaml
Now that our nodes are ready, we can bootstrap the Kubernetes cluster.
# Point talosctl at the talosconfig generated above
$env:TALOSCONFIG = "$PWD\talosconfig"
# Set the endpoint and node permanently in the config so you don't have to type them every time
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP
# Bootstrap the cluster
talosctl bootstrap
# Generate the kubeconfig
talosctl kubeconfig .
This generates a kubeconfig file that you can use to connect to the cluster.
2.1.2.2 - KVM
Talos is known to work on KVM.
We don’t yet have a documented guide specific to KVM; however, you can have a look at our Vagrant & Libvirt guide, which uses KVM for virtualization.
If you run into any issues, our community can probably help!
Scroll down and select your Talos version (v1.9.0, for example).
Then tick the box for siderolabs/qemu-guest-agent and submit.
This will provide you with a link to the bare metal ISO.
The lines we’re interested in are as follows:
Metal ISO
amd64 ISO
https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.9.0/metal-amd64.iso
arm64 ISO
https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.9.0/metal-arm64.iso
Installer Image
For the initial Talos install or upgrade use the following installer image:
factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.9.0
Download the above ISO (this will most likely be amd64 for you).
Take note of the factory.talos.dev/installer URL, as you’ll need it later.
Upload ISO
From the Proxmox UI, select the “local” storage and enter the “Content” section.
Click the “Upload” button:
Select the ISO you downloaded previously, then hit “Upload”.
Create VMs
Before starting, familiarise yourself with the system requirements for Talos and assign VM resources accordingly.
Create a new VM by clicking the “Create VM” button in the Proxmox UI:
Fill out a name for the new VM:
In the OS tab, select the ISO we uploaded earlier:
Keep the defaults set in the “System” tab.
Keep the defaults in the “Hard Disk” tab as well, only changing the size if desired.
In the “CPU” section, give at least 2 cores to the VM:
Note: Since Talos v1.0 requires the x86-64-v2 microarchitecture, booting with the default Processor Type kvm64 will not work on Proxmox versions prior to 8.0.
You can enable the required CPU features after creating the VM by
adding the following line in the corresponding /etc/pve/qemu-server/<vmid>.conf file:
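A sketch of that edit as a small Bash helper to run on the Proxmox host. It assumes the QEMU flags below, which add the x86-64-v2 features (CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSSE3, SSE4.1, SSE4.2) on top of kvm64; the helper name is hypothetical, and it leaves any existing `args:` line untouched rather than merging with it.

```bash
# add_x86_64_v2_cpu_args (hypothetical helper): append a QEMU "args" line that
# exposes the x86-64-v2 CPU features on top of the kvm64 processor type.
# Usage: add_x86_64_v2_cpu_args /etc/pve/qemu-server/<vmid>.conf
add_x86_64_v2_cpu_args() {
  local conf="$1"
  local line='args: -cpu kvm64,+cx16,+lahf_lm,+popcnt,+sse3,+ssse3,+sse4.1,+sse4.2'
  # Do nothing if an args line already exists: safe to re-run, and custom
  # arguments are never overwritten.
  if grep -q '^args:' "$conf"; then
    echo "args line already present in $conf; leaving it unchanged" >&2
    return 0
  fi
  printf '%s\n' "$line" >> "$conf"
}
```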
Alternatively, you can set the Processor Type to host if your Proxmox host supports these CPU features;
however, this prevents live VM migration.
Verify that the RAM is set to at least 2GB:
Keep the default values for networking, verifying that the VM is set to come up on the bridge interface:
Finish creating the VM by clicking through the “Confirm” tab and then “Finish”.
Repeat this process for a second VM to use as a worker node.
You can also repeat this for additional nodes desired.
Note: Talos doesn’t support memory hot plugging; if you create the VM programmatically, don’t enable memory hotplug on your Talos VMs.
Doing so will leave Talos unable to see all available memory, with insufficient memory to complete installation of the cluster.
Start Control Plane Node
Once the VMs have been created and updated, start the VM that will be the first control plane node.
This VM will boot the ISO image specified earlier and enter “maintenance mode”.
With DHCP server
Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received.
Take note of this IP address, which will be referred to as $CONTROL_PLANE_IP for the rest of this guide.
If you wish to export this IP as a bash variable, simply issue a command like export CONTROL_PLANE_IP=1.2.3.4.
Without DHCP server
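To apply the machine configuration in maintenance mode, the VM must have an IP address on the network.
Without a DHCP server, you can set it manually at boot time.
Press e at the boot menu and add the IP parameters for the VM to the kernel command line.
The format is:
ip=<client-ip>:<srv-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>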
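If you set static addresses often, a tiny Bash helper can assemble the argument. This sketch leaves the server-IP field empty, sets autoconf to off, and optionally appends one DNS server; the function name is hypothetical, and the device name (for example eth0 or enp0s18) depends on how your VM names its interfaces.

```bash
# build_ip_arg (hypothetical helper): assemble a static ip= kernel argument.
# Usage: build_ip_arg <client-ip> <gateway> <netmask> [hostname] [device] [dns0]
build_ip_arg() {
  local client="$1" gw="$2" mask="$3" host="${4:-}" dev="${5:-}" dns0="${6:-}"
  # Empty server-ip field (the "::"), autoconf "off" for a static address.
  local arg="ip=${client}::${gw}:${mask}:${host}:${dev}:off"
  # The optional DNS server goes in the field right after autoconf.
  [[ -n "$dns0" ]] && arg+=":${dns0}"
  printf '%s\n' "$arg"
}
```

For example, `build_ip_arg 192.168.0.100 192.168.0.1 255.255.255.0 "" eth0` prints `ip=192.168.0.100::192.168.0.1:255.255.255.0::eth0:off`.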
With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes.
Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:
talosctl gen config talos-proxmox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out
This will create several files in the _out directory: controlplane.yaml, worker.yaml, and talosconfig.
Note: By default, the Talos config installs to /dev/sda.
Depending on your setup, the virtual disk may appear under a different name, e.g. /dev/vda.
You can list the available disks by running the following command:
talosctl get disks --insecure --nodes $CONTROL_PLANE_IP
Update controlplane.yaml and worker.yaml config files to point to the correct disk location.
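One way to make that change is a `sed` substitution over both files, sketched below. It assumes the install disk appears as an uncommented `disk: /dev/...` line (under `machine.install` in the generated config) and keeps a `.bak` copy of each file; the helper name is hypothetical. Passing `--install-disk /dev/vda` to `talosctl gen config` sets the disk at generation time and avoids editing YAML by hand.

```bash
# set_install_disk (hypothetical helper): rewrite the install disk in one or
# more generated machine configs, e.g. from /dev/sda to /dev/vda.
# Usage: set_install_disk <new-disk> <config.yaml>...
set_install_disk() {
  local disk="$1" f
  shift
  for f in "$@"; do
    # Only matches uncommented "disk: /dev/..." lines; -i.bak works with both
    # GNU and BSD sed and leaves a backup next to each file.
    sed -i.bak -E "s|^([[:space:]]*disk:[[:space:]]*)/dev/[^[:space:]]+|\1${disk}|" "$f"
  done
}
```

For example: `set_install_disk /dev/vda _out/controlplane.yaml _out/worker.yaml`.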
QEMU guest agent support
For QEMU guest agent support, you can generate the config with the custom install image:
talosctl gen config talos-proxmox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out --install-image factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.9.0
In Proxmox, go to your VM -> Options and ensure that QEMU Guest Agent is enabled.
The QEMU agent is now configured.
Create Control Plane Node
Using the controlplane.yaml generated above, you can now apply this config using talosctl.
Issue:
talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file _out/controlplane.yaml
You should now see some action in the Proxmox console for this VM.
Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.
Note: This process can be repeated multiple times to create an HA control plane.
Create Worker Node
Create at least a single worker node using a process similar to the control plane creation above.
Start the worker node VM and wait for it to enter “maintenance mode”.
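Take note of the worker node’s IP address, which will be referred to as $WORKER_IP, and apply the worker configuration:
talosctl apply-config --insecure --nodes $WORKER_IP --file _out/worker.yaml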
Note: This process can be repeated multiple times to add additional workers.
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:
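A minimal sketch of these steps, wrapped in a hypothetical Bash function `bootstrap_cluster`. It assumes the talosconfig generated earlier at `_out/talosconfig`, sets the endpoint and node, bootstraps etcd once on the first control plane node, and writes a kubeconfig to the current directory.

```bash
# bootstrap_cluster (hypothetical helper): configure talosctl for the first
# control plane node, bootstrap etcd, and fetch an admin kubeconfig.
# Usage: bootstrap_cluster <control-plane-ip> [talosconfig-path]
bootstrap_cluster() {
  local ip="$1"
  # Exported on purpose so later talosctl commands in this shell use it too.
  export TALOSCONFIG="${2:-_out/talosconfig}"
  talosctl config endpoint "$ip" &&
    talosctl config node "$ip" &&
    # Bootstrap exactly once, against a single control plane node.
    talosctl bootstrap &&
    # Writes ./kubeconfig for use with kubectl.
    talosctl kubeconfig .
}
```

For example: `bootstrap_cluster "$CONTROL_PLANE_IP"`.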
We will use Vagrant and its libvirt plugin to create a KVM-based cluster with 3 control plane nodes and 1 worker node.
For this, we will mount the Talos ISO into the VMs using a virtual CD-ROM,
and configure the VMs to attempt to boot from the disk first, falling back to the CD-ROM.
We will also configure a virtual IP address on Talos to achieve high availability for kube-apiserver.
Preparing the environment
First, we download the latest metal-amd64.iso ISO from GitHub releases into the /tmp directory.
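The download can be sketched as follows. The helper name is hypothetical; the URLs follow GitHub's release-asset layout (`releases/download/<tag>/<asset>`, or `releases/latest/download/<asset>` for the newest release).

```bash
# download_talos_iso (hypothetical helper): fetch metal-amd64.iso from the
# siderolabs/talos GitHub releases into /tmp (or another destination).
# Usage: download_talos_iso [version] [destination]
download_talos_iso() {
  local version="${1:-}" dest="${2:-/tmp/metal-amd64.iso}" url
  if [[ -n "$version" ]]; then
    url="https://github.com/siderolabs/talos/releases/download/${version}/metal-amd64.iso"
  else
    # GitHub redirects latest/download to the newest release's asset.
    url="https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso"
  fi
  # -f: fail on HTTP errors; -L: follow GitHub's redirects to the asset.
  curl -fL -o "$dest" "$url"
}
```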
Current machine states:
control-plane-node-1 not created (libvirt)
control-plane-node-2 not created (libvirt)
control-plane-node-3 not created (libvirt)
worker-node-1 not created (libvirt)
Congratulations, you have a highly-available Talos cluster running!
Cleanup
You can destroy the vagrant environment by running:
vagrant destroy -f
And remove the ISO image you downloaded:
sudo rm -f /tmp/metal-amd64.iso
2.1.2.6 - VMware
Creating Talos Kubernetes cluster using VMware.
Creating a Cluster via the govc CLI
In this guide we will create an HA Kubernetes cluster with 2 worker nodes.
We will use the govc CLI, which can be downloaded here.
Prereqs/Assumptions
This guide will use the virtual IP (“VIP”) functionality that is built into Talos in order to provide a stable, known IP for the Kubernetes control plane.
This simply means the user should pick an IP on their “VM Network” to designate for this purpose and keep it handy for future steps.
Create the Machine Configuration Files
Generating Base Configurations
Using the VIP chosen in the prereq steps, we will now generate the base configuration files for the Talos machines.
This can be done with the talosctl gen config ... command.
Take note that we will also use a JSON6902 patch when creating the configs so that the control plane nodes get some special information about the VIP we chose earlier, as well as a daemonset to install VMware Tools on Talos nodes.
First, download cp.patch.yaml to your local machine and edit the VIP to match your chosen IP.
You can do this by issuing: curl -fsSLO https://raw.githubusercontent.com/siderolabs/talos/master/website/content/v1.9/talos-guides/install/virtualized-platforms/vmware/cp.patch.yaml.
Its contents should look like the following:
With the patch in hand, generate machine configs with:
$ talosctl gen config vmware-test https://<VIP>:<port> --config-patch-control-plane @cp.patch.yaml
created controlplane.yaml
created worker.yaml
created talosconfig
At this point, you can modify the generated configs to your liking if needed.
Optionally, you can specify additional patches by adding to the cp.patch.yaml file downloaded earlier, or create your own patch files.
Validate the Configuration Files
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
Set Environment Variables
govc makes use of the following environment variables
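Commonly used govc variables include GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, GOVC_INSECURE, GOVC_DATACENTER, GOVC_DATASTORE, and GOVC_NETWORK. As a sketch, the hypothetical helper below reports which of them are unset; the exact set your vCenter requires, and the example values in the comments, are assumptions to adjust.

```bash
# check_govc_env (hypothetical helper): report common govc variables that are unset.
# Example values (placeholders, adjust for your vCenter):
#   export GOVC_URL='vcenter.example.com'
#   export GOVC_USERNAME='administrator@vsphere.local'
#   export GOVC_PASSWORD='<password>'
#   export GOVC_INSECURE=1            # only for self-signed certificates
#   export GOVC_DATACENTER='Datacenter'
#   export GOVC_DATASTORE='datastore1'
#   export GOVC_NETWORK='VM Network'
check_govc_env() {
  local var missing=0
  for var in GOVC_URL GOVC_USERNAME GOVC_PASSWORD GOVC_DATACENTER GOVC_DATASTORE GOVC_NETWORK; do
    # ${!var} expands the variable whose name is stored in $var.
    if [[ -z "${!var:-}" ]]; then
      echo "unset: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```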
As part of this guide, we have a more automated install script that handles some of the complexity of importing OVAs and creating VMs.
If you wish to use this script, we will detail that next.
If you wish to carry out the manual approach, simply skip ahead to the “Manual Approach” section.
Scripted Install
Download the vmware.sh script to your local machine.
You can do this by issuing curl -fsSL "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/v1.9/talos-guides/install/virtualized-platforms/vmware/vmware.sh" | sed s/latest/v1.9.0/ > vmware.sh.
This script has default variables for things like Talos version and cluster name that may be interesting to tweak before deploying.
The script downloads a VMware OVA from the Image Factory with the talos-vmtoolsd extension pre-installed.
Import OVA
To create a content library and import the Talos OVA corresponding to the mentioned Talos version, simply issue:
./vmware.sh upload_ova
Create Cluster
With the OVA uploaded to the content library, you can create a 5 node (by default) cluster with 3 control plane and 2 worker nodes:
./vmware.sh create
This step creates the VMs from the OVA, edits their settings based on the environment variables used for VM size/specs, and then powers them on.
You may now skip past the “Manual Approach” section down to “Bootstrap Cluster”.
Manual Approach
Import the OVA into vCenter
A talos.ova asset is available from Image Factory.
We will refer to the version of the release as $TALOS_VERSION below.
It can be easily exported with export TALOS_VERSION="v1.9.0" or similar.
The download link already includes the talos-vmtoolsd extension.
Talos makes use of the guestinfo facility of VMware to provide the machine/cluster configuration.
This can be set using the govc vm.change command.
To facilitate persistent storage using the vSphere cloud provider integration with Kubernetes, disk.enableUUID=1 is used.
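A sketch of that `govc vm.change` call for one VM, assuming the config is passed base64-encoded under the `guestinfo.talos.config` key; the helper name and the VM name in the usage line are hypothetical.

```bash
# apply_guestinfo_config (hypothetical helper): pass a Talos machine config to a
# VM through VMware guestinfo and enable disk UUIDs for the vSphere cloud provider.
# Usage: apply_guestinfo_config <vm-name> <config.yaml>
apply_guestinfo_config() {
  local vm="$1" config="$2" encoded
  [[ -r "$config" ]] || { echo "cannot read $config" >&2; return 1; }
  # Base64-encode without line wrapping so the value stays on a single line.
  encoded="$(base64 < "$config" | tr -d '\n')"
  govc vm.change -vm "$vm" \
    -e "guestinfo.talos.config=${encoded}" \
    -e "disk.enableUUID=1"
}
```

For example, `apply_guestinfo_config control-plane-1 controlplane.yaml`, followed by `govc vm.power -on control-plane-1` to start the VM.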
In the vSphere UI, open a console to one of the control plane nodes.
You should see some output stating that etcd should be bootstrapped.
This text should look like:
"etcd is waiting to join the cluster, if this node is the first node in the cluster, please run `talosctl bootstrap` against one of the following IPs:
The talos-vmtoolsd application was deployed as a daemonset as part of the cluster creation; however, we must now provide a talos credentials file for it to use.
Once configured, you should now see these daemonset pods go into “Running” state and in vCenter, you will now see IPs and info from the Talos nodes present in the UI.
2.1.2.7 - Xen
Talos is known to work on Xen.
We don’t yet have a documented guide specific to Xen; however, you can follow the General Getting Started Guide.
If you run into any issues, our community can probably help!
2.1.3 - Cloud Platforms
Installation of Talos Linux on many cloud platforms.
2.1.3.1 - Akamai
Creating a cluster via the CLI on Akamai Cloud (Linode).
Creating a Talos Linux Cluster on Akamai Connected Cloud via the CLI
This guide will demonstrate how to create a highly available Kubernetes cluster with one worker using the Akamai Connected Cloud provider.
Akamai Connected Cloud has a very well-documented REST API, and an open-source CLI tool to interact with the API which will be used in this guide.
Make sure to follow installation and authentication instructions for the linode-cli tool.
Using the IP address (or DNS name, if you have created one) of the load balancer, generate the base configuration files for the Talos machines.
Also note that the load balancer forwards port 443 to port 6443 on the associated nodes, so we should use 443 as the port in the config definition:
export NODEBALANCER_IP=$(linode-cli nodebalancers list --label talos --format ipv4 --text --no-headers)
talosctl gen config talos-kubernetes-akamai https://${NODEBALANCER_IP} --with-examples=false
Create the Linodes
Create the Control Plane Nodes
Although root passwords are not used by Talos, Linode requires that a root password be associated with a linode during creation.
Run the following commands to create three control plane nodes:
export IMAGE_ID=$(linode-cli images list --label talos --format id --text --no-headers)
export NODEBALANCER_ID=$(linode-cli nodebalancers list --label talos --format id --text --no-headers)
export NODEBALANCER_CONFIG_ID=$(linode-cli nodebalancers configs-list ${NODEBALANCER_ID} --format id --text --no-headers)
export REGION=us-ord
export LINODE_TYPE=g6-standard-4
export ROOT_PW=$(pwgen 16)
for id in $(seq 3); do
  linode_label="talos-control-plane-${id}"
  # create linode
  linode-cli linodes create \
    --no-defaults \
    --root_pass ${ROOT_PW} \
    --type ${LINODE_TYPE} \
    --region ${REGION} \
    --image ${IMAGE_ID} \
    --label ${linode_label} \
    --private_ip true \
    --tags talos-control-plane \
    --group "talos-control-plane" \
    --metadata.user_data "$(base64 -i ./controlplane.yaml)"
  # change kernel to "direct disk"
  linode_id=$(linode-cli linodes list --label ${linode_label} --format id --text --no-headers)
  config_id=$(linode-cli linodes configs-list ${linode_id} --format id --text --no-headers)
  linode-cli linodes config-update ${linode_id} ${config_id} --kernel "linode/direct-disk"
  # add machine to nodebalancer
  private_ip=$(linode-cli linodes list --label ${linode_label} --format ipv4 --json | jq -r ".[0].ipv4[1]")
  linode-cli nodebalancers node-create ${NODEBALANCER_ID} ${NODEBALANCER_CONFIG_ID} --label ${linode_label} --address ${private_ip}:6443
done
Create the Worker Nodes
Although root passwords are not used by Talos, Linode requires that a root password be associated with a linode during creation.
At this point, we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
We can also watch the cluster bootstrap via:
talosctl --talosconfig talosconfig health
Alternatively, we can also watch the node overview, logs and real-time metrics dashboard via:
talosctl --talosconfig talosconfig dashboard
2.1.3.2 - AWS
Creating a cluster via the AWS CLI.
Creating a Cluster via the AWS CLI
In this guide we will create an HA Kubernetes cluster with 3 control plane nodes across 3 availability zones.
You should have an existing AWS account and have the AWS CLI installed and configured.
If you need more information on AWS specifics, please see the official AWS documentation.
To install the dependencies for this tutorial, you can use Homebrew on macOS or Linux:
If you would like to create infrastructure via Terraform or OpenTofu, please see the example in the contrib repository.
Note: this guide is not a production setup, and the steps were tested in bash and zsh shells.
Create AWS Resources
We will be creating a control plane with 3 EC2 instances spread across 3 availability zones.
It is recommended not to use the default VPC, so we will create a new one for this tutorial.
Change to your desired region and CIDR block and create a VPC:
export AWS_REGION="us-west-2"
IPV4_CIDR="10.1.0.0/18"
VPC_ID=$(aws ec2 create-vpc \
  --cidr-block $IPV4_CIDR \
  --output text --query 'Vpc.VpcId')
Create the Subnets
Create 3 smaller CIDRs to use for each subnet in different availability zones.
Make sure to adjust these CIDRs if you changed the default value from the last command.
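If you change the VPC block, a small Bash sketch can derive equally sized subnets from it, for example three /20 networks from 10.1.0.0/18. The helper name and the /20 size are assumptions, and the base address is assumed to be aligned to its prefix (as 10.1.0.0/18 is).

```bash
# split_cidr (hypothetical helper): print <count> consecutive subnets of size
# /<new-prefix> carved from an aligned IPv4 block.
# Usage: split_cidr <base-cidr> <new-prefix> <count>
split_cidr() {
  local base="${1%/*}" prefix="${1#*/}" newprefix="$2" count="$3"
  local a b c d ip size i n
  if (( newprefix < prefix || newprefix > 32 )); then
    echo "new prefix must be between /$prefix and /32" >&2
    return 1
  fi
  if (( count > (1 << (newprefix - prefix)) )); then
    echo "only $(( 1 << (newprefix - prefix) )) subnets of /$newprefix fit in $1" >&2
    return 1
  fi
  # Convert the dotted quad into a 32-bit integer.
  IFS=. read -r a b c d <<< "$base"
  ip=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  size=$(( 1 << (32 - newprefix) ))
  for (( i = 0; i < count; i++ )); do
    n=$(( ip + i * size ))
    printf '%d.%d.%d.%d/%d\n' $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
      $(( (n >> 8) & 255 )) $(( n & 255 )) "$newprefix"
  done
}
```

`split_cidr 10.1.0.0/18 20 3` prints 10.1.0.0/20, 10.1.16.0/20, and 10.1.32.0/20; each can then be passed to `aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block <cidr> --availability-zone <zone>`.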
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume an existing Blob Storage account and some familiarity with Azure.
If you need more information on Azure specifics, please see the official Azure documentation.
Environment Setup
We’ll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
# Storage account to use
export STORAGE_ACCOUNT="StorageAccountName"
# Storage container to upload to
export STORAGE_CONTAINER="StorageContainerName"
# Resource group name
export GROUP="ResourceGroupName"
# Location
export LOCATION="centralus"
# Get storage account connection string based on info above
export CONNECTION=$(az storage account show-connection-string \
  -n $STORAGE_ACCOUNT \
  -g $GROUP \
  -o tsv)
Choose an Image
There are two methods of deployment in this tutorial.
If you would like to use the official Talos image uploaded to Azure Community Galleries by SideroLabs, you may skip ahead to setting up your network infrastructure.
Now that the image is present in our blob storage, we’ll register it.
az image create \
--name talos \
--source https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER/talos-azure.vhd \
--os-type linux \
-g $GROUP
Network Infrastructure
Virtual Networks and Security Groups
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a network security group and add rules to it.
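A sketch of those commands as a Bash function. It assumes inbound rules for the Talos API (50000), trustd (50001), etcd (2379-2380), and the Kubernetes API (6443); the rule names, priorities, and the helper name are assumptions, and you may want to restrict the source address ranges for your environment.

```bash
# create_talos_nsg (hypothetical helper): create the talos-sg network security
# group and open the ports Talos and Kubernetes need.
# Usage: create_talos_nsg <resource-group>
create_talos_nsg() {
  local group="$1" priority=1001 rule name ports
  az network nsg create -g "$group" -n talos-sg || return 1
  # Each entry is <rule-name>:<destination-port-range>.
  for rule in apid:50000 trustd:50001 etcd:2379-2380 kube:6443; do
    name="${rule%%:*}"
    ports="${rule#*:}"
    az network nsg rule create -g "$group" --nsg-name talos-sg -n "$name" \
      --priority "$priority" --destination-port-ranges "$ports" \
      --direction inbound || return 1
    # NSG rule priorities must be unique within the group.
    priority=$(( priority + 1 ))
  done
}
```

For example: `create_talos_nsg "$GROUP"`.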
In Azure, we have to pre-create the NICs for our control plane so that they can be associated with our load balancer.
for i in $( seq 0 2 ); do
  # Create public IP for each nic
  az network public-ip create \
    --resource-group $GROUP \
    --name talos-controlplane-public-ip-$i \
    --allocation-method static
  # Create nic
  az network nic create \
    --resource-group $GROUP \
    --name talos-controlplane-nic-$i \
    --vnet-name talos-vnet \
    --subnet talos-subnet \
    --network-security-group talos-sg \
    --public-ip-address talos-controlplane-public-ip-$i \
    --lb-name talos-lb \
    --lb-address-pools talos-be-pool
done
# NOTES:
# Talos can detect PublicIPs automatically if PublicIP SKU is Basic.
# Use `--sku Basic` to set SKU to Basic.
Cluster Configuration
With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.
LB_PUBLIC_IP=$(az network public-ip show \
  --resource-group $GROUP \
  --name talos-public-ip \
  --query "ipAddress" \
  --output tsv)
talosctl gen config talos-k8s-azure-tutorial https://${LB_PUBLIC_IP}:6443
Compute Creation
We are now ready to create our Azure nodes.
Azure allows you to pass the Talos machine configuration to the virtual machine at bootstrap time via
the user-data or custom-data methods.
Talos supports only the custom-data method; the machine configuration is available to the VM only on first boot.
Use the steps below depending on whether you have manually uploaded a Talos image or if you are using the Community Gallery image.
# Create availability set
az vm availability-set create \
  --name talos-controlplane-av-set \
  -g $GROUP
# Create the controlplane nodes
for i in $( seq 0 2 ); do
  az vm create \
    --name talos-controlplane-$i \
    --image talos \
    --custom-data ./controlplane.yaml \
    -g $GROUP \
    --admin-username talos \
    --generate-ssh-keys \
    --verbose \
    --boot-diagnostics-storage $STORAGE_ACCOUNT \
    --os-disk-size-gb 20 \
    --nics talos-controlplane-nic-$i \
    --availability-set talos-controlplane-av-set \
    --no-wait
done
# Create worker node
az vm create \
  --name talos-worker-0 \
  --image talos \
  --vnet-name talos-vnet \
  --subnet talos-subnet \
  --custom-data ./worker.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --nsg talos-sg \
  --os-disk-size-gb 20 \
  --no-wait
# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting
Azure Community Gallery Image
Talos is updated in Azure’s Community Galleries (Preview) on every release.
To use the Talos image for the current release create the following environment variables.
Edit the variables below if you would like to use a different architecture or version.
# The architecture you would like to use. Options are "talos-x64" or "talos-arm64"
ARCHITECTURE="talos-x64"
# This will use the latest version of Talos. The version must be "latest" or in the format Major(int).Minor(int).Patch(int), e.g. 1.5.0
VERSION="latest"
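Before creating the VMs, a short Bash sketch can check both values against the rules stated in the comments above and print the gallery image reference used by the commands that follow. The helper name `gallery_image_ref` is hypothetical.

```bash
# gallery_image_ref (hypothetical helper): validate ARCHITECTURE and VERSION,
# then print the Community Gallery image path for az vm create --image.
# Usage: gallery_image_ref <architecture> <version>
gallery_image_ref() {
  local arch="$1" version="$2"
  case "$arch" in
    talos-x64|talos-arm64) ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  # "latest" or Major.Minor.Patch, e.g. 1.5.0 (no leading "v").
  if [[ "$version" != latest && ! "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "version must be \"latest\" or Major.Minor.Patch, got: $version" >&2
    return 1
  fi
  printf '/CommunityGalleries/siderolabs-c4d707c0-343e-42de-b597-276e4f7a5b0b/Images/%s/Versions/%s\n' "$arch" "$version"
}
```

For example: `gallery_image_ref "$ARCHITECTURE" "$VERSION"`.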
Create the Virtual Machines.
# Create availability set
az vm availability-set create \
  --name talos-controlplane-av-set \
  -g $GROUP
# Create the controlplane nodes
for i in $( seq 0 2 ); do
  az vm create \
    --name talos-controlplane-$i \
    --image /CommunityGalleries/siderolabs-c4d707c0-343e-42de-b597-276e4f7a5b0b/Images/${ARCHITECTURE}/Versions/${VERSION} \
    --custom-data ./controlplane.yaml \
    -g $GROUP \
    --admin-username talos \
    --generate-ssh-keys \
    --verbose \
    --boot-diagnostics-storage $STORAGE_ACCOUNT \
    --os-disk-size-gb 20 \
    --nics talos-controlplane-nic-$i \
    --availability-set talos-controlplane-av-set \
    --no-wait
done
# Create worker node
az vm create \
  --name talos-worker-0 \
  --image /CommunityGalleries/siderolabs-c4d707c0-343e-42de-b597-276e4f7a5b0b/Images/${ARCHITECTURE}/Versions/${VERSION} \
  --vnet-name talos-vnet \
  --subnet talos-subnet \
  --custom-data ./worker.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --nsg talos-sg \
  --os-disk-size-gb 20 \
  --no-wait
# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting
Bootstrap Etcd
You should now be able to interact with your cluster with talosctl.
We will need to discover the public IP for our first control plane node first.
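A sketch of that lookup followed by the bootstrap, assuming the public IP names created in the NIC loop above (`talos-controlplane-public-ip-0` for the first node) and the generated `talosconfig` in the current directory; the helper name is hypothetical.

```bash
# bootstrap_azure_cluster (hypothetical helper): find the first control plane
# node's public IP, point talosctl at it, and bootstrap etcd.
# Usage: bootstrap_azure_cluster <resource-group>
bootstrap_azure_cluster() {
  local group="$1" cp_ip
  cp_ip="$(az network public-ip show \
    --resource-group "$group" \
    --name talos-controlplane-public-ip-0 \
    --query "ipAddress" \
    --output tsv)" || return 1
  [[ -n "$cp_ip" ]] || { echo "no public IP found" >&2; return 1; }
  talosctl --talosconfig talosconfig config endpoint "$cp_ip" &&
    talosctl --talosconfig talosconfig config node "$cp_ip" &&
    # Bootstrap etcd once, on this single node.
    talosctl --talosconfig talosconfig bootstrap
}
```

For example: `bootstrap_azure_cluster "$GROUP"`.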
Download the Talos CloudStack image cloudstack-amd64.raw.gz from the Image Factory.
Note: the minimum version of Talos required to support Apache CloudStack is v1.8.0.
Using an upload method of your choice, upload the image to Apache CloudStack.
You might be able to use the “Register Template from URL” to download the image directly from the Image Factory.
Note: CloudStack does not seem to like compressed images, so you might have to download the image to a local webserver, uncompress it and let CloudStack fetch the image from there instead.
Alternatively, you can try to remove .gz from URL to fetch an uncompressed image from the Image Factory.
Get Required Variables
Next we will get a number of required variables and export them for later use:
Finally, it’s time to generate the Talos configuration files, using the public IP address assigned to the load balancer.
$ talosctl gen config talos-cloudstack https://${PUBLIC_IPADDRESS}:6443 --with-docs=false --with-examples=false
created controlplane.yaml
created worker.yaml
created talosconfig
Make any adjustments to the controlplane.yaml and/or worker.yaml as you like.
Note: Remember to validate!
Create Talos VM
Next we will create the actual VM and supply the controlplane.yaml as base64 encoded userdata.
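A hedged helper for producing that userdata: it prints the file as single-line base64, removing the line breaks that GNU `base64` inserts every 76 characters. The function name is hypothetical; pass its output wherever your CloudStack tooling expects the userdata value.

```bash
# encode_userdata (hypothetical helper): print a machine config as
# single-line base64 for APIs that accept base64 userdata.
# Usage: encode_userdata <config.yaml>
encode_userdata() {
  [[ -r "$1" ]] || { echo "cannot read $1" >&2; return 1; }
  # tr strips the wrapping newlines so the result is one unbroken string.
  base64 < "$1" | tr -d '\n'
}
```

For example: `USERDATA="$(encode_userdata controlplane.yaml)"`.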
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
We can also watch the cluster bootstrap via:
talosctl --talosconfig talosconfig dashboard
2.1.3.5 - DigitalOcean
Creating a cluster via the CLI on DigitalOcean.
Creating a Talos Linux Cluster on Digital Ocean via the CLI
In this guide we will create an HA Kubernetes cluster with 1 worker node, in the NYC region.
We assume an existing Space, and some familiarity with DigitalOcean.
If you need more information on DigitalOcean specifics, please see the official DigitalOcean documentation.
Create the Image
Download the DigitalOcean image digital-ocean-amd64.raw.gz from the Image Factory.
Note: the minimum version of Talos required to support Digital Ocean is v1.3.3.
Using an upload method of your choice (doctl does not have Spaces support), upload the image to a space.
(It’s easy to drag the image file to the space using DigitalOcean’s web console.)
Note: Make sure you upload the file as public.
Now, create an image using the URL of the uploaded image:
We will need the IP of the load balancer.
Using the ID of the load balancer, run:
doctl compute load-balancer get --format IP <load balancer ID>
Note that it may take a few minutes before the load balancer is provisioned, so repeat this command until it returns with the IP address.
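Rather than re-running the command by hand, a small Bash loop can poll until an IP appears. This is a sketch: the helper name, the default of 30 attempts, and the 10-second delay are assumptions.

```bash
# wait_for_lb_ip (hypothetical helper): poll DigitalOcean until the load
# balancer reports an IP address, then print it.
# Usage: wait_for_lb_ip <load-balancer-id> [attempts] [sleep-seconds]
wait_for_lb_ip() {
  local id="$1" attempts="${2:-30}" delay="${3:-10}" ip i
  for (( i = 1; i <= attempts; i++ )); do
    ip="$(doctl compute load-balancer get --format IP --no-header "$id")"
    # Drop whitespace; a load balancer that is still provisioning has no IP yet.
    ip="${ip//[[:space:]]/}"
    if [[ -n "$ip" ]]; then
      printf '%s\n' "$ip"
      return 0
    fi
    sleep "$delay"
  done
  echo "load balancer $id has no IP after $attempts attempts" >&2
  return 1
}
```

For example: `LB_IP="$(wait_for_lb_ip <load balancer ID>)"`.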
Create the Machine Configuration Files
Using the IP address (or DNS name, if you have created one) of the load balancer, generate the base configuration files for the Talos machines.
Also note that the load balancer forwards port 443 to port 6443 on the associated nodes, so we should use 443 as the port in the config definition:
$ talosctl gen config talos-k8s-digital-ocean-tutorial https://<load balancer IP or DNS>:443
created controlplane.yaml
created worker.yaml
created talosconfig
Create the Droplets
Create a dummy SSH key
Although SSH is not used by Talos, DigitalOcean requires that an SSH key be associated with a droplet during creation.
We will create a dummy key that can be used to satisfy this requirement.
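A sketch of creating and registering such a key. The helper name, key name, and file path are hypothetical; `ssh-keygen` creates an ed25519 pair with an empty passphrase, and `doctl compute ssh-key import` uploads the public half.

```bash
# create_dummy_ssh_key (hypothetical helper): generate a throwaway key pair and
# register its public key with DigitalOcean; Talos never uses it.
# Usage: create_dummy_ssh_key [key-name] [key-file]
create_dummy_ssh_key() {
  local name="${1:-talos-dummy}" keyfile="${2:-./talos-dummy-key}"
  # -N '' sets an empty passphrase; -q suppresses ssh-keygen's banner output.
  ssh-keygen -q -t ed25519 -N '' -f "$keyfile" || return 1
  doctl compute ssh-key import "$name" --public-key-file "${keyfile}.pub"
}
```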
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
We can also watch the cluster bootstrap via:
talosctl --talosconfig talosconfig health
2.1.3.6 - Exoscale
Creating a cluster via the CLI using exoscale.com
Talos is known to work on exoscale.com; however, it is currently undocumented.
2.1.3.7 - GCP
Creating a cluster via the CLI on Google Cloud Platform.
Creating a Cluster via the CLI
In this guide, we will create an HA Kubernetes cluster in GCP with 1 worker node.
We will assume an existing Cloud Storage bucket, and some familiarity with Google Cloud.
If you need more information on Google Cloud specifics, please see the official Google documentation.
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a firewall, load balancer, and their required components.
Using GCP Deployment Manager automatically creates a Google Cloud Storage bucket and uploads the Talos image to it.
Once the deployment is complete, the generated talosconfig and kubeconfig files are uploaded to the bucket.
By default, this setup creates a three-node control plane and a single worker in us-west1-b.
First we need to create a folder to store our deployment manifests and perform all subsequent operations from that folder.
mkdir -p talos-gcp-deployment
cd talos-gcp-deployment
Getting the deployment manifests
We need to download two deployment manifests for the deployment from the Talos GitHub repository.
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/v1.9/talos-guides/install/cloud-platforms/gcp/config.yaml"
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/v1.9/talos-guides/install/cloud-platforms/gcp/talos-ha.jinja"
# if using ccm
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/v1.9/talos-guides/install/cloud-platforms/gcp/gcp-ccm.yaml"
Updating the config
Now we need to update the local config.yaml file with any required changes, such as the default zone, Talos version, machine sizes, node count, etc.
Note: The externalCloudProvider property is set to false by default.
The manifest used for deploying the CCM (cloud controller manager) currently uses the GCP CCM provided by OpenShift, since there are no public images for the CCM yet.
Since the routes controller is disabled while deploying the CCM, the CNI pods need to be restarted after the CCM deployment is complete to remove the node.kubernetes.io/network-unavailable taint.
See Nodes network-unavailable taint not removed after installing ccm for more information.
Use a custom-built image for the CCM deployment if required.
Creating the deployment
Now we are ready to create the deployment.
Confirm with y for any prompts.
Run the following command to create the deployment:
# use a unique name for the deployment, resources are prefixed with the deployment name
export DEPLOYMENT_NAME="<deployment name>"
gcloud deployment-manager deployments create "${DEPLOYMENT_NAME}" --config config.yaml
Retrieving the outputs
First we need to get the deployment outputs.
# first get the outputs
OUTPUTS=$(gcloud deployment-manager deployments describe "${DEPLOYMENT_NAME}" --format json | jq '.outputs[]')
BUCKET_NAME=$(jq -r '. | select(.name == "bucketName").finalValue' <<< "${OUTPUTS}")
# used when cloud controller is enabled
SERVICE_ACCOUNT=$(jq -r '. | select(.name == "serviceAccount").finalValue' <<< "${OUTPUTS}")
PROJECT=$(jq -r '. | select(.name == "project").finalValue' <<< "${OUTPUTS}")
Note: If the cloud controller manager is enabled, the command below needs to be run to allow the controller's custom role to access cloud resources.
In addition to the talosconfig and kubeconfig files, the storage bucket contains the controlplane.yaml and worker.yaml files used to join additional nodes to the cluster.
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
apply \
--filename gcp-ccm.yaml
# wait for the ccm to be up
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout status \
daemonset cloud-controller-manager
If the cloud controller manager is enabled, we need to restart the CNI pods to remove the node.kubernetes.io/network-unavailable taint.
# restart the CNI pods, in this case flannel
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout restart \
daemonset kube-flannel
# wait for the pods to be restarted
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout status \
daemonset kube-flannel
Check cluster status
kubectl \
--kubeconfig kubeconfig \
get nodes
Cleanup deployment
Warning: This will delete the deployment and all resources associated with it.
# delete the objects in the bucket first
gsutil -m rm -r "gs://${BUCKET_NAME}"
gcloud deployment-manager deployments delete "${DEPLOYMENT_NAME}" --quiet
2.1.3.8 - Hetzner
Creating a cluster via the CLI (hcloud) on Hetzner.
Upload image
Hetzner Cloud does not support uploading custom images.
You can email their support to get a Talos ISO uploaded (see issue 3599), or you can prepare an image snapshot yourself.
There are three options to upload your own.
Run an instance in rescue mode and replace the system OS with the Talos image
Create a new Server in the Hetzner console.
Enable the Hetzner Rescue System for this server and reboot.
Upon a reboot, the server will boot a special minimal Linux distribution designed for repair and reinstall.
Once it is running, log in to the server using SSH and prepare the system disk as follows:
# Check that you are in Rescue mode
df
### Result is like:
# udev 987432 0 987432 0% /dev
# 213.133.99.101:/nfs 308577696 247015616 45817536 85% /root/.oldroot/nfs
# overlay 995672 8340 987332 1% /
# tmpfs 995672 0 995672 0% /dev/shm
# tmpfs 398272 572 397700 1% /run
# tmpfs 5120 0 5120 0% /run/lock
# tmpfs 199132 0 199132 0% /run/user/0
# Download the Talos image
cd /tmp
wget -O /tmp/talos.raw.xz https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.9.0/hcloud-amd64.raw.xz
# Replace system
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync
# Shut down the instance
shutdown -h now
To make sure disk content is consistent, it is recommended to shut the server down before taking an image (snapshot).
Once the server has shut down, create an image (snapshot) from the console.
You can now use this snapshot to run Talos on the cloud.
Create a new image by issuing the commands shown below.
To create a new API token for your project, switch to the Hetzner Cloud Console, choose a project, go to Access → Security, and create a new token.
# First you need to set the API token
export HCLOUD_TOKEN=${TOKEN}
# Upload image
packer init .
packer build .
# Save the image ID
export IMAGE_ID=<image-id-in-packer-output>
After doing this, you can find the snapshot in the console interface.
hcloud-upload-image
The install process is described here (you can download a binary or build from source; it is also possible to use Docker).
To simplify the process, you can use this bash script:
#!/usr/bin/env bash
export TALOS_IMAGE_VERSION=v1.9.0 # You can change to the current version
export TALOS_IMAGE_ARCH=amd64 # You can change to the arm64 architecture
export HCLOUD_SERVER_ARCH=x86 # HCloud server architecture can be x86 or arm
wget https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/${TALOS_IMAGE_VERSION}/hcloud-${TALOS_IMAGE_ARCH}.raw.xz
hcloud-upload-image upload \
  --image-path *.xz \
  --architecture "$HCLOUD_SERVER_ARCH" \
  --compression xz
After these actions, you can find the snapshot in the console interface.
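The script above hard-codes the Image Factory URL. If you switch the schematic, version, architecture, or platform often, a small helper that assembles the same URL from its parts may be convenient. This is a sketch: factory_image_url is a hypothetical helper name, and the URL layout is taken from the download commands in this guide.

```shell
# Hypothetical helper: build an Image Factory download URL of the form
#   https://factory.talos.dev/image/<schematic>/<version>/<platform>-<arch>.<ext>
# Arguments: schematic ID, Talos version, platform, architecture, file extension.
factory_image_url() {
  local schematic="$1" version="$2" platform="$3" arch="$4" ext="$5"
  printf 'https://factory.talos.dev/image/%s/%s/%s-%s.%s\n' \
    "$schematic" "$version" "$platform" "$arch" "$ext"
}

# Default schematic ID, as used in the commands above.
SCHEMATIC_ID=376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba
```

For example, the wget line in the script could become wget "$(factory_image_url "$SCHEMATIC_ID" "$TALOS_IMAGE_VERSION" hcloud "$TALOS_IMAGE_ARCH" raw.xz)".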
Using the IP/DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines by issuing:
$ talosctl gen config talos-k8s-hcloud-tutorial https://<load balancer IP or DNS>:6443 \
--with-examples=false --with-docs=false
created controlplane.yaml
created worker.yaml
created talosconfig
Generating the config without examples and docs is necessary because otherwise you can easily exceed the 32 KB limit on uploadable userdata (see issue 8805).
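To catch this before creating servers, the generated files can be checked locally. This is a minimal sketch: check_userdata_size is a hypothetical helper, and the 32768-byte threshold assumes the limit is 32 KiB.

```shell
# Assumed userdata limit: 32 KiB. Adjust if the provider documents another value.
USERDATA_LIMIT_BYTES=32768

# Hypothetical helper: print the file size and fail if it exceeds the limit.
check_userdata_size() {
  local file="$1" size
  # `wc -c < file` prints only the byte count; tr strips padding some wc versions add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$USERDATA_LIMIT_BYTES" ]; then
    echo "$file: $size bytes exceeds limit of $USERDATA_LIMIT_BYTES bytes" >&2
    return 1
  fi
  echo "$file: $size bytes (ok)"
}
```

Running check_userdata_size controlplane.yaml && check_userdata_size worker.yaml after each edit shows how much headroom remains.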
At this point, you can modify the generated configs to your liking.
Optionally, you can specify --config-patch with RFC6902 jsonpatches which will be applied during the config generation.
Validate the Configuration Files
Validate any edited machine configs with:
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
Create the Servers
We can now create our servers.
Note that you can find IMAGE_ID in the snapshot section of the console: https://console.hetzner.cloud/projects/$PROJECT_ID/servers/snapshots.
talosctl --talosconfig talosconfig patch machineconfig --patch-file patch.yaml --nodes <comma separated list of all your nodes' IP addresses>
With that in place, we can now follow the official instructions, ignoring the kubeadm related steps.
2.1.3.9 - Kubernetes
Running Talos Linux as a pod in Kubernetes.
Talos Linux can be run as a pod in Kubernetes similar to running Talos in Docker.
This can be used, for example, to run control plane nodes inside an existing Kubernetes cluster.
Talos Linux running in Kubernetes is not the full Talos Linux experience, as it runs in a container using the host's kernel and network stack.
Some operations, like upgrades and reboots, are not supported.
Prerequisites
a running Kubernetes cluster
a talos container image: ghcr.io/siderolabs/talos:v1.9.0
Machine Configuration
Machine configuration can be generated using the Getting Started guide.
The machine install disk will be ignored, as will the install image.
The Talos version will be driven by the container image being used.
The required machine configuration patch to enable using container runtime DNS:
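A sketch of such a patch follows. It assumes the machine.features.hostDNS fields, with forwardKubeDNSToHost enabled as mentioned for Docker clusters later in this document. The file name hostdns-patch.yaml is hypothetical, and the field names should be verified against the configuration reference for your Talos version.

```shell
# Write a machine config patch that enables host DNS and forwards
# kube-dns queries to the resolver provided by the container runtime.
cat > hostdns-patch.yaml <<'EOF'
machine:
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true
EOF
```

The file can then be passed at generation time with talosctl gen config <cluster name> <endpoint> --config-patch @hostdns-patch.yaml.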
Initial machine configuration can be submitted using talosctl apply-config --insecure when the pod is running, or it can be submitted
via an environment variable USERDATA with base64-encoded machine configuration.
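For the USERDATA option, the machine configuration must be a base64 string. A minimal sketch using coreutils base64 follows. encode_userdata is a hypothetical helper name. The tr call removes the line breaks that GNU base64 inserts every 76 characters, which would otherwise end up in the environment variable.

```shell
# Hypothetical helper: encode a machine config file as a single-line base64 string.
encode_userdata() {
  base64 < "$1" | tr -d '\n'
}
```

For example, USERDATA=$(encode_userdata controlplane.yaml) produces the value to place in the pod's env entry for USERDATA.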
Volume Mounts
Three ephemeral mounts are required for /run, /system, and /tmp directories:
volumeMounts:
  - mountPath: /run
    name: run
  - mountPath: /system
    name: system
  - mountPath: /tmp
    name: tmp
volumes:
  - emptyDir: {}
    name: run
  - emptyDir: {}
    name: system
  - emptyDir: {}
    name: tmp
Several other mountpoints are required, and they should persist across pod restarts, so one should use PersistentVolume for them:
Where serial holds the base64-encoded string version of ds=nocloud-net;s=http://10.10.0.1/configs/.
The serial can also be set from a root shell on the Proxmox server:
# qm set $VM --smbios1 "uuid=5b0f7dcf-cfe3-4bf3-87a2-1cad29bd51f9,serial=$(printf '%s' 'ds=nocloud-net;s=http://10.10.0.1/configs/' | base64),base64=1"
update VM 105: -smbios1 uuid=5b0f7dcf-cfe3-4bf3-87a2-1cad29bd51f9,serial=ZHM9bm9jbG91ZC1uZXQ7cz1odHRwOi8vMTAuMTAuMC4xL2NvbmZpZ3Mv,base64=1
Keep in mind that if you set the serial from the command line, you must encode it as base64, and you must include the UUID and any other settings that are already set for the smbios1 option or they will be removed.
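Because the whole smbios1 value must be restated every time, it can help to assemble it in one place. This is a sketch: smbios1_value is a hypothetical helper, and it only handles the uuid and serial fields. Any other smbios1 settings already on the VM must still be appended by hand.

```shell
# Hypothetical helper: build the --smbios1 value from an existing UUID and a
# plain-text nocloud serial string. The serial is base64-encoded and base64=1
# is set, as required when setting it from the command line.
smbios1_value() {
  local uuid="$1" serial="$2" encoded
  encoded=$(printf '%s' "$serial" | base64 | tr -d '\n')
  printf 'uuid=%s,serial=%s,base64=1\n' "$uuid" "$encoded"
}
```

For the example values above, qm set $VM --smbios1 "$(smbios1_value 5b0f7dcf-cfe3-4bf3-87a2-1cad29bd51f9 'ds=nocloud-net;s=http://10.10.0.1/configs/')" produces the same smbios1 string shown in the qm output.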
CDROM/USB
Talos can also get machine config from locally attached storage without any prior network connection being established.
You can provide configs to the server via files on a VFAT or ISO9660 filesystem.
The filesystem volume label must be cidata or CIDATA.
Example: QEMU
Create and prepare Talos machine config:
export CONTROL_PLANE_IP=192.168.1.10
talosctl gen config talos-nocloud https://$CONTROL_PLANE_IP:6443 --output-dir _out
Proxmox can create a cloud-init disk for you.
Edit the cloud-init config information in Proxmox as follows, substitute your own information as necessary:
Then add a cicustom parameter to the virtual machine's configuration from a root shell:
# qm set 100 --cicustom user=local:snippets/controlplane-1.yml
update VM 100: -cicustom user=local:snippets/controlplane-1.yml
Note: snippets/controlplane-1.yml is the Talos machine config.
It is usually located at /var/lib/vz/snippets/controlplane-1.yml.
This file must be placed at this path manually, as Proxmox does not support uploading snippets via the API/GUI.
Click the Regenerate Image button after making the above changes.
2.1.3.11 - OpenStack
Creating a cluster via the CLI on OpenStack.
Creating a Cluster via the CLI
In this guide, we will create an HA Kubernetes cluster in OpenStack with 1 worker node.
We will assume some familiarity with OpenStack.
If you need more information on OpenStack specifics, please see the official OpenStack documentation.
Environment Setup
You should have an existing openrc file.
This file will provide environment variables necessary to talk to your OpenStack cloud.
See here for instructions on fetching this file.
Create the Image
First, download the OpenStack image from Image Factory.
These images are called openstack-$ARCH.tar.gz.
Untar this file with tar -xvf openstack-$ARCH.tar.gz.
The resulting file will be called disk.raw.
Upload the Image
Once you have the image, you can upload it to OpenStack with:
openstack image create --public --disk-format raw --file disk.raw talos
Network Infrastructure
Load Balancer and Network Ports
Once the image is prepared, you will need to work through setting up the network.
Issue the following to create a load balancer, the necessary network ports for each control plane node, and associations between the two.
Creating loadbalancer:
# Create load balancer, updating vip-subnet-id if necessary
openstack loadbalancer create --name talos-control-plane --vip-subnet-id public
# Create listener
openstack loadbalancer listener create --name talos-control-plane-listener --protocol TCP --protocol-port 6443 talos-control-plane
# Pool and health monitoring
openstack loadbalancer pool create --name talos-control-plane-pool --lb-algorithm ROUND_ROBIN --listener talos-control-plane-listener --protocol TCP
openstack loadbalancer healthmonitor create --delay 5 --max-retries 4 --timeout 10 --type TCP talos-control-plane-pool
Creating ports:
# Create ports for control plane nodes, updating network name if necessary
openstack port create --network shared talos-control-plane-1
openstack port create --network shared talos-control-plane-2
openstack port create --network shared talos-control-plane-3
# Create floating IPs for the ports, so that you will have talosctl connectivity to each control plane
openstack floating ip create --port talos-control-plane-1 public
openstack floating ip create --port talos-control-plane-2 public
openstack floating ip create --port talos-control-plane-3 public
Note: Take note of the private and public IPs associated with each of these ports, as they will be used in the next step.
Additionally, take note of the port ID, as it will be used in server creation.
Associate port’s private IPs to loadbalancer:
# Create members for each port IP, updating subnet-id and address as necessary.
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-1 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-2 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-3 PORT> --protocol-port 6443 talos-control-plane-pool
Security Groups
This example uses the default security group in OpenStack.
Ports have been opened to ensure that connectivity from both inside and outside the group is possible.
You will want to allow, at a minimum, ports 6443 (Kubernetes API server) and 50000 (Talos API) from external sources.
It is also recommended to allow communication over all ports from within the subnet.
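The ingress rules for the two external ports can be created with openstack security group rule create. The sketch below only prints the commands so they can be reviewed before running them. print_sg_rules is a hypothetical helper. It assumes the default group used in this example and does not add the intra-subnet rule recommended above.

```shell
# Hypothetical helper: print one ingress rule per externally required port
# (6443 for the Kubernetes API server, 50000 for the Talos API).
# The first argument is the security group name (defaults to "default").
print_sg_rules() {
  local group="${1:-default}" port
  for port in 6443 50000; do
    echo "openstack security group rule create --ingress --protocol tcp --dst-port $port $group"
  done
}
```

After reviewing the output of print_sg_rules default, you can apply the rules with print_sg_rules default | sh.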
Cluster Configuration
With the networking set up, we'll fetch the IP for our load balancer and create our configuration files.
LB_PUBLIC_IP=$(openstack loadbalancer show talos-control-plane -f json | jq -r .vip_address)
talosctl gen config talos-k8s-openstack-tutorial https://${LB_PUBLIC_IP}:6443
Additionally, you can specify --config-patch with RFC6902 jsonpatch which will be applied during the config generation.
Compute Creation
We are now ready to create our OpenStack nodes.
Create control plane:
# Create the three control plane nodes, one per port created earlier.
for i in $(seq 1 3); do
  openstack server create talos-control-plane-$i --flavor m1.small --nic port-id=talos-control-plane-$i --image talos --user-data /path/to/controlplane.yaml
done
Create worker:
# Update network name as necessary.
openstack server create talos-worker-1 --flavor m1.small --network shared --image talos --user-data /path/to/worker.yaml
Note: This step can be repeated to add more workers.
Bootstrap Etcd
You should now be able to interact with your cluster with talosctl.
We will use one of the floating IPs we allocated earlier.
It does not matter which one.
Using the IP/DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines by issuing:
$ talosctl gen config talos-k8s-oracle-tutorial https://<load balancer IP or DNS>:6443 --additional-sans <load balancer IP or DNS>
created controlplane.yaml
created worker.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Optionally, you can specify --config-patch with RFC6902 jsonpatches which will be applied during the config generation.
Validate the Configuration Files
Validate any edited machine configs with:
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
Set the endpoints and nodes for your talosconfig with:
talosctl --talosconfig talosconfig config endpoint <load balancer IP or DNS>
talosctl --talosconfig talosconfig config node <control-plane-1-IP>
Bootstrap etcd on the first control plane node with:
talosctl --talosconfig talosconfig bootstrap
Retrieve the kubeconfig
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig .
2.1.3.13 - Scaleway
Creating a cluster via the CLI (scw) on scaleway.com.
Talos is known to work on scaleway.com; however, it is currently undocumented.
2.1.3.14 - UpCloud
Creating a cluster via the CLI (upctl) on UpCloud.com.
In this guide, we will create an HA Kubernetes cluster with 3 control plane nodes and 1 worker node.
We assume some familiarity with UpCloud.
If you need more information on UpCloud specifics, please see the official UpCloud documentation.
Create the Image
The best way to create an image for UpCloud is to build one using
HashiCorp Packer, with the
upcloud-amd64.raw.xz image available from the Image Factory.
Using the general ISO is also possible, but the UpCloud image has some
UpCloud-specific features implemented, such as fetching metadata and user data to configure the nodes.
To create the cluster, you need a few things locally installed:
NOTE: Make sure your account allows API connections.
To do so, log into the
UpCloud control panel, go to People
-> Account -> Permissions, and tick the Allow API connections checkbox.
It is recommended
to create a separate subaccount for your API access and only set the API permission.
To use the UpCloud CLI, you need to create a config in $HOME/.config/upctl.yaml.
To use the UpCloud Packer plugin, you also need to export these credentials as
environment variables, for example by putting the following in your .bashrc or .zshrc:
Now create a new image by issuing the commands shown below.
packer init .
packer build .
After doing this, you can find the custom image in the console interface under storage.
Creating a Cluster via the CLI
Create an Endpoint
To communicate with the Talos cluster you will need a single endpoint that is used
to access the cluster.
This can either be a loadbalancer that will sit in front of
all your control plane nodes, a DNS name with one or more A or AAAA records pointing
to the control plane nodes, or directly the IP of a control plane node.
Which option is best for you will depend on your needs.
Endpoint selection has been further documented here.
After you decide on which endpoint to use, note down the domain name or IP, as
we will need it in the next step.
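Whichever option you choose, talosctl gen config expects it as an HTTPS URL with a port. A minimal sketch of turning the noted endpoint into that URL follows. k8s_endpoint_url is a hypothetical helper. It defaults to port 6443, the Kubernetes API server port used elsewhere in this document, and does not add brackets around IPv6 literals.

```shell
# Hypothetical helper: build the cluster endpoint URL from a host (load balancer,
# DNS name, or control plane IP) and an optional port (default 6443).
k8s_endpoint_url() {
  local host="$1" port="${2:-6443}"
  printf 'https://%s:%s\n' "$host" "$port"
}
```

For example, talosctl gen config talos-upcloud-tutorial "$(k8s_endpoint_url "$ENDPOINT")" --install-disk /dev/vda, where ENDPOINT is a hypothetical variable holding the name or IP you noted down.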
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the endpoint created earlier, generate the base
configuration files for the Talos machines:
$ talosctl gen config talos-upcloud-tutorial https://<load balancer IP or DNS>:<port> --install-disk /dev/vda
created controlplane.yaml
created worker.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Depending on the Kubernetes version you want to run, you might need to select a different Talos version, as not all versions are compatible.
You can find the support matrix here.
Optionally, you can specify --config-patch with RFC6902 jsonpatch or yamlpatch
which will be applied during the config generation.
Validate the Configuration Files
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
Create the Servers
Create the Control Plane Nodes
Run the following to create three total control plane nodes:
for ID in $(seq 3); do
  upctl server create \
    --zone us-nyc1 \
    --title talos-us-nyc1-master-$ID \
    --hostname talos-us-nyc1-master-$ID \
    --plan 2xCPU-4GB \
    --os "Talos (v1.9.0)" \
    --user-data "$(cat controlplane.yaml)" \
    --enable-metadata
done
Note: modify the zone and OS depending on your preferences.
The OS should match the template name generated with packer in the previous step.
Note the IP address of the first control plane node, as we will need it later.
To configure talosctl we will need the first control plane node’s IP, as noted earlier.
We only add one node IP, as that is the entry into our cluster against which our commands will be run.
All requests to other nodes are proxied through the endpoint, and therefore not
all nodes need to be manually added to the config.
You don’t want to run your commands against all nodes, as this can destroy your
cluster if you are not careful (further documentation).
At this point we can retrieve the admin kubeconfig by running:
talosctl --talosconfig talosconfig kubeconfig
It will take a few minutes before Kubernetes has been fully bootstrapped and is accessible.
You can check if the nodes are registered in Talos by running:
talosctl --talosconfig talosconfig get members
To check if your nodes are ready, run:
kubectl get nodes
2.1.3.15 - Vultr
Creating a cluster via the CLI (vultr-cli) on Vultr.com.
Creating a Cluster using the Vultr CLI
This guide will demonstrate how to create a highly-available Kubernetes cluster with one worker using the Vultr cloud provider.
Vultr has a well-documented REST API and an open-source CLI tool to interact with it, which will be used in this guide.
Make sure to follow installation and authentication instructions for the vultr-cli tool.
Boot Options
Upload an ISO Image
The first step is to make the Talos ISO available to Vultr by uploading the latest release of the ISO to the Vultr ISO server.
vultr-cli iso create --url https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.9.0/vultr-amd64.iso
Make a note of the ID in the output; it will be needed later when creating the instances.
PXE Booting via Image Factory
Talos Linux can be PXE-booted on Vultr using Image Factory with the vultr platform, for example:
https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.9.0/vultr-amd64 (this URL references the default schematic and amd64 architecture).
Create a Load Balancer
A load balancer is needed to serve as the Kubernetes endpoint for the cluster.
Make a note of the ID of the load balancer from the output of the above command; it will be needed after the control plane instances are created.
vultr-cli load-balancer get $LOAD_BALANCER_ID | grep ^IP
Make a note of the IP address; it will be needed later when generating the configuration.
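The grep ^IP above prints the whole line, and it would also match any other field whose name starts with IP. A sketch that extracts only the address follows. It assumes the output contains a line whose first field is exactly IP followed by the address, and lb_ip_from_output is a hypothetical helper name.

```shell
# Hypothetical helper: read `vultr-cli load-balancer get` output on stdin and
# print the address from the first line whose first field is exactly "IP".
lb_ip_from_output() {
  awk '$1 == "IP" { print $2; exit }'
}
```

For example: LOAD_BALANCER_ADDRESS=$(vultr-cli load-balancer get "$LOAD_BALANCER_ID" | lb_ip_from_output).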
Create the Machine Configuration
Generate Base Configuration
Using the IP address (or DNS name if one was created) of the load balancer created above, generate the machine configuration files for the new cluster.
talosctl gen config talos-kubernetes-vultr https://$LOAD_BALANCER_ADDRESS
Once generated, the machine configuration can be modified as necessary for the new cluster, for instance updating disk installation, or adding SANs for the certificates.
First, the control plane needs to be created; the example below creates 3 instances in a loop.
The instance type (noted by the --plan vc2-2c-4gb argument) in the example is for a minimum-spec control plane node, and should be updated to suit the cluster being created.
for id in $(seq 3); do
  vultr-cli instance create \
    --plan vc2-2c-4gb \
    --region $REGION \
    --iso $TALOS_ISO_ID \
    --host talos-k8s-cp${id} \
    --label "Talos Kubernetes Control Plane" \
    --tags talos,kubernetes,control-plane
done
Make a note of the instance IDs, as they are needed to attach to the load balancer created earlier.
Now worker nodes can be created and configured in a similar way to the control plane nodes, the difference being mainly in the machine configuration file.
Note that, as with the control plane nodes, the instance type (here set by --plan vc2-1c-1gb) should be changed to suit the actual cluster requirements.
for id in $(seq 1); do
  vultr-cli instance create \
    --plan vc2-1c-1gb \
    --region $REGION \
    --iso $TALOS_ISO_ID \
    --host talos-k8s-worker${id} \
    --label "Talos Kubernetes Worker" \
    --tags talos,kubernetes,worker
done
Once the worker is booted and in maintenance mode, the machine configuration can be applied in the following manner.
Once all the cluster nodes are correctly configured, the cluster can be bootstrapped to become functional.
It is important that the talosctl bootstrap command be executed only once and against only a single control plane node.
Finally, with the cluster fully running, the administrative kubeconfig can be retrieved from the Talos API to be saved locally.
talosctl --talosconfig talosconfig kubeconfig .
Now the kubeconfig can be used by any of the usual Kubernetes tools to interact with the Talos-based Kubernetes cluster as normal.
2.1.4 - Local Platforms
Installation of Talos Linux on local platforms, helpful for testing and developing.
2.1.4.1 - Docker
Creating Talos Kubernetes cluster using Docker.
In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.
Running Talos in Docker is intended to be used in CI pipelines, and local testing when you need a quick and easy cluster.
Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.
Requirements
The following are requirements for running Talos in Docker:
If you are using Docker Desktop on a macOS computer, and you encounter the error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? you may need to manually create the link for the Docker socket:
sudo ln -s "$HOME/.docker/run/docker.sock" /var/run/docker.sock
Caveats
Because Talos runs in a container, certain APIs are not available.
For example, upgrade, reset, and similar APIs don't apply in container mode.
Further, when running in Docker on a Mac, VIPs are not supported due to networking limitations.
Create the Cluster
Creating a local cluster is as simple as:
talosctl cluster create
Once the above finishes successfully, your talosconfig (~/.talos/config) and kubeconfig (~/.kube/config) will be configured to point to the new cluster.
Note: It can take a minute or more before the cluster is available.
Finally, you need to specify which nodes you want to communicate with using talosctl.
talosctl can operate on one or all of the nodes in the cluster, which makes cluster-wide commands much easier.
talosctl config nodes 10.5.0.2 10.5.0.3
The Talos and Kubernetes APIs are mapped to random ports on the host machine; the retrieved talosconfig and kubeconfig are configured automatically to point to the new cluster.
The Talos API endpoint can be found using talosctl config info:
$ talosctl config info
...
Endpoints: 127.0.0.1:38423
The Kubernetes API endpoint is available via talosctl cluster show:
$ talosctl cluster show
...
KUBERNETES ENDPOINT https://127.0.0.1:43083
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
Cleaning Up
To cleanup, run:
talosctl cluster destroy
Multiple Clusters
Multiple Talos Linux clusters can be created on the same host; each cluster will need to have:
The machine configuration submitted to the container should have a host DNS feature enabled with forwardKubeDNSToHost enabled.
It is used to forward DNS requests to the resolver provided by Docker (or other container runtime).
2.1.4.2 - QEMU
Creating Talos Kubernetes cluster using QEMU VMs.
In this guide we will create a Kubernetes cluster using QEMU.
Requirements
Linux
a kernel with
KVM enabled (/dev/kvm must exist)
CONFIG_NET_SCH_NETEM enabled
CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
QEMU
bridge, static and firewall CNI plugins from the standard CNI plugins, and tc-redirect-tap CNI plugin from the awslabs tc-redirect-tap installed to /opt/cni/bin (installed automatically by talosctl)
iptables
/var/run/netns directory should exist
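Several of these requirements can be checked before creating a cluster. This is a rough sketch. qemu_preflight is a hypothetical helper that assumes an x86-64 host (qemu-system-x86_64). It does not check the kernel options, capabilities, or CNI plugins.

```shell
# Hypothetical helper: report "ok:" or "missing:" for a few QEMU provisioner
# requirements and return non-zero if anything is missing.
qemu_preflight() {
  local missing=0 path cmd
  for path in /dev/kvm /var/run/netns; do
    if [ -e "$path" ]; then
      echo "ok: $path"
    else
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done
  for cmd in qemu-system-x86_64 iptables; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

A missing /var/run/netns directory can be created with sudo mkdir -p /var/run/netns.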
Installation
How to get QEMU
Install QEMU with your operating system package manager.
For example, on Ubuntu for x86:
apt install qemu-system-x86 qemu-utils
Before the first cluster is created, talosctl will download the CNI bundle for the VM provisioning and install it to ~/.talos/cni directory.
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster, and kubeconfig will be
downloaded and merged into default kubectl config location (~/.kube/config).
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl -n 10.5.0.2 containers for a list of containers in the system namespace, or talosctl -n 10.5.0.2 containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl -n 10.5.0.2 logs <container> or talosctl -n 10.5.0.2 logs -k <container>.
A bridge interface will be created and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A load balancer runs on 10.5.0.1 by default and handles load balancing for the Kubernetes API.
You can see a summary of the cluster state by running:
$ talosctl cluster show --provisioner qemu
PROVISIONER qemu
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
NODES:
NAME TYPE IP CPU RAM DISK
talos-default-controlplane-1 ControlPlane 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-controlplane-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-controlplane-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Worker 10.5.0.5 1.00 1.6 GB 4.3 GB
Note: If the host machine is rebooted before the cluster is destroyed, you may need to manually remove ~/.talos/clusters/talos-default.
Manual Clean Up
The talosctl cluster destroy command depends heavily on the cluster's state directory.
It contains all information related to the cluster, such as the PIDs and network associated with the cluster nodes.
If you deleted the state folder by mistake, or you would like to clean up
the environment, here are the steps to do it manually:
Remove VM Launchers
Find the talosctl qemu-launch processes:
ps -elf | grep 'talosctl qemu-launch'
To remove the VMs manually, execute:
sudo kill -s SIGTERM <PID>
Example output, where VMs are running with PIDs 157615 and 157617
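As an alternative to the ps | grep pipeline above, pgrep -f can list the launcher PIDs directly: it matches against the full command line and does not report its own process. This is a sketch with hypothetical helper names. It targets the launchers of every QEMU cluster on the host, so review the PIDs before stopping them.

```shell
# Hypothetical helper: print the PIDs of running `talosctl qemu-launch` processes.
qemu_launcher_pids() {
  pgrep -f 'talosctl qemu-launch' || true
}

# Hypothetical helper: send SIGTERM to every launcher found (requires root, as above).
stop_qemu_launchers() {
  local pid
  for pid in $(qemu_launcher_pids); do
    sudo kill -s SIGTERM "$pid"
  done
}
```

Run qemu_launcher_pids first and compare the output with ps -elf before calling stop_qemu_launchers.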
This is the trickier part if you have already deleted the state folder.
If you didn't, the information is recorded in state.yaml in the
~/.talos/clusters/<cluster-name> directory.
Start by creating a new VM by clicking the “New” button in the VirtualBox UI:
Supply a name for this VM, and specify the Type and Version:
Edit the memory to supply at least 2GB of RAM for the VM:
Proceed through the disk settings, keeping the defaults.
You can increase the disk space if desired.
Once created, select the VM and hit “Settings”:
In the “System” section, supply at least 2 CPUs:
In the “Network” section, switch the network “Attached To” section to “Bridged Adapter”:
Finally, in the “Storage” section, select the optical drive and, on the right, select the ISO by browsing your filesystem:
Repeat this process for a second VM to use as a worker node.
You can also repeat it for any additional nodes you want.
Start Control Plane Node
Once the VMs have been created and updated, start the VM that will be the first control plane node.
This VM will boot the ISO image specified earlier and enter “maintenance mode”.
Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received.
Take note of this IP address, which will be referred to as $CONTROL_PLANE_IP for the rest of this guide.
If you wish to export this IP as a bash variable, simply issue a command like export CONTROL_PLANE_IP=1.2.3.4.
Generate Machine Configurations
With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes.
Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:
talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out
This will create several files in the _out directory: controlplane.yaml, worker.yaml, and talosconfig.
Create Control Plane Node
Using the controlplane.yaml generated above, you can now apply this config using talosctl.
Issue:
talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file _out/controlplane.yaml
You should now see some action in the VirtualBox console for this VM.
Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.
Note: This process can be repeated multiple times to create an HA control plane.
Create Worker Node
Create at least one worker node using a process similar to the control plane creation above.
Start the worker node VM and wait for it to enter “maintenance mode”.
Take note of the worker node’s IP address, which will be referred to as $WORKER_IP.
Issue:
talosctl apply-config --insecure --nodes $WORKER_IP --file _out/worker.yaml
Note: This process can be repeated multiple times to add additional workers.
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
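First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:
export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP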
The default schematic id for “vanilla” Banana Pi M64 is 8e11dcb3c2803fbe893ab201fcadf1ef295568410e7ced95c6c8b122a5070ce4.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/8e11dcb3c2803fbe893ab201fcadf1ef295568410e7ced95c6c8b122a5070ce4:v1.9.0
2.1.5.2 - Friendlyelec Nano PI R4S
Installing Talos on a Nano PI R4S SBC using raw disk image.
The default schematic id for “vanilla” NanoPi R4S is 5f74a09891d5830f0b36158d3d9ea3b1c9cc019848ace08ff63ba255e38c8da4.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/5f74a09891d5830f0b36158d3d9ea3b1c9cc019848ace08ff63ba255e38c8da4:v1.9.0
2.1.5.3 - Jetson Nano
Installing Talos on Jetson Nano SBC using raw disk image.
We will use the R32.7.2 release for the Jetson Nano.
Most of the instructions are similar to this doc, except that we use an upstream version of u-boot with patches from NVIDIA u-boot so that USB boot also works.
Before flashing we need the following:
A USB-A to micro USB cable
A jumper wire to enable recovery mode
An HDMI monitor to view the logs if the USB serial adapter is not available
A USB to Serial adapter with 3.3V TTL (optional)
A 5V DC barrel jack
If you’re planning to use the serial console, follow the documentation here.
First start by downloading the Jetson Nano L4T release.
Next we will extract the L4T release and replace the u-boot binary with the patched version.
tar xf jetson-210_linux_r32.7.2_aarch64.tbz2
cd Linux_for_Tegra
crane --platform=linux/arm64 export ghcr.io/siderolabs/sbc-jetson:v0.1.0 - | tar xf - --strip-components=4 -C bootloader/t210ref/p3450-0000/ artifacts/arm64/u-boot/jetson_nano/u-boot.bin
Next we will flash the firmware to the Jetson Nano SPI flash.
In order to do that we need to put the Jetson Nano into Force Recovery Mode (FRC).
We will use the instructions from here
Ensure that the Jetson Nano is powered off.
There is no need for the SD card/USB storage/network cable to be connected
Connect the micro USB cable to the micro USB port on the Jetson Nano, but don’t plug the other end into the PC yet
Enable Force Recovery Mode (FRC) by placing a jumper across the FRC pins on the Jetson Nano
For board revision A02, these are pins 3 and 4 of header J40
For board revision B01, these are pins 9 and 10 of header J50
Place another jumper across J48 to enable power from the DC jack and connect the Jetson Nano to the DC jack J25
Now connect the other end of the micro USB cable to the PC and remove the jumper wire from the FRC pins
The Jetson Nano is now in Force Recovery Mode (FRC); you can confirm this by running the following command:
lsusb | grep -i "nvidia"
Now we can move on to flashing the firmware.
sudo ./flash p3448-0000-max-spi external
This will flash the firmware to the Jetson Nano SPI flash and you’ll see a lot of output.
If you’ve connected the serial console you’ll also see the progress there.
Once the flashing is done you can disconnect the USB cable and power off the Jetson Nano.
Download the Image
The default schematic id for “vanilla” Jetson Nano is c7d6f36c6bdfb45fd63178b202a67cff0dd270262269c64886b43f76880ecf1e.
Refer to the Image Factory documentation for more information.
Note: Replace /dev/mmcblk0 with the name of your SD card/USB storage.
Bootstrapping the Node
Insert the SD card/USB storage into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/c7d6f36c6bdfb45fd63178b202a67cff0dd270262269c64886b43f76880ecf1e:v1.9.0
2.1.5.4 - Libre Computer Board ALL-H3-CC
Installing Talos on Libre Computer Board ALL-H3-CC SBC using raw disk image.
The default schematic id for “vanilla” Libretech H3 CC H5 is 5689d7795f91ac5bf6ccc85093fad8f8b27f6ea9d96a9ac5a059997bffd8ad5c.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Create an installer-patch.yaml file containing a reference to the installer image generated from an overlay:
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/5689d7795f91ac5bf6ccc85093fad8f8b27f6ea9d96a9ac5a059997bffd8ad5c:v1.9.0
2.1.5.5 - Orange Pi R1 Plus LTS
Installing Talos on Orange Pi R1 Plus LTS SBC using raw disk image.
The default schematic id for “vanilla” Orange Pi R1 Plus LTS is da388062cd9318efdc7391982a77ebb2a97ed4fbda68f221354c17839a750509.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/da388062cd9318efdc7391982a77ebb2a97ed4fbda68f221354c17839a750509:v1.9.0
2.1.5.6 - Pine64
Installing Talos on a Pine64 SBC using raw disk image.
The default schematic id for “vanilla” Pine64 is 185431e0f0bf34c983c6f47f4c6d3703aa2f02cd202ca013216fd71ffc34e175.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/185431e0f0bf34c983c6f47f4c6d3703aa2f02cd202ca013216fd71ffc34e175:v1.9.0
2.1.5.7 - Pine64 Rock64
Installing Talos on Pine64 Rock64 SBC using raw disk image.
The default schematic id for “vanilla” Pine64 Rock64 is 0e162298269125049a51ec0a03c2ef85405a55e1d2ac36a7ef7292358cf3ce5a.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/0e162298269125049a51ec0a03c2ef85405a55e1d2ac36a7ef7292358cf3ce5a:v1.9.0
2.1.5.8 - Radxa ROCK 4C Plus
Installing Talos on Radxa ROCK 4c Plus SBC using raw disk image.
Prerequisites
You will need
talosctl
an SD card, eMMC, USB drive, or NVMe drive
The default schematic id for “vanilla” Rock 4c Plus is ed7091ab924ef1406dadc4623c90f245868f03d262764ddc2c22c8a19eb37c1c.
Refer to the Image Factory documentation for more information.
Wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/ed7091ab924ef1406dadc4623c90f245868f03d262764ddc2c22c8a19eb37c1c:v1.9.0
2.1.5.9 - Radxa ROCK PI 4
Installing Talos on Radxa ROCK PI 4a/4b SBC using raw disk image.
Prerequisites
You will need
talosctl
an SD card, eMMC, USB drive, or NVMe drive
The default schematic id for “vanilla” RockPi 4 is 25d2690bb48685de5939edd6dee83a0e09591311e64ad03c550de00f8a521f51.
Refer to the Image Factory documentation for more information.
After the steps above, Talos will boot from the NVMe/USB drive and enter maintenance mode.
Proceed to bootstrapping the node.
Bootstrapping the Node
Wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/25d2690bb48685de5939edd6dee83a0e09591311e64ad03c550de00f8a521f51:v1.9.0
2.1.5.10 - Radxa ROCK PI 4C
Installing Talos on Radxa ROCK PI 4c SBC using raw disk image.
Prerequisites
You will need
talosctl
an SD card, eMMC, USB drive, or NVMe drive
The default schematic id for “vanilla” RockPi 4c is 08e72e242b71f42c9db5bed80e8255b2e0d442a372bc09055b79537d9e3ce191.
Refer to the Image Factory documentation for more information.
After the steps above, Talos will boot from the NVMe/USB drive and enter maintenance mode.
Proceed to bootstrapping the node.
Bootstrapping the Node
Wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/08e72e242b71f42c9db5bed80e8255b2e0d442a372bc09055b79537d9e3ce191:v1.9.0
2.1.5.11 - Raspberry Pi Series
Installing Talos on Raspberry Pi SBCs using raw disk image.
The Talos disk image for the generic Raspberry Pi should, in theory, work on the boards supported by the u-boot rpi_arm64_defconfig configuration.
This has only been officially tested on the Raspberry Pi 4, and community tested on one variant of the Compute Module 4 using Super 6C boards.
If you have tested this on other Raspberry Pi boards, please let us know.
Prerequisites
You will need
talosctl
an SD card
Download the latest talosctl.
curl -sL 'https://www.talos.dev/install' | bash
Updating the EEPROM
Use Raspberry Pi Imager to write an EEPROM update image to a spare SD card.
Select Misc utility images under the Operating System tab.
Remove the SD card from your local machine and insert it into the Raspberry Pi.
Power the Raspberry Pi on, and wait at least 10 seconds.
If successful, the green LED light will blink rapidly (forever), otherwise an error pattern will be displayed.
If an HDMI display is attached to the port closest to the power/USB-C port,
the screen will display green for success or red if a failure occurs.
Power off the Raspberry Pi and remove the SD card from it.
Note: Updating the bootloader only needs to be done once.
Download the Image
The default schematic id for “vanilla” Raspberry Pi generic image is ee21ef4a5ef808a9b7484cc0dda0f25075021691c8c09a276591eedb638ea1f9.
Refer to the Image Factory documentation for more information.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Note: if you have an HDMI display attached and it shows only a rainbow splash,
please use the other HDMI port, the one closest to the power/USB-C port.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Upgrading
For example, to upgrade to the latest version of Talos, you can run:
talosctl -n <node IP or DNS name> upgrade --image=factory.talos.dev/installer/ee21ef4a5ef808a9b7484cc0dda0f25075021691c8c09a276591eedb638ea1f9:v1.9.0
Troubleshooting
The following table can be used to troubleshoot booting issues:
Long flashes | Short flashes | Status
0 | 3 | Generic failure to boot
0 | 4 | start*.elf not found
0 | 7 | Kernel image not found
0 | 8 | SDRAM failure
0 | 9 | Insufficient SDRAM
0 | 10 | In HALT state
2 | 1 | Partition not FAT
2 | 2 | Failed to read from partition
2 | 3 | Extended partition not FAT
2 | 4 | File signature/hash mismatch - Pi 4
4 | 4 | Unsupported board type
4 | 5 | Fatal firmware error
4 | 6 | Power failure type A
4 | 7 | Power failure type B
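If you check boards from a script, the troubleshooting table can be encoded as a lookup keyed by the (long, short) flash counts. Below is a minimal Python sketch. LED_CODES and decode_led are hypothetical names, and the entries are copied from the table, so any pattern not listed is reported as unknown rather than guessed.

```python
# (long flashes, short flashes) -> status, transcribed from the troubleshooting table.
LED_CODES = {
    (0, 3): "Generic failure to boot",
    (0, 4): "start*.elf not found",
    (0, 7): "Kernel image not found",
    (0, 8): "SDRAM failure",
    (0, 9): "Insufficient SDRAM",
    (0, 10): "In HALT state",
    (2, 1): "Partition not FAT",
    (2, 2): "Failed to read from partition",
    (2, 3): "Extended partition not FAT",
    (2, 4): "File signature/hash mismatch - Pi 4",
    (4, 4): "Unsupported board type",
    (4, 5): "Fatal firmware error",
    (4, 6): "Power failure type A",
    (4, 7): "Power failure type B",
}


def decode_led(long_flashes: int, short_flashes: int) -> str:
    """Look up a green-LED error pattern; patterns not in the table are reported as unknown."""
    return LED_CODES.get((long_flashes, short_flashes), "Unknown pattern (not listed in this table)")
```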
2.1.5.12 - Turing RK1
Installing Talos on Turing RK1 SOM using raw disk image.
Go to https://factory.talos.dev, select Single Board Computers, select the version, and select Turing RK1 from the options.
Choose your desired extensions and fill in the kernel command line arguments if needed.
Download the sbc overlay image and extract the SPI image:
crane --platform=linux/arm64 export ghcr.io/siderolabs/sbc-rockchip:<releasetag> | tar x --strip-components=4 artifacts/arm64/u-boot/turingrk1/u-boot-rockchip-spi.bin
Flash the eMMC with the Talos raw image, even if Talos was previously installed (alternatively, use the WebUI of the Turing Pi 2):
Image Factory is easier to use, but it only produces images for official Talos Linux releases, official Talos Linux system extensions
and official Talos Overlays.
The imager container can be used to generate images from the main branch, with local changes, or with custom system extensions.
Image Factory
Image Factory is a service that generates Talos boot assets on-demand.
Image Factory can generate boot assets for the official Talos Linux releases, official Talos Linux system extensions
and official Talos Overlays.
The main concept of the Image Factory is a schematic which defines the customization of the boot asset.
Once the schematic is configured, Image Factory can be used to pull various Talos Linux images, ISOs, installer images, and PXE boot assets for bare-metal machines across different architectures,
versions of Talos, and platforms.
Sidero Labs maintains a public Image Factory instance at https://factory.talos.dev.
Image Factory provides a simple UI to prepare schematics and retrieve asset links.
Example: Bare-metal with Image Factory
Let’s assume we want to boot Talos on a bare-metal machine with an Intel CPU and add a gvisor container runtime to the image.
We also want to disable predictable network interface names with the net.ifnames=0 kernel argument.
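First, let’s create the schematic file bare-metal.yaml:
customization:
  extraKernelArgs:
    - net.ifnames=0
  systemExtensions:
    officialExtensions:
      - siderolabs/gvisor
      - siderolabs/intel-ucode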
The Image Factory URL contains both schematic ID and Talos version, and both can be changed to generate different boot assets.
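The installer references used throughout this guide follow the pattern factory.talos.dev/installer/<schematic ID>:<Talos version>, so changing either part selects a different asset. Below is a minimal Python sketch that builds such references. installer_image and image_url are hypothetical helpers. The disk image URL layout in image_url (https://factory.talos.dev/image/<schematic ID>/<version>/<asset>) is an assumption not stated in this guide, so check it against the links the Image Factory UI gives you.

```python
import re

FACTORY = "factory.talos.dev"
# Every schematic ID shown in this guide is 64 lowercase hex characters;
# treating that as a requirement is an assumption of this sketch.
SCHEMATIC_RE = re.compile(r"[0-9a-f]{64}")


def _check(schematic_id: str, version: str) -> None:
    if not SCHEMATIC_RE.fullmatch(schematic_id):
        raise ValueError(f"unexpected schematic ID: {schematic_id!r}")
    if not version.startswith("v"):
        raise ValueError(f"Talos versions are written like v1.9.0, got {version!r}")


def installer_image(schematic_id: str, version: str) -> str:
    """Installer image reference, as passed to `talosctl upgrade --image`."""
    _check(schematic_id, version)
    return f"{FACTORY}/installer/{schematic_id}:{version}"


def image_url(schematic_id: str, version: str, asset: str) -> str:
    """Download URL for a boot asset such as metal-arm64.raw.xz (assumed URL layout)."""
    _check(schematic_id, version)
    return f"https://{FACTORY}/image/{schematic_id}/{version}/{asset}"
```

For example, installer_image with the Raspberry Pi schematic ID and v1.9.0 yields the same reference used in the Raspberry Pi upgrade command.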
Once the bare-metal machine has booted for the first time, it will need the Talos Linux installer image to install Talos on the disk.
The installer image will be produced by the Image Factory as well:
In the same way, the upgrade process can be used to transition to a new set of system extensions: generate a new schematic with the new set of system extensions, and upgrade the machine to the new schematic ID:
Talos Linux is installed on AWS from a disk image (AWS AMI), so only a single boot asset is required.
Let’s assume we want to boot Talos on AWS with gvisor container runtime system extension.
Now the aws-amd64.raw.xz file contains the customized Talos AWS disk image, which can be uploaded as an AMI to AWS.
Once the AWS VM is created from the AMI, it can be upgraded to a different Talos version or a different schematic using talosctl upgrade:
# upgrade to a new Talos version
talosctl upgrade --image factory.talos.dev/installer/d9ff89777e246792e7642abd3220a616afb4e49822382e4213a2e528ab826fe5:<new_version>
# upgrade to a new schematic
talosctl upgrade --image factory.talos.dev/installer/<new_schematic_id>:v1.9.0
Imager
A custom disk image or boot asset can be generated using the Talos Linux imager container: ghcr.io/siderolabs/imager:v1.9.0.
The imager container image can be checked by verifying its signature.
The generation process can be run with a simple docker run command:
secureboot-iso builds a Talos ISO image with SecureBoot (see SecureBoot)
metal builds a generic disk image for bare-metal machines
secureboot-metal builds a generic disk image for bare-metal machines with SecureBoot
secureboot-installer builds an installer container image with SecureBoot (see SecureBoot)
aws, gcp, azure, etc. builds a disk image for a specific Talos platform
The base profile can be customized with additional imager flags:
--arch specifies the architecture of the image to be generated (default: host architecture)
--meta sets initial META values
--extra-kernel-arg customizes the kernel command line arguments.
A default kernel argument can be removed by prefixing it with a -.
For example, -console removes all console=<value> arguments, whereas -console=tty0 removes only the console=tty0 default argument.
--system-extension-image installs a system extension into the image
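The removal rules for --extra-kernel-arg amount to a small list transformation. Below is a minimal Python sketch, assuming the values are applied in the order given. apply_extra_kernel_args is a hypothetical name, and the default argument list in the example is made up for illustration. The guide does not say whether imager applies removals before additions; in this sketch, an added argument followed by a matching removal would be dropped.

```python
def apply_extra_kernel_args(defaults: list[str], extra: list[str]) -> list[str]:
    """Apply --extra-kernel-arg values, in order, to a list of default kernel arguments."""
    args = list(defaults)
    for arg in extra:
        if arg.startswith("-"):
            target = arg[1:]
            if "=" in target:
                # "-console=tty0": drop only the exact argument console=tty0
                args = [a for a in args if a != target]
            else:
                # "-console": drop every console=<value> argument
                args = [a for a in args if not a.startswith(target + "=")]
        else:
            args.append(arg)  # anything else is appended as a new argument
    return args


# Hypothetical defaults, for illustration only.
defaults = ["console=tty0", "console=ttyS0", "talos.platform=metal"]
print(apply_extra_kernel_args(defaults, ["-console", "console=ttyS1,115200", "net.ifnames=0"]))
```

This mirrors the bare-metal imager example later in this guide, which replaces the default console arguments with a custom one and adds net.ifnames=0.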
System Extension Image Reference
While Image Factory automatically resolves the extension name into a matching container image for a specific version of Talos, imager requires the full explicit container image reference.
The imager also allows installing custom extensions that are not part of the official Talos Linux system extensions.
To get the official Talos Linux system extension container image reference matching a Talos release, use the following command:
crane export ghcr.io/siderolabs/extensions:v1.9.0 | tar x -O image-digests | grep EXTENSION-NAME
Note: this command uses the crane tool, but any other tool that can
export the image contents can be used.
For each Talos release, the ghcr.io/siderolabs/extensions:VERSION image contains a pinned reference to each system extension container image.
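Because imager needs the full pinned reference, a small helper can pull it out of the image-digests listing instead of grep. Below is a minimal Python sketch. find_extension and parse_pinned_ref are hypothetical names. Unlike grep, which matches any substring, find_extension only returns references whose repository name ends in /NAME.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    repository: str
    tag: str
    digest: str


def parse_pinned_ref(ref: str) -> ImageRef:
    """Split a reference like ghcr.io/siderolabs/gvisor:TAG@sha256:HEX into its parts."""
    name, _, digest = ref.partition("@")
    # The tag follows the last ':' after the last '/', so a registry port is never read as a tag.
    slash, colon = name.rfind("/"), name.rfind(":")
    if colon > slash:
        return ImageRef(name[:colon], name[colon + 1:], digest)
    return ImageRef(name, "", digest)


def find_extension(image_digests: str, name: str) -> list[str]:
    """Return the full references whose repository ends in /NAME."""
    refs = (line.strip() for line in image_digests.splitlines())
    return [r for r in refs if r and parse_pinned_ref(r).repository.endswith("/" + name)]
```

Each returned string can be passed as-is to --system-extension-image.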
Overlay Image Reference
While Image Factory automatically resolves the overlay name into a matching container image for a specific version of Talos, imager requires the full explicit container image reference.
The imager also allows installing custom overlays that are not part of the official Talos overlays.
To get the official Talos overlays container image reference matching a Talos release, use the following command:
crane export ghcr.io/siderolabs/overlays:v1.9.0 | tar x -O overlays.yaml
Note: this command uses the crane tool, but any other tool that can
export the image contents can be used.
For each Talos release, the ghcr.io/siderolabs/overlays:VERSION image contains a pinned reference to each overlay container image.
Pulling from Private Registries
Talos Linux official images are all public, but when pulling a custom image from a private registry, the imager might need authentication to access the images.
When pulling images, the imager container supports the following methods to authenticate to an external registry:
for ghcr.io registry, GITHUB_TOKEN can be provided as an environment variable;
for other registries, ~/.docker/config.json can be mounted into the container from the host:
another option is to use a DOCKER_CONFIG environment variable, and the path will be $DOCKER_CONFIG/config.json in the container;
the third option is to mount Podman’s auth file at $XDG_RUNTIME_DIR/containers/auth.json.
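To work out which files need to be mounted or which variables need to be set, it can help to list the credential locations above for a given environment. Below is a minimal Python sketch. registry_auth_sources is a hypothetical helper, and the default home directory of /root inside the container is an assumption. It lists sources in the order the text mentions them. That is not a claim about the order in which imager actually checks them.

```python
import posixpath


def registry_auth_sources(registry: str, env: dict[str, str], home: str = "/root") -> list[str]:
    """List the credential sources described above that apply to `registry` inside the container.

    The order follows the documentation text; it is not imager's lookup order.
    """
    sources = []
    if registry == "ghcr.io" and env.get("GITHUB_TOKEN"):
        sources.append("env:GITHUB_TOKEN")
    # Docker-style config: $DOCKER_CONFIG/config.json when set, otherwise ~/.docker/config.json
    docker_dir = env.get("DOCKER_CONFIG") or posixpath.join(home, ".docker")
    sources.append(posixpath.join(docker_dir, "config.json"))
    # Podman's auth file under the runtime directory
    if env.get("XDG_RUNTIME_DIR"):
        sources.append(posixpath.join(env["XDG_RUNTIME_DIR"], "containers", "auth.json"))
    return sources
```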
Example: Bare-metal with Imager
Let’s assume we want to boot Talos on a bare-metal machine with an Intel CPU and add a gvisor container runtime to the image.
We also want to disable predictable network interface names with the net.ifnames=0 kernel argument, and replace the default Talos console arguments with a custom console argument.
First, let’s look up the extension images for Intel CPU microcode updates and the gvisor container runtime in the extensions repository:
$ crane export ghcr.io/siderolabs/extensions:v1.9.0 | tar x -O image-digests | grep -E 'gvisor|intel-ucode'
ghcr.io/siderolabs/gvisor:20231214.0-v1.9.0@sha256:548b2b121611424f6b1b6cfb72a1669421ffaf2f1560911c324a546c7cee655e
ghcr.io/siderolabs/intel-ucode:20231114@sha256:ea564094402b12a51045173c7523f276180d16af9c38755a894cf355d72c249d
Now we can generate the ISO image with the following command:
Now the _out/metal-amd64.iso contains the customized Talos ISO image.
If the machine is going to be booted using PXE, we can instead generate kernel and initramfs images:
docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.9.0 iso --output-kind kernel
docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.9.0 iso --output-kind initramfs --system-extension-image ghcr.io/siderolabs/gvisor:20231214.0-v1.9.0@sha256:548b2b121611424f6b1b6cfb72a1669421ffaf2f1560911c324a546c7cee655e --system-extension-image ghcr.io/siderolabs/intel-ucode:20231114@sha256:ea564094402b12a51045173c7523f276180d16af9c38755a894cf355d72c249d
Now the _out/kernel-amd64 and _out/initramfs-amd64 contain the customized Talos kernel and initramfs images.
Note: the extra kernel args are not used now, as they are set via the PXE boot process, and can’t be embedded into the kernel or initramfs.
As the next step, we should generate a custom installer image which contains all required system extensions (kernel args can’t be specified with the installer image, but they are set in the machine configuration):
Now we can use the customized installer image to install Talos on the bare-metal machine.
When it’s time to upgrade a machine, a new installer image can be generated using the new version of imager and updating the system extension images to the matching versions.
The custom installer image can then be used to upgrade the Talos machine.
Example: Raspberry Pi overlay with Imager
Let’s assume we want to boot Talos on a Raspberry Pi with the rpi_generic overlay and the iscsi-tools system extension.
Now _out/metal-arm64.raw.xz is the compressed disk image, which can be written to boot media.
As the next step, we should generate a custom installer image which contains all required system extensions (kernel args can’t be specified with the installer image, but they are set in the machine configuration):
Now we can use the customized installer image to install Talos on the Raspberry Pi.
When it’s time to upgrade a machine, a new installer image can be generated using the new version of imager and updating the system extension and overlay images to the matching versions.
The custom installer image can then be used to upgrade the Talos machine.
Example: AWS with Imager
Talos is installed on AWS from a disk image (AWS AMI), so only a single boot asset is required.
Let’s assume we want to boot Talos on AWS with gvisor container runtime system extension.
First, let’s look up the extension image for the gvisor container runtime in the extensions repository:
$ crane export ghcr.io/siderolabs/extensions:v1.9.0 | tar x -O image-digests | grep gvisor
ghcr.io/siderolabs/gvisor:20231214.0-v1.9.0@sha256:548b2b121611424f6b1b6cfb72a1669421ffaf2f1560911c324a546c7cee655e
Next, let’s generate AWS disk image with that system extension:
Now the _out/aws-amd64.raw.xz file contains the customized Talos AWS disk image, which can be uploaded as an AMI to AWS.
If the AWS machine is later going to be upgraded to a new version of Talos (or a new set of system extensions), generate a customized installer image following the steps above, and upgrade Talos to that installer image.
Example: Assets with system extensions from image tarballs with Imager
Some advanced features of imager are currently not exposed via command line arguments like --system-extension-image.
To access them nonetheless, you can supply imager with a profile.yaml instead.
Let’s use these advanced features to build a bare-metal installer using a system extension from a private registry.
First, use crane on a host with access to the private registry to export the extension image into a tarball.
We can then reference the tarball in a suitable profile.yaml for our intended architecture and output.
In this case we want to build an amd64, bare-metal installer.
# profile.yaml
arch: amd64
platform: metal
secureboot: false
version: v1.9.0
input:
  kernel:
    path: /usr/install/amd64/vmlinuz
  initramfs:
    path: /usr/install/amd64/initramfs.xz
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:v1.9.0
  systemExtensions:
    - tarballPath: <your-extension> # notice we use 'tarballPath' instead of 'imageRef'
output:
  kind: installer
  outFormat: raw
To build the asset, we pass profile.yaml to imager via stdin.
Omni is a project created by the Talos team that has native support for Talos Linux.
Omni allows you to start with bare metal, virtual machines or a cloud provider, and create clusters spanning all of your locations, with a few clicks.
You provide the machines – edge compute, bare metal, VMs, or in your cloud account.
Boot from an Omni Talos Linux image.
Click to allocate to a cluster.
That’s it!
Vanilla Kubernetes, on your machines, under your control.
Elegant UI for management and operations
Security taken care of – ties into your Enterprise ID provider
Highly Available Kubernetes API end point built in
Firewall friendly: manage Edge nodes securely
From single-node clusters to the largest scale
Support for GPUs and most CSIs.
The Omni SaaS is available to run locally, to support air-gapped security and data sovereignty concerns.
Omni handles the lifecycle of Talos Linux machines, provides unified access to the Talos and Kubernetes API tied to the identity provider of your choice,
and provides a UI for cluster management and operations.
Omni automates scaling the clusters up and down, and provides a unified view of the state of your clusters.
The client can be installed and updated via the Homebrew package manager for macOS and Linux.
You will need to install brew and then you can install talosctl from the Sidero Labs tap.
brew install siderolabs/tap/talosctl
This will also keep your version of talosctl up to date with new releases.
This homebrew tap also has formulae for omnictl if you need to install that package.
Note: Your talosctl version should match the version of Talos Linux you are running on a host.
To install a specific version of talosctl with brew, you can follow this GitHub issue.
Alternative install
You can automatically install the correct version of talosctl for your operating system and architecture with an installer script.
This script won’t keep your version updated with releases and you will need to re-run the script to download a new version.
curl -sL https://talos.dev/install | sh
This script will work on macOS, Linux, and WSL on Windows.
It supports amd64 and arm64 architecture.
Manual and Windows install
All versions, including Linux, macOS, and Windows builds, can be manually downloaded from the Talos releases page.
You will need to add the binary to a directory in your executable $PATH to use it without providing the full path to the executable.
Updating the binary will be a manual process.
2.2 - Configuration
Guides on how to configure Talos Linux machines
2.2.1 - Configuration Patches
In this guide, we’ll patch the generated machine configuration.
Talos generates machine configuration for two types of machines: controlplane and worker machines.
Many configuration options can be adjusted using talosctl gen config, but not all of them.
Configuration patching allows modifying the machine configuration to fit the cluster or a specific machine.
Configuration Patch Formats
Talos supports two configuration patch formats:
strategic merge patches
RFC6902 (JSON patches)
Strategic merge patches are the easiest to use, but JSON patches allow more precise configuration adjustments.
Note: Talos 1.5+ supports multi-document machine configuration.
JSON patches don’t support multi-document machine configuration, while strategic merge patches do.
Strategic Merge patches
Strategic merge patches look like incomplete machine configuration files:
machine:
  network:
    hostname: worker1
When applied to the machine configuration, the patch is merged with the respective section of the machine configuration.
In general, machine configuration contents are merged with the contents of the strategic merge patch, with strategic merge patch
values overriding machine configuration values.
There are some special rules:
If the field value is a list, the patch value is appended to the list, with the following exceptions:
values of the fields cluster.network.podSubnets and cluster.network.serviceSubnets are overwritten on merge
network.interfaces section is merged with the value in the machine config if there is a match on interface: or deviceSelector: keys
network.interfaces.vlans section is merged with the value in the machine config if there is a match on the vlanId: key
cluster.apiServer.auditPolicy value is replaced on merge
ExtensionServiceConfig.configFiles section is merged matching on mountPath (the content is replaced on a match)
for each document in the patch, the document is merged with the respective document in the machine configuration (matching by kind, apiVersion and name for named documents)
if the patch document doesn’t exist in the machine configuration, it is appended to the machine configuration
The strategic merge patch itself might be a multi-document YAML, and each document will be applied as a patch to the base machine configuration.
Keep in mind that you can’t patch the same document multiple times with the same patch.
You can also delete parts of the configuration using the $patch: delete syntax, similar to the Kubernetes strategic merge patch.
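For example, a multi-document strategic merge patch can mark whole documents for deletion. The sketch below assumes v1alpha1 as the apiVersion of both documents; foo is only the example name:

```yaml
# Delete the SideroLinkConfig document entirely.
apiVersion: v1alpha1
kind: SideroLinkConfig
$patch: delete
---
# Delete only the ExtensionServiceConfig document named "foo"
# (named documents are matched by kind, apiVersion and name).
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: foo
$patch: delete
```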
This will remove the documents SideroLinkConfig and ExtensionServiceConfig with name foo from the configuration.
RFC6902 (JSON Patches)
JSON patches can be written in either JSON or YAML format.
A proper JSON patch requires an op field that depends on the machine configuration contents, namely whether the path already exists.
For example, the strategic merge patch from the previous section can be written either as:
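A sketch of the hostname patch as an RFC6902 patch in YAML form. It assumes the generated configuration already contains the machine.network section, so only the hostname field itself may be missing:

```yaml
# Use "add" when /machine/network/hostname is not set yet;
# change the op to "replace" if the field already exists.
- op: add
  path: /machine/network/hostname
  value: worker1
```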
Several talosctl commands accept config patches as command-line flags.
Config patches might be passed either as an inline value or as a reference to a file with @file.patch syntax:
If multiple config patches are specified, they are applied in the order of appearance.
The format of the patch (JSON patch or strategic merge patch) is detected automatically.
Talos machine configuration can be patched at the moment of generation with talosctl gen config:
Once the server reboots, metrics are available:
$ curl ${IP}:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
Pause Image
Changing the pause image is often required in air-gapped environments, as the containerd CRI plugin references the pause image used
to create pods, and this reference can't be controlled with Kubernetes pod definitions.
Multiple documents can be appended, and each configuration document may contain multiple CA certificates.
This configuration can also be applied in maintenance mode.
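Custom CA certificates are supplied as separate configuration documents. A minimal sketch, assuming the TrustedRootsConfig document kind; the name custom-ca is hypothetical and the PEM body is a placeholder:

```yaml
apiVersion: v1alpha1
kind: TrustedRootsConfig
name: custom-ca # hypothetical document name
certificates: |-
  -----BEGIN CERTIFICATE-----
  (PEM-encoded CA certificate goes here; several certificates may follow)
  -----END CERTIFICATE-----
```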
2.2.4 - Disk Encryption
Guide on using system disk encryption
It is possible to enable encryption for system disks at the OS level.
Currently, only STATE and EPHEMERAL partitions can be encrypted.
STATE contains the most sensitive node data: secrets and certs.
The EPHEMERAL partition may contain sensitive workload data.
Data is encrypted using LUKS2, which is provided by the Linux kernel modules and the cryptsetup utility.
The operating system will run additional setup steps when encryption is enabled.
If the disk encryption is enabled for the STATE partition, the system will:
Save STATE encryption config as JSON in the META partition.
Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition.
Note that the machine config is always preferred over the META one.
Before mounting the STATE partition, format and encrypt it.
This occurs only if the STATE partition is empty and has no filesystem.
If the disk encryption is enabled for the EPHEMERAL partition, the system will:
Get the encryption config from the machine config.
Before mounting the EPHEMERAL partition, encrypt and format it.
This occurs only if the EPHEMERAL partition is empty and has no filesystem.
Talos Linux supports four encryption methods, which can be combined for a single partition:
static - encrypt with the static passphrase (weakest protection, for STATE partition encryption it means that the passphrase will be stored in the META partition).
nodeID - encrypt with the key derived from the node UUID (weak, it is designed to protect against data being leaked or recovered from a drive that has been removed from a Talos Linux node).
kms - encrypt using a key sealed with a network KMS (strong, but requires network access to decrypt the data).
tpm - encrypt with the key derived from the TPM (strong, when used with SecureBoot).
Note: nodeID encryption is not designed to protect against attacks where physical access to the machine, including the drive, is available.
It uses the hardware characteristics of the machine in order to decrypt the data, so drives that have been removed, or recycled from a cloud environment or attached to a different virtual machine, will maintain their protection and encryption.
Configuration
Disk encryption is disabled by default.
To enable disk encryption you should modify the machine configuration with the following options:
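A minimal sketch, assuming the machine.systemDiskEncryption section of the v1alpha1 machine configuration with the luks2 provider and a nodeID key for each partition:

```yaml
machine:
  systemDiskEncryption:
    # STATE holds the most sensitive node data (secrets and certs).
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    # EPHEMERAL may hold sensitive workload data.
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
```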
Note: What the LUKS2 docs call “keys” are, in reality, passphrases.
When this passphrase is added, LUKS2 runs argon2 to create an actual key from that passphrase.
LUKS2 supports up to 32 encryption keys and it is possible to specify all of them in the machine configuration.
Talos always tries to sync the keys list defined in the machine config with the actual keys defined for the LUKS2 partition.
So when you update the keys list, keep at least one key unchanged so that it can be used for key management.
When you define a key you should specify the key kind and the slot:
Note that key order has no effect on which key slot is used.
Every key must always have a slot defined.
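To illustrate, the sketch below combines two key kinds on one partition, each in its own explicit slot. The KMS endpoint is a hypothetical placeholder:

```yaml
machine:
  systemDiskEncryption:
    ephemeral:
      provider: luks2
      keys:
        # Key sealed with a network KMS (hypothetical endpoint).
        - kms:
            endpoint: https://kms.example.com:4443
          slot: 0
        # Key derived from the node UUID, in a separate slot.
        - nodeID: {}
          slot: 1
```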
Encryption Key Kinds
Talos supports four kinds of keys:
nodeID which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
static which you define right in the configuration.
kms which is sealed with the network KMS.
tpm which is sealed using the TPM and protected with SecureBoot.
Note: Use static keys only for the EPHEMERAL partition, and only if your STATE partition is encrypted.
For the STATE partition, a static key would be stored in the META partition, which is not encrypted.
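A configuration that follows this rule might look like the sketch below. It assumes a TPM with SecureBoot is available for STATE. The static passphrase sits in the machine configuration, which is stored on the encrypted STATE partition. The passphrase itself is a placeholder:

```yaml
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        # TPM-sealed key; strong when used with SecureBoot.
        - tpm: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        # Static passphrase (placeholder), acceptable only because STATE is encrypted.
        - static:
            passphrase: replace-with-a-strong-passphrase
          slot: 0
```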
Key Rotation
To completely rotate keys, run talosctl apply-config more than once, because a working key must always be kept while the other keys are changed around it.
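As a sketch of the two passes, assuming a static passphrase on the EPHEMERAL partition is being replaced (both passphrases are placeholders), the keys list under machine.systemDiskEncryption.ephemeral could change like this:

```yaml
# Pass 1 (first apply-config): keep the current key in slot 0, add the new key in slot 1.
keys:
  - static:
      passphrase: old-passphrase-placeholder
    slot: 0
  - static:
      passphrase: new-passphrase-placeholder
    slot: 1
# Pass 2 (second apply-config): drop slot 0, leaving only the new key:
# keys:
#   - static:
#       passphrase: new-passphrase-placeholder
#     slot: 1
```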
That’s it!
After you run the last command, the partition will be wiped and the node will reboot.
During the next boot the system will encrypt the partition.
State Partition
Wiping the STATE partition makes the node lose its config, so the previous flow will not work.
Instead, first wipe the STATE partition:
talosctl reset --system-labels-to-wipe STATE -n <node ip> --reboot=true
The node will enter maintenance mode; then run apply-config with the --insecure flag:
After installation is complete the node should encrypt the STATE partition.
2.2.5 - Disk Management
Guide on managing disks
Talos Linux version 1.8.0 introduces a new backend for managing system and user disks.
The machine configuration changes required are minimal, and the new backend is fully compatible with the existing machine configuration.
Listing Disks
To obtain a list of all available block devices (disks) on the machine, you can use the following command:
$ talosctl get disks
NODE NAMESPACE TYPE ID VERSION SIZE READ ONLY TRANSPORT ROTATIONAL WWID MODEL SERIAL
172.20.0.5 runtime Disk loop0 175 MB true
172.20.0.5 runtime Disk nvme0n1 110 GB false nvme nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001 QEMU NVMe Ctrl deadbeef
172.20.0.5 runtime Disk sda 110 GB false virtio true QEMU HARDDISK
172.20.0.5 runtime Disk sdb 110 GB false sata true t10.ATA QEMU HARDDISK QM00013 QEMU HARDDISK
172.20.0.5 runtime Disk sdc 110 GB false sata true t10.ATA QEMU HARDDISK QM00001 QEMU HARDDISK
172.20.0.5 runtime Disk vda 113 GB false virtio true
To obtain detailed information about a specific disk, execute the following command:
Talos Linux monitors all block devices and partitions on the machine.
Details about these devices, including their type, can be found in the DiscoveredVolume resource.
Talos Linux has built-in automatic detection for various filesystem types and GPT partition tables.
Currently, the following filesystem types are supported:
bluestore (Ceph)
ext2, ext3, ext4
iso9660
luks (LUKS encrypted partition)
lvm2
squashfs
swap
talosmeta (Talos Linux META partition)
vfat
xfs
zfs
The discovered volumes can include both Talos-managed volumes and any other volumes present on the machine, such as Ceph volumes.
Volume Management
Talos Linux implements disk management through the concept of volumes.
A volume represents a provisioned, located, mounted, or unmounted entity, such as a disk, partition, or tmpfs filesystem.
The configuration of volumes is defined using the VolumeConfig resource, while the current state of volumes is stored in the VolumeStatus resource.
Configuration
The volume configuration is managed by Talos Linux based on machine configuration.
To see configured volumes, use the following command:
$ talosctl get volumeconfigs
NODE NAMESPACE TYPE ID VERSION
172.20.0.5 runtime VolumeConfig /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-1 2
172.20.0.5 runtime VolumeConfig /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-2 2
172.20.0.5 runtime VolumeConfig /dev/disk/by-id/ata-QEMU_HARDDISK_QM00003-1 2
172.20.0.5 runtime VolumeConfig EPHEMERAL 2
172.20.0.5 runtime VolumeConfig META 2
172.20.0.5 runtime VolumeConfig STATE 4
In the provided output, the volumes EPHEMERAL, META, and STATE are system volumes managed by Talos, while the remaining volumes come from the machine.disks section of the machine configuration.
To get details about a specific volume configuration, use the following command:
Each volume goes through different phases during its lifecycle:
waiting: the volume is waiting to be provisioned
missing: all disks have been discovered, but the volume cannot be found
located: the volume is found without prior provisioning
provisioned: the volume has been provisioned (e.g., partitioned, resized if necessary)
prepared: the encrypted volume is open
ready: the volume is formatted and ready to be mounted
closed: the encrypted volume is closed
Machine Configuration
Note: Only EPHEMERAL and IMAGECACHE system volume configuration can be managed through the machine configuration.
Note: The volume configuration in the machine configuration is only applied when the volume has not been provisioned yet.
So applying changes after the initial provisioning will not have any effect.
To configure the EPHEMERAL (/var) volume, add the following document to the machine configuration:
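A sketch of such a document. The values are illustrative; in particular, the 40GB cap is a hypothetical choice:

```yaml
apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  # CEL expression choosing the disk (here: the system disk).
  diskSelector:
    match: system_disk
  minSize: 2GB   # the volume is never smaller than this
  maxSize: 40GB  # illustrative cap; omit it to use all free space
  grow: false    # do not grow the volume on later boots
```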
By default, the EPHEMERAL volume is provisioned on the system disk, which is the disk where Talos Linux is installed.
It has a minimum size of 2 GiB and automatically grows to utilize the maximum available space on the disk.
Disk Selector
The diskSelector field is utilized to choose the disk where the volume will be provisioned.
It is a Common Expression Language (CEL) expression that evaluates against the available disks.
The volume will be provisioned on the first disk that matches the expression and has sufficient free space for the volume.
The expression is evaluated in the following context:
system_disk (bool) - indicates if the disk is the system disk
disk (Disks.block.talos.dev) - the disk resource being evaluated
For the disk resource, any field available in the resource specification can be used (use talosctl get disks -o yaml to see the output for your machine):
The disk expression is evaluated against each available disk, and the expression should return either true or false.
If the expression returns true, the disk is selected for provisioning.
Note: In CEL, signed and unsigned integers are not interchangeable.
Disk sizes are represented as unsigned integers, so suffix u should be used in constants to avoid type mismatch, e.g. disk.size > 10u * GiB.
Examples of disk selector expressions:
disk.transport == 'nvme': select the NVMe disks only
disk.serial.startsWith('deadbeef') && !disk.cdrom: select disks with serial number starting with deadbeef and not of CD-ROM type
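Put together in a volume document, a selector that combines conditions might look like this sketch. The 100 GiB threshold is an arbitrary example, and the u suffix follows the unsigned-integer note above:

```yaml
apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  diskSelector:
    # NVMe disks larger than 100 GiB; quoted so YAML keeps the CEL expression intact.
    match: "disk.transport == 'nvme' && disk.size > 100u * GiB"
```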
Minimum and Maximum Size
The minSize and maxSize fields define the minimum and maximum size of the volume, respectively.
Talos Linux will always ensure that the volume is at least minSize in size and will not exceed maxSize.
If maxSize is not set, the volume will grow to utilize the maximum available space on the disk.
If grow is set to true, the volume will automatically grow to utilize the maximum available space on the disk on each boot.
Setting minSize might influence disk selection - if the disk does not have enough free space to satisfy the minimum size requirement, it will not be selected for provisioning.
2.2.6 - Editing Machine Configuration
How to edit and patch Talos machine configuration, with reboot, immediately, or stage update on reboot.
Talos node state is fully defined by machine configuration.
Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.
There are three talosctl commands which facilitate machine configuration updates:
talosctl apply-config to apply configuration from the file
talosctl edit machineconfig to launch an editor with existing node configuration, make changes and apply configuration back
talosctl patch machineconfig to apply automated machine configuration changes via JSON patches
Each of these commands can operate in one of the following modes:
apply change in automatic mode (default): reboot if the change can’t be applied without a reboot, otherwise apply the change immediately
apply change with a reboot (--mode=reboot): update configuration, reboot Talos node to apply configuration change
apply change immediately (--mode=no-reboot flag): change is applied immediately without a reboot, fails if the change contains any fields that can not be updated without a reboot
apply change on next reboot (--mode=staged): change is staged to be applied after a reboot, but node is not rebooted
apply change with automatic revert (--mode=try): change is applied immediately (if not possible, returns an error), and reverts it automatically in 1 minute if no configuration update is applied
apply change in the interactive mode (--mode=interactive; only for talosctl apply-config): launches TUI based interactive installer
Note: applying a change on next reboot (--mode=staged) doesn’t modify the current node configuration, so a subsequent call to
talosctl edit machineconfig --mode=staged will not see the changes.
Additionally, there is also talosctl get machineconfig -o yaml, which retrieves the current node configuration API resource and contains the machine configuration in the .spec field.
It can be used to modify the configuration locally before being applied to the node.
The list of config changes allowed to be applied immediately in Talos v1.9.0:
.debug
.cluster
.machine.time
.machine.ca
.machine.acceptedCAs
.machine.certCANs
.machine.install (configuration is only applied during install/upgrade)
.machine.network
.machine.nodeAnnotations
.machine.nodeLabels
.machine.nodeTaints
.machine.sysfs
.machine.sysctls
.machine.logging
.machine.controlplane
.machine.kubelet
.machine.pods
.machine.kernel
.machine.registries (CRI containerd plugin will not pick up the registry authentication settings without a reboot)
.machine.features.kubernetesTalosAPIAccess
.machine.features.hostDNS
.machine.features.imageCache
.machine.features.kubePrism
.machine.features.nodeAddressSortAlgorithm
talosctl apply-config
This command is traditionally used to submit initial machine configuration generated by talosctl gen config to the node.
It can also be used to apply configuration to running nodes.
The initial YAML for this is typically obtained using talosctl get machineconfig -o yaml | yq eval .spec >machs.yaml.
(We must use yq because for historical reasons, get returns the configuration as a full resource, while apply-config only accepts the raw machine config directly.)
Example:
talosctl -n <IP> apply-config -f config.yaml
The apply-config command can also be invoked as apply machineconfig.
Applying machine configuration immediately (without a reboot):
talosctl -n <IP> apply machineconfig -f config.yaml --mode=no-reboot
Starting the interactive installer:
talosctl -n <IP> apply machineconfig --mode=interactive
Note: when a Talos node is running in maintenance mode, you must provide the --insecure (-i) flag to connect to the API and apply the config.
talosctl edit machineconfig
The talosctl edit command loads the current machine configuration from the node and launches the configured editor to modify it.
If the config hasn’t been changed in the editor (or if the updated config is empty), the update is not applied.
Note: Talos uses the environment variables TALOS_EDITOR and EDITOR to determine the editor preference.
If neither is set, the vi editor is used by default.
Example:
talosctl -n <IP> edit machineconfig
Configuration can be edited for multiple nodes if multiple IP addresses are specified:
talosctl -n <IP1>,<IP2>,... edit machineconfig
Applying machine configuration change immediately (without a reboot):
talosctl -n <IP> edit machineconfig --mode=no-reboot
talosctl patch machineconfig
The talosctl patch command works like talosctl edit: it loads the current machine configuration, but instead of launching the configured editor, it applies a set of JSON patches to the configuration and writes the result back to the node.
Example, updating kubelet version (in auto mode):
$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v1.32.0"}]'
patched mc at the node <IP>
Updating kube-apiserver version in immediate mode (without a reboot):
$ talosctl -n <IP> patch machineconfig --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v1.32.0"}]'
patched mc at the node <IP>
A patch might be applied to multiple nodes when multiple IPs are specified:
If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue.
2.2.7 - Image Cache
How to enable and configure Talos image cache feature.
The Talos Image Cache feature provides container images to the nodes without the need to pull them from the Internet.
This feature is useful in environments with limited or no Internet access.
Image Cache is local to the machine, and automatically managed by Talos if enabled.
Preparing Image Cache
First, build a list of image references that need to be cached.
The talosctl images default might be used as a starting point, but it should be customized to include additional images (e.g. custom CNI, workload images, etc.)
Note: The cache-create command supports a --layer-cache flag that additionally caches the pulled image layers on the filesystem.
This is useful to speed up repeated calls to cache-create with the same images.
The OCI image cache directory can be used directly (./image-cache.oci) or pushed to a container registry of your choice (e.g. with crane push).
Example of pushing the OCI image cache directory to a container registry:
The image cache is provided to Talos via the boot assets.
There are two supported boot asset types for the Image Cache: ISO and disk image.
ISO
For an ISO, the image cache is bundled with the Talos ISO image: it is available for the initial install and (if configured) is copied to the disk during the installation process.
The ISO image can be built with the imager by passing an additional --image-cache flag:
mkdir -p _out/
docker run --rm -t -v $PWD/_out:/secureboot:ro -v $PWD/_out:/out -v $PWD/image-cache.oci:/image-cache.oci:ro -v /dev:/dev --privileged ghcr.io/siderolabs/imager:v1.9.0 iso --image-cache /image-cache.oci
Note: If the image cache was pushed to a container registry, the --image-cache flag should point to the image reference.
SecureBoot ISO is supported as well.
The ISO image can be used in the following ways, each of which allows both booting Talos and using the image cache:
Using a physical or virtual CD/DVD drive.
Copying the ISO image to a USB drive using dd.
Copying the contents of the ISO image to a FAT-formatted USB drive with a volume label that starts with TALOS_, such as TALOS_1 (only for UEFI systems).
Note: Third-party boot loaders, such as Ventoy, are not supported as Talos will not be able to access the image cache.
Disk Image
For a disk image, the image cache is included in the disk image itself, and Talos uses it immediately on boot.
The disk image can be built with the imager by passing an additional --image-cache flag:
mkdir -p _out/
docker run --rm -t -v $PWD/_out:/secureboot:ro -v $PWD/_out:/out -v $PWD/image-cache.oci:/image-cache.oci:ro -v /dev:/dev --privileged ghcr.io/siderolabs/imager:v1.9.0 metal --image-cache /image-cache.oci
Note: If the image cache was pushed to a container registry, the --image-cache flag should point to the image reference.
For a disk image, the IMAGECACHE partition will use all available space on the disk image (excluding the mandatory boot partitions).
Therefore, you may need to adjust the disk image size using the --image-disk-size flag to ensure the IMAGECACHE partition is large enough to accommodate the image cache contents, for example, --image-disk-size=4GiB.
Upon boot, Talos will expand the disk image to utilize the full disk size.
Configuration
For security reasons, the image cache feature must be explicitly enabled in the Talos configuration:
machine:
  features:
    imageCache:
      localEnabled: true
Once enabled, Talos Linux will automatically look for the image cache contents either on the disk or in the ISO image.
If the image cache is bundled with the ISO, the disk volume size for the image cache should be configured to copy the image cache to the disk during the installation process:
The default settings for the IMAGECACHE volume are as follows (note that a configuration should still be provided to enable the image cache volume provisioning):
minSize: 500MB
maxSize: 1GB
diskSelector: match: system_disk
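A sketch of an IMAGECACHE volume document that overrides the defaults above. It assumes the same v1alpha1 VolumeConfig format used for the EPHEMERAL volume:

```yaml
apiVersion: v1alpha1
kind: VolumeConfig
name: IMAGECACHE
provisioning:
  diskSelector:
    match: system_disk
  minSize: 2GB
  maxSize: 2GB  # equal minimum and maximum give a fixed-size volume
```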
In this example, the image cache volume is provisioned on the system disk with a fixed size of 2GB.
The size of the volume should be adjusted to fit the image cache.
You can check the size of your cache by running du -sh image-cache.oci.
If the disk image is used, the IMAGECACHE volume doesn’t need to be configured, as the image cache volume is already present in the disk image.
See disk management for more information on volume configuration.
Troubleshooting
When the image cache is enabled, Talos will block on boot waiting for the image cache to be available:
task install (1/1): waiting for the image cache
After the initial install from an ISO, the image cache will be copied to the disk and will be available for the subsequent boots:
The status field indicates the readiness of the image cache, and the copyStatus field indicates the readiness of the image cache copy.
The roots field contains the paths to the image cache contents, in this example both on-disk and ISO caches are available.
Image cache roots are used in the order they are listed.
2.2.8 - Logging
Dealing with Talos Linux logs.
Viewing logs
Kernel messages can be retrieved with talosctl dmesg command:
Service logs can be retrieved with talosctl logs command:
$ talosctl -n 172.20.1.2 services
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
172.20.1.2 apid Running OK 19m27s ago Health check successful
172.20.1.2 containerd Running OK 19m29s ago Health check successful
172.20.1.2 cri Running OK 19m27s ago Health check successful
172.20.1.2 etcd Running OK 19m22s ago Health check successful
172.20.1.2 kubelet Running OK 19m20s ago Health check successful
172.20.1.2 machined Running ? 19m30s ago Service started as goroutine
172.20.1.2 trustd Running OK 19m27s ago Health check successful
172.20.1.2 udevd Running OK 19m28s ago Health check successful
$ talosctl -n 172.20.1.2 logs machined
172.20.1.2: [talos] task setupLogger (1/1): done, 106.109µs
172.20.1.2: [talos] phase logger (1/7): done, 564.476µs
[...]
Container logs for Kubernetes pods can be retrieved with talosctl logs -k command:
Messages are newline-separated when sent over TCP.
Over UDP messages are sent with one message per packet.
msg, talos-level, talos-service, and talos-time fields are always present; there may be additional fields.
Every message sent can be enhanced with additional fields by using the extraTags field in the machine configuration:
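A sketch of a destination with an extra tag. It assumes a hypothetical UDP collector on 127.0.0.1:12345 and a hypothetical server tag:

```yaml
machine:
  logging:
    destinations:
      - endpoint: "udp://127.0.0.1:12345/"
        format: json_lines
        extraTags:
          server: s03-rack07  # hypothetical tag, added verbatim to every message
```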
The specified extraTags are added to every message sent to the destination verbatim.
Kernel logs
Kernel log delivery can be enabled with the talos.logging.kernel kernel command line argument, which can be specified
in .machine.install.extraKernelArgs:
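A sketch, assuming the same hypothetical UDP collector on 127.0.0.1:12345 used as a service log endpoint:

```yaml
machine:
  install:
    extraKernelArgs:
      # Send kernel logs to a (hypothetical) UDP collector on localhost.
      - talos.logging.kernel=udp://127.0.0.1:12345/
```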
Kernel log destination is specified in the same way as service log endpoint.
The only supported format is json_lines.
Sample message:
{
"clock":6252819, // time relative to the kernel boot time
"facility":"user",
"msg":"[talos] task startAllServices (1/1): waiting for 6 services\n",
"priority":"warning",
"seq":711,
"talos-level":"warn", // Talos-translated `priority` into common logging level
"talos-time":"2021-11-26T16:53:21.3258698Z"// Talos-translated `clock` using current time
}
extraKernelArgs in the machine configuration are only applied on Talos upgrades, not just by applying the config.
(Upgrading to the same version is fine).
Filebeat example
One way to forward logs to other log collection services is to send
them to Filebeat running in the
cluster itself (in the host network), which takes care of forwarding them to
other endpoints (and the necessary transformations).
If Elastic Cloud on Kubernetes
is being used, the following Beat (custom resource) configuration might be
helpful:
The input configuration ensures that messages and timestamps are extracted properly.
Refer to the Filebeat documentation on how to forward logs to other outputs.
Also note the hostNetwork: true in the daemonSet configuration.
This ensures filebeat uses the host network, and listens on 127.0.0.1:12345
(UDP) on every machine, which can then be specified as a logging endpoint in
the machine configuration.
Fluent-bit example
First, we’ll create a values file for the fluent-bit Helm chart.
# fluentd-bit.yaml
podAnnotations:
  fluentbit.io/exclude: 'true'
extraPorts:
  - port: 12345
    containerPort: 12345
    protocol: TCP
    name: talos
config:
  service: |
    [SERVICE]
        Flush 5
        Daemon Off
        Log_Level warn
        Parsers_File custom_parsers.conf
  inputs: |
    [INPUT]
        Name tcp
        Listen 0.0.0.0
        Port 12345
        Format json
        Tag talos.*
    [INPUT]
        Name tail
        Alias kubernetes
        Path /var/log/containers/*.log
        Parser containerd
        Tag kubernetes.*
    [INPUT]
        Name tail
        Alias audit
        Path /var/log/audit/kube/*.log
        Parser audit
        Tag audit.*
  filters: |
    [FILTER]
        Name kubernetes
        Alias kubernetes
        Match kubernetes.*
        Kube_Tag_Prefix kubernetes.var.log.containers.
        Use_Kubelet Off
        Merge_Log On
        Merge_Log_Trim On
        Keep_Log Off
        K8S-Logging.Parser Off
        K8S-Logging.Exclude On
        Annotations Off
        Labels On
    [FILTER]
        Name modify
        Match kubernetes.*
        Add source kubernetes
        Remove logtag
  customParsers: |
    [PARSER]
        Name audit
        Format json
        Time_Key requestReceivedTimestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name containerd
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
  outputs: |
    [OUTPUT]
        Name stdout
        Alias stdout
        Match *
        Format json_lines
    # If you wish to ship directly to Loki from Fluentbit,
    # uncomment the following output, updating the Host with your Loki DNS/IP info as necessary.
    # [OUTPUT]
    #     Name loki
    #     Match *
    #     Host loki.loki.svc
    #     Port 3100
    #     Labels job=fluentbit
    #     Auto_Kubernetes_Labels on
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
tolerations:
  - operator: Exists
    effect: NoSchedule
Next, we will add the helm repo for FluentBit, and deploy it to the cluster.
$ kubectl -n kube-system get svc -l app.kubernetes.io/name=fluent-bit
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluent-bit ClusterIP 10.200.0.138 <none> 2020/TCP,5170/TCP 108m
Finally, change the Talos log destination with the command talosctl edit mc.
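The destination set there might look like the sketch below. It assumes the ClusterIP of the fluent-bit service shown above and the talos TCP input on port 12345 from the values file. Check that your service actually exposes that port, since the service output above lists different ports:

```yaml
machine:
  logging:
    destinations:
      # fluent-bit ClusterIP from the output above; port 12345 is the "talos" input (assumed exposed).
      - endpoint: "tcp://10.200.0.138:12345"
        format: json_lines
```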
This example configuration was well tested with Cilium CNI, and it should work with iptables/ipvs based CNI plugins too.
Vector example
Vector is a lightweight observability pipeline ideal for a Kubernetes environment.
It can ingest (source) logs from multiple sources, perform remapping on the logs (transform), and forward the resulting pipeline to multiple destinations (sinks).
As it is an end-to-end platform, it can be run as a single-deployment ‘aggregator’ as well as a set of ‘agents’ that run on each node.
As Talos can be set as above to send logs to a destination, we can run Vector as an aggregator, and forward both kernel and service logs to a UDP socket in the cluster.
Below is an excerpt of a source/sink setup for Talos, with a ‘sink’ destination of an in-cluster Grafana Loki log aggregation service.
As Loki can create labels from the log input, we have set up the Loki sink to create labels based on the host IP, service and facility of the inbound logs.
Note that you will need a way to expose the Vector service, which may vary depending on your setup; a LoadBalancer is a good option.
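A minimal sketch of such a Vector pipeline for service logs only, assuming Vector's socket source, remap transform and loki sink. The port 6051, the Loki address and the component names are hypothetical; kernel logs would use a second source on its own port and a facility label:

```yaml
sources:
  talos_service_logs:
    type: socket
    mode: udp
    address: 0.0.0.0:6051   # hypothetical port; must match the Talos logging endpoint
    decoding:
      codec: json           # Talos sends json_lines messages
    host_key: __host        # store the sender's address in a separate field
transforms:
  talos_service_logs_xform:
    type: remap
    inputs: [talos_service_logs]
    source: |
      # Copy the hyphenated Talos field into a template-friendly name.
      .service = ."talos-service"
sinks:
  loki_talos_services:
    type: loki
    inputs: [talos_service_logs_xform]
    endpoint: http://loki.loki.svc:3100  # hypothetical in-cluster Loki address
    encoding:
      codec: json
    labels:
      host: "{{ __host }}"
      service: "{{ service }}"
```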
In this guide we’ll follow the procedure to support NVIDIA GPU using OSS drivers on Talos.
Enabling NVIDIA GPU support on Talos is bound by NVIDIA EULA.
The Talos published NVIDIA OSS drivers are bound to a specific Talos release.
The extension versions also need to be updated when upgrading Talos.
We will be using the following NVIDIA OSS system extensions:
nvidia-open-gpu-kernel-modules
nvidia-container-toolkit
Create the boot assets that include the system extensions mentioned above (or create a custom installer and perform a machine upgrade if Talos is already installed).
Make sure the driver version matches for both the nvidia-open-gpu-kernel-modules and nvidia-container-toolkit extensions.
The nvidia-open-gpu-kernel-modules extension is versioned as <nvidia-driver-version>-<talos-release-version> and the nvidia-container-toolkit extension is versioned as <nvidia-driver-version>-<nvidia-container-toolkit-version>.
Proprietary vs OSS Nvidia Driver Support
The NVIDIA Linux GPU Driver contains several kernel modules: nvidia.ko, nvidia-modeset.ko, nvidia-uvm.ko, nvidia-drm.ko, and nvidia-peermem.ko.
Two “flavors” of these kernel modules are provided, and both are available for use within Talos:
Proprietary: this is the flavor that NVIDIA has historically shipped.
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
talosctl patch mc --patch @gpu-worker-patch.yaml
The NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
talosctl read /proc/modules
which should produce an output similar to below:
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
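The module check above can also be scripted. This Python sketch parses text in the /proc/modules format (name, size, reference count, dependencies, state, load address, taint flags) and reports which expected modules are missing or not in the Live state. It assumes you feed it the raw file contents, for example captured from talosctl read /proc/modules against a single node; the REQUIRED set simply mirrors the sample output and may need adjusting for your setup.

```python
# Modules seen in the sample output above; adjust for your setup.
REQUIRED = {"nvidia", "nvidia_uvm", "nvidia_drm", "nvidia_modeset"}


def module_states(proc_modules: str) -> dict[str, str]:
    """Map module name -> state column ("Live", "Loading" or "Unloading")."""
    states = {}
    for line in proc_modules.splitlines():
        fields = line.split()
        # Columns: name, size, refcount, dependencies, state, address, [taint flags]
        if len(fields) >= 5:
            states[fields[0]] = fields[4]
    return states


def missing_modules(proc_modules: str, required=frozenset(REQUIRED)) -> set[str]:
    """Return the required modules that are absent or not yet Live."""
    states = module_states(proc_modules)
    return {name for name in required if states.get(name) != "Live"}


if __name__ == "__main__":
    sample = """\
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
"""
    print(missing_modules(sample) or "all NVIDIA modules are loaded")
```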
talosctl get extensions
which should produce an output similar to below:
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-siderolabs-nvidia-container-toolkit-515.65.01-v1.10.0 1 nvidia-container-toolkit 515.65.01-v1.10.0
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-siderolabs-nvidia-open-gpu-kernel-modules-515.65.01-v1.2.0 1 nvidia-open-gpu-kernel-modules 515.65.01-v1.2.0
talosctl read /proc/driver/nvidia/version
which should produce an output similar to below:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.65.01 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 12.2.0 (GCC)
Deploying NVIDIA device plugin
First, we need to create the RuntimeClass.
Apply the following manifest to create a runtime class that uses the extension:
In this guide we’ll follow the procedure to support NVIDIA GPU using proprietary drivers on Talos.
Enabling NVIDIA GPU support on Talos is bound by the NVIDIA EULA.
The NVIDIA drivers published by Talos are bound to a specific Talos release.
The extension versions also need to be updated when upgrading Talos.
We will be using the following NVIDIA system extensions:
nonfree-kmod-nvidia
nvidia-container-toolkit
To build an NVIDIA driver version not published by Sidero Labs, follow the instructions here.
Create the boot assets that include the system extensions mentioned above (or create a custom installer and perform a machine upgrade if Talos is already installed).
Make sure the driver version matches for both the nonfree-kmod-nvidia and nvidia-container-toolkit extensions.
The nonfree-kmod-nvidia extension is versioned as <nvidia-driver-version>-<talos-release-version> and the nvidia-container-toolkit extension is versioned as <nvidia-driver-version>-<nvidia-container-toolkit-version>.
Proprietary vs OSS NVIDIA Driver Support
The NVIDIA Linux GPU Driver contains several kernel modules: nvidia.ko, nvidia-modeset.ko, nvidia-uvm.ko, nvidia-drm.ko, and nvidia-peermem.ko.
Two “flavors” of these kernel modules are provided, and both are available for use within Talos:
Proprietary: this is the flavor that NVIDIA has historically shipped.
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
talosctl patch mc --patch @gpu-worker-patch.yaml
The NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
talosctl read /proc/modules
which should produce an output similar to below:
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
talosctl get extensions
which should produce an output similar to below:
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-frezbo-nvidia-container-toolkit-510.60.02-v1.9.0 1 nvidia-container-toolkit 510.60.02-v1.9.0
talosctl read /proc/driver/nvidia/version
which should produce an output similar to below:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.60.02 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 11.2.0 (GCC)
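Similarly, the driver version reported in /proc/driver/nvidia/version can be compared against the nonfree-kmod-nvidia extension version. This Python sketch extracts the first dotted version number after the words "Kernel Module", which fits the sample output above; the regular expression is an assumption and may need tweaking for other driver builds.

```python
import re

# Assumed pattern: first dotted version number following "Kernel Module".
_DRIVER_RE = re.compile(r"Kernel Module.*?\s(\d+(?:\.\d+)+)\b")


def loaded_driver_version(proc_version: str) -> str:
    """Extract the driver version from /proc/driver/nvidia/version contents."""
    match = _DRIVER_RE.search(proc_version)
    if match is None:
        raise ValueError("no NVIDIA kernel module version found")
    return match.group(1)


def matches_extension(proc_version: str, extension_version: str) -> bool:
    """Compare the loaded driver with "<nvidia-driver-version>-<talos-release-version>"."""
    return extension_version.split("-", 1)[0] == loaded_driver_version(proc_version)


if __name__ == "__main__":
    sample = (
        "NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.60.02 Wed Mar 16 11:24:05 UTC 2022\n"
        "GCC version: gcc version 11.2.0 (GCC)\n"
    )
    print(loaded_driver_version(sample))  # 510.60.02
```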
Deploying NVIDIA device plugin
First, we need to create the RuntimeClass.
Apply the following manifest to create a runtime class that uses the extension:
In this guide, we’ll describe various performance tuning knobs available.
Talos Linux tries to strike a balance between performance and security/efficiency.
However, there are some performance tuning knobs available to adjust the system to your needs.
With any performance tuning, it’s essential to measure the impact of the changes and ensure they don’t introduce security vulnerabilities.
Note: Most of the suggestions below apply to bare metal machines, but some of them might be useful for VMs as well.
If you find more performance tuning knobs, please let us know by editing this document.
Kernel Parameters
Talos Linux kernel parameters can be adjusted in the following ways:
temporary, one-time adjustments can be made via console access by editing the kernel command line in the bootloader (this doesn’t work on systems with Secure Boot enabled)
on initial install (when booting off ISO/PXE), .machine.install.extraKernelArgs can be used to set kernel parameters
after the initial install (or when booting off a disk image), .machine.install.extraKernelArgs changes require a no-op upgrade (e.g. to the same version of Talos) to take effect
CPU Scaling
Talos Linux uses the schedutil CPU scaling governor by default; for maximum performance, you can switch to the performance governor:
cpufreq.default_governor=performance
Processor Sleep States
Modern processors support various sleep states to save power, but they might introduce latency when transitioning back to the active state.
AMD
For maximum performance (and lower latency), use active mode of the amd-pstate driver:
amd_pstate=active
Intel
For maximum performance (and lower latency), disable the intel_idle driver:
intel_idle.max_cstate=0
Hardware Vulnerabilities
Modern processors have various security vulnerabilities that require software/microcode mitigations.
These mitigations might have a performance impact, and some of them can be disabled if you are willing to take the risk.
First of all, ensure that the Talos system extensions amd-ucode and intel-ucode are installed (and that you are using the latest version of Talos Linux).
The Linux kernel loads the microcode updates on early boot, and for some processors this might reduce the performance impact of the mitigations.
The availability of microcode updates depends on the processor model.
The kernel command line argument mitigations can be used to disable all mitigations at once (not recommended from a security point of view):
mitigations=off
It is also possible to disable specific mitigations; see the kernel documentation for more details.
I/O
For Talos Linux versions before 1.8.2, I/O performance can be improved by setting iommu.strict=0; in later versions this is the default setting.
Performance can be further improved at some cost of security by bypassing the I/O memory management unit (IOMMU) for DMA:
iommu.passthrough=1
2.2.13 - Pull Through Image Cache
How to set up local transparent container images caches.
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from container registries might take a significant amount of time.
We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies.
A similar approach might be used to run Talos in production in air-gapped environments.
It can also be used to verify that all the images are available in local registries.
Video Walkthrough
To see a live demo of this writeup, see the video below:
Requirements
The following are requirements for creating the set of caching proxies:
Docker 18.03 or greater
Local cluster requirements for either docker or QEMU.
Launch the Caching Docker Registry Proxies
Talos pulls from docker.io, registry.k8s.io, gcr.io, and ghcr.io by default.
If your configuration is different, you might need to modify the commands below:
Note: Proxies are started as docker containers, and they’re automatically configured to start with Docker daemon.
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own
host port (5000, 5001, 5002, 5003 and 5004).
Using Caching Registries with QEMU Local Cluster
With a QEMU local cluster, a bridge interface is created on the host.
As registry containers expose their ports on the host, we can use the bridge IP to direct proxy requests.
The Talos local cluster should now start pulling via caching registries.
This can be verified via registry logs, e.g. docker logs -f registry-docker.io.
The first time the cluster boots, images are pulled and cached, so the next cluster boot should be much faster.
Note: 10.5.0.1 is the bridge IP for the default network (10.5.0.0/24); if you use a custom --cidr, adjust the value accordingly.
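For a custom --cidr, the bridge address can be derived rather than edited by hand. This Python sketch assumes the bridge takes the first usable host address of the cluster CIDR (which holds for the default 10.5.0.0/24, giving 10.5.0.1) and renders talosctl cluster create --registry-mirror flags for it. The registry-to-port mapping is hypothetical: use whatever host ports you assigned when launching the caching containers.

```python
import ipaddress

# Hypothetical upstream -> host port mapping; use the ports you actually
# published when starting the caching registry containers.
MIRROR_PORTS = {
    "docker.io": 5000,
    "registry.k8s.io": 5001,
    "gcr.io": 5002,
    "ghcr.io": 5003,
}


def bridge_ip(cidr: str = "10.5.0.0/24") -> str:
    """Assumption: the QEMU bridge gets the first usable host address of the CIDR."""
    return str(next(ipaddress.ip_network(cidr).hosts()))


def registry_mirror_flags(cidr: str = "10.5.0.0/24") -> list[str]:
    """Render one --registry-mirror flag per upstream, pointing at the bridge IP."""
    ip = bridge_ip(cidr)
    return [f"--registry-mirror {registry}=http://{ip}:{port}"
            for registry, port in MIRROR_PORTS.items()]


if __name__ == "__main__":
    print(" ".join(["talosctl", "cluster", "create", "--provisioner", "qemu",
                    *registry_mirror_flags("10.5.0.0/24")]))
```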
Using Caching Registries with docker Local Cluster
With a docker local cluster, we can use the docker bridge IP; its default value is 172.17.0.1.
On Linux, the docker bridge address can be inspected with ip addr show docker0.
Note: Removing docker registry containers also removes the image cache.
So if you plan to use caching registries, keep the containers running.
Using Harbor as a Caching Registry
Harbor is an open source container registry that can be used as a caching proxy.
Harbor supports configuring multiple upstream registries, so it can be used to cache multiple registries at once behind a single endpoint.
As Harbor puts a registry name in the pull image path, we need to set overridePath: true to prevent Talos and containerd from appending /v2 to the path.
Talos v0.11 introduced initial support for role-based access control (RBAC).
This guide will explain what that is and how to enable it without losing access to the cluster.
RBAC in Talos
Talos uses certificates to authorize users.
The certificate subject’s organization field is used to encode user roles.
There is a set of predefined roles that allow access to different API methods:
os:admin grants access to all methods;
os:operator grants everything os:reader role does, plus additional methods: rebooting, shutting down, etcd backup, etcd alarm management, and so on;
os:reader grants access to “safe” methods (for example, that includes the ability to list files, but does not include the ability to read files content);
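The relationship between the predefined roles can be pictured as nested permission sets. The following Python toy model is only an illustration: the action labels paraphrase the role descriptions above and are not real Talos API method names, and os:admin is modeled as granting everything.

```python
# Hypothetical action labels paraphrasing the role descriptions above;
# they are NOT Talos API method names.
READER_ACTIONS = frozenset({"list files"})
OPERATOR_ACTIONS = READER_ACTIONS | {"reboot", "shutdown", "etcd backup", "etcd alarm management"}

ROLE_ACTIONS = {
    "os:reader": READER_ACTIONS,
    "os:operator": OPERATOR_ACTIONS,  # everything os:reader has, plus more
}


def is_allowed(roles: list[str], action: str) -> bool:
    """os:admin grants every action; other roles grant only their own set."""
    if "os:admin" in roles:
        return True
    return any(action in ROLE_ACTIONS.get(role, frozenset()) for role in roles)


if __name__ == "__main__":
    print(is_allowed(["os:operator"], "reboot"))  # True
    print(is_allowed(["os:reader"], "reboot"))    # False
```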
Roles in the current talosconfig can be checked with the following command:
$ talosctl config info
[...]
Roles: os:admin
[...]
RBAC is enabled by default in new clusters created with talosctl v0.11+ and disabled otherwise.
Enabling RBAC
First, both the Talos cluster and talosctl tool should be upgraded.
Then the talosctl config new command should be used to generate a new client configuration with the os:admin role.
Additional configurations and certificates for different roles can be generated by passing --roles flag:
talosctl config new --roles=os:reader reader
That command will create a new client configuration file reader with a new certificate with os:reader role.
After that, RBAC should be enabled in the machine configuration:
machine:
  features:
    rbac: true
2.2.15 - System Extensions
Customizing the Talos Linux immutable root file system.
System extensions allow extending the Talos root filesystem, which enables a variety of features, such as including custom
container runtimes, loading additional firmware, etc.
System extensions are only activated during the installation or upgrade of Talos Linux.
With system extensions installed, the Talos root filesystem is still immutable and read-only.
Installing System Extensions
Note: the way to install system extensions in the .machine.install section of the machine configuration is now deprecated.
Starting with Talos v1.5.0, Talos supports generating boot media with system extensions included, which removes the need to rebuild
the initramfs.xz on the machine itself during installation or upgrade.
There are three kinds of boot assets that Talos can generate:
initial boot assets (ISO, PXE, etc.) that are used to boot the machine
disk images that have Talos pre-installed
installer container images that can be used to install or upgrade Talos on a machine (installation happens when booted from ISO or PXE)
Depending on the nature of the system extension (e.g. network device driver or containerd plugin), it may be necessary to include the extension in
both initial boot assets and disk images/installer, or just the installer.
The process of generating boot assets with extensions included is described in the boot assets guide.
Example: Booting from an ISO
Let’s assume the NVIDIA extension is required on a bare metal machine that is going to be booted from an ISO.
As the NVIDIA extension is not required for the initial boot and install step, it is sufficient to include the extension in the installer image only.
Use a generic Talos ISO to boot the machine.
Prepare a custom installer container image with the NVIDIA extension included, and push the image to a registry.
Ensure that machine configuration field .machine.install.image points to the custom installer image.
Boot the machine using the ISO, apply the machine configuration.
Talos pulls the custom installer image from the registry (containing the NVIDIA extension), installs Talos on the machine, and reboots.
When it’s time to upgrade Talos, generate a custom installer container for a new version of Talos, push it to a registry, and perform upgrade
pointing to the custom installer image.
Example: Disk Image
Let’s assume the NVIDIA extension is required on an AWS VM.
Prepare an AWS disk image with the NVIDIA extension included.
Upload the image to AWS, register it as an AMI.
Use the AMI to launch a VM.
Talos boots with the NVIDIA extension included.
When it’s time to upgrade Talos, either repeat steps 1-4 to replace the VM with a new AMI, or
like in the previous example, generate a custom installer and use it to upgrade Talos in-place.
Authoring System Extensions
A Talos system extension is a container image with a specific folder structure.
System extensions can be built and managed using any tool that produces container images, e.g. docker build.
Use talosctl get extensions to get a list of system extensions:
$ talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.20.0.2 runtime ExtensionStatus 000.ghcr.io-talos-systems-gvisor-54b831d 1 gvisor 20220117.0-v1.0.0
172.20.0.2 runtime ExtensionStatus 001.ghcr.io-talos-systems-intel-ucode-54b831d 1 intel-ucode microcode-20210608-v1.0.0
Use YAML or JSON format to see additional details about the extension:
Talos Linux itself does not require time to be synchronized across the cluster, but as Talos Linux and Kubernetes components issue certificates
with expiration dates, it is recommended to have time synchronized across the cluster.
Some workloads (e.g. Ceph) might require time to be in sync across the machines in the cluster due to the design of the application.
Talos Linux tries to launch the API even if the time is not in sync, and if time jumps as a result of NTP sync, the API certificates will be rotated automatically.
Some components like kubelet and etcd wait for the time to be in sync before starting, as they don’t support graceful certificate rotation.
By default, Talos Linux uses time.cloudflare.com as the NTP server, but it can be overridden in the machine configuration, or provided via DHCP, kernel args, platform sources, etc.
Talos Linux implements the SNTP protocol to sync time with the NTP server.
Observing Status
Current time sync status can be observed with:
$ talosctl get timestatus
NODE NAMESPACE TYPE ID VERSION SYNCED
172.20.0.2 runtime TimeStatus node 2 true
The list of servers Talos Linux is syncing with can be observed with:
$ talosctl get timeservers
NODE NAMESPACE TYPE ID VERSION TIMESERVERS
172.20.0.2 network TimeServerStatus timeservers 1 ["time.cloudflare.com"]
More detailed logs about the time sync process can be queried with:
When running in a VM on a hypervisor, instead of doing network time sync, Talos can sync the time to the hypervisor clock (if supported by the hypervisor).
To check if the PTP device is available:
$ talosctl ls /sys/class/ptp/
NODE NAME
172.20.0.2 .
172.20.0.2 ptp0
Make sure that the PTP device is provided by the hypervisor, as some PTP devices don’t provide accurate time value without proper setup:
To enable PTP sync, set the machine.time.servers to the PTP device name (e.g. /dev/ptp0):
machine:
  time:
    servers:
      - /dev/ptp0
After setting the PTP device, Talos will sync the time to the PTP device instead of using the NTP server:
172.20.0.2: 2024-04-17T19:11:48.817Z DEBUG adjusting time (slew) by 32.223689ms via /dev/ptp0, state TIME_OK, status STA_PLL | STA_NANO {"component": "controller-runtime", "controller": "time.SyncController"}
Additional Configuration
Talos NTP sync can be disabled with the following machine configuration patch:
machine:
  time:
    disabled: true
When time sync is disabled, Talos assumes that time is always in sync.
Time sync can also be configured on a best-effort basis, where Talos tries to sync time for the specified period, but if it fails to do so, time is considered to be in sync when the period expires:
machine:
  time:
    bootTimeout: 2m
2.3 - How Tos
How-to guides for common tasks in Talos Linux
2.3.1 - How to enable workers on your control plane nodes
How to enable workers on your control plane nodes.
By default, Talos Linux taints control plane nodes so that workloads are not schedulable on them.
In order to allow workloads to run on the control plane nodes (useful for single node clusters, or non-production clusters), follow the procedure below.
Modify the MachineConfig for the controlplane nodes to add allowSchedulingOnControlPlanes: true:
2.3.2 - How to manage PKI and certificate lifetimes with Talos Linux
Talos Linux automatically manages and rotates all server side certificates for etcd, Kubernetes, and the Talos API.
Note however that the kubelet needs to be restarted at least once a year in order for the certificates to be rotated.
Any upgrade or reboot of the node suffices.
You can check the Kubernetes certificates with the command talosctl get KubernetesDynamicCerts -o yaml on the controlplane.
Client certificates (talosconfig and kubeconfig) are the user’s responsibility.
Each time you download the kubeconfig file from a Talos Linux cluster, the client certificate is regenerated giving you a kubeconfig which is valid for a year.
The talosconfig file should be renewed at least once a year, using the talosctl config new command, as shown below, or by one of the other methods.
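Since both client certificates are valid for about a year, it can help to check how much time is left before renewal. This Python sketch assumes you have already obtained the certificate's notAfter value (for example from openssl x509 -enddate -noout) and uses the standard library's ssl.cert_time_to_seconds to parse it; the 30-day threshold is an arbitrary example.

```python
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter time, e.g. "Jan  1 00:00:00 2026 GMT".

    Also accepts the "notAfter=..." form printed by openssl x509 -enddate -noout.
    """
    not_after = not_after.strip().removeprefix("notAfter=")
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(not_after: str, threshold_days: float = 30,
                  now: Optional[float] = None) -> bool:
    """True once the certificate is within threshold_days of expiry (or expired)."""
    return days_left(not_after, now) <= threshold_days


if __name__ == "__main__":
    print(needs_renewal("notAfter=Jan  1 00:00:00 2026 GMT"))
```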
Generating New Client Configuration
Using Controlplane Node
If you have a valid (not expired) talosconfig with os:admin role,
a new client configuration file can be generated with talosctl config new against
any controlplane node:
talosctl -n CP1 config new talosconfig-reader --roles os:reader --crt-ttl 24h
A specific role and certificate lifetime can be specified.
Note: <cluster-name> and <cluster-endpoint> arguments don’t matter, as they are not used for talosconfig.
From Control Plane Machine Configuration
In order to create a new key pair for client configuration, you will need the root Talos API CA.
The base64 encoded CA can be found in the control plane node’s configuration file.
Save the CA public key, and CA private key as ca.crt, and ca.key respectively:
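If you prefer to script this step, a short Python sketch can do the decoding. It assumes you have copied the values of machine.ca.crt and machine.ca.key from the control plane configuration as plain strings (no YAML library is used), checks that they decode to PEM, and creates ca.key with owner-only permissions.

```python
import base64
import binascii
import os
from pathlib import Path


def decode_ca(crt_b64: str, key_b64: str) -> tuple[bytes, bytes]:
    """Decode the base64 machine.ca.crt / machine.ca.key values into PEM bytes."""
    try:
        crt = base64.b64decode("".join(crt_b64.split()), validate=True)
        key = base64.b64decode("".join(key_b64.split()), validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    for name, pem in (("crt", crt), ("key", key)):
        if not pem.startswith(b"-----BEGIN "):
            raise ValueError(f"decoded {name} does not look like PEM")
    return crt, key


def save_ca(crt_b64: str, key_b64: str, out_dir: str = ".") -> tuple[Path, Path]:
    """Write ca.crt and ca.key; a newly created key file gets mode 0600."""
    crt, key = decode_ca(crt_b64, key_b64)
    crt_path = Path(out_dir) / "ca.crt"
    key_path = Path(out_dir) / "ca.key"
    crt_path.write_bytes(crt)
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return crt_path, key_path
```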
The command talosctl reset will cordon and drain the node, leaving etcd if required, and then erase its disks and power down the system.
This command will also remove the node from registration with the discovery service, so it will no longer show up in talosctl get members.
It is still necessary to remove the node from Kubernetes, as noted above.
2.3.4 - How to scale up a Talos cluster
How to add more nodes to a Talos Linux cluster.
To add more nodes to a Talos Linux cluster, follow the same procedure as when initially creating the cluster:
boot the new machines to install Talos Linux
apply the worker.yaml or controlplane.yaml configuration files to the new machines
You need the controlplane.yaml and worker.yaml that were created when you initially deployed your cluster.
These contain the certificates that enable new machines to join.
Once you have the IP address, you can then apply the correct configuration for each machine you are adding, either worker or controlplane.
The insecure flag is necessary because the PKI infrastructure has not yet been made available to the node.
You do not need to bootstrap the new node.
Regardless of whether you are adding a control plane or worker node, it will now join the cluster in its role.
2.4 - Network
Set up networking layers for Talos Linux
2.4.1 - Corporate Proxies
How to configure Talos Linux to use proxies in a corporate environment
Appending the Certificate Authority of MITM Proxies
Put into each machine the PEM encoded certificate:
Talos Linux starting with 1.7.0 provides a caching DNS resolver for host workloads (including host networking pods).
Host DNS resolver is enabled by default for clusters created with Talos 1.7, and it can be enabled manually on upgrade.
Enabling Host DNS
Use the following machine configuration patch to enable host DNS resolver:
machine:
  features:
    hostDNS:
      enabled: true
Host DNS can be disabled by setting enabled: false as well.
Operations
When enabled, Talos Linux starts a DNS caching server on the host, listening on address 127.0.0.53:53 (both TCP and UDP protocols).
The host /etc/resolv.conf file is rewritten to point to the host DNS server:
All host-based workloads will use the host DNS server for name resolution.
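To confirm that a node's resolver configuration points at the host DNS server, the nameserver entries can be read from the file (for example via talosctl read /etc/resolv.conf). This Python sketch parses resolv.conf text and checks that the first nameserver is 127.0.0.53; it assumes standard resolv.conf syntax and skips comment lines.

```python
HOST_DNS = "127.0.0.53"


def nameservers(resolv_conf: str) -> list[str]:
    """Return nameserver addresses from resolv.conf text, in file order."""
    servers = []
    for line in resolv_conf.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # blank or comment line
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def uses_host_dns(resolv_conf: str) -> bool:
    """True if the first configured nameserver is the Talos host DNS address."""
    servers = nameservers(resolv_conf)
    return bool(servers) and servers[0] == HOST_DNS
```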
Host DNS server forwards requests to the upstream DNS servers, which are either acquired automatically (DHCP, platform sources, kernel args), or specified in the machine configuration.
The upstream DNS servers can be observed with:
$ talosctl get resolvers
NODE NAMESPACE TYPE ID VERSION RESOLVERS
172.20.0.2 network ResolverStatus resolvers 2 ["8.8.8.8","1.1.1.1"]
Logs of the host DNS resolver can be queried with:
talosctl logs dns-resolve-cache
Upstream server status can be observed with:
$ talosctl get dnsupstream
NODE NAMESPACE TYPE ID VERSION HEALTHY ADDRESS
172.20.0.2 network DNSUpstream 1.1.1.1 1 true 1.1.1.1:53
172.20.0.2 network DNSUpstream 8.8.8.8 1 true 8.8.8.8:53
Forwarding kube-dns to Host DNS
Note: This feature is enabled by default for new clusters created with Talos 1.8.0 and later.
When host DNS is enabled, by default, kube-dns service (CoreDNS in Kubernetes) uses host DNS server to resolve external names.
This way the cache is shared between the host DNS and kube-dns.
Forwarding kube-dns to the host DNS resolver can be disabled with:
This configuration should be applied to all nodes in the cluster; if it is applied after cluster creation, restart the coredns pods in Kubernetes to pick up the changes.
When forwardKubeDNSToHost is enabled, Talos Linux allocates IP address 169.254.116.108 for the host DNS server, and kube-dns service is configured to use this IP address as the upstream DNS server:
This way kube-dns service forwards all DNS requests to the host DNS server, and the cache is shared between the host and kube-dns.
Resolving Talos Cluster Member Names
Host DNS can be configured to resolve Talos cluster member names to IP addresses, so that the host can communicate with the cluster members by name.
Sometimes machine hostnames are already resolvable by the upstream DNS, but this might not always be the case.
When enabled, Talos Linux uses discovery data to resolve Talos cluster member names to IP addresses:
$ talosctl get members
NODE NAMESPACE TYPE ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
172.20.0.2 cluster Member talos-default-controlplane-1 1 talos-default-controlplane-1 controlplane Talos (v1.9.0) ["172.20.0.2"]
172.20.0.2 cluster Member talos-default-worker-1 1 talos-default-worker-1 worker Talos (v1.9.0) ["172.20.0.3"]
With the example output above, the name talos-default-worker-1 will resolve to 172.20.0.3.
Example usage:
talosctl -n talos-default-worker-1 version
When combined with forwardKubeDNSToHost, kube-dns service will also resolve Talos cluster member names to IP addresses.
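Conceptually, the lookup behaves like a table built from the members list: a member hostname maps to that member's addresses, and any other name falls through to the upstream resolvers. The Python toy model below illustrates this with the two members from the sample talosctl get members output; it does not reflect how Talos implements the resolver internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    hostname: str
    addresses: list[str]


def resolve_member(members: list[Member], name: str) -> Optional[list[str]]:
    """Return the member's addresses, or None to signal "forward to upstream DNS"."""
    for member in members:
        if member.hostname == name:
            return member.addresses
    return None


# Members taken from the sample talosctl get members output.
MEMBERS = [
    Member("talos-default-controlplane-1", ["172.20.0.2"]),
    Member("talos-default-worker-1", ["172.20.0.3"]),
]
```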
2.4.3 - Ingress Firewall
Learn to use Talos Linux Ingress Firewall to limit access to the host services.
Talos Linux Ingress Firewall is a simple and effective way to limit network access to the services running on the host, which includes both Talos standard
services (e.g. apid and kubelet), and any additional workloads that may be running on the host.
Talos Linux Ingress Firewall doesn’t affect the traffic between the Kubernetes pods/services, please use CNI Network Policies for that.
The first document configures the default action for ingress traffic, which can be either accept or block, with the default being accept.
If the default action is set to accept, then all ingress traffic will be allowed, unless there is a matching rule that blocks it.
If the default action is set to block, then all ingress traffic will be blocked, unless there is a matching rule that allows it.
With either accept or block, traffic is always allowed on the following network interfaces:
lo
siderolink
kubespan
In block mode:
ICMP and ICMPv6 traffic is also allowed with a rate limit of 5 packets per second
traffic between Kubernetes pod/service subnets is allowed (for native routing CNIs)
The second document defines an ingress rule for a set of ports and protocols on the host.
The NetworkRuleConfig might be repeated many times to define multiple rules, but each document must have a unique name.
The ports field accepts either a single port or a port range:
The ingress specifies the list of subnets that are allowed to access the host services, with the optional except field to exclude a set of addresses from the subnet.
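The interaction between the default action, the always-allowed interfaces, and the rules can be summarized as a decision function. This Python sketch is a simplified model, not the Talos firewall implementation: it assumes a rule covering a protocol and port admits only sources inside its ingress subnets (minus except), and it ignores the ICMP rate limit and the pod/service subnet allowance in block mode. The "start-end" port-range string syntax is also an assumption.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional

# Per the text above, traffic on these interfaces is always allowed.
ALWAYS_ALLOWED_INTERFACES = {"lo", "siderolink", "kubespan"}


@dataclass
class Subnet:
    subnet: str
    except_: Optional[str] = None  # "except" is a Python keyword

    def contains(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        if ip not in ipaddress.ip_network(self.subnet):
            return False
        return self.except_ is None or ip not in ipaddress.ip_network(self.except_)


@dataclass
class Rule:
    protocol: str                 # "tcp" or "udp"
    ports: List[str]              # "50000" or "8000-9000" (range syntax assumed)
    ingress: List[Subnet] = field(default_factory=list)

    def covers(self, protocol: str, port: int) -> bool:
        if protocol != self.protocol:
            return False
        for spec in self.ports:
            low, _, high = spec.partition("-")
            if int(low) <= port <= int(high or low):
                return True
        return False


def ingress_allowed(default_action: str, rules: List[Rule], interface: str,
                    protocol: str, port: int, source: str) -> bool:
    """Simplified model: ignores ICMP rate limiting and pod/service subnet rules."""
    if interface in ALWAYS_ALLOWED_INTERFACES:
        return True
    matching = [rule for rule in rules if rule.covers(protocol, port)]
    if matching:
        # A rule covers this port: only sources inside its ingress subnets pass.
        return any(s.contains(source) for rule in matching for s in rule.ingress)
    # No rule covers the port: fall back to the default action.
    return default_action == "accept"
```

The model makes the asymmetry explicit: in accept mode a rule narrows access to its ports, while in block mode a rule is the only way to open a port.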
Note: incorrect configuration of the ingress firewall might result in the host becoming inaccessible over Talos API.
It is recommended that the configuration be applied in --mode=try to ensure it is reverted in case of a mistake.
Recommended Rules
The following rules improve the security of the cluster and cover only standard Talos services.
If there are additional services running with host networking in the cluster, they should be covered by additional rules.
In block mode, the ingress firewall will also block encapsulated traffic (e.g. VXLAN) between the nodes, which needs to be explicitly allowed for the Kubernetes
networking to function properly.
Please refer to the documentation of the CNI in use for the specific ports required.
Some default configurations are listed below:
Flannel, Calico: vxlan UDP port 4789
Cilium: vxlan UDP port 8472
In the examples we assume the following template variables to describe the cluster:
$CLUSTER_SUBNET, e.g. 172.20.0.0/24 - the subnet which covers all machines in the cluster
$CP1, $CP2, $CP3 - the IP addresses of the controlplane nodes
$VXLAN_PORT - the UDP port used by the CNI for encapsulated traffic
Controlplane
In this example Ingress policy:
apid and Kubernetes API are wide open
kubelet and trustd API are only accessible within the cluster
Learn to use KubeSpan to connect Talos Linux machines securely across networks.
KubeSpan is a feature of Talos that automates the setup and maintenance of a full mesh WireGuard network for your cluster, giving you the ability to operate hybrid Kubernetes clusters that can span the edge, datacenter, and cloud.
Management of keys and discovery of peers can be completely automated, making it simple and easy to create hybrid clusters.
KubeSpan consists of client code in Talos Linux, as well as a discovery service that enables clients to securely find each other.
Sidero Labs operates a free Discovery Service, but the discovery service may, with a commercial license, be operated by your organization and can be downloaded here.
Video Walkthrough
To see a live demo of KubeSpan, see one of the videos below:
Network Requirements
KubeSpan uses UDP port 51820 to carry all KubeSpan encrypted traffic.
Because UDP traversal of firewalls is often lenient, and the Discovery Service communicates the apparent IP address of all peers to all other peers, KubeSpan will often work automatically, even when each node is behind its own firewall.
However, when both ends of a KubeSpan connection are behind firewalls, it is possible the connection may not be established correctly - it depends on each end sending out packets in a limited time window.
Thus best practice is to ensure that one end of all possible node-node communication allows UDP port 51820, inbound.
For example, if control plane nodes are running in a corporate data center, behind firewalls, KubeSpan connectivity will work correctly so long as worker nodes on the public Internet can receive packets on UDP port 51820.
(Note the workers will also need to receive TCP port 50000 for initial configuration via talosctl).
An alternative topology would be to run control plane nodes in a public cloud, and allow inbound UDP port 51820 to the control plane nodes.
Workers could be behind firewalls, and KubeSpan connectivity will be established.
Note that if workers are in different locations, behind different firewalls, the KubeSpan connectivity between workers should be correctly established, but may require opening the KubeSpan UDP port on the local firewall also.
Caveats
Kubernetes API Endpoint Limitations
When the Kubernetes API endpoint is an IP address that is not part of KubeSpan, but is an address that is forwarded to the KubeSpan address of a control plane node without changing the source address, worker nodes will fail to join the cluster.
In such a case, the control plane node has no way to determine whether the packet arrived on the private KubeSpan address or the public IP address.
If the source of the packet was a KubeSpan member, the reply will be KubeSpan-encapsulated, and thus not translated to the public IP, so the control plane will reply to the session with the wrong address.
This situation is seen, for example, when the Kubernetes API endpoint is the public IP of a VM in GCP or Azure for a single node control plane.
The control plane will receive packets on the public IP, but will reply from its KubeSpan address.
The workaround is to create a load balancer to terminate the Kubernetes API endpoint.
Digital Ocean Limitations
Digital Ocean assigns an “Anchor IP” address to each droplet.
Talos Linux correctly identifies this as a link-local address, and configures KubeSpan correctly, but this address will often be selected by Flannel or other CNIs as a node’s private IP.
Because this address is not routable, nor advertised via KubeSpan, it will break pod-pod communication between nodes.
This can be worked around by assigning a non-Anchor private IP:
Then restart flannel:
kubectl delete pods -n kube-system -l k8s-app=flannel
Enabling
Creating a New Cluster
To enable KubeSpan for a new cluster, we can use the --with-kubespan flag in talosctl gen config.
This will enable peer discovery and KubeSpan.
machine:
  network:
    kubespan:
      enabled: true # Enable the KubeSpan feature.
cluster:
  discovery:
    enabled: true
    # Configure registries used for cluster member discovery.
    registries:
      kubernetes: # Kubernetes registry is problematic with KubeSpan, if the control plane endpoint is routable itself via KubeSpan.
        disabled: true
      service: {}
The default discovery service is an external service hosted by Sidero Labs at https://discovery.talos.dev/.
Contact Sidero Labs if you need to run this service privately.
Enabling for an Existing Cluster
In order to enable KubeSpan on an existing cluster, enable kubespan and discovery settings in the machine config for each machine in the cluster (discovery is enabled by default):
The setting advertiseKubernetesNetworks controls whether the node will advertise Kubernetes service and pod networks to other nodes in the cluster over KubeSpan.
It defaults to being disabled, which means KubeSpan only controls the node-to-node traffic, while pod-to-pod traffic is routed and encapsulated by CNI.
This setting should not be enabled with Calico and Cilium CNI plugins, as they do their own pod IP allocation which is not visible to KubeSpan.
The setting allowDownPeerBypass controls whether the node will allow traffic to bypass WireGuard if the destination is not connected over KubeSpan.
If enabled, there is a risk that traffic will be routed unencrypted when the destination is not connected over KubeSpan, but it provides a workaround for the case where a node is not connected to the KubeSpan network but still needs to access the cluster.
The mtu setting configures the Wireguard MTU, which defaults to 1420.
This default value of 1420 is safe to use when the underlying network MTU is 1500, but if the underlying network MTU is smaller, the KubeSpanMTU should be adjusted accordingly:
KubeSpanMTU = UnderlyingMTU - 80.
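The MTU rule above is plain arithmetic; a minimal Python sketch of it follows. The helper name kubespan_mtu is hypothetical and not part of Talos. The 80 bytes match the worst-case WireGuard encapsulation over IPv6 (40-byte IPv6 header, 8-byte UDP header, 32-byte WireGuard data header plus authentication tag).

```python
# Overhead stated in this guide: IPv6 (40) + UDP (8) + WireGuard header and tag (32).
WIREGUARD_OVERHEAD = 80


def kubespan_mtu(underlying_mtu: int) -> int:
    """Return KubeSpanMTU = UnderlyingMTU - 80 (hypothetical helper, not a Talos API)."""
    if underlying_mtu <= WIREGUARD_OVERHEAD:
        raise ValueError(f"underlying MTU {underlying_mtu} is too small for WireGuard")
    return underlying_mtu - WIREGUARD_OVERHEAD


for mtu in (1500, 1450, 9000):
    print(f"underlying MTU {mtu} -> KubeSpan MTU {kubespan_mtu(mtu)}")
```

With a 1500-byte underlying network this reproduces the default of 1420; on a 1450-byte network (common with some cloud overlays) the KubeSpan MTU would be 1370.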
The filters setting allows hiding some endpoints from being advertised over KubeSpan.
This is useful when some endpoints are known to be unreachable between the nodes, so that KubeSpan doesn’t try to establish a connection to them.
Another use case is hiding some endpoints when nodes can connect over multiple networks and some of those networks are preferable to others.
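To illustrate what hiding endpoints means, here is a sketch that drops advertised ip:port endpoints whose address falls inside a list of CIDRs. The function and its deny-list input are hypothetical; the actual syntax of the KubeSpan filters setting is not reproduced here.

```python
import ipaddress
from typing import Iterable, List


def filter_endpoints(endpoints: Iterable[str], hidden_networks: Iterable[str]) -> List[str]:
    """Keep only "ip:port" endpoints outside every hidden CIDR (hypothetical helper)."""
    nets = [ipaddress.ip_network(n) for n in hidden_networks]
    kept = []
    for ep in endpoints:
        host, _, _port = ep.rpartition(":")
        ip = ipaddress.ip_address(host.strip("[]"))  # "[fd00::1]" -> "fd00::1"
        # Membership between IPv4 and IPv6 objects is simply False, so mixed lists are safe.
        if not any(ip in net for net in nets):
            kept.append(ep)
    return kept


print(filter_endpoints(
    ["172.20.0.3:51820", "10.0.0.5:51820", "[fd00::1]:51820"],
    ["10.0.0.0/8"],  # e.g. a private network known to be unreachable between nodes
))
```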
To include additional announced endpoints, such as inbound NAT mappings, you can add the machine config document.
Talos automatically configures a unique IPv6 address for each node in the cluster-specific IPv6 ULA prefix.
The Wireguard private key is generated and never leaves the node, while the public key is published through the cluster discovery.
KubeSpanIdentity is persisted across reboots and upgrades in the STATE partition, in the file kubespan-identity.yaml.
KubeSpanPeerSpecs
A node’s WireGuard peers can be obtained with:
$ talosctl get kubespanpeerspecs
ID                                             VERSION   LABEL                          ENDPOINTS
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   2         talos-default-controlplane-2   ["172.20.0.3:51820"]
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   2         talos-default-controlplane-3   ["172.20.0.4:51820"]
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   2         talos-default-worker-2         ["172.20.0.6:51820"]
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   2         talos-default-worker-1         ["172.20.0.5:51820"]
The peer ID is the Wireguard public key.
KubeSpanPeerSpecs are built from the cluster discovery data.
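Since a peer ID is a base64-encoded WireGuard (Curve25519) public key, decoding it must yield exactly 32 bytes. A small sanity check, with a hypothetical helper name, using IDs from the output above:

```python
import base64


def wireguard_key_bytes(peer_id: str) -> bytes:
    """Decode a KubeSpan peer ID (base64 WireGuard public key) to its raw 32 bytes."""
    raw = base64.b64decode(peer_id, validate=True)  # rejects non-base64 characters
    if len(raw) != 32:
        raise ValueError(f"expected a 32-byte Curve25519 key, got {len(raw)} bytes")
    return raw


peer_ids = [
    "06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=",
    "THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=",
]
for pid in peer_ids:
    print(pid, "->", wireguard_key_bytes(pid).hex())
```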
KubeSpanPeerStatuses
The status of a node’s WireGuard peers can be obtained with:
$ talosctl get kubespanpeerstatuses
ID                                             VERSION   LABEL                          ENDPOINT           STATE   RX         TX
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   63        talos-default-controlplane-2   172.20.0.3:51820   up      15043220   17869488
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   62        talos-default-controlplane-3   172.20.0.4:51820   up      14573208   18157680
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   60        talos-default-worker-2         172.20.0.6:51820   up      130072     46888
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   60        talos-default-worker-1         172.20.0.5:51820   up      130044     46556
KubeSpan peer status includes the following information:
the actual endpoint used for peer communication
link state:
unknown: the endpoint was just changed, link state is not known yet
up: there is a recent handshake from the peer
down: there is no handshake from the peer
number of bytes sent/received over the Wireguard link with the peer
If the link state goes down, Talos cycles through the available endpoints until it finds one that works.
Peer status information is updated every 30 seconds.
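The link states and endpoint cycling described above can be modeled as a small state machine. This is a simplified sketch, not Talos's controller: the 180-second handshake threshold is an assumption (borrowed from WireGuard's Reject-After-Time), and PeerLink is a hypothetical type.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: a handshake older than this no longer counts as "recent".
HANDSHAKE_TIMEOUT_S = 180.0


@dataclass
class PeerLink:
    endpoints: List[str]                    # candidate endpoints for one peer
    index: int = 0                          # endpoint currently in use
    endpoint_changed_at: float = 0.0        # when the endpoint was last switched
    last_handshake: Optional[float] = None  # time of the last handshake from the peer

    @property
    def endpoint(self) -> str:
        return self.endpoints[self.index]

    def state(self, now: float) -> str:
        if self.last_handshake is not None and now - self.last_handshake < HANDSHAKE_TIMEOUT_S:
            return "up"       # there is a recent handshake from the peer
        if now - self.endpoint_changed_at < HANDSHAKE_TIMEOUT_S:
            return "unknown"  # endpoint was just changed, too early to tell
        return "down"         # no handshake from the peer

    def reconcile(self, now: float) -> str:
        """Run periodically (e.g. every 30 s); move to the next endpoint while down."""
        current = self.state(now)
        if current == "down" and len(self.endpoints) > 1:
            self.index = (self.index + 1) % len(self.endpoints)
            self.endpoint_changed_at = now
            current = "unknown"
        return current
```

For example, a peer whose first endpoint never answers is moved to its second endpoint once the threshold passes, and reports up as soon as a handshake arrives there. The address 198.51.100.7 used in the checks is a documentation IP.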
KubeSpanEndpoints
A node’s WireGuard endpoints (peer addresses) can be obtained with:
$ talosctl get kubespanendpoints
ID                                             VERSION   ENDPOINT           AFFILIATE ID
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI=   1         172.20.0.3:51820   2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU=   1         172.20.0.4:51820   b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M=   1         172.20.0.6:51820   NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc=   1         172.20.0.5:51820   6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA
The endpoint ID is the base64 encoded WireGuard public key.
The observed endpoints are submitted back to the discovery service (if enabled) so that other peers can try additional endpoints to establish the connection.
2.4.5 - Network Device Selector
How to configure network devices by selecting them using hardware information
Configuring Network Device Using Device Selector
deviceSelector is an alternative method of configuring a network device:
In this example, the bond0 interface will be created and bonded using two devices with the specified hardware addresses.
For bonding, use permanentAddr instead of hardwareAddr to match the permanent hardware address of the device, as hardwareAddr might change as the link becomes part of the bond.
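Why permanentAddr is the safer match for bond members can be shown with a toy model. In many bonding configurations Linux gives every member the bond's MAC address, so a selector on the current address stops matching (or matches several links), while the permanent address stays unique. Link and select are hypothetical names, and 00:00:5e:00:53:xx are documentation MAC addresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    name: str
    hardware_addr: str   # current MAC; can change once the link is enslaved to a bond
    permanent_addr: str  # MAC burned into the NIC; does not change


def select(links: List[Link], *, hardware_addr: Optional[str] = None,
           permanent_addr: Optional[str] = None) -> List[str]:
    """Return the names of links matching every selector field that is set."""
    matches = []
    for link in links:
        if hardware_addr is not None and link.hardware_addr != hardware_addr.lower():
            continue
        if permanent_addr is not None and link.permanent_addr != permanent_addr.lower():
            continue
        matches.append(link.name)
    return matches


before = [Link("eth0", "00:00:5e:00:53:01", "00:00:5e:00:53:01"),
          Link("eth1", "00:00:5e:00:53:02", "00:00:5e:00:53:02")]
# After bonding, both members carry the bond's MAC (eth0's address in this model).
after = [Link("eth0", "00:00:5e:00:53:01", "00:00:5e:00:53:01"),
         Link("eth1", "00:00:5e:00:53:01", "00:00:5e:00:53:02")]
```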
2.4.6 - Predictable Interface Names
How to use predictable interface naming.
Starting with Talos 1.5, network interfaces are renamed to predictable names, the same way systemd does it in other Linux distributions.
The naming schema enx78e7d1ea46da (based on MAC addresses) is enabled by default; the order of interface naming decisions is:
firmware/BIOS provided index numbers for on-board devices (example: eno1)
firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)
physical/geographical location of the connector of the hardware (example: enp2s0)
interface’s MAC address (example: enx78e7d1ea46da)
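The decision order above can be sketched as "use the first naming source that is available". This is a simplified model: the parameter names are hypothetical, and it ignores details real udev naming handles (PCI function numbers, domain prefixes, name length limits).

```python
from typing import Optional


def predictable_name(*, onboard_index: Optional[int] = None,
                     hotplug_slot: Optional[int] = None,
                     pci_bus: Optional[int] = None, pci_slot: Optional[int] = None,
                     mac: Optional[str] = None) -> str:
    """Pick an Ethernet interface name using the decision order described above."""
    if onboard_index is not None:                  # firmware/BIOS on-board index
        return f"eno{onboard_index}"
    if hotplug_slot is not None:                   # PCI Express hotplug slot index
        return f"ens{hotplug_slot}"
    if pci_bus is not None and pci_slot is not None:  # physical location of the connector
        return f"enp{pci_bus}s{pci_slot}"
    if mac is not None:                            # MAC address, colons removed
        return "enx" + mac.replace(":", "").lower()
    raise ValueError("no naming source available")


print(predictable_name(mac="78:e7:d1:ea:46:da"))
```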
The predictable network interface names feature can be disabled by specifying net.ifnames=0 on the kernel command line.
Note: Talos automatically adds the net.ifnames=0 kernel argument when upgrading from Talos versions before 1.5, so upgrades to 1.5 don’t require any manual intervention.
“Cloud” platforms, like AWS, still use the old eth0 naming scheme, as Talos automatically adds net.ifnames=0 to the kernel command line.
Single Network Interface
When running Talos on a machine with a single network interface, predictable interface names might be confusing, as the interface might come up as enxSOMETHING, which is hard to address.
There are two ways to solve this:
disable the feature by supplying net.ifnames=0 to the initial boot of Talos; Talos will persist net.ifnames=0 over installs/upgrades.
use a device selector to select the single hardware network device:
machine:
  network:
    interfaces:
      - deviceSelector:
          busPath: "0*" # should select any hardware network device, if you have just one, it will be selected
        # any configuration can follow, e.g.:
        addresses: [10.3.4.5/24]
SideroLink provides a secure point-to-point management overlay network for Talos clusters.
Each Talos machine configured to use SideroLink will establish a secure Wireguard connection to the SideroLink API server.
SideroLink provides an overlay network using ULA IPv6 addresses, allowing Talos Linux machines to be managed even if direct access to their IP addresses is not possible.
SideroLink is a foundational building block of Sidero Omni.
Configuration
SideroLink is configured by providing the SideroLink API server address, either via kernel command line argument siderolink.api or as a config document.
SideroLink API URL: https://siderolink.api/?jointoken=token&grpc_tunnel=true.
If the URL scheme is grpc://, the connection will be established without TLS; otherwise, the connection will be established with TLS.
If specified, the join token (token in the example above) will be sent to the SideroLink server.
If grpc_tunnel is set to true, the Wireguard traffic will be tunneled over the same SideroLink API gRPC connection instead of using plain UDP.
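The URL rules above can be checked with a small parser built on Python's urllib. This is an illustrative sketch, not Talos's parser: the SideroLinkParams type is hypothetical, and treating a missing grpc_tunnel as false is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class SideroLinkParams:
    endpoint: str              # host[:port] of the SideroLink API server
    tls: bool                  # False only for grpc:// URLs
    join_token: Optional[str]  # sent to the server when present
    grpc_tunnel: bool          # tunnel WireGuard over the gRPC connection instead of UDP


def parse_siderolink_url(url: str) -> SideroLinkParams:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return SideroLinkParams(
        endpoint=parts.netloc,
        tls=parts.scheme != "grpc",  # grpc:// -> no TLS, anything else -> TLS
        join_token=query.get("jointoken", [None])[0],
        # Assumption: an absent grpc_tunnel parameter means plain UDP.
        grpc_tunnel=query.get("grpc_tunnel", ["false"])[0].lower() == "true",
    )


print(parse_siderolink_url("https://siderolink.api/?jointoken=token&grpc_tunnel=true"))
```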
Connection Flow
Talos Linux creates an ephemeral Wireguard key.
Talos Linux establishes a gRPC connection to the SideroLink API server and sends its own WireGuard public key, the join token, and other connection settings.
If the join token is valid, the SideroLink API server sends back its own WireGuard public key and two overlay IPv6 addresses: the machine address and the SideroLink server address.
Talos Linux configures the WireGuard interface with the received settings.
Talos Linux monitors the status of the WireGuard connection and re-establishes the connection if needed.
Operations with SideroLink
When SideroLink is configured, the Talos maintenance mode API listens only on the SideroLink network.
Maintenance mode API over SideroLink allows operations which are not generally available over the public network: getting Talos version, getting sensitive resources, etc.
Talos Linux always provides Talos API over SideroLink, and automatically allows access over SideroLink even if the Ingress Firewall is enabled.
WireGuard connections should still be allowed by the Ingress Firewall.
SideroLink only allows point-to-point connections between Talos machines and the SideroLink management server; two Talos machines cannot communicate directly over SideroLink.
2.4.8 - Virtual (shared) IP
Using Talos Linux to set up a floating virtual IP address for cluster access.
One of the pain points when building a high-availability controlplane is giving clients a single IP or URL at which they can reach any of the controlplane nodes.
The most common approaches (reverse proxy, load balancer, BGP, and DNS) all require external resources and add complexity to setting up Kubernetes.
To simplify cluster creation, Talos Linux supports a “Virtual” IP (VIP) address to access the Kubernetes API server, providing high availability with no other resources required.
The controlplane machines vie for control of the shared IP address using etcd elections.
There can be only one owner of the IP address at any given time.
If that owner disappears or becomes non-responsive, another owner will be chosen, and it will take up the IP address.
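The single-owner behavior can be sketched as a lease that the owner must renew before it expires. This toy stands in for the etcd election Talos actually uses; the class and the lease TTL are hypothetical, and real etcd elections also deal with network partitions, which this model ignores.

```python
from typing import Optional


class VipElection:
    """Toy single-owner lease; a stand-in for the etcd election Talos uses."""

    def __init__(self, lease_ttl: float):
        self.lease_ttl = lease_ttl          # hypothetical lease duration in seconds
        self.owner: Optional[str] = None
        self.expires_at = 0.0

    def campaign(self, node: str, now: float) -> bool:
        """Take the VIP if it is free or the lease lapsed; renew it if `node` owns it."""
        if self.owner is None or self.owner == node or now >= self.expires_at:
            self.owner = node
            self.expires_at = now + self.lease_ttl
        return self.owner == node
```

In the checks below, cp1 keeps the address while it keeps renewing, and cp2 takes it over only after cp1 stops renewing and the lease lapses.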
Requirements
The controlplane nodes must share a layer 2 network, and the virtual IP must be assigned from that shared network subnet.
In practical terms, this means that they are all connected via a switch, with no router in between them.
Note that the virtual IP election depends on etcd being up, as Talos uses etcd for elections and leadership (control) of the IP address.
The virtual IP is not restricted by ports: you can access any port that the control plane nodes are listening on, on that IP address.
Thus it is possible to access the Talos API over the VIP, but it is not recommended: the VIP is unavailable when etcd is down, and you would then be unable to reach the Talos API to recover etcd.
Video Walkthrough
To see a live demo of this writeup, see the video below:
Choose your Shared IP
The Virtual IP should be a reserved, unused IP address in the same subnet as your controlplane nodes.
It should not be assigned or assignable by your DHCP server.
For our example, we will assume that the controlplane nodes have the following IP addresses:
192.168.0.10
192.168.0.11
192.168.0.12
We then choose our shared IP to be:
192.168.0.15
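The requirements for the shared IP can be checked mechanically before writing any config. This sketch assumes the example nodes sit on 192.168.0.0/24 and that the DHCP server hands out 192.168.0.100 to 192.168.0.200; both values, and the check_vip helper, are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple


def check_vip(vip: str, node_ips: Iterable[str], subnet: str,
              dhcp_pool: Optional[Tuple[str, str]] = None) -> List[str]:
    """Return a list of problems with a proposed shared IP (empty list means OK)."""
    problems = []
    net = ipaddress.ip_network(subnet)
    addr = ipaddress.ip_address(vip)
    if addr not in net:
        problems.append(f"{vip} is outside {subnet}")
    for ip in node_ips:
        node = ipaddress.ip_address(ip)
        if node not in net:
            problems.append(f"node {ip} is not on {subnet}")
        if node == addr:
            problems.append(f"{vip} is already assigned to node {ip}")
    if dhcp_pool is not None:
        first, last = (ipaddress.ip_address(a) for a in dhcp_pool)
        if first <= addr <= last:
            problems.append(f"{vip} is inside the DHCP pool")
    return problems


nodes = ["192.168.0.10", "192.168.0.11", "192.168.0.12"]
print(check_vip("192.168.0.15", nodes, "192.168.0.0/24", ("192.168.0.100", "192.168.0.200")))
```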
Configure your Talos Machines
The shared IP setting is only valid for controlplane nodes.
For the example above, each of the controlplane nodes should have the following Machine Config snippet:
If the machine has a single network interface, it can be selected using a dummy device selector:
machine:
  network:
    interfaces:
      - deviceSelector:
          physical: true # should select any hardware network device, if you have just one, it will be selected
        dhcp: true
        vip:
          ip: 192.168.0.15
Caveats
Since VIP functionality relies on etcd for elections, the shared IP will not come alive until after you have bootstrapped Kubernetes.
Don’t use the VIP as the endpoint in the talosconfig, as the VIP is bound to etcd and kube-apiserver health, and you will not be able to recover from a failure of either of those components using Talos API.
2.4.9 - Wireguard Network
A guide on how to set up Wireguard network using Kernel module.
Configuring Wireguard Network
Quick Start
The quickest way to try out WireGuard is to use the talosctl cluster create command:
It will automatically generate Wireguard network configuration for each node with the following network topology:
All controlplane nodes are used as WireGuard servers, listening on port 51111.
All controlplanes and workers connect to all controlplanes.
It also sets PersistentKeepalive to 5 seconds so that connections from controlplanes to workers can be established.
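The generated topology can be expressed as a per-node peer plan: every node peers with every controlplane other than itself. This is an illustrative model of the description above, not the code talosctl uses; wireguard_plan is a hypothetical name, and leaving workers without a fixed listen port is an assumption of the sketch.

```python
from typing import Dict, List, Optional

LISTEN_PORT = 51111  # controlplane listen port, from the text above
KEEPALIVE_S = 5      # PersistentKeepalive, from the text above


def wireguard_plan(controlplanes: List[str], workers: List[str]) -> Dict[str, dict]:
    """Per-node WireGuard settings for the topology described above (illustrative)."""
    plan: Dict[str, dict] = {}
    for node in controlplanes + workers:
        listen: Optional[int] = LISTEN_PORT if node in controlplanes else None
        plan[node] = {
            "listen_port": listen,  # None: this sketch assigns workers no fixed port
            "peers": [
                {"name": cp, "port": LISTEN_PORT, "keepalive": KEEPALIVE_S}
                for cp in controlplanes
                if cp != node
            ],
        }
    return plan


for node, cfg in wireguard_plan(["cp1", "cp2", "cp3"], ["w1"]).items():
    print(node, cfg["listen_port"], [p["name"] for p in cfg["peers"]])
```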
After the cluster is deployed it should be possible to verify Wireguard network connectivity.
It is possible to deploy a container with hostNetwork enabled, then run kubectl exec -it <pod> -- /bin/bash and either run:
ping 10.1.0.2
Or install the wireguard-tools package and run:
wg show
wg show should output something like this:
interface: wg0
  public key: OMhgEvNIaEN7zeCLijRh4c+0Hwh3erjknzdyvVlrkGM=
  private key: (hidden)
  listening port: 47946

peer: 1EsxUygZo8/URWs18tqB5FW2cLVlaTA+lUisKIf8nh4=
  endpoint: 10.5.0.2:51111
  allowed ips: 10.1.0.0/24
  latest handshake: 1 minute, 55 seconds ago
  transfer: 3.17 KiB received, 3.55 KiB sent
  persistent keepalive: every 5 seconds
It is also possible to use the generated configuration as a reference by pulling the generated config files using:
All Wireguard configuration can be done by changing Talos machine config files.
As an example, we will use the official WireGuard quick start tutorial.
Key Generation
This part is exactly the same:
wg genkey | tee privatekey | wg pubkey > publickey
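To show what wg genkey and wg pubkey compute, here is a pure-Python X25519 sketch (the RFC 7748 Montgomery ladder). It is for illustration only and is not constant-time, so keep using wg for real keys. The helper names genkey and pubkey are hypothetical; the sketch clamps the random private key, which does not change the derived public key because X25519 clamps the scalar anyway.

```python
import base64
import os

P = 2**255 - 19  # field prime for Curve25519
A24 = 121665     # (486662 - 2) / 4, per RFC 7748


def _clamp(k: bytes) -> int:
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def x25519(scalar: bytes, u: bytes) -> bytes:
    """RFC 7748 X25519 scalar multiplication (illustrative, not constant-time)."""
    k = _clamp(scalar)
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)  # mask the top bit
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        k_t = (k >> t) & 1
        swap ^= k_t
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = k_t
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def genkey() -> str:
    """Random 32-byte private key, clamped and base64-encoded (like `wg genkey`)."""
    return base64.b64encode(_clamp(os.urandom(32)).to_bytes(32, "little")).decode()


def pubkey(private_b64: str) -> str:
    """Public key = X25519(private, base point u=9), base64-encoded (like `wg pubkey`)."""
    priv = base64.b64decode(private_b64)
    return base64.b64encode(x25519(priv, (9).to_bytes(32, "little"))).decode()
```

Two peers that exchange only public keys arrive at the same shared secret, which is what lets WireGuard peers establish a session from the keys listed in each other's configs.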
Setting up Device
Inline comments show the relations between the config fields and the wg quick start tutorial commands:
...
  network:
    interfaces:
      ...
      # ip link add dev wg0 type wireguard
      - interface: wg0
        mtu: 1500
        # ip address add dev wg0 192.168.2.1/24
        addresses:
          - 192.168.2.1/24
        # wg set wg0 listen-port 51820 private-key /path/to/private-key peer ABCDEF... allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
        wireguard:
          privateKey: <privatekey file contents>
          listenPort: 51820
          peers:
            - allowedIPs:
                - 192.168.88.0/24
              endpoint: 209.202.254.14:8172
              publicKey: ABCDEF...
...
When networkd receives this configuration, it creates the device, configures it, and brings it up (equivalent to ip link set up dev wg0).
Talos Linux includes node-discovery capabilities that depend on a discovery registry.
This allows you to see the members of your cluster, and the associated IP addresses of the nodes.
talosctl get members
NODE       NAMESPACE   TYPE     ID                             VERSION   HOSTNAME                       MACHINE TYPE   OS               ADDRESSES
10.5.0.2   cluster     Member   talos-default-controlplane-1   1         talos-default-controlplane-1   controlplane   Talos (v1.2.3)   ["10.5.0.2"]
10.5.0.2   cluster     Member   talos-default-worker-1         1         talos-default-worker-1         worker         Talos (v1.2.3)   ["10.5.0.3"]
There are currently two supported discovery services: a Kubernetes registry (which stores data in the cluster’s etcd service) and an external registry service.
Sidero Labs runs a public external registry service, which is enabled by default.
The Kubernetes registry service is disabled by default.
The advantage of the external registry service is that it is not dependent on etcd, and thus can inform you of cluster membership even when Kubernetes is down.
Note: The Kubernetes registry is deprecated, as it is not compatible with Kubernetes 1.32 and later versions in the default configuration.
Video Walkthrough
To see a live demo of Cluster Discovery, see the video below:
Registries
Peers are aggregated from enabled registries.
By default, Talos will use the service registry, while the kubernetes registry is disabled.
To disable a registry, set disabled to true (this option is the same for all registries):
For example, to disable the service registry:
Note: Starting with Kubernetes 1.32, the feature gate AuthorizeNodeWithSelectors enables additional authorization for Node resource read access via system:node:* role.
This prevents Talos Kubernetes registry from functioning correctly.
The workaround is to disable the feature gate on the API server, but this is not recommended, as it also disables other important security protections.
For this reason, the Kubernetes registry is deprecated and disabled by default.
Discovery Service Registry
The Service registry by default uses a public external Discovery Service to exchange encrypted information about cluster members.
Note: Talos supports operation with the Discovery Service disabled, but some features will then rely on Kubernetes API availability to discover controlplane endpoints, so in case of a failure, a disabled Discovery Service makes troubleshooting much harder.
Sidero Labs maintains a public discovery service at https://discovery.talos.dev/ whereby cluster members use a shared key that is globally unique to coordinate basic connection information (i.e. the set of possible “endpoints”, or IP:port pairs).
We call this data “affiliate data.”
This data is encrypted by Talos Linux before being sent to the discovery service, and it can only be decrypted by the cluster members.
Note: If KubeSpan is enabled, the data also includes the WireGuard public key.
Data sent to the discovery service is encrypted with AES-GCM encryption and endpoint data is separately encrypted with AES in ECB mode so that endpoints coming from different sources can be deduplicated server-side.
Each node submits its own data, plus the endpoints it sees from other peers, to the discovery service.
The discovery service aggregates the data, deduplicates the endpoints, and sends updates to each connected peer.
Each peer receives information back from the discovery service, decrypts it and uses it to drive KubeSpan and cluster discovery.
Data is stored in memory only (and snapshotted to disk in encrypted form to facilitate quick recovery on restarts).
The cluster ID is used as a key to select the affiliates (so that different clusters see different affiliates).
To summarize, the discovery service knows the client version, cluster ID, the number of affiliates, some encrypted data for each affiliate, and a list of encrypted endpoints.
The discovery service doesn’t see actual node information – it only stores and updates encrypted blobs.
Discovery data is encrypted/decrypted by the clients – the cluster members.
The discovery service does not have the encryption key.
The discovery service may, with a commercial license, be operated by your organization and can be downloaded here.
In order for nodes to communicate to the discovery service, they must be able to connect to it on TCP port 443.
Resource Definitions
Talos provides resources that can be used to introspect the discovery and KubeSpan features.
Discovery
Identities
The node’s unique identity (base62 encoded random 32 bytes) can be obtained with:
Note: Using base62 allows the ID to be URL encoded without having to use the ambiguous URL-encoding version of base64.
$ talosctl get identities -o yaml
...
spec:
nodeId: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
Node identity is used as the unique Affiliate identifier.
The node identity resource is preserved in the STATE partition in the node-identity.yaml file.
Node identity is preserved across reboots and upgrades, but it is regenerated if the node is reset (wiped).
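Generating an identity of this shape (32 random bytes, base62-encoded) can be sketched as below. This is a plain big-integer base62 with an assumed alphabet ordering, and it may not match Talos's exact encoder: a big-integer base62 of 32 bytes never needs more than 43 characters (62^43 exceeds 2^256), yet some affiliate IDs shown in this guide have 44 characters, so Talos evidently uses a different scheme. The function names are hypothetical.

```python
import os

# Assumed alphabet ordering; Talos's encoder may use a different alphabet and scheme.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def b62encode(data: bytes) -> str:
    """Encode bytes as a big-endian integer written in base 62."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def b62decode(text: str, length: int) -> bytes:
    """Inverse of b62encode; `length` restores any leading zero bytes."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n.to_bytes(length, "big")


def new_node_id() -> str:
    """32 random bytes, base62-encoded, as a hypothetical node identity."""
    return b62encode(os.urandom(32))


print(new_node_id())
```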
Affiliates
An affiliate is a proposed member: the node has the same cluster ID and secret.
$ talosctl get affiliates
ID                                             VERSION   HOSTNAME                       MACHINE TYPE   ADDRESSES
2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF    2         talos-default-controlplane-2   controlplane   ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA   2         talos-default-worker-1         worker         ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB   2         talos-default-worker-2         worker         ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd    4         talos-default-controlplane-1   controlplane   ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C   2         talos-default-controlplane-3   controlplane   ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
The Affiliate whose ID matches the node identity is populated from the node's own data; the other Affiliates are pulled from the registries.
Enabled discovery registries run in parallel and discovered data is merged to build the list presented above.
Details about data coming from each registry can be queried from the cluster-raw namespace:
Each Affiliate ID is prefixed with k8s/ for data coming from the Kubernetes registry and with service/ for data coming from the discovery service.
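The merge described above can be sketched as grouping the raw entries by the ID that follows the k8s/ or service/ prefix. The merge policy here (union of addresses, first non-empty hostname, sources in sorted order) is an assumption for illustration, not the rule Talos applies, and merge_affiliates is a hypothetical name.

```python
from typing import Dict


def merge_affiliates(raw: Dict[str, dict]) -> Dict[str, dict]:
    """Merge per-registry affiliate records (IDs prefixed k8s/ or service/) by node ID."""
    merged: Dict[str, dict] = {}
    for raw_id, data in sorted(raw.items()):
        registry, _, node_id = raw_id.partition("/")
        entry = merged.setdefault(node_id, {"hostname": None, "addresses": [], "sources": []})
        entry["hostname"] = entry["hostname"] or data.get("hostname")
        for addr in data.get("addresses", []):
            if addr not in entry["addresses"]:  # de-duplicate while keeping order
                entry["addresses"].append(addr)
        entry["sources"].append(registry)
    return merged
```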
Members
A member is an affiliate that has been approved to join the cluster.
The members of the cluster can be obtained with:
$ talosctl get members
ID                             VERSION   HOSTNAME                       MACHINE TYPE   OS               ADDRESSES
talos-default-controlplane-1   2         talos-default-controlplane-1   controlplane   Talos (v1.9.0)   ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
talos-default-controlplane-2   1         talos-default-controlplane-2   controlplane   Talos (v1.9.0)   ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
talos-default-controlplane-3   1         talos-default-controlplane-3   controlplane   Talos (v1.9.0)   ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
talos-default-worker-1         1         talos-default-worker-1         worker         Talos (v1.9.0)   ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
talos-default-worker-2         1         talos-default-worker-2         worker         Talos (v1.9.0)   ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
2.6 - Interactive Dashboard
A tool to inspect the running Talos machine state on the physical video console.
The interactive dashboard is enabled for all Talos platforms except SBC images.
The dashboard can be disabled with kernel parameter talos.dashboard.disabled=1.
The dashboard runs only on the physical video console (not serial console) on the 2nd virtual TTY.
The first virtual TTY shows kernel logs, as in Talos versions before 1.4.0.
The virtual TTYs can be switched with <Alt+F1> and <Alt+F2> keys.
Keys <F1> - <Fn> can be used to switch between different screens of the dashboard.
The dashboard uses either the UEFI framebuffer or the VGA/VESA framebuffer (for legacy BIOS boot).
For legacy BIOS boot, the screen resolution can be controlled with the vga= kernel parameter.
Summary Screen (F1)
Interactive Dashboard Summary Screen
The header shows brief information about the node:
hostname
Talos version
uptime
CPU and memory hardware information
CPU and memory load, number of processes
Table view presents summary information about the machine:
UUID (from SMBIOS data)
Cluster name (when the machine config is available)
the leftmost section provides a way to enter network configuration: hostname, DNS and NTP servers, configure the network interface either via DHCP or static IP address, etc.
the middle section shows the current network configuration.
the rightmost section shows the network configuration which will be applied after pressing the “Save” button.
Once the platform network configuration is saved, it is immediately applied to the machine.
2.7 - Resetting a Machine
Steps on how to reset a Talos Linux machine to a clean state.
From time to time, it may be beneficial to reset a Talos machine to its “original” state.
Bear in mind that this is a destructive action for the given machine.
Doing this removes the machine from Kubernetes and etcd (if applicable) and clears any data on the machine that would normally persist across a reboot.
CLI
WARNING: Running talosctl reset on cloud VMs might result in the VM being unable to boot, as this wipes the entire disk.
It might be more useful to just wipe the STATE and EPHEMERAL partitions on a cloud VM if not booting via iPXE.
talosctl reset --system-labels-to-wipe STATE --system-labels-to-wipe EPHEMERAL
The API command for doing this is talosctl reset.
There are a couple of flags as part of this command:
Flags:
      --graceful                        if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
      --reboot                          if true, reboot the node after resetting instead of shutting down
      --system-labels-to-wipe strings   if set, just wipe selected system disk partitions by label but keep other partitions intact
The graceful flag is especially important when considering HA vs. non-HA Talos clusters.
If the machine is part of an HA cluster, a normal, graceful reset should work just fine right out of the box as long as the cluster is in a good state.
However, if this is a single-node cluster being used for testing purposes, a graceful reset is not an option, since etcd cannot be “left” if there is only a single member.
In this case, reset should be used with --graceful=false to skip performing checks that would normally block the reset.
Kernel Parameter
Another way to reset a machine is to specify talos.experimental.wipe=system kernel parameter.
If the machine is stuck in a boot loop and you have access to the console, you can use GRUB to specify this kernel argument.
The next time Talos boots, it will reset the system disk and reboot.
As a next step, you can install Talos either by PXE booting or by mounting an ISO.
2.8 - Upgrading Talos Linux
Guide to upgrading a Talos Linux machine.
OS upgrades are effected by an API call, which can be sent via the talosctl CLI utility.
The upgrade API call passes a node the installer image to use to perform the upgrade.
Each Talos version has a corresponding installer image, listed on the release page for the version, for example v1.9.0.
Upgrades use an A-B image scheme in order to facilitate rollbacks.
This scheme retains the previous Talos kernel and OS image following each upgrade.
If an upgrade fails to boot, Talos will roll back to the previous version.
Likewise, Talos may be manually rolled back via API (or talosctl rollback), which will update the boot reference and reboot.
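The A-B scheme can be modeled with a default boot slot plus a one-shot override, matching the boot-once-then-make-permanent sequence this guide describes for upgrades. BootState and the slot names "A" and "B" are hypothetical; the real bootloader state lives on disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootState:
    default: str                     # slot booted normally ("A" or "B")
    next_once: Optional[str] = None  # one-shot override for the next boot only

    def stage_upgrade(self, new_slot: str) -> None:
        """New image written to the other slot: boot it once."""
        self.next_once = new_slot

    def boot(self) -> str:
        slot = self.next_once or self.default
        self.next_once = None  # a one-shot entry is consumed by the boot attempt
        return slot

    def commit(self, slot: str) -> None:
        """The new version verified itself: make it the default."""
        self.default = slot

    def rollback(self, previous: str) -> None:
        """Point the boot reference back at the previous slot."""
        self.default = previous
```

If the new image fails before commit is called, the next reboot simply falls back to the old default slot, which is the automatic rollback described above.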
Unless explicitly told to preserve data, an upgrade will cause the node to wipe the EPHEMERAL partition, remove itself from the etcd cluster (if it is a controlplane node), and make itself as pristine as is possible.
(This is the desired behavior except in specialised use cases such as single-node clusters.)
Note: Since v1.0, an upgrade of the Talos Linux OS does not upgrade Kubernetes by default.
Kubernetes upgrades should be managed separately, as described in Upgrading Kubernetes.
Supported Upgrade Paths
Because Talos Linux is image based, an upgrade is almost the same as installing Talos, with the difference that the system has already been initialized with a configuration.
The supported configuration may change between versions.
The upgrade process should handle such changes transparently, but this migration is only tested between adjacent minor releases.
Thus the recommended upgrade path is to always upgrade to the latest patch release of all intermediate minor releases.
For example, if upgrading from Talos 1.0 to Talos 1.2.4, the recommended upgrade path would be:
upgrade from 1.0 to latest patch of 1.0 - to v1.0.6
upgrade from v1.0.6 to latest patch of 1.1 - to v1.1.2
upgrade from v1.1.2 to v1.2.4
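The recommended path can be computed from a table of the latest patch release of each minor version. The table below is hypothetical input taken from the example above; in practice the latest patches come from the release pages. The helper only handles upgrades within one major version.

```python
from typing import Dict, List, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """'v1.2.4' -> (1, 2, 4); '1.0' -> (1, 0, 0)."""
    parts = [int(p) for p in version.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def upgrade_path(current: str, target: str, latest_patch: Dict[str, str]) -> List[str]:
    """Step through the latest patch of every intermediate minor release, then the target."""
    cur, tgt = parse(current), parse(target)
    if cur[0] != tgt[0]:
        raise ValueError("this sketch only handles upgrades within one major version")
    path: List[str] = []
    for minor in range(cur[1], tgt[1]):
        step = latest_patch[f"{cur[0]}.{minor}"]  # hypothetical lookup table
        if parse(step) > cur:
            path.append(step)
            cur = parse(step)
    if tgt > cur:
        path.append(target)
    return path


print(upgrade_path("1.0", "v1.2.4", {"1.0": "v1.0.6", "1.1": "v1.1.2"}))
```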
Before Upgrade to v1.9.0
Talos 1.9 replaces eudev with systemd-udev as the udevd provider, which might lead to changes of the predictable network interface names.
Video Walkthrough
To see a live demo of an upgrade of Talos Linux, see the video below:
After Upgrade to v1.9.0
There are no specific actions to be taken after an upgrade.
talosctl upgrade
To upgrade a Talos node, specify the node’s IP address and the installer container image for the version of Talos to upgrade to.
For instance, if your Talos node has the IP address 10.20.30.40 and you want to install the current version, you would enter a command such as:
There is an option to this command: --preserve, which will explicitly tell Talos to keep ephemeral data intact.
In most cases, it is correct to let Talos perform its default action of erasing the ephemeral data.
However, for a single-node control plane, make sure to set --preserve=true.
Rarely, an upgrade command will fail due to a process holding a file open on disk.
In these cases, you can use the --stage flag.
This puts the upgrade artifacts on disk, and adds some metadata to a disk partition that gets checked very early in the boot process, then reboots the node.
On the reboot, Talos sees that it needs to apply an upgrade, and will do so immediately.
Because this occurs in a just rebooted system, there will be no conflict with any files being held open.
After the upgrade is applied, the node will reboot again, in order to boot into the new version.
Note that because Talos Linux reboots via the kexec syscall, the extra reboot adds very little time.
When a Talos node receives the upgrade command, it cordons itself in Kubernetes to avoid receiving any new workload.
It then starts to drain its existing workload.
NOTE: If any of your workloads are sensitive to being shut down ungracefully, be sure to use the lifecycle.preStop Pod spec.
Once all of the workload Pods are drained, Talos will start shutting down its internal processes.
If it is a control plane node, this will include etcd.
If preserve is not enabled, Talos will leave etcd membership.
(Talos ensures the etcd cluster is healthy and will remain healthy after our node leaves the etcd cluster, before allowing a control plane node to be upgraded.)
Once all the processes are stopped and the services are shut down, the filesystems will be unmounted.
This allows Talos to produce a very clean upgrade, as close as possible to a pristine system.
We verify the disk and then perform the actual image upgrade.
We set the bootloader to boot once with the new kernel and OS image, then we reboot.
After the node comes back up and Talos verifies itself, it will make the bootloader change permanent, rejoin the cluster, and finally uncordon itself to receive new workloads.
FAQs
Q. What happens if an upgrade fails?
A. Talos Linux attempts to safely handle upgrade failures.
The most common failure is an invalid installer image reference.
In this case, Talos will fail to download the upgraded image and will abort the upgrade.
Sometimes, Talos is unable to successfully kill off all of the disk access points, in which case it cannot safely unmount all filesystems to effect the upgrade.
In this case, it will abort the upgrade and reboot.
(upgrade --stage can ensure that upgrades can occur even when the filesystems cannot be unmounted.)
It is possible (especially with test builds) that the upgraded Talos system will fail to start.
In this case, the node will be rebooted, and the bootloader will automatically use the previous Talos kernel and image, thus effectively rolling back the upgrade.
Lastly, it is possible that Talos itself will upgrade successfully, start up, and rejoin the cluster but your workload will fail to run on it, for whatever reason.
This is when you would use the talosctl rollback command to revert back to the previous Talos version.
Q. Can upgrades be scheduled?
A. Because the upgrade sequence is API-driven, you can easily tie it in to your own business logic to schedule and coordinate your upgrades.
Q. Can the upgrade process be observed?
A. Yes, using the talosctl dmesg -f command.
You can also use talosctl upgrade --wait, and optionally talosctl upgrade --wait --debug to observe kernel logs.
Q. Are worker node upgrades handled differently from control plane node upgrades?
A. Short answer: no.
Long answer: Both node types follow the same procedure, and from the user’s standpoint the process is identical.
However, since control plane nodes run additional services, such as etcd, some extra steps and checks are performed on them.
For instance, Talos will refuse to upgrade a control plane node if that upgrade would cause a loss of quorum for etcd.
If multiple control plane nodes are asked to upgrade at the same time, Talos will protect the Kubernetes cluster by ensuring only one control plane node actively upgrades at any time, via checking etcd quorum.
If running a single-node cluster, and you want to force an upgrade despite the loss of quorum, you can set preserve to true.
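The quorum rule behind these checks is simple arithmetic: a cluster of n members needs floor(n/2) + 1 healthy members. The sketch below is a simplified model (it does not account for membership changes when a node leaves etcd), and the helper names are hypothetical.

```python
def quorum(members: int) -> int:
    """Minimum number of healthy members an etcd cluster of this size needs."""
    return members // 2 + 1


def can_take_down(members: int, healthy: int) -> bool:
    """True if one more member may go down without losing quorum."""
    return healthy - 1 >= quorum(members)


# Three control plane nodes: the first upgrade may proceed; while it is in progress
# only two members are healthy, so a second concurrent upgrade would be refused.
print(can_take_down(members=3, healthy=3), can_take_down(members=3, healthy=2))
```

A single-member cluster can never pass this check, which is why the single-node case above relies on preserve instead.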
Q. Can I break my cluster by upgrading everything at once?
A. Possibly - it’s not recommended.
Nothing prevents the user from sending near-simultaneous upgrades to each node of the cluster - and while Talos Linux and Kubernetes can generally deal with this situation, other components of the cluster may not be able to recover from more than one node rebooting at a time.
(e.g. any software that maintains a quorum or state across nodes, such as Rook/Ceph)
Q. Which version of talosctl should I use to update a cluster?
A. We recommend using the version that matches the current running version of the cluster.
3 - Kubernetes Guides
Management of a Kubernetes Cluster hosted by Talos Linux
3.1 - Configuration
How to configure components of the Kubernetes cluster itself.
3.1.1 - Ceph Storage cluster with Rook
Guide on how to create a simple Ceph storage cluster with Rook for Kubernetes
Preparation
Talos Linux reserves an entire disk for the OS installation, so machines with multiple available disks are needed for a reliable Ceph cluster with Rook and Talos Linux.
Rook requires that the block devices or partitions used by Ceph have no partitions or formatted filesystems before use.
Rook also requires a minimum Kubernetes version of v1.16 and Helm v3.0 for installation of charts.
It is highly recommended that the Rook Ceph overview is read and understood before deploying a Ceph cluster with Rook.
Installation
Creating a Ceph cluster with Rook requires two steps. First, install the Rook Operator, which can be done with a Helm chart.
The example below installs the Rook Operator into the rook-ceph namespace, which is the default for a Ceph cluster with Rook.
$ helm repo add rook-release https://charts.rook.io/release
"rook-release" has been added to your repositories
$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0327 17:52:44.277830 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0327 17:52:44.612243 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Sun Mar 27 17:52:42 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
  kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"
Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters
Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).
The default Pod Security configuration in Talos Linux prevents privileged pods from running.
Adding the pod-security.kubernetes.io/enforce=privileged label to the rook-ceph namespace allows Ceph to start.
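Here is a minimal sketch of the labeling step. It uses the namespace-level pod-security.kubernetes.io/enforce label, the same label this document later applies to the default namespace, wrapped in a small helper. The helper name is hypothetical.

```shell
# Relax the Pod Security Admission "enforce" level for one namespace so that
# privileged pods (such as the Ceph daemons) are admitted there.
allow_privileged_pods() {
  kubectl label namespace "$1" pod-security.kubernetes.io/enforce=privileged --overwrite
}
```

Run allow_privileged_pods rook-ceph after the operator chart has created the namespace and before installing the cluster chart. The --overwrite flag makes the command safe to repeat.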
Once that is complete, the Ceph cluster can be installed with the official Helm Chart.
The Chart can be installed with default values, which will attempt to use all nodes in the Kubernetes cluster, and all unused disks on each node for Ceph storage, and make available block storage, object storage, as well as a shared filesystem.
In practice, a more specific node/device/cluster configuration is generally used, and the Rook documentation explains all the available options in detail.
For this example the defaults will be adequate.
$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Sun Mar 27 18:12:46 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
kubectl --namespace rook-ceph get cephcluster
Visit https://rook.github.io/docs/rook/latest/ceph-cluster-crd.html for more information about the Ceph CRD.
Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`
Now that the Ceph cluster configuration has been created, the Rook operator needs time to install the Ceph cluster and bring all the components online.
You can follow the progress of the Ceph cluster state with the following command:
$ watch kubectl --namespace rook-ceph get cephcluster rook-ceph
Every 2.0s: kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 57s Progressing Configuring Ceph Mons
Depending on the size of the Ceph cluster and the availability of resources, the Ceph cluster should eventually become available. The storage classes that Kubernetes Persistent Volumes can use become available along with it.
$ kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 40m Ready Cluster created successfully HEALTH_OK
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 77m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 77m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 77m
Talos Linux Considerations
It is important to note that a Rook Ceph cluster saves cluster information directly onto the node (by default, dataDirHostPath is set to /var/lib/rook).
If you run only a single mon instance, cluster management is a little more involved. Any time a Talos Linux node is reconfigured or upgraded, the partition that stores the /var file system is wiped, unless you pass the --preserve option to talosctl upgrade.
By default, Rook configures Ceph with 3 mon instances, in which case the data stored in dataDirHostPath can be regenerated from the other mon instances.
So when performing maintenance on a Talos Linux node with a Rook Ceph cluster (e.g. upgrading the Talos Linux version), it is imperative that care be taken to maintain the health of the Ceph cluster.
Before upgrading, you should always check the health status of the Ceph cluster to ensure that it is healthy.
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 98m Ready Cluster created successfully HEALTH_OK
If it is, you can begin the upgrade process for the Talos Linux node, during which time the Ceph cluster will become unhealthy as the node is reconfigured.
Before performing any other action on the Talos Linux nodes, the Ceph cluster must return to a healthy status.
$ talosctl upgrade --nodes 172.20.15.5 --image ghcr.io/talos-systems/installer:v0.14.3
NODE ACK STARTED
172.20.15.5 Upgrade request received 2022-03-27 20:29:55.292432887 +0200 CEST m=+10.050399758
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 99m Progressing Configuring Ceph Mgr(s) HEALTH_WARN
$ kubectl --namespace rook-ceph wait --timeout=1800s --for=jsonpath='{.status.ceph.health}=HEALTH_OK' cephcluster/rook-ceph
cephcluster.ceph.rook.io/rook-ceph condition met
The above steps need to be performed for each Talos Linux node undergoing maintenance, one at a time.
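These steps can be scripted so that several nodes are upgraded without skipping the health check. This is a sketch under these assumptions:

- The CephCluster is named rook-ceph in the rook-ceph namespace, as in this guide.
- talosctl and kubectl are already configured for the cluster.
- A fixed settle delay is long enough for Ceph to register the reboot. The 60 seconds used here is arbitrary.

Add --preserve to the talosctl upgrade call if you run a single mon.

```shell
# Namespace and CephCluster name as used in this guide; override if yours differ.
CEPH_NS="${CEPH_NS:-rook-ceph}"
CEPH_CLUSTER="${CEPH_CLUSTER:-rook-ceph}"

# Blocks until the CephCluster reports HEALTH_OK (or the 30-minute timeout expires).
wait_ceph_healthy() {
  kubectl --namespace "$CEPH_NS" wait --timeout=1800s \
    --for=jsonpath='{.status.ceph.health}=HEALTH_OK' "cephcluster/$CEPH_CLUSTER"
}

# Usage: upgrade_nodes_sequentially IMAGE NODE...
upgrade_nodes_sequentially() {
  image="$1"; shift
  for node in "$@"; do
    # Never start on the next node while Ceph is still degraded.
    if ! wait_ceph_healthy; then
      echo "Ceph is not healthy; stopping before $node" >&2
      return 1
    fi
    talosctl upgrade --nodes "$node" --image "$image" --wait || return 1
    # Give Ceph time to notice the reboot before checking health again
    # (the 60-second default is an arbitrary assumption).
    sleep "${SETTLE_SECONDS:-60}"
  done
  # Final check after the last node.
  wait_ceph_healthy
}
```

For example, upgrade_nodes_sequentially ghcr.io/siderolabs/installer:<version> 172.20.15.5 172.20.15.6 upgrades the two nodes one after the other. It stops at the first node whose upgrade fails or whose health check times out.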
Cleaning Up
Rook Ceph Cluster Removal
Removing a Rook Ceph cluster requires a few steps, starting with signalling to Rook that the Ceph cluster is really being destroyed.
Then all Persistent Volumes (and Claims) backed by the Ceph cluster must be deleted, followed by the Storage Classes and the Ceph storage types.
If the Rook Operator is cleanly removed following the above process, the node metadata and disks should be clean and ready to be reused.
After an unclean cluster removal, some metadata may still be stored on the system disk, and partition information may remain on the storage disks.
First, remove the node metadata. Set nodeName to the actual name of a storage node that needs cleaning, and set path to the dataDirHostPath value used when installing the chart.
Repeat the following for each node used in the Rook Ceph cluster.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: disk-clean
spec:
  restartPolicy: Never
  nodeName: <storage-node-name>
  volumes:
  - name: rook-data-dir
    hostPath:
      path: <dataDirHostPath>
  containers:
  - name: disk-clean
    image: busybox
    securityContext:
      privileged: true
    volumeMounts:
    - name: rook-data-dir
      mountPath: /node/rook-data
    command: ["/bin/sh", "-c", "rm -rf /node/rook-data/*"]
EOF
pod/disk-clean created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-clean
pod/disk-clean condition met
$ kubectl delete pod disk-clean
pod "disk-clean" deleted
Lastly, the partition and filesystem data on the disks themselves must be wiped before they can be reused.
Again, repeat the following for each node and disk used in the Rook Ceph cluster, updating nodeName and of= in the command as needed.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: disk-wipe
spec:
  restartPolicy: Never
  nodeName: <storage-node-name>
  containers:
  - name: disk-wipe
    image: busybox
    securityContext:
      privileged: true
    command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=<device>"]
EOF
pod/disk-wipe created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-wipe
pod/disk-wipe condition met
$ kubectl delete pod disk-wipe
pod "disk-wipe" deleted
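When several disks on several nodes need wiping, writing each manifest by hand is error-prone. The sketch below generates the same disk-wipe pod shown above. It assumes lowercase node names, because pod names must be valid DNS names. It also uses a hypothetical disk-wipe-<node>-<device> naming scheme so that pods for different disks don't collide.

```shell
# Prints a one-shot privileged pod that zeroes the first 100 MiB of DEVICE on NODE.
# Usage: disk_wipe_manifest NODE DEVICE
disk_wipe_manifest() {
  node="$1"; device="$2"
  # e.g. node01 + /dev/sdb -> disk-wipe-node01-sdb (hypothetical naming)
  name="disk-wipe-${node}-$(basename "$device")"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  nodeName: ${node}
  containers:
  - name: disk-wipe
    image: busybox
    securityContext:
      privileged: true
    command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=${device}"]
EOF
}
```

For example: for dev in /dev/sdb /dev/sdc; do disk_wipe_manifest node01 "$dev" | kubectl apply -f -; done. Double-check every device path first, because dd overwrites whatever it is pointed at.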
3.1.2 - Deploying Metrics Server
In this guide you will learn how to set up metrics-server.
Metrics Server enables use of the Horizontal Pod Autoscaler and Vertical Pod Autoscaler.
It does this by gathering metrics data from the kubelets in a cluster.
By default, the certificates in use by the kubelets will not be recognized by metrics-server.
This can be solved by either configuring metrics-server to do no validation of the TLS certificates, or by modifying the kubelet configuration to rotate its certificates and use ones that will be recognized by metrics-server.
Node Configuration
To enable kubelet certificate rotation, all nodes should have the following Machine Config snippet:
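The snippet itself is not reproduced here. The sketch below writes it as a patch file. It assumes the kubelet's rotate-server-certificates flag is passed through machine.kubelet.extraArgs. The helper and file names are hypothetical.

```shell
# Writes a machine config patch that turns on kubelet serving-certificate rotation.
# Usage: write_kubelet_rotation_patch FILE
write_kubelet_rotation_patch() {
  cat > "$1" <<'EOF'
machine:
  kubelet:
    extraArgs:
      rotate-server-certificates: true
EOF
}
```

Apply it to each node with talosctl patch machineconfig --nodes <node-ip> --patch @kubelet-rotation.yaml. For new clusters, pass --config-patch @kubelet-rotation.yaml to talosctl gen config instead.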
We will want to ensure that new certificates for the kubelets are approved automatically.
This can easily be done with the Kubelet Serving Certificate Approver, which will automatically approve the Certificate Signing Requests generated by the kubelets.
We can have Kubelet Serving Certificate Approver and metrics-server installed on the cluster automatically during bootstrap by adding the following snippet to the Cluster Config of the node that will be handling the bootstrap process:
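Here is a sketch of such a snippet, using cluster.extraManifests. The manifest URLs below are assumed upstream locations, so verify them. For production, prefer pinned release URLs over main/latest.

```shell
# Writes a cluster config patch that installs both components at bootstrap.
# Usage: write_bootstrap_manifests_patch FILE
write_bootstrap_manifests_patch() {
  cat > "$1" <<'EOF'
cluster:
  extraManifests:
    - https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
    - https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
EOF
}
```

For a new cluster, pass the file as --config-patch-control-plane @bootstrap-manifests.yaml to talosctl gen config. That places it in the control plane configuration used for bootstrap.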
If you choose not to use extraManifests to install Kubelet Serving Certificate Approver and metrics-server during bootstrap, you can install them once the cluster is online using kubectl:
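The sketch below applies the upstream manifests directly with kubectl apply. The URLs are assumptions, so check each project's install instructions before use.

```shell
# Assumed upstream manifest locations; pin released versions for production use.
APPROVER_URL="https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml"
METRICS_SERVER_URL="https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"

install_metrics_stack() {
  # The approver goes first so that pending kubelet CSRs can be approved
  # before metrics-server starts scraping the kubelets.
  kubectl apply -f "$APPROVER_URL" && kubectl apply -f "$METRICS_SERVER_URL"
}
```

Once metrics-server has scraped the kubelets, kubectl top nodes should start returning data.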
In this guide you will learn how to expose host devices to the Kubernetes pods.
Kubernetes Device Plugins can be used to expose host devices to the Kubernetes pods.
This guide will show you how to deploy a device plugin to your Talos cluster.
In this guide, we will use Kubernetes Generic Device Plugin, but there are other implementations available.
Deploying the Device Plugin
The Kubernetes Generic Device Plugin is a DaemonSet that runs on each node in the cluster, exposing the devices to the pods.
The device plugin is configured with a list of devices to expose, e.g.
--device='{"name": "video", "groups": [{"paths": [{"path": "/dev/video0"}]}]}'.
In this guide, we will demonstrate how to deploy the device plugin with a configuration that exposes the /dev/net/tun device.
This device is commonly used for user-space WireGuard, including Tailscale.
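The deployment manifest is not reproduced here. The --device value for /dev/net/tun follows the same shape as the /dev/video0 example above. The hypothetical helper below only renders that JSON.

```shell
# Renders the JSON value for the plugin's --device flag: one device name with
# one group containing one host path.
# $1 = name (advertised as squat.ai/<name>), $2 = device path on the host.
device_json() {
  printf '{"name": "%s", "groups": [{"paths": [{"path": "%s"}]}]}\n' "$1" "$2"
}
```

The DaemonSet's container args would then include --device='{"name": "tun", "groups": [{"paths": [{"path": "/dev/net/tun"}]}]}'. The upstream plugin may offer further group options, so check its README rather than relying on this minimal form.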
Once the device plugin is deployed, you can verify that the nodes have a new resource: squat.ai/tun (the tun name comes from the name given in the device plugin configuration):
Now that the device plugin is deployed, you can deploy a pod that requests the device.
The request for the device is specified as a resource in the pod spec.
resources:
  requests:
    squat.ai/tun: "1"
  limits:
    squat.ai/tun: "1"
Here is an example non-privileged pod spec that requests the /dev/net/tun device:
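The pod spec itself is not reproduced here. The sketch below uses a hypothetical pod name and a busybox command that only lists the device. It requests one squat.ai/tun device and sets no privileged securityContext.

```shell
# Prints a non-privileged pod that asks the device plugin for one tun device.
# Usage: tun_pod_manifest [POD_NAME]
tun_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${1:-tun-test}
spec:
  restartPolicy: Never
  containers:
  - name: tun-test
    image: busybox
    # Lists the device node injected by the plugin, then exits.
    command: ["/bin/sh", "-c", "ls -l /dev/net/tun"]
    resources:
      requests:
        squat.ai/tun: "1"
      limits:
        squat.ai/tun: "1"
EOF
}
```

tun_pod_manifest | kubectl apply -f - should schedule the pod only on nodes that advertise the squat.ai/tun resource.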
Automatically provision iSCSI volumes on a Synology NAS with the synology-csi driver.
Background
Synology is a company that specializes in Network Attached Storage (NAS) devices.
They provide a number of features within a simple web OS, including an LDAP server and Docker support. Perhaps most relevant to this guide, a Synology NAS can function as an iSCSI host.
The focus of this guide is to allow a Kubernetes cluster running on Talos to provision Kubernetes storage (both dynamic and static) on a Synology NAS using a direct integration, rather than relying on an intermediary layer like Rook/Ceph or Mayastor.
This guide assumes a very basic familiarity with iSCSI terminology (LUN, iSCSI target, etc.).
Prerequisites
Synology NAS running DSM 7.0 or above
Provisioned Talos cluster running Kubernetes v1.20 or above
The synology-csi controller interacts with your NAS in two different ways: via the API and via the iSCSI protocol.
Actions such as creating a new iSCSI target or deleting an old one are accomplished via the Synology API, and require administrator access.
On the other hand, mounting the disk to a pod and reading from / writing to it will utilize iSCSI.
Because only one account can be configured per DSM, that account needs to have admin privileges.
To limit the damage if these credentials are compromised, configure the account with the least possible amount of access: explicitly specify "No Access" on all volumes when configuring the user permissions.
Setting up the Synology CSI
Note: this guide is paraphrased from the Synology CSI readme.
Please consult the readme for more in-depth instructions and explanations.
While Synology provides some automated scripts to deploy the CSI driver, they can be finicky, especially when making changes to the source code.
We will be configuring and deploying things manually in this guide.
The relevant files we will be touching are in the following locations:
Use config/client-info-template.yml as an example to configure the connection information for DSM.
You can specify one or more storage systems on which the CSI volumes will be created.
See below for an example:
---
clients:
  - host: 192.168.1.1      # ipv4 address or domain of the DSM
    port: 5000             # port for connecting to the DSM
    https: false           # set this true to use https. you need to specify the port to DSM HTTPS port as well
    username: username     # username
    password: password     # password
Create a Kubernetes secret using the client information config file.
If you name the secret something other than client-info-secret, make sure you update the corresponding references in the deployment manifests as well.
Build the Talos-compatible image
Modify the Makefile so that the image is built and tagged under your GitHub Container Registry username:
REGISTRY_NAME=ghcr.io/<username>
When you run make docker-build or make docker-build-multiarch, it will push the resulting image to ghcr.io/<username>/synology-csi:v1.1.0.
Ensure that you find and change any reference to synology/synology-csi:v1.1.0 to point to your newly-pushed image within the deployment manifests.
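Doing this replacement by hand is easy to get wrong across several files. The sketch below rewrites every occurrence to your ghcr.io image. It assumes the manifests live in a directory you pass in and that the old reference appears literally as synology/synology-csi:v1.1.0.

```shell
# Replaces synology/synology-csi:v1.1.0 with ghcr.io/<username>/synology-csi:v1.1.0
# in every file under DIR that mentions it.
# Usage: retarget_image DIR USERNAME
retarget_image() {
  dir="$1"; user="$2"
  grep -rl 'synology/synology-csi:v1.1.0' "$dir" | while IFS= read -r f; do
    # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak "s#synology/synology-csi:v1.1.0#ghcr.io/${user}/synology-csi:v1.1.0#g" "$f" && rm -f "$f.bak"
  done
}
```

Run retarget_image <path-to-deployment-manifests> <username>. Afterwards, grep -r synology/synology-csi on the same directory should return nothing.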
Configure the CSI driver
By default, the deployment manifests include one storage class and one volume snapshot class.
See below for examples:
It can be useful to configure multiple different StorageClasses.
For example, a popular strategy is to create two nearly identical StorageClasses, with one configured with reclaimPolicy: Retain and the other with reclaimPolicy: Delete.
Alternatively, a workload may require a specific filesystem, such as ext4.
If a Synology NAS is going to be the most common way to configure storage on your cluster, it can be convenient to add the storageclass.kubernetes.io/is-default-class: "true" annotation to one of your StorageClasses.
The following table details the configurable parameters for the Synology StorageClass.
| Name | Type | Description | Default | Supported protocols |
| --- | --- | --- | --- | --- |
| dsm | string | The IPv4 address of your DSM, which must be included in the client-info.yml for the CSI driver to log in to DSM | - | iSCSI, SMB |
| location | string | The location (/volume1, /volume2, …) on DSM where the LUN for PersistentVolume will be created | - | iSCSI, SMB |
| fsType | string | The file system used to format PersistentVolumes when you mount them on the pods. This parameter only works with iSCSI. For SMB, the fsType is always 'cifs'. | ext4 | iSCSI |
| protocol | string | The backing storage protocol. Enter 'iscsi' to create LUNs or 'smb' to create shared folders on DSM. | iscsi | iSCSI, SMB |
| csi.storage.k8s.io/node-stage-secret-name | string | The name of node-stage-secret. Required if DSM shared folder is accessed via SMB. | - | SMB |
| csi.storage.k8s.io/node-stage-secret-namespace | string | The namespace of node-stage-secret. Required if DSM shared folder is accessed via SMB. | - | SMB |
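Here is a sketch of a Retain/Delete pair of StorageClasses. The hypothetical helper below prints an iSCSI StorageClass from the parameters in the table above. The provisioner name csi.san.synology.com is an assumption, so confirm it against the storage class in the deployment manifests.

```shell
# Prints an iSCSI StorageClass for the Synology CSI driver.
# Usage: synology_storage_class NAME RECLAIM_POLICY DSM_IP LOCATION [FSTYPE]
synology_storage_class() {
  name="$1"; reclaim="$2"; dsm="$3"; location="$4"; fstype="${5:-ext4}"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: csi.san.synology.com
parameters:
  dsm: "${dsm}"
  location: "${location}"
  fsType: "${fstype}"
  protocol: "iscsi"
reclaimPolicy: ${reclaim}
EOF
}
```

The following two commands produce the pair:

- synology_storage_class synology-iscsi-retain Retain 192.168.1.1 /volume1 > sc-retain.yml
- synology_storage_class synology-iscsi-delete Delete 192.168.1.1 /volume1 > sc-delete.yml

The dsm value must match a host listed in client-info.yml.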
The VolumeSnapshotClass can be similarly configured with the following parameters:
| Name | Type | Description | Default | Supported protocols |
| --- | --- | --- | --- | --- |
| description | string | The description of the snapshot on DSM | - | iSCSI |
| is_locked | string | Whether you want to lock the snapshot on DSM | false | iSCSI, SMB |
Apply YAML manifests
Once you have created the desired StorageClass(es) and VolumeSnapshotClass(es), the final step is to apply the Kubernetes manifests against the cluster.
The easiest way to apply them all at once is to create a kustomization.yaml file in the same directory as the manifests and use Kustomize to apply:
kubectl apply -k path/to/manifest/directory
Alternatively, you can apply each manifest one by one:
kubectl apply -f <file>
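The kustomization.yaml only needs to list the manifests as resources. The sketch below assumes every .yml/.yaml file in the directory should be applied. The helper name is hypothetical.

```shell
# Writes DIR/kustomization.yaml listing every .yml/.yaml manifest in DIR.
# Usage: write_kustomization DIR
write_kustomization() {
  dir="$1"
  {
    echo "apiVersion: kustomize.config.k8s.io/v1beta1"
    echo "kind: Kustomization"
    echo "resources:"
    for f in "$dir"/*.yml "$dir"/*.yaml; do
      [ -e "$f" ] || continue            # pattern matched nothing
      name=$(basename "$f")
      # The redirection below creates kustomization.yaml first, so skip it.
      [ "$name" = "kustomization.yaml" ] && continue
      echo "  - $name"
    done
  } > "$dir/kustomization.yaml"
}
```

Run write_kustomization path/to/manifest/directory and review the generated list. Then apply everything with kubectl apply -k path/to/manifest/directory.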
Run performance tests
In order to test the provisioning, mounting, and performance of using a Synology NAS as Kubernetes persistent storage, use the following command:
Kubernetes pods running in CNI mode can use the kubernetes.default.svc service endpoint to access the Kubernetes API server, while pods running in host networking mode can only use the external cluster endpoint.
Kubernetes controlplane components run in host networking mode, and it is critical for them to be able to access the Kubernetes API server. The same applies to CNI components when the CNI requires access to the Kubernetes API.
The external cluster endpoint might be unavailable due to misconfiguration or network issues, or it might have higher latency than the internal endpoint.
A failure to access the Kubernetes API server can cause a series of issues in the cluster: pods are not scheduled, service IPs stop working, etc.
The KubePrism feature solves this problem by providing an in-cluster, highly available controlplane endpoint on every node in the cluster.
Video Walkthrough
To see a live demo of this writeup, see the video below:
Enabling KubePrism
As of Talos 1.6, KubePrism is enabled by default with port 7445.
Note: the port specified should be available on every node in the cluster.
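KubePrism is not enabled by default before Talos 1.6. On those releases, or to choose a different port, it can be enabled through machine.features.kubePrism. The sketch below writes such a patch, using 7445 as the default and any other port of your choosing.

```shell
# Writes a machine config patch that enables KubePrism on the given port.
# Usage: write_kubeprism_patch FILE [PORT]
write_kubeprism_patch() {
  cat > "$1" <<EOF
machine:
  features:
    kubePrism:
      enabled: true
      port: ${2:-7445}
EOF
}
```

Apply it with talosctl patch machineconfig --nodes <node-ip> --patch @kubeprism.yaml on every node. The chosen port must be free on all of them.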
How it works
On every machine, Talos runs a TCP load balancer that listens on localhost on the specified port and automatically selects one of the following endpoints:
the external cluster endpoint as specified in the machine configuration
for controlplane machines: https://localhost:<api-server-local-port> (https://localhost:6443 in the default configuration)
https://<controlplane-address>:<api-server-port> for every controlplane machine (based on the information from Cluster Discovery)
KubePrism automatically filters out unhealthy (or unreachable) endpoints, and prefers lower-latency endpoints over higher-latency endpoints.
Talos automatically reconfigures kubelet, kube-scheduler and kube-controller-manager to use the KubePrism endpoint.
The kube-proxy manifest is also reconfigured to use the KubePrism endpoint by default. When enabling KubePrism for a running cluster, however, the manifest should be updated with the talosctl upgrade-k8s command.
When using CNI components that require access to the Kubernetes API server, the KubePrism endpoint should be passed to the CNI configuration (e.g. Cilium, Calico CNIs).
Notes
As the list of endpoints for KubePrism includes the external cluster endpoint, KubePrism in the worst case scenario will behave the same as the external cluster endpoint.
For controlplane nodes, KubePrism should pick the localhost endpoint of the kube-apiserver, minimizing latency.
Worker nodes might use the direct address of a controlplane node if its latency is lower than that of the external cluster endpoint.
The KubePrism listen endpoint is bound to the localhost address, so it can only be reached from the node itself (host processes and host-network pods), not from outside the cluster.
3.1.6 - Local Storage
Using local storage for Kubernetes workloads.
Using local storage for Kubernetes workloads implies that the pod will be bound to the node where the local storage is available.
Local storage is not replicated, so if a machine fails, the contents of its local storage are lost.
Note: when using the EPHEMERAL Talos partition (/var), make sure to set --preserve when performing upgrades, otherwise you risk losing data.
hostPath mounts
The simplest way to use local storage is to use hostPath mounts.
When using hostPath mounts, make sure the root directory of the mount is mounted into the kubelet container:
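The snippet is not reproduced here. The sketch below writes the patch using the kubelet extraMounts mechanism with bind/rshared/rw options. The /var/mnt root in the usage example is hypothetical.

```shell
# Writes a patch that bind-mounts HOST_PATH into the kubelet container so that
# hostPath volumes under it resolve to the node's directory.
# Usage: write_kubelet_mount_patch FILE HOST_PATH
write_kubelet_mount_patch() {
  file="$1"; path="$2"
  cat > "$file" <<EOF
machine:
  kubelet:
    extraMounts:
      - destination: ${path}
        type: bind
        source: ${path}
        options:
          - bind
          - rshared
          - rw
EOF
}
```

For example, write_kubelet_mount_patch hostpath-mount.yaml /var/mnt writes the patch. Apply it with talosctl patch machineconfig --nodes <node-ip> --patch @hostpath-mount.yaml. hostPath volumes should then point at paths under /var/mnt.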
Both EPHEMERAL partition and user disks can be used for hostPath mounts.
Local Path Provisioner
Local Path Provisioner can be used to dynamically provision local storage.
Make sure to update its configuration to use a path under /var, e.g. /var/local-path-provisioner as the root path for the local storage.
(On Talos Linux, the default Local Path Provisioner path /opt/local-path-provisioner is read-only.)
For example, Local Path Provisioner can be installed using kustomize with the following configuration:
Put kustomization.yaml into a new directory, and run kustomize build | kubectl apply -f - to install Local Path Provisioner to a Talos Linux cluster.
There are three patches applied:
change default /opt/local-path-provisioner path to /var/local-path-provisioner
make local-path storage class the default storage class (optional)
label the local-path-storage namespace as privileged to allow privileged pods to be scheduled there
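The kustomization.yaml is not reproduced here. The sketch below writes one with the three patches listed above. It assumes these upstream local-path-provisioner object names:

- the local-path-config ConfigMap
- the local-path StorageClass
- the local-path-storage namespace

It also takes a release tag that you choose.

```shell
# Writes DIR/kustomization.yaml with the three patches described above.
# Usage: write_lpp_kustomization DIR RELEASE_TAG
write_lpp_kustomization() {
  dir="$1"; ref="$2"
  mkdir -p "$dir"
  cat > "$dir/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- github.com/rancher/local-path-provisioner/deploy?ref=${ref}
patches:
# 1. move the storage root under /var (the default /opt path is read-only on Talos)
- patch: |-
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: local-path-config
      namespace: local-path-storage
    data:
      config.json: |-
        {
          "nodePathMap": [
            {
              "node": "DEFAULT_PATH_FOR_NON_LISTED_NODES",
              "paths": ["/var/local-path-provisioner"]
            }
          ]
        }
# 2. (optional) make local-path the default StorageClass
- patch: |-
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-path
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
# 3. allow privileged pods in the provisioner's namespace
- patch: |-
    apiVersion: v1
    kind: Namespace
    metadata:
      name: local-path-storage
      labels:
        pod-security.kubernetes.io/enforce: privileged
EOF
}
```

Run write_lpp_kustomization lpp <release-tag>, then cd lpp && kustomize build | kubectl apply -f -. Drop the second patch if you do not want local-path as the default StorageClass.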
As with hostPath mounts (see above), this requires the kubelet to bind mount the node folder you chose (e.g. /var/local-path-provisioner).
Otherwise, you’ll have erratic behavior, especially when using the subPath statement in a volumeMount, which may lead to data loss and/or data never freed after PV deletion.
Enabling Pod Security Admission plugin to configure Pod Security Standards.
Kubernetes deprecated Pod Security Policy as of v1.21, and it was removed in v1.25.
Pod Security Policy was replaced with Pod Security Admission, which is enabled by default starting with Kubernetes v1.23.
By default, Talos Linux enables and configures the Pod Security Admission plugin to enforce Pod Security Standards. The baseline profile is enforced by default, except for the kube-system namespace, which enforces the privileged profile.
Some applications (e.g. Prometheus node exporter or storage solutions) require more relaxed Pod Security Standards. These can be configured either by updating the Pod Security Admission plugin configuration or by using the pod-security.kubernetes.io/enforce label at the namespace level.
With the default configuration, the stricter restricted profile is not enforced, but the API server warns about any violations of it that it finds.
This default policy can be modified by updating the generated machine configuration before the cluster is created or on the fly by using the talosctl CLI utility.
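Here is a sketch of such a modification. It assumes the PodSecurity entry lives under cluster.apiServer.admissionControl and uses the upstream PodSecurityConfiguration format (v1, Kubernetes 1.25 or later). The helper writes a patch with a configurable default enforce level and keeps the kube-system exemption.

```shell
# Writes a cluster config patch for the PodSecurity admission plugin.
# Usage: write_psa_patch FILE [ENFORCE_LEVEL]   (privileged, baseline or restricted)
write_psa_patch() {
  cat > "$1" <<EOF
cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1
          kind: PodSecurityConfiguration
          defaults:
            enforce: "${2:-baseline}"
            enforce-version: "latest"
            audit: "restricted"
            audit-version: "latest"
            warn: "restricted"
            warn-version: "latest"
          exemptions:
            namespaces:
              - kube-system
            runtimeClasses: []
            usernames: []
EOF
}
```

For a new cluster, run write_psa_patch psa.yaml restricted and pass --config-patch-control-plane @psa.yaml to talosctl gen config. The setting only matters on control plane nodes, where kube-apiserver runs.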
Verify current admission plugin configuration with:
Create a deployment that satisfies the baseline policy but gives warnings on restricted policy:
$ kubectl create deployment nginx --image=nginx
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85b98978db-j68l8 1/1 Running 0 2m3s
Create a daemonset which fails to meet requirements of the baseline policy:
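The contents of debug.yaml are not shown. The hypothetical DaemonSet below would trip the same baseline checks, namely host namespaces and a privileged container.

```shell
# Prints a DaemonSet that violates the baseline profile on purpose.
debug_daemonset_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: debug-container
spec:
  selector:
    matchLabels:
      app: debug-container
  template:
    metadata:
      labels:
        app: debug-container
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      containers:
      - name: debug-container
        image: alpine
        command: ["sleep", "86400"]
        securityContext:
          privileged: true
EOF
}
```

debug_daemonset_manifest > debug.yaml produces the file used in the next command.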
$ kubectl apply -f debug.yaml
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "debug-container" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "debug-container" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "debug-container" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "debug-container" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/debug-container created
Daemonset debug-container gets created, but no pods are scheduled:
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 0 0 0 0 0 <none> 34s
Pod Security Admission plugin errors are in the daemonset events:
$ kubectl describe ds debug-container
...
Warning FailedCreate 92s daemonset-controller Error creating: pods "debug-container-kwzdj" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true)
Pod Security Admission configuration can also be overridden on a namespace level:
$ kubectl label ns default pod-security.kubernetes.io/enforce=privileged
namespace/default labeled
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 2 2 0 2 0 <none> 4s
Because the enforce policy for the default namespace was updated to privileged, the debug-container pods are now created and scheduled successfully.
3.1.8 - Replicated Local Storage
Using local storage with OpenEBS
If you want to use replicated storage leveraging disk space from a local disk with Talos Linux installed, OpenEBS is a great option.
Since OpenEBS is replicated storage, it's recommended to have at least three nodes with sufficient local disk space available.
This guide installs OpenEBS via the official Helm chart.
Since Talos is different from standard operating systems, the OpenEBS components need a little tweaking after the Helm installation.
Refer to the OpenEBS documentation if you need further customization.
NB: Also note that the Talos nodes need to be upgraded with --preserve set while running OpenEBS, otherwise you risk losing data.
Even though it’s possible to recover data from other replicas if the node is wiped during an upgrade, this can require extra operational knowledge to recover, so it’s highly recommended to use --preserve to avoid data loss.
Preparing the nodes
Depending on the version of OpenEBS, there is a hostPath mount with the path /var/openebs/local or /var/local/openebs.
This path should be mounted into the kubelet to make sure kubelet can access the directory.
Note: Replace the path in the YAML snippet below with the correct path for your OpenEBS version.
Create a machine config patch with the contents below and save as patch.yaml
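The patch is not reproduced here. The sketch below writes patch.yaml using the kubelet extraMounts bind-mount approach with bind/rshared/rw options, then applies it to each node you list. The helper name and default path are assumptions.

```shell
# Use /var/openebs/local or /var/local/openebs depending on your OpenEBS version.
OPENEBS_PATH="${OPENEBS_PATH:-/var/openebs/local}"

# Writes patch.yaml in the current directory and applies it to every node given.
# Usage: patch_nodes_for_openebs NODE...
patch_nodes_for_openebs() {
  cat > patch.yaml <<EOF
machine:
  kubelet:
    extraMounts:
      - destination: ${OPENEBS_PATH}
        type: bind
        source: ${OPENEBS_PATH}
        options:
          - bind
          - rshared
          - rw
EOF
  for node in "$@"; do
    talosctl patch machineconfig --nodes "$node" --patch @patch.yaml || return 1
  done
}
```

For the other path variant, run OPENEBS_PATH=/var/local/openebs patch_nodes_for_openebs 10.5.0.2 10.5.0.3.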
This will create 4 storage classes.
The storage class named openebs-hostpath is used to create node-local hostPath volumes, which are not replicated.
The storage class named openebs-single-replica is used to create Mayastor volumes with a single replica.
The other 2 storageclasses, mayastor-etcd-localpv and mayastor-loki-localpv, are used by OpenEBS to create persistent volumes on nodes.
Patching the Namespace
When using the default Pod Security Admission configuration created by Talos, you need the following labels on your namespace:
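The labels are not reproduced here. The sketch below prints the namespace manifest. It assumes all three Pod Security Admission modes are relaxed to privileged and that the chart is installed into a namespace called openebs, so adjust the name to yours.

```shell
# Prints a Namespace whose audit, enforce and warn modes are all privileged.
# Usage: openebs_namespace_manifest [NAMESPACE]
openebs_namespace_manifest() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${1:-openebs}
  labels:
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
EOF
}
```

openebs_namespace_manifest | kubectl apply -f - either labels an existing namespace or creates it before the Helm install.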
Using custom Seccomp Profiles with Kubernetes workloads.
Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12.
It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel.
You can clean up the test resources by running the following command:
kubectl delete pod audit-pod
3.1.10 - Storage
Setting up storage for a Kubernetes cluster
In Kubernetes, using storage in the right way is well-facilitated by the API.
However, unless you are running in a major public cloud, that API may not be hooked up to anything.
This frequently sends users down a rabbit hole of researching all the various options for storage backends for their platform, for Kubernetes, and for their workloads.
There are a lot of options out there, and it can be fairly bewildering.
For Talos, we try to limit the options somewhat to make the decision-making easier.
Public Cloud
If you are running on a major public cloud, use their block storage.
It is easy and automatic.
Storage Clusters
Sidero Labs recommends having separate disks (apart from the Talos install disk) to be used for storage.
Redundancy, scaling capabilities, reliability, speed, maintenance load, and ease of use are all factors you must consider when managing your own storage.
Running a storage cluster can be a very good choice when managing your own storage, and there are two projects we recommend, depending on your situation.
If you need vast amounts of storage composed of more than a dozen or so disks, we recommend you use Rook to manage Ceph.
Also, if you need both mount-once and mount-many capabilities, Ceph is your answer.
Ceph also bundles in an S3-compatible object store.
The down side of Ceph is that there are a lot of moving parts.
Please note that most people should never use mount-many semantics.
NFS is pervasive because it is old and easy, not because it is a good idea.
While it may seem like a convenience at first, there are all manner of locking, performance, change control, and reliability concerns inherent in any mount-many situation, so we strongly recommend you avoid this method.
If your storage needs are small enough to not need Ceph, use Mayastor.
Rook/Ceph
Ceph is the grandfather of open source storage clusters.
It is big, has a lot of pieces, and will do just about anything.
It scales better than almost any other system out there, open source or proprietary, being able to easily add and remove storage over time with no downtime, safely and easily.
It comes bundled with RadosGW, an S3-compatible object store; CephFS, an NFS-like clustered filesystem; and RBD, a block storage system.
With the help of Rook, the vast majority of the complexity of Ceph is hidden away by a very robust operator, allowing you to control almost everything about your Ceph cluster from fairly simple Kubernetes CRDs.
So if Ceph is so great, why not use it for everything?
Ceph can be rather slow for small clusters.
It relies heavily on CPUs and massive parallelisation to provide good cluster performance, so if you don't have much of either dedicated to Ceph, it is not going to be well-optimised for you.
Also, if your cluster is small, just running Ceph may eat up a significant amount of the resources you have available.
Troubleshooting Ceph can be difficult if you do not understand its architecture.
There are lots of acronyms and the documentation assumes a fair level of knowledge.
There are very good tools for inspection and debugging, but this is still frequently seen as a concern.
Mayastor
Mayastor is an OpenEBS project built in Rust utilising the modern NVMe-oF system.
(Despite the name, Mayastor does not require you to have NVMe drives.)
It is fast and lean but still cluster-oriented and cloud native.
Unlike most of the other OpenEBS projects, it is not built on the ancient iSCSI system.
Unlike Ceph, Mayastor is just a block store.
It focuses on block storage and does it well.
It is much less complicated to set up than Ceph, but you probably wouldn’t want to use it for more than a few dozen disks.
Mayastor is new, maybe too new.
If you’re looking for something well-tested and battle-hardened, this is not it.
However, if you’re looking for something lean, future-oriented, and simpler than Ceph, it might be a great choice.
Prep Nodes
Either during initial cluster creation or on running worker nodes, several machine config values should be edited.
(This information is gathered from the Mayastor documentation.)
We need to set the vm.nr_hugepages sysctl and add openebs.io/engine=mayastor labels to the nodes which are meant to be storage nodes.
This can be done with talosctl patch machineconfig or via config patches during talosctl gen config.
Some examples are shown below: modify as needed.
First create a config patch file named mayastor-patch.yaml with the following contents:
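The patch contents are not reproduced on this page. As a minimal sketch, assuming the Talos machine config keys machine.sysctls and machine.nodeLabels and a hugepage count of 1024 (2 MiB pages, i.e. 2 GiB; check the Mayastor documentation for the value your release expects), the file could be written like this:

```shell
# Sketch of mayastor-patch.yaml (strategic-merge patch).
# Assumption: 1024 hugepages of 2 MiB each; adjust to your Mayastor version.
cat > mayastor-patch.yaml <<'EOF'
machine:
  sysctls:
    vm.nr_hugepages: "1024"
  nodeLabels:
    openebs.io/engine: mayastor
EOF
```

On a running node the patch can be applied with talosctl patch machineconfig -n <node ip> --patch @mayastor-patch.yaml; at generation time it can be passed to talosctl gen config with --config-patch-worker @mayastor-patch.yaml so that only worker (storage) nodes receive it.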
Note: If you are adding or updating vm.nr_hugepages on a node that already has the openebs.io/engine=mayastor label set, you need to restart the kubelet so that it picks up the new value, by running the following command:
talosctl -n <node ip> service kubelet restart
Deploy Mayastor
Continue setting up Mayastor using the official documentation.
Note: The Mayastor helm chart uses an init container that checks for the nvme_tcp module.
It does not mount /sys and will not be able to find it.
The easiest solution is to disable the init container.
# Create device pool on a blank (no partition table!) disk on node01
kubectl linstor physical-storage create-device-pool --pool-name nvme_lvm_pool LVM node01 /dev/nvme0n1 --storage-pool nvme_pool
NFS is an old pack animal long past its prime.
NFS is slow and has all kinds of bottlenecks involving contention, distributed locking, single points of service, and more.
However, it is supported by a wide variety of systems.
You don’t want to use it unless you have to, but unfortunately, that “have to” is too frequent.
The NFS client is part of the kubelet image maintained by the Talos team.
This means that the version installed in your running kubelet is the version of NFS supported by Talos.
You can reduce some of the contention problems by parceling Persistent Volumes from separate underlying directories.
Object storage
Ceph comes with an S3-compatible object store, but there are other options as well.
These can often be built on top of other storage backends.
For instance, you may have your block storage running with Mayastor but assign a Pod a large Persistent Volume to serve your object store.
One of the most popular open source add-on object stores is MinIO.
Others (iSCSI)
The most common remaining systems involve iSCSI in one form or another.
These include the original OpenEBS, Rancher’s Longhorn, and many proprietary systems.
iSCSI in Linux is facilitated by open-iscsi.
This system was designed long before containers caught on, and it is not well suited to the task, especially when coupled with a read-only host operating system.
iSCSI is now supported in Talos via the iscsi-tools system extension.
The extension enables compatibility with OpenEBS Jiva - refer to the local storage installation guide for more information.
3.1.11 - User Namespaces
Guide on how to configure Talos Cluster to support User Namespaces
User Namespaces are a feature of the Linux kernel that allows unprivileged users to have their own range of UIDs and GIDs, without needing to be root.
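The configuration itself is not shown on this page. As a sketch, assuming a Kubernetes version (such as 1.32) in which the UserNamespacesSupport and UserNamespacesPodSecurityStandards feature gates still have to be enabled explicitly, and using an illustrative non-zero limit for the user.max_user_namespaces sysctl, a machine config patch could look like this:

```shell
# Sketch of a user namespaces patch. Assumptions: the feature gate names
# match your Kubernetes version, and 11255 is only an illustrative limit.
cat > user-namespaces-patch.yaml <<'EOF'
cluster:
  apiServer:
    extraArgs:
      feature-gates: UserNamespacesSupport=true,UserNamespacesPodSecurityStandards=true
machine:
  sysctls:
    user.max_user_namespaces: "11255"
  kubelet:
    extraConfig:
      featureGates:
        UserNamespacesSupport: true
        UserNamespacesPodSecurityStandards: true
EOF
```

The file can be passed to talosctl gen config with --config-patch @user-namespaces-patch.yaml.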
After applying the configuration, refer to the official documentation to configure workloads to use User Namespaces.
3.2 - Network
Managing the Kubernetes cluster networking
3.2.1 - Deploying Cilium CNI
In this guide you will learn how to set up Cilium CNI on Talos.
Cilium can be installed either via the cilium cli or using helm.
This documentation will outline installing Cilium CNI v1.14.0 on Talos in six different ways.
Adhering to Talos principles, we'll deploy Cilium with IPAM mode set to Kubernetes, using the cgroupv2 and bpffs mounts that Talos already provides.
As Talos does not allow Kubernetes workloads to load kernel modules, the SYS_MODULE capability needs to be dropped from Cilium's default set of capabilities; this override is part of the Helm and Cilium CLI install commands.
Each method can install Cilium either with kube-proxy (the default) or without it (see Kubernetes Without kube-proxy in the Cilium documentation).
In this guide we assume that KubePrism is enabled and configured to use the port 7445.
Machine config preparation
When generating the machine config for a node, set the CNI to none.
For example using a config patch:
Create a patch.yaml file with the following contents:
cluster:
  network:
    cni:
      name: none
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
Or, if you want to deploy Cilium without kube-proxy, you also need to disable kube-proxy:
Create a patch.yaml file with the following contents:
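The patch for this variant is not reproduced here. A minimal sketch, assuming the Talos machine config keys cluster.network.cni.name and cluster.proxy.disabled, sets the CNI to none and turns kube-proxy off in one file:

```shell
# Sketch of patch.yaml for a kube-proxy-free Cilium deployment.
cat > patch.yaml <<'EOF'
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
EOF
```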
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
Installation using Cilium CLI
Note: It is recommended to template the Cilium manifest using Helm and use it as part of the Talos machine config, but if you want to install Cilium using the Cilium CLI, you can follow the steps below.
After applying the machine config and bootstrapping, Talos will appear to hang on phase 18/19 with the message: retrying error: node not ready.
This happens because nodes in Kubernetes are only marked as ready once the CNI is up.
As no CNI is defined, the boot process stays pending and the node will reboot to retry after 10 minutes; this is expected behavior.
During this window you can install Cilium manually by running the following:
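The install command itself is not reproduced on this page. As a minimal sketch, assuming Cilium 1.14.0 and the Cilium Helm chart value names ipam.mode, kubeProxyReplacement, securityContext.capabilities.* and cgroup.*, the Talos-specific settings described at the top of this guide can be collected in a values file:

```shell
# Sketch of Helm values for Cilium 1.14.0 on Talos (not an official file).
# Compared to Cilium's defaults, the kernel-module-loading capability is
# removed from both capability lists, since Talos does not allow workloads
# to load kernel modules.
cat > cilium-values.yaml <<'EOF'
ipam:
  mode: kubernetes
# Keeps kube-proxy. For a kube-proxy-free setup, set this to true and point
# Cilium at KubePrism instead (k8sServiceHost: localhost, k8sServicePort: 7445).
kubeProxyReplacement: false
securityContext:
  capabilities:
    ciliumAgent: [CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
    cleanCiliumState: [NET_ADMIN, SYS_ADMIN, SYS_RESOURCE]
# Reuse the cgroupv2 mount that Talos already provides.
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
EOF
```

The manifest can then be rendered with helm repo add cilium https://helm.cilium.io/ followed by helm template cilium cilium/cilium --version 1.14.0 --namespace kube-system --values cilium-values.yaml > cilium.yaml, and applied during the boot window with kubectl apply -f cilium.yaml. The same settings can alternatively be passed as --set flags to cilium install; check cilium install --help for the options your CLI version supports.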
Instead of applying the cilium.yaml generated with helm template directly during the Talos boot window (before the reboot timeout), you can also host this file somewhere and patch the machine config to apply this manifest automatically during bootstrap.
To do this patch your machine configuration to include this config instead of the above:
Create a patch.yaml file with the following contents:
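The patch contents are not shown on this page. A minimal sketch, assuming the Talos machine config key cluster.extraManifests and a hypothetical URL where you host the rendered cilium.yaml:

```shell
# Sketch of patch.yaml that fetches the rendered Cilium manifest at bootstrap.
# The URL below is a hypothetical placeholder; it must not be publicly accessible.
cat > patch.yaml <<'EOF'
cluster:
  network:
    cni:
      name: none
  extraManifests:
    - https://server.example.internal/path/cilium.yaml
EOF
```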
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
However, beware that the Helm-generated Cilium manifest contains sensitive key material.
As such you should definitely not host this somewhere publicly accessible.
Method 4: Helm manifests inline install
A more secure option would be to include the helm template output manifest inside the machine configuration.
The machine config should be generated with the CNI set to none.
Create a patch.yaml file with the following contents:
cluster:
  network:
    cni:
      name: none
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
If deploying Cilium with kube-proxy disabled, you can also include the following:
Create a patch.yaml file with the following contents:
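The inline manifest patch itself is not reproduced here. As a sketch, assuming the Talos machine config key cluster.inlineManifests and a cilium.yaml already rendered with helm template, a small helper (the function name is hypothetical) can embed the manifest into a patch by indenting it under a contents block scalar; kube-proxy can still be disabled in a separate patch with cluster.proxy.disabled: true:

```shell
# inline_manifest_patch NAME MANIFEST  (hypothetical helper)
# Prints a Talos machine config patch that embeds the file MANIFEST
# (for example, helm template output) under cluster.inlineManifests.
inline_manifest_patch() {
  name=$1
  manifest=$2
  printf 'cluster:\n  inlineManifests:\n    - name: %s\n      contents: |\n' "$name"
  # Indent every manifest line by 8 spaces so it sits inside the
  # `contents: |` block scalar.
  sed 's/^/        /' "$manifest"
}

# Example usage (requires helm and talosctl; file names are illustrative):
#   inline_manifest_patch cilium cilium.yaml > inline-patch.yaml
#   talosctl gen config my-cluster https://mycluster.local:6443 \
#     --config-patch @patch.yaml --config-patch-control-plane @inline-patch.yaml
```

Passing the inline manifest through --config-patch-control-plane keeps it on control plane nodes only, and if you template Cilium into a custom namespace, the Namespace document has to be placed at the top of cilium.yaml before running the helper.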
This will install the Cilium manifests at just the right time during bootstrap.
Beware though:
Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace.
As the inline manifest is processed from top to bottom make sure to manually put the namespace yaml at the start of the inline manifest.
Only add the Cilium inline manifest to the control plane nodes' machine configuration.
Make sure all control plane nodes have an identical configuration.
If you delete any of the generated resources they will be restored whenever a control plane node reboots.
As a safety measure, Talos only creates missing resources from inline manifests, it never deletes or updates anything.
If you need to update a manifest make sure to first edit all control plane machine configurations and then run talosctl upgrade-k8s as it will take care of updating inline manifests.
Method 5: Using a job
We can utilize a job pattern to run arbitrary logic during bootstrap time.
We can leverage this to our advantage to install Cilium by using an inline manifest as shown in the example below:
Because no CNI is present at installation time, kubernetes.default.svc cannot be used to install Cilium. To overcome this limitation, we'll use the host network connection to connect back to the node itself with hostNetwork: true, in tandem with the environment variables KUBERNETES_SERVICE_PORT and KUBERNETES_SERVICE_HOST.
The job runs a container that installs Cilium to your liking; after the job has finished, Cilium can be managed and operated as usual.
The above can be combined with, for example, Method 3 to host arbitrary configurations externally but render and run them at bootstrap time.
When the Talos forwardKubeDNSToHost=true option (enabled by default) is used in combination with the Cilium setting bpf.masquerade=true, a known issue causes CoreDNS to not work correctly.
As a workaround, configuring forwardKubeDNSToHost=false resolves the issue.
For more details, see the related discussion.
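A minimal sketch of the workaround as a machine config patch, assuming the Talos key path machine.features.hostDNS.forwardKubeDNSToHost (the file name is illustrative):

```shell
# Sketch: keep host DNS enabled but stop forwarding kube-dns traffic to it,
# to avoid the CoreDNS issue with Cilium bpf.masquerade=true.
cat > hostdns-patch.yaml <<'EOF'
machine:
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: false
EOF
```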
Other things to know
After installing Cilium, cilium connectivity test might hang and/or fail with errors similar to
Error creating: pods "client-69748f45d8-9b9jg" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (container "client" must not include "NET_RAW" in securityContext.capabilities.add)
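This is expected. You can work around it by adding the pod-security.kubernetes.io/enforce=privileged label at the namespace level, for example with kubectl label namespace <namespace> pod-security.kubernetes.io/enforce=privileged.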
Talos has full kernel module support for eBPF.
A brief guide on how to use Multus on Talos Linux
Multus CNI is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods.
Typically, in Kubernetes each pod only has one network interface (apart from a loopback) – with Multus you can create a multi-homed pod that has multiple interfaces.
This is accomplished by Multus acting as a “meta-plugin”, a CNI plugin that can call multiple other CNI plugins.
Installation
Multus can be deployed by simply applying the thick DaemonSet manifest with kubectl.
This will create a DaemonSet and a CRD: NetworkAttachmentDefinition.
This can be used to specify your network configuration.
Configuration
Patching the DaemonSet
For Multus to work properly with Talos, a change needs to be made to the DaemonSet.
Instead of mounting the volume called host-run-netns on /run/netns, it has to be mounted on /var/run/netns.
Edit the DaemonSet and change the volume host-run-netns from /run/netns to /var/run/netns.
Failing to do so will leave your cluster crippled.
Running pods will remain running but new pods and deployments will give you the following error in the events:
Normal Scheduled 3s default-scheduler Successfully assigned virtualmachines/samplepod to virt2
Warning FailedCreatePodSandBox 3s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b" Netns:"/var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=virtualmachines;K8S_POD_NAME=samplepod;K8S_POD_INFRA_CONTAINER_ID=3a6a58386dfbf2471a6f86bd41e4e9a32aac54ccccd1943742cb67d1e9c58b5b;K8S_POD_UID=8304765e-fd7e-4968-9144-c42c53be04f4" Path:"" ERRORED: error configuring pod [virtualmachines/samplepod] networking: [virtualmachines/samplepod/8304765e-fd7e-4968-9144-c42c53be04f4:cbr0]: error adding container to network "cbr0": DelegateAdd: cannot set "" interface name to "eth0": validateIfName: no net namespace /var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293 found: failed to Statfs "/var/run/netns/cni-1d80f6e3-fdab-4505-eb83-7deb17431293": no such file or directory
': StdinData: {"capabilities":{"portMappings":true},"clusterNetwork":"/host/etc/cni/net.d/10-flannel.conflist","cniVersion":"0.3.1","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","type":"multus-shim"}
Creating your NetworkAttachmentDefinition
The NetworkAttachmentDefinition configuration defines the bridge to which your second pod interface is attached.
In this example macvlan is used as a bridge type.
There are 3 types of bridges: bridge, macvlan and ipvlan:
bridge is a way to connect two Ethernet segments together in a protocol-independent way.
Packets are forwarded based on Ethernet address, rather than IP address (like a router).
Since forwarding is done at Layer 2, all protocols can go transparently through a bridge.
In terms of containers or virtual machines, a bridge can also be used to connect the virtual interfaces of each container/VM to the host network, allowing them to communicate.
macvlan is a driver that makes it possible to create virtual network interfaces that appear as distinct physical devices each with unique MAC addresses.
The underlying interface can route traffic to each of these virtual interfaces separately, as if they were separate physical devices.
This means that each macvlan interface can have its own IP subnet and routing.
Macvlan interfaces are ideal for situations where containers or virtual machines require the same network access as the host system.
ipvlan is similar to macvlan, with the key difference being that ipvlan shares the parent’s MAC address, which requires less configuration from the networking equipment.
This makes deployments simpler in certain situations where MAC address control or limits are in place.
It offers two operational modes: L2 mode (the default) where it behaves similarly to a MACVLAN, and L3 mode for routing based traffic isolation (rather than bridged).
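The NetworkAttachmentDefinition used in this example is not reproduced on this page. A minimal sketch of a macvlan attachment, assuming the k8s.cni.cncf.io/v1 CRD installed by Multus, the standard macvlan and host-local CNI plugins, and hypothetical values for the parent interface (eth1) and address range:

```shell
# Sketch of a macvlan NetworkAttachmentDefinition. "eth1" and the
# 192.168.1.0/24 addresses are hypothetical placeholders.
cat > macvlan-conf.yaml <<'EOF'
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.1.0/24",
        "rangeStart": "192.168.1.200",
        "rangeEnd": "192.168.1.216",
        "gateway": "192.168.1.1"
      }
    }'
EOF
```

Apply it with kubectl apply -f macvlan-conf.yaml in the namespace where the pods that use it will run.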
When using the bridge interface you must also configure a bridge on your Talos nodes.
That can be done by updating Talos Linux machine configuration:
machine:
  network:
    interfaces:
      - interface: br0
        addresses:
          - 172.16.1.60/24
        bridge:
          stp:
            enabled: true
          interfaces:
            - eno1 # This must be changed to your matching interface name
        routes:
          - network: 0.0.0.0/0 # The route's network (destination).
            gateway: 172.16.1.254 # The route's gateway (if empty, creates link scope route).
            metric: 1024 # The optional metric for the route.
More information about configuring bridges can be found in the Talos network configuration documentation.
Attaching the NetworkAttachmentDefinition to your Pod or Deployment
After the NetworkAttachmentDefinition is configured, you can attach that interface to your Deployment or Pod.
In this example we use a pod:
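The pod manifest is not reproduced here. A minimal sketch, assuming a NetworkAttachmentDefinition named macvlan-conf (hypothetical) in the same namespace and the standard Multus k8s.v1.cni.cncf.io/networks annotation:

```shell
# Sketch of a pod that requests a secondary interface from the
# (hypothetical) macvlan-conf NetworkAttachmentDefinition.
cat > samplepod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
    - name: samplepod
      image: alpine
      command: ["sleep", "3600"]
EOF
```

After kubectl apply -f samplepod.yaml, the pod should have the usual cluster interface plus an additional interface provided by the attachment.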
Notes on using KubeVirt in combination with Multus
If you would like to use KubeVirt and expose your virtual machine to the outside world with Multus, make sure to configure a bridge instead of macvlan or ipvlan because, according to the KubeVirt documentation, those do not work in this setup.
Invalid CNIs for secondary networks
The following list of CNIs is known not to work for bridge interfaces - which are most common for secondary interfaces.
macvlan
ipvlan
The reason is similar: the bridge interface type moves the pod interface MAC address to the VM, leaving the pod interface with a different address.
The aforementioned CNIs require the pod interface to have the original MAC address.
Notes on using Cilium in combination with Multus
Cilium does not ship the CNI reference plugins, which most Multus setups expect (e.g. macvlan).
This can be addressed by extending the daemonset with an additional init-container, setting them up, e.g. using the following kustomize strategic-merge patch:
The official images (as of 29.07.24) are built incorrectly for ARM64.
Self-building them is an adequate workaround for now.
3.3 - Upgrading Kubernetes
Guide on how to upgrade the Kubernetes cluster from Talos Linux.
This guide covers upgrading Kubernetes on Talos Linux clusters.
For a list of Kubernetes versions compatible with each Talos release, see the Support Matrix.
For upgrading the Talos Linux operating system, see Upgrading Talos.
Automated Kubernetes Upgrade
The recommended method to upgrade Kubernetes is to use the talosctl upgrade-k8s command.
This will automatically update the components needed to upgrade Kubernetes safely.
Upgrading Kubernetes is non-disruptive to the cluster workloads.
To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
talosctl --nodes <controlplane node> upgrade-k8s --to 1.32.0
Note that the --nodes parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
To check what will be upgraded you can run talosctl upgrade-k8s with the --dry-run flag:
$ talosctl --nodes <controlplane node> upgrade-k8s --to 1.32.0 --dry-run
WARNING: found resources which are going to be deprecated/migrated in the version 1.32.0
RESOURCE                                                               COUNT
validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io   4
mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io     3
customresourcedefinitions.v1beta1.apiextensions.k8s.io                 25
apiservices.v1beta1.apiregistration.k8s.io                             54
leases.v1beta1.coordination.k8s.io                                     4
automatically detected the lowest Kubernetes version 1.31.1
checking for resource APIs to be deprecated in version 1.32.0
discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "1.32.0"
> "172.20.0.2": starting update
> update kube-apiserver: v1.31.1 -> 1.32.0
> skipped in dry-run
> "172.20.0.3": starting update
> update kube-apiserver: v1.31.1 -> 1.32.0
> skipped in dry-run
> "172.20.0.4": starting update
> update kube-apiserver: v1.31.1 -> 1.32.0
> skipped in dry-run
updating "kube-controller-manager" to version "1.32.0"
> "172.20.0.2": starting update
> update kube-controller-manager: v1.31.1 -> 1.32.0
> skipped in dry-run
> "172.20.0.3": starting update
<snip>
updating manifests
> apply manifest Secret bootstrap-token-3lb63t
> apply skipped in dry run
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
<snip>
To upgrade Kubernetes from v1.31.1 to v1.32.0 run:
$ talosctl --nodes <controlplane node> upgrade-k8s --to 1.32.0
automatically detected the lowest Kubernetes version 1.31.1
checking for resource APIs to be deprecated in version 1.32.0
discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "1.32.0"
> "172.20.0.2": starting update
> update kube-apiserver: v1.31.1 -> 1.32.0
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: v1.31.1 -> 1.32.0
<snip>
This command runs in several phases:
Images for new Kubernetes components are pre-pulled to the nodes to minimize downtime and test for image availability.
Every control plane node machine configuration is patched with the new image version for each control plane component.
Talos renders new static pod definitions on the configuration update which is picked up by the kubelet.
The command waits for the change to propagate to the API server state.
The command updates the kube-proxy daemonset with the new image version.
On every node in the cluster, the kubelet version is updated.
The command then waits for the kubelet service to be restarted and become healthy.
The update is verified by checking the Node resource state.
Kubernetes bootstrap manifests are re-applied to the cluster.
Updated bootstrap manifests might come with a new Talos version (e.g. CoreDNS version update), or might be the result of machine configuration change.
Note: The upgrade-k8s command never deletes any resources from the cluster: they should be deleted manually.
If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
Note: When using custom/overridden Kubernetes component images, use flags --*-image to override the default image names.
Manual Kubernetes Upgrade
Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the talosctl upgrade-k8s command.
Kubeconfig
In order to edit the control plane, you need a working kubectl config.
If you don’t already have one, you can get one by running:
talosctl --nodes <controlplane node> kubeconfig
API Server
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v1.32.0"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.apiServer.image key.
Also the machine configuration can be edited manually with talosctl -n <IP> edit mc --mode=no-reboot.
Capture the new version of kube-apiserver config with:
In this example, the new version is 5.
Wait for the new pod definition to propagate to the API server state (replace talos-default-controlplane-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-controlplane-1 1/1 Running 0 16m
Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.
Controller Manager
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v1.32.0"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.controllerManager.image key.
Capture new version of kube-controller-manager config with:
In this example, the new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-controlplane-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-controlplane-1 1/1 Running 0 35m
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
Scheduler
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v1.32.0"}]'
patched mc at the node 172.20.0.2
The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.scheduler.image key.
Capture new version of kube-scheduler config with:
In this example, the new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-controlplane-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-controlplane-1 1/1 Running 0 39m
Repeat this process for every control plane node, verifying that the state propagated successfully between each node update.
Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.
kubelet
For every node, patch the machine configuration with the new kubelet version and wait for the kubelet to restart with the new version:
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v1.32.0"}]'
patched mc at the node 172.20.0.2
Once kubelet restarts with the new configuration, confirm upgrade with kubectl get nodes <name>:
$ kubectl get nodes talos-default-controlplane-1
NAME STATUS ROLES AGE VERSION
talos-default-controlplane-1 Ready control-plane 123m v1.32.0
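Because the kubelet step is repeated for every node, it can help to script the patch. A minimal sketch (the helper names are hypothetical) builds the same JSON patch shown above for a given version and applies it to one node at a time, so you can still confirm each node with kubectl get nodes before moving on:

```shell
# kubelet_patch VERSION  (hypothetical helper)
# Prints the JSON patch that points /machine/kubelet/image at the
# ghcr.io/siderolabs/kubelet image for the given Kubernetes version.
kubelet_patch() {
  printf '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v%s"}]' "$1"
}

# patch_kubelet NODE_IP VERSION  (hypothetical helper)
# Applies the patch to a single node without rebooting it. It does not wait
# for the kubelet; verify the node before patching the next one.
patch_kubelet() {
  talosctl -n "$1" patch mc --mode=no-reboot -p "$(kubelet_patch "$2")"
}

# Example usage:
#   patch_kubelet 172.20.0.2 1.32.0
#   kubectl get nodes talos-default-controlplane-1
```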
4 - Advanced Guides
4.1 - Advanced Networking
How to configure advanced networking options on Talos Linux.
Static Addressing
Static addressing consists of specifying addresses, routes (remember to add your default gateway), and the interface.
Most likely you’ll also want to define the nameservers so you have properly functioning DNS.
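A minimal sketch of such a static configuration as a machine config patch, assuming the Talos keys machine.network.interfaces and machine.network.nameservers; the interface name, addresses, gateway and nameservers are hypothetical placeholders:

```shell
# Sketch of a static addressing patch. eth0, 192.168.0.2/24, the gateway
# 192.168.0.1 and the nameservers are placeholders for your network.
cat > static-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.0.2/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.0.1
    nameservers:
      - 1.1.1.1
      - 8.8.8.8
EOF
```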
In some environments you may need to set additional addresses on an interface.
In the following example, we set two additional addresses on the loopback interface.
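The example itself is not reproduced here. A minimal sketch, assuming the same machine.network.interfaces structure and two hypothetical addresses:

```shell
# Sketch: two additional addresses on the loopback interface.
# Both addresses are hypothetical placeholders.
cat > loopback-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: lo
        addresses:
          - 192.168.0.21/32
          - 10.2.2.2/32
EOF
```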
Setting up Talos Linux to work in environments with no internet access.
In this guide we will create a Talos cluster running in an air-gapped environment with all the required images being pulled from an internal registry.
We will use the QEMU provisioner available in talosctl to create a local cluster, but the same approach could be used to deploy Talos in bigger air-gapped networks.
In air-gapped environments, access to the public Internet is restricted, so Talos can't pull images from public Docker registries (docker.io, ghcr.io, etc.).
We need to identify the images required to install and run Talos.
The same strategy can be used for images required by custom workloads running on the cluster.
The talosctl image default command provides a list of default images used by the Talos cluster (with default configuration settings).
To print the list of images, run:
talosctl image default
This list contains images required by a default deployment of Talos.
There might be additional images required for the workloads running on this cluster, and those should be added to this list.
Preparing the Internal Registry
As access to the public registries is restricted, we have to run an internal Docker registry.
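In this guide, we will launch the registry on the same machine using Docker, for example with the standard registry:2 image:
docker run -d -p 6000:5000 --restart always --name registry-airgapped registry:2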
This registry will be accepting connections on port 6000 on the host IPs.
The registry is empty by default, so we have to fill it with the images required by Talos.
First, we pull all the images to our local Docker daemon:
$ for image in `talosctl image default`; do docker pull $image; done
v0.15.1: Pulling from coreos/flannel
Digest: sha256:9a296fbb67790659adc3701e287adde3c59803b7fcefe354f1fc482840cdb3d9
...
All images are now stored in the Docker daemon store:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/etcd-development/etcd v3.5.3 604d4f022632 6 days ago 181MB
ghcr.io/siderolabs/install-cni v1.0.0-2-gc5d3ab0 4729e54f794d 6 days ago 76MB
...
Now we need to re-tag them so that we can push them to our local registry.
We are going to replace the first component of the image name (before the first slash) with our registry endpoint 127.0.0.1:6000:
$ for image in `talosctl image default`; do \
    docker tag $image `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done
As the next step, we push images to the internal registry:
$ for image in `talosctl image default`; do \
    docker push `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done
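The pull, re-tag and push loops above can be folded into one pass. A minimal sketch (the helper names are hypothetical) reuses the same sed expression to rewrite the registry component and stops at the first failed docker command:

```shell
# mirror_ref ENDPOINT IMAGE  (hypothetical helper)
# Replaces the registry component (everything before the first slash)
# of IMAGE with ENDPOINT, using the same sed expression as above.
mirror_ref() {
  printf '%s\n' "$2" | sed -E "s#^[^/]+/#$1/#"
}

# mirror_all ENDPOINT  (hypothetical helper)
# Reads image references on stdin and pulls, re-tags and pushes each one.
# docker reads from /dev/null so it cannot consume the image list.
mirror_all() {
  while read -r image; do
    target=$(mirror_ref "$1" "$image")
    { docker pull "$image" && docker tag "$image" "$target" && docker push "$target"; } < /dev/null || return 1
  done
}

# Example usage:
#   talosctl image default | mirror_all 127.0.0.1:6000
```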
We can now verify that the images are pushed to the registry, for example by querying the registry catalog API with curl http://127.0.0.1:6000/v2/_catalog:
Note: images in the registry don’t have the registry endpoint prefix anymore.
Launching Talos in an Air-gapped Environment
For Talos to use the internal registry, we use the registry mirror feature to redirect all image pull requests to the internal registry.
This means that the registry endpoint (as the first component of the image reference) gets ignored, and all pull requests are sent directly to the specified endpoint.
We are going to use a QEMU-based Talos cluster for this guide, but the same approach works with Docker-based clusters as well.
As QEMU-based clusters go through the Talos install process, they model a real air-gapped environment more closely.
Identify all registry prefixes from talosctl image default, for example:
docker.io
gcr.io
ghcr.io
registry.k8s.io
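Rather than listing the prefixes by hand, they can be derived from the image list. A minimal sketch (the helper name is hypothetical) takes everything before the first slash of each fully qualified image reference and prints one --registry-mirror flag per distinct prefix; it assumes every image reference includes its registry, as the talosctl image default output does:

```shell
# mirror_flags ENDPOINT  (hypothetical helper)
# Reads fully qualified image references on stdin and prints one
# --registry-mirror flag per distinct registry prefix.
mirror_flags() {
  cut -d/ -f1 | sort -u | while read -r prefix; do
    printf '%s\n' "--registry-mirror $prefix=$1"
  done
}

# Example usage (the output is word-split into separate arguments on purpose):
#   sudo --preserve-env=HOME talosctl cluster create --provisioner=qemu \
#     --install-image=ghcr.io/siderolabs/installer:v1.9.0 \
#     $(talosctl image default | mirror_flags http://10.5.0.1:6000)
```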
The talosctl cluster create command provides conveniences for common configuration options.
The only required flag for this guide is --registry-mirror <endpoint>=http://10.5.0.1:6000, which redirects every pull request to the internal registry. This flag needs to be repeated for each of the registry prefixes identified above.
The endpoint being used is 10.5.0.1, as this is the default bridge interface address, which is routable from the QEMU VMs (the 127.0.0.1 IP would point to the VM itself).
$ sudo --preserve-env=HOME talosctl cluster create --provisioner=qemu --install-image=ghcr.io/siderolabs/installer:v1.9.0 \
--registry-mirror docker.io=http://10.5.0.1:6000 \
--registry-mirror gcr.io=http://10.5.0.1:6000 \
--registry-mirror ghcr.io=http://10.5.0.1:6000 \
--registry-mirror registry.k8s.io=http://10.5.0.1:6000
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/user/.talos/clusters/talos-default"
creating network talos-default
creating load balancer
creating dhcpd
creating master nodes
creating worker nodes
waiting for API
...
Note: --install-image should match the image which was copied into the internal registry in the previous step.
You can verify that the cluster is air-gapped by inspecting the registry logs: docker logs -f registry-airgapped.
Closing Notes
Running in an air-gapped environment might require additional configuration changes, for example using custom settings for DNS and NTP servers.
When scaling this guide to a bare-metal environment, the following Talos config snippet could be used as an equivalent of the --registry-mirror flag above:
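The snippet itself is not reproduced on this page. A minimal sketch, assuming the Talos key machine.registries.mirrors and the same four prefixes used above; on bare metal, 10.5.0.1:6000 has to be replaced with the address of your internal registry:

```shell
# Sketch of a registry mirror patch equivalent to the --registry-mirror flags.
# http://10.5.0.1:6000/ is the QEMU example endpoint; replace it on bare metal.
cat > registry-mirrors-patch.yaml <<'EOF'
machine:
  registries:
    mirrors:
      docker.io:
        endpoints:
          - http://10.5.0.1:6000/
      gcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      ghcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      registry.k8s.io:
        endpoints:
          - http://10.5.0.1:6000/
EOF
```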
Other implementations of Docker registry can be used in place of the Docker registry image used above to run the registry.
If required, auth can be configured for the internal registry (and custom TLS certificates if needed).
If building for a specific release, checkout the corresponding tag:
git checkout v1.9.0
Set up the Build Environment
See Developing Talos for details on setting up the buildkit builder.
Architectures
By default, Talos builds for linux/amd64, but you can customize that by passing the PLATFORM variable to make:
make <target> PLATFORM=linux/arm64 # build for arm64 only
make <target> PLATFORM=linux/arm64,linux/amd64 # build for arm64 and amd64, container images will be multi-arch
If you built and pushed only a custom kernel package, the reference can be overridden with the PKG_KERNEL variable: make <target> PKG_KERNEL=<registry>/<username>/kernel:<tag>
If any other single package was customized, the reference can be overridden with the PKG_<pkg> (e.g. PKG_IPTABLES) variable: make <target> PKG_<pkg>=<registry>/<username>/<pkg>:<tag>
If the full pkgs repository was built and pushed, the references can be overridden with the PKGS_PREFIX and PKGS variables: make <target> PKGS_PREFIX=<registry>/<username> PKGS=<tag>
Customizations
Some of the build parameters can be customized by passing environment variables to make, e.g. GOAMD64=v1 can be used to build Talos images compatible with old AMD64 CPUs:
make <target> GOAMD64=v1
Building Kernel and Initramfs
The most basic boot assets can be built with:
make kernel initramfs
The build result is stored as _out/vmlinuz-<arch> and _out/initramfs-<arch>.xz.
Building Container Images
Talos container images should be pushed to the registry as the result of the build process.
The default settings are:
IMAGE_REGISTRY is set to ghcr.io
USERNAME is set to siderolabs (or the value of the USERNAME environment variable, if it is set)
The image can be pushed to any registry you have access to, but the access credentials should be stored in the ~/.docker/config.json file (e.g. with docker login).
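The defaults above can be summarized in a small Go sketch. It assumes image references take the `<registry>/<username>/<name>:<tag>` shape seen elsewhere in this guide (e.g. 127.0.0.1:5005/siderolabs/kernel:$TAG). imageReference is a hypothetical helper for illustration, not part of the Talos build system, and the tag v1.9.0 is only an example.

```go
package main

import (
	"fmt"
	"os"
)

// imageReference composes <registry>/<username>/<name>:<tag> from the
// IMAGE_REGISTRY and USERNAME variables, falling back to the documented
// defaults. lookup has the same signature as os.LookupEnv, so the defaults
// can be exercised without touching the real environment.
func imageReference(lookup func(string) (string, bool), name, tag string) string {
	registry := "ghcr.io" // default IMAGE_REGISTRY
	if v, ok := lookup("IMAGE_REGISTRY"); ok && v != "" {
		registry = v
	}

	username := "siderolabs" // default USERNAME
	if v, ok := lookup("USERNAME"); ok && v != "" {
		username = v
	}

	return fmt.Sprintf("%s/%s/%s:%s", registry, username, name, tag)
}

func main() {
	// Resolve against the real environment, as make would see it.
	fmt.Println(imageReference(os.LookupEnv, "installer", "v1.9.0"))
}
```

With nothing set, this yields ghcr.io/siderolabs/installer:v1.9.0. Setting IMAGE_REGISTRY=127.0.0.1:5005 points the same reference at the local registry.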
The local registry running on 127.0.0.1:5005 can be used as well to avoid pushing/pulling over the network:
make installer PUSH=true REGISTRY=127.0.0.1:5005
When building the imager container, Talos by default includes the boot assets for both amd64 and arm64 architectures; if building for a single architecture only, specify the INSTALLER_ARCH variable:
make imager INSTALLER_ARCH=targetarch PLATFORM=linux/amd64
Building ISO
The ISO image is built with the help of the imager container image; by default, ghcr.io/siderolabs/imager with the matching tag is used:
make iso
The ISO image will be stored as _out/talos-<arch>.iso.
To build the ISO image with a custom imager image, specify it with the IMAGE_REGISTRY/USERNAME variables:
make iso IMAGE_REGISTRY=docker.io USERNAME=<username>
Building Disk Images
The disk image is built with the help of the imager container image; by default, ghcr.io/siderolabs/imager with the matching tag is used:
make image-metal
Available disk images are encoded in the image-% target, e.g. make image-aws.
As with the ISO image, a custom imager image can be specified with the IMAGE_REGISTRY/USERNAME variables.
4.4 - CA Rotation
How to rotate Talos and Kubernetes API root certificate authorities.
In general, you almost never need to rotate the root CA certificate and key for the Talos API and Kubernetes API.
Talos sets up root certificate authorities with a lifetime of 10 years, and all Talos and Kubernetes API certificates are issued by these root CAs.
So rotation of the root CA is only needed:
if you suspect that the private key has been compromised;
if you want to revoke access to the cluster for a leaked talosconfig or kubeconfig;
once every 10 years.
Overview
There are some details which make Talos and Kubernetes API root CA rotation a bit different, but the general flow is the same:
generate new CA certificate and key;
add new CA certificate as ‘accepted’, so new certificates will be accepted as valid;
make the new CA the issuing CA, keeping the old CA as accepted;
refresh all certificates in the cluster;
remove old CA from ‘accepted’.
At the end of the flow, the old CA is completely removed from the cluster, so all certificates issued by it will be considered invalid.
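The flow can be modeled in a few lines of Go to show why the order of the steps matters: at every point, the certificates currently deployed must still be trusted, and at the end the old CA must not be. This is a simplified, hypothetical model. CAs are plain strings, and caState and rotate are illustrative names. The real rotation is performed by talosctl rotate-ca.

```go
package main

import "fmt"

// caState is a simplified model of a node's trust settings: one issuing CA
// plus a list of additionally accepted CAs.
type caState struct {
	issuing  string
	accepted []string
}

// trusts reports whether certificates signed by ca are considered valid.
func (s caState) trusts(ca string) bool {
	if ca == s.issuing {
		return true
	}
	for _, a := range s.accepted {
		if a == ca {
			return true
		}
	}
	return false
}

// rotate applies the rotation steps in order and, after each one, checks that
// the certificates currently deployed (signed by inUse) are still trusted.
// It returns the names of the completed steps.
func rotate(oldCA, newCA string) ([]string, error) {
	state := caState{issuing: oldCA}
	inUse := oldCA

	steps := []struct {
		name  string
		apply func()
	}{
		{"add new CA as accepted", func() { state.accepted = append(state.accepted, newCA) }},
		{"make new CA issuing, keep old CA accepted", func() { state = caState{issuing: newCA, accepted: []string{oldCA}} }},
		{"refresh all certificates", func() { inUse = newCA }},
		{"remove old CA from accepted", func() { state.accepted = nil }},
	}

	var done []string
	for _, step := range steps {
		step.apply()
		if !state.trusts(inUse) {
			return done, fmt.Errorf("step %q would invalidate certificates signed by %s", step.name, inUse)
		}
		done = append(done, step.name)
	}

	if state.trusts(oldCA) {
		return done, fmt.Errorf("old CA %s is still trusted", oldCA)
	}
	return done, nil
}

func main() {
	done, err := rotate("old-ca", "new-ca")
	for _, name := range done {
		fmt.Println("ok:", name)
	}
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Moving "remove old CA from accepted" before "refresh all certificates" would make the check fail. This is the disruption the accepted-CA stage is designed to avoid.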
Both rotation flows are described in detail below.
Talos API
Automated Talos API CA Rotation
Talos API CA rotation doesn’t interrupt connections within the cluster, and it doesn’t require a reboot of the nodes.
Run the following command in dry-run mode to see the steps which will be taken:
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=true --talos=true --kubernetes=false
> Starting Talos API PKI rotation, dry-run mode true...
> Using config context: "talos-default"
> Using Talos API endpoints: ["172.20.0.2"]
> Cluster topology:
- control plane nodes: ["172.20.0.2"]
- worker nodes: ["172.20.0.3"]
> Current Talos CA:
...
No changes will be done to the cluster in dry-run mode, so you can safely run it to see the steps.
Before proceeding, make sure that you can capture the output of the talosctl command, as it will contain the new CA certificate and key.
Record a list of Talos API users to make sure they can all be updated with the new talosconfig.
Run the following command to rotate the Talos API CA:
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=false --talos=true --kubernetes=false
> Starting Talos API PKI rotation, dry-run mode false...
> Using config context: "talos-default-268"
> Using Talos API endpoints: ["172.20.0.2"]
> Cluster topology:
- control plane nodes: ["172.20.0.2"]
- worker nodes: ["172.20.0.3"]
> Current Talos CA:
...
> New Talos CA:
...
> Generating new talosconfig:
context: talos-default
contexts:
talos-default:
....
> Verifying connectivity with existing PKI:
- 172.20.0.2: OK (version v1.9.0)
- 172.20.0.3: OK (version v1.9.0)
> Adding new Talos CA as accepted...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Verifying connectivity with new client cert, but old server CA:
2024/04/17 21:26:07 retrying error: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: unknown certificate authority"
- 172.20.0.2: OK (version v1.9.0)
- 172.20.0.3: OK (version v1.9.0)
> Making new Talos CA the issuing CA, old Talos CA the accepted CA...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Verifying connectivity with new PKI:
2024/04/17 21:26:08 retrying error: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"x509: Ed25519 verification failure\" while trying to verify candidate authority certificate \"talos\")"
- 172.20.0.2: OK (version v1.9.0)
- 172.20.0.3: OK (version v1.9.0)
> Removing old Talos CA from the accepted CAs...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Verifying connectivity with new PKI:
- 172.20.0.2: OK (version v1.9.0)
- 172.20.0.3: OK (version v1.9.0)
> Writing new talosconfig to "talosconfig"
Once the rotation is done, save the new Talos CA and update secrets.yaml (if it is used for machine configuration generation) with the new CA key and certificate.
The new client talosconfig is written to the current directory as talosconfig.
You can merge it into the default location with talosctl config merge ./talosconfig.
If talosconfig files for other clients need to be generated, use talosctl config new with the new talosconfig.
The Talos API CA can also be rotated manually:
Generate a new Talos CA (e.g. use talosctl gen secrets and take the Talos CA from it).
Patch the machine configuration on all nodes, updating .machine.acceptedCAs with the new CA certificate.
Generate a talosconfig with a client certificate issued by the new CA, but still using the old CA as the server CA, and verify connectivity: Talos should accept the new client certificate.
Patch the machine configuration on all nodes, updating .machine.ca with the new CA certificate and key, and keeping the old CA certificate in .machine.acceptedCAs (on worker nodes, .machine.ca doesn't have the key).
Generate a talosconfig with both the client certificate and the server CA from the new CA PKI, and verify connectivity.
Remove the old CA certificate from .machine.acceptedCAs on all nodes.
Verify connectivity.
Kubernetes API
Automated Kubernetes API CA Rotation
The automated process only rotates the Kubernetes API CA, which is used by the kube-apiserver, kubelet, etc.
Other Kubernetes secrets might need to be rotated manually as required.
Kubernetes pods might need to be restarted to handle changes, and communication within the cluster might be disrupted during the rotation process.
Run the following command in dry-run mode to see the steps which will be taken:
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=true --talos=false --kubernetes=true
> Starting Kubernetes API PKI rotation, dry-run mode true...
> Cluster topology:
- control plane nodes: ["172.20.0.2"]
- worker nodes: ["172.20.0.3"]
> Building current Kubernetes client...
> Current Kubernetes CA:
...
Before proceeding, make sure that you can capture the output of the talosctl command, as it will contain the new CA certificate and key.
As Talos API access will not be disrupted, the changes can be reverted if needed by reverting the machine configuration.
Run the following command to rotate the Kubernetes API CA:
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=false --talos=false --kubernetes=true
> Starting Kubernetes API PKI rotation, dry-run mode false...
> Cluster topology:
- control plane nodes: ["172.20.0.2"]
- worker nodes: ["172.20.0.3"]
> Building current Kubernetes client...
> Current Kubernetes CA:
...
> New Kubernetes CA:
...
> Verifying connectivity with existing PKI...
- OK (2 nodes ready)
> Adding new Kubernetes CA as accepted...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Making new Kubernetes CA the issuing CA, old Kubernetes CA the accepted CA...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Building new Kubernetes client...
> Verifying connectivity with new PKI...
2024/04/17 21:45:52 retrying error: Get "https://172.20.0.1:6443/api/v1/nodes": EOF
- OK (2 nodes ready)
> Removing old Kubernetes CA from the accepted CAs...
- 172.20.0.2: OK
- 172.20.0.3: OK
> Verifying connectivity with new PKI...
- OK (2 nodes ready)
> Kubernetes CA rotation done, new 'kubeconfig' can be fetched with `talosctl kubeconfig`.
At the end of the process, Kubernetes control plane components will be restarted to pick up CA certificate changes.
The kubelet on each node will re-join the cluster with a new client certificate.
A new kubeconfig can be fetched from the cluster with the talosctl kubeconfig command.
Kubernetes pods might need to be restarted manually to pick up changes to the Kubernetes API CA.
The manual steps for the Kubernetes API CA rotation are the same as for the Talos API, using:
.cluster.acceptedCAs in place of .machine.acceptedCAs;
.cluster.ca in place of .machine.ca;
kubeconfig in place of talosconfig.
4.5 - Cgroups Resource Analysis
How to use talosctl cgroups to monitor resource usage on the node.
Talos provides a way to monitor resource usage of the control groups on the machine.
This feature is useful for understanding how many resources are being used by the containers and processes running on the machine.
Talos creates several system cgroups:
init (contains machined PID 1)
system (contains system services, and extension services)
Kubelet creates a tree of cgroups for each pod, and each container in the pod, starting with kubepods as the root group.
Talos Linux might set some default limits for the cgroups, and these are not configurable at the moment.
Kubelet is configured by default to reserve some amount of RAM and CPU for system processes to prevent the system from becoming unresponsive under extreme resource pressure.
Note: this feature is only available in cgroups v2 mode, which is the Talos default.
The talosctl cgroups command monitors the resource usage of the cgroups on the machine; it has a set of presets, which are described below.
In the IO (input/output) view, the following columns are displayed:
Bytes Read/Written: the total number of bytes read and written by the cgroup and its children, per block device
ios Read/Write: the total number of I/O operations read and written by the cgroup and its children, per block device
PressAvg10: the average IO pressure of the cgroup and its children over the last 10 seconds
PressAvg60: the average IO pressure of the cgroup and its children over the last 60 seconds
PressTotal: the total IO pressure of the cgroup and its children (see PSI for more information)
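The pressure columns refer to PSI (pressure stall information). In cgroup v2, a file such as io.pressure contains lines of the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`. The averages are percentages of time stalled, and total is the accumulated stall time in microseconds. The Go sketch below parses one such line. It is illustrative only, and which of the some/full lines talosctl uses for its columns is an assumption not covered here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pressure holds one line of a cgroup v2 PSI file such as io.pressure.
type pressure struct {
	Kind   string  // "some" or "full"
	Avg10  float64 // % of time stalled, averaged over 10 seconds
	Avg60  float64 // % of time stalled, averaged over 60 seconds
	Avg300 float64 // % of time stalled, averaged over 300 seconds
	Total  uint64  // accumulated stall time in microseconds
}

// parsePressureLine parses "<kind> avg10=.. avg60=.. avg300=.. total=..".
func parsePressureLine(line string) (pressure, error) {
	fields := strings.Fields(line)
	if len(fields) != 5 {
		return pressure{}, fmt.Errorf("unexpected PSI line: %q", line)
	}

	p := pressure{Kind: fields[0]}
	for _, f := range fields[1:] {
		key, value, ok := strings.Cut(f, "=")
		if !ok {
			return pressure{}, fmt.Errorf("malformed field %q", f)
		}

		var err error
		switch key {
		case "avg10":
			p.Avg10, err = strconv.ParseFloat(value, 64)
		case "avg60":
			p.Avg60, err = strconv.ParseFloat(value, 64)
		case "avg300":
			p.Avg300, err = strconv.ParseFloat(value, 64)
		case "total":
			p.Total, err = strconv.ParseUint(value, 10, 64)
		default:
			err = fmt.Errorf("unknown key %q", key)
		}
		if err != nil {
			return pressure{}, err
		}
	}
	return p, nil
}

func main() {
	p, err := parsePressureLine("some avg10=0.00 avg60=0.00 avg300=0.00 total=0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: avg10=%.2f%% avg60=%.2f%% total=%dus\n", p.Kind, p.Avg10, p.Avg60, p.Total)
}
```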
memory
$ talosctl cgroups --preset=memory
NAME MemCurrent MemPeak MemLow Peak/Low MemHigh MemMin Current/Min MemMax
. unset unset unset unset% unset unset unset% unset
├──init 133 MiB 133 MiB 192 MiB 69.18% max 96 MiB 138.35% max
├──kubepods 494 MiB 505 MiB 0 B max% max 0 B max% 1.4 GiB
│ ├──besteffort 70 MiB 74 MiB 0 B max% max 0 B max% max
│ │ └──kube-system/kube-proxy-6r5bz 70 MiB 74 MiB 0 B max% max 0 B max% max
│ │ ├──kube-proxy 69 MiB 73 MiB 0 B max% max 0 B max% max
│ │ └──sandbox 872 KiB 2.2 MiB 0 B max% max 0 B max% max
│ └──burstable 424 MiB 435 MiB 0 B max% max 0 B max% max
│ ├──kube-system/kube-apiserver-talos-default-controlplane-1 233 MiB 242 MiB 0 B max% max 0 B max% max
│ │ ├──kube-apiserver 232 MiB 242 MiB 0 B max% max 0 B max% max
│ │ └──sandbox 208 KiB 3.3 MiB 0 B max% max 0 B max% max
│ ├──kube-system/kube-controller-manager-talos-default-controlplane-1 78 MiB 80 MiB 0 B max% max 0 B max% max
│ │ ├──kube-controller-manager 78 MiB 80 MiB 0 B max% max 0 B max% max
│ │ └──sandbox 212 KiB 3.3 MiB 0 B max% max 0 B max% max
│ ├──kube-system/kube-flannel-jzx6m 48 MiB 50 MiB 0 B max% max 0 B max% max
│ │ ├──kube-flannel 46 MiB 48 MiB 0 B max% max 0 B max% max
│ │ └──sandbox 216 KiB 3.1 MiB 0 B max% max 0 B max% max
│ └──kube-system/kube-scheduler-talos-default-controlplane-1 66 MiB 67 MiB 0 B max% max 0 B max% max
│ ├──kube-scheduler 66 MiB 67 MiB 0 B max% max 0 B max% max
│ └──sandbox 208 KiB 3.4 MiB 0 B max% max 0 B max% max
├──podruntime 549 MiB 647 MiB 0 B max% max 0 B max% max
│ ├──etcd 382 MiB 482 MiB 256 MiB 188.33% max 0 B max% max
│ ├──kubelet 103 MiB 104 MiB 192 MiB 54.31% max 96 MiB 107.57% max
│ └──runtime 64 MiB 71 MiB 392 MiB 18.02% max 196 MiB 32.61% max
└──system 229 MiB 232 MiB 192 MiB 120.99% max 96 MiB 239.00% max
├──apid 26 MiB 28 MiB 32 MiB 88.72% max 16 MiB 159.23% 40 MiB
├──dashboard 113 MiB 113 MiB 0 B max% max 0 B max% 196 MiB
├──runtime 74 MiB 77 MiB 96 MiB 79.89% max 48 MiB 154.57% max
├──trustd 10 MiB 11 MiB 16 MiB 69.85% max 8.0 MiB 127.78% 24 MiB
└──udevd 6.8 MiB 14 MiB 16 MiB 86.87% max 8.0 MiB 84.67% max
In the memory view, the following columns are displayed:
MemCurrent: the current memory usage of the cgroup and its children
MemPeak: the peak memory usage of the cgroup and its children
MemLow: the low memory reservation of the cgroup
Peak/Low: the ratio of the peak memory usage to the low memory reservation
MemHigh: the high memory limit of the cgroup
MemMin: the minimum memory reservation of the cgroup
Current/Min: the ratio of the current memory usage to the minimum memory reservation
MemMax: the maximum memory limit of the cgroup
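The two ratio columns divide usage by a reservation. The sketch below assumes the percentage is usage / reservation times 100. It also assumes a zero reservation is shown as max%, as in the sample output above for cgroups whose MemLow or MemMin is 0 B. ratioColumn is a hypothetical helper. The table values are rounded to MiB, so recomputing a percentage from them (e.g. 133 MiB / 192 MiB) only approximately reproduces the displayed figure.

```go
package main

import "fmt"

const MiB = 1 << 20

// ratioColumn renders usage as a percentage of a reservation (the low
// reservation for Peak/Low, the minimum reservation for Current/Min).
// Without a reservation the ratio is unbounded, shown here as "max%".
func ratioColumn(usage, reservation uint64) string {
	if reservation == 0 {
		return "max%"
	}
	return fmt.Sprintf("%.2f%%", float64(usage)/float64(reservation)*100)
}

func main() {
	// A cgroup that peaked at 96 MiB with a 192 MiB low reservation.
	fmt.Println(ratioColumn(96*MiB, 192*MiB))
	// A cgroup without any reservation.
	fmt.Println(ratioColumn(70*MiB, 0))
}
```

A value above 100% (as for etcd or system in the sample) means usage has gone beyond the reserved amount.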
swap
$ talosctl cgroups --preset=swap
NAME SwapCurrent SwapPeak SwapHigh SwapMax
. unset unset unset unset
├──init 0 B 0 B max max
├──kubepods 0 B 0 B max max
│ ├──besteffort 0 B 0 B max max
│ │ └──kube-system/kube-proxy-6r5bz 0 B 0 B max max
│ │ ├──kube-proxy 0 B 0 B max 0 B
│ │ └──sandbox 0 B 0 B max max
│ └──burstable 0 B 0 B max max
│ ├──kube-system/kube-apiserver-talos-default-controlplane-1 0 B 0 B max max
│ │ ├──kube-apiserver 0 B 0 B max 0 B
│ │ └──sandbox 0 B 0 B max max
│ ├──kube-system/kube-controller-manager-talos-default-controlplane-1 0 B 0 B max max
│ │ ├──kube-controller-manager 0 B 0 B max 0 B
│ │ └──sandbox 0 B 0 B max max
│ ├──kube-system/kube-flannel-jzx6m 0 B 0 B max max
│ │ ├──kube-flannel 0 B 0 B max 0 B
│ │ └──sandbox 0 B 0 B max max
│ └──kube-system/kube-scheduler-talos-default-controlplane-1 0 B 0 B max max
│ ├──kube-scheduler 0 B 0 B max 0 B
│ └──sandbox 0 B 0 B max max
├──podruntime 0 B 0 B max max
│ ├──etcd 0 B 0 B max max
│ ├──kubelet 0 B 0 B max max
│ └──runtime 0 B 0 B max max
└──system 0 B 0 B max max
├──apid 0 B 0 B max max
├──dashboard 0 B 0 B max max
├──runtime 0 B 0 B max max
├──trustd 0 B 0 B max max
└──udevd 0 B 0 B max max
In the swap view, the following columns are displayed:
SwapCurrent: the current swap usage of the cgroup and its children
SwapPeak: the peak swap usage of the cgroup and its children
SwapHigh: the high swap limit of the cgroup
SwapMax: the maximum swap limit of the cgroup
Custom Schemas
The talosctl cgroups command allows you to define custom schemas to display the cgroups information in a specific way.
The schema is defined in a YAML file with the following structure:
The schema file can be passed to the talosctl cgroups command with the --schema-file flag:
talosctl cgroups --schema-file=schema.yaml
In the schema, for each column, you can define a name and a template which is a Go template that will be executed with the cgroups data.
In the template, the . variable contains the cgroup data, and the .Parent variable contains the parent cgroup (if available).
Each cgroup node contains information parsed from the cgroup filesystem, with field names matching the file names adjusted for Go naming conventions, e.g. io.stat becomes .IOStat in the template.
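To see how . and .Parent behave in a column template, here is a standalone text/template sketch. cgroupNode is a hypothetical, cut-down stand-in for the data talosctl passes to the template. The field name MemoryCurrent assumes the documented file-name convention applied to memory.current, and the real field set may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// cgroupNode is a hypothetical stand-in for the cgroup data given to a schema
// template; only the fields needed for this example are included.
type cgroupNode struct {
	Name          string
	MemoryCurrent uint64 // assumed naming for memory.current
	Parent        *cgroupNode
}

// columnTemplate uses "." for the current cgroup and ".Parent" for its parent.
// A nil Parent (the root) is falsy, so the "if" branch is skipped there.
const columnTemplate = `{{ .Name }}: {{ .MemoryCurrent }} bytes{{ if .Parent }} (parent {{ .Parent.Name }}){{ end }}`

var column = template.Must(template.New("column").Parse(columnTemplate))

// render executes the column template for a single cgroup node.
func render(n *cgroupNode) (string, error) {
	var sb strings.Builder
	if err := column.Execute(&sb, n); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	root := &cgroupNode{Name: "kubepods", MemoryCurrent: 100}
	child := &cgroupNode{Name: "besteffort", MemoryCurrent: 40, Parent: root}
	for _, n := range []*cgroupNode{root, child} {
		out, err := render(n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Guarding .Parent with if matters. At the root there is no parent, and dereferencing .Parent.Name unconditionally would make template execution fail.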
The schemas for the presets above can be found in the source code.
4.6 - Customizing the Kernel
Guide on how to customize the kernel used by Talos Linux.
Talos Linux configures the kernel to allow loading only cryptographically signed modules.
The signing key is generated during the build process; it is unique to each build, and it is not available to the user.
The public key is embedded in the kernel, and it is used to verify the signature of the modules.
So if you want to use a custom kernel module, you will need to build your own kernel and all required kernel modules, so that the module signatures match the key embedded in the kernel.
Overview
In order to build a custom kernel (or a custom kernel module), the following steps are required:
build a new Linux kernel and modules, push the artifacts to a registry
build new Talos base artifacts: kernel and initramfs image
produce a new Talos boot artifact (ISO, installer image, disk image, etc.)
We will go through each step in detail.
Building a Custom Kernel
First, you might need to prepare the build environment; follow the Building Custom Images guide.
git clone https://github.com/siderolabs/pkgs.git
cd pkgs
git checkout release-1.9
The kernel configuration is located in the kernel/build/config-ARCH files.
It can be modified using a text editor, or by using the Linux kernel menuconfig tool:
make kernel-menuconfig
The kernel configuration can be cleaned up by running:
make kernel-olddefconfig
Both commands will output the new configuration to the kernel/build/config-ARCH files.
Once ready, build the kernel and any out-of-tree modules (if required, e.g. zfs) and push the artifacts to a registry:
make kernel REGISTRY=127.0.0.1:5005 PUSH=true
By default, this command will compile and push the kernel for both amd64 and arm64 architectures, but you can specify a single architecture by overriding the PLATFORM variable:
make kernel REGISTRY=127.0.0.1:5005 PUSH=true PLATFORM=linux/amd64
This will create a container image 127.0.0.1:5005/siderolabs/kernel:$TAG with the kernel and modules.
If some new kernel modules were introduced, adjust the list of the default modules compiled into the Talos initramfs by editing the file hack/modules-ARCH.txt.
Try building base Talos artifacts:
make kernel initramfs PKG_KERNEL=127.0.0.1:5005/siderolabs/kernel:$TAG PLATFORM=linux/amd64
This should create a new image of the kernel and initramfs in _out/vmlinuz-amd64 and _out/initramfs-amd64.xz respectively.
Note: if building for arm64, replace amd64 with arm64 in the commands above.
As a final step, produce the new imager container image which can generate Talos boot assets:
make imager PKG_KERNEL=127.0.0.1:5005/siderolabs/kernel:$TAG PLATFORM=linux/amd64 INSTALLER_ARCH=targetarch
Note: if you built the kernel for both amd64 and arm64, a multi-arch imager container can be built as well by specifying INSTALLER_ARCH=all and PLATFORM=linux/amd64,linux/arm64.
Building Talos Boot Assets
Follow the Boot Assets guide to build Talos boot assets you might need to boot Talos: ISO, installer image, etc.
Replace the reference to the imager in the guide with the reference to the imager container built above.
Note: if you update the imager container, don’t forget to docker pull it, as docker caches pulled images and won’t pull the updated image automatically.
4.7 - Developing Talos
Learn how to set up a development environment for local testing and hacking on Talos itself!
This guide outlines steps and tricks to develop the Talos operating system and related components.
The guide assumes a Linux operating system on the development host.
Some steps might work under macOS, but using Linux is highly advised.
Note: network=host allows the buildx builder to access the host network, so that it can push to a local container registry (see below).
Make sure the following steps work:
make talosctl
make initramfs kernel
Set up a local docker registry:
docker run -d -p 5005:5000 \
--restart always \
--name local registry:2
Try to build an installer image and push it to the local registry:
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
Record the image name output in the step above.
Note: it is also possible to force a stable image tag by using TAG variable: make installer IMAGE_REGISTRY=127.0.0.1:5005 TAG=v1.0.0-alpha.1 PUSH=true.
Running Talos cluster
Set up local caching docker registries (this speeds up Talos cluster boot considerably); the script is in the Talos repo:
--cidr sets a custom network for the QEMU cluster, different from the default Docker setup (optional)
--registry-mirror uses the caching proxies set up above to speed up boot time considerably; the last one adds your local registry (the installer image was pushed to it)
--install-image is the image you built with make installer above
--controlplanes & --workers configure the cluster size; choose them to match your resources: 3 control plane nodes give you an HA control plane, 1 control plane node is enough, and you should never use 2 control plane nodes
--with-bootloader=false disables boot from disk (Talos will always boot from _out/vmlinuz-amd64 and _out/initramfs-amd64.xz).
This speeds up the development cycle a lot: there is no need to rebuild the installer and perform an install; rebooting is enough to get new code.
Note: as the boot loader is not used, it's not necessary to rebuild the installer each time (the old image is fine), but sometimes it's needed (when configuration changes are made and the old installer doesn't validate the config).
talosctl cluster create derives the Talos machine configuration version from the install image tag, so sometimes early in the development cycle (when the new minor tag is not released yet), the machine config version can be overridden with --talos-version=v1.9.
If the --with-bootloader=false flag is not enabled, the Talos cluster will require a Talos upgrade (so a new installer should be built) to pick up new changes to the code (in initramfs).
With the --with-bootloader=false flag, Talos always boots from the initramfs in the _out/ directory, so a simple reboot is enough to pick up new code changes.
If the installation flow needs to be tested, --with-bootloader=false shouldn't be used.
Once talosctl cluster create finishes successfully, talosconfig and kubeconfig will be set up automatically to point to your cluster.
Start playing with talosctl:
talosctl -n 172.20.0.2 version
talosctl -n 172.20.0.3,172.20.0.4 dashboard
talosctl -n 172.20.0.4 get members
Same with kubectl:
kubectl get nodes -o wide
You can deploy some Kubernetes workloads to the cluster.
You can edit the machine config on the fly with talosctl edit mc --immediate; config patches can be applied via --config-patch flags, and many features also have specific flags in talosctl cluster create.
Quick Reboot
To reboot the whole cluster quickly (e.g. to pick up a change made in the code):
for socket in ~/.talos/clusters/talos-default/talos-default-*.monitor; do echo "q" | sudo socat - unix-connect:$socket; done
Sending q to a single socket reboots a single node.
Note: this command performs an immediate reboot (as if the machine was powered down and immediately powered back up); for a normal Talos reboot, use talosctl reboot.
Development Cycle
Fast development cycle:
bring up a cluster
make code changes
rebuild initramfs with make initramfs
reboot a node to pick up the new initramfs
verify code changes
more code changes…
Some aspects of Talos development require enabling the bootloader (when working on the installer itself); in that case, the quick development cycle is no longer possible, and the cluster should be destroyed and recreated each time.
Running Integration Tests
If integration tests were changed (or when running them for the first time), first rebuild the integration test binary:
rm -f _out/integration-test-linux-amd64; make _out/integration-test-linux-amd64
Run short tests against a QEMU-provisioned cluster:
The whole test suite can be run by removing the -test.short flag.
Specific tests can be run with -test.run=TestIntegration/api.ResetSuite.
Build Flavors
make <something> WITH_RACE=1 enables Go race detector, Talos runs slower and uses more memory, but memory races are detected.
make <something> WITH_DEBUG=1 enables Go profiling and other debug features, useful for local development.
make initramfs WITH_DEBUG_SHELL=true adds bash and minimal utilities for debugging purposes.
Combine with the --with-debug-shell flag when creating the cluster to obtain shell access.
This is rarely used, as in this case the bash shell runs in place of machined.
This command stops QEMU and helper processes, tears down the bridged network on the host, and cleans up the cluster state in ~/.talos/clusters.
Note: if the host machine is rebooted, QEMU instances and helper processes won't be started again.
In that case, the files in the ~/.talos/clusters/<cluster-name> directory have to be cleaned up manually.
Optional
Set up cross-build environment with:
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
Note: the static qemu binaries which come with Ubuntu 21.10 seem to be broken.
Unit tests
Unit tests can be run in buildx with make unit-tests; on Ubuntu systems, some tests using loop devices will fail because Ubuntu uses low-index loop devices for snaps.
Most of the unit-tests can be run standalone as well, with regular go test, or using IDE integration:
go test -v ./internal/pkg/circular/
This provides a much faster feedback loop, but some tests require either elevated privileges (running as root) or additional binaries available only in the Talos rootfs (containerd tests).
Running tests as root can be done with the -exec flag to go test, but this is risky, as the test code has root access and can potentially make undesired changes:
go test -exec sudo -v ./internal/app/machined/pkg/controllers/network/...
Go Profiling
Build initramfs with debug enabled: make initramfs WITH_DEBUG=1.
Launch a Talos cluster with the bootloader disabled, and use go tool pprof to capture the profile and show the output in your browser:
go tool pprof http://172.20.0.2:9982/debug/pprof/heap
The IP address 172.20.0.2 is the address of the Talos node, and port :9982 depends on the Go application to profile:
9981: apid
9982: machined
9983: trustd
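The node address and the per-component ports above can be combined into a profile URL as follows. pprofURL is a hypothetical helper. The /debug/pprof/heap path comes from the example above. Other standard net/http/pprof profiles (such as goroutine) are assumed to follow the same path pattern.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// pprofPorts lists the debug ports documented for WITH_DEBUG=1 builds.
var pprofPorts = map[string]int{
	"apid":     9981,
	"machined": 9982,
	"trustd":   9983,
}

// pprofURL builds the URL to pass to `go tool pprof` for a component's profile.
func pprofURL(nodeIP, component, profile string) (string, error) {
	port, ok := pprofPorts[component]
	if !ok {
		return "", fmt.Errorf("unknown component %q", component)
	}
	// JoinHostPort also brackets IPv6 addresses correctly.
	host := net.JoinHostPort(nodeIP, strconv.Itoa(port))
	return "http://" + host + "/debug/pprof/" + profile, nil
}

func main() {
	u, err := pprofURL("172.20.0.2", "machined", "heap")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("go tool pprof " + u)
}
```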
Testing Air-gapped Environments
There is a hidden talosctl debug air-gapped command which launches two components:
HTTP proxy capable of proxying HTTP and HTTPS requests
HTTPS server with a self-signed certificate
The command also writes a Talos machine configuration patch that enables the HTTP proxy and adds a self-signed certificate to the list of trusted certificates:
$ talosctl debug air-gapped --advertised-address 172.20.0.1
2022/08/04 16:43:14 writing config patch to air-gapped-patch.yaml
2022/08/04 16:43:14 starting HTTP proxy on :8002
2022/08/04 16:43:14 starting HTTPS server with self-signed cert on :8001
The --advertised-address should match the bridge IP of the Talos node.
The first section appends the self-signed certificate of the HTTPS server to the list of trusted certificates, followed by the HTTP proxy setup (in-cluster traffic is excluded from the proxy).
The last section adds an extra Kubernetes manifest hosted on the HTTPS server.
The machine configuration patch can now be used to launch a test Talos cluster:
The following lines should appear in the output of the talosctl debug air-gapped command:
CONNECT discovery.talos.dev:443: the HTTP proxy is used to talk to the discovery service
http: TLS handshake error from 172.20.0.2:53512: remote error: tls: bad certificate: an expected error on the Talos side, as the self-signed cert is not yet written to the file
GET /debug.yaml: Talos fetches the extra manifest successfully
There might be more output, depending on whether registry caches are used.
Running Upgrade Integration Tests
Talos has a separate set of provision upgrade tests, which create a cluster on older versions of Talos, perform an upgrade, and verify that the cluster is still functional.
Build the test binary:
rm -f _out/integration-test-provision-linux-amd64; make _out/integration-test-provision-linux-amd64
Prepare the test artifacts for the upgrade test:
make release-artifacts
Build and push an installer image for the development version of Talos:
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
Run the tests (the tests will create the cluster on the older version of Talos, perform an upgrade, and verify that the cluster is still functional):
4.8 - Disaster Recovery
Procedure for snapshotting the etcd database and recovering from a catastrophic control plane failure.
The etcd database backs the Kubernetes control plane state, so if the etcd service is unavailable, the Kubernetes control plane goes down, and the cluster is not recoverable until etcd is recovered.
etcd is built around the Raft consensus protocol, so highly available control plane clusters can tolerate the loss of nodes as long as more than half of the members are running and reachable.
For a three control plane node Talos cluster, this means that the cluster tolerates the failure of any single node, but losing more than one node at the same time leads to a complete loss of service.
Because of that, it is important to take routine backups of etcd state to have a snapshot to recover the cluster from in case of catastrophic failure.
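The majority rule can be written down directly. A cluster of n members needs floor(n/2) + 1 of them available, and it tolerates the loss of the rest. A minimal Go sketch (the function names are illustrative):

```go
package main

import "fmt"

// quorum is the number of etcd members that must be running and reachable:
// strictly more than half of the cluster.
func quorum(members int) int {
	return members/2 + 1
}

// tolerableFailures is how many members can be lost while keeping quorum.
func tolerableFailures(members int) int {
	return members - quorum(members)
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("%d members: quorum %d, tolerates %d failure(s)\n", n, quorum(n), tolerableFailures(n))
	}
}
```

For 3 members this gives a quorum of 2 and tolerance of 1 failure, matching the text above. Two members tolerate no failures at all, which is why a two-node control plane adds no availability over a single node.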
Backup
Snapshotting etcd Database
Create a consistent snapshot of etcd database with talosctl etcd snapshot command:
$ talosctl -n <IP> etcd snapshot db.snapshot
etcd snapshot saved to "db.snapshot" (2015264 bytes)
snapshot info: hash c25fd181, revision 4193, total keys 1287, total size 3035136
Note: filename db.snapshot is arbitrary.
This database snapshot can be taken on any healthy control plane node (with IP address <IP> in the example above), as all etcd instances contain exactly the same data.
It is recommended to configure etcd snapshots to be created on a schedule to allow point-in-time recovery using the latest snapshot.
Disaster Database Snapshot
If the etcd cluster is not healthy (for example, if quorum has already been lost), the talosctl etcd snapshot command might fail.
In that case, copy the database snapshot directly from the control plane node:
This snapshot might not be fully consistent (if the etcd process is running), but it allows for disaster recovery when the latest regular snapshot is not available.
Machine Configuration
Machine configuration might be required to recover the node after hardware failure.
Backup Talos node machine configuration with the command:
talosctl -n IP get mc v1alpha1 -o yaml | yq eval '.spec' -
Recovery
Before starting a disaster recovery procedure, make sure that the etcd cluster can't be recovered:
get the etcd cluster member list on all healthy control plane nodes with the talosctl -n IP etcd members command and compare it across all members;
query etcd health across control plane nodes with talosctl -n IP service etcd.
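Comparing the member lists is mechanical once they have been collected. The hypothetical helper below takes the member names reported by each node and checks that every node sees the same set, ignoring order. Gathering the names from the talosctl -n IP etcd members output is left out, and the member names in the example are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// membersConsistent reports whether all nodes returned the same etcd member
// list, ignoring order. perNode maps a control plane node IP to the member
// names it reported.
func membersConsistent(perNode map[string][]string) bool {
	var reference string
	first := true
	for _, members := range perNode {
		sorted := append([]string(nil), members...) // copy before sorting
		sort.Strings(sorted)
		key := strings.Join(sorted, "\x00")
		if first {
			reference, first = key, false
			continue
		}
		if key != reference {
			return false
		}
	}
	return true
}

func main() {
	lists := map[string][]string{
		"172.20.0.2": {"cp-1", "cp-2", "cp-3"},
		"172.20.0.3": {"cp-3", "cp-1", "cp-2"},
		"172.20.0.4": {"cp-1", "cp-2"},
	}
	fmt.Println("consistent:", membersConsistent(lists))
}
```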
If quorum can be restored, restoring quorum might be a better strategy than performing the full disaster recovery procedure.
Latest Etcd Snapshot
Get hold of the latest etcd database snapshot.
If a snapshot is not fresh enough, create a database snapshot (see above), even if the etcd cluster is unhealthy.
Init Node
Make sure that there are no control plane nodes with machine type init:
$ talosctl -n <IP1>,<IP2>,... get machinetype
NODE NAMESPACE TYPE ID VERSION TYPE
172.20.0.2 config MachineType machine-type 2 controlplane
172.20.0.4 config MachineType machine-type 2 controlplane
172.20.0.3 config MachineType machine-type 2 controlplane
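With many nodes, the check can be scripted. The sketch below scans the table printed by talosctl get machinetype, as shown above, and reports nodes whose last column is init. It assumes exactly that column layout, and initNodes is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// initNodes scans `talosctl get machinetype` table output and returns the
// nodes whose machine type (the last column) is "init".
func initNodes(output string) []string {
	var nodes []string
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] == "NODE" {
			continue // skip blank lines and the header row
		}
		if fields[len(fields)-1] == "init" {
			nodes = append(nodes, fields[0])
		}
	}
	return nodes
}

func main() {
	output := `NODE NAMESPACE TYPE ID VERSION TYPE
172.20.0.2 config MachineType machine-type 2 controlplane
172.20.0.4 config MachineType machine-type 2 controlplane
172.20.0.3 config MachineType machine-type 2 controlplane`
	fmt.Println("init nodes:", initNodes(output))
}
```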
The init node type is deprecated and is incompatible with the etcd recovery procedure.
An init node can be converted to the controlplane type with the talosctl edit mc --mode=staged command, followed by a node reboot with the talosctl reboot command.
Preparing Control Plane Nodes
If some control plane nodes experienced hardware failure, replace them with new nodes.
Use the machine configuration backup to re-create the nodes with the same secret material and control plane settings to allow workers to join the recovered control plane.
If a control plane node is up but etcd isn't, wipe the node's EPHEMERAL partition to remove the etcd data directory (make sure a database snapshot is taken before doing this):
At this point, all control plane nodes should boot up, and etcd service should be in the Preparing state.
The Kubernetes control plane endpoint should be pointed to the new control plane nodes if there were changes to the node addresses.
Recovering from the Backup
Make sure all etcd service instances are in Preparing state:
$ talosctl -n <IP> service etcd
NODE 172.20.0.2
ID etcd
STATE Preparing
HEALTH ?
EVENTS [Preparing]: Running pre state (17s ago)
[Waiting]: Waiting for service "cri" to be "up", time sync (18s ago)
[Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", time sync (20s ago)
Execute the bootstrap command against any control plane node passing the path to the etcd database snapshot:
$ talosctl -n <IP> bootstrap --recover-from=./db.snapshot
recovering from snapshot "./db.snapshot": hash c25fd181, revision 4193, total keys 1287, total size 3035136
Note: if the database snapshot was copied directly from the etcd data directory using talosctl cp,
add the flag --recover-skip-hash-check to skip the integrity check on restore.
Talos node should print matching information in the kernel log:
recovering etcd from snapshot: hash c25fd181, revision 4193, total keys 1287, total size 3035136
{"level":"info","msg":"restoring snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/li}
{"level":"info","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":3360}
{"level":"info","msg":"added member","cluster-id":"a3390e43eb5274e2","local-member-id":"0","added-peer-id":"eb4f6f534361855e","added-peer-peer-urls":["https:/}
{"level":"info","msg":"restored snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
The etcd service should now become healthy on the bootstrap node, the Kubernetes control plane components
should start, and the control plane endpoint should become available.
The remaining control plane nodes join the etcd cluster once the control plane endpoint is up.
Single Control Plane Node Cluster
This guide applies to single control plane node clusters as well.
In fact, regular etcd snapshots are even more important with a single control plane node,
as losing that node without a backup might render the whole cluster unrecoverable.
4.9 - Egress Domains
Allowing outbound access for installing Talos
In more constrained environments, outbound internet access may be allowed only to specific domains.
If you still wish to install and bootstrap Talos from public sources, these rules need to be updated to allow the domains listed below.
That said, users should also note that all of the following components can be mirrored locally with an internal registry, as well as a self-hosted discovery service and image factory.
The following list of egress domains was tested using a Fortinet FortiGate Next-Generation Firewall to confirm that Talos was installed, bootstrapped, and Kubernetes was fully up and running.
The FortiGate allows for passing in wildcard domains and will handle resolution of those domains to defined IPs automatically.
All traffic is HTTPS over port 443.
Discovery Service:
discovery.talos.dev
Image Factory:
factory.talos.dev
*.azurefd.net (Azure Front Door for serving cached assets)
Google Container Registry / Google Artifact Registry (GCR/GAR):
gcr.io
storage.googleapis.com (backing blob storage for images)
*.pkg.dev (backing blob storage for images)
GitHub Container Registry (GHCR):
ghcr.io
*.githubusercontent.com (backing blob storage for images)
Kubernetes Registry (k8s.io):
registry.k8s.io
*.s3.dualstack.us-east-1.amazonaws.com (backing blob storage for images)
Note: In this testing, DNS and NTP servers were updated to use those services that are built-in to the FortiGate.
These may also need to be allowed if the user cannot make use of internal services.
Additionally, these rules only cover what is required for Talos to be fully installed and running.
Other domains, such as docker.io, may need to be allowed for non-default CNIs or workload container images.
4.10 - etcd Maintenance
Operational instructions for etcd database.
The etcd database backs the Kubernetes control plane state, so etcd health is critical for Kubernetes availability.
Space Quota
The default etcd database space quota is 2 GiB.
If the database size exceeds the quota, etcd raises a NOSPACE alarm and stops accepting writes until the issue is resolved.
This condition can be checked with the talosctl etcd alarm list command:
$ talosctl -n <IP> etcd alarm list
NODE MEMBER ALARM
172.20.0.2 a49c021e76e707db NOSPACE
If the Kubernetes database contains many resources, the space quota can be increased to match the actual usage.
The recommended maximum size is 8 GiB.
To increase the space quota, edit the etcd section in the machine configuration:
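A minimal sketch of such an edit, expressed as a machine configuration patch. It assumes the quota is passed to etcd through cluster.etcd.extraArgs as quota-backend-bytes, which etcd expects in bytes (4 GiB is 4 × 1024³ = 4294967296 bytes). The helper name write_etcd_quota_patch is hypothetical:

```shell
# Hypothetical helper: write a machine configuration patch that raises the
# etcd space quota. The quota is given in GiB and converted to bytes.
# Usage: write_etcd_quota_patch <size-in-GiB> [output-file]
write_etcd_quota_patch() {
    gib="$1"
    out="${2:-etcd-quota.yaml}"
    bytes=$((gib * 1024 * 1024 * 1024))
    cat > "$out" <<EOF
cluster:
  etcd:
    extraArgs:
      quota-backend-bytes: "$bytes"
EOF
}
```

For example, write_etcd_quota_patch 4 writes etcd-quota.yaml with a 4 GiB quota. The patch could then be applied with talosctl -n <IP> patch mc --patch @etcd-quota.yaml, followed by a reboot with talosctl -n <IP> reboot.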
Once the node is rebooted with the new configuration, use talosctl etcd alarm disarm to clear the NOSPACE alarm.
Defragmentation
The etcd database can become fragmented over time if there are many writes and deletes.
The Kubernetes API server performs automatic compaction of the etcd database, which marks deleted space as free and ready to be reused.
However, the space is not actually freed until the database is defragmented.
If the database is heavily fragmented (the in use / DB size ratio is less than 0.5), defragmentation may improve performance.
If the database runs over the space quota (see above) but the actual in-use database size is small, defragmentation is required to bring the on-disk database size below the limit.
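The 0.5 rule of thumb can be checked with a small helper. needs_defrag is a hypothetical name, and both arguments must use the same unit, for example the MB figures printed by talosctl etcd status:

```shell
# Hypothetical helper: exit status 0 ("defragment") when in-use / db-size < 0.5.
# Usage: needs_defrag <db-size> <in-use-size>
needs_defrag() {
    # awk handles the decimal values; a zero db size never triggers defrag.
    awk -v db="$1" -v inuse="$2" 'BEGIN { exit !(db > 0 && inuse / db < 0.5) }'
}
```

With the figures for node 172.20.0.3 in the status output below, needs_defrag 21 6.0 succeeds (6.0 / 21 is about 0.29), so defragmentation would likely help.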
The current database size can be checked with the talosctl etcd status command:
$ talosctl -n <CP1>,<CP2>,<CP3> etcd status
NODE MEMBER DB SIZE IN USE LEADER RAFT INDEX RAFT TERM RAFT APPLIED INDEX LEARNER ERRORS
172.20.0.3 ecebb05b59a776f1 21 MB 6.0 MB (29.08%) ecebb05b59a776f1 53391 4 53391 false
172.20.0.2 a49c021e76e707db 17 MB 4.5 MB (26.10%) ecebb05b59a776f1 53391 4 53391 false
172.20.0.4 eb47fb33e59bf0e2 20 MB 5.9 MB (28.96%) ecebb05b59a776f1 53391 4 53391 false
If any of the nodes are over the database size quota, alarms will be printed in the ERRORS column.
To defragment the database, run the talosctl etcd defrag command:
talosctl -n <CP1> etcd defrag
Note: defragmentation is a resource-intensive operation, so it is recommended to run it on one node at a time.
Defragmenting a live member blocks the system from reading and writing data while it rebuilds its state.
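To follow the one-node-at-a-time recommendation across all control plane nodes, a loop like the following could be used. defrag_sequentially is a hypothetical helper; it stops at the first failure and prints the status of each node after defragmenting it:

```shell
# Hypothetical helper: defragment etcd on each node in turn, never in parallel.
# Usage: defrag_sequentially <node> [<node> ...]
defrag_sequentially() {
    for node in "$@"; do
        echo "defragmenting etcd on $node" >&2
        # Stop immediately if defragmentation fails on this node.
        talosctl -n "$node" etcd defrag || return 1
        # Show the resulting DB SIZE / IN USE figures before moving on.
        talosctl -n "$node" etcd status || return 1
    done
}
```

For example: defrag_sequentially <CP1> <CP2> <CP3>.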
Once defragmentation is complete, the database size will closely match the in-use size:
$ talosctl -n <CP1> etcd status
NODE MEMBER DB SIZE IN USE LEADER RAFT INDEX RAFT TERM RAFT APPLIED INDEX LEARNER ERRORS
172.20.0.2 a49c021e76e707db 4.5 MB 4.5 MB (100.00%) ecebb05b59a776f1 56065 4 56065 false
Snapshotting
Regular backups of etcd database should be performed to ensure that the cluster can be restored in case of a failure.
This procedure is described in the disaster recovery guide.
4.11 - Extension Services
Use extension services in Talos Linux.
Talos provides a way to run additional system services early in the Talos boot process.
Extension services should be included in the Talos root filesystem (e.g. using system extensions).
Extension services run as privileged containers with an ephemeral root filesystem located in the Talos root filesystem.
Extension services can be used to extend core features of Talos in ways that are not possible via static pods or
Kubernetes DaemonSets.
Potential extension services use-cases:
storage: Open iSCSI, software RAID, etc.
networking: BGP FRR, etc.
platform integration: VMware open VM tools, etc.
Configuration
On boot, Talos scans the directory /usr/local/etc/containers for *.yaml files describing the extension services to run.
The extension service config format is described below.
Field name sets the service name, valid names are [a-z0-9-_]+.
The service container root filesystem path is derived from the name: /usr/local/lib/containers/<name>.
The extension service will be registered as a Talos service under an ext-<name> identifier.
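To illustrate these naming rules, the following hypothetical helper (ext_service_info) rejects names outside [a-z0-9-_]+ and prints the container root filesystem path and service ID that Talos derives from a valid name:

```shell
# Hypothetical helper: validate an extension service name and print the
# derived root filesystem path and Talos service ID.
# Usage: ext_service_info <name>
ext_service_info() {
    name="$1"
    # Explicit character list instead of ranges, so the check is locale-proof;
    # '-' is last in the bracket expression so it is matched literally.
    case "$name" in
        '' | *[!abcdefghijklmnopqrstuvwxyz0123456789_-]*)
            echo "invalid extension service name: '$name'" >&2
            return 1 ;;
    esac
    printf 'rootfs: %s\n' "/usr/local/lib/containers/$name"
    printf 'service: ext-%s\n' "$name"
}
```

For example, ext_service_info hello-world prints rootfs: /usr/local/lib/containers/hello-world and service: ext-hello-world.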
container
entrypoint defines the container entrypoint relative to the container root filesystem (/usr/local/lib/containers/<name>)
environmentFile (deprecated) defines the path to a file containing environment variables, the service waits for the file to
exist before starting.
Use ExtensionServiceConfig instead.
environment defines the container environment variables.
args defines the additional arguments to pass to the entrypoint
mounts defines the volumes to be mounted into the container root
All requested directories will be mounted into the extension service container mount namespace.
If the source directory doesn’t exist in the host filesystem, it will be created (only for writable paths in the Talos root filesystem).
The rootfs is readonly by default unless writeableRootfs: true is set.
The sysfs is readonly by default unless writeableSysfs: true is set.
If not set, masked paths default to the containerd defaults.
Masked paths will be mounted to /dev/null.
To set empty masked paths, use:
container:
  security:
    maskedPaths: []
If not set, read-only paths default to the containerd defaults.
Read-only paths will be mounted read-only.
To set empty read-only paths, use:
container:
  security:
    readonlyPaths: []
Rootfs propagation is not set by default (container mounts are private).
depends
The depends section describes extension service start dependencies: the service will not be started until all dependencies are met.
Available dependencies:
service: <name>: wait for the service <name> to be running and healthy
path: <path>: wait for the <path> to exist
network: [addresses, connectivity, hostname, etcfiles]: wait for the specified network readiness checks to succeed
time: true: wait for the NTP time sync
configuration: true: wait for ExtensionServiceConfig resource with a name matching the extension name to be available.
The mounts specified in the ExtensionServiceConfig will be added as extra mounts to the extension service.
restart
Field restart defines the service restart policy; it configures either an always-running service or a one-shot service:
always: restart service always
never: start service only once and never restart
untilSuccess: restart failing service, stop restarting on successful run
logToConsole
Field logToConsole defines whether the service logs should also be written to the console, i.e., to kernel log buffer (or to the container logs in container mode).
This feature is particularly useful for debugging extensions that operate in maintenance mode or early in the boot process when service logs cannot be accessed yet.
Example
Example layout of the Talos root filesystem contents for the extension service:
Talos starts the container for the extension service with container root filesystem at /usr/local/lib/containers/hello-world:
/
├── hello
└── config.ini
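A sketch that stages a matching layout in a build directory, for example as input to a system extension. The helper name stage_hello_world, the args value, and the environment variable are hypothetical, and hello is only an empty placeholder for the real entrypoint binary:

```shell
# Hypothetical helper: stage the hello-world extension service layout under a
# build directory instead of the real root filesystem.
# Usage: stage_hello_world <build-dir>
stage_hello_world() {
    root="$1"
    mkdir -p "$root/usr/local/etc/containers" \
             "$root/usr/local/lib/containers/hello-world"

    # Service definition; field values are illustrative only.
    cat > "$root/usr/local/etc/containers/hello-world.yaml" <<'EOF'
name: hello-world
container:
  entrypoint: ./hello
  args:
    - --config=/config.ini
  environment:
    - GREETING=hello
depends:
  - service: cri
  - network:
      - addresses
      - connectivity
  - time: true
restart: always
logToConsole: false
EOF

    # Container root filesystem: empty placeholders for the entrypoint binary
    # and its config file (replace with the real files).
    : > "$root/usr/local/lib/containers/hello-world/hello"
    chmod +x "$root/usr/local/lib/containers/hello-world/hello"
    : > "$root/usr/local/lib/containers/hello-world/config.ini"
}
```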
Extension service is registered as ext-hello-world in talosctl services:
$ talosctl service ext-hello-world
NODE 172.20.0.5
ID ext-hello-world
STATE Running
HEALTH ?
EVENTS [Running]: Started task ext-hello-world (PID 1100) for container ext-hello-world (2m47s ago)
       [Preparing]: Creating service runner (2m47s ago)
       [Preparing]: Running pre state (2m47s ago)
       [Waiting]: Waiting for service "containerd" to be "up" (2m48s ago)
       [Waiting]: Waiting for service "containerd" to be "up", network (2m49s ago)
An extension service can be started, restarted and stopped using talosctl service ext-hello-world start|restart|stop.
Use talosctl logs ext-hello-world to get the logs of the service.
Complete example of the extension service can be found in the extensions repository.
4.12 - Install KubeVirt on Talos
This is a guide on how to get started with KubeVirt on Talos
KubeVirt allows you to run virtual machines on Kubernetes.
It runs with QEMU and KVM to provide a seamless virtual machine experience and can be mixed with containerized workloads.
This guide explains how to install KubeVirt on Talos.
Prerequisites
For KubeVirt to work on Talos, you have to enable certain settings in the BIOS and configure Talos properly.
Enable virtualization in your BIOS
On many new PCs and servers, virtualization is enabled by default.
Please consult your manufacturer on how to enable this in the BIOS.
You can also run KubeVirt from within a virtual machine.
For that to work you have to enable Nested Virtualization.
This can also be done in the BIOS.
Configure your network interface in bridge mode (optional)
When you want to leverage Multus to give your virtual machines direct access to your node network, your bridge needs to be configured properly.
This can be done by setting your network interface in bridge mode.
You can look up the network interface name by using the following command:
$ talosctl get links -n 10.99.101.9
NODE NAMESPACE TYPE ID VERSION TYPE KIND HW ADDR OPER STATE LINK STATE
10.99.101.9 network LinkStatus bond0 1 ether bond 52:62:01:53:5b:a7 down false
10.99.101.9 network LinkStatus br0 3 ether bridge bc:24:11:a1:98:fc up true
10.99.101.9 network LinkStatus cni0 9 ether bridge 1e:5e:99:8f:1e:19 up true
10.99.101.9 network LinkStatus dummy0 1 ether dummy 62:1c:3e:d5:72:11 down false
10.99.101.9 network LinkStatus eth0 5 ether bc:24:11:a1:98:fc
In this case, this network interface is called eth0.
Now you can configure your bridge properly.
This can be done in the machine config of your node:
machine:
  network:
    interfaces:
      - interface: br0
        addresses:
          - 10.99.101.9/24
        bridge:
          stp:
            enabled: true
          interfaces:
            - eth0 # This must be changed to your matching interface name
        routes:
          - network: 0.0.0.0/0 # The route's network (destination).
            gateway: 10.99.101.254 # The route's gateway (if empty, creates link scope route).
            metric: 1024 # The optional metric for the route.
Install the local-path-provisioner
When installing KubeVirt, we also install the CDI (Containerized Data Importer) operator.
For this to work properly, we have to install the local-path-provisioner.
This provisioner is used to write scratch space when importing images with the CDI.
You can install the local-path-provisioner by following this guide.
Configure storage
If you would like to use features such as LiveMigration, shared storage is necessary.
You can either install a CSI driver that connects to NFS or install Longhorn, for example.
For more information on how to install Longhorn on Talos, you can follow this link.
To install the NFS-CSI driver, you can follow this guide.
After the installation of the NFS-CSI driver is done, you can create a storage class for the NFS CSI driver to work:
Note that this is just an example.
Make sure to set the nolock option.
Otherwise, the nfs-csi StorageClass won’t work, because Talos doesn’t run an rpc.statd daemon.
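A sketch of such a StorageClass, generated by a hypothetical helper write_nfs_storageclass. It assumes the provisioner name nfs.csi.k8s.io used by the upstream NFS CSI driver; the server and share values are placeholders for your own NFS server:

```shell
# Hypothetical helper: write a StorageClass manifest for the NFS CSI driver.
# Usage: write_nfs_storageclass <server> <share> [output-file]
write_nfs_storageclass() {
    cat > "${3:-nfs-csi-storageclass.yaml}" <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: $1
  share: $2
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  # Talos does not run rpc.statd, so NFS locking must be disabled.
  - nolock
EOF
}
```

For example, run write_nfs_storageclass 192.0.2.10 /exports/kubernetes, then kubectl apply -f nfs-csi-storageclass.yaml.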
Install virtctl
virtctl is needed for communication between the CLI and the KubeVirt API server.
You can install the virtctl client directly by running:
Or you can use krew to integrate it nicely in kubectl:
kubectl krew install virt
Installing KubeVirt
After the necessary preparations are done, you can now install KubeVirt.
This can be done either through the Operator Lifecycle Manager or by simply applying a YAML file.
We will keep this simple and do the following:
# Point at latest release
export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
# Deploy the KubeVirt operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml
After the operator is installed, it is time to apply the Custom Resource (CR) for the operator to fully deploy KubeVirt.
---
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - LiveMigration
        - NetworkBindingPlugins
    smbios:
      sku: "TalosCloud"
      version: "v0.1.0"
      manufacturer: "Talos Virtualization"
      product: "talosvm"
      family: "ccio"
  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate # enable if you have deployed either Longhorn or NFS-CSI for shared storage.
KubeVirt configuration options
In this YAML file, we specified the following configuration:
featureGates
KubeVirt has a set of features that are not mature enough to be enabled by default.
As such, they are protected by a Kubernetes concept called feature gates.
More information about the feature gates can be found in the KubeVirt documentation.
In this example we enable:
LiveMigration – For live migration of virtual machines to other nodes
NetworkBindingPlugins – This is needed for Multus to work.
smbios
Here we configure a specific SMBIOS configuration.
This can be useful when you want to give your virtual machines their own SKU, manufacturer name, etc.
workloadUpdateStrategy
If this is configured, virtual machines will be live migrated to other nodes when KubeVirt is updated.
Installing CDI
The CDI (containerized data importer) is needed to import virtual disk images in your KubeVirt cluster.
The CDI can do the following:
Import images of type:
qcow2
raw
iso
Import disks from either:
http/https
uploaded through virtctl
Container registry
Another PVC
You can either import these images by creating a DataVolume CR or by integrating this in your VirtualMachine CR.
When applying either the DataVolume CR or the VirtualMachine CR with a dataVolumeTemplates, the CDI kicks in and will do the following:
creates a PVC with the requirements from either the DataVolume or the dataVolumeTemplates
starts a pod
writes temporary scratch space to local disk
downloads the image
extracts it to the temporary scratch space
copies the image to the PVC
Installing the CDI is very simple:
# Point to latest release
export TAG=$(curl -s -w %{redirect_url} \
  https://github.com/kubevirt/containerized-data-importer/releases/latest)
export VERSION=$(echo ${TAG##*/})
# install operator
kubectl create -f \
  https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml
After that, you can apply a CDI CR for the CDI operator to fully deploy CDI:
This CR has some special settings that are needed for CDI to work properly:
scratchSpaceStorageClass
This is the storage class that we installed earlier with the local-path-provisioner.
This is needed for the CDI to write scratch space to local disk before importing the image.
podResourceRequirements
In many cases the default resource requests and limits are not sufficient for the importer pod to import the image.
This will result in a crash of the importer pod.
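A CDI custom resource combining both settings might look like the following sketch. It assumes that local-path is the StorageClass name created by the local-path-provisioner. The resource values are examples, not tuned recommendations:

```shell
# Write a CDI custom resource that sets a scratch space StorageClass and
# larger resource requests/limits for the importer pods.
cat > cdi.yaml <<'EOF'
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    scratchSpaceStorageClass: local-path
    podResourceRequirements:
      requests:
        cpu: "100m"
        memory: "60M"
      limits:
        cpu: "750m"
        memory: "2Gi"
EOF
```

The file could then be applied with kubectl apply -f cdi.yaml.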
After applying this yaml file, the CDI operator is ready.
Creating your first virtual machine
Now it is time to create your first virtual machine in KubeVirt.
Below we describe two examples.
In this example, we install a basic Fedora 40 virtual machine and install a web server.
After applying this YAML, the CDI will import the image and create a DataVolume.
You can monitor this process by running:
kubectl get dv -w
After the DataVolume is created, you can start the virtual machine:
kubectl virt start fedora-vm
When the virtual machine is started, KubeVirt creates an instance of that VirtualMachine, called a VirtualMachineInstance:
kubectl get virtualmachineinstance
NAME AGE PHASE IP NODENAME READY
fedora-vm 13s Running 10.244.4.92 kube1 True
You can view the console of the virtual machine by running:
kubectl virt console fedora-vm
or by running:
kubectl virt vnc fedora-vm
The console command opens a terminal to the virtual machine.
The vnc command opens vncviewer.
Note that vncviewer needs to be installed for this to work.
Now you can create a Service object to expose the virtual machine to the outside.
In this example we will use MetalLB as a LoadBalancer.
In this example we will create a virtual machine that is bound to the bridge interface with the help of Multus.
You can start the machine with kubectl virt start fedora-vm.
After that, you can look up the IP address of the virtual machine with:
kubectl get vmi -owide
NAME AGE PHASE IP NODENAME READY LIVE-MIGRATABLE PAUSED
fedora-vm 6d9h Running 10.99.101.53 kube1 True True
Other forms of management
There is a project called KubeVirt-Manager for managing virtual machines with KubeVirt through a nice web interface.
You can also choose to deploy virtual machines with Argo CD or Flux.
Documentation
KubeVirt has extensive documentation covering everything about running virtual machines with KubeVirt.
The documentation can be found here.
How to authenticate Talos machine configuration download (talos.config=) on metal platform using OAuth.
Talos Linux when running on the metal platform can be configured to authenticate the machine configuration download using OAuth2 device flow.
The machine configuration is fetched from the URL specified with talos.config kernel argument, and by default this HTTP request is not authenticated.
When the OAuth2 authentication is enabled, Talos will authenticate the request using OAuth device flow first, and then pass the token to the machine configuration download endpoint.
Prerequisites
Obtain the following information:
OAuth client ID (mandatory)
OAuth client secret (optional)
OAuth device endpoint
OAuth token endpoint
OAuth scopes, audience (optional)
extra Talos variables to send to the device auth endpoint (optional)
Configuration
Set the following kernel parameters on the initial Talos boot to enable the OAuth flow:
talos.config set to the URL of the machine configuration endpoint (which will be authenticated using OAuth)
talos.config.oauth.client_id set to the OAuth client ID (required)
talos.config.oauth.client_secret set to the OAuth client secret (optional)
talos.config.oauth.scope set to the OAuth scopes (optional, repeat the parameter for multiple scopes)
talos.config.oauth.audience set to the OAuth audience (optional)
talos.config.oauth.device_auth_url set to the OAuth device endpoint (if not set defaults to talos.config URL with the path /device/code)
talos.config.oauth.token_url set to the OAuth token endpoint (if not set defaults to talos.config URL with the path /token)
talos.config.oauth.extra_variable set to the extra Talos variables to send to the device auth endpoint (optional, repeat the parameter for multiple variables)
The list of variables supported by the talos.config.oauth.extra_variable parameter is the same as the list of variables supported by the talos.config parameter.
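For illustration, here is a hypothetical helper that assembles the required kernel arguments. It repeats talos.config.oauth.extra_variable once per variable. The URL and client ID in the usage line are placeholders:

```shell
# Hypothetical helper: build the OAuth-related kernel arguments.
# Usage: talos_oauth_cmdline <config-url> <client-id> [extra-variable ...]
talos_oauth_cmdline() {
    args="talos.config=$1 talos.config.oauth.client_id=$2"
    shift 2
    # Each extra variable gets its own talos.config.oauth.extra_variable entry.
    for var in "$@"; do
        args="$args talos.config.oauth.extra_variable=$var"
    done
    printf '%s\n' "$args"
}
```

For example, talos_oauth_cmdline https://config.example.com/config.yaml my-client uuid mac produces arguments matching the extra variables ["uuid" "mac"] shown in the log below.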
Flow
On the initial Talos boot, when machine configuration is not available, Talos will print the following messages:
[talos] downloading config {"component": "controller-runtime", "controller": "config.AcquireController", "platform": "metal"}
[talos] waiting for network to be ready
[talos] [OAuth] starting the authentication device flow with the following settings:
[talos] [OAuth] - client ID: "<REDACTED>"
[talos] [OAuth] - device auth URL: "https://oauth2.googleapis.com/device/code"
[talos] [OAuth] - token URL: "https://oauth2.googleapis.com/token"
[talos] [OAuth] - extra variables: ["uuid" "mac"]
[talos] waiting for variables: [uuid mac]
[talos] waiting for variables: [mac]
[talos] [OAuth] please visit the URL https://www.google.com/device and enter the code <REDACTED>
[talos] [OAuth] waiting for the device to be authorized (expires at 14:46:55)...
If the OAuth service provides the complete verification URL, the QR code to scan is also printed to the console:
How to use META-based network configuration on Talos metal platform.
Note: This is an advanced feature which requires deep understanding of Talos and Linux network configuration.
Talos Linux when running on a cloud platform (e.g. AWS or Azure), uses the platform-provided metadata server to provide initial network configuration to the node.
When running on bare-metal, there is no metadata server, so there are several options to provide initial network configuration (before machine configuration is acquired):
use automatic network configuration via DHCP (Talos default)
use initial boot kernel command line parameters to configure networking
use automatic network configuration via DHCP just enough to fetch machine configuration and then use machine configuration to set desired advanced configuration.
If DHCP option is available, it is by far the easiest way to configure networking.
The initial boot kernel command line parameters are not very flexible, and they are not persisted after initial Talos installation.
Starting with version 1.4.0, Talos offers a new option to configure networking on bare-metal: META-based network configuration.
Note: META-based network configuration is only available on Talos Linux metal platform.
Talos dashboard provides a way to configure META-based network configuration for a machine using the console, but
it doesn’t support all kinds of network configuration.
Network Configuration Format
Talos META-based network configuration is a YAML file with the following format:
Every section is optional, so you can configure only the parts you need.
The format of each section matches the .spec part of the respective network *Spec resource, e.g. the addresses:
section matches the .spec of the AddressSpec resource:
So one way to prepare the network configuration file is to boot Talos Linux, apply necessary network configuration using Talos machine configuration, and grab the resulting
resources from the running Talos instance.
In this guide we will briefly cover the most common examples of the network configuration.
Addresses
The addresses configured are usually routable IP addresses assigned to the machine, so
the scope: should be set to global and flags: to permanent.
Additionally, family: should be set to either inet4 or inet6 depending on the address family.
The linkName: property should match the name of the link the address is assigned to, it might be a physical link,
e.g. en9sp0, or the name of a logical link, e.g. bond0, created in the links: section.
If timeServers: is not set, Talos will use the default NTP servers.
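A minimal network.yaml for one static IPv4 address might look like the following sketch. The link name, addresses, gateway, and DNS server are placeholders. The route and resolver field names, and layer: platform, are assumptions based on the RouteSpec and ResolverSpec resources. As suggested above, compare them with the resources of a running node before use. Whether the link also needs an entry in a links: section depends on the setup:

```shell
# Write a minimal META network configuration (placeholder values).
cat > network.yaml <<'EOF'
addresses:
  - address: 192.0.2.10/24
    linkName: enp0s3
    family: inet4
    scope: global
    flags: permanent
    layer: platform
routes:
  - family: inet4
    gateway: 192.0.2.1
    outLinkName: enp0s3
    table: main
    priority: 1024
    scope: global
    type: unicast
    protocol: static
    layer: platform
resolvers:
  - dnsServers:
      - 192.0.2.53
    layer: platform
EOF
```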
Supplying META Network Configuration
Once the network configuration YAML document is ready, it can be supplied to Talos in one of the following ways:
for a running Talos machine, using Talos API (requires already established network connectivity)
for Talos disk images, it can be embedded into the image
for ISO/PXE boot methods, it can be supplied via kernel command line parameters as an environment variable
The metal network configuration is stored in Talos META partition under the key 0xa (decimal 10).
In this guide we will assume that the prepared network configuration is stored in the file network.yaml.
Note: as JSON is a subset of YAML, the network configuration can be also supplied as a JSON document.
Supplying Network Configuration to a Running Talos Machine
Use the talosctl to write a network configuration to a running Talos machine:
talosctl meta write 0xa "$(cat network.yaml)"
Supplying Network Configuration to a Talos Disk Image
Following the boot assets guide, create a disk image passing the network configuration as a --meta flag:
docker run --rm -t -v $PWD/_out:/out -v /dev:/dev --privileged ghcr.io/siderolabs/imager:v1.9.0 metal --meta "0xa=$(cat network.yaml)"
Supplying Network Configuration to a Talos ISO/PXE Boot
As no META partition exists before Talos Linux is installed, META values can be set via an environment variable INSTALLER_META_BASE64 passed to the initial boot of Talos.
The supplied value will be used immediately, and also it will be written to the META partition once Talos is installed.
When using imager to create the ISO, the INSTALLER_META_BASE64 environment variable will be automatically generated from the --meta flag:
$ docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.9.0 iso --meta "0xa=$(cat network.yaml)"
...
kernel command line: ... talos.environment=INSTALLER_META_BASE64=MHhhPWZvbw==
When PXE booting, the value of INSTALLER_META_BASE64 should be set manually:
The resulting base64 string should be passed as an environment variable INSTALLER_META_BASE64 to the initial boot of Talos: talos.environment=INSTALLER_META_BASE64=<base64-encoded value>.
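A minimal sketch of that encoding that produces the same format the imager prints for the ISO: the base64 of 0xa=<file contents> on a single line. meta_kernel_arg is a hypothetical helper name:

```shell
# Hypothetical helper: build the talos.environment kernel argument carrying
# META key 0xa for PXE boot.
# Usage: meta_kernel_arg <network-config-file>
meta_kernel_arg() {
    # "$(cat file)" drops trailing newlines, like "0xa=$(cat network.yaml)".
    # tr removes the line wrapping some base64 implementations add.
    value=$(printf '0xa=%s' "$(cat "$1")" | base64 | tr -d '\n')
    printf 'talos.environment=INSTALLER_META_BASE64=%s\n' "$value"
}
```

For a file containing just foo, it prints talos.environment=INSTALLER_META_BASE64=MHhhPWZvbw==. This matches the imager output shown above.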
Getting Current META Network Configuration
Talos exports META keys as resources:
$ talosctl get meta 0x0a -o yaml
...
spec:
  value: '{"addresses": ...}'
4.15 - Migrating from Kubeadm
Migrating Kubeadm-based clusters to Talos.
It is possible to migrate a cluster that was created using
kubeadm to Talos.
High-level steps are the following:
Collect CA certificates and a bootstrap token from a control plane node.
Create a Talos machine config with the CA certificates you collected.
Update control plane endpoint in the machine config to point to the existing control plane (i.e. your load balancer address).
Boot a new Talos machine and apply the machine config.
Verify that the new control plane node is ready.
Remove one of the old control plane nodes.
Repeat the same steps for all control plane nodes.
Verify that all control plane nodes are ready.
Repeat the same steps for all worker nodes, using the machine config generated for the workers.
Remarks on kube-apiserver load balancer
While migrating to Talos, you need to make sure that your kube-apiserver load balancer is in place
and keeps pointing to the correct set of control plane nodes.
This process depends on your load balancer setup.
If you are using an LB that is external to the control plane nodes (e.g. cloud provider LB, F5 BIG-IP, etc.),
you need to make sure that you update the backend IPs of the load balancer to point to the control plane nodes as
you add Talos nodes and remove kubeadm-based ones.
If your load balancing is done on the control plane nodes (e.g. keepalived + haproxy on the control plane nodes),
you can do the following:
Add Talos nodes and remove kubeadm-based ones while updating the haproxy backends
to point to the newly added nodes except the last kubeadm-based control plane node.
Turn off keepalived to drop the virtual IP used by the kubeadm-based nodes (introduces kube-apiserver downtime).
Set up a virtual-IP based new load balancer on the new set of Talos control plane nodes.
Use the previous LB IP as the LB virtual IP.
Verify apiserver connectivity over the Talos-managed virtual IP.
Migrate the last control-plane node.
Prerequisites
Admin access to the kubeadm-based cluster
Access to the /etc/kubernetes/pki directory (e.g. SSH & root permissions)
on the control plane nodes of the kubeadm-based cluster
Access to kube-apiserver load-balancer configuration
Step-by-step guide
Download /etc/kubernetes/pki directory from a control plane node of the kubeadm-based cluster.
Create a new join token for the new control plane nodes:
# inside a control plane node
kubeadm token create --ttl 0
Create Talos secrets from the PKI directory you downloaded on step 1 and the token you generated on step 2:
talosctl gen secrets --kubernetes-bootstrap-token <TOKEN> --from-kubernetes-pki <PKI_DIR>
Create a new Talos config from the secrets:
talosctl gen config --with-secrets secrets.yaml <CLUSTER_NAME> https://<EXISTING_CLUSTER_LB_IP>
Collect the information about the kubeadm-based cluster from the kubeadm configmap:
kubectl get configmap -n kube-system kubeadm-config -oyaml
Take note of the following information in the ClusterConfiguration:
.controlPlaneEndpoint
.networking.dnsDomain
.networking.podSubnet
.networking.serviceSubnet
Replace the following information in the generated controlplane.yaml:
.cluster.network.cni.name with none
.cluster.network.podSubnets[0] with the value of the networking.podSubnet from the previous step
.cluster.network.serviceSubnets[0] with the value of the networking.serviceSubnet from the previous step
.cluster.network.dnsDomain with the value of the networking.dnsDomain from the previous step
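Instead of editing controlplane.yaml by hand, the same values could be supplied as a configuration patch. write_kubeadm_network_patch is a hypothetical helper. The subnets in the usage line are placeholders for the values from your ClusterConfiguration:

```shell
# Hypothetical helper: write a config patch with the kubeadm networking values
# and the CNI set to none.
# Usage: write_kubeadm_network_patch <podSubnet> <serviceSubnet> <dnsDomain> [file]
write_kubeadm_network_patch() {
    cat > "${4:-kubeadm-network.yaml}" <<EOF
cluster:
  network:
    cni:
      name: none
    podSubnets:
      - $1
    serviceSubnets:
      - $2
    dnsDomain: $3
EOF
}
```

For example, run write_kubeadm_network_patch 10.244.0.0/16 10.96.0.0/12 cluster.local. Then pass --config-patch @kubeadm-network.yaml to the talosctl gen config command from the earlier step.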
Go through the rest of controlplane.yaml and worker.yaml to customize them according to your needs, especially:
.cluster.secretboxEncryptionSecret should be either removed if you don’t currently use EncryptionConfig on your kube-apiserver or set to the correct value
Make sure that, on your current Kubeadm cluster, the first --service-account-issuer= parameter in /etc/kubernetes/manifests/kube-apiserver.yaml is equal to the value of .cluster.controlPlane.endpoint in controlplane.yaml.
If it’s not, add a new --service-account-issuer= parameter with the correct value before your current one in /etc/kubernetes/manifests/kube-apiserver.yaml on all of your control plane nodes, and restart the kube-apiserver containers.
Bring up a Talos node to be the initial Talos control plane node.
Apply the generated controlplane.yaml to the Talos control plane node:
Your kubeadm kube-proxy configuration may not be compatible with the one generated by Talos, which would make Talos Kubernetes upgrades impossible (labels may not be the same, and selector.matchLabels is an immutable field).
To be sure, export your current kube-proxy DaemonSet manifest and check the labels; they have to be:
tier: node
k8s-app: kube-proxy
If they are not, modify all of the labels fields, save the file, delete your current kube-proxy DaemonSet, and apply the one you modified.
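A minimal sketch of the label check in Python. It assumes you exported the DaemonSet as JSON (for example with kubectl get daemonset kube-proxy -n kube-system -o json) and parsed it into a dict. Checking these three locations (metadata, selector, and pod template) is this sketch's reading of "all the labels fields".

```python
from typing import Dict, List, Tuple

REQUIRED_LABELS = {"tier": "node", "k8s-app": "kube-proxy"}

# Label locations in a DaemonSet; spec.selector.matchLabels is immutable.
LABEL_FIELDS: Dict[str, Tuple[str, ...]] = {
    "metadata.labels": ("metadata", "labels"),
    "spec.selector.matchLabels": ("spec", "selector", "matchLabels"),
    "spec.template.metadata.labels": ("spec", "template", "metadata", "labels"),
}


def _lookup(obj: dict, path: Tuple[str, ...]) -> dict:
    """Walk nested dicts, returning {} if any level is missing or not a dict."""
    for key in path:
        obj = obj.get(key) if isinstance(obj, dict) else None
    return obj if isinstance(obj, dict) else {}


def fields_needing_fix(daemonset: dict) -> List[str]:
    """List label fields that lack tier=node or k8s-app=kube-proxy."""
    return [
        name
        for name, path in LABEL_FIELDS.items()
        if any(_lookup(daemonset, path).get(k) != v for k, v in REQUIRED_LABELS.items())
    ]
```

An empty list means the exported manifest already carries the labels Talos expects in every checked location.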
4.16 - OCI Base Runtime Specification
Adjusting OCI base runtime specification for CRI containers.
Every container initiated by the Container Runtime Interface (CRI) adheres to the OCI runtime specification.
While certain aspects of this specification can be modified through Kubernetes pod and container configurations, others remain fixed.
Talos Linux provides the capability to adjust the OCI base runtime specification for all containers managed by the CRI.
However, it is important to note that the Kubernetes/CRI plugin may still override some settings, meaning changes to the base runtime specification are not always guaranteed to take effect.
Getting Current OCI Base Runtime Specification
To get the current OCI base runtime specification, you can use the following command (yq -P . is used to pretty-print the output):
In this example, the number of open files is set to 1024 for all containers (the OCI default is unset, so containers inherit the Talos default of 1048576 open files).
The contents of the baseRuntimeSpecOverrides field are merged with the current base runtime specification, so only the fields that need to be adjusted should be included.
This configuration change is applied with a machine reboot, and the OCI base runtime specification only affects new containers created on the node after the change.
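The override described above can be sketched as a Python dict that mirrors the machine config patch, using the OCI process.rlimits field (type, soft, hard). The merge function is only a simplified model of being "merged with the current base runtime specification". It assumes nested objects merge key by key and any other value, including a list, is replaced. Talos's actual merge rules may differ in detail, so inspect the resulting specification on the node.

```python
import copy
import json


def nofile_override(limit: int) -> dict:
    """Machine config patch setting RLIMIT_NOFILE for all CRI containers."""
    return {
        "machine": {
            "baseRuntimeSpecOverrides": {
                "process": {
                    "rlimits": [
                        {"type": "RLIMIT_NOFILE", "hard": limit, "soft": limit}
                    ]
                }
            }
        }
    }


def merge(base: dict, override: dict) -> dict:
    """Simplified model: dicts merge recursively, other values are replaced."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    patch = nofile_override(1024)
    # Hypothetical fragment of a base runtime spec, for illustration only.
    base_spec = {"ociVersion": "1.1.0", "process": {"cwd": "/", "rlimits": []}}
    merged = merge(base_spec, patch["machine"]["baseRuntimeSpecOverrides"])
    print(json.dumps(merged, indent=2))
```

The patch contains only the field being changed. The merge leaves unrelated fields such as cwd untouched.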
4.17 - Overlays
Overlays
Overlays provide a way to customize the Talos Linux boot image.
Overlays hook into the Talos install steps and can be used to provide additional boot assets (in the case of single board computers),
extra kernel arguments, or custom configuration that is not part of the default Talos installation and is specific to a particular overlay.
Overlays vs. Extensions
Overlays are similar to extensions, but they are used to customize the installation process, while extensions are used to customize the root filesystem.
Note: The schematic ID shown in the above patch is for a vanilla rpi_generic overlay.
Replace it with the schematic ID of the overlay you want to apply.
Authoring Overlays
An overlay is a container image with a specific folder structure.
Overlays can be built and managed using any tool that produces container images, e.g. docker build.
Let’s assume that you would like to contribute an overlay for a specific board, e.g. by contributing to the sbc-rockchip repository.
Clone the repository and inspect the existing overlays to understand the structure.
Usually an overlay consists of a few key components:
firmware: contains the firmware files required for the board
bootloader: contains the bootloader, e.g. u-boot for the board
dtb: contains the device tree blobs for the board
installer: contains the installer that will be used to install this overlay on the node
profile: contains information
For the new overlay, create any needed folders and pkg.yaml files.
If your board introduces a new chipset that is not supported yet, make sure to add the firmware build for it.
Add the necessary u-boot and dtb build steps to the pkg.yaml files.
Proceed to add an installer, which is a small Go binary that will be used to install the overlay on the node.
Here you need to add the Go src/ as well as the pkg.yaml file.
Lastly, add the profile information in the profiles folder.
You are now ready to attempt building the overlay.
It’s recommended to push the build to a container registry so that you can test the overlay with the Talos installer.
The default settings are:
REGISTRY is set to ghcr.io
USERNAME is set to siderolabs (or to the value of the environment variable USERNAME if it is set)
make sbc-rockchip PUSH=true
If using a custom registry, the REGISTRY and USERNAME variables can be set:
make sbc-rockchip PUSH=true REGISTRY=<registry> USERNAME=<username>
After building the overlay, take note of the pushed image tag, e.g. 664638a, because you will need it for the next step.
You can now build a flashable image using the command below.
--overlay-option can be used to pass additional options to the overlay installer if the overlay implements them.
An example can be seen in the sbc-raspberrypi overlay repository.
Multiple options can be passed by repeating the flag, or they can be read from a YAML document by passing --overlay-option=@<path to file>.
IMPORTANT: If this does not succeed, have a look at the documentation of the external dependencies you are pulling in and make sure that the pkg.yaml files are correctly configured.
In some cases it may be required to update the dependencies to an appropriate version via the Pkgfile.
4.18 - Proprietary Kernel Modules
Adding a proprietary kernel module to Talos Linux
Patching and building the kernel image
Clone the pkgs repository from GitHub and check out the revision corresponding to your version of Talos Linux.
Clone the Linux kernel and check out the revision that pkgs uses (this can be found in kernel/kernel-prepare/pkg.yaml, and it will be something like the following: https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-x.xx.x.tar.xz).
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
git checkout v5.15
Your module will need to be converted to be in-tree.
The steps for this are different depending on the complexity of the module to port, but generally it would involve moving the module source code into the drivers tree and creating a new Makefile and Kconfig.
Stage your changes in Git with git add -A.
Run git diff --cached --no-prefix > foobar.patch to generate a patch from your changes.
Copy this patch to kernel/kernel/patches in the pkgs repo.
Add a patch line in the prepare segment of kernel/kernel/pkg.yaml:
patch -p0 < /pkg/patches/foobar.patch
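Why -p0? By default, git diff prefixes file names with a/ and b/, and a patch like that would need patch -p1 to strip the prefix. The --no-prefix option omits these prefixes, so the paths in foobar.patch are already relative to the kernel source root and -p0 strips nothing. The Python sketch below models the -pN rule from patch(1) (strip the prefix up to and including the Nth slash) to illustrate this. It is not part of the build, and the driver path is hypothetical.

```python
def strip_components(path: str, level: int) -> str:
    """Model patch(1) -pN: drop everything up to and including the Nth slash."""
    if level < 0:
        raise ValueError("level must be non-negative")
    parts = path.split("/")
    if level >= len(parts):
        raise ValueError(f"cannot strip {level} components from {path!r}")
    return "/".join(parts[level:])


if __name__ == "__main__":
    # Header written by `git diff --cached --no-prefix` (hypothetical driver).
    print(strip_components("drivers/misc/foobar/foobar.c", 0))
    # Header written by plain `git diff`, which would need -p1 instead.
    print(strip_components("a/drivers/misc/foobar/foobar.c", 1))
```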
Build the kernel image.
Make sure you are logged in to ghcr.io before running this command, and you can change or omit PLATFORM depending on what you want to target.
make kernel PLATFORM=linux/amd64 USERNAME=your-username PUSH=true
Make a note of the image name the make command outputs.
Building the installer image
Copy the following into a new Dockerfile:
FROM scratch AS customization
COPY --from=ghcr.io/your-username/kernel:<kernel version> /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:<talos version>
COPY --from=ghcr.io/your-username/kernel:<kernel version> /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
Using Talos Linux to set up static pods in Kubernetes.
Static Pods
Static pods are run directly by the kubelet, bypassing the Kubernetes API server checks and validations.
Most of the time a DaemonSet is a better alternative to static pods, but some workloads need to run before the Kubernetes API server is available or might need to bypass security restrictions imposed by the API server.
Talos serves static pod definitions to the kubelet using a local HTTP server; the kubelet picks up the definitions and launches the pods.
Talos accepts changes to the static pod configuration without a reboot.
To see a full list of static pods, use talosctl get staticpods, and to see the status of the static pods (as reported by the kubelet), use talosctl get staticpodstatus.
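Static pods are declared in the machine configuration. The sketch below builds, in Python, a patch that adds one. It assumes the machine.pods list field of the Talos machine configuration and uses a hypothetical single-container nginx pod. The helper always sets spec.containers, because Talos passes the definitions to the kubelet without validating them.

```python
import json


def static_pod_patch(name: str, image: str, namespace: str = "default") -> dict:
    """Machine config patch adding one single-container static pod."""
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        # Talos does not validate static pods; always include containers.
        "spec": {"containers": [{"name": name, "image": image}]},
    }
    return {"machine": {"pods": [pod]}}


if __name__ == "__main__":
    print(json.dumps(static_pod_patch("nginx", "nginx"), indent=2))
```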
Usage
The kubelet mirrors pod definitions to the API server, so static pods can be inspected with kubectl get pods, logs can be retrieved with kubectl logs, etc.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-talos-default-controlplane-2 1/1 Running 0 17s
If the API server is not available, the status of the static pod can also be inspected with talosctl containers --kubernetes:
Logs of static pods can be retrieved with talosctl logs --kubernetes:
$ talosctl logs --kubernetes default/nginx-talos-default-controlplane-2:nginx:4183a7d7a771
172.20.0.3: 2022-02-10T15:26:01.289208227Z stderr F 2022/02/10 15:26:01 [notice] 1#1: using the "epoll" event method
172.20.0.3: 2022-02-10T15:26:01.2892466Z stderr F 2022/02/10 15:26:01 [notice] 1#1: nginx/1.21.6
172.20.0.3: 2022-02-10T15:26:01.28925723Z stderr F 2022/02/10 15:26:01 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
Troubleshooting
Talos doesn’t perform any validation on the static pod definitions.
If the pod isn’t running, use kubelet logs (talosctl logs kubelet) to find the problem:
$ talosctl logs kubelet
172.20.0.2: {"ts":1644505520281.427,"caller":"config/file.go:187","msg":"Could not process manifest file","path":"/etc/kubernetes/manifests/talos-default-nginx-gvisor.yaml","err":"invalid pod: [spec.containers: Required value]"}
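The kubelet log lines are JSON records prefixed with the node address, so manifest errors like the one above can be filtered with a script. A minimal Python sketch, assuming the "<node>: <json>" line format shown above. The msg text it matches is taken from the example line. Lines that are not JSON are skipped.

```python
import json
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, dict]:
    """Split '<node>: <json>' into the node address and the decoded record."""
    node, _, payload = line.partition(": ")
    return node, json.loads(payload)


def manifest_errors(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (node, manifest path, error) for each rejected static pod manifest."""
    errors = []
    for line in lines:
        try:
            node, record = parse_line(line)
        except json.JSONDecodeError:
            continue  # not a structured kubelet log line
        if record.get("msg") == "Could not process manifest file":
            errors.append((node, record.get("path", ""), record.get("err", "")))
    return errors
```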
Resource Definitions
Static pod definitions are available as StaticPod resources combined with Talos-generated control plane static pods:
$ talosctl get staticpods
NODE NAMESPACE TYPE ID VERSION
172.20.0.3 k8s StaticPod default-nginx 1
172.20.0.3 k8s StaticPod kube-apiserver 1
172.20.0.3 k8s StaticPod kube-controller-manager 1
172.20.0.3 k8s StaticPod kube-scheduler 1
Talos assigns ID <namespace>-<name> to the static pods specified in the machine configuration.
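The naming pattern visible in these outputs can be captured in a small Python sketch. The StaticPod resource ID is <namespace>-<name>. The mirror pod reported by the kubelet and the StaticPodStatus ID both append the node name, as in nginx-talos-default-controlplane-2. The suffix rule is inferred from these examples, and the node name is hypothetical.

```python
def static_pod_resource_id(namespace: str, name: str) -> str:
    """ID Talos assigns to a StaticPod from the machine configuration."""
    return f"{namespace}-{name}"


def mirror_pod_name(name: str, node_name: str) -> str:
    """Name of the mirror pod the kubelet reports to the API server."""
    return f"{name}-{node_name}"


def static_pod_status_id(namespace: str, name: str, node_name: str) -> str:
    """ID of the matching StaticPodStatus resource."""
    return f"{namespace}/{mirror_pod_name(name, node_name)}"


if __name__ == "__main__":
    node = "talos-default-controlplane-2"
    print(static_pod_resource_id("default", "nginx"))
    print(static_pod_status_id("default", "nginx", node))
```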
On control plane nodes, the status of the running static pods is available in the StaticPodStatus resource:
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.3 k8s StaticPodStatus default/nginx-talos-default-controlplane-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-apiserver-talos-default-controlplane-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-controller-manager-talos-default-controlplane-2 3 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-scheduler-talos-default-controlplane-2 3 True
4.20 - Talos API access from Kubernetes
How to access Talos API from within Kubernetes.
In this guide, we will enable the Talos feature to access the Talos API from within Kubernetes.
Enabling the Feature
Edit the machine configuration to enable the feature, specifying the Kubernetes namespaces from which Talos API
can be accessed and the allowed Talos API roles.
talosctl -n 172.20.0.2 edit machineconfig
Configure the kubernetesTalosAPIAccess like the following:
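A minimal sketch of such a machine config patch, built in Python. The field names (enabled, allowedRoles, allowedKubernetesNamespaces under machine.features.kubernetesTalosAPIAccess) and the os:reader role reflect our understanding of the Talos configuration reference. Check them against that reference before use. The namespace is hypothetical.

```python
import json
from typing import Iterable


def talos_api_access_patch(roles: Iterable[str], namespaces: Iterable[str]) -> dict:
    """Patch enabling Talos API access from the given Kubernetes namespaces."""
    return {
        "machine": {
            "features": {
                "kubernetesTalosAPIAccess": {
                    "enabled": True,
                    "allowedRoles": list(roles),
                    "allowedKubernetesNamespaces": list(namespaces),
                }
            }
        }
    }


if __name__ == "__main__":
    # Read-only access for workloads in the (hypothetical) "default" namespace.
    print(json.dumps(talos_api_access_patch(["os:reader"], ["default"]), indent=2))
```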
The cosign tool can be used to verify the signatures of the Talos container images:
$ cosign verify --certificate-identity-regexp '@siderolabs\.com$' --certificate-oidc-issuer https://accounts.google.com ghcr.io/siderolabs/installer:v1.4.0
Verification for ghcr.io/siderolabs/installer:v1.4.0 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
[{"critical":{"identity":{"docker-reference":"ghcr.io/siderolabs/installer"},"image":{"docker-manifest-digest":"sha256:f41795cc88f40eb1bc6b3c638c4a3123f6ef3c90627bfc35c04ebab82581e3ee"},"type":"cosign container image signature"},"optional":{"1.3.6.1.4.1.57264.1.1":"https://accounts.google.com","Bundle":{"SignedEntryTimestamp":"MEQCIERkQpgEnPWnfjUHIWO9QxC9Ute3/xJOc7TO5GUnu59xAiBKcFvrDWHoUYChT0/+gaazTrI+r0/GWSbi+Q+sEQ5AKA==","Payload":{"body":"eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiaGFzaGVkcmVrb3JkIiwic3BlYyI6eyJkYXRhIjp7Imhhc2giOnsiYWxnb3JpdGhtIjoic2hhMjU2IiwidmFsdWUiOiJkYjhjYWUyMDZmODE5MDlmZmI4NjE4ZjRkNjIzM2ZlYmM3NzY5MzliOGUxZmZkMTM1ODA4ZmZjNDgwNjYwNGExIn19LCJzaWduYXR1cmUiOnsiY29udGVudCI6Ik1FVUNJUURQWXhiVG5vSDhJTzBEakRGRE9rNU1HUjRjMXpWMys3YWFjczNHZ2J0TG1RSWdHczN4dVByWUgwQTAvM1BSZmZydDRYNS9nOUtzQVdwdG9JbE9wSDF0NllrPSIsInB1YmxpY0tleSI6eyJjb250ZW50IjoiTFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVTXhha05EUVd4NVowRjNTVUpCWjBsVlNIbEhaRTFQVEhkV09WbFFSbkJYUVRKb01qSjRVM1ZIZVZGM2QwTm5XVWxMYjFwSmVtb3dSVUYzVFhjS1RucEZWazFDVFVkQk1WVkZRMmhOVFdNeWJHNWpNMUoyWTIxVmRWcEhWakpOVWpSM1NFRlpSRlpSVVVSRmVGWjZZVmRrZW1SSE9YbGFVekZ3WW01U2JBcGpiVEZzV2tkc2FHUkhWWGRJYUdOT1RXcE5kMDVFUlRSTlZHZDZUbXBWTlZkb1kwNU5hazEzVGtSRk5FMVVaekJPYWxVMVYycEJRVTFHYTNkRmQxbElDa3R2V2tsNmFqQkRRVkZaU1V0dldrbDZhakJFUVZGalJGRm5RVVZaUVdKaVkwbDZUVzR3ZERBdlVEZHVUa0pNU0VscU1rbHlORTFQZGpoVVRrVjZUemNLUkVadVRXSldVbGc0TVdWdmExQnVZblJHTVZGMmRWQndTVm95VkV3NFFUUkdSMWw0YldFeGJFTk1kMkk0VEZOVWMzRlBRMEZZYzNkblowWXpUVUUwUndwQk1WVmtSSGRGUWk5M1VVVkJkMGxJWjBSQlZFSm5UbFpJVTFWRlJFUkJTMEpuWjNKQ1owVkdRbEZqUkVGNlFXUkNaMDVXU0ZFMFJVWm5VVlZqYWsweUNrbGpVa1lyTkhOVmRuRk5ia3hsU0ZGMVJIRkdRakZqZDBoM1dVUldVakJxUWtKbmQwWnZRVlV6T1ZCd2VqRlphMFZhWWpWeFRtcHdTMFpYYVhocE5Ga0tXa1E0ZDB0M1dVUldVakJTUVZGSUwwSkRSWGRJTkVWa1dWYzFhMk50VmpWTWJrNTBZVmhLZFdJeldrRmpNbXhyV2xoS2RtSkhSbWxqZVRWcVlqSXdkd3BMVVZsTFMzZFpRa0pCUjBSMmVrRkNRVkZSWW1GSVVqQmpTRTAyVEhrNWFGa3lUblprVnpVd1kzazFibUl5T1c1aVIxVjFXVEk1ZEUxRGMwZERhWE5IQ2tGUlVVSm5OemgzUVZGblJVaFJkMkpoU0ZJd1kwaE5Oa3g1T1
doWk1rNTJaRmMxTUdONU5XNWlNamx1WWtkVmRWa3lPWFJOU1VkTFFtZHZja0puUlVVS1FXUmFOVUZuVVVOQ1NIZEZaV2RDTkVGSVdVRXpWREIzWVhOaVNFVlVTbXBIVWpSamJWZGpNMEZ4U2t0WWNtcGxVRXN6TDJnMGNIbG5Remh3TjI4MFFRcEJRVWRJYkdGbVp6Um5RVUZDUVUxQlVucENSa0ZwUVdKSE5tcDZiVUkyUkZCV1dUVXlWR1JhUmtzeGVUSkhZVk5wVW14c1IydHlSRlpRVXpsSmJGTktDblJSU1doQlR6WlZkbnBFYVVOYVFXOXZSU3RLZVdwaFpFdG5hV2xLT1RGS00yb3ZZek5CUTA5clJIcFhOamxaVUUxQmIwZERRM0ZIVTAwME9VSkJUVVFLUVRKblFVMUhWVU5OUVZCSlRUVjJVbVpIY0VGVWNqQTJVR1JDTURjeFpFOXlLMHhFSzFWQ04zbExUVWRMWW10a1UxTnJaMUp5U3l0bGNuZHdVREp6ZGdvd1NGRkdiM2h0WlRkM1NYaEJUM2htWkcxTWRIQnpjazFJZGs5cWFFSmFTMVoxVG14WmRXTkJaMVF4V1VWM1ZuZHNjR2QzYTFWUFdrWjRUemRrUnpONkNtVnZOWFJ3YVdoV1kyTndWMlozUFQwS0xTMHRMUzFGVGtRZ1EwVlNWRWxHU1VOQlZFVXRMUzB0TFFvPSJ9fX19","integratedTime":1681843022,"logIndex":18304044,"logID":"c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d"}},"Issuer":"https://accounts.google.com","Subject":"andrey.smirnov@siderolabs.com"}}]
The image should be signed using the cosign certificate authority flow by a Sidero Labs employee with an email address from the siderolabs.com domain.
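cosign itself already enforces the issuer and the identity through --certificate-oidc-issuer and --certificate-identity-regexp. If you also want to inspect the JSON it prints, the Python sketch below shows where those values sit in each signature entry (optional.Issuer and optional.Subject), based on the output above. The sample input is abbreviated and uses a hypothetical signer address.

```python
import json
from typing import List, Tuple

EXPECTED_ISSUER = "https://accounts.google.com"
EXPECTED_DOMAIN = "@siderolabs.com"


def signers(cosign_json: str) -> List[Tuple[str, str]]:
    """Return (issuer, subject) for every signature in cosign verify output."""
    return [
        (entry["optional"]["Issuer"], entry["optional"]["Subject"])
        for entry in json.loads(cosign_json)
    ]


def signed_by_sidero_labs(cosign_json: str) -> bool:
    """True if there is at least one signature and all match issuer and domain."""
    found = signers(cosign_json)
    return bool(found) and all(
        issuer == EXPECTED_ISSUER and subject.endswith(EXPECTED_DOMAIN)
        for issuer, subject in found
    )


if __name__ == "__main__":
    sample = json.dumps([{
        "critical": {"identity": {"docker-reference": "ghcr.io/siderolabs/installer"}},
        "optional": {"Issuer": EXPECTED_ISSUER, "Subject": "release@siderolabs.com"},
    }])
    print(signed_by_sidero_labs(sample))
```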
Reproducible Builds
Talos builds for the kernel, initramfs, talosctl, ISO image, and container images are reproducible.
This means you can verify that your own build is identical to the one provided on the GitHub releases page.
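Verifying a reproducible build comes down to comparing the digests of your artifact and the released one. A minimal sketch using SHA-256 from Python's hashlib. The file names in the demo are hypothetical, and the files are created in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(local: Path, released: Path) -> bool:
    """True if a locally built artifact is bit-for-bit identical to a release."""
    return sha256sum(local) == sha256sum(released)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "talosctl-local"       # hypothetical local build
        released = Path(tmp) / "talosctl-release"  # hypothetical download
        local.write_bytes(b"same bytes")
        released.write_bytes(b"same bytes")
        print(same_artifact(local, released))
```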
Using hardware watchdogs to work around hardware/software lockups.
Talos Linux now supports configuring hardware watchdog timers.
Hardware watchdog timers allow the system to be reset (rebooted) if the software stack becomes unresponsive.
Please consult your hardware/VM documentation for the availability of the hardware watchdog timers.
Configuration
To discover the available watchdog devices, run:
$ talosctl ls /sys/class/watchdog/
NODE NAME
172.20.0.2 .
172.20.0.2 watchdog0
172.20.0.2 watchdog1
The implementation of the watchdog device can be queried with:
Talos Linux will set up the watchdog timer with a 5-minute timeout and will keep resetting the timer to prevent the system from rebooting.
If the software becomes unresponsive, the watchdog timer will expire, and the system will be reset by the watchdog hardware.
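The behavior described above can be modeled in a few lines of Python. This is a conceptual model only, not Talos's implementation. The duration parser handles Go-style strings such as 5m0s, limited to hours, minutes, and seconds. That is the format talosctl uses for the TIMEOUT column.

```python
import re

_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Parse Go-style durations such as '5m0s' into seconds (h, m, s only)."""
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _SECONDS[u] for n, u in parts)


class WatchdogModel:
    """Conceptual model: the timer expires unless kicked within the timeout."""

    def __init__(self, timeout_s: int, start: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_kick = start

    def kick(self, now: float) -> None:
        """What a healthy OS does periodically: restart the countdown."""
        self.last_kick = now

    def expired(self, now: float) -> bool:
        """True once the hardware would reset (reboot) the system."""
        return now - self.last_kick >= self.timeout_s


if __name__ == "__main__":
    wd = WatchdogModel(parse_duration("5m0s"))
    wd.kick(now=100)
    print(wd.expired(now=350), wd.expired(now=400))  # False True
```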
Inspection
To inspect the watchdog timer configuration, run:
$ talosctl get watchdogtimerconfig
NODE NAMESPACE TYPE ID VERSION DEVICE TIMEOUT
172.20.0.2 runtime WatchdogTimerConfig timer 1 /dev/watchdog0 5m0s
To inspect the watchdog timer status, run:
$ talosctl get watchdogtimerstatus
NODE NAMESPACE TYPE ID VERSION DEVICE TIMEOUT
172.20.0.2 runtime WatchdogTimerStatus timer 1 /dev/watchdog0 5m0s
Current status of the watchdog timer can also be inspected via Linux sysfs:
$ talosctl read /sys/class/watchdog/watchdog0/state
active
CriImageCacheCopyStatus describes image cache copy status type.
| Name | Number | Description |
| --- | --- | --- |
| IMAGE_CACHE_COPY_STATUS_UNKNOWN | 0 | |
| IMAGE_CACHE_COPY_STATUS_SKIPPED | 1 | |
| IMAGE_CACHE_COPY_STATUS_PENDING | 2 | |
| IMAGE_CACHE_COPY_STATUS_READY | 3 | |
CriImageCacheStatus
CriImageCacheStatus describes image cache status type.
| Name | Number | Description |
| --- | --- | --- |
| IMAGE_CACHE_STATUS_UNKNOWN | 0 | |
| IMAGE_CACHE_STATUS_DISABLED | 1 | |
| IMAGE_CACHE_STATUS_PREPARING | 2 | |
| IMAGE_CACHE_STATUS_READY | 3 | |
KubespanPeerState
KubespanPeerState is KubeSpan peer current state.
| Name | Number | Description |
| --- | --- | --- |
| PEER_STATE_UNKNOWN | 0 | |
| PEER_STATE_UP | 1 | |
| PEER_STATE_DOWN | 2 | |
MachineType
MachineType represents a machine type.
| Name | Number | Description |
| --- | --- | --- |
| TYPE_UNKNOWN | 0 | TypeUnknown represents an undefined node type, used when there is no machine configuration yet. |
| TYPE_INIT | 1 | TypeInit designates the first control plane node to come up. You can think of it like a bootstrap node. This node will perform the initial steps to bootstrap the cluster: generation of TLS assets, starting of the control plane, etc. |
| TYPE_CONTROL_PLANE | 2 | TypeControlPlane designates the node as a control plane member. This means it will host etcd along with the Kubernetes control plane components such as the API Server, Controller Manager, and Scheduler. |
| TYPE_WORKER | 3 | TypeWorker designates the node as a worker node. This means it will be an available compute node for scheduling workloads. |
NethelpersADSelect
NethelpersADSelect is ADSelect.
| Name | Number | Description |
| --- | --- | --- |
| AD_SELECT_STABLE | 0 | |
| AD_SELECT_BANDWIDTH | 1 | |
| AD_SELECT_COUNT | 2 | |
NethelpersARPAllTargets
NethelpersARPAllTargets is an ARP targets mode.
| Name | Number | Description |
| --- | --- | --- |
| ARP_ALL_TARGETS_ANY | 0 | |
| ARP_ALL_TARGETS_ALL | 1 | |
NethelpersARPValidate
NethelpersARPValidate is an ARP Validation mode.
| Name | Number | Description |
| --- | --- | --- |
| ARP_VALIDATE_NONE | 0 | |
| ARP_VALIDATE_ACTIVE | 1 | |
| ARP_VALIDATE_BACKUP | 2 | |
| ARP_VALIDATE_ALL | 3 | |
NethelpersAddressFlag
NethelpersAddressFlag wraps IFA_F_* constants.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_ADDRESSFLAG_UNSPECIFIED | 0 | |
| ADDRESS_TEMPORARY | 1 | |
| ADDRESS_NO_DAD | 2 | |
| ADDRESS_OPTIMISTIC | 4 | |
| ADDRESS_DAD_FAILED | 8 | |
| ADDRESS_HOME | 16 | |
| ADDRESS_DEPRECATED | 32 | |
| ADDRESS_TENTATIVE | 64 | |
| ADDRESS_PERMANENT | 128 | |
| ADDRESS_MANAGEMENT_TEMP | 256 | |
| ADDRESS_NO_PREFIX_ROUTE | 512 | |
| ADDRESS_MC_AUTO_JOIN | 1024 | |
| ADDRESS_STABLE_PRIVACY | 2048 | |
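Because the non-zero NethelpersAddressFlag values are distinct powers of two, several flags can be stored in one integer. A small Python sketch that encodes and decodes such combined values, using the numbers from the table above. Treating 0 as "no flags" (the UNSPECIFIED entry) is this sketch's assumption.

```python
from typing import List

ADDRESS_FLAGS = {
    "ADDRESS_TEMPORARY": 1,
    "ADDRESS_NO_DAD": 2,
    "ADDRESS_OPTIMISTIC": 4,
    "ADDRESS_DAD_FAILED": 8,
    "ADDRESS_HOME": 16,
    "ADDRESS_DEPRECATED": 32,
    "ADDRESS_TENTATIVE": 64,
    "ADDRESS_PERMANENT": 128,
    "ADDRESS_MANAGEMENT_TEMP": 256,
    "ADDRESS_NO_PREFIX_ROUTE": 512,
    "ADDRESS_MC_AUTO_JOIN": 1024,
    "ADDRESS_STABLE_PRIVACY": 2048,
}
KNOWN_BITS = sum(ADDRESS_FLAGS.values())


def decode_address_flags(value: int) -> List[str]:
    """Names of all flags set in value, in ascending bit order."""
    if value < 0:
        raise ValueError("flags must be non-negative")
    unknown = value & ~KNOWN_BITS
    if unknown:
        raise ValueError(f"unknown address flag bits: {unknown:#x}")
    return [name for name, bit in ADDRESS_FLAGS.items() if value & bit]


def encode_address_flags(names: List[str]) -> int:
    """Combine flag names into a single integer with bitwise OR."""
    value = 0
    for name in names:
        value |= ADDRESS_FLAGS[name]
    return value
```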
NethelpersAddressSortAlgorithm
NethelpersAddressSortAlgorithm is an internal address sorting algorithm.
| Name | Number | Description |
| --- | --- | --- |
| ADDRESS_SORT_ALGORITHM_V1 | 0 | |
| ADDRESS_SORT_ALGORITHM_V2 | 1 | |
NethelpersBondMode
NethelpersBondMode is a bond mode.
| Name | Number | Description |
| --- | --- | --- |
| BOND_MODE_ROUNDROBIN | 0 | |
| BOND_MODE_ACTIVE_BACKUP | 1 | |
| BOND_MODE_XOR | 2 | |
| BOND_MODE_BROADCAST | 3 | |
| BOND_MODE8023_AD | 4 | |
| BOND_MODE_TLB | 5 | |
| BOND_MODE_ALB | 6 | |
NethelpersBondXmitHashPolicy
NethelpersBondXmitHashPolicy is a bond hash policy.
| Name | Number | Description |
| --- | --- | --- |
| BOND_XMIT_POLICY_LAYER2 | 0 | |
| BOND_XMIT_POLICY_LAYER34 | 1 | |
| BOND_XMIT_POLICY_LAYER23 | 2 | |
| BOND_XMIT_POLICY_ENCAP23 | 3 | |
| BOND_XMIT_POLICY_ENCAP34 | 4 | |
NethelpersConntrackState
NethelpersConntrackState is a conntrack state.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_CONNTRACKSTATE_UNSPECIFIED | 0 | |
| CONNTRACK_STATE_NEW | 8 | |
| CONNTRACK_STATE_RELATED | 4 | |
| CONNTRACK_STATE_ESTABLISHED | 2 | |
| CONNTRACK_STATE_INVALID | 1 | |
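The conntrack state values are also single bits. This is what allows several states to be matched at once, since nftables matches ct state against a bitmask. A minimal Python sketch of combining states, using the numbers from the table above. The helper names are hypothetical.

```python
from typing import Iterable

CONNTRACK_STATES = {
    "CONNTRACK_STATE_INVALID": 1,
    "CONNTRACK_STATE_ESTABLISHED": 2,
    "CONNTRACK_STATE_RELATED": 4,
    "CONNTRACK_STATE_NEW": 8,
}


def state_mask(states: Iterable[str]) -> int:
    """OR the selected states into one bitmask."""
    mask = 0
    for state in states:
        mask |= CONNTRACK_STATES[state]
    return mask


def matches(mask: int, state: str) -> bool:
    """True if a packet in the given conntrack state would match the mask."""
    return bool(mask & CONNTRACK_STATES[state])
```

For example, matching established and related traffic gives the mask 2 | 4 = 6.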
NethelpersDuplex
NethelpersDuplex wraps ethtool.Duplex for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| HALF | 0 | |
| FULL | 1 | |
| UNKNOWN | 255 | |
NethelpersFailOverMAC
NethelpersFailOverMAC is a MAC failover mode.
| Name | Number | Description |
| --- | --- | --- |
| FAIL_OVER_MAC_NONE | 0 | |
| FAIL_OVER_MAC_ACTIVE | 1 | |
| FAIL_OVER_MAC_FOLLOW | 2 | |
NethelpersFamily
NethelpersFamily is a network family.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_FAMILY_UNSPECIFIED | 0 | |
| FAMILY_INET4 | 2 | |
| FAMILY_INET6 | 10 | |
NethelpersLACPRate
NethelpersLACPRate is a LACP rate.
| Name | Number | Description |
| --- | --- | --- |
| LACP_RATE_SLOW | 0 | |
| LACP_RATE_FAST | 1 | |
NethelpersLinkType
NethelpersLinkType is a link type.
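| Name | Number | Description |
| --- | --- | --- |
| LINK_NETROM | 0 | |
| LINK_ETHER | 1 | |
| LINK_EETHER | 2 | |
| LINK_AX25 | 3 | |
| LINK_PRONET | 4 | |
| LINK_CHAOS | 5 | |
| LINK_IEE802 | 6 | |
| LINK_ARCNET | 7 | |
| LINK_ATALK | 8 | |
| LINK_DLCI | 15 | |
| LINK_ATM | 19 | |
| LINK_METRICOM | 23 | |
| LINK_IEEE1394 | 24 | |
| LINK_EUI64 | 27 | |
| LINK_INFINIBAND | 32 | |
| LINK_SLIP | 256 | |
| LINK_CSLIP | 257 | |
| LINK_SLIP6 | 258 | |
| LINK_CSLIP6 | 259 | |
| LINK_RSRVD | 260 | |
| LINK_ADAPT | 264 | |
| LINK_ROSE | 270 | |
| LINK_X25 | 271 | |
| LINK_HWX25 | 272 | |
| LINK_CAN | 280 | |
| LINK_PPP | 512 | |
| LINK_CISCO | 513 | |
| LINK_HDLC | 513 | |
| LINK_LAPB | 516 | |
| LINK_DDCMP | 517 | |
| LINK_RAWHDLC | 518 | |
| LINK_TUNNEL | 768 | |
| LINK_TUNNEL6 | 769 | |
| LINK_FRAD | 770 | |
| LINK_SKIP | 771 | |
| LINK_LOOPBCK | 772 | |
| LINK_LOCALTLK | 773 | |
| LINK_FDDI | 774 | |
| LINK_BIF | 775 | |
| LINK_SIT | 776 | |
| LINK_IPDDP | 777 | |
| LINK_IPGRE | 778 | |
| LINK_PIMREG | 779 | |
| LINK_HIPPI | 780 | |
| LINK_ASH | 781 | |
| LINK_ECONET | 782 | |
| LINK_IRDA | 783 | |
| LINK_FCPP | 784 | |
| LINK_FCAL | 785 | |
| LINK_FCPL | 786 | |
| LINK_FCFABRIC | 787 | |
| LINK_FCFABRIC1 | 788 | |
| LINK_FCFABRIC2 | 789 | |
| LINK_FCFABRIC3 | 790 | |
| LINK_FCFABRIC4 | 791 | |
| LINK_FCFABRIC5 | 792 | |
| LINK_FCFABRIC6 | 793 | |
| LINK_FCFABRIC7 | 794 | |
| LINK_FCFABRIC8 | 795 | |
| LINK_FCFABRIC9 | 796 | |
| LINK_FCFABRIC10 | 797 | |
| LINK_FCFABRIC11 | 798 | |
| LINK_FCFABRIC12 | 799 | |
| LINK_IEE802TR | 800 | |
| LINK_IEE80211 | 801 | |
| LINK_IEE80211PRISM | 802 | |
| LINK_IEE80211_RADIOTAP | 803 | |
| LINK_IEE8021154 | 804 | |
| LINK_IEE8021154MONITOR | 805 | |
| LINK_PHONET | 820 | |
| LINK_PHONETPIPE | 821 | |
| LINK_CAIF | 822 | |
| LINK_IP6GRE | 823 | |
| LINK_NETLINK | 824 | |
| LINK6_LOWPAN | 825 | |
| LINK_VOID | 65535 | |
| LINK_NONE | 65534 | |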
NethelpersMatchOperator
NethelpersMatchOperator is a netfilter match operator.
| Name | Number | Description |
| --- | --- | --- |
| OPERATOR_EQUAL | 0 | |
| OPERATOR_NOT_EQUAL | 1 | |
NethelpersNfTablesChainHook
NethelpersNfTablesChainHook wraps nftables.ChainHook for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| CHAIN_HOOK_PREROUTING | 0 | |
| CHAIN_HOOK_INPUT | 1 | |
| CHAIN_HOOK_FORWARD | 2 | |
| CHAIN_HOOK_OUTPUT | 3 | |
| CHAIN_HOOK_POSTROUTING | 4 | |
NethelpersNfTablesChainPriority
NethelpersNfTablesChainPriority wraps nftables.ChainPriority for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_NFTABLESCHAINPRIORITY_UNSPECIFIED | 0 | |
| CHAIN_PRIORITY_FIRST | -2147483648 | |
| CHAIN_PRIORITY_CONNTRACK_DEFRAG | -400 | |
| CHAIN_PRIORITY_RAW | -300 | |
| CHAIN_PRIORITY_SE_LINUX_FIRST | -225 | |
| CHAIN_PRIORITY_CONNTRACK | -200 | |
| CHAIN_PRIORITY_MANGLE | -150 | |
| CHAIN_PRIORITY_NAT_DEST | -100 | |
| CHAIN_PRIORITY_FILTER | 0 | |
| CHAIN_PRIORITY_SECURITY | 50 | |
| CHAIN_PRIORITY_NAT_SOURCE | 100 | |
| CHAIN_PRIORITY_SE_LINUX_LAST | 225 | |
| CHAIN_PRIORITY_CONNTRACK_HELPER | 300 | |
| CHAIN_PRIORITY_LAST | 2147483647 | |
NethelpersNfTablesVerdict
NethelpersNfTablesVerdict wraps nftables.Verdict for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| VERDICT_DROP | 0 | |
| VERDICT_ACCEPT | 1 | |
NethelpersOperationalState
NethelpersOperationalState wraps rtnetlink.OperationalState for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| OPER_STATE_UNKNOWN | 0 | |
| OPER_STATE_NOT_PRESENT | 1 | |
| OPER_STATE_DOWN | 2 | |
| OPER_STATE_LOWER_LAYER_DOWN | 3 | |
| OPER_STATE_TESTING | 4 | |
| OPER_STATE_DORMANT | 5 | |
| OPER_STATE_UP | 6 | |
NethelpersPort
NethelpersPort wraps ethtool.Port for YAML marshaling.
| Name | Number | Description |
| --- | --- | --- |
| TWISTED_PAIR | 0 | |
| AUI | 1 | |
| MII | 2 | |
| FIBRE | 3 | |
| BNC | 4 | |
| DIRECT_ATTACH | 5 | |
| NONE | 239 | |
| OTHER | 255 | |
NethelpersPrimaryReselect
NethelpersPrimaryReselect is a bond primary slave reselection policy.
| Name | Number | Description |
| --- | --- | --- |
| PRIMARY_RESELECT_ALWAYS | 0 | |
| PRIMARY_RESELECT_BETTER | 1 | |
| PRIMARY_RESELECT_FAILURE | 2 | |
NethelpersProtocol
NethelpersProtocol is an inet protocol.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_PROTOCOL_UNSPECIFIED | 0 | |
| PROTOCOL_ICMP | 1 | |
| PROTOCOL_TCP | 6 | |
| PROTOCOL_UDP | 17 | |
| PROTOCOL_ICM_PV6 | 58 | |
NethelpersRouteFlag
NethelpersRouteFlag wraps RTM_F_* constants.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_ROUTEFLAG_UNSPECIFIED | 0 | |
| ROUTE_NOTIFY | 256 | |
| ROUTE_CLONED | 512 | |
| ROUTE_EQUALIZE | 1024 | |
| ROUTE_PREFIX | 2048 | |
| ROUTE_LOOKUP_TABLE | 4096 | |
| ROUTE_FIB_MATCH | 8192 | |
| ROUTE_OFFLOAD | 16384 | |
| ROUTE_TRAP | 32768 | |
NethelpersRouteProtocol
NethelpersRouteProtocol is a routing protocol.
| Name | Number | Description |
| --- | --- | --- |
| PROTOCOL_UNSPEC | 0 | |
| PROTOCOL_REDIRECT | 1 | |
| PROTOCOL_KERNEL | 2 | |
| PROTOCOL_BOOT | 3 | |
| PROTOCOL_STATIC | 4 | |
| PROTOCOL_RA | 9 | |
| PROTOCOL_MRT | 10 | |
| PROTOCOL_ZEBRA | 11 | |
| PROTOCOL_BIRD | 12 | |
| PROTOCOL_DNROUTED | 13 | |
| PROTOCOL_XORP | 14 | |
| PROTOCOL_NTK | 15 | |
| PROTOCOL_DHCP | 16 | |
| PROTOCOL_MRTD | 17 | |
| PROTOCOL_KEEPALIVED | 18 | |
| PROTOCOL_BABEL | 42 | |
| PROTOCOL_OPENR | 99 | |
| PROTOCOL_BGP | 186 | |
| PROTOCOL_ISIS | 187 | |
| PROTOCOL_OSPF | 188 | |
| PROTOCOL_RIP | 189 | |
| PROTOCOL_EIGRP | 192 | |
NethelpersRouteType
NethelpersRouteType is a route type.
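| Name | Number | Description |
| --- | --- | --- |
| TYPE_UNSPEC | 0 | |
| TYPE_UNICAST | 1 | |
| TYPE_LOCAL | 2 | |
| TYPE_BROADCAST | 3 | |
| TYPE_ANYCAST | 4 | |
| TYPE_MULTICAST | 5 | |
| TYPE_BLACKHOLE | 6 | |
| TYPE_UNREACHABLE | 7 | |
| TYPE_PROHIBIT | 8 | |
| TYPE_THROW | 9 | |
| TYPE_NAT | 10 | |
| TYPE_X_RESOLVE | 11 | |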
NethelpersRoutingTable
NethelpersRoutingTable is a routing table ID.
| Name | Number | Description |
| --- | --- | --- |
| TABLE_UNSPEC | 0 | |
| TABLE_DEFAULT | 253 | |
| TABLE_MAIN | 254 | |
| TABLE_LOCAL | 255 | |
NethelpersScope
NethelpersScope is an address scope.
| Name | Number | Description |
| --- | --- | --- |
| SCOPE_GLOBAL | 0 | |
| SCOPE_SITE | 200 | |
| SCOPE_LINK | 253 | |
| SCOPE_HOST | 254 | |
| SCOPE_NOWHERE | 255 | |
NethelpersVLANProtocol
NethelpersVLANProtocol is a VLAN protocol.
| Name | Number | Description |
| --- | --- | --- |
| NETHELPERS_VLANPROTOCOL_UNSPECIFIED | 0 | |
| VLAN_PROTOCOL8021_Q | 33024 | |
| VLAN_PROTOCOL8021_AD | 34984 | |
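The NethelpersVLANProtocol numbers are the VLAN EtherType (TPID) values written in decimal: 33024 is 0x8100 (802.1Q) and 34984 is 0x88A8 (802.1ad). A short Python sketch that prints the hexadecimal form and the two bytes that appear in a tagged frame header, in network byte order.

```python
import struct

VLAN_PROTOCOLS = {
    "VLAN_PROTOCOL8021_Q": 33024,
    "VLAN_PROTOCOL8021_AD": 34984,
}


def tpid_hex(name: str) -> str:
    """Hexadecimal EtherType, as it is usually written in specifications."""
    return f"0x{VLAN_PROTOCOLS[name]:04X}"


def tpid_bytes(name: str) -> bytes:
    """The two TPID bytes as they appear on the wire (big-endian)."""
    return struct.pack("!H", VLAN_PROTOCOLS[name])


if __name__ == "__main__":
    for proto in VLAN_PROTOCOLS:
        print(proto, tpid_hex(proto), tpid_bytes(proto).hex())
```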
NetworkConfigLayer
NetworkConfigLayer describes network configuration layers, with lowest priority first.
system_partitions_to_wipe lists specific system disk partitions to be reset (wiped). If system_partitions_to_wipe is empty, all of the partitions are erased.
The Bootstrap method makes a control plane node enter etcd bootstrap mode. The node aborts the etcd join sequence and creates a single-node etcd cluster. If the recover_etcd argument is specified, etcd is recovered from a snapshot uploaded with EtcdRecover.
EtcdRemoveMemberByID removes a member from the etcd cluster identified by member ID. This API should be used to remove members which don’t have an associated Talos node anymore. To remove a member with a running Talos node, use the EtcdLeaveCluster API on the node to be removed.
The EtcdRecover method uploads an etcd data snapshot created with EtcdSnapshot to the node. The snapshot can later be used to recover the cluster via the Bootstrap method.
The EtcdSnapshot method creates an etcd data snapshot (backup) from the local etcd instance and streams it back to the client. This method is available only on control plane nodes (which run etcd).
EtcdDefragment defragments the etcd data directory for the current node. Defragmentation is a resource-heavy operation, so it should only be run on a specific node. This method is available only on control plane nodes (which run etcd).
BlockDeviceWipeDescriptor represents a single block device to be wiped.
The device can be either a full disk (e.g. vda) or a partition (vda5).
The device should not be used in any of active volumes.
The device should not be used as a secondary (e.g. part of LVM).
The name should be submitted without the /dev/ prefix.
| Field | Type | Label | Description |
| --- | --- | --- | --- |
| method | BlockDeviceWipeDescriptor.Method | | Wipe method to use. |
| skip_volume_check | bool | | Skip the volume in use check. |
BlockDeviceWipe performs a wipe of the block device (partition or disk).
The method doesn’t require a reboot, and it can only wipe block devices which are not currently being used as volumes. Wiping of volumes requires a different API.
--cert-fingerprint strings list of server certificate fingerprints to accept (defaults to no check)
-p, --config-patch stringArray the list of config patches to apply to the local config file before sending it to the node
--dry-run check how the config change will be applied in dry-run mode
-f, --file string the filename of the updated configuration
-h, --help help for apply-config
-i, --insecure apply the config using the insecure (encrypted with no auth) maintenance service
-m, --mode auto, interactive, no-reboot, reboot, staged, try apply config mode (default auto)
--timeout duration the config will be rolled back after specified timeout (if try mode is selected) (default 1m0s)
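When scripting config rollouts, the flags above can be assembled programmatically. A minimal Python sketch that builds the argument list for talosctl apply-config using only the --nodes, --file, --mode, --dry-run, and --timeout flags documented here. Rejecting --timeout outside try mode is this sketch's own design choice, based on the flag description. The node address and file name are hypothetical.

```python
from typing import List, Optional

MODES = ("auto", "interactive", "no-reboot", "reboot", "staged", "try")


def apply_config_argv(
    node: str,
    config_file: str,
    mode: str = "auto",
    dry_run: bool = False,
    timeout: Optional[str] = None,
) -> List[str]:
    """Build a talosctl apply-config command line as an argument list."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    argv = ["talosctl", "apply-config", "--nodes", node, "--file", config_file, "--mode", mode]
    if dry_run:
        argv.append("--dry-run")
    if timeout is not None:
        # The rollback timeout only has an effect in try mode.
        if mode != "try":
            raise ValueError("--timeout only applies to --mode=try")
        argv += ["--timeout", timeout]
    return argv


if __name__ == "__main__":
    argv = apply_config_argv("172.20.0.2", "controlplane.yaml", mode="try", timeout="2m")
    print(" ".join(argv))
    # To run it (requires talosctl on PATH): subprocess.run(argv, check=True)
```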
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl bootstrap
Bootstrap the etcd cluster on the specified node.
Synopsis
When a Talos cluster is created, the etcd service on control plane nodes enters a join loop, waiting
to join etcd peers from other control plane nodes. One node should be picked as the bootstrap node.
When the bootstrap command is issued, the node aborts the join process and bootstraps etcd as a single-node cluster.
Other control plane nodes will join the etcd cluster once Kubernetes is bootstrapped on the bootstrap node.
This command should not be used when “init” type nodes are used.
A Talos etcd cluster can be recovered from a known snapshot with the ‘--recover-from=’ flag.
talosctl bootstrap [flags]
Options
-h, --help help for bootstrap
--recover-from string recover etcd cluster from the snapshot
--recover-skip-hash-check skip integrity check when recovering etcd (use when recovering from data directory copy)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl cgroups
Retrieve cgroups usage information
Synopsis
The cgroups command fetches control group v2 (cgroupv2) usage details from the machine.
Several presets are available to focus on specific cgroup subsystems:
cpu
cpuset
io
memory
process
swap
You can specify the preset using the --preset flag.
-h, --help help for cgroups
--preset string preset name (one of: [cpu cpuset io memory process swap])
--schema-file string path to the columns schema file
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl cluster create
Creates a local docker-based or QEMU-based kubernetes cluster
talosctl cluster create [flags]
Options
--arch string cluster architecture (default "amd64")
--bad-rtc launch VM with bad RTC state (QEMU only)
--cidr string CIDR of the cluster network (IPv4, ULA network for IPv6 is derived in automated way) (default "10.5.0.0/24")
--cni-bin-path strings search path for CNI binaries (VM only) (default [/home/user/.talos/cni/bin])
--cni-bundle-url string URL to download CNI bundle from (VM only) (default "https://github.com/siderolabs/talos/releases/download/v1.9.0-alpha.3/talosctl-cni-bundle-${ARCH}.tar.gz")
--cni-cache-dir string CNI cache directory path (VM only) (default "/home/user/.talos/cni/cache")
--cni-conf-dir string CNI config directory path (VM only) (default "/home/user/.talos/cni/conf.d")
--config-injection-method string a method to inject machine config: default is HTTP server, 'metal-iso' to mount an ISO (QEMU only)
--config-patch stringArray patch generated machineconfigs (applied to all node types), use @file to read a patch from file
--config-patch-control-plane stringArray patch generated machineconfigs (applied to 'init' and 'controlplane' types)
--config-patch-worker stringArray patch generated machineconfigs (applied to 'worker' type)
--control-plane-port int control plane port (load balancer and local API port, QEMU only) (default 6443)
--controlplanes int the number of controlplanes to create (default 1)
--cpus string the share of CPUs as fraction (each control plane/VM) (default "2.0")
--cpus-workers string the share of CPUs as fraction (each worker/VM) (default "2.0")
--custom-cni-url string install custom CNI from the URL (Talos cluster)
--disable-dhcp-hostname skip announcing hostname via DHCP (QEMU only)
--disk int default limit on disk size in MB (each VM) (default 6144)
--disk-encryption-key-types stringArray encryption key types to use for disk encryption (uuid, kms) (default [uuid])
--disk-image-path string disk image to use
--disk-preallocate whether disk space should be preallocated (default true)
--dns-domain string the dns domain to use for cluster (default "cluster.local")
--docker-disable-ipv6 skip enabling IPv6 in containers (Docker only)
--docker-host-ip string Host IP to forward exposed ports to (Docker provisioner only) (default "0.0.0.0")
--encrypt-ephemeral enable ephemeral partition encryption
--encrypt-state enable state partition encryption
--endpoint string use endpoint instead of provider defaults
-p, --exposed-ports string Comma-separated list of ports/protocols to expose on init node. Ex -p <hostPort>:<containerPort>/<protocol (tcp or udp)> (Docker provisioner only)
--extra-boot-kernel-args string add extra kernel args to the initial boot from vmlinuz and initramfs (QEMU only)
--extra-disks int number of extra disks to create for each worker VM
--extra-disks-drivers strings driver for each extra disk (virtio, ide, ahci, scsi, nvme)
--extra-disks-size int size of each extra disk in MB (default 5120)
--extra-uefi-search-paths strings additional search paths for UEFI firmware (only applies when UEFI is enabled)
-h, --help help for create
--image string the image to use (default "ghcr.io/siderolabs/talos:latest")
--init-node-as-endpoint use init node as endpoint instead of any load balancer endpoint
--initrd-path string initramfs image to use (default "_out/initramfs-${ARCH}.xz")
-i, --input-dir string location of pre-generated config files
--install-image string the installer image to use (default "ghcr.io/siderolabs/installer:latest")
--ipv4 enable IPv4 network in the cluster (default true)
--ipv6 enable IPv6 network in the cluster (QEMU provisioner only)
--ipxe-boot-script string iPXE boot script (URL) to use
--iso-path string the ISO path to use for the initial boot (VM only)
--kubeprism-port int KubePrism port (set to 0 to disable) (default 7445)
--kubernetes-version string desired kubernetes version to run (default "1.32.0")
--memory int the limit on memory usage in MB (each control plane/VM) (default 2048)
--memory-workers int the limit on memory usage in MB (each worker/VM) (default 2048)
--mount mount attach a mount to the container (Docker only)
--mtu int MTU of the cluster network (default 1500)
--nameservers strings list of nameservers to use (default [8.8.8.8,1.1.1.1,2001:4860:4860::8888,2606:4700:4700::1111])
--no-masquerade-cidrs strings list of CIDRs to exclude from NAT (QEMU provisioner only)
--registry-insecure-skip-verify strings list of registry hostnames to skip TLS verification for
--registry-mirror strings list of registry mirrors to use in format: <registry host>=<mirror URL>
--skip-injecting-config skip injecting config from embedded metadata server, write config files to current directory
--skip-k8s-node-readiness-check skip k8s node readiness checks
--skip-kubeconfig skip merging kubeconfig from the created cluster
--talos-version string the desired Talos version to generate config for (if not set, defaults to image version)
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
--usb-path string the USB stick image path to use for the initial boot (VM only)
--use-vip use a virtual IP for the controlplane endpoint instead of the loadbalancer
--user-disk strings list of disks to create for each VM in format: <mount_point1>:<size1>:<mount_point2>:<size2>
--vmlinuz-path string the compressed kernel image to use (default "_out/vmlinuz-${ARCH}")
--wait wait for the cluster to be ready before returning (default true)
--wait-timeout duration timeout to wait for the cluster to be ready (default 20m0s)
--wireguard-cidr string CIDR of the wireguard network
--with-apply-config enable apply config when the VM is starting in maintenance mode
--with-bootloader enable bootloader to load kernel and initramfs from disk image after install (default true)
--with-cluster-discovery enable cluster discovery (default true)
--with-debug enable debug in Talos config to send service logs to the console
--with-firewall string inject firewall rules into the cluster, value is default policy - accept/block (QEMU only)
--with-init-node create the cluster with an init node
--with-json-logs enable JSON logs receiver and configure Talos to send logs there
--with-kubespan enable KubeSpan system
--with-network-bandwidth int specify bandwidth restriction (in kbps) on the bridge interface when creating a qemu cluster
--with-network-chaos enable to use network chaos parameters when creating a qemu cluster
--with-network-jitter duration specify jitter on the bridge interface when creating a qemu cluster
--with-network-latency duration specify latency on the bridge interface when creating a qemu cluster
--with-network-packet-corrupt float specify percent of corrupt packets on the bridge interface when creating a qemu cluster. e.g. 50% = 0.50 (default: 0.0)
--with-network-packet-loss float specify percent of packet loss on the bridge interface when creating a qemu cluster. e.g. 50% = 0.50 (default: 0.0)
--with-network-packet-reorder float specify percent of reordered packets on the bridge interface when creating a qemu cluster. e.g. 50% = 0.50 (default: 0.0)
--with-siderolink true enables the use of siderolink agent as configuration apply mechanism. true or `wireguard` enables the agent, `tunnel` enables the agent with grpc tunneling (default none)
--with-tpm2 enable TPM2 emulation support using swtpm
--with-uefi enable UEFI on x86_64 architecture (default true)
--with-uuid-hostnames use machine UUIDs as default hostnames (QEMU only)
--workers int the number of workers to create (default 1)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or QEMU-based clusters
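The `-p`/`--exposed-ports` and `--registry-mirror` values must follow the formats listed above. As a minimal sketch, assuming a POSIX shell and the default Docker provisioner, a wrapper can check both formats before calling `talosctl cluster create`. The function names are hypothetical, `TALOSCTL` is a hypothetical override (set it to `echo` to print the command instead of running it), and requiring an `http://` or `https://` mirror URL is this sketch's own assumption.

```shell
# Hypothetical wrapper around `talosctl cluster create`; not part of talosctl.
# Set TALOSCTL=echo to print the command instead of running it.
: "${TALOSCTL:=talosctl}"

# Succeeds if $1 looks like <hostPort>:<containerPort>/<protocol (tcp or udp)>.
valid_port_spec() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+:[0-9]+/(tcp|udp)$'
}

# Succeeds if $1 looks like <registry host>=<mirror URL> (http/https URL assumed).
valid_mirror_spec() {
  printf '%s\n' "$1" | grep -Eq '^[^=]+=https?://[^ ]+$'
}

# create_cluster PORTS MIRROR
#   PORTS:  comma-separated port specs, passed to --exposed-ports
#   MIRROR: one <registry host>=<mirror URL> entry, passed to --registry-mirror
create_cluster() {
  ports=$1
  mirror=$2
  if [ -z "$ports" ]; then
    echo "no ports given" >&2
    return 1
  fi
  old_ifs=$IFS
  IFS=,
  # Unquoted $ports is split on commas so each entry is checked separately.
  for spec in $ports; do
    if ! valid_port_spec "$spec"; then
      IFS=$old_ifs
      echo "invalid port spec: $spec" >&2
      return 1
    fi
  done
  IFS=$old_ifs
  if ! valid_mirror_spec "$mirror"; then
    echo "invalid mirror spec: $mirror" >&2
    return 1
  fi
  "$TALOSCTL" cluster create --exposed-ports "$ports" --registry-mirror "$mirror"
}
```

Validating locally gives a clearer error than a failed provisioning run, and the whole comma-separated list is still passed to talosctl unchanged.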
talosctl cluster destroy
Destroys a local docker-based or QEMU-based Kubernetes cluster
talosctl cluster destroy [flags]
Options
-f, --force force deletion of cluster directory if there were errors
-h, --help help for destroy
--save-cluster-logs-archive-path string save cluster logs archive to the specified file on destroy
--save-support-archive-path string save support archive to the specified file on destroy
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or QEMU-based clusters
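Both archive flags take a file path, so a cluster can be torn down without losing its diagnostics. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical, `TALOSCTL` is a hypothetical override (set it to `echo` to preview), and the `.zip`/`.tar.gz` extensions are only an assumed naming convention for the two archive formats.

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# destroy_keeping_archives NAME DIR
# Destroys cluster NAME and writes its support and logs archives into DIR.
# The file names (and extensions) are an assumed naming convention.
destroy_keeping_archives() {
  name=$1
  dir=$2
  mkdir -p "$dir" || return 1
  "$TALOSCTL" cluster destroy --name "$name" \
    --save-support-archive-path "$dir/$name-support.zip" \
    --save-cluster-logs-archive-path "$dir/$name-logs.tar.gz"
}
```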
talosctl cluster show
Shows info about a locally provisioned Kubernetes cluster
talosctl cluster show [flags]
Options
-h, --help help for show
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or QEMU-based clusters
talosctl cluster
A collection of commands for managing local docker-based or QEMU-based clusters
Options
-h, --help help for cluster
--name string the name of the cluster (default "talos-default")
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl completion
Output shell completion code for the specified shell (bash, fish or zsh)
Synopsis
Output shell completion code for the specified shell (bash, fish or zsh). The shell code must be evaluated to provide interactive completion of talosctl commands. This can be done by sourcing it from the .bash_profile.
Note for zsh users: [1] zsh completions are only supported in versions of zsh >= 5.2
talosctl completion SHELL [flags]
Examples
# Installing bash completion on macOS using homebrew
## If running Bash 3.2 included with macOS
brew install bash-completion
## or, if running Bash 4.1+
brew install bash-completion@2
## If talosctl is installed via homebrew, this should start working immediately.
## If you've installed via other means, you may need to add the completion to your completion directory
talosctl completion bash > $(brew --prefix)/etc/bash_completion.d/talosctl
# Installing bash completion on Linux
## If bash-completion is not installed on Linux, please install the 'bash-completion' package
## via your distribution's package manager.
## Load the talosctl completion code for bash into the current shell
source <(talosctl completion bash)
## Write bash completion code to a file and source it from .bash_profile
talosctl completion bash > ~/.talos/completion.bash.inc
printf "
# talosctl shell completion
source '$HOME/.talos/completion.bash.inc'
" >> $HOME/.bash_profile
source $HOME/.bash_profile
# Load the talosctl completion code for fish[1] into the current shell
talosctl completion fish | source
# Set the talosctl completion code for fish[1] to autoload on startup
talosctl completion fish > ~/.config/fish/completions/talosctl.fish
# Load the talosctl completion code for zsh[1] into the current shell
source <(talosctl completion zsh)
# Set the talosctl completion code for zsh[1] to autoload on startup
talosctl completion zsh > "${fpath[1]}/_talosctl"
Options
-h, --help help for completion
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
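Since `talosctl completion` accepts only bash, fish or zsh, a script that sets up completion for whichever shell the user runs has to map the shell path first. A minimal sketch, assuming a POSIX shell and that `$SHELL` holds the login shell's path; the helper names are hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helpers; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# completion_shell PATH: print the shell name for PATH (e.g. /bin/zsh -> zsh)
# if `talosctl completion` supports it (bash, fish or zsh); fail otherwise.
completion_shell() {
  name=$(basename "$1")
  case "$name" in
    bash|fish|zsh) echo "$name" ;;
    *) echo "unsupported shell: $name" >&2; return 1 ;;
  esac
}

# print_completion [PATH]: print completion code for PATH, defaulting to $SHELL.
print_completion() {
  shell_name=$(completion_shell "${1:-$SHELL}") || return 1
  "$TALOSCTL" completion "$shell_name"
}
```

In bash, `source <(print_completion)` would then load the code into the current shell, mirroring the examples above.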
talosctl config add
Add a new context
talosctl config add <context> [flags]
Options
--ca string the path to the CA certificate
--crt string the path to the certificate
-h, --help help for add
--key string the path to the key
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config context
Set the current context
talosctl config context <context> [flags]
Options
-h, --help help for context
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config contexts
List defined contexts
talosctl config contexts [flags]
Options
-h, --help help for contexts
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config endpoint
Set the endpoint(s) for the current context
talosctl config endpoint <endpoint>... [flags]
Options
-h, --help help for endpoint
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config info
Show information about the current context
talosctl config info [flags]
Options
-h, --help help for info
-o, --output string output format (json|yaml|text) (default "text")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config merge
Merge additional contexts from another client configuration file
Synopsis
Contexts with the same name are renamed while merging configs.
talosctl config merge <from> [flags]
Options
-h, --help help for merge
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
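Because contexts with the same name are renamed during a merge, it helps to list the contexts right afterwards. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# merge_talosconfig FILE: merge the contexts from FILE into the current
# talosconfig, then list all contexts so renamed duplicates are easy to spot.
merge_talosconfig() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  "$TALOSCTL" config merge "$1" && "$TALOSCTL" config contexts
}
```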
talosctl config new
Generate a new client configuration file
talosctl config new [<path>] [flags]
Options
--crt-ttl duration certificate TTL (default 8760h0m0s)
-h, --help help for new
--roles strings roles (default [os:admin])
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
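The `--roles` and `--crt-ttl` flags make it possible to hand out a narrower, shorter-lived client configuration than the `os:admin`, one-year default. A minimal sketch, assuming a POSIX shell and that `os:reader` is an available Talos API role; the helper name and the 24h default are hypothetical choices, and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# new_reader_config PATH [TTL]: write a client configuration limited to the
# os:reader role (assumed role name), with a certificate valid for TTL
# (default 24h here, instead of talosctl's 8760h default).
new_reader_config() {
  "$TALOSCTL" config new "$1" --roles os:reader --crt-ttl "${2:-24h}"
}
```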
talosctl config node
Set the node(s) for the current context
talosctl config node <endpoint>... [flags]
Options
-h, --help help for node
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
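`config endpoint` and `config node` both act on the current context, so switching contexts has to come first. A minimal sketch of that order, assuming a POSIX shell; the helper name and its space-separated argument convention are hypothetical, and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# use_context CONTEXT "ENDPOINTS" "NODES"
# Switches to CONTEXT, then sets its endpoints and default nodes.
# ENDPOINTS and NODES are space-separated lists of addresses.
use_context() {
  "$TALOSCTL" config context "$1" || return 1
  # Intentional word splitting: each address becomes its own argument.
  # shellcheck disable=SC2086
  "$TALOSCTL" config endpoint $2 || return 1
  # shellcheck disable=SC2086
  "$TALOSCTL" config node $3
}
```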
talosctl config remove
Remove contexts
talosctl config remove <context> [flags]
Options
--dry-run dry run
-h, --help help for remove
-y, --noconfirm do not ask for confirmation
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl config - Manage the client configuration file (talosconfig)
talosctl config
Manage the client configuration file (talosconfig)
Options
-h, --help help for config
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl conformance kubernetes
talosctl conformance kubernetes [flags]
Options
-h, --help help for kubernetes
--mode string conformance test mode: [fast, certified] (default "fast")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl containers
talosctl containers [flags]
Options
-h, --help help for containers
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl copy
Copy data out from the node
Synopsis
Creates a .tar.gz archive at the node starting at <src-path> and streams it back to the client.
If ‘-’ is given for <local-path>, the archive is written to stdout.
Otherwise the archive is extracted to <local-path>, which should be an empty directory; talosctl creates the directory if it doesn’t exist. The command doesn’t preserve ownership and access mode for the files in extract mode, while the streamed .tar archive captures ownership and permission bits.
talosctl copy <src-path> -|<local-path> [flags]
Options
-h, --help help for copy
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
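When ownership and permission bits matter, the archive can be streamed with `-` and saved as-is instead of being extracted. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# fetch_archive NODE SRC FILE
# Passing "-" as the local path streams the .tar.gz archive to stdout, which
# is redirected into FILE. Unlike extract mode, the archive keeps ownership
# and permission bits.
fetch_archive() {
  "$TALOSCTL" --nodes "$1" copy "$2" - > "$3"
}
```

`tar -tvzf FILE` then lists the saved entries together with their owners and modes.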
talosctl dashboard
Cluster dashboard with node overview, logs and real-time metrics
Synopsis
Provide a text-based UI to navigate node overview, logs and real-time metrics.
Keyboard shortcuts:
h, <Left> - switch one node to the left
l, <Right> - switch one node to the right
j, <Down> - scroll logs/process list down
k, <Up> - scroll logs/process list up
<C-d> - scroll logs/process list half page down
<C-u> - scroll logs/process list half page up
<C-f> - scroll logs/process list one page down
<C-b> - scroll logs/process list one page up
talosctl dashboard [flags]
Options
-h, --help help for dashboard
-d, --update-interval duration interval between updates (default 3s)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl dmesg
Retrieve kernel logs
talosctl dmesg [flags]
Options
-f, --follow specify if the kernel log should be streamed
-h, --help help for dmesg
--tail specify if only new messages should be sent (makes sense only when combined with --follow)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
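`--tail` only has an effect together with `--follow`, so the two are typically passed together to watch for new kernel messages only. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# follow_new_kernel_messages NODE: stream kernel messages from NODE, sending
# only new messages (--tail is meaningful only with --follow).
follow_new_kernel_messages() {
  "$TALOSCTL" --nodes "$1" dmesg --follow --tail
}
```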
talosctl edit
Edit a resource from the default editor.
Synopsis
The edit command allows you to directly edit any API resource you can retrieve via the command line tools.
It opens the editor defined by your TALOS_EDITOR or EDITOR environment variable, or falls back to ‘vi’ on Linux or ‘notepad’ on Windows.
talosctl edit <type> [<id>] [flags]
Options
--dry-run do not apply the change after editing and print the change summary instead
-h, --help help for edit
-m, --mode auto, no-reboot, reboot, staged, try apply config mode (default auto)
--namespace string resource namespace (default is to use default namespace per resource)
--timeout duration the config will be rolled back after specified timeout (if try mode is selected) (default 1m0s)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
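The `try` mode combined with `--timeout` gives a safety net for risky edits: the change is rolled back after the timeout. A minimal sketch, assuming a POSIX shell and the `machineconfig` resource type; the helper name and the 5m default are hypothetical choices, and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# try_edit NODE [TIMEOUT]: edit the machine configuration of NODE in "try"
# mode, so the change is rolled back after TIMEOUT (default 5m here;
# talosctl's own default is 1m). The editor comes from TALOS_EDITOR or EDITOR.
try_edit() {
  "$TALOSCTL" --nodes "$1" edit machineconfig --mode=try --timeout="${2:-5m}"
}
```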
talosctl etcd alarm disarm
Disarm the etcd alarms for the node.
talosctl etcd alarm disarm [flags]
Options
-h, --help help for disarm
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl etcd defrag
Synopsis
Defragmentation is a maintenance operation that releases unused space from the etcd database file.
Defragmentation is a resource-heavy operation and should be performed only when necessary, on a single node at a time.
talosctl etcd defrag [flags]
Options
-h, --help help for defrag
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
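Because defragmentation should run on a single node at a time, a script covering several control plane nodes has to loop over them sequentially. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview). Stopping at the first failure is this sketch's own design choice.

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# defrag_sequentially NODE...: defragment etcd on each node in turn, never in
# parallel, and stop at the first failure instead of moving on to the next member.
defrag_sequentially() {
  for node in "$@"; do
    echo "defragmenting etcd on $node" >&2
    "$TALOSCTL" --nodes "$node" etcd defrag || return 1
  done
}
```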
talosctl etcd remove-member
Synopsis
Use this command only to remove a member that is in a broken state, for example when there is no access to the node, or when the node can’t access etcd to call etcd leave.
Always prefer etcd leave over this command.
talosctl etcd remove-member <member ID> [flags]
Options
-h, --help help for remove-member
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl etcd status
Synopsis
Returns the status of the etcd member on the node; use multiple nodes to get the status of all members.
talosctl etcd status [flags]
Options
-h, --help help for status
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
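To see all members at once, the node addresses can be passed to `--nodes` as one comma-separated list. A minimal sketch, assuming a POSIX shell; the helper name is hypothetical and `TALOSCTL` is a hypothetical override (set it to `echo` to preview).

```shell
# Hypothetical helper; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# etcd_status_all NODE...: query every listed control plane node in one call
# by joining the addresses with commas for --nodes.
etcd_status_all() {
  if [ "$#" -eq 0 ]; then
    echo "usage: etcd_status_all NODE..." >&2
    return 1
  fi
  nodes=$(printf '%s,' "$@")
  # Strip the trailing comma left by printf.
  "$TALOSCTL" --nodes "${nodes%,}" etcd status
}
```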
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl events
talosctl events [flags]
Options
--actor-id string filter events by the specified actor ID (default is no filter)
--duration duration show events for the past duration interval (one second resolution, default is to show no history)
-h, --help help for events
--since string show events after the specified event ID (default is to show no history)
--tail int32 show specified number of past events (use -1 to show full history, default is to show no history)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
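By default no history is shown, so past events have to be requested explicitly with either `--duration` or `--tail`. A minimal sketch, assuming a POSIX shell; the helper names and the 10m default are hypothetical, and `TALOSCTL` is a hypothetical override (set it to `echo` to preview). `--tail=-1` is written with `=` so the negative value cannot be mistaken for a flag.

```shell
# Hypothetical helpers; not part of talosctl. Set TALOSCTL=echo to preview.
: "${TALOSCTL:=talosctl}"

# recent_events NODE [DURATION]: include events from the last DURATION (default 10m).
recent_events() {
  "$TALOSCTL" --nodes "$1" events --duration "${2:-10m}"
}

# all_events NODE: --tail=-1 requests the full event history.
all_events() {
  "$TALOSCTL" --nodes "$1" events --tail=-1
}
```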
talosctl gen ca
Generates a self-signed X.509 certificate authority
talosctl gen ca [flags]
Options
-h, --help help for ca
--hours int the number of hours from now until the certificate validity period ends (default 87600)
--organization string X.509 distinguished name for the Organization
--rsa generate in RSA format
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen config
Generates a set of configuration files for Talos cluster
Synopsis
The cluster endpoint is the URL for the Kubernetes API. If you use a control plane node
as the endpoint (common in a single-node control plane setup), use port 6443, as
this is the port that the API server binds to on every control plane node. For an HA
setup, usually involving a load balancer, use the IP and port of the load balancer.
talosctl gen config <cluster name> <cluster endpoint> [flags]
Options
--additional-sans strings additional Subject-Alt-Names for the APIServer certificate
--config-patch stringArray patch generated machineconfigs (applied to all node types), use @file to read a patch from file
--config-patch-control-plane stringArray patch generated machineconfigs (applied to 'init' and 'controlplane' types)
--config-patch-worker stringArray patch generated machineconfigs (applied to 'worker' type)
--dns-domain string the dns domain to use for cluster (default "cluster.local")
-h, --help help for config
--install-disk string the disk to install to (default "/dev/sda")
--install-image string the image used to perform an installation (default "ghcr.io/siderolabs/installer:latest")
--kubernetes-version string desired kubernetes version to run (default "1.32.0")
-o, --output string destination to output generated files. when multiple output types are specified, it must be a directory. for a single output type, it must either be a file path, or "-" for stdout
-t, --output-types strings types of outputs to be generated. valid types are: ["controlplane" "worker" "talosconfig"] (default [controlplane,worker,talosconfig])
-p, --persist the desired persist value for configs (default true)
--registry-mirror strings list of registry mirrors to use in format: <registry host>=<mirror URL>
--talos-version string the desired Talos version to generate config for (backwards compatibility, e.g. v0.8)
--version string the desired machine config version to generate (default "v1alpha1")
--with-cluster-discovery enable cluster discovery feature (default true)
--with-docs renders all machine configs adding the documentation for each field (default true)
--with-examples renders all machine configs with the commented examples (default true)
--with-kubespan enable KubeSpan feature
--with-secrets string use a secrets file generated using 'gen secrets'
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
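The sketch below applies the synopsis above to a single-node control plane:
- The endpoint points straight at the node on port 6443.
- A control-plane-only patch is passed using the @file syntax accepted by --config-patch-control-plane.

The following are illustrative assumptions:
- the helper name and the patch file name;
- the example patch, which allows workloads on the control plane via the v1alpha1 field cluster.allowSchedulingOnControlPlanes.

```shell
# Hypothetical helper: generate configs for a single-node control plane.
# Arguments: cluster name, control plane node IP, output directory (default _out).
gen_single_node_config() {
  local cluster_name="$1" cp_ip="$2" out_dir="${3:-_out}"
  local patch_file="${out_dir}/cp-patch.yaml"

  mkdir -p "$out_dir"
  # Control-plane-only patch: let regular workloads schedule on the single node.
  cat > "$patch_file" <<'EOF'
cluster:
  allowSchedulingOnControlPlanes: true
EOF

  # The endpoint targets the node itself, on the API server port 6443.
  # Several output types are generated, so --output must be a directory.
  talosctl gen config "$cluster_name" "https://${cp_ip}:6443" \
    --config-patch-control-plane "@${patch_file}" \
    --output "$out_dir"
}
```

For example, `gen_single_node_config my-cluster 10.0.0.10` writes the patch and the generated files into `_out`. The IP address is a placeholder.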
talosctl gen crt
Generates an X.509 Ed25519 certificate
talosctl gen crt [flags]
Options
--ca string path to the PEM encoded CERTIFICATE
--csr string path to the PEM encoded CERTIFICATE REQUEST
-h, --help help for crt
--hours int the hours from now on which the certificate validity period ends (default 24)
--name string the basename of the generated file
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen csr
Generates a CSR using an Ed25519 private key
talosctl gen csr [flags]
Options
-h, --help help for csr
--ip string generate the certificate for this IP address
--key string path to the PEM encoded EC or RSA PRIVATE KEY
--roles strings roles (default [os:admin])
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen key
Generates an Ed25519 private key
talosctl gen key [flags]
Options
-h, --help help for key
--name string the basename of the generated file
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen keypair
Generates an X.509 Ed25519 key pair
talosctl gen keypair [flags]
Options
-h, --help help for keypair
--ip string generate the certificate for this IP address
--organization string X.509 distinguished name for the Organization
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen secrets
Generates a secrets bundle file which can later be used to generate a config
talosctl gen secrets [flags]
Options
--from-controlplane-config string use the provided controlplane Talos machine configuration as input
-p, --from-kubernetes-pki string use a Kubernetes PKI directory (e.g. /etc/kubernetes/pki) as input
-h, --help help for secrets
-t, --kubernetes-bootstrap-token string use the provided bootstrap token as input
-o, --output-file string path of the output file (default "secrets.yaml")
--talos-version string the desired Talos version to generate secrets bundle for (backwards compatibility, e.g. v0.8)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
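The --with-secrets flag of gen config pairs with gen secrets. Generating the bundle once and keeping it lets configs for the same cluster be regenerated later without creating new cluster secrets.

This is a minimal sketch built on these assumptions:
- The helper name is hypothetical.
- The bundle is stored next to the generated configs.

The secrets file should be treated as sensitive.

```shell
# Hypothetical helper: derive configs from a single, reusable secrets bundle.
# Re-running it regenerates the configs (--force) without creating new secrets.
gen_configs_from_secrets() {
  local cluster_name="$1" endpoint="$2" out_dir="${3:-_out}"
  local secrets="${out_dir}/secrets.yaml"

  mkdir -p "$out_dir"
  # Create the bundle only once; later runs reuse the existing file.
  if [ ! -f "$secrets" ]; then
    talosctl gen secrets --output-file "$secrets"
  fi

  # Overwrite previously generated configs, keeping the same secrets.
  talosctl gen config "$cluster_name" "$endpoint" \
    --with-secrets "$secrets" \
    --output "$out_dir" --force
}
```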
talosctl gen secureboot database
Generates a UEFI database to enroll the signing certificate
talosctl gen secureboot database [flags]
Options
--enrolled-certificate string path to the certificate to enroll (default "_out/uki-signing-cert.pem")
-h, --help help for database
--include-well-known-uefi-certs include well-known UEFI (Microsoft) certificates in the database
--signing-certificate string path to the certificate used to sign the database (default "_out/uki-signing-cert.pem")
--signing-key string path to the key used to sign the database (default "_out/uki-signing-key.pem")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
-o, --output string path to the directory storing the generated files (default "_out")
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl gen secureboot pcr
Generates a key which is used to sign TPM PCR values
talosctl gen secureboot pcr [flags]
Options
-h, --help help for pcr
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
-o, --output string path to the directory storing the generated files (default "_out")
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl gen secureboot uki
Generates a certificate which is used to sign boot assets (UKI)
talosctl gen secureboot uki [flags]
Options
--common-name string common name for the certificate (default "Test UKI Signing Key")
-h, --help help for uki
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
-o, --output string path to the directory storing the generated files (default "_out")
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl gen secureboot
Options
-h, --help help for secureboot
-o, --output string path to the directory storing the generated files (default "_out")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-f, --force will overwrite existing files
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
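The default paths of gen secureboot database point at the files the uki subcommand is expected to produce. That suggests an order:
1. Generate the UKI signing certificate.
2. Generate the PCR signing key.
3. Generate the database.

This is a minimal sketch under that assumption. The helper name and the common name are placeholders. The file names are taken from the database defaults above; they have not been verified against the uki output.

```shell
# Hypothetical helper: generate SecureBoot assets into one directory.
# Assumption: "gen secureboot uki" writes uki-signing-cert.pem and
# uki-signing-key.pem into --output, matching the database defaults.
gen_secureboot_assets() {
  local out_dir="${1:-_out}" common_name="${2:-Example UKI Signing Key}"

  # 1. Certificate and key used to sign boot assets (UKI).
  talosctl gen secureboot uki --common-name "$common_name" --output "$out_dir"
  # 2. Key used to sign TPM PCR values.
  talosctl gen secureboot pcr --output "$out_dir"
  # 3. UEFI database enrolling the UKI signing certificate, signed with the same pair.
  talosctl gen secureboot database --output "$out_dir" \
    --enrolled-certificate "${out_dir}/uki-signing-cert.pem" \
    --signing-certificate "${out_dir}/uki-signing-cert.pem" \
    --signing-key "${out_dir}/uki-signing-key.pem"
}
```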
talosctl gen
Generate CAs, certificates, and private keys
Options
-f, --force will overwrite existing files
-h, --help help for gen
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl gen ca - Generates a self-signed X.509 certificate authority
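talosctl gen config - Generates a set of configuration files for Talos cluster
talosctl gen crt - Generates an X.509 Ed25519 certificate
talosctl gen csr - Generates a CSR using an Ed25519 private key
talosctl gen key - Generates an Ed25519 private key
talosctl gen keypair - Generates an X.509 Ed25519 key pair
talosctl gen secrets - Generates a secrets bundle file which can later be used to generate a config
talosctl gen secureboot - Generates secrets for the SecureBoot process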
talosctl get
Get a specific resource or list of resources (use 'talosctl get rd' to see all available resource types).
Synopsis
Similar to 'kubectl get', 'talosctl get' returns a set of resources from the OS.
To get a list of all available resource definitions, issue 'talosctl get rd'.
talosctl get <type> [<id>] [flags]
Options
-h, --help help for get
-i, --insecure get resources using the insecure (encrypted with no auth) maintenance service
--namespace string resource namespace (default is to use default namespace per resource)
-o, --output string output mode (json, table, yaml, jsonpath) (default "table")
-w, --watch watch resource changes
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
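talosctl get combines naturally with --nodes and --output to collect state from several nodes. The sketch below saves one YAML file per node.

The helper is hypothetical. The resource type can be any type listed by talosctl get rd, and the node addresses are placeholders.

```shell
# Hypothetical helper: save one YAML snapshot of a resource type per node.
# Arguments: resource type, output directory, then one or more node addresses.
snapshot_resources() {
  local type="$1" out_dir="$2"
  shift 2
  mkdir -p "$out_dir"

  local node
  for node in "$@"; do
    # Target a single node per call so each file holds only that node's view.
    talosctl --nodes "$node" get "$type" --output yaml > "${out_dir}/${node}-${type}.yaml"
  done
}
```

For example, `snapshot_resources members ./snapshots 10.0.0.10 10.0.0.11` writes `10.0.0.10-members.yaml` and `10.0.0.11-members.yaml` into `./snapshots`.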
talosctl health
Check cluster health
talosctl health [flags]
Options
--control-plane-nodes strings specify IPs of control plane nodes
-h, --help help for health
--init-node string specify IP of init node
--k8s-endpoint string use endpoint instead of kubeconfig default
--run-e2e run Kubernetes e2e test
--server run server-side check (default true)
--wait-timeout duration timeout to wait for the cluster to be ready (default 20m0s)
--worker-nodes strings specify IPs of worker nodes
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
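The node-list flags of talosctl health are "strings" flags, which take comma-separated values. The sketch below waits for a freshly bootstrapped cluster to report healthy.

It rests on these assumptions:
- The helper is hypothetical.
- The IPs are placeholders.
- Targeting the first control plane node with --nodes is enough to run the checks.

```shell
# Hypothetical helper: wait until the cluster reports healthy.
# Arguments: comma-separated control plane IPs, comma-separated worker IPs,
# and an optional timeout (default 10m).
wait_for_cluster() {
  local cp_nodes="$1" worker_nodes="$2" timeout="${3:-10m}"
  # Assumption: the checks are driven from the first control plane node.
  local first_cp="${cp_nodes%%,*}"

  talosctl --nodes "$first_cp" health \
    --control-plane-nodes "$cp_nodes" \
    --worker-nodes "$worker_nodes" \
    --wait-timeout "$timeout"
}
```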
talosctl image cache-create
Create a cache of images in OCI format into a directory
Synopsis
Create a cache of images in OCI format into a directory
talosctl image cache-create [flags]
Examples
talosctl images cache-create --images=ghcr.io/siderolabs/kubelet:1.32.0 --image-cache-path=/tmp/talos-image-cache
Alternatively, stdin can be piped to the command:
talosctl images default | talosctl images cache-create --image-cache-path=/tmp/talos-image-cache --images=-
Options
--force force overwrite of existing image cache
-h, --help help for cache-create
--image-cache-path string directory to save the image cache in OCI format
--image-layer-cache-path string directory to save the image layer cache
--images strings images to cache
--insecure allow insecure registries
--platform string platform to use for the cache (default "linux/amd64")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--namespace system namespace to use: system (etcd and kubelet images) or `cri` for all Kubernetes workloads (default "cri")
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl image
Options
-h, --help help for image
--namespace system namespace to use: system (etcd and kubelet images) or `cri` for all Kubernetes workloads (default "cri")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl inject serviceaccount
Examples
talosctl inject serviceaccount --roles="os:admin" -f deployment.yaml > deployment-injected.yaml
Alternatively, stdin can be piped to the command:
cat deployment.yaml | talosctl inject serviceaccount --roles="os:admin" -f - > deployment-injected.yaml
Options
-f, --file string file with Kubernetes manifests to be injected with ServiceAccount
-h, --help help for serviceaccount
-r, --roles strings roles to add to the generated ServiceAccount manifests (default [os:reader])
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl inject - Inject Talos API resources into Kubernetes manifests
talosctl inject
Inject Talos API resources into Kubernetes manifests
Options
-h, --help help for inject
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl inspect dependencies
Options
-h, --help help for dependencies
--with-resources display live resource information with dependencies
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl inspect
Options
-h, --help help for inspect
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl kubeconfig
Download the admin kubeconfig from the node.
If the merge flag is set (the default), the config is merged into ~/.kube/config, or into [local-path] if specified.
Otherwise, the kubeconfig is written to the current working directory, or to [local-path] if specified.
talosctl kubeconfig [local-path] [flags]
Options
-f, --force Force overwrite of kubeconfig if already present, force overwrite on kubeconfig merge
--force-context-name string Force context name for kubeconfig merge
-h, --help help for kubeconfig
-m, --merge Merge with existing kubeconfig (default true)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
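Per the synopsis above, merging into ~/.kube/config is the default. To keep a cluster's credentials in a separate file instead, turn merging off.

In the sketch below, the helper is hypothetical and the node address is a placeholder.

```shell
# Hypothetical helper: write the admin kubeconfig to a standalone file
# instead of merging it into ~/.kube/config.
# Arguments: node address, destination path (default ./kubeconfig).
fetch_kubeconfig() {
  local node="$1" path="${2:-./kubeconfig}"
  # --merge=false writes the file as-is; --force replaces an existing file.
  talosctl --nodes "$node" kubeconfig "$path" --merge=false --force
}
```

The resulting file can then be used explicitly, for example `fetch_kubeconfig 10.0.0.10 && KUBECONFIG=./kubeconfig kubectl get nodes`.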
talosctl list
Retrieve a directory listing
talosctl list [path] [flags]
Options
-d, --depth int32 maximum recursion depth (default 1)
-h, --help help for list
-H, --humanize humanize size and time in the output
-l, --long display additional file details
-r, --recurse recurse into subdirectories
-t, --type strings filter by specified types:
f regular file
d directory
l, L symbolic link
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl logs
Retrieve logs for a service
talosctl logs <service name> [flags]
Options
-f, --follow specify if the logs should be streamed
-h, --help help for logs
-k, --kubernetes use the k8s.io containerd namespace
--tail int32 lines of log file to display (default is to show from the beginning) (default -1)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl machineconfig gen
Generates a set of configuration files for Talos cluster
Synopsis
The cluster endpoint is the URL for the Kubernetes API. If you use a control plane node
as the endpoint (common in a single-node control plane setup), use port 6443, as
this is the port that the API server binds to on every control plane node. For an HA
setup, usually involving a load balancer, use the IP and port of the load balancer.
talosctl machineconfig gen <cluster name> <cluster endpoint> [flags]
Options
-h, --help help for gen
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
talosctl machineconfig patch
Options
-h, --help help for patch
-o, --output string output destination. if not specified, output will be printed to stdout
-p, --patch stringArray patch generated machineconfigs (applied to all node types), use @file to read a patch from file
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
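talosctl machineconfig
Options
-h, --help help for machineconfig
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.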
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl memory
Options
-h, --help help for memory
-v, --verbose display extended memory statistics
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl meta delete
Delete a key from the META partition.
talosctl meta delete key [flags]
Options
-h, --help help for delete
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-i, --insecure write|delete meta using the insecure (encrypted with no auth) maintenance service
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl meta - Write and delete keys in the META partition
talosctl meta write
Write a key-value pair to the META partition.
talosctl meta write key value [flags]
Options
-h, --help help for write
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-i, --insecure write|delete meta using the insecure (encrypted with no auth) maintenance service
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl meta - Write and delete keys in the META partition
talosctl meta
Write and delete keys in the META partition
Options
-h, --help help for meta
-i, --insecure write|delete meta using the insecure (encrypted with no auth) maintenance service
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl netstat
Show network connections and sockets
Synopsis
Show network connections and sockets.
You can pass an optional argument to view a specific pod’s connections.
To do this, format the argument as “namespace/pod”.
Note that only pods with a pod network namespace are allowed.
If you don’t pass an argument, the command will show host connections.
talosctl netstat [flags]
Options
-a, --all display all sockets states (default: connected)
-x, --extend show detailed socket information
-h, --help help for netstat
-4, --ipv4 display only ipv4 sockets
-6, --ipv6 display only ipv6 sockets
-l, --listening display listening server sockets
-k, --pods show sockets used by Kubernetes pods
-p, --programs show process using socket
-w, --raw display only RAW sockets
-t, --tcp display only TCP sockets
-o, --timers display timers
-u, --udp display only UDP sockets
-U, --udplite display only UDPLite sockets
-v, --verbose display sockets of all supported transport protocols
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl patch
Update field(s) of a resource using a JSON patch.
talosctl patch <type> [<id>] [flags]
Options
--dry-run print the change summary and patch preview without applying the changes
-h, --help help for patch
-m, --mode auto, no-reboot, reboot, staged, try apply config mode (default auto)
--namespace string resource namespace (default is to use default namespace per resource)
-p, --patch stringArray the patch to be applied to the resource file, use @file to read a patch from file.
--patch-file string a file containing a patch to be applied to the resource.
--timeout duration the config will be rolled back after specified timeout (if try mode is selected) (default 1m0s)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
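The sketch below shows talosctl patch with an RFC 6902 JSON patch. It previews the change with --dry-run and then applies it in no-reboot mode, which is expected to reject changes that would require a reboot.

The following are assumptions:
- the helper name, node address and hostname;
- that the machine config already contains a machine.network section. A JSON Patch "add" fails when the parent path is missing.

```shell
# Hypothetical helper: set a node's hostname with an RFC 6902 JSON patch.
# Assumption: the machine config already has a machine.network section;
# a JSON Patch "add" fails when the parent path does not exist.
set_hostname() {
  local node="$1" hostname="$2"
  local patch
  patch="$(printf '[{"op": "add", "path": "/machine/network/hostname", "value": "%s"}]' "$hostname")"

  # Preview the change first; apply it only if the preview succeeds.
  talosctl --nodes "$node" patch machineconfig --patch "$patch" --dry-run &&
    talosctl --nodes "$node" patch machineconfig --patch "$patch" --mode no-reboot
}
```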
talosctl pcap
Capture the network packets from the node.
Synopsis
The command launches packet capture on the node and streams the packets back as a raw pcap file.
By default, the packets are decoded with the internal decoder and printed to stdout:
talosctl pcap -i eth0
A raw pcap file can be saved with the --output flag:
talosctl pcap -i eth0 --output eth0.pcap
Output can be piped to tcpdump:
talosctl pcap -i eth0 -o - | tcpdump -vvv -r -
A BPF filter can be applied, but it has to be compiled to BPF instructions first using tcpdump.
The correct link type must be specified for tcpdump: EN10MB for Ethernet links and RAW
for, e.g., WireGuard tunnels:
talosctl pcap -i eth0 --bpf-filter "$(tcpdump -dd -y EN10MB 'tcp and dst port 80')"
talosctl pcap -i kubespan --bpf-filter "$(tcpdump -dd -y RAW 'port 50000')"
Because the packet capture itself is transmitted over the network, it is recommended to filter out Talos API traffic,
e.g. by excluding packets on port 50000.
talosctl pcap [flags]
Options
--bpf-filter string bpf filter to apply, tcpdump -dd format
--duration duration duration of the capture
-h, --help help for pcap
-i, --interface string interface name to capture packets on (default "eth0")
-o, --output string if not set, decode packets to stdout; if set write raw pcap data to a file, use '-' for stdout
--promiscuous put interface into promiscuous mode
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl processes
List running processes
talosctl processes [flags]
Options
-h, --help help for processes
-s, --sort string Column to sort output by. [rss|cpu] (default "rss")
-w, --watch Stream running processes
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl read
Read a file on the machine
talosctl read <path> [flags]
Options
-h, --help help for read
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl reboot
Reboot a node
talosctl reboot [flags]
Options
--debug debug operation from kernel logs. --wait is set to true when this flag is set
-h, --help help for reboot
-m, --mode string select the reboot mode: "default", "powercycle" (skips kexec) (default "default")
--timeout duration time to wait for the operation to complete if --debug or --wait is set (default 30m0s)
--wait wait for the operation to complete, tracking its progress. always set to true when --debug is set (default true)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl reset
Reset a node
talosctl reset [flags]
Options
--debug debug operation from kernel logs. --wait is set to true when this flag is set
--graceful if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
-h, --help help for reset
--insecure reset using the insecure (encrypted with no auth) maintenance service
--reboot if true, reboot the node after resetting instead of shutting down
--system-labels-to-wipe strings if set, just wipe selected system disk partitions by label but keep other partitions intact
--timeout duration time to wait for the operation to complete if --debug or --wait is set (default 30m0s)
--user-disks-to-wipe strings if set, wipes defined devices in the list
--wait wait for the operation to complete, tracking its progress. always set to true when --debug is set (default true)
--wipe-mode all, system-disk, user-disks disk reset mode (default all)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl restart
Restart a process
talosctl restart <id> [flags]
Options
-h, --help help for restart
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl rollback
Roll back a node to the previous installation
talosctl rollback [flags]
Options
-h, --help help for rollback
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl rotate-ca
Rotate cluster CAs (Talos and Kubernetes APIs).
Synopsis
The command can rotate both Talos and Kubernetes root CAs (for the API).
By default both CAs are rotated, but you can choose to rotate just one or the other.
The command starts by generating new CAs and then gracefully applies them to the cluster.
For Kubernetes, the command only rotates the API server issuing CA; other Kubernetes
PKI can be rotated by applying machine config changes to the control plane nodes.
talosctl rotate-ca [flags]
Options
--control-plane-nodes strings specify IPs of control plane nodes
--dry-run dry-run mode (no changes to the cluster) (default true)
-h, --help help for rotate-ca
--init-node string specify IPs of init node
--k8s-endpoint string use endpoint instead of kubeconfig default
--kubernetes rotate Kubernetes API CA (default true)
-o, --output talosconfig path to the output new talosconfig (default "talosconfig")
--talos rotate Talos API CA (default true)
--with-docs patch all machine configs adding the documentation for each field (default true)
--with-examples patch all machine configs with the commented examples (default true)
--worker-nodes strings specify IPs of worker nodes
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl service
Retrieve the state of a service (or all services), control service state
Synopsis
Service control command. If run without arguments, lists all the services and their state.
If a service ID is specified, the default action ‘status’ is executed, which shows the status of that single service.
With actions ‘start’, ‘stop’, ‘restart’, service state is updated respectively.
talosctl service [<id> [start|stop|restart|status]] [flags]
Options
-h, --help help for service
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl shutdown
Shut down a node
talosctl shutdown [flags]
Options
--debug debug operation from kernel logs. --wait is set to true when this flag is set
--force if true, force a node to shut down without a cordon/drain
-h, --help help for shutdown
--timeout duration time to wait for the operation to complete if --debug or --wait is set (default 30m0s)
--wait wait for the operation to complete, tracking its progress. always set to true when --debug is set (default true)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl stats
Get container stats
talosctl stats [flags]
Options
-h, --help help for stats
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl support
Dump debug information about the cluster
Synopsis
The generated bundle contains the following debug information:
For each node:
Kernel logs.
Logs of all Talos internal services.
Logs of all kube-system pods.
Talos COSI resources without secrets.
COSI runtime state graph.
Processes snapshot.
IO pressure snapshot.
Mounts list.
PCI devices info.
Talos version.
For the cluster:
Kubernetes nodes and kube-system pods manifests.
talosctl support [flags]
Options
-h, --help help for support
-w, --num-workers int number of workers per node (default 1)
-O, --output string output file to write support archive to
-v, --verbose verbose output
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl time
Gets current server time
talosctl time [--check server] [flags]
Options
-c, --check string checks server time against specified ntp server
-h, --help help for time
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl upgrade
Upgrade Talos on the target node
talosctl upgrade [flags]
Options
--debug debug operation from kernel logs. --wait is set to true when this flag is set
-f, --force force the upgrade (skip checks on etcd health and members, might lead to data loss)
-h, --help help for upgrade
-i, --image string the container image to use for performing the install (default "ghcr.io/siderolabs/installer:v1.9.0-alpha.3")
--insecure upgrade using the insecure (encrypted with no auth) maintenance service
-m, --reboot-mode string select the reboot mode during upgrade. Mode "powercycle" bypasses kexec. Valid values are: ["default" "powercycle"]. (default "default")
-s, --stage stage the upgrade to perform it after a reboot
--timeout duration time to wait for the operation to complete if --debug or --wait is set (default 30m0s)
--wait wait for the operation to complete, tracking its progress. always set to true when --debug is set (default true)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl upgrade-k8s
Upgrade Kubernetes control plane in the Talos cluster.
Synopsis
The command upgrades Kubernetes control plane components between the specified versions.
talosctl upgrade-k8s [flags]
Options
--apiserver-image string kube-apiserver image to use (default "registry.k8s.io/kube-apiserver")
--controller-manager-image string kube-controller-manager image to use (default "registry.k8s.io/kube-controller-manager")
--dry-run skip the actual upgrade and show the upgrade plan instead
--endpoint string the cluster control plane endpoint
--from string the Kubernetes control plane version to upgrade from
-h, --help help for upgrade-k8s
--kubelet-image string kubelet image to use (default "ghcr.io/siderolabs/kubelet")
--pre-pull-images pre-pull images before upgrade (default true)
--proxy-image string kube-proxy image to use (default "registry.k8s.io/kube-proxy")
--scheduler-image string kube-scheduler image to use (default "registry.k8s.io/kube-scheduler")
--to string the Kubernetes control plane version to upgrade to (default "1.32.0")
--upgrade-kubelet upgrade kubelet service (default true)
--with-docs patch all machine configs adding the documentation for each field (default true)
--with-examples patch all machine configs with the commented examples (default true)
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl usage
Options
-a, --all write counts for all files, not just directories
-d, --depth int32 maximum recursion depth
-h, --help help for usage
-H, --humanize humanize size and time in the output
-t, --threshold int threshold exclude entries smaller than SIZE if positive, or entries greater than SIZE if negative
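The sign convention of `--threshold` is easy to misread, so the Python sketch below models it over a hypothetical mapping of paths to sizes. It assumes that sizes and the threshold are both in bytes and that `0` disables filtering; neither assumption is stated in the help text, and the function names are hypothetical.

```python
from typing import Dict


def keep_entry(size: int, threshold: int) -> bool:
    """Apply the documented --threshold rule to one entry (sizes assumed in bytes)."""
    if threshold > 0:
        # Positive threshold: exclude entries smaller than it.
        return size >= threshold
    if threshold < 0:
        # Negative threshold: exclude entries greater than its absolute value.
        return size <= -threshold
    # Assumption: 0 means "no filtering".
    return True


def filter_usage(entries: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Keep only the entries that pass the threshold rule."""
    return {path: size for path, size in entries.items() if keep_entry(size, threshold)}
```

With entries of 512, 4096, and 1048576 bytes, a threshold of `1024` keeps the two larger entries, while `-1024` keeps only the 512-byte one. Entries exactly equal to the threshold are kept in both cases, because the rule only excludes sizes strictly smaller or strictly greater.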
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl validate
Validate config
talosctl validate [flags]
Options
-c, --config string the path of the config file
-h, --help help for validate
-m, --mode string the mode to validate the config for (valid values are metal, cloud, and container)
--strict treat validation warnings as errors
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl version
Prints the version
talosctl version [flags]
Options
--client Print client version only
-h, --help help for version
-i, --insecure use Talos maintenance mode API
--short Print the short version
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl wipe disk
Wipe a block device (disk or partition) which is not used as a volume
Synopsis
Wipe a block device (disk or partition) which is not used as a volume.
Use device names as arguments, for example: vda or sda5.
talosctl wipe disk <device names>... [flags]
Options
-h, --help help for disk
--method string wipe method to use [FAST ZEROES] (default "FAST")
Options inherited from parent commands
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl wipe disk - Wipe a block device (disk or partition) which is not used as a volume
talosctl
A CLI for out-of-band management of Kubernetes nodes created by Talos
Options
--cluster string Cluster to connect to if a proxy endpoint is used.
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-h, --help help for talosctl
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file. Defaults to 'TALOSCONFIG' env variable if set, otherwise '$HOME/.talos/config' and '/var/run/secrets/talos.dev/config' in order.
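The lookup order in the `--talosconfig` description can be sketched in Python as follows. The help text does not say what happens when a default path is missing, so this sketch assumes that the first existing default location wins and returns `None` when neither exists. The function name and the injectable `exists` parameter (added only for testability) are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

SERVICE_ACCOUNT_CONFIG = Path("/var/run/secrets/talos.dev/config")


def resolve_talosconfig(
    flag_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Sketch of the talosconfig lookup order described in the help text."""
    # 1. An explicit --talosconfig flag wins.
    if flag_value:
        return Path(flag_value)
    # 2. Otherwise the TALOSCONFIG environment variable, if set.
    if env.get("TALOSCONFIG"):
        return Path(env["TALOSCONFIG"])
    # 3. Otherwise the default locations, checked in order (assumed: first existing wins).
    home = home if home is not None else Path.home()
    for candidate in (home / ".talos" / "config", SERVICE_ACCOUNT_CONFIG):
        if exists(candidate):
            return candidate
    return None
```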
A Talos Linux machine is fully configured via a single YAML file called the machine configuration.
The file might contain one or more configuration documents separated by --- (three dashes) lines.
At the moment, the majority of the configuration options are within the v1alpha1 document, so
it is the only mandatory document in the configuration file.
Configuration documents might be named (contain a name: field) or unnamed.
Unnamed documents can be supplied to the machine configuration file only once, while named documents can be supplied multiple times with unique names.
The v1alpha1 document has its own (legacy) structure, while every other document has the following set of fields:
apiVersion: v1alpha1 # version of the document
kind: NetworkRuleConfig # type of document
name: rule1 # only for named documents
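To make the naming rules concrete, the following Python sketch checks a list of already-parsed documents, using plain dicts as stand-ins for the YAML documents split on `---`. The function names and error messages are hypothetical. The rules come from the text above (v1alpha1 is mandatory, an unnamed document may appear only once, named documents need unique names); applying them per document kind is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

V1ALPHA1 = "v1alpha1"


def document_key(doc: Dict) -> Tuple[str, Optional[str]]:
    """Return (kind, name) for a parsed document; name is None if unnamed."""
    # The legacy v1alpha1 document uses `version` instead of `apiVersion`/`kind`.
    if "kind" not in doc and doc.get("version") == V1ALPHA1:
        return (V1ALPHA1, None)
    return (doc["kind"], doc.get("name"))


def validate_documents(docs: List[Dict]) -> List[str]:
    """Check the named/unnamed document rules and return readable errors."""
    counts = Counter(document_key(doc) for doc in docs)
    errors = []
    if counts[(V1ALPHA1, None)] == 0:
        errors.append("missing mandatory v1alpha1 document")
    for (kind, name), n in sorted(counts.items(), key=repr):
        if n < 2:
            continue
        if name is None:
            errors.append(f"unnamed document {kind} supplied {n} times")
        else:
            errors.append(f"duplicate name {name!r} for {kind}")
    return errors
```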
This section contains the configuration reference. To learn more about Talos Linux machine configuration management, please see:
Package block provides block device and volume configuration documents.
5.3.1.1 - VolumeConfig
VolumeConfig is a volume configuration document.
apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL # Name of the volume.
# The provisioning describes how the volume is provisioned.
provisioning:
  # The disk selector expression.
  diskSelector:
    match: disk.transport == "nvme" # The Common Expression Language (CEL) expression to match the disk.
  maxSize: 50GiB # The maximum size of the volume, if not specified the volume can grow to the size of the

  # # The minimum size of the volume.
  # minSize: 2.5GiB
ExtensionServiceConfig is an extension service configuration document.
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: nut-client # Name of the extension service.
# The config files for the extension service.
configFiles:
  - content: MONITOR ${upsmonHost} 1 remote username password # The content of the extension service config file.
    mountPath: /usr/local/etc/nut/upsmon.conf # The mount path of the extension service config file.
# The environment for the extension service.
environment:
  - NUT_UPS=upsname
5.3.4.1 - KubeSpanEndpointsConfig
KubeSpanEndpointsConfig is a config document to configure KubeSpan endpoints.
apiVersion: v1alpha1
kind: KubeSpanEndpointsConfig
# A list of extra Wireguard endpoints to announce from this machine.
extraAnnouncedEndpoints:
  - 192.168.13.46:52000
Field
Type
Description
Value(s)
extraAnnouncedEndpoints
[]AddrPort
A list of extra Wireguard endpoints to announce from this machine. Talos automatically adds endpoints based on machine addresses, public IP, etc. This field allows adding extra endpoints that are managed outside of Talos, e.g. NAT mappings.
5.3.4.2 - NetworkDefaultActionConfig
NetworkDefaultActionConfig is an ingress firewall default action configuration document.
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: accept # Default action for all not explicitly configured ingress traffic: accept or block.
Field
Type
Description
Value(s)
ingress
DefaultAction
Default action for all not explicitly configured ingress traffic: accept or block.
accept block
5.3.4.3 - NetworkRuleConfig
NetworkRuleConfig is a network firewall rule config document.
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: ingress-apid # Name of the config document.
# Port selector defines which ports and protocols on the host are affected by the rule.
portSelector:
  # Ports defines a list of port ranges or single ports.
  ports:
    - 50000
  protocol: tcp # Protocol defines traffic protocol (e.g. TCP or UDP).
# Ingress defines which source subnets are allowed to access the host ports/protocols defined by the `portSelector`.
ingress:
  - subnet: 192.168.0.0/16 # Subnet defines a source subnet.
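The interplay between `NetworkDefaultActionConfig` and `NetworkRuleConfig` can be sketched as a simplified Python model. It assumes that a port covered by a rule is reachable only from that rule's subnets and that all other traffic falls back to the default action. The real firewall is enforced in the kernel and also handles cases (such as established connections) that this model ignores. The `Rule` class and `ingress_verdict` function are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple, Union

PortSpec = Union[int, Tuple[int, int]]  # single port or inclusive (start, end) range


@dataclass
class Rule:
    """Simplified stand-in for a NetworkRuleConfig document (hypothetical model)."""
    protocol: str
    ports: List[PortSpec]
    subnets: List[str] = field(default_factory=list)

    def covers(self, protocol: str, port: int) -> bool:
        # Does this rule's port selector apply to the packet's protocol and port?
        if protocol != self.protocol:
            return False
        for spec in self.ports:
            start, end = (spec, spec) if isinstance(spec, int) else spec
            if start <= port <= end:
                return True
        return False

    def allows_source(self, source: str) -> bool:
        # Mixed IPv4/IPv6 membership tests simply return False.
        addr = ipaddress.ip_address(source)
        return any(addr in ipaddress.ip_network(net) for net in self.subnets)


def ingress_verdict(rules: List[Rule], default_action: str,
                    protocol: str, port: int, source: str) -> str:
    """Return "accept" or "block" for a new inbound connection (simplified)."""
    covering = [r for r in rules if r.covers(protocol, port)]
    if covering:
        # Assumption: a covered port is reachable only from the listed subnets.
        return "accept" if any(r.allows_source(source) for r in covering) else "block"
    # Traffic that no rule mentions falls back to the default action.
    return default_action
```

In this model, the `ingress-apid` example corresponds to `Rule("tcp", [50000], ["192.168.0.0/16"])`: with a `block` default, a client at `192.168.1.10` reaches port 50000, a client at `10.0.0.1` does not, and every other port is blocked.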
5.3.5.1 - EventSinkConfig
EventSinkConfig is an event sink config document.
apiVersion: v1alpha1
kind: EventSinkConfig
endpoint: 192.168.10.3:3247 # The endpoint for the event sink as 'host:port'.
Field
Type
Description
Value(s)
endpoint
string
The endpoint for the event sink as ‘host:port’.
endpoint: 10.3.7.3:2810
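A `host:port` value of this kind (the same shape as the KubeSpan `extraAnnouncedEndpoints` entries above) can be split with the Python standard library, as sketched below. This is not Talos's own parser. The function name is hypothetical, and the sketch assumes IPv6 hosts are written in brackets, such as `[fd00::1]:3247`, and treats a missing or invalid port as an error.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_endpoint(value: str) -> Tuple[str, int]:
    """Split a 'host:port' endpoint into (host, port); raise ValueError if invalid."""
    # Prefixing "//" makes urlsplit treat the whole value as a network location.
    parts = urlsplit("//" + value)
    if not parts.hostname:
        raise ValueError(f"missing host in {value!r}")
    port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    if port is None:
        raise ValueError(f"missing port in {value!r}")
    return parts.hostname, port
```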
5.3.5.2 - KmsgLogConfig
KmsgLogConfig is a kernel message log (kmsg) delivery config document.
apiVersion: v1alpha1
kind: KmsgLogConfig
name: remote-log # Name of the config document.
url: tcp://192.168.3.7:3478/ # The URL encodes the log destination.
Field
Type
Description
Value(s)
name
string
Name of the config document.
url
URL
The URL encodes the log destination. The scheme must be tcp:// or udp://. The path must be empty. The port is required.
url: udp://10.3.7.3:2810
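The three URL rules above can be expressed as a small Python check. This is a sketch, not the Talos validator. The function name is hypothetical, and the sketch assumes that a single trailing `/` counts as an empty path, because the `tcp://192.168.3.7:3478/` example in this section uses one.

```python
from typing import List
from urllib.parse import urlsplit


def validate_log_url(url: str) -> List[str]:
    """Return the documented rules that `url` violates (empty list if valid)."""
    parts = urlsplit(url)
    errors = []
    if parts.scheme not in ("tcp", "udp"):
        errors.append("scheme must be tcp:// or udp://")
    # Assumption: a bare "/" is treated as an empty path.
    if parts.path not in ("", "/"):
        errors.append("path must be empty")
    try:
        port = parts.port
    except ValueError:  # non-numeric or out-of-range port
        port = None
    if port is None:
        errors.append("port is required")
    return errors
```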
5.3.5.3 - WatchdogTimerConfig
WatchdogTimerConfig is a watchdog timer config document.
apiVersion: v1alpha1
kind: WatchdogTimerConfig
device: /dev/watchdog0 # Path to the watchdog device.
timeout: 2m0s # Timeout for the watchdog.
Field
Type
Description
Value(s)
device
string
Path to the watchdog device.
device: /dev/watchdog0
timeout
Duration
Timeout for the watchdog. If Talos is unresponsive for this duration, the watchdog will reset the system.
Default value is 1 minute, minimum value is 10 seconds.
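The default and the minimum can be applied as in the Python sketch below. It assumes that `timeout` is a Go-style duration string such as `2m0s` (only the `h`, `m`, `s`, and `ms` units are handled), that an empty value selects the 1 minute default, and that values under 10 seconds are rejected rather than raised to the minimum. The function names are hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional

DEFAULT_TIMEOUT = timedelta(minutes=1)   # documented default
MINIMUM_TIMEOUT = timedelta(seconds=10)  # documented minimum

# Subset of Go duration syntax: one or more <number><unit> pairs, e.g. "2m0s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '2m0s' or '500ms'; raise ValueError otherwise."""
    total, pos = 0.0, 0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject anything between number/unit pairs
            raise ValueError(f"invalid duration {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if not text or pos != len(text):
        raise ValueError(f"invalid duration {text!r}")
    return timedelta(seconds=total)


def effective_watchdog_timeout(configured: Optional[str]) -> timedelta:
    """Return the timeout to use; too-small values are rejected (assumption)."""
    if not configured:
        return DEFAULT_TIMEOUT
    timeout = parse_duration(configured)
    if timeout < MINIMUM_TIMEOUT:
        raise ValueError(f"timeout {timeout} is below the {MINIMUM_TIMEOUT} minimum")
    return timeout
```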
TrustedRootsConfig allows configuring additional trusted CA roots.
apiVersion: v1alpha1
kind: TrustedRootsConfig
name: my-enterprise-ca # Name of the config document.
certificates: | # List of additional trusted certificate authorities (as PEM-encoded certificates).
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
Field
Type
Description
Value(s)
name
string
Name of the config document.
certificates
string
List of additional trusted certificate authorities (as PEM-encoded certificates). Multiple certificates can be provided in a single config document, separated by newline characters.
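Because several certificates can share one `certificates` string, a consumer has to split them on the PEM markers. The Python sketch below does only that splitting. The function name is hypothetical, and the sketch neither decodes nor validates the certificates.

```python
import re
from typing import List

# Non-greedy match so each BEGIN marker pairs with the nearest END marker.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_certificates(bundle: str) -> List[str]:
    """Return each PEM certificate block found in a `certificates` value."""
    return [match.group(0) for match in _PEM_CERT.finditer(bundle)]
```

Any text outside the markers, such as blank lines between certificates, is ignored.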
SideroLinkConfig is a SideroLink connection machine configuration document.
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: https://siderolink.api/join?token=secret # SideroLink API URL to connect to.
Field
Type
Description
Value(s)
apiUrl
URL
SideroLink API URL to connect to.
apiUrl: https://siderolink.api/join?token=secret
5.3.8 - v1alpha1
Package v1alpha1 contains definition of the v1alpha1 configuration document.
Even though the machine configuration in Talos Linux is multi-document, at the moment
this configuration document contains most of the configuration options.
It is expected that new configuration options will be added as new documents, and existing ones
migrated to their own documents.
5.3.8.1 - Config
Config defines the v1alpha1.Config Talos machine configuration document.
version: v1alpha1
machine: # ...
cluster: # ...
Field
Type
Description
Value(s)
version
string
Indicates the schema used to decode the contents.
v1alpha1
debug
bool
Enable verbose logging to the console. All system container logs will flow to the serial console.
Note: To avoid breaking the Talos bootstrap flow, enable this option only if the serial console can handle high message throughput.
MachineConfig represents the machine-specific config values.
machine:
  type: controlplane
  # InstallConfig represents the installation options for preparing a node.
  install:
    disk: /dev/sda # The disk used for installations.
    # Allows for supplying extra kernel args via the bootloader.
    extraKernelArgs:
      - console=ttyS1
      - panic=10
    image: ghcr.io/siderolabs/installer:latest # Allows for supplying the image used to perform the installation.
    wipe: false # Indicates if the installation disk should be wiped at installation time.

    # # Look up disk using disk attributes like model, size, serial and others.
    # diskSelector:
    #     size: 4GB # Disk size.
    #     model: WDC* # Disk model `/sys/block/<dev>/device/model`.
    #     busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0 # Disk bus path.

    # # Allows for supplying additional system extension images to install on top of base Talos image.
    # extensions:
    #     - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
Field
Type
Description
Value(s)
type
string
Defines the role of the machine within the cluster. Control Plane
Control Plane node type designates the node as a control plane member. This means it will host etcd along with the Kubernetes controlplane components such as API Server, Controller Manager, Scheduler.
Worker
Worker node type designates the node as a worker node. This means it will be an available compute node for scheduling workloads.
This node type was previously known as “join”; that value is still supported but deprecated.
controlplane worker
token
string
The token is used by a machine to join the PKI of the cluster. Using this token, a machine will create a certificate signing request (CSR) and request a certificate that will be used as its identity.
token: 328hom.uqjzh6jnn2eie9oi
ca
PEMEncodedCertificateAndKey
The root certificate authority of the PKI. It is composed of a base64 encoded crt and key.
The certificates issued by certificate authorities are accepted in addition to issuing ‘ca’. It is composed of a base64 encoded `crt`.
certSANs
[]string
Extra certificate subject alternative names for the machine’s certificate. By default, all non-loopback interface IPs are automatically added to the certificate’s SANs.
Provides machine specific control plane configuration options.
controlPlane:
  # Controller manager machine specific configuration options.
  controllerManager:
    disabled: false # Disable kube-controller-manager on the node.
  # Scheduler machine specific configuration options.
  scheduler:
    disabled: true # Disable kube-scheduler on the node.
Used to provide additional options to the kubelet.
kubelet:
  image: ghcr.io/siderolabs/kubelet:v1.32.0 # The `image` field is an optional reference to an alternative kubelet image.
  # The `extraArgs` field is used to provide additional flags to the kubelet.
  extraArgs:
    feature-gates: ServerSideApply=true

  # # The `ClusterDNS` field is an optional reference to an alternative kubelet clusterDNS ip list.
  # clusterDNS:
  #     - 10.96.0.10
  #     - 169.254.2.53

  # # The `extraMounts` field is used to add additional mounts to the kubelet container.
  # extraMounts:
  #     - destination: /var/lib/example # Destination is the absolute path where the mount will be placed in the container.
  #       type: bind # Type specifies the mount kind.
  #       source: /var/lib/example # Source specifies the source path of the mount.
  #       # Options are fstab style mount options.
  #       options:
  #         - bind
  #         - rshared
  #         - rw

  # # The `extraConfig` field is used to provide kubelet configuration overrides.
  # extraConfig:
  #     serverTLSBootstrap: true

  # # The `KubeletCredentialProviderConfig` field is used to provide kubelet credential configuration.
  # credentialProviderConfig:
  #     apiVersion: kubelet.config.k8s.io/v1
  #     kind: CredentialProviderConfig
  #     providers:
  #         - apiVersion: credentialprovider.kubelet.k8s.io/v1
  #           defaultCacheDuration: 12h
  #           matchImages:
  #             - '*.dkr.ecr.*.amazonaws.com'
  #             - '*.dkr.ecr.*.amazonaws.com.cn'
  #             - '*.dkr.ecr-fips.*.amazonaws.com'
  #             - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
  #             - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
  #           name: ecr-credential-provider

  # # The `nodeIP` field is used to configure `--node-ip` flag for the kubelet.
  # nodeIP:
  #     # The `validSubnets` field configures the networks to pick kubelet node IP from.
  #     validSubnets:
  #         - 10.0.0.0/8
  #         - '!10.0.0.3/32'
  #         - fdc7::/16
pods
[]Unstructured
Used to provide static pod definitions to be run by the kubelet directly, bypassing the kube-apiserver. Static pods can be used to run components that should be started before the Kubernetes control plane is up. Talos doesn’t validate the pod definition. Updates to this field can be applied without a reboot.
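A minimal sketch of a static pod entry, assuming a plain nginx container; the pod name and image are illustrative only. Because Talos does not validate the definition, it is worth checking the manifest against the Kubernetes Pod schema yourself before applying it.

```yaml
machine:
  pods:
    # Hypothetical static pod; any valid Pod manifest can be listed here.
    - apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
      spec:
        containers:
          - name: nginx
            image: nginx
```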
Provides machine-specific network configuration options.
network:
  hostname: worker-1 # Used to statically set the hostname for the machine.
  # `interfaces` is used to define the network interface configuration.
  interfaces:
    - interface: enp0s1 # The interface name.
      # Assigns static IP addresses to the interface.
      addresses:
        - 192.168.2.0/24
      # A list of routes associated with the interface.
      routes:
        - network: 0.0.0.0/0 # The route's network (destination).
          gateway: 192.168.2.1 # The route's gateway (if empty, creates link scope route).
          metric: 1024 # The optional metric for the route.
      mtu: 1500 # The interface's MTU.
      # # Picks a network device using the selector.
      # # select a device with bus prefix 00:*.
      # deviceSelector:
      #   busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
      # # select a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
      # deviceSelector:
      #   hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
      #   driver: virtio_net # Kernel driver, supports matching by wildcard.
      # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
      # deviceSelector:
      #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
      #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
      #     driver: virtio_net # Kernel driver, supports matching by wildcard.
      # # Bond specific options.
      # bond:
      #   # The interfaces that make up the bond.
      #   interfaces:
      #     - enp2s0
      #     - enp2s1
      #   # Picks a network device using the selector.
      #   deviceSelectors:
      #     - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
      #     - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
      #       driver: virtio_net # Kernel driver, supports matching by wildcard.
      #   mode: 802.3ad # A bond option.
      #   lacpRate: fast # A bond option.
      # # Bridge specific options.
      # bridge:
      #   # The interfaces that make up the bridge.
      #   interfaces:
      #     - enxda4042ca9a51
      #     - enxae2a6774c259
      #   # Enable STP on this bridge.
      #   stp:
      #     enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
      # # Configure this device as a bridge port.
      # bridgePort:
      #   master: br0 # The name of the bridge master interface
      # # Indicates if DHCP should be used to configure the interface.
      # dhcp: true
      # # DHCP specific options.
      # dhcpOptions:
      #   routeMetric: 1024 # The priority of all routes received via DHCP.
      # # Wireguard specific configuration.
      # # wireguard server example
      # wireguard:
      #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #   listenPort: 51111 # Specifies a device's listening port.
      #   # Specifies a list of peer configurations to apply to a device.
      #   peers:
      #     - publicKey: ABCDEF... # Specifies the public key of this peer.
      #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #       allowedIPs:
      #         - 192.168.1.0/24
      # # wireguard peer example
      # wireguard:
      #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #   # Specifies a list of peer configurations to apply to a device.
      #   peers:
      #     - publicKey: ABCDEF... # Specifies the public key of this peer.
      #       endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
      #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #       allowedIPs:
      #         - 192.168.1.0/24
      # # Virtual (shared) IP address configuration.
      # # layer2 vip example
      # vip:
      #   ip: 172.16.199.55 # Specifies the IP address to be used.
  # Used to statically set the nameservers for the machine.
  nameservers:
    - 9.8.7.6
    - 8.7.6.5
  # Used to statically set arbitrary search domains.
  searchDomains:
    - example.org
    - example.com
  # # Allows for extra entries to be added to the `/etc/hosts` file
  # extraHostEntries:
  #   - ip: 192.168.1.100 # The IP of the host.
  #     # The host alias.
  #     aliases:
  #       - example
  #       - example.domain.tld
  # # Configures KubeSpan feature.
  # kubespan:
  #   enabled: true # Enable the KubeSpan feature.
Used to partition, format, and mount additional disks. Since the root filesystem is read-only with the exception of /var, mounts are only valid if they are under /var. Note that partitioning and formatting are done only once, and only if no existing XFS partitions are found. If size: is omitted, the partition is sized to occupy the full disk.
disks:
  - device: /dev/sdb # The name of the disk to use.
    # A list of partitions to create on the disk.
    partitions:
      - mountpoint: /var/mnt/extra # Where to mount the partition.
        # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.
        # # Human readable representation.
        # size: 100 MB
        # # Precise value in bytes.
        # size: 1073741824
Used to provide instructions for installations. Note that this configuration section is silently ignored by Talos images that are considered pre-installed. To make sure Talos installs according to the provided configuration, boot Talos from an ISO or via PXE.
install:
  disk: /dev/sda # The disk used for installations.
  # Allows for supplying extra kernel args via the bootloader.
  extraKernelArgs:
    - console=ttyS1
    - panic=10
  image: ghcr.io/siderolabs/installer:latest # Allows for supplying the image used to perform the installation.
  wipe: false # Indicates if the installation disk should be wiped at installation time.
  # # Look up disk using disk attributes like model, size, serial and others.
  # diskSelector:
  #   size: 4GB # Disk size.
  #   model: WDC* # Disk model `/sys/block/<dev>/device/model`.
  #   busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0 # Disk bus path.
  # # Allows for supplying additional system extension images to install on top of base Talos image.
  # extensions:
  #   - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
Allows the addition of user-specified files. The value of op can be create, overwrite, or append. In the case of create, path must not exist. In the case of overwrite and append, path must be a valid file. If an op value of append is used, the content is appended to the existing file. Note that the file contents are not required to be base64 encoded.
files:
  - content: '...' # The contents of the file.
    permissions: 0o666 # The file's permissions in octal.
    path: /tmp/file.txt # The path of the file.
    op: append # The operation to use
env
Env
The env field allows for the addition of environment variables. All environment variables are set on PID 1 in addition to every service.
env:
  GRPC_GO_LOG_SEVERITY_LEVEL: info
  GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
  https_proxy: http://SERVER:PORT/
Used to configure the machine’s time settings.
time:
  disabled: false # Indicates if the time service is disabled for the machine.
  # description: |
  servers:
    - time.cloudflare.com
  bootTimeout: 2m0s # Specifies the timeout when the node time is considered to be in sync unlocking the boot sequence.
sysctls
map[string]string
Used to configure the machine’s sysctls.
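A sketch of a sysctls map, assuming illustrative keys and values. The field is a map of strings, so numeric settings are quoted here to keep them as strings rather than YAML integers.

```yaml
machine:
  sysctls:
    # Illustrative kernel parameters using the usual dotted sysctl names.
    kernel.domainname: talos.dev
    net.ipv4.ip_forward: "0"
```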
Used to configure the machine’s container image registry mirrors. Automatically generates matching CRI configuration for registry mirrors.
The mirrors section allows redirecting requests for images to a non-default registry, which might be a local registry or a caching mirror.
The config section provides a way to authenticate to the registry with a TLS client identity, provide a registry CA, or supply authentication information. Authentication information has the same meaning as the corresponding field in .docker/config.json.
registries:
  # Specifies mirror configuration for each registry host namespace.
  mirrors:
    docker.io:
      # List of endpoints (URLs) for registry mirrors to use.
      endpoints:
        - https://registry.local
  # Specifies TLS & auth configuration for HTTPS image registries.
  config:
    registry.local:
      # The TLS configuration for the registry.
      tls:
        # Enable mutual TLS authentication with the registry.
        clientIdentity:
          crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
          key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==
      # The auth configuration for this registry.
      auth:
        username: username # Optional registry authentication.
        password: password # Optional registry authentication.
Machine system disk encryption configuration. Defines the encryption parameters for each system partition.
systemDiskEncryption:
  # Ephemeral partition encryption.
  ephemeral:
    provider: luks2 # Encryption provider to use for the encryption.
    # Defines the encryption keys generation and storage method.
    keys:
      - # Deterministically generated key from the node UUID and PartitionLabel.
        nodeID: {}
        slot: 0 # Key slot number for LUKS2 encryption.
        # # KMS managed encryption key.
        # kms:
        #   endpoint: https://192.168.88.21:4443 # KMS endpoint to Seal/Unseal the key.
    # # Cipher kind to use for the encryption. Depends on the encryption provider.
    # cipher: aes-xts-plain64
    # # Defines the encryption sector size.
    # blockSize: 4096
    # # Additional --perf parameters for the LUKS2 encryption.
    # options:
    #   - no_read_workqueue
    #   - no_write_workqueue
Features describe individual Talos features that can be switched on or off.
features:
  rbac: true # Enable role-based access control (RBAC).
  # # Configure Talos API access from Kubernetes pods.
  # kubernetesTalosAPIAccess:
  #   enabled: true # Enable Talos API access from Kubernetes pods.
  #   # The list of Talos API roles which can be granted for access from Kubernetes pods.
  #   allowedRoles:
  #     - os:reader
  #   # The list of Kubernetes namespaces Talos API access is available from.
  #   allowedKubernetesNamespaces:
  #     - kube-system
Configures the seccomp profiles for the machine.
seccompProfiles:
  - name: audit.json # The `name` field is used to provide the file name of the seccomp profile.
    # The `value` field is used to provide the seccomp profile.
    value:
      defaultAction: SCMP_ACT_LOG
baseRuntimeSpecOverrides
Unstructured
Override (patch) settings in the default OCI runtime spec for CRI containers. It can be used to set default container settings that are not configurable in Kubernetes, for example default ulimits. Note: this change applies to all newly created containers, and it requires a reboot to take effect.
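A sketch of the default-ulimits use case mentioned above, assuming you want every new container to start with an open-file limit of 1024; the limit values are illustrative. It patches `process.rlimits`, the OCI runtime spec field that holds per-process resource limits.

```yaml
machine:
  baseRuntimeSpecOverrides:
    process:
      rlimits:
        # Illustrative default open-file limit for newly created containers.
        - type: RLIMIT_NOFILE
          hard: 1024
          soft: 1024
```

As the description notes, the override only affects containers created after the change, and the node must be rebooted for it to apply.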
Configures the node labels for the machine. Note: in the default Kubernetes configuration, worker nodes are only allowed to set labels with certain prefixes (see the NodeRestriction admission plugin).
nodeLabels:
  exampleLabel: exampleLabelValue
nodeAnnotations
map[string]string
Configures the node annotations for the machine.
nodeAnnotations:
  customer.io/rack: r13a25
nodeTaints
map[string]string
Configures the node taints for the machine. The effect is optional. Note: in the default Kubernetes configuration, worker nodes are not allowed to modify taints (see the NodeRestriction admission plugin).
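A sketch of the taint map, assuming an illustrative key and value: each entry maps a taint key to `value:effect`, and because the effect is optional, a plain `value` should also be accepted. Given the NodeRestriction note above, this is most likely to take effect on control plane nodes.

```yaml
machine:
  nodeTaints:
    # Hypothetical taint in key: value:effect form (the effect part is optional).
    exampleTaint: exampleTaintValue:NoSchedule
```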
MachineControlPlaneConfig represents the machine-specific control plane configuration options.
machine:
  controlPlane:
    # Controller manager machine specific configuration options.
    controllerManager:
      disabled: false # Disable kube-controller-manager on the node.
    # Scheduler machine specific configuration options.
    scheduler:
      disabled: true # Disable kube-scheduler on the node.
MachineControllerManagerConfig represents the machine specific ControllerManager config values.
Field
Type
Description
Value(s)
disabled
bool
Disable kube-controller-manager on the node.
scheduler
MachineSchedulerConfig represents the machine specific Scheduler config values.
Field
Type
Description
Value(s)
disabled
bool
Disable kube-scheduler on the node.
kubelet
KubeletConfig represents the kubelet config values.
machine:
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.32.0 # The `image` field is an optional reference to an alternative kubelet image.
    # The `extraArgs` field is used to provide additional flags to the kubelet.
    extraArgs:
      feature-gates: ServerSideApply=true
    # # The `ClusterDNS` field is an optional reference to an alternative kubelet clusterDNS ip list.
    # clusterDNS:
    #   - 10.96.0.10
    #   - 169.254.2.53
    # # The `extraMounts` field is used to add additional mounts to the kubelet container.
    # extraMounts:
    #   - destination: /var/lib/example # Destination is the absolute path where the mount will be placed in the container.
    #     type: bind # Type specifies the mount kind.
    #     source: /var/lib/example # Source specifies the source path of the mount.
    #     # Options are fstab style mount options.
    #     options:
    #       - bind
    #       - rshared
    #       - rw
    # # The `extraConfig` field is used to provide kubelet configuration overrides.
    # extraConfig:
    #   serverTLSBootstrap: true
    # # The `KubeletCredentialProviderConfig` field is used to provide kubelet credential configuration.
    # credentialProviderConfig:
    #   apiVersion: kubelet.config.k8s.io/v1
    #   kind: CredentialProviderConfig
    #   providers:
    #     - apiVersion: credentialprovider.kubelet.k8s.io/v1
    #       defaultCacheDuration: 12h
    #       matchImages:
    #         - '*.dkr.ecr.*.amazonaws.com'
    #         - '*.dkr.ecr.*.amazonaws.com.cn'
    #         - '*.dkr.ecr-fips.*.amazonaws.com'
    #         - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
    #         - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
    #       name: ecr-credential-provider
    # # The `nodeIP` field is used to configure `--node-ip` flag for the kubelet.
    # nodeIP:
    #   # The `validSubnets` field configures the networks to pick kubelet node IP from.
    #   validSubnets:
    #     - 10.0.0.0/8
    #     - '!10.0.0.3/32'
    #     - fdc7::/16
Field
Type
Description
Value(s)
image
string
The image field is an optional reference to an alternative kubelet image.
image: ghcr.io/siderolabs/kubelet:v1.32.0
clusterDNS
[]string
The ClusterDNS field is an optional reference to an alternative kubelet clusterDNS IP list.
clusterDNS:
  - 10.96.0.10
  - 169.254.2.53
extraArgs
map[string]string
The extraArgs field is used to provide additional flags to the kubelet.
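A sketch of extraArgs, assuming each map key is a kubelet flag name written without the leading `--` and each value is that flag's value; the feature gate is the one used in the kubelet example earlier in this section.

```yaml
machine:
  kubelet:
    extraArgs:
      # Roughly equivalent to passing --feature-gates=ServerSideApply=true to the kubelet.
      feature-gates: ServerSideApply=true
```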
The extraMounts field is used to add additional mounts to the kubelet container. Note that either bind or rbind is required in the options.
extraMounts:
  - destination: /var/lib/example # Destination is the absolute path where the mount will be placed in the container.
    type: bind # Type specifies the mount kind.
    source: /var/lib/example # Source specifies the source path of the mount.
    # Options are fstab style mount options.
    options:
      - bind
      - rshared
      - rw
extraConfig
Unstructured
The extraConfig field is used to provide kubelet configuration overrides. Some fields cannot be overridden: authentication and authorization, cgroups configuration, ports, etc.
extraConfig:
  serverTLSBootstrap: true
credentialProviderConfig
Unstructured
The KubeletCredentialProviderConfig field is used to provide kubelet credential configuration.
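A trimmed sketch based on the ECR provider that appears (commented out) in the kubelet example, assuming a credential provider binary named `ecr-credential-provider` is available to the kubelet; Talos may not ship that binary by default, so treat its presence as an assumption.

```yaml
machine:
  kubelet:
    credentialProviderConfig:
      apiVersion: kubelet.config.k8s.io/v1
      kind: CredentialProviderConfig
      providers:
        - name: ecr-credential-provider # assumed to match the provider binary name
          apiVersion: credentialprovider.kubelet.k8s.io/v1
          defaultCacheDuration: 12h
          matchImages:
            # Quoted because a plain YAML scalar cannot start with '*'.
            - '*.dkr.ecr.*.amazonaws.com'
```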
The nodeIP field is used to configure the --node-ip flag for the kubelet. This is used when a node has multiple addresses to choose from.
nodeIP:
  # The `validSubnets` field configures the networks to pick kubelet node IP from.
  validSubnets:
    - 10.0.0.0/8
    - '!10.0.0.3/32'
    - fdc7::/16
skipNodeRegistration
bool
The skipNodeRegistration field is used to run the kubelet without registering with the API server. This runs the kubelet in standalone mode, where it only runs static pods.
true yes false no
disableManifestsDirectory
bool
The disableManifestsDirectory field controls whether the kubelet gets static pod manifests from the /etc/kubernetes/manifests directory; setting it to true disables that directory. It’s recommended to configure static pods with the “pods” key instead.
true yes false no
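A sketch combining both flags for a node that should only run static pods, assuming `disableManifestsDirectory: true` stops the kubelet from reading /etc/kubernetes/manifests (as the field name suggests) and that the static pods are supplied through the `pods` key as recommended above. The pod is illustrative.

```yaml
machine:
  kubelet:
    skipNodeRegistration: true      # run the kubelet standalone; only static pods run
    disableManifestsDirectory: true # assumed: ignore /etc/kubernetes/manifests
  pods:
    # Hypothetical static pod supplied via the recommended `pods` key.
    - apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
      spec:
        containers:
          - name: nginx
            image: nginx
```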
extraMounts[]
ExtraMount wraps OCI Mount specification.
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/example # Destination is the absolute path where the mount will be placed in the container.
        type: bind # Type specifies the mount kind.
        source: /var/lib/example # Source specifies the source path of the mount.
        # Options are fstab style mount options.
        options:
          - bind
          - rshared
          - rw
Field
Type
Description
Value(s)
destination
string
Destination is the absolute path where the mount will be placed in the container.
UID/GID mappings used to change file ownership without calling chown; the underlying filesystem must support this. Each mount point can have its own mapping.
uidMappings[]
LinuxIDMapping represents the Linux ID mapping.
Field
Type
Description
Value(s)
containerID
uint32
ContainerID is the starting UID/GID in the container.
hostID
uint32
HostID is the starting UID/GID on the host to be mapped to ‘ContainerID’.
size
uint32
Size is the number of IDs to be mapped.
gidMappings[]
LinuxIDMapping represents the Linux ID mapping.
Field
Type
Description
Value(s)
containerID
uint32
ContainerID is the starting UID/GID in the container.
hostID
uint32
HostID is the starting UID/GID on the host to be mapped to ‘ContainerID’.
size
uint32
Size is the number of IDs to be mapped.
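A sketch of an extra mount with ID mappings, assuming you want container ID 0 to map to host ID 1000 for a single ID; the IDs and paths are illustrative, and the filesystem behind the source path must support mapped mounts.

```yaml
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/example
        type: bind
        source: /var/lib/example
        options:
          - bind
          - rshared
          - rw
        # Hypothetical mapping: container UID 0 -> host UID 1000, range of one ID.
        uidMappings:
          - containerID: 0
            hostID: 1000
            size: 1
        # Same mapping for group IDs.
        gidMappings:
          - containerID: 0
            hostID: 1000
            size: 1
```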
nodeIP
KubeletNodeIPConfig represents the kubelet node IP configuration.
machine:
  kubelet:
    nodeIP:
      # The `validSubnets` field configures the networks to pick kubelet node IP from.
      validSubnets:
        - 10.0.0.0/8
        - '!10.0.0.3/32'
        - fdc7::/16
Field
Type
Description
Value(s)
validSubnets
[]string
The validSubnets field configures the networks to pick the kubelet node IP from. For a dual-stack configuration, there should be two subnets: one for IPv4, another for IPv6. IPs can be excluded from the list by using a negative match with !, e.g. !10.0.0.0/8. Negative subnet matches should be specified last to filter out IPs picked by positive matches. If not specified, the node IP is picked based on the cluster podCIDRs: an IPv4 address, an IPv6 address, or both.
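A sketch of dual-stack selection with one exclusion, assuming the node has addresses in 10.0.0.0/8 and in a hypothetical fd00::/8 range: one positive subnet per address family comes first, and the negative match comes last so it filters what the positive entries picked.

```yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 10.0.0.0/8     # IPv4 candidates
        - fd00::/8       # IPv6 candidates (hypothetical range)
        - '!10.0.0.3/32' # exclusion, listed last
```

The negative entry is quoted because YAML treats an unquoted leading `!` as a tag.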
network
NetworkConfig represents the machine’s networking config values.
machine:
  network:
    hostname: worker-1 # Used to statically set the hostname for the machine.
    # `interfaces` is used to define the network interface configuration.
    interfaces:
      - interface: enp0s1 # The interface name.
        # Assigns static IP addresses to the interface.
        addresses:
          - 192.168.2.0/24
        # A list of routes associated with the interface.
        routes:
          - network: 0.0.0.0/0 # The route's network (destination).
            gateway: 192.168.2.1 # The route's gateway (if empty, creates link scope route).
            metric: 1024 # The optional metric for the route.
        mtu: 1500 # The interface's MTU.
        # # Picks a network device using the selector.
        # # select a device with bus prefix 00:*.
        # deviceSelector:
        #   busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        # # select a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
        # deviceSelector:
        #   hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #   driver: virtio_net # Kernel driver, supports matching by wildcard.
        # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
        # deviceSelector:
        #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #     driver: virtio_net # Kernel driver, supports matching by wildcard.
        # # Bond specific options.
        # bond:
        #   # The interfaces that make up the bond.
        #   interfaces:
        #     - enp2s0
        #     - enp2s1
        #   # Picks a network device using the selector.
        #   deviceSelectors:
        #     - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        #     - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #       driver: virtio_net # Kernel driver, supports matching by wildcard.
        #   mode: 802.3ad # A bond option.
        #   lacpRate: fast # A bond option.
        # # Bridge specific options.
        # bridge:
        #   # The interfaces that make up the bridge.
        #   interfaces:
        #     - enxda4042ca9a51
        #     - enxae2a6774c259
        #   # Enable STP on this bridge.
        #   stp:
        #     enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
        # # Configure this device as a bridge port.
        # bridgePort:
        #   master: br0 # The name of the bridge master interface
        # # Indicates if DHCP should be used to configure the interface.
        # dhcp: true
        # # DHCP specific options.
        # dhcpOptions:
        #   routeMetric: 1024 # The priority of all routes received via DHCP.
        # # Wireguard specific configuration.
        # # wireguard server example
        # wireguard:
        #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
        #   listenPort: 51111 # Specifies a device's listening port.
        #   # Specifies a list of peer configurations to apply to a device.
        #   peers:
        #     - publicKey: ABCDEF... # Specifies the public key of this peer.
        #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
        #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
        #       allowedIPs:
        #         - 192.168.1.0/24
        # # wireguard peer example
        # wireguard:
        #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
        #   # Specifies a list of peer configurations to apply to a device.
        #   peers:
        #     - publicKey: ABCDEF... # Specifies the public key of this peer.
        #       endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
        #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
        #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
        #       allowedIPs:
        #         - 192.168.1.0/24
        # # Virtual (shared) IP address configuration.
        # # layer2 vip example
        # vip:
        #   ip: 172.16.199.55 # Specifies the IP address to be used.
    # Used to statically set the nameservers for the machine.
    nameservers:
      - 9.8.7.6
      - 8.7.6.5
    # Used to statically set arbitrary search domains.
    searchDomains:
      - example.org
      - example.com
    # # Allows for extra entries to be added to the `/etc/hosts` file
    # extraHostEntries:
    #   - ip: 192.168.1.100 # The IP of the host.
    #     # The host alias.
    #     aliases:
    #       - example
    #       - example.domain.tld
    # # Configures KubeSpan feature.
    # kubespan:
    #   enabled: true # Enable the KubeSpan feature.
Field
Type
Description
Value(s)
hostname
string
Used to statically set the hostname for the machine.
interfaces is used to define the network interface configuration. By default, all network interfaces attempt DHCP discovery. This can be further tuned through this configuration parameter.
interfaces:
  - interface: enp0s1 # The interface name.
    # Assigns static IP addresses to the interface.
    addresses:
      - 192.168.2.0/24
    # A list of routes associated with the interface.
    routes:
      - network: 0.0.0.0/0 # The route's network (destination).
        gateway: 192.168.2.1 # The route's gateway (if empty, creates link scope route).
        metric: 1024 # The optional metric for the route.
    mtu: 1500 # The interface's MTU.
    # # Picks a network device using the selector.
    # # select a device with bus prefix 00:*.
    # deviceSelector:
    #   busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
    # # select a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
    # deviceSelector:
    #   hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
    #   driver: virtio_net # Kernel driver, supports matching by wildcard.
    # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
    # deviceSelector:
    #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
    #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
    #     driver: virtio_net # Kernel driver, supports matching by wildcard.
    # # Bond specific options.
    # bond:
    #   # The interfaces that make up the bond.
    #   interfaces:
    #     - enp2s0
    #     - enp2s1
    #   # Picks a network device using the selector.
    #   deviceSelectors:
    #     - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
    #     - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
    #       driver: virtio_net # Kernel driver, supports matching by wildcard.
    #   mode: 802.3ad # A bond option.
    #   lacpRate: fast # A bond option.
    # # Bridge specific options.
    # bridge:
    #   # The interfaces that make up the bridge.
    #   interfaces:
    #     - enxda4042ca9a51
    #     - enxae2a6774c259
    #   # Enable STP on this bridge.
    #   stp:
    #     enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
    # # Configure this device as a bridge port.
    # bridgePort:
    #   master: br0 # The name of the bridge master interface
    # # Indicates if DHCP should be used to configure the interface.
    # dhcp: true
    # # DHCP specific options.
    # dhcpOptions:
    #   routeMetric: 1024 # The priority of all routes received via DHCP.
    # # Wireguard specific configuration.
    # # wireguard server example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   listenPort: 51111 # Specifies a device's listening port.
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # wireguard peer example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
    #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # Virtual (shared) IP address configuration.
    # # layer2 vip example
    # vip:
    #   ip: 172.16.199.55 # Specifies the IP address to be used.
nameservers
[]string
Used to statically set the nameservers for the machine. Defaults to 1.1.1.1 and 8.8.8.8.
nameservers:
  - 8.8.8.8
  - 1.1.1.1
searchDomains
[]string
Used to statically set arbitrary search domains.
kubespan
Configures the KubeSpan feature.
kubespan:
  enabled: true # Enable the KubeSpan feature.
disableSearchDomain
bool
Disable generating a default search domain in /etc/resolv.conf based on the machine hostname. Defaults to false.
true yes false no
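A sketch of resolver settings, assuming you want only explicit search domains in /etc/resolv.conf: searchDomains lists them, and disableSearchDomain suppresses the default domain derived from the hostname. The domains are the illustrative ones from the network example.

```yaml
machine:
  network:
    searchDomains:
      - example.org
      - example.com
    disableSearchDomain: true # skip the hostname-derived default search domain
```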
interfaces[]
Device represents a network interface.
machine:
  network:
    interfaces:
      - interface: enp0s1 # The interface name.
        # Assigns static IP addresses to the interface.
        addresses:
          - 192.168.2.0/24
        # A list of routes associated with the interface.
        routes:
          - network: 0.0.0.0/0 # The route's network (destination).
            gateway: 192.168.2.1 # The route's gateway (if empty, creates link scope route).
            metric: 1024 # The optional metric for the route.
        mtu: 1500 # The interface's MTU.
        # # Picks a network device using the selector.
        # # select a device with bus prefix 00:*.
        # deviceSelector:
        #   busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        # # select a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
        # deviceSelector:
        #   hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #   driver: virtio_net # Kernel driver, supports matching by wildcard.
        # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
        # deviceSelector:
        #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #     driver: virtio_net # Kernel driver, supports matching by wildcard.
        # # Bond specific options.
        # bond:
        #   # The interfaces that make up the bond.
        #   interfaces:
        #     - enp2s0
        #     - enp2s1
        #   # Picks a network device using the selector.
        #   deviceSelectors:
        #     - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
        #     - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
        #       driver: virtio_net # Kernel driver, supports matching by wildcard.
        #   mode: 802.3ad # A bond option.
        #   lacpRate: fast # A bond option.
        # # Bridge specific options.
        # bridge:
        #   # The interfaces that make up the bridge.
        #   interfaces:
        #     - enxda4042ca9a51
        #     - enxae2a6774c259
        #   # Enable STP on this bridge.
        #   stp:
        #     enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
        # # Configure this device as a bridge port.
        # bridgePort:
        #   master: br0 # The name of the bridge master interface
        # # Indicates if DHCP should be used to configure the interface.
        # dhcp: true
        # # DHCP specific options.
        # dhcpOptions:
        #   routeMetric: 1024 # The priority of all routes received via DHCP.
        # # Wireguard specific configuration.
        # # wireguard server example
        # wireguard:
        #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
        #   listenPort: 51111 # Specifies a device's listening port.
        #   # Specifies a list of peer configurations to apply to a device.
        #   peers:
        #     - publicKey: ABCDEF... # Specifies the public key of this peer.
        #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
        #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
        #       allowedIPs:
        #         - 192.168.1.0/24
        # # wireguard peer example
        # wireguard:
        #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
        #   # Specifies a list of peer configurations to apply to a device.
        #   peers:
        #     - publicKey: ABCDEF... # Specifies the public key of this peer.
        #       endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
        #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
        #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
        #       allowedIPs:
        #         - 192.168.1.0/24
        # # Virtual (shared) IP address configuration.
        # # layer2 vip example
        # vip:
        #   ip: 172.16.199.55 # Specifies the IP address to be used.
Field
Type
Description
Value(s)
interface
string
The interface name. Mutually exclusive with deviceSelector.
Picks a network device using the selector. Mutually exclusive with interface. Supports partial matching using wildcard syntax.
deviceSelector:
  busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
deviceSelector:
  hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
  driver: virtio_net # Kernel driver, supports matching by wildcard.
addresses
[]string
Assigns static IP addresses to the interface. An address can be specified either in proper CIDR notation or as a standalone address (a netmask of all ones is assumed).
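A sketch of both accepted address forms, assuming illustrative addresses: the first entry uses CIDR notation, while the second is a standalone address, which by the all-ones netmask rule above behaves like 10.5.0.5/32 (a standalone IPv6 address would likewise be a /128).

```yaml
machine:
  network:
    interfaces:
      - interface: enp0s1
        addresses:
          - 192.168.2.10/24 # CIDR notation (hypothetical address)
          - 10.5.0.5        # standalone address, treated as a /32
```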
A list of routes associated with the interface. If used in combination with DHCP, these routes are appended to the routes returned by the DHCP server.
routes:
  - network: 0.0.0.0/0 # The route's network (destination).
    gateway: 10.5.0.1 # The route's gateway (if empty, creates link scope route).
  - network: 10.2.0.0/16 # The route's network (destination).
    gateway: 10.2.0.1 # The route's gateway (if empty, creates link scope route).
Bond specific options.
bond:
  # The interfaces that make up the bond.
  interfaces:
    - enp2s0
    - enp2s1
  mode: 802.3ad # A bond option.
  lacpRate: fast # A bond option.
  # # Picks a network device using the selector.
  # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
  # deviceSelectors:
  #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
  #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
  #     driver: virtio_net # Kernel driver, supports matching by wildcard.
Bridge specific options.
bridge:
  # The interfaces that make up the bridge.
  interfaces:
    - enxda4042ca9a51
    - enxae2a6774c259
  # Enable STP on this bridge.
  stp:
    enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
Wireguard specific configuration. Includes things like the private key, listen port, and peers.
wireguard:
  privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  listenPort: 51111 # Specifies a device's listening port.
  # Specifies a list of peer configurations to apply to a device.
  peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24
wireguard:
  privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  # Specifies a list of peer configurations to apply to a device.
  peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
      persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24
machine:
  network:
    interfaces:
      - deviceSelector:
          - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
          - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
            driver: virtio_net # Kernel driver, supports matching by wildcard.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| busPath | string | PCI, USB bus prefix, supports matching by wildcard. | |
| hardwareAddr | string | Device hardware (MAC) address, supports matching by wildcard. | |
| permanentAddr | string | Device permanent hardware address, supports matching by wildcard. The permanent address doesn’t change when the link is enslaved to a bond, so it’s recommended to use this field for bond members. | |
| pciID | string | PCI ID (vendor ID, product ID), supports matching by wildcard. | |
| driver | string | Kernel driver, supports matching by wildcard. | |
| physical | bool | Select only physical devices. | |
routes[]
Route represents a network route.
machine:
  network:
    interfaces:
      - routes:
          - network: 0.0.0.0/0 # The route's network (destination).
            gateway: 10.5.0.1 # The route's gateway (if empty, creates link scope route).
          - network: 10.2.0.0/16 # The route's network (destination).
            gateway: 10.2.0.1 # The route's gateway (if empty, creates link scope route).
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| network | string | The route’s network (destination). | |
| gateway | string | The route’s gateway (if empty, creates link scope route). | |
| source | string | The route’s source address (optional). | |
| metric | uint32 | The optional metric for the route. | |
| mtu | uint32 | The optional MTU for the route. | |
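The routes example above only sets network and gateway. The fragment below is a sketch of how the optional fields combine, adding source, metric, and mtu to a single route. The interface name eth0 and every address and number are placeholders. The interface key is assumed from the wider interface configuration, not from this table.

```yaml
machine:
  network:
    interfaces:
      - interface: eth0 # placeholder interface name (assumption)
        routes:
          - network: 10.3.0.0/16 # destination network (placeholder)
            gateway: 10.5.0.1 # next hop (placeholder)
            source: 10.5.0.10 # preferred source address (placeholder)
            metric: 1024 # lower metric is preferred when several routes match
            mtu: 1400 # per-route MTU override (placeholder)
```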
bond
Bond contains the various options for configuring a bonded interface.
machine:
  network:
    interfaces:
      - bond:
          # The interfaces that make up the bond.
          interfaces:
            - enp2s0
            - enp2s1
          mode: 802.3ad # A bond option.
          lacpRate: fast # A bond option.

          # # Picks a network device using the selector.
          # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
          # deviceSelectors:
          #   - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
          #   - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
          #     driver: virtio_net # Kernel driver, supports matching by wildcard.
machine:
  network:
    interfaces:
      - bond:
          deviceSelectors:
            - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
            - hardwareAddr: '*:f0:ab' # Device hardware (MAC) address, supports matching by wildcard.
              driver: virtio_net # Kernel driver, supports matching by wildcard.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| busPath | string | PCI, USB bus prefix, supports matching by wildcard. | |
| hardwareAddr | string | Device hardware (MAC) address, supports matching by wildcard. | |
| permanentAddr | string | Device permanent hardware address, supports matching by wildcard. The permanent address doesn’t change when the link is enslaved to a bond, so it’s recommended to use this field for bond members. | |
| pciID | string | PCI ID (vendor ID, product ID), supports matching by wildcard. | |
| driver | string | Kernel driver, supports matching by wildcard. | |
| physical | bool | Select only physical devices. | |
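The permanent address survives enslavement, so a bond can select its members by permanentAddr, as the table recommends. A minimal sketch follows. It assumes a bond named bond0 and a hypothetical MAC prefix shared by both member NICs.

```yaml
machine:
  network:
    interfaces:
      - interface: bond0 # bond name (placeholder)
        bond:
          # Match member links by permanent MAC; the prefix below is hypothetical.
          deviceSelectors:
            - permanentAddr: '00:50:56:*'
          mode: 802.3ad
          lacpRate: fast
```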
bridge
Bridge contains the various options for configuring a bridge interface.
machine:
  network:
    interfaces:
      - bridge:
          # The interfaces that make up the bridge.
          interfaces:
            - enxda4042ca9a51
            - enxae2a6774c259
          # Enable STP on this bridge.
          stp:
            enabled: true # Whether Spanning Tree Protocol (STP) is enabled.
Specifies the Hetzner Cloud API settings used to assign the VIP to the node.
equinixMetal
VIPEquinixMetalConfig contains settings for Equinix Metal VIP management.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| apiToken | string | Specifies the Equinix Metal API Token. | |
hcloud
VIPHCloudConfig contains settings for Hetzner Cloud VIP management.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| apiToken | string | Specifies the Hetzner Cloud API Token. | |
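Here is a sketch of a Hetzner Cloud VIP on one interface. The vip.ip key, the interface name, the address, and the token placeholder are assumptions for illustration. On Equinix Metal, an equinixMetal block with its own apiToken would take the place of hcloud.

```yaml
machine:
  network:
    interfaces:
      - interface: eth0 # placeholder interface name
        vip:
          ip: 192.168.1.100 # shared virtual IP (placeholder)
          hcloud:
            apiToken: '<hetzner-cloud-api-token>' # placeholder; keep real tokens out of version control
```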
dhcpOptions
DHCPOptions contains options for configuring the DHCP settings for a given interface.
machine:
  network:
    interfaces:
      - vlans:
          - dhcpOptions:
              routeMetric: 1024 # The priority of all routes received via DHCP.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| routeMetric | uint32 | The priority of all routes received via DHCP. | |
| ipv4 | bool | Enables DHCPv4 protocol for the interface (default is enabled). | |
| ipv6 | bool | Enables DHCPv6 protocol for the interface (default is disabled). | |
| duidv6 | string | Set client DUID (hex string). | |
dhcpOptions
DHCPOptions contains options for configuring the DHCP settings for a given interface.
machine:
  network:
    interfaces:
      - dhcpOptions:
          routeMetric: 1024 # The priority of all routes received via DHCP.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| routeMetric | uint32 | The priority of all routes received via DHCP. | |
| ipv4 | bool | Enables DHCPv4 protocol for the interface (default is enabled). | |
| ipv6 | bool | Enables DHCPv6 protocol for the interface (default is disabled). | |
| duidv6 | string | Set client DUID (hex string). | |
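As a sketch, the fragment below turns on DHCP for one interface and also enables DHCPv6, which is off by default. The dhcp key and the interface name are assumptions taken from the wider interface configuration. duidv6 is left commented out because its value is site-specific.

```yaml
machine:
  network:
    interfaces:
      - interface: eth0 # placeholder interface name
        dhcp: true # request addresses via DHCP on this interface
        dhcpOptions:
          routeMetric: 1024 # priority of DHCP-learned routes
          ipv4: true # already the default
          ipv6: true # DHCPv6 is disabled unless set
          # duidv6: '<hex-encoded DUID>' # optional, placeholder
```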
wireguard
DeviceWireguardConfig contains settings for configuring Wireguard network interface.
machine:
  network:
    interfaces:
      - wireguard:
          privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
          listenPort: 51111 # Specifies a device's listening port.
          # Specifies a list of peer configurations to apply to a device.
          peers:
            - publicKey: ABCDEF... # Specifies the public key of this peer.
              endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
              # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
              allowedIPs:
                - 192.168.1.0/24
machine:
  network:
    interfaces:
      - wireguard:
          privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
          # Specifies a list of peer configurations to apply to a device.
          peers:
            - publicKey: ABCDEF... # Specifies the public key of this peer.
              endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
              persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
              # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
              allowedIPs:
                - 192.168.1.0/24
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| privateKey | string | Specifies a private key configuration (base64 encoded). Can be generated by wg genkey. | |
machine:
  network:
    kubespan:
      enabled: true # Enable the KubeSpan feature.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| enabled | bool | Enable the KubeSpan feature. Cluster discovery should be enabled with .cluster.discovery.enabled for KubeSpan to be enabled. | |
| advertiseKubernetesNetworks | bool | Control whether Kubernetes pod CIDRs are announced over KubeSpan from the node. If disabled, CNI handles encapsulating pod-to-pod traffic into some node-to-node tunnel, and KubeSpan handles the node-to-node traffic. If enabled, KubeSpan will take over pod-to-pod traffic and send it over KubeSpan directly. When enabled, KubeSpan needs a way to detect the complete pod CIDRs of the node, which is not always possible with CNIs that do not rely on Kubernetes for IPAM. | |
| allowDownPeerBypass | bool | Skip sending traffic via KubeSpan if the peer connection state is not up. This provides a configurable choice between connectivity and security: either traffic is always forced to go via KubeSpan (even if the Wireguard peer connection is not up), or traffic can go directly to the peer if the Wireguard connection can’t be established. | |
| harvestExtraEndpoints | bool | KubeSpan can collect and publish extra endpoints for each member of the cluster based on Wireguard endpoint information for each peer. This feature is disabled by default; don’t enable it with a high number of peers (>50) in the KubeSpan network, as it causes performance issues. | |
Filter node addresses which will be advertised as KubeSpan endpoints for peer-to-peer Wireguard connections. By default, all addresses are advertised, and KubeSpan cycles through all endpoints until it finds one that works.
Default value: no filtering.
endpoints:
  - 0.0.0.0/0
  - '!192.168.0.0/16'
  - ::/0
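Putting the KubeSpan options together, a conservative sketch keeps pod-to-pod traffic with the CNI, refuses to bypass KubeSpan for down peers, and restricts advertised endpoints. It assumes the endpoints list above sits under a filters key. It also enables cluster discovery, because KubeSpan depends on it.

```yaml
machine:
  network:
    kubespan:
      enabled: true
      advertiseKubernetesNetworks: false # CNI keeps encapsulating pod-to-pod traffic
      allowDownPeerBypass: false # favor security: never send around a down peer
      harvestExtraEndpoints: false # default; avoid with more than ~50 peers
      filters: # assumed parent key for the endpoint filter list
        endpoints:
          - 0.0.0.0/0
          - '!192.168.0.0/16' # do not advertise addresses in this range
          - ::/0
cluster:
  discovery:
    enabled: true # KubeSpan requires cluster discovery
```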
disks[]
MachineDisk represents the options available for partitioning, formatting, and mounting extra disks.
machine:
  disks:
    - device: /dev/sdb # The name of the disk to use.
      # A list of partitions to create on the disk.
      partitions:
        - mountpoint: /var/mnt/extra # Where to mount the partition.

          # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.

          # # Human readable representation.
          # size: 100 MB
          # # Precise value in bytes.
          # size: 1073741824
DiskPartition represents the options for a disk partition.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| size | DiskSize | The size of the partition: either bytes or a human-readable representation. If size: is omitted, the partition is sized to occupy the full disk. | |
| mountpoint | string | Where to mount the partition. | |

Examples for size:
  size: 100 MB
  size: 1073741824
install
InstallConfig represents the installation options for preparing a node.
machine:
  install:
    disk: /dev/sda # The disk used for installations.
    # Allows for supplying extra kernel args via the bootloader.
    extraKernelArgs:
      - console=ttyS1
      - panic=10
    image: ghcr.io/siderolabs/installer:latest # Allows for supplying the image used to perform the installation.
    wipe: false # Indicates if the installation disk should be wiped at installation time.

    # # Look up disk using disk attributes like model, size, serial and others.
    # diskSelector:
    #   size: 4GB # Disk size.
    #   model: WDC* # Disk model `/sys/block/<dev>/device/model`.
    #   busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0 # Disk bus path.

    # # Allows for supplying additional system extension images to install on top of base Talos image.
    # extensions:
    #   - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
Look up the disk using disk attributes like model, size, serial, and others. Always has priority over disk.
diskSelector:
  size: '>= 1TB' # Disk size.
  model: WDC* # Disk model `/sys/block/<dev>/device/model`.

  # # Disk bus path.
  # busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0
  # busPath: /pci0000:00/*
extraKernelArgs
[]string
Allows for supplying extra kernel args via the bootloader. Existing kernel args can be removed by prefixing the argument with a -. For example, -console removes all console=<value> arguments, whereas -console=tty0 removes the default console=tty0 argument.
Allows for supplying the image used to perform the installation. The image reference for each Talos release can be found on the GitHub releases page.
Allows for supplying additional system extension images to install on top of the base Talos image.
extensions:
  - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
wipe
bool
Indicates if the installation disk should be wiped at installation time. Defaults to true.
Value(s): true, yes, false, no
legacyBIOSSupport
bool
Indicates if the MBR partition should be marked as bootable (active). Should be enabled only for systems with a legacy BIOS that doesn’t support the GPT partitioning scheme.
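Here is a sketch of an install section that replaces the default console=tty0 argument with a serial console and sets the wipe and legacyBIOSSupport flags explicitly. The serial device and baud rate are placeholder assumptions; choose values that match your hardware.

```yaml
machine:
  install:
    disk: /dev/sda
    extraKernelArgs:
      - -console=tty0 # remove the default console=tty0 argument
      - console=ttyS0,115200 # add a serial console (placeholder device/baud)
    wipe: true # wipe the install disk at installation time
    legacyBIOSSupport: false # set true only on legacy BIOS without GPT support
```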
diskSelector
InstallDiskSelector represents a disk query parameters for the install disk lookup.
machine:
  install:
    diskSelector:
      size: '>= 1TB' # Disk size.
      model: WDC* # Disk model `/sys/block/<dev>/device/model`.

      # # Disk bus path.
      # busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0
      # busPath: /pci0000:00/*
InstallExtensionConfig represents a configuration for a system extension.
machine:
  install:
    extensions:
      - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| image | string | System extension image. | |
files[]
MachineFile represents a file to write to disk.
machine:
  files:
    - content: '...' # The contents of the file.
      permissions: 0o666 # The file's permissions in octal.
      path: /tmp/file.txt # The path of the file.
      op: append # The operation to use.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| content | string | The contents of the file. | |
| permissions | FileMode | The file’s permissions in octal. | |
| path | string | The path of the file. | |
| op | string | The operation to use. | create, append, overwrite |
time
TimeConfig represents the options for configuring time on a machine.
machine:
  time:
    disabled: false # Indicates if the time service is disabled for the machine.
    servers:
      - time.cloudflare.com
    bootTimeout: 2m0s # Specifies the timeout after which the node time is considered to be in sync, unlocking the boot sequence.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| disabled | bool | Indicates if the time service is disabled for the machine. Defaults to false. | |
| servers | []string | Specifies time (NTP) servers to use for setting the system time. Defaults to time.cloudflare.com. Talos can also sync to a PTP time source (e.g., one provided by the hypervisor); to do so, provide the path to the PTP device, such as “/dev/ptp0” or “/dev/ptp_kvm”. | |
| bootTimeout | Duration | Specifies the timeout after which the node time is considered to be in sync, unlocking the boot sequence. NTP sync will still be running in the background. Defaults to “infinity” (waiting forever for time sync). | |
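Here is a sketch of syncing time from a hypervisor-provided PTP device instead of an NTP server. It uses a two-minute boot timeout so that a slow sync does not block boot indefinitely. The device path must exist on the node; /dev/ptp0 is only an example.

```yaml
machine:
  time:
    servers:
      - /dev/ptp0 # PTP device, e.g. exposed by the hypervisor (example path)
    bootTimeout: 2m0s # unblock boot after 2 minutes; sync continues in the background
```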
registries
RegistriesConfig represents the image pull options.
machine:
  registries:
    # Specifies mirror configuration for each registry host namespace.
    mirrors:
      docker.io:
        # List of endpoints (URLs) for registry mirrors to use.
        endpoints:
          - https://registry.local
    # Specifies TLS & auth configuration for HTTPS image registries.
    config:
      registry.local:
        # The TLS configuration for the registry.
        tls:
          # Enable mutual TLS authentication with the registry.
          clientIdentity:
            crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
            key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==
        # The auth configuration for this registry.
        auth:
          username: username # Optional registry authentication.
          password: password # Optional registry authentication.
Specifies the mirror configuration for each registry host namespace. This setting allows configuring local pull-through caching registries, air-gapped installations, etc.
For example, when pulling an image with the reference example.com:123/image:v1, the example.com:123 key will be used to look up the mirror configuration.
Optionally, the * key can be used to configure a fallback mirror.
The registry name is the first segment of the image identifier, with ‘docker.io’ being the default.
mirrors:
  ghcr.io:
    # List of endpoints (URLs) for registry mirrors to use.
    endpoints:
      - https://registry.insecure
      - https://ghcr.io/v2/
Specifies TLS & auth configuration for HTTPS image registries. Mutual TLS can be enabled with the ‘clientIdentity’ option.
The full hostname and port (if not using the default port 443) should be used as the key. The fallback key * can’t be used for TLS configuration.
TLS configuration can be skipped if the registry has a trusted server certificate.
config:
  registry.insecure:
    # The TLS configuration for the registry.
    tls:
      insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).

      # # Enable mutual TLS authentication with the registry.
      # clientIdentity:
      #   crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
      #   key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==

    # # The auth configuration for this registry.
    # auth:
    #   username: username # Optional registry authentication.
    #   password: password # Optional registry authentication.
mirrors.*
RegistryMirrorConfig represents mirror configuration for a registry.
machine:
  registries:
    mirrors:
      ghcr.io:
        # List of endpoints (URLs) for registry mirrors to use.
        endpoints:
          - https://registry.insecure
          - https://ghcr.io/v2/
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| endpoints | []string | List of endpoints (URLs) for registry mirrors to use. Endpoint configures HTTP/HTTPS access mode, host name, port and path (if path is not set, it defaults to /v2). | |
| overridePath | bool | Use the exact path specified for the endpoint (don’t append /v2/). This setting is often required for setting up multiple mirrors on a single instance of a registry. | |
| skipFallback | bool | Skip fallback to the upstream endpoint; for example, the mirror configuration for docker.io will not fall back to registry-1.docker.io. | |
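overridePath lets several upstream registries point at one registry instance that exposes each proxy under its own path. A minimal sketch follows. It assumes a hypothetical registry at harbor.local with one proxy project per upstream. The exact path layout depends on the registry software you run.

```yaml
machine:
  registries:
    mirrors:
      docker.io:
        endpoints:
          - https://harbor.local/v2/proxy-docker.io # hypothetical per-upstream path
        overridePath: true # use the path verbatim; do not append /v2/
        skipFallback: true # never fall back to registry-1.docker.io
      ghcr.io:
        endpoints:
          - https://harbor.local/v2/proxy-ghcr.io # hypothetical per-upstream path
        overridePath: true
```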
config.*
RegistryConfig specifies auth & TLS config per registry.
machine:
  registries:
    config:
      registry.insecure:
        # The TLS configuration for the registry.
        tls:
          insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).

          # # Enable mutual TLS authentication with the registry.
          # clientIdentity:
          #   crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
          #   key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==

        # # The auth configuration for this registry.
        # auth:
        #   username: username # Optional registry authentication.
        #   password: password # Optional registry authentication.
The auth configuration for this registry. Note: changes to the registry auth will not be picked up by the CRI containerd plugin without a reboot.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| username | string | Optional registry authentication. The meaning of each field is the same as the corresponding field in .docker/config.json. | |
| password | string | Optional registry authentication. The meaning of each field is the same as the corresponding field in .docker/config.json. | |
| auth | string | Optional registry authentication. The meaning of each field is the same as the corresponding field in .docker/config.json. | |
| identityToken | string | Optional registry authentication. The meaning of each field is the same as the corresponding field in .docker/config.json. | |
systemDiskEncryption
SystemDiskEncryptionConfig specifies system disk partitions encryption settings.
machine:
  systemDiskEncryption:
    # Ephemeral partition encryption.
    ephemeral:
      provider: luks2 # Encryption provider to use for the encryption.
      # Defines the encryption keys generation and storage method.
      keys:
        - # Deterministically generated key from the node UUID and PartitionLabel.
          nodeID: {}
          slot: 0 # Key slot number for LUKS2 encryption.

          # # KMS managed encryption key.
          # kms:
          #   endpoint: https://192.168.88.21:4443 # KMS endpoint to Seal/Unseal the key.

      # # Cipher kind to use for the encryption. Depends on the encryption provider.
      # cipher: aes-xts-plain64

      # # Defines the encryption sector size.
      # blockSize: 4096

      # # Additional --perf parameters for the LUKS2 encryption.
      # options:
      #   - no_read_workqueue
      #   - no_write_workqueue
EncryptionKeyNodeID represents deterministically generated key from the node UUID and PartitionLabel.
kms
EncryptionKeyKMS represents a key that is generated and then sealed/unsealed by the KMS server.
machine:
  systemDiskEncryption:
    state:
      keys:
        - kms:
            endpoint: https://192.168.88.21:4443 # KMS endpoint to Seal/Unseal the key.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| endpoint | string | KMS endpoint to Seal/Unseal the key. | |
tpm
EncryptionKeyTPM represents a key that is generated and then sealed/unsealed by the TPM.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| checkSecurebootStatusOnEnroll | bool | Check that Secureboot is enabled in the EFI firmware. If Secureboot is not enabled, the enrollment of the key will fail. As the TPM key is bound to the value of PCR 7 anyway, changing the Secureboot status or configuration after the initial enrollment will make the key unusable. | |
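Here is a sketch of sealing the STATE partition key with the TPM and refusing enrollment when Secure Boot is off. The provider and slot values mirror the ephemeral example earlier in this section. Other encryption settings are left at their defaults.

```yaml
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0 # LUKS2 key slot
          tpm:
            checkSecurebootStatusOnEnroll: true # enrollment fails if Secure Boot is disabled
```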
EncryptionKeyNodeID represents deterministically generated key from the node UUID and PartitionLabel.
kms
EncryptionKeyKMS represents a key that is generated and then sealed/unsealed by the KMS server.
machine:
  systemDiskEncryption:
    ephemeral:
      keys:
        - kms:
            endpoint: https://192.168.88.21:4443 # KMS endpoint to Seal/Unseal the key.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| endpoint | string | KMS endpoint to Seal/Unseal the key. | |
tpm
EncryptionKeyTPM represents a key that is generated and then sealed/unsealed by the TPM.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| checkSecurebootStatusOnEnroll | bool | Check that Secureboot is enabled in the EFI firmware. If Secureboot is not enabled, the enrollment of the key will fail. As the TPM key is bound to the value of PCR 7 anyway, changing the Secureboot status or configuration after the initial enrollment will make the key unusable. | |
features
FeaturesConfig describes individual Talos features that can be switched on or off.
machine:
  features:
    rbac: true # Enable role-based access control (RBAC).

    # # Configure Talos API access from Kubernetes pods.
    # kubernetesTalosAPIAccess:
    #   enabled: true # Enable Talos API access from Kubernetes pods.
    #   # The list of Talos API roles which can be granted for access from Kubernetes pods.
    #   allowedRoles:
    #     - os:reader
    #   # The list of Kubernetes namespaces Talos API access is available from.
    #   allowedKubernetesNamespaces:
    #     - kube-system
Configure Talos API access from Kubernetes pods. This feature is disabled if the feature config is not specified.
kubernetesTalosAPIAccess:
  enabled: true # Enable Talos API access from Kubernetes pods.
  # The list of Talos API roles which can be granted for access from Kubernetes pods.
  allowedRoles:
    - os:reader
  # The list of Kubernetes namespaces Talos API access is available from.
  allowedKubernetesNamespaces:
    - kube-system
apidCheckExtKeyUsage
bool
Enable checks for extended key usage of client certificates in apid.
diskQuotaSupport
bool
Enable XFS project quota support for the EPHEMERAL partition and user disks. Also enables kubelet tracking of ephemeral disk usage via quota.
Select the node address sort algorithm. The ‘v1’ algorithm sorts addresses by the address itself. The ‘v2’ algorithm prefers more specific prefixes. If unset, defaults to ‘v1’.
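Here is a sketch that enables the three flags described above. The field name nodeAddressSortAlgorithm is an assumption, because the name line for the sort-algorithm row is missing from this section. Confirm it against the reference for your Talos version.

```yaml
machine:
  features:
    apidCheckExtKeyUsage: true # check extended key usage of client certs in apid
    diskQuotaSupport: true # XFS project quotas for EPHEMERAL and user disks
    nodeAddressSortAlgorithm: v2 # assumed field name; prefer more specific prefixes
```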
kubernetesTalosAPIAccess
KubernetesTalosAPIAccessConfig describes the configuration for the Talos API access from Kubernetes pods.
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true # Enable Talos API access from Kubernetes pods.
      # The list of Talos API roles which can be granted for access from Kubernetes pods.
      allowedRoles:
        - os:reader
      # The list of Kubernetes namespaces Talos API access is available from.
      allowedKubernetesNamespaces:
        - kube-system
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| enabled | bool | Enable Talos API access from Kubernetes pods. | |
| allowedRoles | []string | The list of Talos API roles which can be granted for access from Kubernetes pods. An empty list means that no roles can be granted, so access is blocked. | |
| allowedKubernetesNamespaces | []string | The list of Kubernetes namespaces Talos API access is available from. | |
kubePrism
KubePrism describes the configuration for the KubePrism load balancer.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| enabled | bool | Enable KubePrism support; this starts a local load-balancing proxy. | |
| port | int | KubePrism port. | |
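Here is a minimal sketch that turns on KubePrism. Port 7445 is the value commonly used by Talos, but treat it as an assumption and choose another free local port if it conflicts.

```yaml
machine:
  features:
    kubePrism:
      enabled: true # start the local load-balancing proxy
      port: 7445 # local listen port (assumed common default)
```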
hostDNS
HostDNSConfig describes the configuration for the host DNS resolver.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| enabled | bool | Enable host DNS caching resolver. | |
| forwardKubeDNSToHost | bool | Use the host DNS resolver as upstream for Kubernetes CoreDNS pods. When enabled, CoreDNS pods use the host DNS server as the upstream DNS (instead of using the configured upstream DNS resolvers directly). | |
| resolveMemberNames | bool | Resolve member hostnames using the host DNS resolver. When enabled, cluster member hostnames and node names are resolved using the host DNS resolver. This requires service discovery to be enabled. | |
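Here is a sketch that enables the host DNS resolver for the node and for CoreDNS, and resolves member names through it. Cluster discovery is enabled as well, because resolveMemberNames requires it.

```yaml
machine:
  features:
    hostDNS:
      enabled: true # host DNS caching resolver
      forwardKubeDNSToHost: true # CoreDNS uses the host resolver as upstream
      resolveMemberNames: true # needs cluster discovery (below)
cluster:
  discovery:
    enabled: true
```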
imageCache
ImageCacheConfig describes the configuration for the Image Cache feature.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| localEnabled | bool | Enable local image cache. | |
udev
UdevConfig describes how the udev system should be configured.
machine:
  udev:
    # List of udev rules to apply to the udev system.
    rules:
      - SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="44", MODE="0660"
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| rules | []string | List of udev rules to apply to the udev system. | |
logging
LoggingConfig struct configures Talos logging.
machine:
  logging:
    # Logging destination.
    destinations:
      - endpoint: tcp://1.2.3.4:12345 # Where to send logs. Supported protocols are "tcp" and "udp".
        format: json_lines # Logs format.
KernelModuleConfig struct configures Linux kernel modules to load.
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| name | string | Module name. | |
| parameters | []string | Module parameters; changes are applied after a reboot. | |
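Here is a sketch of loading a kernel module with a parameter. It assumes these entries live under machine.kernel.modules, since the section header is missing here. The module and parameter are only examples, and parameter changes take effect after a reboot.

```yaml
machine:
  kernel:
    modules:
      - name: drbd # example module
        parameters:
          - usermode_helper=disabled # example parameter; applied after reboot
```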
seccompProfiles[]
MachineSeccompProfile defines seccomp profiles for the machine.
machine:
  seccompProfiles:
    - name: audit.json # The `name` field is used to provide the file name of the seccomp profile.
      # The `value` field is used to provide the seccomp profile.
      value:
        defaultAction: SCMP_ACT_LOG
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| name | string | The name field is used to provide the file name of the seccomp profile. | |
| value | Unstructured | The value field is used to provide the seccomp profile. | |
cluster
ClusterConfig represents the cluster-wide config values.
cluster:
  # ControlPlaneConfig represents the control plane configuration options.
  controlPlane:
    endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
    localAPIServerPort: 443 # The port that the API server listens on internally.
  clusterName: talos.local
  # ClusterNetworkConfig represents kube networking configuration options.
  network:
    # The CNI used.
    cni:
      name: flannel # Name of CNI to use.
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
      - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
      - 10.96.0.0/12
| Field | Type | Description | Value(s) |
| --- | --- | --- | --- |
| id | string | Globally unique identifier for this cluster (base64 encoded random 32 bytes). | |
| secret | string | Shared secret of cluster (base64 encoded random 32 bytes). This secret is shared among cluster members but should never be sent over the network. | |
Provides control plane specific configuration options.
controlPlane:
  endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
  localAPIServerPort: 443 # The port that the API server listens on internally.
Provides cluster specific network configuration options.
network:
  # The CNI used.
  cni:
    name: flannel # Name of CNI to use.
  dnsDomain: cluster.local # The domain used by Kubernetes DNS.
  # The pod subnet CIDR.
  podSubnets:
    - 10.244.0.0/16
  # The service subnet CIDR.
  serviceSubnets:
    - 10.96.0.0/12
token
string
The bootstrap token used to join the cluster.
The list of base64 encoded accepted certificate authorities used by Kubernetes.
aggregatorCA
PEMEncodedCertificateAndKey
The base64 encoded aggregator certificate authority used by Kubernetes for front-proxy certificate generation. This CA can be self-signed.
API server specific configuration options.
apiServer:
  image: registry.k8s.io/kube-apiserver:v1.32.0 # The container image used in the API server manifest.
  # Extra arguments to supply to the API server.
  extraArgs:
    feature-gates: ServerSideApply=true
    http2-max-streams-per-connection: "32"
  # Extra certificate subject alternative names for the API server's certificate.
  certSANs:
    - 1.2.3.4
    - 4.5.6.7

  # # Configure the API server admission plugins.
  # admissionControl:
  #   - name: PodSecurity # Name is the name of the admission controller.
  #     # Configuration is an embedded configuration object to be used as the plugin's
  #     configuration:
  #       apiVersion: pod-security.admission.config.k8s.io/v1alpha1
  #       defaults:
  #         audit: restricted
  #         audit-version: latest
  #         enforce: baseline
  #         enforce-version: latest
  #         warn: restricted
  #         warn-version: latest
  #       exemptions:
  #         namespaces:
  #           - kube-system
  #         runtimeClasses: []
  #         usernames: []
  #       kind: PodSecurityConfiguration

  # # Configure the API server audit policy.
  # auditPolicy:
  #   apiVersion: audit.k8s.io/v1
  #   kind: Policy
  #   rules:
  #     - level: Metadata

  # # Configure the API server authorization config. Node and RBAC authorizers are always added irrespective of the configuration.
  # authorizationConfig:
  #   - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
  #     name: webhook # Name is used to describe the authorizer.
  #     # webhook is the configuration for the webhook authorizer.
  #     webhook:
  #       connectionInfo:
  #         type: InClusterConfig
  #       failurePolicy: Deny
  #       matchConditionSubjectAccessReviewVersion: v1
  #       matchConditions:
  #         - expression: has(request.resourceAttributes)
  #         - expression: '!(''system:serviceaccounts:kube-system'' in request.groups)'
  #       subjectAccessReviewVersion: v1
  #       timeout: 3s
  #   - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
  #     name: in-cluster-authorizer # Name is used to describe the authorizer.
  #     # webhook is the configuration for the webhook authorizer.
  #     webhook:
  #       connectionInfo:
  #         type: InClusterConfig
  #       failurePolicy: NoOpinion
  #       matchConditionSubjectAccessReviewVersion: v1
  #       subjectAccessReviewVersion: v1
  #       timeout: 3s
Controller manager server specific configuration options.
controllerManager:
  image: registry.k8s.io/kube-controller-manager:v1.32.0 # The container image used in the controller manager manifest.
  # Extra arguments to supply to the controller manager.
  extraArgs:
    feature-gates: ServerSideApply=true
Kube-proxy server-specific configuration options.
proxy:
  image: registry.k8s.io/kube-proxy:v1.32.0 # The container image used in the kube-proxy manifest.
  mode: ipvs # proxy mode of kube-proxy.
  # Extra arguments to supply to kube-proxy.
  extraArgs:
    proxy-mode: iptables

  # # Disable kube-proxy deployment on cluster bootstrap.
  # disabled: false
Scheduler server specific configuration options.
scheduler:
  image: registry.k8s.io/kube-scheduler:v1.32.0 # The container image used in the scheduler manifest.
  # Extra arguments to supply to the scheduler.
  extraArgs:
    feature-gates: AllBeta=true
Configures cluster member discovery.
discovery:
  enabled: true # Enable the cluster membership discovery feature.
  # Configure registries used for cluster member discovery.
  registries:
    # Kubernetes registry uses Kubernetes API server to discover cluster members and stores additional information
    kubernetes: {}
    # Service registry is using an external service to push and pull information about cluster members.
    service:
      endpoint: https://discovery.talos.dev/ # External service endpoint.
Etcd specific configuration options.
etcd:
  image: gcr.io/etcd-development/etcd:v3.5.17 # The container image used to create the etcd service.
  # The `ca` is the root certificate authority of the PKI.
  ca:
    crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
    key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==
  # Extra arguments to supply to etcd.
  extraArgs:
    election-timeout: "5000"

  # # The `advertisedSubnets` field configures the networks to pick etcd advertised IP from.
  # advertisedSubnets:
  #   - 10.0.0.0/8
External cloud provider configuration.
externalCloudProvider:
  enabled: true # Enable external cloud provider.
  # A list of urls that point to additional manifests for an external cloud provider.
  manifests:
    - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/rbac.yaml
    - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/aws-cloud-controller-manager-daemonset.yaml
extraManifests
[]string
A list of URLs that point to additional manifests. These will get automatically deployed as part of the bootstrap.
A list of inline Kubernetes manifests. These will get automatically deployed as part of the bootstrap.
inlineManifests:
  - name: namespace-ci # Name of the manifest.
    contents: |- # Manifest contents as a string.
      apiVersion: v1
      kind: Namespace
      metadata:
        name: ci
Allows running workloads on control-plane nodes.
allowSchedulingOnControlPlanes: true
Value(s): true, yes, false, no
controlPlane
ControlPlaneConfig represents the control plane configuration options.
cluster:
  controlPlane:
    endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
    localAPIServerPort: 443 # The port that the API server listens on internally.
Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname. It is single-valued, and may optionally include a port number.
endpoint: https://1.2.3.4:6443
endpoint: https://cluster1.internal:6443
localAPIServerPort
int
The port that the API server listens on internally. This may be different from the port portion listed in the endpoint field above. The default is 6443.
endpoint
Endpoint represents the endpoint URL parsed out of the machine config.
cluster:
  network:
    # The CNI used.
    cni:
      name: flannel # Name of CNI to use.
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
      - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
      - 10.96.0.0/12
The CNI used.Composed of “name” and “urls”. The “name” key supports the following options: “flannel”, “custom”, and “none”. “flannel” uses Talos-managed Flannel CNI, and that’s the default option. “custom” uses custom manifests that should be provided in “urls”. “none” indicates that Talos will not manage any CNI installation.Show example(s)
cni:
name: custom # Name of CNI to use.# URLs containing manifests to apply for the CNI.urls:
- https://docs.projectcalico.org/archive/v3.20/manifests/canal.yaml
dnsDomain
string
The domain used by Kubernetes DNS. The default is cluster.local.
dnsDomain: cluster.local
podSubnets
[]string
The pod subnet CIDR.
podSubnets:
    - 10.244.0.0/16
serviceSubnets
[]string
The service subnet CIDR.
serviceSubnets:
    - 10.96.0.0/12
cni
CNIConfig represents the CNI configuration options.
cluster:
    network:
        cni:
            name: custom # Name of CNI to use.
            # URLs containing manifests to apply for the CNI.
            urls:
                - https://docs.projectcalico.org/archive/v3.20/manifests/canal.yaml
Field
Type
Description
Value(s)
name
string
Name of CNI to use.
flannel custom none
urls
[]string
URLs containing manifests to apply for the CNI. Should be present for “custom”, must be empty for “flannel” and “none”.
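The rules for urls can be expressed as a small validation function. The following Go sketch is illustrative only. CNIConfig and validateCNI are hypothetical names, and the sketch treats “should be present” for custom as a hard requirement, which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// CNIConfig is a hypothetical stand-in for cluster.network.cni.
type CNIConfig struct {
	Name string
	URLs []string
}

// validateCNI applies the documented rules: "custom" needs URLs,
// while "flannel" and "none" must not list any.
func validateCNI(c CNIConfig) error {
	switch c.Name {
	case "custom":
		if len(c.URLs) == 0 {
			return errors.New(`cni "custom" requires at least one URL`)
		}
	case "flannel", "none":
		if len(c.URLs) != 0 {
			return fmt.Errorf("cni %q must not list URLs", c.Name)
		}
	default:
		return fmt.Errorf("unsupported cni name %q", c.Name)
	}
	return nil
}

func main() {
	fmt.Println(validateCNI(CNIConfig{Name: "flannel"}))
	fmt.Println(validateCNI(CNIConfig{Name: "custom"}))
}
```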
flannel
FlannelCNIConfig represents the Flannel CNI configuration options.
Field
Type
Description
Value(s)
extraArgs
[]string
Extra arguments for ‘flanneld’.
extraArgs:
    - --iface-can-reach=192.168.1.1
apiServer
APIServerConfig represents the kube apiserver configuration options.
cluster:
    apiServer:
        image: registry.k8s.io/kube-apiserver:v1.32.0 # The container image used in the API server manifest.
        # Extra arguments to supply to the API server.
        extraArgs:
            feature-gates: ServerSideApply=true
            http2-max-streams-per-connection: "32"
        # Extra certificate subject alternative names for the API server's certificate.
        certSANs:
            - 1.2.3.4
            - 4.5.6.7
        # # Configure the API server admission plugins.
        # admissionControl:
        #     - name: PodSecurity # Name is the name of the admission controller.
        #       # Configuration is an embedded configuration object to be used as the plugin's
        #       configuration:
        #         apiVersion: pod-security.admission.config.k8s.io/v1alpha1
        #         defaults:
        #             audit: restricted
        #             audit-version: latest
        #             enforce: baseline
        #             enforce-version: latest
        #             warn: restricted
        #             warn-version: latest
        #         exemptions:
        #             namespaces:
        #                 - kube-system
        #             runtimeClasses: []
        #             usernames: []
        #         kind: PodSecurityConfiguration
        # # Configure the API server audit policy.
        # auditPolicy:
        #     apiVersion: audit.k8s.io/v1
        #     kind: Policy
        #     rules:
        #         - level: Metadata
        # # Configure the API server authorization config. Node and RBAC authorizers are always added irrespective of the configuration.
        # authorizationConfig:
        #     - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
        #       name: webhook # Name is used to describe the authorizer.
        #       # webhook is the configuration for the webhook authorizer.
        #       webhook:
        #         connectionInfo:
        #             type: InClusterConfig
        #         failurePolicy: Deny
        #         matchConditionSubjectAccessReviewVersion: v1
        #         matchConditions:
        #             - expression: has(request.resourceAttributes)
        #             - expression: '!(''system:serviceaccounts:kube-system'' in request.groups)'
        #         subjectAccessReviewVersion: v1
        #         timeout: 3s
        #     - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
        #       name: in-cluster-authorizer # Name is used to describe the authorizer.
        #       # webhook is the configuration for the webhook authorizer.
        #       webhook:
        #         connectionInfo:
        #             type: InClusterConfig
        #         failurePolicy: NoOpinion
        #         matchConditionSubjectAccessReviewVersion: v1
        #         subjectAccessReviewVersion: v1
        #         timeout: 3s
Field
Type
Description
Value(s)
image
string
The container image used in the API server manifest.
Configure the API server admission plugins.
admissionControl:
    - name: PodSecurity # Name is the name of the admission controller.
      # Configuration is an embedded configuration object to be used as the plugin's
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1alpha1
        defaults:
            audit: restricted
            audit-version: latest
            enforce: baseline
            enforce-version: latest
            warn: restricted
            warn-version: latest
        exemptions:
            namespaces:
                - kube-system
            runtimeClasses: []
            usernames: []
        kind: PodSecurityConfiguration
auditPolicy
Unstructured
Configure the API server audit policy.
Configure the API server authorization config. Node and RBAC authorizers are always added irrespective of the configuration.
authorizationConfig:
    - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
      name: webhook # Name is used to describe the authorizer.
      # webhook is the configuration for the webhook authorizer.
      webhook:
        connectionInfo:
            type: InClusterConfig
        failurePolicy: Deny
        matchConditionSubjectAccessReviewVersion: v1
        matchConditions:
            - expression: has(request.resourceAttributes)
            - expression: '!(''system:serviceaccounts:kube-system'' in request.groups)'
        subjectAccessReviewVersion: v1
        timeout: 3s
    - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
      name: in-cluster-authorizer # Name is used to describe the authorizer.
      # webhook is the configuration for the webhook authorizer.
      webhook:
        connectionInfo:
            type: InClusterConfig
        failurePolicy: NoOpinion
        matchConditionSubjectAccessReviewVersion: v1
        subjectAccessReviewVersion: v1
        timeout: 3s
extraVolumes[]
VolumeMountConfig struct describes an extra volume mount for the static pods.
Field
Type
Description
Value(s)
hostPath
string
Path on the host.
hostPath: /var/lib/auth
mountPath
string
Path in the container.
mountPath: /etc/kubernetes/auth
readonly
bool
Mount the volume read only.
readonly: true
admissionControl[]
AdmissionPluginConfig represents the API server admission plugin configuration.
cluster:
    apiServer:
        admissionControl:
            - name: PodSecurity # Name is the name of the admission controller.
              # Configuration is an embedded configuration object to be used as the plugin's
              configuration:
                apiVersion: pod-security.admission.config.k8s.io/v1alpha1
                defaults:
                    audit: restricted
                    audit-version: latest
                    enforce: baseline
                    enforce-version: latest
                    warn: restricted
                    warn-version: latest
                exemptions:
                    namespaces:
                        - kube-system
                    runtimeClasses: []
                    usernames: []
                kind: PodSecurityConfiguration
Field
Type
Description
Value(s)
name
string
Name is the name of the admission controller. It must match the registered admission plugin name.
configuration
Unstructured
Configuration is an embedded configuration object to be used as the plugin’s configuration.
resources
ResourcesConfig represents the pod resources.
Field
Type
Description
Value(s)
requests
Unstructured
Requests configures the reserved cpu/memory resources.
requests:
    cpu: 1
    memory: 1Gi
limits
Unstructured
Limits configures the maximum cpu/memory resources a container can use.
limits:
    cpu: 2
    memory: 2500Mi
authorizationConfig[]
AuthorizationConfigAuthorizerConfig represents the API server authorization config authorizer configuration.
cluster:
    apiServer:
        authorizationConfig:
            - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
              name: webhook # Name is used to describe the authorizer.
              # webhook is the configuration for the webhook authorizer.
              webhook:
                connectionInfo:
                    type: InClusterConfig
                failurePolicy: Deny
                matchConditionSubjectAccessReviewVersion: v1
                matchConditions:
                    - expression: has(request.resourceAttributes)
                    - expression: '!(''system:serviceaccounts:kube-system'' in request.groups)'
                subjectAccessReviewVersion: v1
                timeout: 3s
            - type: Webhook # Type is the name of the authorizer. Allowed values are `Node`, `RBAC`, and `Webhook`.
              name: in-cluster-authorizer # Name is used to describe the authorizer.
              # webhook is the configuration for the webhook authorizer.
              webhook:
                connectionInfo:
                    type: InClusterConfig
                failurePolicy: NoOpinion
                matchConditionSubjectAccessReviewVersion: v1
                subjectAccessReviewVersion: v1
                timeout: 3s
Field
Type
Description
Value(s)
type
string
Type is the name of the authorizer. Allowed values are Node, RBAC, and Webhook.
name
string
Name is used to describe the authorizer.
webhook
Unstructured
webhook is the configuration for the webhook authorizer.
controllerManager
ControllerManagerConfig represents the kube controller manager configuration options.
cluster:
    controllerManager:
        image: registry.k8s.io/kube-controller-manager:v1.32.0 # The container image used in the controller manager manifest.
        # Extra arguments to supply to the controller manager.
        extraArgs:
            feature-gates: ServerSideApply=true
Field
Type
Description
Value(s)
image
string
The container image used in the controller manager manifest.
extraVolumes[]
VolumeMountConfig struct describes an extra volume mount for the static pods.
Field
Type
Description
Value(s)
hostPath
string
Path on the host.
hostPath: /var/lib/auth
mountPath
string
Path in the container.
mountPath: /etc/kubernetes/auth
readonly
bool
Mount the volume read only.
readonly: true
resources
ResourcesConfig represents the pod resources.
Field
Type
Description
Value(s)
requests
Unstructured
Requests configures the reserved cpu/memory resources.
requests:
    cpu: 1
    memory: 1Gi
limits
Unstructured
Limits configures the maximum cpu/memory resources a container can use.
limits:
    cpu: 2
    memory: 2500Mi
proxy
ProxyConfig represents the kube proxy configuration options.
cluster:
    proxy:
        image: registry.k8s.io/kube-proxy:v1.32.0 # The container image used in the kube-proxy manifest.
        mode: ipvs # proxy mode of kube-proxy.
        # Extra arguments to supply to kube-proxy.
        extraArgs:
            proxy-mode: iptables
        # # Disable kube-proxy deployment on cluster bootstrap.
        # disabled: false
Field
Type
Description
Value(s)
disabled
bool
Disable kube-proxy deployment on cluster bootstrap.
disabled: false
image
string
The container image used in the kube-proxy manifest.
image: registry.k8s.io/kube-proxy:v1.32.0
mode
string
Proxy mode of kube-proxy. The default is ‘iptables’.
extraArgs
map[string]string
Extra arguments to supply to kube-proxy.
scheduler
SchedulerConfig represents the kube scheduler configuration options.
cluster:
    scheduler:
        image: registry.k8s.io/kube-scheduler:v1.32.0 # The container image used in the scheduler manifest.
        # Extra arguments to supply to the scheduler.
        extraArgs:
            feature-gates: AllBeta=true
Field
Type
Description
Value(s)
image
string
The container image used in the scheduler manifest.
cluster:
    discovery:
        enabled: true # Enable the cluster membership discovery feature.
        # Configure registries used for cluster member discovery.
        registries:
            # Kubernetes registry uses Kubernetes API server to discover cluster members and stores additional information
            kubernetes: {}
            # Service registry is using an external service to push and pull information about cluster members.
            service:
                endpoint: https://discovery.talos.dev/ # External service endpoint.
Field
Type
Description
Value(s)
enabled
bool
Enable the cluster membership discovery feature. Cluster discovery is based on individual registries which are configured under the registries field.
etcd
EtcdConfig represents the etcd configuration options.
cluster:
    etcd:
        image: gcr.io/etcd-development/etcd:v3.5.17 # The container image used to create the etcd service.
        # The `ca` is the root certificate authority of the PKI.
        ca:
            crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
            key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==
        # Extra arguments to supply to etcd.
        extraArgs:
            election-timeout: "5000"
        # # The `advertisedSubnets` field configures the networks to pick etcd advertised IP from.
        # advertisedSubnets:
        #     - 10.0.0.0/8
Field
Type
Description
Value(s)
image
string
The container image used to create the etcd service.
image: gcr.io/etcd-development/etcd:v3.5.17
ca
PEMEncodedCertificateAndKey
The ca is the root certificate authority of the PKI. It is composed of a base64-encoded crt and key.
The advertisedSubnets field configures the networks to pick etcd advertised IP from. IPs can be excluded from the list by using negative match with !, e.g. !10.0.0.0/8. Negative subnet matches should be specified last to filter out IPs picked by positive matches. If not specified, advertised IP is selected as the first routable address of the node.
advertisedSubnets:
    - 10.0.0.0/8
listenSubnets
[]string
The listenSubnets field configures the networks for the etcd to listen for peer and client connections. If listenSubnets is not set, but advertisedSubnets is set, listenSubnets defaults to advertisedSubnets.
If neither advertisedSubnets nor listenSubnets is set, listenSubnets defaults to listen on all addresses.
IPs can be excluded from the list by using negative match with !, e.g. !10.0.0.0/8. Negative subnet matches should be specified last to filter out IPs picked by positive matches. If not specified, advertised IP is selected as the first routable address of the node.
coreDNS
CoreDNS represents the CoreDNS config values.
cluster:
    coreDNS:
        image: registry.k8s.io/coredns/coredns:v1.12.0 # The `image` field is an override to the default coredns image.
Field
Type
Description
Value(s)
disabled
bool
Disable coredns deployment on cluster bootstrap.
image
string
The image field is an override to the default coredns image.
cluster:
    externalCloudProvider:
        enabled: true # Enable external cloud provider.
        # A list of urls that point to additional manifests for an external cloud provider.
        manifests:
            - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/rbac.yaml
            - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/aws-cloud-controller-manager-daemonset.yaml
Field
Type
Description
Value(s)
enabled
bool
Enable external cloud provider.
true yes false no
manifests
[]string
A list of urls that point to additional manifests for an external cloud provider. These will get automatically deployed as part of the bootstrap.
inlineManifests[]
ClusterInlineManifest struct describes inline bootstrap manifests for the user.
cluster:
    inlineManifests:
        - name: namespace-ci # Name of the manifest.
          contents: |- # Manifest contents as a string.
            apiVersion: v1
            kind: Namespace
            metadata:
                name: ci
Field
Type
Description
Value(s)
name
string
Name of the manifest. Name should be unique.
Admin kubeconfig certificate lifetime (default is 1 year). Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).
5.4 - Kernel
Linux kernel reference.
Commandline Parameters
Talos supports a number of kernel commandline parameters. Some are required for
it to operate. Others are optional and useful in certain circumstances.
Several of these are enforced by the Kernel Self Protection Project (KSPP).
Required parameters:
talos.platform: can be one of akamai, aws, azure, container, digitalocean, equinixMetal, gcp, hcloud, metal, nocloud, openstack, oracle, scaleway, upcloud, vmware or vultr
slab_nomerge: required by KSPP
pti=on: required by KSPP
Recommended parameters:
init_on_alloc=1: advised by KSPP, enabled by default in kernel config
init_on_free=1: advised by KSPP, enabled by default in kernel config
Available Talos-specific parameters
ip
Initial configuration of the interface, routes, DNS, NTP servers (multiple ip= kernel parameters are accepted).
Talos will use the configuration supplied via the kernel parameter as the initial network configuration.
This parameter is useful in environments where DHCP doesn’t provide IP addresses, or when the default DNS and NTP servers should be overridden
before loading the machine configuration.
Partial configuration can be applied as well, e.g. ip=:::::::<dns0-ip>:<dns1-ip>:<ntp0-ip> sets only the DNS and NTP servers.
IPv6 addresses can be specified by enclosing them in square brackets, e.g. ip=[2001:db8::a]:[2001:db8::b]:[fe80::1]::controlplane1:eth1::[2001:4860:4860::6464]:[2001:4860:4860::64]:[2001:4860:4806::].
<netmask> can use either IP address notation (IPv4: 255.255.255.0, IPv6: [ffff:ffff:ffff:ffff::0]) or simply the number of one bits in the netmask (24).
<device> can use the traditional interface naming scheme (eth0, eth1) or enx<MAC>, for example enx78e7d1ea46da.
DHCP can be enabled by setting <autoconf> to dhcp, example: ip=:::::eth0.3:dhcp.
Alternative syntax is ip=eth0.3:dhcp.
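The positional layout of ip= is easiest to see by splitting a value into its fields. The Go sketch below is a hypothetical helper, not Talos' parser. It splits on colons outside square brackets and labels the fields in the order implied by the examples above: client IP, server IP, gateway, netmask, hostname, device, autoconf, DNS 0, DNS 1, NTP 0. That is also the field order of the Linux kernel's own ip= parameter. The sketch does not handle the short ip=<device>:dhcp form.

```go
package main

import (
	"fmt"
	"strings"
)

// ipFieldNames follows the positional order implied by the ip= examples.
var ipFieldNames = []string{
	"client-ip", "server-ip", "gw-ip", "netmask", "hostname",
	"device", "autoconf", "dns0-ip", "dns1-ip", "ntp0-ip",
}

// splitIPParam splits an ip= value on ':' while keeping bracketed IPv6
// addresses intact, and drops the brackets themselves.
func splitIPParam(value string) []string {
	var fields []string
	var cur strings.Builder
	inBrackets := false
	for _, r := range value {
		switch {
		case r == '[':
			inBrackets = true
		case r == ']':
			inBrackets = false
		case r == ':' && !inBrackets:
			fields = append(fields, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	return append(fields, cur.String())
}

func main() {
	v := "[2001:db8::a]:[2001:db8::b]:[fe80::1]::controlplane1:eth1::[2001:4860:4860::6464]:[2001:4860:4860::64]:[2001:4860:4806::]"
	for i, f := range splitIPParam(v) {
		if i < len(ipFieldNames) && f != "" {
			fmt.Printf("%-9s = %s\n", ipFieldNames[i], f)
		}
	}
}
```

Empty fields (the netmask and autoconf slots in this example) are simply left unset.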
Talos will use the bond= kernel parameter if supplied to set the initial bond configuration.
This parameter is useful in environments where the switch ports are suspended if the machine doesn’t set up an LACP bond.
If only the bond name is supplied, the bond will be created with eth0 and eth1 as slaves and the bond mode set to balance-rr.
All of the following configurations are equivalent:
bond=bond0
bond=bond0:
bond=bond0::
bond=bond0:::
bond=bond0:eth0,eth1
bond=bond0:eth0,eth1:balance-rr
An example of a bond configuration with all options specified:
This will create a bond interface named bond1 with eth3 and eth4 as slaves and set the bond mode to 802.3ad, the transmit hash policy to layer2+3 and bond interface MTU to 1450.
Talos will use the vlan= kernel parameter if supplied to set the initial VLAN configuration.
This parameter is useful in environments where the switch ports are VLAN tagged with no native VLAN.
Only one VLAN can be configured at this stage.
An example of a VLAN configuration including static IP configuration:
This will create a VLAN interface named eth0.100 with eth0 as the underlying interface and set the VLAN ID to 100 with static IP 172.20.0.2/24 and 172.20.0.1 as the default gateway.
net.ifnames=0
Disable the predictable network interface names by specifying net.ifnames=0 on the kernel command line.
panic
The amount of time, in seconds, to wait after a panic before a reboot is issued.
Talos will always reboot if it encounters an unrecoverable error.
However, when collecting debug information, it may reboot too quickly for
humans to read the logs.
This option allows the user to delay the reboot to give time to collect debug
information from the console screen.
A value of 0 disables automatic rebooting entirely.
talos.config
The URL at which the machine configuration data may be found (only for metal platform, with the kernel parameter talos.platform=metal).
This parameter supports variable substitution inside URL query values for the following case-insensitive placeholders:
${uuid} the SMBIOS UUID
${serial} the SMBIOS Serial Number
${mac} the MAC address of the first network interface attaining link state up
For backwards compatibility, we insert the system UUID into the query parameter uuid if its value is empty, as in:
http://example.com/metadata?uuid= => http://example.com/metadata?uuid=40dcbd19-3b10-444e-bfff-aaee44a51fda
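The substitution rules above can be sketched with Go's net/url and regexp packages. expandConfigURL is a hypothetical helper. In particular, it re-encodes the whole query string, sorting parameters and percent-escaping values, which Talos may not do. It also fills uuid only when that parameter is present but empty.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// placeholder matches ${uuid}, ${serial} and ${mac} case-insensitively.
var placeholder = regexp.MustCompile(`(?i)\$\{(uuid|serial|mac)\}`)

// expandConfigURL substitutes placeholders inside query values and fills an
// empty uuid= parameter with the system UUID (hypothetical helper).
func expandConfigURL(raw string, vars map[string]string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, values := range q {
		for i, v := range values {
			values[i] = placeholder.ReplaceAllStringFunc(v, func(m string) string {
				// m looks like "${UUID}"; strip "${" and "}" and lowercase it.
				return vars[strings.ToLower(m[2:len(m)-1])]
			})
		}
	}
	// Backwards compatibility: an empty uuid= parameter gets the system UUID.
	if q.Has("uuid") && q.Get("uuid") == "" {
		q.Set("uuid", vars["uuid"])
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	out, err := expandConfigURL("http://example.com/metadata?uuid=",
		map[string]string{"uuid": "40dcbd19-3b10-444e-bfff-aaee44a51fda"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // http://example.com/metadata?uuid=40dcbd19-3b10-444e-bfff-aaee44a51fda
}
```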
metal-iso
When the kernel parameter talos.config=metal-iso is set, Talos will attempt to load the machine configuration from any block device with a filesystem label of metal-iso.
Talos will look for a file named config.yaml in the root of the filesystem.
For example, such an ISO filesystem can be created with:
The kernel parameter talos.config.inline can be used to provide initial minimal machine configuration directly on the kernel command line, when other means of providing the configuration are not available.
The machine configuration should be zstd compressed and base64-encoded to be passed as a kernel parameter.
Note: The kernel command line has a limited size (4096 bytes), so this method is only suitable for small configuration documents.
One such example is to provide a custom CA certificate via TrustedRootsConfig in the machine configuration:
The board name, if Talos is being used on an ARM64 SBC.
Supported boards are:
bananapi_m64: Banana Pi M64
libretech_all_h3_cc_h5: Libre Computer ALL-H3-CC
rock64: Pine64 Rock64
…
talos.hostname
The hostname to be used.
The hostname is generally specified in the machine config.
However, in some cases, the DHCP server needs to know the hostname
before the machine configuration has been acquired.
Unless specifically required, the machine configuration should be used
instead.
talos.shutdown
The type of shutdown to use when Talos is told to shut down.
Valid options are:
halt
poweroff
talos.network.interface.ignore
A network interface which should be ignored and not configured by Talos.
Before a configuration is applied (early on each boot), Talos attempts to
configure each network interface by DHCP.
If there are many network interfaces on the machine which have link but no
DHCP server, this can add significant boot delays.
This option may be specified multiple times for multiple network interfaces.
talos.experimental.wipe
Resets the disk before starting up the system.
Valid options are:
system resets the system disk.
system:EPHEMERAL,STATE resets the EPHEMERAL and STATE partitions. Doing this returns Talos to maintenance mode.
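The value format is a target, optionally followed by a colon and a comma-separated list of partition labels. A small parser makes this explicit. parseWipe is a hypothetical Go helper. Rejecting any target other than system is an assumption based on the two documented options.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWipe splits a talos.experimental.wipe value into its target and an
// optional list of partition labels (hypothetical helper).
func parseWipe(value string) (target string, labels []string, err error) {
	target, rest, hasLabels := strings.Cut(value, ":")
	if target != "system" {
		return "", nil, fmt.Errorf("unsupported wipe target %q", target)
	}
	if hasLabels {
		labels = strings.Split(rest, ",")
	}
	return target, labels, nil
}

func main() {
	for _, v := range []string{"system", "system:EPHEMERAL,STATE"} {
		target, labels, err := parseWipe(v)
		fmt.Println(target, labels, err)
	}
}
```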
talos.unified_cgroup_hierarchy
Deprecated: starting with the 1.10 release, cgroupsv1 is planned to be supported only in container mode.
Talos defaults to always using the unified cgroup hierarchy (cgroupsv2), but cgroupsv1
can be forced with talos.unified_cgroup_hierarchy=0.
Note: cgroupsv1 is deprecated and it should be used only for compatibility with workloads which don’t support cgroupsv2 yet.
talos.dashboard.disabled
By default, Talos redirects kernel logs to virtual console /dev/tty1 and starts the dashboard on /dev/tty2,
then switches to the dashboard tty.
If you set talos.dashboard.disabled=1, this behavior will be disabled.
Kernel logs will be sent to the currently active console and the dashboard will not be started.
It defaults to 1 on SBCs.
talos.environment
Each value of the argument sets a default environment variable.
The expected format is key=value.
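Several parameters on this page may appear more than once (ip=, talos.network.interface.ignore=, talos.environment=), so a parser has to keep every occurrence. The Go sketch below is a simplified, hypothetical model. It splits the command line on whitespace without handling quoting, and it treats the first = in each talos.environment value as the key/value separator. The proxy values in the demo are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCmdline groups whitespace-separated key=value arguments and keeps
// every occurrence of repeatable parameters. Quoting is not handled.
func parseCmdline(cmdline string) map[string][]string {
	params := map[string][]string{}
	for _, tok := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(tok, "=")
		params[key] = append(params[key], value)
	}
	return params
}

// environment turns talos.environment=key=value entries into a map.
func environment(params map[string][]string) map[string]string {
	env := map[string]string{}
	for _, kv := range params["talos.environment"] {
		if k, v, ok := strings.Cut(kv, "="); ok {
			env[k] = v
		}
	}
	return env
}

func main() {
	p := parseCmdline("talos.platform=metal talos.environment=http_proxy=http://proxy:3128 " +
		"talos.environment=no_proxy=localhost talos.network.interface.ignore=eth2 talos.network.interface.ignore=eth3")
	fmt.Println(p["talos.network.interface.ignore"]) // [eth2 eth3]
	fmt.Println(environment(p))
}
```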
talos.device.settle_time
The time in Go duration format to wait for devices to settle before starting the boot process.
By default, Talos waits for udevd to scan and settle, but with some RAID controllers udevd might
report settled devices before they are actually ready.
Adding this kernel argument provides extra settle time on top of udevd settle time.
The maximum value is 10m (10 minutes).
Example:
talos.device.settle_time=3m
talos.halt_if_installed
If set to 1 and Talos detects that it is already installed, it pauses the boot sequence and keeps printing a message until the boot timeout is reached.
This is useful if booting from ISO/PXE and you want to prevent the machine from accidentally booting from the ISO/PXE after installation to the disk.
6 - Learn More
6.1 - Philosophy
Learn about the philosophy behind the need for Talos Linux.
Distributed
Talos is intended to be operated in a distributed manner: it is built for a high-availability dataplane first.
Its etcd cluster is built in an ad-hoc manner, with each appointed node joining on its own directive (with proper security validations enforced, of course).
Like Kubernetes, workloads are intended to be distributed across any number of compute nodes.
There should be no single points of failure, and the level of required coordination is as low as each platform allows.
Immutable
Talos takes immutability very seriously.
Talos itself, even when installed on a disk, always runs from a SquashFS image, meaning that even if a directory is mounted to be writable, the image itself is never modified.
All images are signed and delivered as single, versioned files.
We can always run integrity checks on our image to verify that it has not been modified.
While Talos does allow a few, highly-controlled write points to the filesystem, we strive to make them as non-unique and non-critical as possible.
We call the writable partition the “ephemeral” partition precisely because we want to make sure none of us ever uses it for unique, non-replicated, non-recreatable data.
Thus, if all else fails, we can always wipe the disk and get back up and running.
Minimal
We are always trying to reduce Talos’ footprint.
Because nearly the entire OS is built from scratch in Go, we are
in a good position.
We have no shell.
We have no SSH.
We have none of the GNU utilities, not even a rollup tool such as busybox.
Everything in Talos is there because it is necessary, and
nothing is included which isn’t.
As a result, the OS currently produces a SquashFS image of less than 80 MB.
Ephemeral
Everything Talos writes to its disk is either replicated or reconstructable.
Since the controlplane is highly available, the loss of any node will cause
neither service disruption nor loss of data.
No writes are even allowed to the vast majority of the filesystem.
We even call the writable partition “ephemeral” to keep this idea always in
focus.
Secure
Talos has always been designed with security in mind.
With its immutability, its minimalism, its signing, and its component-based design, we are
able to simply bypass huge classes of vulnerabilities.
Moreover, because of the way we have designed Talos, we are able to take
advantage of a number of additional settings, such as the recommendations of the Kernel Self Protection Project (KSPP) and completely disabling dynamic modules.
There are no passwords in Talos.
All networked communication is encrypted and key-authenticated.
The Talos certificates are short-lived and automatically rotated.
Kubernetes is always constructed with its own separate PKI structure which is
enforced.
Declarative
Everything which can be configured in Talos is done through a single YAML
manifest.
There is no scripting and no procedural steps.
Everything is defined by the one declarative YAML file.
This configuration includes that of both Talos itself and the Kubernetes which
it forms.
This is achievable because Talos is tightly focused to do one thing: run
Kubernetes, in the easiest, most secure, most reliable way it can.
Not based on X distro
Talos Linux isn’t based on any other distribution.
We think of ourselves as being the second generation of
container-optimized operating systems, where things like CoreOS, Flatcar, and Rancher represent the first generation (though the technology is not derived from any of them).
Talos Linux is actually a ground-up rewrite of the userspace, from PID 1.
We run the Linux kernel, but everything downstream of that is our own custom
code, written in Go, rigorously tested, and published as an immutable,
integrated image.
For instance, the Linux kernel launches what we call machined, not systemd.
There is no systemd on our system.
There are no GNU utilities, no shell, no SSH, no packages, nothing you could associate with
any other distribution.
An Operating System designed for Kubernetes
Technically, Talos Linux installs to a computer like any other operating system.
Unlike other operating systems, Talos is not meant to run alone, on a
single machine.
A design goal of Talos Linux is eliminating the management
of individual nodes as much as possible.
In order to do that, Talos Linux operates as a cluster of machines, with lots of
checking and coordination between them, at all levels.
There is only a cluster.
Talos is meant to do one thing: maintain a Kubernetes cluster, and it does this
very, very well.
The entirety of the configuration of any machine is specified by a single
configuration file, which can often be the same configuration file used
across many machines.
Much like a biological system, if some component misbehaves, just cut it out and
let a replacement grow.
Rebuilds of Talos are remarkably fast, whether they be new machines, upgrades,
or reinstalls.
Never get hung up on an individual machine.
6.2 - Architecture
Learn the system architecture of Talos Linux itself.
Talos is designed to be atomic in deployment and modular in composition.
It is atomic in that the entirety of Talos is distributed as a
single, self-contained image, which is versioned, signed, and immutable.
It is modular in that it is composed of many separate components
which have clearly defined gRPC interfaces which facilitate internal flexibility
and external operational guarantees.
All of the main Talos components communicate with each other by gRPC, through a socket on the local machine.
This imposes a clear separation of concerns and ensures that changes over time which affect the interoperation of components are a part of the public git record.
The benefit is that each component may be iterated and changed as its needs dictate, so long as the external API is controlled.
This is a key component in reducing coupling and maintaining modularity.
File system partitions
Talos uses these partitions with the following labels:
EFI - stores EFI boot data.
BIOS - used for GRUB’s second stage boot.
BOOT - used for the boot loader, stores initramfs and kernel data.
META - stores metadata about the Talos node, such as node IDs.
STATE - stores machine configuration, node identity data for cluster discovery and KubeSpan info
EPHEMERAL - stores ephemeral state information, mounted at /var
The File System
One of the unique design decisions in Talos is the layout of the root file system.
There are three “layers” to the Talos root file system.
At its core the rootfs is a read-only squashfs.
The squashfs is then mounted as a loop device into memory.
This provides Talos with an immutable base.
The next layer is a set of tmpfs file systems for runtime specific needs.
Aside from the standard pseudo file systems such as /dev, /proc, /run, /sys and /tmp, a special /system is created for internal needs.
One reason for this is that we need special files such as /etc/hosts and /etc/resolv.conf to be writable (remember that the rootfs is read-only).
For example, at boot Talos will write /system/etc/hosts and then bind mount it over /etc/hosts.
This means that instead of making all of /etc writable, Talos only makes very specific files writable under /etc.
All files under /system are completely recreated on each boot.
For files and directories that need to persist across boots, Talos creates overlayfs file systems.
/etc/kubernetes is a good example of this.
Directories like this are overlayfs backed by an XFS file system mounted at /var.
The /var directory is owned by Kubernetes with the exception of the above overlayfs file systems.
This directory is writable and used by etcd (in the case of control plane nodes), the kubelet, and the CRI (containerd).
Its content survives machine reboots, but it is wiped and lost on machine upgrades and resets, unless the
--preserve option of talosctl upgrade or the
--system-labels-to-wipe option of talosctl reset
is used.
6.3 - Components
Understand the system components that make up Talos Linux.
In this section, we discuss the various components that underpin Talos.
Components
Talos Linux and Kubernetes are tightly integrated.
In the following, the focus is on the Talos Linux specific components.
Component
Description
apid
When interacting with Talos, the gRPC API endpoint you interact with directly is provided by apid. apid acts as the gateway for all component interactions and forwards the requests to machined.
containerd
An industry-standard container runtime with an emphasis on simplicity, robustness, and portability. To learn more, see the containerd website.
machined
Talos’ replacement for the traditional Linux init process. It is specially designed to run Kubernetes and does not allow starting arbitrary user services.
kernel
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project.
trustd
To run and operate a Kubernetes cluster, a certain level of trust is required. Based on the concept of a ‘Root of Trust’, trustd is a simple daemon responsible for establishing trust within the system.
udevd
Implementation of eudev into machined. eudev is Gentoo’s fork of udev, systemd’s device file manager for the Linux kernel. It manages device nodes in /dev and handles all user space actions when adding or removing devices. To learn more, see the Gentoo Wiki.
apid
When interacting with Talos, the gRPC API endpoint you will interact with directly is apid.
apid acts as the gateway for all component interactions.
apid provides a mechanism to route requests to the appropriate destination when running on a control plane node.
We’ll use some examples below to illustrate what apid is doing.
When a user wants to interact with a Talos component via talosctl, there are two flags that control the interaction with apid.
The -e | --endpoints flag specifies which Talos node (via apid) should handle the connection.
Typically this is a public-facing server.
The -n | --nodes flag specifies which Talos node(s) should respond to the request.
If --nodes is omitted, the first endpoint will be used.
Note: Typically, there will be an endpoint already defined in the Talos config file.
Optionally, nodes can be included here as well.
For example, if a user wants to interact with machined, a command like talosctl -e cluster.talos.dev memory may be used.
$ talosctl -e cluster.talos.dev memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
cluster.talos.dev 7938 1768 2390 145 53 3724 6571
In this case, talosctl is interacting with apid running on cluster.talos.dev, which forwards the request to the machined API.
If we wanted to extend our example to retrieve memory from another node in our cluster, we could use the command talosctl -e cluster.talos.dev -n node02 memory.
$ talosctl -e cluster.talos.dev -n node02 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node02 7938 1768 2390 145 53 3724 6571
The apid instance on cluster.talos.dev receives the request and forwards it to apid running on node02, which forwards the request to the machined API.
We can further extend our example to retrieve memory for all nodes in our cluster by appending additional -n node flags or using a comma-separated list of nodes (-n node01,node02,node03):
$ talosctl -e cluster.talos.dev -n node01 -n node02 -n node03 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node01 7938 871 4071 137 49 2945 7042
node02 257844 14408 190796 18138 49 52589 227492
node03 257844 1830 255186 125 49 777 254556
The apid instance on cluster.talos.dev receives the request and forwards it to node01, node02, and node03, each of which then forwards the request to its local machined API.
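The way the two flags combine can be sketched as a small Python model. This is a hypothetical illustration of the semantics described above, not talosctl code: the function name resolve_targets is made up, and the sketch always enters through the first endpoint for simplicity (a real client may fail over between endpoints).

```python
from typing import List, Tuple


def resolve_targets(endpoints: List[str], node_flags: List[str]) -> List[Tuple[str, str]]:
    """Return (entry_endpoint, target_node) pairs for a single request.

    Hypothetical model: the request enters through one endpoint's apid,
    which forwards it to each target node's apid and from there to machined.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Simplification: always enter via the first endpoint.
    entry = endpoints[0]
    # "-n a -n b" and "-n a,b" are equivalent: flatten and split on commas.
    nodes = [n.strip() for flag in node_flags for n in flag.split(",") if n.strip()]
    if not nodes:
        # No --nodes given: the endpoint itself answers the request.
        nodes = [entry]
    return [(entry, node) for node in nodes]


print(resolve_targets(["cluster.talos.dev"], []))
print(resolve_targets(["cluster.talos.dev"], ["node01,node02", "node03"]))
```

Flattening both flag styles into one list mirrors the statement that repeated -n flags and a comma-separated list are interchangeable.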
containerd
containerd provides the container runtime to launch workloads on Talos and Kubernetes.
Talos services are namespaced under the system namespace in containerd, whereas the Kubernetes services are namespaced under the k8s.io namespace.
machined
A common theme throughout the design of Talos is minimalism.
We believe strongly in the UNIX philosophy that each program should do one job well.
The init included in Talos is one example of this, and we are calling it “machined”.
We wanted to create a focused init that had one job: run Kubernetes.
To that end, machined is relatively static in that it does not allow for arbitrary user-defined services.
Only the services necessary to run Kubernetes and manage the node are available.
This includes:
The machined process handles all machine configuration, API handling, resource and controller management.
kernel
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project (KSPP).
trustd
Security is one of the highest priorities within Talos.
Running a Kubernetes cluster requires a certain level of trust.
For example, orchestrating the bootstrap of a highly available control plane requires sensitive PKI data distribution.
To that end, we created trustd.
Based on a Root of Trust concept, trustd is a simple daemon responsible for establishing trust within the system.
Once trust is established, various methods become available to the trustee.
For example, it can accept a write request from another node to place a file on disk.
Additional methods and capabilities will be added to the trustd component to support new functionality in the rest of the Talos environment.
udevd
udevd handles kernel device notifications and sets up the necessary links in /dev.
6.4 - Control Plane
Understand the Kubernetes Control Plane.
This guide provides information about the Kubernetes control plane, and details on how Talos runs and bootstraps the Kubernetes control plane.
What is a control plane node?
A control plane node is a node which:
runs etcd, the Kubernetes database
runs the Kubernetes control plane
kube-apiserver
kube-controller-manager
kube-scheduler
serves as an administrative proxy to the worker nodes
These nodes are critical to the operation of your cluster.
Without control plane nodes, Kubernetes will not respond to changes in the system, and certain central services may not be available.
Talos nodes whose .machine.type is controlplane are control plane nodes (check via talosctl get member).
Control plane nodes are tainted by default to prevent workloads from being scheduled onto them.
This is both to protect the control plane from workloads consuming resources and starving the control plane processes, and to reduce the risk of a vulnerability exposing the control plane’s credentials to a workload.
The Control Plane and Etcd
A critical design concept of Kubernetes (and Talos) is the etcd database.
Properly managed (which Talos Linux does), etcd should never have split brain or noticeable down time.
In order to do this, etcd maintains the concept of “membership” and of “quorum”.
To perform any operation, read or write, the database requires quorum.
That is, a majority of members must agree on the current leader, and absenteeism (members that are down, or not reachable) counts as a negative.
For example, if there are three members, at least two out of the three must agree on the current leader.
If two disagree or fail to answer, the etcd database will lock itself until quorum is achieved in order to protect the integrity of the data.
This design means that having two control plane nodes is worse than having only one, because if either goes down, your database will lock (and the chance of one of two nodes going down is greater than the chance of a single node going down).
Similarly, a 4 node etcd cluster is worse than a 3 node etcd cluster - a 4 node cluster requires 3 nodes to be up to achieve quorum (in order to have a majority), while the 3 node cluster requires 2 nodes:
i.e. both can support a single node failure and keep running - but the chance of a node failing in a 4 node cluster is higher than that in a 3 node cluster.
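The arithmetic behind these comparisons can be checked with a short Python sketch. It assumes, as a simplification, that every member fails independently with the same probability p; real failures are often correlated (shared racks, power, or upgrades), so treat the numbers as illustrative only.

```python
from math import comb


def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still available."""
    return members - quorum(members)


def p_lose_quorum(members: int, p: float) -> float:
    """Probability that too many members are down for quorum.

    Assumes every member fails independently with the same probability p.
    """
    f = tolerated_failures(members)
    return sum(
        comb(members, k) * p**k * (1 - p) ** (members - k)
        for k in range(f + 1, members + 1)
    )


for n in range(1, 6):
    print(f"{n} members: quorum={quorum(n)}, "
          f"tolerates={tolerated_failures(n)}, "
          f"P(lock)={p_lose_quorum(n, 0.01):.6f}")
```

With p = 0.01, two members lock about twice as often as one (0.0199 vs 0.01), and four members lock about twice as often as three, while both three and four tolerate exactly one failure.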
Another note about etcd: due to the need to replicate data amongst members, performance of etcd decreases as the cluster scales.
A 5 node cluster can commit about 5% fewer writes per second than a 3 node cluster running on the same hardware.
Recommendations for your control plane
Run your clusters with three or five control plane nodes.
Three is enough for most use cases.
Five will give you better availability (in that it can tolerate two node failures simultaneously), but cost you more both in the number of nodes required, and also as each node may require more hardware resources to offset the performance degradation seen in larger clusters.
Implement good monitoring and put processes in place to deal with a failed node in a timely manner (and test them!)
Even with robust monitoring and procedures for replacing failed nodes in place, back up etcd and your control plane node configuration to guard against unforeseen disasters.
Monitor the performance of your etcd clusters.
If etcd performance is slow, vertically scale the nodes, not the number of nodes.
If a control plane node fails, remove it first, then add the replacement node.
(This ensures that the failed node does not “vote” when adding in the new node, minimizing the chances of a quorum violation.)
If replacing a node that has not failed, add the new one, then remove the old.
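The reason for removing a failed node first can be traced step by step. The Python sketch below assumes etcd's usual membership behavior: a newly added member counts toward the cluster size immediately but cannot vote until it has started and joined. Each step reports "spare votes" (members up minus quorum); a negative value means etcd locks.

```python
from typing import List, Tuple

Step = Tuple[str, int, int]  # (description, etcd members, members up)


def quorum(members: int) -> int:
    return members // 2 + 1


def spare_votes(steps: List[Step]) -> List[Tuple[str, int]]:
    """Members up minus quorum after each step; negative means etcd locks."""
    return [(desc, up - quorum(members)) for desc, members, up in steps]


# Three-member cluster in which one member has failed.
remove_first: List[Step] = [
    ("one member failed", 3, 2),
    ("remove failed member", 2, 2),
    ("add replacement, not started yet", 3, 2),
    ("replacement joins", 3, 3),
]
add_first: List[Step] = [
    ("one member failed", 3, 2),
    ("add replacement, not started yet", 4, 2),
    ("replacement joins", 4, 3),
    ("remove failed member", 3, 3),
]

for name, steps in (("remove first", remove_first), ("add first", add_first)):
    print(name, spare_votes(steps))
```

Adding first briefly produces a four-member cluster with only two voters up, which is below quorum until the replacement joins; removing first never drops below zero spare votes.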
Bootstrapping the Control Plane
Every new cluster must be bootstrapped only once, which is achieved by telling a single control plane node to initiate the bootstrap.
Bootstrapping itself does not do anything with Kubernetes.
Bootstrapping only tells etcd to form a cluster, so don’t judge the success of a bootstrap by the failure of Kubernetes to start.
Kubernetes relies on etcd, so bootstrapping is required, but it is not sufficient for Kubernetes to start.
If the bootstrap API call returns successfully but your Kubernetes cluster fails to form for other reasons (say, a bad configuration option or an unavailable container registry), you do NOT need to bootstrap again: just fix the config or let Kubernetes retry.
High-level Overview
Talos cluster bootstrap flow:
The etcd service is started on control plane nodes.
Instances of etcd on control plane nodes build the etcd cluster.
The kubelet service is started.
Control plane components are started as static pods via the kubelet, and the kube-apiserver component connects to the local (running on the same node) etcd instance.
The kubelet obtains a client certificate with the bootstrap token through the control plane endpoint (handled by kube-apiserver and kube-controller-manager).
The kubelet registers the node in the API server.
Kubernetes control plane schedules pods on the nodes.
Cluster Bootstrapping
All nodes start the kubelet service.
The kubelet tries to contact the control plane endpoint, but as it is not up yet, it keeps retrying.
One of the control plane nodes is chosen as the bootstrap node, and promoted using the bootstrap API (talosctl bootstrap).
The bootstrap node initiates the etcd bootstrap process by initializing etcd as the first member of the cluster.
Once etcd is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.
The etcd service on non-bootstrap nodes tries to get the Endpoints resource via the control plane endpoint, but that request fails as the control plane endpoint is not up yet.
As soon as etcd is up on the bootstrap node, static pod definitions for the Kubernetes control plane components (kube-apiserver, kube-controller-manager, kube-scheduler) are rendered to disk.
The kubelet service on the bootstrap node picks up the static pod definitions and starts the Kubernetes control plane components.
As soon as kube-apiserver is launched, the control plane endpoint comes up.
The bootstrap node acquires an etcd mutex and injects the bootstrap manifests into the API server.
The bootstrap manifests specify the Kubernetes join token and kubelet CSR auto-approval.
The kubelet service on all nodes is now able to obtain client certificates and register the nodes in the API server.
Other bootstrap manifests specify additional resources critical for Kubernetes operations (e.g. CNI, PSP, etc.).
The etcd service on non-bootstrap nodes is now able to discover other members of the etcd cluster via the Kubernetes Endpoints resource.
The etcd cluster is now formed and consists of all control plane nodes.
All control plane nodes render static pod manifests for the control plane components.
Each node now runs a full set of components to make the control plane HA.
The kubelet service on worker nodes is now able to issue the client certificate and register itself with the API server.
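The ordering constraints in the bootstrap sequence above can be written down as a dependency graph and sorted with Python's standard graphlib module (Python 3.9+). The step names are illustrative labels for the prose steps, not Talos identifiers, and the graph is a simplified reading of which step waits for which.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps it waits for (simplified, illustrative labels).
BOOTSTRAP_DEPS = {
    "talosctl bootstrap": set(),
    "etcd initialized on bootstrap node": {"talosctl bootstrap"},
    "static pods rendered on bootstrap node": {"etcd initialized on bootstrap node"},
    "control plane endpoint up": {"static pods rendered on bootstrap node"},
    "bootstrap manifests injected": {"control plane endpoint up"},
    "kubelets obtain client certificates": {"bootstrap manifests injected"},
    "etcd on other control plane nodes joins": {"control plane endpoint up"},
    "static pods rendered on all control plane nodes": {"etcd on other control plane nodes joins"},
}

# static_order() yields every step after all of the steps it depends on.
order = list(TopologicalSorter(BOOTSTRAP_DEPS).static_order())
for i, step in enumerate(order, 1):
    print(i, step)
```

The sort makes explicit why non-bootstrap etcd members and kubelets can only retry until the control plane endpoint comes up.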
Scaling Up the Control Plane
When new nodes are added to the control plane, the process is the same as the bootstrap process above: the etcd service discovers existing members of the control plane via the control plane endpoint, joins the etcd cluster, and the control plane components are scheduled on the node.
Scaling Down the Control Plane
Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the etcd cluster.
The recommended way to do this is to use:
talosctl -n IP.of.node.to.remove reset
kubectl delete node
When using talosctl reset command, the targeted control plane node leaves the etcd cluster as part of the reset sequence, and its disks are erased.
Upgrading Talos on Control Plane Nodes
When a control plane node is upgraded, Talos leaves etcd, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the etcd cluster on reboot.
So upgrading a control plane node is equivalent to scaling down the control plane node followed by scaling up with a new version of Talos.
6.5 - Image Factory
Image Factory generates customized Talos Linux images based on configured schematics.
The Image Factory provides a way to download Talos Linux artifacts.
Artifacts can be generated with customizations defined by a “schematic”.
A schematic can be applied to any of the versions of Talos Linux offered by the Image Factory to produce a “model”.
The following assets are provided:
ISO
kernel, initramfs, and kernel command line
UKI
disk images in various formats (e.g. AWS, GCP, VMware, etc.)
See Boot Assets for an example of how to use the Image Factory to boot and upgrade Talos on different platforms.
Full API documentation for the Image Factory is available at GitHub.
Schematics
Schematics are YAML files that define customizations to be applied to a Talos Linux image.
Schematics can be applied to any of the versions of Talos Linux offered by the Image Factory to produce a “model”, which is a Talos Linux image with the customizations applied.
Schematics are content-addressable, that is, the content of the schematic is used to generate a unique ID.
The schematic should be uploaded to the Image Factory first, and then the ID can be used to reference the schematic in a model.
Schematics can be generated using the Image Factory UI, or using the Image Factory API:
customization:
  extraKernelArgs: # optional
    - vga=791
  meta: # optional, allows to set initial Talos META
    - key: 0xa
      value: "{}"
  systemExtensions: # optional
    officialExtensions: # optional
      - siderolabs/gvisor
      - siderolabs/amd-ucode
overlay: # optional
  name: rpi_generic
  image: siderolabs/sbc-raspberry-pi
  options: # optional, any valid yaml, depends on the overlay implementation
    data: "mydata"
The “vanilla” schematic is:
customization:
and has an ID of 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba.
The schematic can be applied by uploading it to the Image Factory:
curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
As the schematic is content-addressable, the same schematic can be uploaded multiple times, and the Image Factory will return the same ID.
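Content addressing can be illustrated with a small Python sketch. The 64-hex-character IDs shown here are consistent with a SHA-256 digest, but the exact encoding the Image Factory hashes is not described in this section, so the canonical-JSON step and the content_id name are assumptions. The sketch shows the property, not a way to reproduce factory IDs.

```python
import hashlib
import json
from typing import Any, Dict


def content_id(schematic: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the schematic.

    Sorting keys and fixing separators means the ID depends only on the
    content, not on key order or formatting. Illustrative only: this does
    not reproduce the IDs issued by the Image Factory.
    """
    canonical = json.dumps(schematic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"customization": {"extraKernelArgs": ["vga=791"],
                       "systemExtensions": {"officialExtensions": ["siderolabs/gvisor"]}}}
# Same content, keys written in a different order.
b = {"customization": {"systemExtensions": {"officialExtensions": ["siderolabs/gvisor"]},
                       "extraKernelArgs": ["vga=791"]}}
c = {"customization": {"extraKernelArgs": ["vga=792"]}}

print(content_id(a) == content_id(b))  # same content, same ID
print(content_id(a) == content_id(c))  # different content, different ID
```

Because the ID is derived from the content, uploading the same schematic twice yields the same ID, and the ID cannot be guessed without knowing the content.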
Models
Models are Talos Linux images with customizations applied.
The inputs to generate a model are:
schematic ID
Talos Linux version
model type (e.g. ISO, UKI, etc.)
architecture (e.g. amd64, arm64)
various model type specific options (e.g. disk image format, disk image size, etc.)
Frontends
Image Factory provides several frontends to retrieve models:
HTTP frontend to download models (e.g. download an ISO or a disk image)
PXE frontend to boot bare-metal machines (PXE script references kernel/initramfs from HTTP frontend)
Registry frontend to fetch customized installer images (for initial Talos Linux installation and upgrades)
The links to different models are available in the Image Factory UI, and a full list of possible models is documented at GitHub.
The installer image can be used to install Talos Linux on a bare-metal machine, or to upgrade an existing Talos Linux installation.
As both the Talos version and the schematic ID can be changed during an upgrade, the installer image can be used to upgrade to any version of Talos Linux, or to replace the set of installed system extensions.
UI
The Image Factory UI is available at https://factory.talos.dev.
The UI provides a way to list supported Talos Linux versions and the system extensions available for each release, and to generate a schematic based on the selected system extensions.
The UI operations are equivalent to API operations.
Find Schematic ID from Talos Installation
Image Factory always appends a “virtual” system extension whose version matches the schematic ID used to generate the model.
So, for any running Talos Linux instance the schematic ID can be found by looking at the list of system extensions:
$ talosctl get extensions
NAMESPACE TYPE ID VERSION NAME VERSION
runtime ExtensionStatus 0 1 schematic 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba
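When this lookup is scripted, the schematic ID can be pulled out of the table output. The Python sketch below assumes the table layout shown above, where the extension VERSION column directly follows the NAME column; schematic_from_extensions is a hypothetical helper name.

```python
from typing import Optional


def schematic_from_extensions(output: str) -> Optional[str]:
    """Return the version of the 'schematic' extension from table output.

    Assumes the extension VERSION column directly follows the NAME column.
    """
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if "schematic" in fields:
            i = fields.index("schematic")
            if i + 1 < len(fields):
                return fields[i + 1]
    return None


sample = """NAMESPACE TYPE ID VERSION NAME VERSION
runtime ExtensionStatus 0 1 schematic 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba"""
print(schematic_from_extensions(sample))
```

Parsing by column position is brittle if the table layout changes, so treat this as a convenience for ad-hoc checks.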
Restrictions
Some models don’t include every customization of the schematic:
installer and initramfs images only support system extensions (kernel args and META are ignored)
kernel assets don’t depend on the schematic
Other models have full support for all customizations:
any disk image format
ISO, PXE boot script
When installing Talos Linux using ISO/PXE boot, Talos will be installed on the disk using the installer image, so the installer image in the machine configuration should use the same schematic as the ISO/PXE boot image.
Some system extensions are not available for all Talos Linux versions, so an attempt to generate a model with an unsupported system extension will fail.
The list of supported Talos versions and the system extensions supported by each version is available in the Image Factory UI and API.
Under the Hood
Image Factory is based on the Talos imager container which provides both the Talos base boot assets, and the ability to generate custom assets based on a configuration.
Image Factory manages a set of imager container images to acquire base Talos Linux boot assets (kernel, initramfs), a set of Talos Linux system extension images, and a set of schematics.
When a model is requested, Image Factory uses the imager container to generate the requested assets based on the schematic and the Talos Linux version.
Security
Image Factory verifies signatures of all source container images fetched:
imager container images (base boot assets)
extensions system extensions catalogs
installer container images (base installer layer)
Talos Linux system extension images
Internally, Image Factory caches generated boot assets and signs all cached images using a private key.
Image Factory verifies the signature of the cached images before serving them to clients.
Image Factory signs generated installer images, and verifies the signature of the installer images before serving them to clients.
Image Factory does not provide a way to list all schematics, as schematics may contain sensitive information (e.g. private kernel boot arguments).
As the schematic ID is content-addressable, it is not possible to guess the ID of a schematic without knowing the content of the schematic.
Running your own Image Factory
Image Factory can be deployed on-premises to provide in-house asset generation.
Image Factory requires the following components:
an OCI registry to store schematics (private)
an OCI registry to store cached assets (private)
an OCI registry to store installer images (should allow public read-only access)
a container image signing key: ECDSA P-256 private key in PEM format
Image Factory is configured using command line flags; use --help to see a list of available flags.
Image Factory should be configured to use proper authentication to push to the OCI registries:
by mounting proper credentials via ~/.docker/config.json
by supplying GITHUB_TOKEN (for ghcr.io)
Image Factory performs HTTP redirects to the public registry endpoint for installer images, so the public endpoint should be available to Talos Linux machines to pull the installer images.
6.6 - Controllers and Resources
Discover how Talos Linux uses the concepts of Controllers and Resources.
Talos implements concepts of resources and controllers to facilitate internal operations of the operating system.
Talos resources and controllers are very similar to Kubernetes resources and controllers, but there are some differences.
The content of this document is not required to operate Talos, but it is useful for troubleshooting.
Starting with Talos 0.9, most of the Kubernetes control plane bootstrapping and operations are implemented via controllers and resources, which allows Talos to react to configuration changes and environment changes (e.g. time sync).
Resources
A resource captures a piece of system state.
Each resource belongs to a “Type” which defines resource contents.
Resource state can be split into two parts:
metadata: fixed set of fields describing resource - namespace, type, ID, etc.
spec: contents of the resource (depends on resource type).
A resource is uniquely identified by (namespace, type, id).
Namespaces provide a way to avoid conflicts on duplicate resource IDs.
At the time of writing, all resources are local to the node and stored in memory.
So on every reboot, resource state is rebuilt from scratch (the only exception is the MachineConfig resource, which reflects the current machine config).
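The resource model can be sketched with a few Python dataclasses. This is a hypothetical illustration, not Talos code: it keeps metadata and spec separate, keys the store by (namespace, type, id), and bumps the version on every update. The spec contents are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Metadata:
    namespace: str
    type: str
    id: str
    version: int = 1


@dataclass
class Resource:
    metadata: Metadata
    spec: Dict[str, Any] = field(default_factory=dict)

    def key(self) -> Tuple[str, str, str]:
        # (namespace, type, id) uniquely identifies a resource.
        m = self.metadata
        return (m.namespace, m.type, m.id)


class InMemoryState:
    """Hypothetical node-local store; its contents are lost on restart."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str, str], Resource] = {}

    def put(self, resource: Resource) -> None:
        previous = self._items.get(resource.key())
        if previous is not None:
            # Updating an existing resource bumps its version.
            resource.metadata.version = previous.metadata.version + 1
        self._items[resource.key()] = resource

    def get(self, namespace: str, rtype: str, rid: str) -> Resource:
        return self._items[(namespace, rtype, rid)]


state = InMemoryState()
state.put(Resource(Metadata("network", "AddressSpec", "lo/127.0.0.1/8"), {"address": "127.0.0.1/8"}))
state.put(Resource(Metadata("network", "AddressSpec", "lo/127.0.0.1/8"), {"address": "127.0.0.1/8", "link": "lo"}))
# The same type and ID in another namespace is a different resource.
state.put(Resource(Metadata("network-config", "AddressSpec", "lo/127.0.0.1/8")))
print(state.get("network", "AddressSpec", "lo/127.0.0.1/8").metadata.version)
```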
Controllers
Controllers run as independent lightweight threads in Talos.
The goal of the controller is to reconcile the state based on inputs and eventually update outputs.
A controller can have any number of resource types (and namespaces) as inputs.
In other words, it watches specified resources for changes and reconciles when these changes occur.
A controller might also have additional inputs: running reconcile on schedule, watching etcd keys, etc.
A controller has a single output: a set of resources of fixed type in a fixed namespace.
Only one controller can manage a given resource type in a namespace, so conflicts are avoided.
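The controller rules above (any number of inputs, a single output type, one owner per output) can be modeled in a toy Python runtime. Everything here is hypothetical: the Runtime class, the HostnameConfigController name, and the MachineConfig shape are illustrative, and the real runtime runs controllers concurrently rather than by direct recursion.

```python
from typing import Callable, Dict, List, Set, Tuple

Key = Tuple[str, str]  # (namespace, resource type)
State = Dict[Key, Dict[str, dict]]  # resources by ID, grouped by (namespace, type)


class Runtime:
    """Toy controller runtime: one owning controller per output (namespace, type)."""

    def __init__(self) -> None:
        self.owners: Dict[Key, str] = {}
        self.state: State = {}
        self.controllers: List[Tuple[str, Set[Key], Key, Callable[[State], Dict[str, dict]]]] = []

    def register(self, name: str, inputs: List[Key], output: Key,
                 reconcile: Callable[[State], Dict[str, dict]]) -> None:
        if output in self.owners:
            raise ValueError(f"{output} is already managed by {self.owners[output]}")
        self.owners[output] = name
        self.controllers.append((name, set(inputs), output, reconcile))

    def set_input(self, key: Key, resources: Dict[str, dict]) -> None:
        # External sources may not write controller-owned types.
        if key in self.owners:
            raise ValueError(f"{key} is owned by {self.owners[key]}")
        self._store(key, resources)

    def _store(self, key: Key, resources: Dict[str, dict]) -> None:
        self.state[key] = resources
        # Reconcile every controller that watches the changed (namespace, type).
        for _, inputs, output, reconcile in self.controllers:
            if key in inputs:
                self._store(output, reconcile(self.state))


def hostname_from_config(state: State) -> Dict[str, dict]:
    config = state.get(("config", "MachineConfig"), {})
    name = config.get("v1alpha1", {}).get("hostname", "")
    return {"hostname": {"hostname": name}}


rt = Runtime()
rt.register("HostnameConfigController", [("config", "MachineConfig")],
            ("network-config", "HostnameSpec"), hostname_from_config)
rt.set_input(("config", "MachineConfig"), {"v1alpha1": {"hostname": "node-1"}})
print(rt.state[("network-config", "HostnameSpec")])
```

Registering a second controller for ("network-config", "HostnameSpec") raises an error, which is the sketch's version of "only one controller can manage a resource type in a namespace".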
Querying Resources
The Talos CLI tool talosctl provides read-only access to the resource API, which includes getting a specific resource, listing resources, and watching for changes.
Talos stores resources describing resource types and namespaces in the meta namespace:
$ talosctl get resourcedefinitions
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta ResourceDefinition bootstrapstatuses.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition etcdsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetescontrolplaneconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetessecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition machineconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition machinetypes.config.talos.dev 1
172.20.0.2 meta ResourceDefinition manifests.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition manifeststatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition namespaces.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition resourcedefinitions.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition rootsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition secretstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition services.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpods.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpodstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition timestatuses.v1alpha1.talos.dev 1
$ talosctl get namespaces
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
172.20.0.2 meta Namespace controlplane 1
172.20.0.2 meta Namespace meta 1
172.20.0.2 meta Namespace runtime 1
172.20.0.2 meta Namespace secrets 1
Most of the time, the namespace flag (--namespace) can be omitted, as each ResourceDefinition contains a default namespace which is used if no namespace is given.
A resource definition also contains type aliases which can be used interchangeably with the canonical resource name:
$ talosctl get ns config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
Output
The talosctl get command supports the following output modes:
table (default) prints the resource list as a table
yaml prints pretty formatted resources with details, including full metadata and spec.
This format carries most details from the backend resource (e.g. comments in the MachineConfig resource)
json prints the same information as yaml, but some additional details (e.g. comments) might be lost.
This format is useful for automated processing with tools like jq.
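For processing in Python instead of jq, a stream of JSON objects can be decoded one document at a time with json.JSONDecoder.raw_decode. The sketch assumes the output may contain several top-level objects (one per resource) rather than a single array, and that each carries node, metadata, and spec fields as in the YAML example in this section; the sample data is hypothetical.

```python
import json
from typing import Iterator


def iter_documents(text: str) -> Iterator[dict]:
    """Yield each top-level JSON object from concatenated JSON output."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample shaped like the YAML output shown in this section.
sample = """
{"node": "172.20.0.2", "metadata": {"namespace": "runtime", "id": "timed"}, "spec": {}}
{"node": "172.20.0.2", "metadata": {"namespace": "runtime", "id": "udevd"}, "spec": {}}
"""
ids = [doc["metadata"]["id"] for doc in iter_documents(sample)]
print(ids)
```

Decoding document by document works whether the objects are separated by newlines or pretty-printed across several lines.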
Watching Changes
If the flag --watch is appended to the talosctl get command, the command switches to watch mode.
If a list of resources was requested, talosctl prints the initial contents of the list and then appends resource information for every change:
$ talosctl get svc -w
NODE * NAMESPACE TYPE ID VERSION RUNNING HEALTHY
172.20.0.2 + runtime Service timed 2 true true
172.20.0.2 + runtime Service trustd 2 true true
172.20.0.2 + runtime Service udevd 2 true true
172.20.0.2 - runtime Service timed 2 true true
172.20.0.2 + runtime Service timed 1 true false
172.20.0.2   runtime Service timed 2 true true
Column * specifies event type:
+ is created
- is deleted
(blank) is updated
In YAML/JSON output, the field event is added to the resource representation to describe the event type.
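A client consuming watch output can fold the events into a current view of the resources. The Python sketch below is a hypothetical consumer, not talosctl code; it represents the blank update marker as an empty string and replays the service events from the example above.

```python
from typing import Dict, Iterable, Tuple

# (event marker, resource ID, version, healthy); "" stands for the blank marker.
Event = Tuple[str, str, int, bool]


def replay(events: Iterable[Event]) -> Dict[str, Tuple[int, bool]]:
    """Fold watch events into the current view: resource ID -> (version, healthy)."""
    current: Dict[str, Tuple[int, bool]] = {}
    for marker, rid, version, healthy in events:
        if marker == "-":
            current.pop(rid, None)  # deleted
        elif marker in ("+", ""):
            current[rid] = (version, healthy)  # created or updated
        else:
            raise ValueError(f"unknown event marker: {marker!r}")
    return current


# The service events from the watch example, in order.
events = [
    ("+", "timed", 2, True),
    ("+", "trustd", 2, True),
    ("+", "udevd", 2, True),
    ("-", "timed", 2, True),
    ("+", "timed", 1, False),
    ("", "timed", 2, True),
]
print(replay(events))
```

After the replay, timed is back at version 2 and healthy, matching the last line of the watch output.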
Examples
Getting machine config:
$ talosctl get machineconfig -o yaml
node: 172.20.0.2
metadata:
    namespace: config
    type: MachineConfigs.config.talos.dev
    id: v1alpha1
    version: 2
    phase: running
spec:
    version: v1alpha1 # Indicates the schema used to decode the contents.
    debug: false # Enable verbose logging to the console.
    persist: true # Indicates whether to pull the machine config upon every boot.
    # Provides machine specific configuration options.
    ...
Getting control plane static pod statuses:
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-controlplane-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-controlplane-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-controlplane-1 4 True
The Talos network configuration subsystem is powered by COSI.
Talos translates network configuration from multiple sources: machine configuration, cloud metadata, network automatic configuration (e.g. DHCP) into COSI resources.
Network configuration and network state can be inspected using talosctl get command.
Network machine configuration can be modified using the talosctl edit mc command (or its variants talosctl patch mc and talosctl apply-config) without a reboot.
As API access requires a network connection, --mode=try can be used to test the configuration with automatic rollback, to avoid losing network access to the node.
Resources
There are six basic network configuration items in Talos:
Address (IP address assigned to the interface/link);
Route (route to a destination);
Link (network interface/link configuration);
Resolver (list of DNS servers);
Hostname (node hostname and domainname);
TimeServer (list of NTP servers).
Each network configuration item has two counterparts:
*Status (e.g. LinkStatus) describes the current state of the system (Linux kernel state);
*Spec (e.g. LinkSpec) defines the desired configuration.
Resource Status Spec
Address AddressStatus AddressSpec
Route RouteStatus RouteSpec
Link LinkStatus LinkSpec
Resolver ResolverStatus ResolverSpec
Hostname HostnameStatus HostnameSpec
TimeServer TimeServerStatus TimeServerSpec
Status resources have aliases with the Status suffix removed, so for example AddressStatus is also available as Address.
Talos networking controllers reconcile the state so that *Status equals the desired *Spec.
Observing State
The current network configuration state can be observed by querying *Status resources via talosctl:
$ talosctl get addresses
NODE NAMESPACE TYPE ID VERSION ADDRESS LINK
172.20.0.2 network AddressStatus eth0/172.20.0.2/24 1 172.20.0.2/24 eth0
172.20.0.2 network AddressStatus eth0/fe80::9804:17ff:fe9d:3058/64 2 fe80::9804:17ff:fe9d:3058/64 eth0
172.20.0.2 network AddressStatus flannel.1/10.244.4.0/32 1 10.244.4.0/32 flannel.1
172.20.0.2 network AddressStatus flannel.1/fe80::10b5:44ff:fe62:6fb8/64 2 fe80::10b5:44ff:fe62:6fb8/64 flannel.1
172.20.0.2 network AddressStatus lo/127.0.0.1/8 1 127.0.0.1/8 lo
172.20.0.2 network AddressStatus lo/::1/128 1 ::1/128 lo
In the output, there are addresses set up by Talos (e.g. eth0/172.20.0.2/24) and addresses set up by other facilities (e.g. flannel.1/10.244.4.0/32, set up by the CNI).
Talos networking controllers watch the kernel state and update resources accordingly.
Additional details about the address can be accessed via the YAML output:
The desired networking configuration is combined from multiple sources and presented
as *Spec resources:
$ talosctl get addressspecs
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network AddressSpec eth0/172.20.0.2/24 2
172.20.0.2 network AddressSpec lo/127.0.0.1/8 2
172.20.0.2 network AddressSpec lo/::1/128 2
These AddressSpecs are applied to the Linux kernel to reach the desired state.
If, for example, an AddressSpec is removed, the address is removed from the Linux network interface as well.
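Applying specs to the kernel amounts to computing a difference between the desired and applied sets. The Python sketch is a hypothetical model: plan_address_changes is made up, and it assumes the controller only considers addresses it previously applied from specs, so addresses created by other facilities (such as a CNI) are never candidates for removal.

```python
from typing import Set, Tuple


def plan_address_changes(desired: Set[str], applied: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) to bring kernel addresses in line with AddressSpecs.

    desired: IDs of the current AddressSpec resources.
    applied: addresses previously applied from specs. Addresses created by
    other facilities (e.g. a CNI) are never in this set (assumption).
    """
    return desired - applied, applied - desired


applied = {"eth0/172.20.0.2/24", "lo/127.0.0.1/8", "lo/::1/128"}
desired = {"lo/127.0.0.1/8", "lo/::1/128"}  # the eth0 AddressSpec was removed
to_add, to_remove = plan_address_changes(desired, applied)
print(sorted(to_add), sorted(to_remove))
```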
*Spec resources can’t be manipulated directly; they are generated automatically by Talos from multiple configuration sources (see the section below for details).
If a *Spec resource is queried in YAML format, some additional information is available:
An important field is the layer field, which describes the configuration layer this spec comes from: in this case, it is generated by a network operator (see below), specifically the DHCPv4 operator.
Configuration Merging
Spec resources described in the previous section show the final merged configuration state, while the initial specs are put into a separate, unmerged namespace, network-config.
Spec resources in the network-config namespace are merged with conflict resolution to produce the final merged representation in the network namespace.
Let’s take HostnameSpec as an example.
The final merged representation is:
We can see that the final configuration for the hostname is talos-default-controlplane-1.
And this is the hostname that was actually applied.
This can be verified by querying a HostnameStatus resource:
$ talosctl get hostnamestatus
NODE NAMESPACE TYPE ID VERSION HOSTNAME DOMAINNAME
172.20.0.2 network HostnameStatus hostname 1 talos-default-controlplane-1
Initial configuration for the hostname in the network-config namespace is:
We can see that there are two specs for the hostname:
one from the default configuration layer, which defines the hostname as talos-172-20-0-2 (derived from the default node address);
another from the operator layer, which defines the hostname as talos-default-controlplane-1 (DHCP).
Talos merges these two specs into a final HostnameSpec based on the configuration layer and merge rules.
Here is the order of precedence from low to high:
configuration (derived from the machine configuration).
So in our example the operator layer HostnameSpec overrides the default layer producing the final hostname talos-default-controlplane-1.
The merge process applies to all six core networking specs.
For each spec, the layer controls the merge behavior.
If multiple configuration specs appear at the same layer, they can be merged together if possible; otherwise, the merge result is stable but not defined (e.g. if DHCP on multiple interfaces provides two different hostnames for the node).
LinkSpecs are merged across layers, so for example, machine configuration for the interface MTU overrides an MTU set by the DHCP server.
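The two merge behaviors described above can be contrasted in Python. The layer ordering uses only the three layers named in this section (default, operator, configuration); treat it as an assumption, not the complete list. merge_whole picks the spec from the highest layer, as in the hostname example, while merge_fields lets each field come from the highest layer that sets it, as described for LinkSpecs. Both function names are hypothetical.

```python
from typing import Dict, List

# Precedence from low to high for the layers named in this section only;
# treat this mapping as an assumption, not the complete list of layers.
LAYER_PRIORITY = {"default": 0, "operator": 1, "configuration": 2}

Spec = Dict[str, object]


def merge_whole(specs: List[Spec]) -> Spec:
    """Whole-spec merge (e.g. hostname): the highest layer wins.

    max() returns the first maximal item, so a tie within one layer is
    resolved by input order: stable, but not meaningful.
    """
    return max(specs, key=lambda s: LAYER_PRIORITY[str(s["layer"])])


def merge_fields(specs: List[Spec]) -> Spec:
    """Field-wise merge (e.g. LinkSpec): each field comes from the highest layer that sets it."""
    merged: Spec = {}
    for spec in sorted(specs, key=lambda s: LAYER_PRIORITY[str(s["layer"])]):
        for key, value in spec.items():
            if key != "layer":
                merged[key] = value
    return merged


hostname = merge_whole([
    {"layer": "default", "hostname": "talos-172-20-0-2"},
    {"layer": "operator", "hostname": "talos-default-controlplane-1"},
])
link = merge_fields([
    {"layer": "operator", "name": "eth0", "mtu": 1500, "up": True},  # e.g. from DHCP
    {"layer": "configuration", "name": "eth0", "mtu": 9000},  # machine config
])
print(hostname["hostname"], link)
```

The machine configuration MTU wins over the DHCP-provided one, while fields the machine configuration does not set (here, up) are kept from the lower layer.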
Network Operators
Network operators provide dynamic network configuration which can change over time as the node is running:
DHCPv4
DHCPv6
Virtual IP
Network operators produce specs for addresses, routes, links, etc., which are then merged and applied according to the rules described above.
Operators are configured with OperatorSpec resources, which describe when operators should run and any additional configuration for the operator.
OperatorSpec resources are generated by Talos, mostly based on the machine configuration.
A DHCP4 operator is created automatically for all physical network links which are not configured explicitly via the kernel command line or the machine configuration.
This also means that on the first boot, without a machine configuration, a DHCP request is made on all physical network interfaces by default.
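The rule for creating DHCPv4 operators can be expressed as a filter over links. In this Python sketch the link inventory and the dhcp4_operator_ids helper are hypothetical; the dhcp4/<link> ID form follows the dhcp4/eth0 prefix used for operator specs in this section.

```python
from typing import Dict, List, Set


def dhcp4_operator_ids(links: Dict[str, bool], explicitly_configured: Set[str]) -> List[str]:
    """IDs of DHCPv4 operators to create.

    links maps a link name to True if it is a physical link. Links configured
    via the kernel command line or machine configuration are skipped.
    """
    return [
        f"dhcp4/{name}"
        for name, physical in sorted(links.items())
        if physical and name not in explicitly_configured
    ]


links = {"eth0": True, "eth1": True, "lo": False, "flannel.1": False}
print(dhcp4_operator_ids(links, set()))     # first boot, no machine config
print(dhcp4_operator_ids(links, {"eth1"}))  # eth1 configured explicitly
```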
Specs generated by operators are prefixed with the operator ID (e.g. dhcp4/eth0) in the unmerged network-config namespace:
$ talosctl -n 172.20.0.2 get addressspecs --namespace network-config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network-config AddressSpec dhcp4/eth0/eth0/172.20.0.2/24 1
Other Network Resources
There are some additional resources describing the network subsystem state.
The NodeAddress resource presents node addresses excluding link-local and loopback addresses:
$ talosctl get nodeaddresses
NODE NAMESPACE TYPE ID VERSION ADDRESSES
10.100.2.23 network NodeAddress accumulative 6 ["10.100.2.23","147.75.98.173","147.75.195.143","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress current 5 ["10.100.2.23","147.75.98.173","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress default 1 ["10.100.2.23"]
default is the node default address;
current is the set of addresses a node currently has;
accumulative is the set of addresses a node had over time (it might include virtual IPs which are not owned by the node at the moment).
NodeAddress resources are used to pick the default address for the etcd peer URL, to populate the SANs field in the generated certificates, etc.
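The NodeAddress filtering can be approximated with Python's standard ipaddress module. This is a sketch, not Talos code: the input prefixes are illustrative (partly taken from the output above), and "accumulative" is modeled simply as the union of successive snapshots of the current addresses.

```python
import ipaddress


def node_addresses(prefixes):
    """Keep host addresses, dropping loopback and link-local ones."""
    result = []
    for prefix in prefixes:
        addr = ipaddress.ip_interface(prefix).ip
        if addr.is_loopback or addr.is_link_local:
            continue
        result.append(str(addr))
    return result


current = node_addresses([
    "127.0.0.1/8",        # loopback: dropped
    "::1/128",            # loopback: dropped
    "fe80::1/64",         # link-local: dropped
    "10.100.2.23/24",
    "192.168.95.64/32",
])
print(current)  # ['10.100.2.23', '192.168.95.64']

# A later snapshot includes a virtual IP the node held only for a while;
# accumulative keeps every address ever seen.
accumulative = set()
for snapshot in (current, ["10.100.2.23", "147.75.195.143"]):
    accumulative.update(snapshot)
print(sorted(accumulative))  # ['10.100.2.23', '147.75.195.143', '192.168.95.64']
```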
Another important resource is Nodename, which provides the Node name in Kubernetes:
$ talosctl get nodename
NODE NAMESPACE TYPE ID VERSION NODENAME
10.100.2.23 controlplane Nodename nodename 1 infra-green-cp-mmf7v
Depending on the machine configuration, nodename might be just a hostname or the FQDN of the node.
NetworkStatus aggregates the current state of the network configuration.
For each of the six basic resource types, there are several controllers:
*StatusController populates *Status resources observing the Linux kernel state.
*ConfigController produces the initial unmerged *Spec resources in the network-config namespace based on defaults, kernel command line, and machine configuration.
*MergeController merges *Spec resources into the final representation in the network namespace.
*SpecController applies merged *Spec resources to the kernel state.
For the network operators:
OperatorConfigController produces OperatorSpec resources based on the machine configuration and defaults.
OperatorSpecController runs network operators watching OperatorSpec resources and producing various *Spec resources in the network-config namespace.
Configuration Sources
There are several configuration sources for the network configuration, which are described in this section.
Defaults
lo interface is assigned addresses 127.0.0.1/8 and ::1/128;
hostname is set to talos-<IP>, where IP is the default node address;
resolvers are set to 8.8.8.8, 1.1.1.1;
time servers are set to pool.ntp.org;
DHCP4 operator is run on any physical interface which is not configured explicitly.
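A sketch of how the default hostname can be derived. It assumes the dots of an IPv4 node address are replaced with dashes, as in the talos-172-20-0-2 example earlier on this page; default_hostname is a hypothetical helper, and IPv6 addresses are not handled.

```python
import ipaddress


def default_hostname(node_address):
    """Build talos-<IP> from an IPv4 address by replacing dots with dashes."""
    addr = ipaddress.IPv4Address(node_address)  # raises ValueError if not IPv4
    return "talos-" + str(addr).replace(".", "-")


print(default_hostname("172.20.0.2"))  # talos-172-20-0-2
```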
Cmdline
The kernel command line is parsed for the following options:
ip= option is parsed for node IP, default gateway, hostname, DNS servers, NTP servers;
bond= option is parsed for bonding interfaces and their options;
talos.hostname= option is used to set node hostname;
talos.network.interface.ignore= can be used to make Talos skip network interface configuration completely.
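The ip= option uses the positional format documented for the Linux kernel (client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf:dns0-ip:dns1-ip:ntp0-ip). The Python sketch below splits it into those fields. It is not Talos's parser, parse_ip_arg is a hypothetical name, and bracketed IPv6 addresses and short forms such as ip=dhcp are not interpreted.

```python
# Field order of the kernel's ip= parameter.
FIELDS = ["client_ip", "server_ip", "gateway", "netmask", "hostname",
          "device", "autoconf", "dns0", "dns1", "ntp0"]


def parse_ip_arg(cmdline):
    """Return the ip= fields from a kernel command line, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("ip="):
            values = arg[len("ip="):].split(":")
            # Omitted trailing fields are treated as empty strings.
            values += [""] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    return None


cfg = parse_ip_arg(
    "console=tty0 ip=10.0.0.5::10.0.0.1:255.255.255.0:node1:eth0:off:1.1.1.1")
print(cfg["client_ip"], cfg["gateway"], cfg["hostname"], cfg["dns0"])
# 10.0.0.5 10.0.0.1 node1 1.1.1.1
```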
Platform
Platform configuration delivers cloud environment-specific options (e.g. the hostname).
Platform configuration is specific to the environment metadata: for example, on Equinix Metal, Talos automatically
configures public and private IPs, routing, link bonding, and the hostname.
Platform configuration is cached across reboots in /system/state/platform-network.yaml.
Operator
Network operators provide configuration for all basic resource types.
Machine Configuration
The machine configuration is parsed for link configuration, addresses, routes, hostname,
resolvers and time servers.
Any changes to .machine.network configuration can be applied in immediate mode.
Network Configuration Debugging
Most network controller operations and failures are logged to the kernel console;
additional debug-level logs are available with the talosctl logs controller-runtime command.
If the network configuration can’t be established and the API is not available, debug-level
logs can be sent to the console with the debug: true option in the machine configuration.
6.8 - Network Connectivity
Description of the Networking Connectivity needed by Talos Linux
Configuring Network Connectivity
The simplest way to deploy Talos is to ensure that all the remote components of the system (talosctl, the control plane nodes, and the worker nodes) have layer 2 connectivity.
This is not always possible, however, so this page lays out the minimal network access that is required to configure and operate a Talos cluster.
Note: These are the ports required for Talos specifically, and should be configured in addition to the ports required by Kubernetes.
See the Kubernetes docs for information on the ports used by Kubernetes itself.
Ports marked with a * are not currently configurable, but that may change in the future.
Follow along here.
6.9 - KubeSpan
Understand more about KubeSpan for Talos Linux.
WireGuard Peer Discovery
The key pieces of information needed for WireGuard generally are:
the public key of the host you wish to connect to
an IP address and port of the host you wish to connect to
The latter is really only required on one side of the pair.
Once traffic is received, that information is learned and updated by WireGuard automatically.
Kubernetes, though, also needs to know which traffic goes to which WireGuard peer.
Because this information may be dynamic, we need a way to keep this information up to date.
If we already have a connection to Kubernetes, it’s fairly easy: we can just keep that information in Kubernetes.
Otherwise, we have to have some way to discover it.
Talos Linux implements a multi-tiered approach to gathering this information.
Each tier can operate independently, but the amalgamation of the mechanisms produces a more robust set of connection criteria.
The Kubernetes-based system utilizes annotations on Kubernetes Nodes which describe each node’s public key and local addresses.
On top of this, KubeSpan can optionally route Pod subnets.
This is usually taken care of by the CNI, but there are many situations where the CNI is unable to do this itself across networks.
NAT, Multiple Routes, Multiple IPs
One of the difficulties in communicating across networks is that there is often not a single address and port which can identify a connection for each node on the system.
For instance, a node sitting on the same network might see its peer as 192.168.2.10, but a node across the internet may see it as 2001:db8:1ef1::10.
We need to be able to handle any number of addresses and ports, and we also need to have a mechanism to try them.
WireGuard only allows us to select one at a time.
KubeSpan implements a controller which continuously discovers and rotates these IP:port pairs until a connection is established.
It then starts trying again if that connection ever fails.
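The rotation can be pictured as cycling through candidate IP:port pairs until a handshake succeeds. In this sketch, try_connect is a hypothetical callback standing in for a WireGuard handshake check; the addresses come from the example above, while port 51820 and the max_attempts cap are illustrative assumptions.

```python
import itertools


def find_working_endpoint(candidates, try_connect, max_attempts):
    """Try candidate (ip, port) endpoints in turn until one connects.

    WireGuard holds only one endpoint per peer at a time, so candidates are
    tried one after another, wrapping around, for at most max_attempts tries.
    """
    for attempt, endpoint in enumerate(itertools.cycle(candidates)):
        if attempt >= max_attempts:
            break
        if try_connect(endpoint):
            return endpoint
    return None


candidates = [("192.168.2.10", 51820), ("2001:db8:1ef1::10", 51820)]
reachable = {("2001:db8:1ef1::10", 51820)}  # pretend only the IPv6 path works

print(find_working_endpoint(candidates, lambda ep: ep in reachable, 10))
# ('2001:db8:1ef1::10', 51820)
```

If the established connection later fails, a controller built this way would simply call the function again to resume rotating.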
Packet Routing
After we have established a WireGuard connection, we have to make sure that the right packets get sent to the WireGuard interface.
WireGuard supplies a convenient facility for tagging packets which come from it, which is great.
But in our case, we need traffic that neither comes from WireGuard nor is destined for another Kubernetes node to flow through the normal mechanisms.
Unlike many corporate or privacy-oriented VPNs, we need to allow general internet traffic to flow normally.
Also, as our cluster grows, this set of IP addresses can become quite large and quite dynamic.
This would be very cumbersome and slow in iptables.
Luckily, the kernel supplies a convenient mechanism by which to define this arbitrarily large set of IP addresses: IP sets.
Talos collects all of the IPs and subnets which are considered “in-cluster” and maintains these in the kernel as an IP set.
Now that we have the IP set defined, we need to tell the kernel how to use it.
The traditional way of doing this would be to use iptables.
However, there is a big problem with IPTables.
It is a common namespace in which any number of other pieces of software may dump things.
We have no surety that what we add will not be wiped out by something else (from Kubernetes itself, to the CNI, to some workload application), be rendered unusable by higher-priority rules, or just generally cause trouble and conflicts.
Instead, we use a three-pronged system which is both more foundational and less centralised.
NFTables offers a separately namespaced, decentralised way of marking packets for later processing based on IP sets.
Instead of a common set of well-known tables, NFTables uses hooks into the kernel’s netfilter system, which are less vulnerable to being usurped, bypassed, or a source of interference than IPTables, but which are rendered down by the kernel to the same underlying XTables system.
Our NFTables system is where we store the IP sets.
Any packet which enters the system, either by forward from inside Kubernetes or by generation from the host itself, is compared against a hash table of this IP set.
If it is matched, it is marked for later processing by our next stage.
This is a high-performance system which exists fully in the kernel and which ultimately becomes an eBPF program, so it scales well to hundreds of nodes.
The next stage is the kernel router’s route rules.
These are defined as a common ordered list of operations for the whole operating system, but they are intended to be tightly constrained and are rarely used by applications in any case.
The rules we add are very simple: if a packet is marked by our NFTables system, send it to an alternate routing table.
This leads us to our third and final stage of packet routing.
We have a custom routing table with two routes:
send all IPv4 traffic to the WireGuard interface
send all IPv6 traffic to the WireGuard interface
So in summary, we:
mark packets destined for Kubernetes applications or Kubernetes nodes
send marked packets to a special routing table
send anything which is sent to that routing table through the WireGuard interface
This gives us an isolated, resilient, tolerant, and non-invasive way to route Kubernetes traffic safely, automatically, and transparently through WireGuard across almost any set of network topologies.
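The three stages can be modeled in a few lines of Python. This is a simplified simulation rather than a description of the kernel in full: it reuses the mark values (0x20, 0x40, mask 0x60) and table number 180 given in the Design Decisions section below, while the target prefixes and the route_for helper are hypothetical.

```python
import ipaddress

FWMARK_MASK = 0x60     # the two firewall-mark bits KubeSpan uses
FROM_WIREGUARD = 0x20  # mark carried by packets WireGuard itself generates
TO_WIREGUARD = 0x40    # mark meaning "route this through WireGuard"
KUBESPAN_TABLE = 180   # default KubeSpan routing table number


def route_for(dst, mark, targets):
    """Return which routing path a packet ends up on (simplified model)."""
    addr = ipaddress.ip_address(dst)

    # Stage 1 (NFTables): accept WireGuard's own packets untouched; mark
    # packets whose destination is inside one of the in-cluster prefixes.
    if (mark & FWMARK_MASK) != FROM_WIREGUARD:
        if any(addr in net for net in targets):
            mark = (mark & 0xFFFFFFDF) | TO_WIREGUARD

    # Stage 2 (ip rule): fwmark 0x40/0x60 selects the KubeSpan table.
    if (mark & FWMARK_MASK) == TO_WIREGUARD:
        # Stage 3 (routing table): its routes all point at WireGuard.
        return KUBESPAN_TABLE
    return "normal routing"


# Hypothetical in-cluster prefixes (pod and node subnets).
targets = [
    ipaddress.ip_network("10.244.0.0/16"),
    ipaddress.ip_network("fd00:10::/64"),
]

print(route_for("10.244.1.5", 0x0, targets))   # 180
print(route_for("8.8.8.8", 0x0, targets))      # normal routing
print(route_for("10.244.1.5", 0x20, targets))  # normal routing (from WireGuard)
```

The third call shows why WireGuard's own packets are accepted early: re-marking them would send them back into WireGuard and create a loop.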
Design Decisions
Routing
Routing for Wireguard is a touch complicated when the set of possible peer
endpoints includes at least one member of the set of destinations.
That is, packets from Wireguard to a peer endpoint should not be sent to
Wireguard, lest a loop be created.
In order to handle this situation, Wireguard provides the ability to mark
packets which it generates, so their routing can be handled separately.
In our case, though, we actually want the inverse of this: we want to route
Wireguard packets however the normal networking routes and rules say they should
be routed, while packets destined for the other side of Wireguard Peers should
be forced into Wireguard interfaces.
While IP Rules allow you to invert matches, they do not support matching based
on IP sets.
That means, to use simple rules, we would have to add a rule for
each destination, which could reach into hundreds or thousands of rules to
manage.
This is not really much of a performance issue, but it is a management
issue, since it is expected that we would not be the only manager of rules in
the system, and rules offer no facility to tag for ownership.
IP Sets are supported by IPTables, and we could integrate there.
However, IPTables exists in a global namespace, which makes it fragile having
multiple parties manipulating it.
The newer NFTables replacement for IPTables, though, allows users to
independently hook into various points of XTables, keeping all such rules and
sets independent.
This means that regardless of what CNIs or other user-side routing rules may do,
our KubeSpan setup will not be messed up.
Therefore, we utilise NFTables (which natively supports IP sets and owner
grouping) instead, to mark matching traffic which should be sent to the
Wireguard interface.
This way, we can keep all our KubeSpan set logic in one place, allowing us to
simply use a single ip rule match for our fwmark, which sends matched packets
to a separate routing table with one route: default to the Wireguard interface.
So we have three components:
A routing table for Wireguard-destined packets
An NFTables table which defines the set of destinations; packets sent to
these destinations will be marked with our firewall mark.
Hook into PreRouting (type Filter)
Hook into Outgoing (type Route)
One IP Rule which sends packets marked with our firewall mark to our Wireguard
routing table.
Routing Table
The routing table (number 180 by default) is simple, containing a single route for each family: send everything through the Wireguard interface.
NFTables
The logic inside NFTables is fairly simple.
First, everything is compiled into a single table: talos_kubespan.
Next, two chains are set up: one for the prerouting hook (kubespan_prerouting)
and the other for the outgoing hook (kubespan_outgoing).
We define two sets of target IP prefixes: one for IPv6 (kubespan_targets_ipv6)
and the other for IPv4 (kubespan_targets_ipv4).
Last, we add rules to each chain which basically specify:
If the packet is marked as from Wireguard, just accept it and terminate
the chain.
If the packet matches an IP in either of the target IP sets, mark that
packet with the to-Wireguard mark.
Rules
There are two route rules defined: one to match IPv6 packets and the other to
match IPv4 packets.
These rules say the same thing for each: if the packet is marked that it should
go to Wireguard, send it to the Wireguard
routing table.
Firewall Mark
KubeSpan uses only two bits of the firewall mark, with the mask 0x00000060.
Note: if other software on the node is using the bits 0x60 of the firewall mark, this
might cause conflicts and break KubeSpan.
At the time of writing, it was confirmed that Calico CNI uses bits 0xffff0000 and
Cilium CNI uses bits 0xf00, so KubeSpan is compatible with both.
Flannel CNI uses the 0x4000 mask, so it is also compatible.
In the routing rules table, we match on the mark 0x40 with the mask 0x60:
32500: from all fwmark 0x40/0x60 lookup 180
In the NFTables table, we match with the same mask 0x60, and we set the mark by modifying only
bits within the 0x60 mask:
meta mark & 0x00000060 == 0x00000020 accept
ip daddr @kubespan_targets_ipv4 meta mark set meta mark & 0xffffffdf | 0x00000040 accept
ip6 daddr @kubespan_targets_ipv6 meta mark set meta mark & 0xffffffdf | 0x00000040 accept
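The mask arithmetic above can be checked directly. The sketch below mirrors the nft set-mark expression and the ip rule match in Python, and confirms that the masks listed for Calico, Cilium, and Flannel do not overlap KubeSpan's 0x60 bits; the helper names and the sample Cilium mark value 0x200 are illustrative.

```python
KUBESPAN_MASK = 0x00000060

# Firewall-mark bits reported above for other CNIs.
CNI_MASKS = {
    "Calico": 0xFFFF0000,
    "Cilium": 0x00000F00,
    "Flannel": 0x00004000,
}


def set_to_wireguard(mark):
    """Mirror: meta mark set meta mark & 0xffffffdf | 0x00000040."""
    return (mark & 0xFFFFFFDF) | 0x00000040


def rule_matches(mark):
    """Mirror: fwmark 0x40/0x60 lookup 180."""
    return (mark & KUBESPAN_MASK) == 0x40


for name, mask in CNI_MASKS.items():
    print(f"{name}: overlap with KubeSpan bits = {mask & KUBESPAN_MASK:#x}")
# Each line reports 0x0.

# Setting the KubeSpan mark leaves another CNI's bits untouched.
marked = set_to_wireguard(0x00000200)
print(hex(marked), rule_matches(marked))  # 0x240 True
```

Clearing bit 0x20 and then setting 0x40 means the two KubeSpan bits always end up as 0x40 after marking, whatever they held before.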
6.10 - Process Capabilities
Understand the Linux process capabilities restrictions with Talos Linux.
Linux defines a set of process capabilities that can be used to fine-tune the process permissions.
For security reasons, Talos Linux prevents any process from gaining the following capabilities:
CAP_SYS_MODULE (loading kernel modules)
CAP_SYS_BOOT (rebooting the system)
This means that any process, including privileged Kubernetes pods, will not be able to get these capabilities.
If you see the following error on starting a pod, make sure it doesn’t have any of the capabilities listed above in the spec:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply caps: operation not permitted: unknown
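To catch this before deploying, a container spec can be checked for the blocked capabilities. The hypothetical helper below works on a container entry parsed from a pod manifest into a dict, accepts names with or without the CAP_ prefix, and only inspects securityContext.capabilities.add; it is a sketch rather than a complete policy check.

```python
# Capabilities Talos Linux never grants, spelled without the CAP_ prefix
# as Kubernetes securityContext entries usually are.
BLOCKED = {"SYS_MODULE", "SYS_BOOT"}


def blocked_capabilities(container):
    """Return blocked capabilities requested in a container spec (sketch)."""
    security = container.get("securityContext") or {}
    capabilities = security.get("capabilities") or {}
    requested = capabilities.get("add") or []
    found = set()
    for cap in requested:
        name = cap.upper()
        if name.startswith("CAP_"):
            name = name[len("CAP_"):]
        if name in BLOCKED:
            found.add(name)
    return sorted(found)


container = {
    "name": "driver-loader",
    "securityContext": {
        "privileged": True,
        "capabilities": {"add": ["NET_ADMIN", "SYS_MODULE"]},
    },
}
print(blocked_capabilities(container))  # ['SYS_MODULE']
```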
Note: even with CAP_SYS_MODULE capability, Linux kernel module loading is restricted by requiring a valid signature.
Talos Linux creates a throwaway signing key during the kernel build, so it’s not possible to build and sign a kernel module for Talos Linux outside of the build process.
6.11 - talosctl
The design and use of the Talos Linux control application.
The talosctl tool acts as a reference implementation for the Talos API, but it also handles a lot of
conveniences for the use of Talos and its clusters.
Video Walkthrough
To see some live examples of talosctl usage, view the following video:
Client Configuration
The talosctl configuration is located in $XDG_CONFIG_HOME/talos/config.yaml if $XDG_CONFIG_HOME is defined.
Otherwise it is in $HOME/.talos/config.
The location can always be overridden by the TALOSCONFIG environment variable or the --talosconfig parameter.
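The lookup order can be sketched as follows. The relative precedence of the --talosconfig flag over the TALOSCONFIG variable is an assumption (the text above only says either one overrides the default), and talosconfig_path is a hypothetical helper.

```python
import os


def talosconfig_path(flag_value=None, env=None):
    """Resolve the talosctl client configuration path (sketch).

    Assumed order: --talosconfig flag, then TALOSCONFIG, then
    $XDG_CONFIG_HOME/talos/config.yaml, then $HOME/.talos/config.
    """
    env = os.environ if env is None else env
    if flag_value:
        return flag_value
    if env.get("TALOSCONFIG"):
        return env["TALOSCONFIG"]
    if env.get("XDG_CONFIG_HOME"):
        return os.path.join(env["XDG_CONFIG_HOME"], "talos", "config.yaml")
    return os.path.join(env.get("HOME", ""), ".talos", "config")


print(talosconfig_path(env={"HOME": "/home/user"}))
# /home/user/.talos/config
```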
Like kubectl, talosctl uses the concept of configuration contexts, so any number of Talos clusters can be managed with a single configuration file.
It also comes with some intelligent tooling to manage the merging of new contexts into the config.
The default operation is a non-destructive merge, where if a context of the same name already exists in the file, the context to be added is renamed by appending an index number.
You can easily overwrite instead, as well.
See the talosctl config help for more information.
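The non-destructive merge can be sketched like this. The <name>-<n> suffix format is an assumption for illustration, not necessarily the exact naming talosctl uses, and merge_context is a hypothetical helper; the same idea applies to the kubeconfig merge described below.

```python
def merge_context(contexts, name, context, overwrite=False):
    """Add a context, renaming it on a name clash unless overwrite is set.

    Returns the new mapping and the name the context was stored under;
    the input mapping is left unchanged.
    """
    merged = dict(contexts)
    new_name = name
    if not overwrite:
        index = 1
        while new_name in merged:
            # Hypothetical "<name>-<n>" naming for the index-appended context.
            new_name = f"{name}-{index}"
            index += 1
    merged[new_name] = context
    return merged, new_name


existing = {"prod": {"endpoints": ["10.0.0.2"]}}
merged, stored_as = merge_context(existing, "prod", {"endpoints": ["10.0.1.2"]})
print(stored_as)  # prod-1
```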
Endpoints and Nodes
endpoints are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
If multiple endpoints are specified, the client will automatically load
balance and fail over between them.
It is recommended that these point to the set of control plane nodes, either directly or through a load balancer.
Each endpoint will automatically proxy requests destined to another node through it, so it is not necessary to change the endpoint configuration just because you wish to talk to a different node within the cluster.
Endpoints do, however, need to be members of the same Talos cluster as the target node, because these proxied connections rely on certificate-based authentication.
The node is the target node on which you wish to perform the API call.
While you can configure the target node (or even a set of target nodes) inside the talosctl configuration file, it is recommended not to do so, but to explicitly declare the target node(s) using the -n or --nodes command-line parameter.
When specifying nodes, use their IPs and/or hostnames as seen by the endpoint servers, not as seen from the client.
This is because all connections are proxied first through the endpoints.
Kubeconfig
The configuration for accessing a Talos Kubernetes cluster is obtained with talosctl.
By default, talosctl will safely merge the cluster into the default kubeconfig.
Like talosctl itself, in the event of a naming conflict, the new context name will be index-appended before insertion.
The --force option can be used to overwrite instead.
You can also specify an alternate path by supplying it as a positional parameter.
Thus, like Talos clusters themselves, talosctl makes it easy to manage any
number of Kubernetes clusters from the same workstation.
Commands
Please see the CLI reference for the entire list of commands which are available from talosctl.
6.12 - FAQs
Frequently Asked Questions about Talos Linux.
How is Talos different from other container optimized Linux distros?
Talos integrates tightly with Kubernetes, and is not meant to be a general-purpose operating system.
The most important difference is that Talos is fully controlled by an API via a gRPC interface, instead of an ordinary shell.
We don’t ship SSH, and there is no console access.
Removing components such as these has allowed us to dramatically reduce the footprint of Talos, and in turn, improve a number of other areas like security, predictability, reliability, and consistency across platforms.
It’s a big change from how operating systems have been managed in the past, but we believe that API-driven OSes are the future.
Why no shell or SSH?
Since Talos is fully API-driven, all maintenance and debugging operations are possible via the OS API.
We would like for Talos users to start thinking about what a “machine” is in the context of a Kubernetes cluster.
That is, that a Kubernetes cluster can be thought of as one massive machine, and the nodes are merely additional, undifferentiated resources.
We don’t want humans to focus on the nodes, but rather on the machine that is the Kubernetes cluster.
Should an issue arise at the node level, talosctl should provide the necessary tooling to assist in the identification, debugging, and remediation of the issue.
However, the API is based on the Principle of Least Privilege, and exposes only a limited set of methods.
We envision Talos being a great place for the application of control theory in order to provide a self-healing platform.
Why the name “Talos”?
Talos was an automaton created by the Greek God of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land.
We felt it was a fitting name for a security-focused operating system designed to run Kubernetes.
Why does Talos rely on a separate configuration from Kubernetes?
The talosconfig file contains client credentials to access the Talos Linux API.
Sometimes Kubernetes might be down for a number of reasons (etcd issues, misconfiguration, etc.), while Talos API access will always be available.
The Talos API is a way to access the operating system and fix issues, e.g. fixing access to Kubernetes.
When Talos Linux is running fine, using the Kubernetes APIs (via kubeconfig) is all you should need to deploy and manage Kubernetes workloads.
How does Talos handle certificates?
During the machine config generation process, Talos generates a set of certificate authorities (CAs) that remain valid for 10 years.
Talos is responsible for managing certificates for etcd, Talos API (apid), node certificates (kubelet), and other components.
It also handles the automatic rotation of server-side certificates.
However, client certificates such as talosconfig and kubeconfig are the user’s responsibility, and by default, they have a validity period of 1 year.
To renew the talosconfig certificate, follow this process.
To renew kubeconfig, use the talosctl kubeconfig command; the time-to-live (TTL) is defined in the configuration.
How can I set the timezone of my Talos Linux clusters?
Talos doesn’t support timezones, and will always run in UTC.
This ensures consistency of log timestamps for all Talos Linux clusters, simplifying debugging.
Your containers can run with any timezone configuration you desire, but the timezone of Talos Linux is not configurable.
How do I see Talos kernel configuration?
Using Talos API
Current kernel config can be read with talosctl -n <NODE> read /proc/config.gz.
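Once the file has been copied off the node, its options can be inspected programmatically. This Python sketch assumes the raw gzip bytes were saved locally (for example by redirecting the talosctl read output above to a file); parse_kernel_config is a hypothetical helper, and the sample input below is made up rather than taken from a real Talos kernel.

```python
import gzip


def parse_kernel_config(data):
    """Parse gzip-compressed kernel config bytes into a dict.

    "CONFIG_X=value" lines map to their value; "# CONFIG_Y is not set"
    lines map to None; other comments and blank lines are skipped.
    """
    options = {}
    for line in gzip.decompress(data).decode().splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = None
    return options


# Made-up sample input, compressed the same way as /proc/config.gz.
sample = gzip.compress(b"CONFIG_EXAMPLE_A=y\n# CONFIG_EXAMPLE_B is not set\n")
print(parse_kernel_config(sample))
# {'CONFIG_EXAMPLE_A': 'y', 'CONFIG_EXAMPLE_B': None}
```

For a real node, read the saved file in binary mode and pass its contents to parse_kernel_config.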
Generating Talos Linux ISO image with custom kernel arguments
Pass additional kernel arguments using --extra-kernel-arg flag:
$ docker run --rm -i ghcr.io/siderolabs/imager:v1.9.0 iso --arch amd64 --tar-to-stdout --extra-kernel-arg console=ttyS1 --extra-kernel-arg console=tty0 | tar xz
2022/05/25 13:18:47 copying /usr/install/amd64/vmlinuz to /mnt/boot/vmlinuz
2022/05/25 13:18:47 copying /usr/install/amd64/initramfs.xz to /mnt/boot/initramfs.xz
2022/05/25 13:18:47 creating grub.cfg
2022/05/25 13:18:47 creating ISO
The ISO will be output to the file talos-<arch>.iso in the current directory.
Logging Kubernetes audit logs with loki
If you are using the loki-stack Helm chart to gather logs from the Kubernetes cluster, you can use the Helm values to configure loki-stack to log Kubernetes API server audit logs:
promtail:
  extraArgs:
    - -config.expand-env
  # this is required so that the promtail process can read the kube-apiserver audit logs written as `nobody` user
  containerSecurityContext:
    capabilities:
      add:
        - DAC_READ_SEARCH
  extraVolumes:
    - name: audit-logs
      hostPath:
        path: /var/log/audit/kube
  extraVolumeMounts:
    - name: audit-logs
      mountPath: /var/log/audit/kube
      readOnly: true
  config:
    snippets:
      extraScrapeConfigs: |
        - job_name: auditlogs
          static_configs:
            - targets:
                - localhost
              labels:
                job: auditlogs
                host: ${HOSTNAME}
                __path__: /var/log/audit/kube/*.log
Setting CPU scaling governor
While it’s possible to set the CPU scaling governor via .machine.sysfs, it’s sometimes cumbersome to set it for all CPUs individually.
A more elegant approach is to set it via a kernel command line parameter.
This also means that the option is applied very early in the boot process.
This can be set in the machineconfig via the snippet below: