If you’re interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you!
We hold a weekly meeting that all audiences are welcome to attend.
We would appreciate your feedback so that we can make Talos even better!
To do so, you can take our survey.
You can subscribe to this meeting by joining the community forum above.
Enterprise
If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help.
Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.
Talos is a container optimized Linux distro; a reimagining of Linux for distributed systems such as Kubernetes.
It is designed to be as minimal as possible while still maintaining practicality.
For these reasons, Talos has a number of features unique to it:
it is immutable
it is atomic
it is ephemeral
it is minimal
it is secure by default
it is managed via a single declarative configuration file and gRPC API
Talos can be deployed on container, cloud, virtualized, and bare metal platforms.
Why Talos
In having less, Talos offers more.
Security.
Efficiency.
Resiliency.
Consistency.
All of these areas are improved simply by having less.
1.2 - Quickstart
The easiest way to try Talos is by using the CLI (talosctl) to create a cluster on a machine with docker installed.
Download kubectl via one of the methods outlined in the documentation.
Create the Cluster
Now run the following:
talosctl cluster create
Verify that you can reach Kubernetes:
$ kubectl get nodes -o wide
NAME                     STATUS   ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-default-master-1   Ready    master   115s   v1.20.2   10.5.0.2      <none>        Talos (v0.9.0)   <host kernel>    containerd://1.4.3
talos-default-worker-1   Ready    <none>   115s   v1.20.2   10.5.0.3      <none>        Talos (v0.9.0)   <host kernel>    containerd://1.4.3
Destroy the Cluster
When you are all done, remove the cluster:
talosctl cluster destroy
1.3 - Getting Started
This document will walk you through installing a full Talos Cluster.
You may wish to read through the Quickstart first, to quickly create a local virtual cluster on your workstation.
Regardless of where you run Talos, you will find that there is a pattern to deploying it.
In general you will need to:
acquire the installation image
decide on the endpoint for Kubernetes
optionally create a load balancer
configure Talos
configure talosctl
bootstrap Kubernetes
Prerequisites
talosctl
The talosctl tool is a CLI which interfaces with the Talos API in
an easy manner.
It also includes a number of useful tools for creating and managing your clusters.
When booted from the ISO, Talos will run in RAM, and it will not install itself
until it is provided a configuration.
Thus, it is safe to boot the ISO onto any machine.
Alternative Booting
If you wish to use a different boot mechanism (such as network boot or a custom ISO), there
are a number of required kernel parameters.
In order to configure Kubernetes and bootstrap the cluster, Talos needs to know
what the endpoint (DNS name or IP address) of the Kubernetes API Server will be.
The endpoint should be the fully-qualified HTTP(S) URL for the Kubernetes API
Server, which (by default) runs on port 6443 using HTTPS.
Thus, the format of the endpoint may be something like:
https://192.168.0.10:6443
https://kube.mycluster.mydomain.com:6443
https://[2001:db8:1234::80]:6443
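As a minimal sketch of the endpoint format described above, the following shell function assembles the URL from a host and an optional port. The function name k8s_endpoint is hypothetical (it is not part of talosctl); it assumes HTTPS and the default port 6443, and wraps IPv6 literals in brackets.

```shell
# Hypothetical helper, not part of talosctl: print the Kubernetes API
# endpoint URL for a host and an optional port (default 6443).
k8s_endpoint() {
  host=$1
  port=${2:-6443}
  case $host in
    \[*\]) ;;                # already bracketed, leave as-is
    *:*) host="[$host]" ;;   # an IPv6 literal contains ':' and needs brackets
  esac
  printf 'https://%s:%s\n' "$host" "$port"
}

k8s_endpoint 192.168.0.10
k8s_endpoint kube.mycluster.mydomain.com
k8s_endpoint 2001:db8:1234::80
```

The three calls print the three example endpoints listed above.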
Because the Kubernetes controlplane is meant to be supplied in a high
availability manner, we must also choose how to bind it to the servers
themselves.
There are three common ways to do this.
Dedicated Load-balancer
If you are using a cloud provider or have your own load-balancer available (such
as HAProxy, nginx reverse proxy, or an F5 load-balancer), using
a dedicated load balancer is a natural choice.
Just create an appropriate frontend matching the endpoint, and point the backends at each of the addresses of the Talos controlplane nodes.
This is convenient if a load-balancer is available, but don’t worry if that is
not the case.
Layer 2 Shared IP
Talos has integrated support for serving Kubernetes from a shared (sometimes
called “virtual”) IP address.
This method relies on OSI Layer 2 connectivity between controlplane Talos nodes.
In this case, we may choose an IP address on the same subnet as the Talos
controlplane nodes which is not otherwise assigned to any machine.
For instance, if your controlplane node IPs are:
192.168.0.10
192.168.0.11
192.168.0.12
You could choose the IP 192.168.0.15 as your shared IP address.
Just make sure that 192.168.0.15 is not used by any other machine and that your DHCP
server will not serve it to any other machine.
Once chosen, form the full HTTPS URL from this IP:
https://192.168.0.15:6443
You are also free to set a DNS record to this IP address instead, but you will
still need to use the IP address to set up the shared IP
(machine.network.interfaces[].vip.ip) inside the Talos
configuration.
For more information about using a shared IP, see the related guide.
DNS records
If neither of the other methods work for you, you can instead use DNS records to
provide a measure of redundancy.
In this case, you would add multiple A or AAAA records for a DNS name.
For instance, you could add:
kube.cluster1.mydomain.com IN A 192.168.0.10
kube.cluster1.mydomain.com IN A 192.168.0.11
kube.cluster1.mydomain.com IN A 192.168.0.12
Then, your endpoint would be:
https://kube.cluster1.mydomain.com:6443
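For illustration only, the records above could be generated from a list of controlplane addresses with a small shell function. The name dns_records is hypothetical, and the output is plain zone-file style text; how it is loaded into your DNS server depends on your environment.

```shell
# Hypothetical helper: print one A or AAAA record per controlplane address.
# Usage: dns_records <dns-name> <ip>...
dns_records() {
  name=$1
  shift
  for ip in "$@"; do
    case $ip in
      *:*) type=AAAA ;;   # IPv6 addresses contain ':'
      *)   type=A ;;
    esac
    printf '%s IN %s %s\n' "$name" "$type" "$ip"
  done
}

dns_records kube.cluster1.mydomain.com 192.168.0.10 192.168.0.11 192.168.0.12
```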
Decide how to access the Talos API
Since Talos is entirely API-driven, it is important to know how you are going to
access that API.
Talos comes with a number of mechanisms to make that easier.
Controlplane nodes can proxy requests for worker nodes.
This means that you only need access to the controlplane nodes in order to access
the rest of the network.
This is useful for security (your worker nodes do not need to have
public IPs or be otherwise connected to the Internet), and it also makes working
with highly-variable clusters easier, since you only need to know the
controlplane nodes in advance.
Even better, the talosctl tool will automatically load balance and fail over
between all of your controlplane nodes, so long as it is informed of each of the
controlplane node IPs.
That does, of course, present the problem that you need to know how to talk to
the controlplane nodes.
In some environments, it is easy to be able to forecast, prescribe, or discover
the controlplane node IP addresses.
For others, though, even the controlplane nodes are dynamic, unpredictable, and
undiscoverable.
The dynamic options above for the Kubernetes API endpoint also apply to the
Talos API endpoints.
The difference is that the Talos API runs on port 50000/tcp.
Whichever way you wish to access the Talos API, be sure to note the IP(s) or
hostname(s) so that you can configure your talosctl tool’s endpoints below.
Configure Talos
When Talos boots without a configuration, such as when using the Talos ISO, it
enters a limited maintenance mode and waits for a configuration to be provided.
Alternatively, the Talos installer can be booted with the talos.config kernel
commandline argument set to an HTTP(s) URL from which it should receive its
configuration.
In cases where a PXE server is available, this is much more efficient than
manually configuring each node.
If you do use this method, just note that Talos does require a number of other
kernel commandline parameters.
See the required kernel parameters for more information.
In either case, we need to generate the configuration which is to be provided.
Luckily, the talosctl tool comes with a configuration generator for exactly
this purpose.
talosctl gen config "cluster-name" "cluster-endpoint"
Here, cluster-name is an arbitrary name for the cluster which will be used
in your local client configuration as a label.
It does not affect anything in the cluster itself.
It is arbitrary, but it should be unique in the configuration on your local workstation.
The cluster-endpoint is where you insert the Kubernetes Endpoint you
selected from above.
This is the Kubernetes API URL, and it should be a complete URL, with https://
and port, if not 443.
The default port is 6443, so the port is almost always required.
When you run this command, you will receive a number of files in your current
directory:
controlplane.yaml
init.yaml
join.yaml
talosconfig
The three .yaml files are what we call Machine Configs.
They are installed onto the Talos servers to act as their complete configuration,
describing everything from what disk Talos should be installed to, to what
sysctls to set, to what network settings it should have.
In the case of controlplane.yaml and init.yaml, they even describe how Talos should form its Kubernetes cluster.
The talosconfig file (which is also YAML) is your local client configuration
file.
Controlplane, Init, and Join
The three types of Machine Configs correspond to the three roles of Talos nodes.
For our purposes, you can ignore the Init type.
It is a legacy type which will go away eventually.
Its purpose was to self-bootstrap.
Instead, we now use an API call to bootstrap the cluster, which is much more robust.
That leaves us with Controlplane and Join.
The Controlplane Machine Config describes the configuration of a Talos server on
which the Kubernetes Controlplane should run.
The Join Machine Config describes everything else: workload servers.
The main difference between Controlplane Machine Config files and Join Machine
Config files is that the former contains information about how to form the
Kubernetes cluster.
Templates
The generated files can be thought of as templates.
Individual machines may need specific settings (for instance, each may have a
different static IP address).
When different files are needed for machines of the same type, simply
copy the source template (controlplane.yaml or join.yaml) and make whatever
modifications need to be done.
For instance, if you had three controlplane nodes and three worker nodes, you
may do something like this:
for i in $(seq 0 2); do
  cp controlplane.yaml cp$i.yaml
done
for i in $(seq 0 2); do
  cp join.yaml w$i.yaml
done
In cases where there is no special configuration needed, you may use the same
file for each machine of the same type.
Apply Configuration
After you have generated each machine’s Machine Config, you need to load them
into the machines themselves.
For that, you need to know their IP addresses.
If you have access to the console or console logs of the machines, you can read
them to find the IP address(es).
Talos will print them out during the boot process:
The insecure flag is necessary at this point because the PKI has
not yet been made available to the node.
Note that the connection will be encrypted; it is just unauthenticated.
If you have console access, though, you can extract the server
certificate fingerprint and use it for an additional layer of validation:
Using the fingerprint allows you to be sure you are sending the configuration to
the right machine, but it is completely optional.
After the configuration is applied to a node, it will reboot.
You may repeat this process for each of the nodes in your cluster.
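The per-node repetition can be sketched as a loop. The helper below is hypothetical and only prints the talosctl apply-config invocations rather than running them, so you can review them first. The node IPs are example values, and the cpN.yaml file names follow the copies made in the Templates section. The optional third argument is the server certificate fingerprint, passed with --cert-fingerprint.

```shell
# Hypothetical dry-run helper: print (do not run) the apply-config command
# for one node that is still in maintenance mode.
# Usage: apply_cmd <node-ip> <config-file> [cert-fingerprint]
apply_cmd() {
  cmd="talosctl apply-config --insecure --nodes $1 --file $2"
  if [ -n "${3:-}" ]; then
    # Optional extra validation using the fingerprint read from the console.
    cmd="$cmd --cert-fingerprint '$3'"
  fi
  printf '%s\n' "$cmd"
}

# Example controlplane IPs (assumed); each node gets its own cpN.yaml.
i=0
for ip in 192.168.0.10 192.168.0.11 192.168.0.12; do
  apply_cmd "$ip" "cp$i.yaml"
  i=$((i + 1))
done
```

Once the printed commands look right, they can be run one at a time, waiting for each node to reboot.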
Configure your talosctl client
Now that the nodes are running Talos with its full PKI security suite, you need
to use that PKI to talk to the machines.
That means configuring your client, and that is what that talosconfig file is for.
Endpoints
Endpoints are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
In general, it is recommended that these point to the set of control plane
nodes, either directly or through a reverse proxy or load balancer.
Each endpoint will automatically proxy requests destined to another node through
it, so it is not necessary to change the endpoint configuration just because you
wish to talk to a different node within the cluster.
Endpoints do, however, need to be members of the same Talos cluster as the
target node, because these proxied connections rely on certificate-based
authentication.
We need to set the endpoints in your talosconfig.
talosctl will automatically load balance and fail over among the endpoints,
so no external load balancer or DNS abstraction is required
(though you are free to use them, if desired).
As an example, if the IP addresses of our controlplane nodes are:
The node is the target node on which you wish to perform the API call.
Keep in mind, when specifying nodes, that their IPs and/or hostnames are as seen by the endpoint servers, not as seen from the client.
This is because all connections are proxied first through the endpoints.
Some people also like to set a default set of nodes in the talosconfig.
This can be done in the same manner, replacing endpoint with node.
If you do this, however, know that you could easily reboot the wrong machine
by forgetting to declare the right one explicitly.
Worse, if you set several nodes as defaults, you could, with one talosctl upgrade
command, upgrade your whole cluster all at the same time.
It’s a powerful tool, and with that comes great responsibility.
The author of this document does not set a default node.
You may simply provide -n or --nodes to any talosctl command to
supply the node or (comma-delimited) nodes on which you wish to perform the
operation.
Supplying the commandline parameter will override any default nodes
in the configuration file.
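To make the distinction between endpoints and nodes concrete, here is a small sketch that prints the corresponding talosctl invocations without running them. The helper names and the IP addresses are assumptions for illustration; the endpoints are set once in the talosconfig, while the node is given explicitly on each call.

```shell
# Hypothetical helpers that only print talosctl invocations; nothing is executed.

# Usage: endpoint_cmd <talosconfig-path> <endpoint>...
endpoint_cmd() {
  cfg=$1
  shift
  printf 'talosctl --talosconfig %s config endpoint' "$cfg"
  printf ' %s' "$@"
  printf '\n'
}

# Usage: on_nodes <comma-delimited-nodes> <talosctl-subcommand>...
on_nodes() {
  nodes=$1
  shift
  printf 'talosctl --nodes %s %s\n' "$nodes" "$*"
}

# Endpoints: the controlplane nodes the client talks to directly (example IPs).
endpoint_cmd ./talosconfig 192.168.0.10 192.168.0.11 192.168.0.12
# Node: the machine the call is about, as seen by the endpoints (example IP).
on_nodes 192.168.0.21 version
```

Naming the node on every call, rather than storing a default, avoids acting on the wrong machine by accident.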
To verify default node(s) you’re currently configured to use, you can run:
$ talosctl version
Client:
...
Server:
NODE: <node>
...
For a more in-depth discussion of Endpoints and Nodes, please see
talosctl.
Default configuration file
You can reference which configuration file to use directly with the --talosconfig parameter:
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 version
However, talosctl comes with tooling to help you integrate and merge this
configuration into the default talosctl configuration file.
This is done with the merge option.
talosctl config merge ./talosconfig
This will merge your new talosconfig into the default configuration file
($XDG_CONFIG_HOME/talos/config.yaml), creating it if necessary.
Like Kubernetes, the talosconfig configuration file has multiple “contexts”
which correspond to multiple clusters.
The <cluster-name> you chose above will be used as the context name.
Kubernetes Bootstrap
All of your machines are configured, and your talosctl client is set up.
Now, you are ready to bootstrap your Kubernetes cluster.
If that sounds daunting, you haven’t used Talos before.
Bootstrapping your Kubernetes cluster with Talos is as simple as:
talosctl bootstrap --nodes 192.168.0.2
The IP there can be any of your controlplanes (or the loadbalancer, if you have
one).
It should only be issued once.
At this point, Talos will form an etcd cluster, generate all of the core
Kubernetes assets, and start the Kubernetes controlplane components.
After a few moments, you will be able to download your Kubernetes client
configuration and get started:
talosctl kubeconfig
Running this command will add (merge) your new cluster into your local Kubernetes
configuration in the same way as talosctl config merge merged the Talos client
configuration into your local Talos client configuration file.
If you would prefer for the configuration to not be merged into your default
Kubernetes configuration file, simply tell it a filename:
talosctl kubeconfig alternative-kubeconfig
If all goes well, you should now be able to connect to Kubernetes and see your
nodes:
kubectl get nodes
1.4 - System Requirements
Minimum Requirements
Role                 Memory   Cores
Init/Control Plane   2GB      2
Worker               1GB      1
Recommended
Role                 Memory   Cores
Init/Control Plane   4GB      4
Worker               2GB      2
These requirements are similar to those of Kubernetes.
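As a rough sketch, the minimum requirements can be encoded in a shell check to run against planned machine sizes. The function name meets_minimum and the role names controlplane and worker are hypothetical; memory is given in whole gigabytes.

```shell
# Hypothetical helper: succeed if a machine meets the minimum requirements.
# Usage: meets_minimum <controlplane|worker> <memory-in-GB> <cores>
meets_minimum() {
  case $1 in
    controlplane) min_mem=2 min_cores=2 ;;   # Init/Control Plane: 2GB, 2 cores
    worker)       min_mem=1 min_cores=1 ;;   # Worker: 1GB, 1 core
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
  [ "$2" -ge "$min_mem" ] && [ "$3" -ge "$min_cores" ]
}

if meets_minimum controlplane 4 4; then
  echo "controlplane sizing is sufficient"
fi
```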
1.5 - What's New in Talos 0.9
Control Plane as Static Pods
Talos now runs the Kubernetes control plane as static pods managed via machine configuration.
This change makes the bootstrap process much more stable and resilient to failures.
For single control plane node clusters it eliminates bugs with the control plane being unavailable after a reboot.
As the control plane configuration is managed via the Talos API, even if the control plane configuration was wrong and
the API server is not available, the change can be rolled back using talosctl to bring the control plane back up.
When upgrading from Talos 0.8, control plane can be converted to run as static pods.
ECDSA Certificates and Keys for Kubernetes
Talos now uses ECDSA keys for Kubernetes and etcd PKI.
ECDSA keys are much smaller than RSA keys and all PKI operations are much faster (for example, generating a certificate from the CA) which
leads to much faster bootstrap and boot times.
Immediate Machine Configuration Updates
Changes to the .cluster part of Talos machine configuration can now be applied immediately (without a reboot).
This allows, for example, updating versions of control plane components, adding additional arguments or modifying bootstrap manifests.
Future versions of Talos will expand on this to allow most of the machine configuration to be applied without a reboot.
Disk Encryption
Talos now supports encryption for STATE and EPHEMERAL partitions of the system disk.
The STATE partition holds machine configuration and the EPHEMERAL partition is mounted as /var which stores container runtime
state, and configuration files laid on top of Talos read-only immutable root filesystem.
The encryption key in Talos 0.9 is derived from the Node UUID which is a unique machine identifier provided by the manufacturer.
Disk encryption is not enabled by default: it needs to be enabled via machine configuration.
Virtual IP for the Control Plane Endpoint
Talos adds support for a virtual L2 shared IP for the control plane: control plane nodes ensure that only one of the nodes
advertises the shared IP via ARP.
If one of the control plane nodes goes down, another node takes over the shared IP.
Updated Components
Linux: 5.10.1 -> 5.10.19
Kubernetes: 1.20.1 -> 1.20.5
CoreDNS: 1.7.0 -> 1.8.0
etcd: 3.4.14 -> 3.4.15
containerd: 1.4.3 -> 1.4.4
2 - Bare Metal Platforms
2.1 - Digital Rebar
In this guide we will create a Kubernetes cluster with 1 worker node, and 2 controlplane nodes using an existing digital rebar deployment.
In this guide we will create a Kubernetes cluster with 1 worker node, and 2 controlplane nodes.
We assume an existing digital rebar deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
The loadbalancer is used to distribute the load across multiple controlplane nodes.
This isn’t covered in detail, because we assume some loadbalancing knowledge beforehand.
If you think this should be added to the docs, please create an issue.
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode
Publishing the Machine Configuration Files
Digital Rebar has a built-in fileserver, which means we can use this feature to expose the Talos configuration files.
We will place init.yaml, controlplane.yaml, and join.yaml into the Digital Rebar file server by using the drpcli tools.
Copy the generated files from the step above into your Digital Rebar installation.
drpcli file upload <file>.yaml as <file>.yaml
Replacing <file> with init, controlplane or join.
Download the boot files
Download a recent version of boot.tar.gz from GitHub.
This guide assumes the user has a working API token, the Equinix Metal CLI installed, and some familiarity with the CLI.
Network Booting
To install Talos to a server, a working TFTP and iPXE server are needed.
How this is done varies and is left as an exercise for the user.
In general this requires a Talos kernel vmlinuz and initramfs.
These assets can be downloaded from a given release.
Special Considerations
PXE Boot Kernel Parameters
The following is a list of kernel parameters required by Talos:
talos.platform: set this to packet
init_on_alloc=1: required by KSPP
slab_nomerge: required by KSPP
pti=on: required by KSPP
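A minimal sketch of assembling these parameters into a kernel command line follows. The function name talos_cmdline is hypothetical. The optional second argument appends the talos.config parameter for setups that fetch the machine configuration over HTTP(S); the URL shown is a made-up example.

```shell
# Hypothetical helper: print the kernel parameters required by Talos.
# Usage: talos_cmdline <platform> [config-url]
talos_cmdline() {
  cmdline="talos.platform=$1 init_on_alloc=1 slab_nomerge pti=on"
  if [ -n "${2:-}" ]; then
    # Only needed when Talos should download its configuration at boot.
    cmdline="$cmdline talos.config=$2"
  fi
  printf '%s\n' "$cmdline"
}

talos_cmdline packet
# Example with an assumed platform value and a made-up URL:
talos_cmdline metal http://pxe.example.com/assets/controlplane.yaml
```

The output can be pasted into an iPXE script or boot loader entry after the kernel image name.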
User Data
To configure Talos, you can use the metadata service provided by Equinix Metal.
It is required to add a shebang to the top of the configuration file.
The shebang is arbitrary in the case of Talos, and the convention we use is #!talos.
Creating a Cluster via the Equinix Metal CLI
Control Plane Endpoint
The strategy used for an HA cluster varies and is left as an exercise for the user.
Some of the known ways are:
DNS
Load Balancer
BGP
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
Now add the required shebang (e.g. #!talos) at the top of init.yaml, controlplane.yaml, and join.yaml.
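One possible way to add the shebang is sketched below. The helper add_shebang is hypothetical. It prepends #!talos only if the file does not already start with a shebang, so it is safe to run more than once.

```shell
# Hypothetical helper: prepend "#!talos" to a file unless it already
# starts with a shebang line.
add_shebang() {
  file=$1
  first_line=$(head -n 1 "$file")
  case $first_line in
    '#!'*) return 0 ;;   # already has a shebang, nothing to do
  esac
  tmp=$(mktemp) || return 1
  # Write the shebang followed by the original content, then copy it back
  # in place so the file keeps its original permissions.
  { printf '#!talos\n'; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

for f in init.yaml controlplane.yaml join.yaml; do
  if [ -f "$f" ]; then
    add_shebang "$f"
  fi
done
```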
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
talosctl validate --config init.yaml --mode metal
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config join.yaml --mode metal
Note: Validation of the install disk could potentially fail as the validation
is performed on your local machine and the specified disk may not exist.
In this guide we will create an HA Kubernetes cluster with 3 worker nodes using an existing load balancer and matchbox deployment.
Creating a Cluster
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing load balancer, matchbox deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode
Publishing the Machine Configuration Files
In bare-metal setups it is up to the user to provide the configuration files over HTTP(S).
A special kernel parameter (talos.config) must be used to inform Talos about where it should retrieve its configuration file.
To keep things simple we will place init.yaml, controlplane.yaml, and join.yaml into Matchbox’s assets directory.
This directory is automatically served by Matchbox.
Create the Matchbox Configuration Files
The profiles we will create will reference vmlinuz, and initramfs.xz.
Download these files from the release of your choice, and place them in /var/lib/matchbox/assets.
Now that we have our configuration files in place, boot all the machines.
Talos will come up on each machine, grab its configuration file, and bootstrap itself.
Retrieve the kubeconfig
At this point we can retrieve the admin kubeconfig by running:
In order to install Talos in Proxmox, you will need the ISO image from the Talos release page.
You can download talos-amd64.iso via
github.com/talos-systems/talos/releases
From the Proxmox UI, select the “local” storage and enter the “Content” section.
Click the “Upload” button:
Select the ISO you downloaded previously, then hit “Upload”
Create VMs
Start by creating a new VM by clicking the “Create VM” button in the Proxmox UI:
Fill out a name for the new VM:
In the OS tab, select the ISO we uploaded earlier:
Keep the defaults set in the “System” tab.
Keep the defaults in the “Hard Disk” tab as well, only changing the size if desired.
In the “CPU” section, give at least 2 cores to the VM:
Verify that the RAM is set to at least 2GB:
Keep the default values for networking, verifying that the VM is set to come up on the bridge interface:
Finish creating the VM by clicking through the “Confirm” tab and then “Finish”.
Repeat this process for a second VM to use as a worker node.
You can also repeat this for any additional nodes desired.
Start Control Plane Node
Once the VMs have been created and updated, start the VM that will be the first control plane node.
This VM will boot the ISO image specified earlier and enter “maintenance mode”.
Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received.
Take note of this IP address, which will be referred to as $CONTROL_PLANE_IP for the rest of this guide.
If you wish to export this IP as a bash variable, simply issue a command like export CONTROL_PLANE_IP=1.2.3.4.
Generate Machine Configurations
With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes.
Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:
talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out
This will create several files in the _out directory: init.yaml, controlplane.yaml, join.yaml, and talosconfig.
Create Control Plane Node
Using the init.yaml generated above, you can now apply this config using talosctl.
Issue:
You should now see some action in the Proxmox console for this VM.
Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.
Note: This process can be repeated multiple times to create an HA control plane.
Simply apply controlplane.yaml instead of init.yaml for subsequent nodes.
Create Worker Node
Create at least a single worker node using a process similar to the control plane creation above.
Start the worker node VM and wait for it to enter “maintenance mode”.
Take note of the worker node’s IP address, which will be referred to as $WORKER_IP.
Note: This process can be repeated multiple times to add additional workers.
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:
Fetch the kubeconfig file from the control plane node by issuing:
talosctl kubeconfig
You can then use kubectl in this fashion:
kubectl get nodes
Cleaning Up
To clean up, simply stop and delete the virtual machines from the Proxmox UI.
3.4 - VMware
Creating Talos Kubernetes cluster using VMware.
Creating a Cluster via the govc CLI
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We will use the govc cli which can be downloaded here.
Prerequisites
Prior to starting, it is important to have the following infrastructure in place and available:
DHCP server
Load Balancer or DNS address for cluster endpoint
If using a load balancer, the most common setup is to balance tcp/443 across the control plane nodes on tcp/6443.
If using a DNS address, the A record should return the addresses of the control plane nodes.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name or the name of the loadbalancer from the prerequisite steps, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-vmware-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
$ talosctl gen config talos-k8s-vmware-tutorial https://<DNS name>:6443
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode
Set Environment Variables
govc makes use of the following environment variables
A talos.ova asset is published with each release.
We will refer to the version of the release as $TALOS_VERSION below.
It can be easily exported with export TALOS_VERSION="v0.3.0-alpha.10" or similar.
We’ll need to repeat this step for each Talos node we want to create.
In a typical HA setup, we’ll have 3 control plane nodes and N workers.
In the following example, we’ll set up an HA control plane with two worker nodes.
Talos makes use of the guestinfo facility of VMware to provide the machine/cluster configuration.
This can be set using the govc vm.change command.
To facilitate persistent storage using the vSphere cloud provider integration with Kubernetes, disk.enableUUID=1 is used.
Talos is known to work on Xen; however, it is currently undocumented.
4 - Cloud Platforms
4.1 - AWS
Creating a cluster via the AWS CLI.
Creating a Cluster via the AWS CLI
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing VPC, and some familiarity with AWS.
If you need more information on AWS specifics, please see the official AWS documentation.
Take note of the DNS name and ARN.
We will need these soon.
Create the Machine Configuration Files
Generating Base Configurations
Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
Take note that in this version of Talos, the generated configs are too long for the AWS userdata field.
Comments can be removed to work around this with a sed command like:
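The following is a minimal sketch of such a comment-stripping step, not necessarily the exact command the guide originally showed. The function name strip_yaml_comments is hypothetical. It deletes whole-line comments and trailing comments that follow whitespace. It is naive: a value that legitimately contains whitespace followed by # would be truncated, so review the result before use.

```shell
# Hypothetical helper: print a YAML file with comments removed.
# Usage: strip_yaml_comments <file>
strip_yaml_comments() {
  # 1st expression: drop lines that contain only a comment.
  # 2nd expression: drop trailing comments that follow whitespace.
  sed -e '/^[[:space:]]*#/d' -e 's/[[:space:]]\{1,\}#.*$//' "$1"
}

for f in init.yaml controlplane.yaml join.yaml; do
  if [ -f "$f" ]; then
    strip_yaml_comments "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  fi
done
```

Running talosctl validate on the stripped files afterwards is a reasonable check that nothing important was removed.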
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume existing Blob Storage, and some familiarity with Azure.
If you need more information on Azure specifics, please see the official Azure documentation.
Environment Setup
We’ll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
# Storage account to use
export STORAGE_ACCOUNT="StorageAccountName"
# Storage container to upload to
export STORAGE_CONTAINER="StorageContainerName"
# Resource group name
export GROUP="ResourceGroupName"
# Location
export LOCATION="centralus"
# Get storage account connection string based on info above
export CONNECTION=$(az storage account show-connection-string \
  -n $STORAGE_ACCOUNT \
  -g $GROUP \
  -o tsv)
Create the Image
First, download the Azure image from a Talos release.
Once downloaded, untar with tar -xvf /path/to/azure-amd64.tar.gz
Upload the VHD
Once you have pulled down the image, you can upload it to blob storage with:
Now that the image is present in our blob storage, we’ll register it.
az image create \
--name talos \
--source https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER/talos-azure.vhd \
--os-type linux \
-g $GROUP
Network Infrastructure
Virtual Networks and Security Groups
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a network security group and add rules to it.
In Azure, we have to pre-create the NICs for our control plane so that they can be associated with our load balancer.
for i in $( seq 0 1 2 ); do
  # Create public IP for each nic
  az network public-ip create \
    --resource-group $GROUP \
    --name talos-controlplane-public-ip-$i \
    --allocation-method static

  # Create nic
  az network nic create \
    --resource-group $GROUP \
    --name talos-controlplane-nic-$i \
    --vnet-name talos-vnet \
    --subnet talos-subnet \
    --network-security-group talos-sg \
    --public-ip-address talos-controlplane-public-ip-$i \
    --lb-name talos-lb \
    --lb-address-pools talos-be-pool
done
Cluster Configuration
With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.
LB_PUBLIC_IP=$(az network public-ip show \
              --resource-group $GROUP \
              --name talos-public-ip \
              --query [ipAddress] \
              --output tsv)

talosctl gen config talos-k8s-azure-tutorial https://${LB_PUBLIC_IP}:6443
Compute Creation
We are now ready to create our Azure nodes.
# Create availability set
az vm availability-set create \
  --name talos-controlplane-av-set \
  -g $GROUP

# Create controlplane 0
az vm create \
  --name talos-controlplane-0 \
  --image talos \
  --custom-data ./init.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --os-disk-size-gb 20 \
  --nics talos-controlplane-nic-0 \
  --availability-set talos-controlplane-av-set \
  --no-wait

# Create 2 more controlplane nodes
for i in $( seq 1 2 ); do
  az vm create \
    --name talos-controlplane-$i \
    --image talos \
    --custom-data ./controlplane.yaml \
    -g $GROUP \
    --admin-username talos \
    --generate-ssh-keys \
    --verbose \
    --boot-diagnostics-storage $STORAGE_ACCOUNT \
    --os-disk-size-gb 20 \
    --nics talos-controlplane-nic-$i \
    --availability-set talos-controlplane-av-set \
    --no-wait
done

# Create worker node
az vm create \
  --name talos-worker-0 \
  --image talos \
  --vnet-name talos-vnet \
  --subnet talos-subnet \
  --custom-data ./join.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --nsg talos-sg \
  --os-disk-size-gb 20 \
  --no-wait

# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting
Retrieve the kubeconfig
You should now be able to interact with your cluster with talosctl.
First, we need to discover the public IP of our first control plane node.
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume an existing Space, and some familiarity with DigitalOcean.
If you need more information on DigitalOcean specifics, please see the official DigitalOcean documentation.
Create the Image
First, download the DigitalOcean image from a Talos release.
Extract the archive to get the disk.raw file, then compress it with gzip to produce disk.raw.gz.
Using an upload method of your choice (doctl does not have Spaces support), upload the image to a space.
Now, create an image using the URL of the uploaded image:
We will need the IP of the load balancer.
Using the ID of the load balancer, run:
doctl compute load-balancer get --format IP <load balancer ID>
Save it, as we will need it in the next step.
Create the Machine Configuration Files
Generating Base Configurations
Using the IP or DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines:
$ talosctl gen config talos-k8s-digital-ocean-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig
At this point, you can modify the generated configs to your liking.
Validate the Configuration Files
$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode
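The three validation commands above can be wrapped in a small loop. This is only a convenience sketch: `validate_configs` is a hypothetical helper name, and it relies solely on the `talosctl validate --config <file> --mode cloud` invocation already shown.

```shell
# Validate every generated machine config for cloud mode.
# Stops and returns non-zero at the first file that fails validation.
validate_configs() {
  for cfg in "$@"; do
    talosctl validate --config "$cfg" --mode cloud || return 1
  done
}

# Usage (requires talosctl on PATH):
#   validate_configs init.yaml controlplane.yaml join.yaml
```

Failing fast keeps an invalid file from being overlooked among the passing ones.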
Note: Although SSH is not used by Talos, DigitalOcean still requires that an SSH key be associated with the droplet.
Create a dummy key that can be used to satisfy this requirement.
Create the Remaining Control Plane Nodes
Run the following twice, to give ourselves three total control plane nodes:
Creating a cluster via the CLI on Google Cloud Platform.
Creating a Cluster via the CLI
In this guide, we will create an HA Kubernetes cluster in GCP with 1 worker node.
We will assume an existing Cloud Storage bucket, and some familiarity with Google Cloud.
If you need more information on Google Cloud specifics, please see the official Google documentation.
Environment Setup
We’ll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
# Storage account to use
export STORAGE_BUCKET="StorageBucketName"
# Region
export REGION="us-central1"
Create the Image
First, download the Google Cloud image from a Talos release.
These images are called gcp-$ARCH.tar.gz.
Upload the Image
Once you have downloaded the image, you can upload it to your storage bucket with:
Once the image is prepared, we’ll want to work through setting up the network.
Issue the following to create a firewall, load balancer, and their required components.
In this guide, we will create an HA Kubernetes cluster in OpenStack with 1 worker node.
We will assume some familiarity with OpenStack.
If you need more information on OpenStack specifics, please see the official OpenStack documentation.
Environment Setup
You should have an existing openrc file.
This file will provide environment variables necessary to talk to your OpenStack cloud.
See here for instructions on fetching this file.
Create the Image
First, download the OpenStack image from a Talos release.
These images are called openstack-$ARCH.tar.gz.
Untar this file with tar -xvf openstack-$ARCH.tar.gz.
The resulting file will be called disk.raw.
Upload the Image
Once you have the image, you can upload it to OpenStack with:
openstack image create --public --disk-format raw --file disk.raw talos
Network Infrastructure
Load Balancer and Network Ports
Once the image is prepared, you will need to work through setting up the network.
Issue the following to create a load balancer, the necessary network ports for each control plane node, and associations between the two.
Creating loadbalancer:
# Create load balancer, updating vip-subnet-id if necessary
openstack loadbalancer create --name talos-control-plane --vip-subnet-id public

# Create listener
openstack loadbalancer listener create --name talos-control-plane-listener --protocol TCP --protocol-port 6443 talos-control-plane

# Pool and health monitoring
openstack loadbalancer pool create --name talos-control-plane-pool --lb-algorithm ROUND_ROBIN --listener talos-control-plane-listener --protocol TCP
openstack loadbalancer healthmonitor create --delay 5 --max-retries 4 --timeout 10 --type TCP talos-control-plane-pool
Creating ports:
# Create ports for control plane nodes, updating network name if necessary
openstack port create --network shared talos-control-plane-1
openstack port create --network shared talos-control-plane-2
openstack port create --network shared talos-control-plane-3

# Create floating IPs for the ports, so that you will have talosctl connectivity to each control plane
openstack floating ip create --port talos-control-plane-1 public
openstack floating ip create --port talos-control-plane-2 public
openstack floating ip create --port talos-control-plane-3 public
Note: Take notice of the private and public IPs associated with each of these ports, as they will be used in the next step.
Additionally, take note of the port ID, as it will be used in server creation.
Associate the ports’ private IPs with the load balancer:
# Create members for each port IP, updating subnet-id and address as necessary.
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-1 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-2 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-3 PORT> --protocol-port 6443 talos-control-plane-pool
Security Groups
This example uses the default security group in OpenStack.
Ports have been opened to ensure that connectivity from both inside and outside the group is possible.
You will want to allow, at a minimum, ports 6443 (Kubernetes API server) and 50000 (Talos API) from external sources.
It is also recommended to allow communication over all ports from within the subnet.
Cluster Configuration
With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.
LB_PUBLIC_IP=$(openstack loadbalancer show talos-control-plane -f json | jq -r .vip_address)

talosctl gen config talos-k8s-openstack-tutorial https://${LB_PUBLIC_IP}:6443
Compute Creation
We are now ready to create our OpenStack nodes.
Create control plane:
# Create control plane 1. Substitute the correct path to configuration files and the desired flavor.
openstack server create talos-control-plane-1 --flavor m1.small --nic port-id=talos-control-plane-1 --image talos --user-data /path/to/init.yaml

# Create control planes 2 and 3, substituting the same info.
for i in $( seq 2 3 ); do
  openstack server create talos-control-plane-$i --flavor m1.small --nic port-id=talos-control-plane-$i --image talos --user-data /path/to/controlplane.yaml
done
Create worker:
# Update network name as necessary.
openstack server create talos-worker-1 --flavor m1.small --network shared --image talos --user-data /path/to/join.yaml
Note: This step can be repeated to add more workers.
Retrieve the kubeconfig
You should now be able to interact with your cluster with talosctl.
We will use one of the floating IPs we allocated earlier.
It does not matter which one.
In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.
Running Talos in Docker is intended to be used in CI pipelines, and local testing when you need a quick and easy cluster.
Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.
Requirements
The following are requirements for running Talos in Docker:
Because Talos runs in a container, certain APIs are not available when running in Docker.
For example, upgrade, reset, and similar APIs don’t apply in container mode.
Create the Cluster
Creating a local cluster is as simple as:
talosctl cluster create --wait
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.
If you are running on macOS, an additional command is required:
talosctl config --endpoints 127.0.0.1
Note: Startup can take up to a minute before the cluster is available.
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
Cleaning Up
To clean up, run:
talosctl cluster destroy
5.2 - Firecracker
Creating a Talos Kubernetes cluster using Firecracker VMs.
In this guide we will create a Kubernetes cluster using Firecracker.
Note: Talos on QEMU offers an easier way to run Talos in a set of VMs.
Requirements
Linux
a kernel with
KVM enabled (/dev/kvm must exist)
CONFIG_NET_SCH_NETEM enabled
CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
go get -d github.com/awslabs/tc-redirect-tap/cmd/tc-redirect-tap
cd $GOPATH/src/github.com/awslabs/tc-redirect-tap
make all
sudo cp tc-redirect-tap /opt/cni/bin
Note: if $GOPATH is not set, it defaults to ~/go.
Install Talos kernel and initramfs
The Firecracker provisioner depends on the uncompressed Talos kernel (vmlinuz) and initramfs (initramfs.xz).
These files can be downloaded from the Talos release:
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.
Retrieve and Configure the kubeconfig
talosctl kubeconfig .
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
A bridge interface will be created, and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A loadbalancer runs on 10.5.0.1 by default, which handles loadbalancing for the Talos and Kubernetes APIs.
You can see a summary of the cluster state by running:
$ talosctl cluster show --provisioner firecracker
PROVISIONER firecracker
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
NODES:
NAME TYPE IP CPU RAM DISK
talos-default-master-1 Init 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-master-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-master-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Join 10.5.0.5 1.00 1.6 GB 4.3 GB
Note: In the case that the host machine is rebooted before destroying the cluster, you may need to manually remove ~/.talos/clusters/talos-default.
Manual Clean Up
The talosctl cluster destroy command depends heavily on the cluster’s state directory.
It contains all information related to the cluster, including the PIDs and network associated with the cluster nodes.
If you have deleted the state folder by mistake, or you would like to clean up the environment, here are the steps to do it manually:
Stopping VMs
To find the firecracker --api-sock processes, execute:
ps -elf | grep '[f]irecracker --api-sock'
To stop the VMs manually, execute:
sudo kill -s SIGTERM <PID>
Example output, where VMs are running with PIDs 158065 and 158216
This is the trickier part if you have already deleted the state folder.
If you have not, it is recorded in state.yaml in the /root/.talos/clusters/<cluster-name> directory.
In this guide we will create a Kubernetes cluster using QEMU.
Requirements
Linux
a kernel with
KVM enabled (/dev/kvm must exist)
CONFIG_NET_SCH_NETEM enabled
CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
QEMU
bridge, static and firewall CNI plugins from the standard CNI plugins, and tc-redirect-tap CNI plugin from the awslabs tc-redirect-tap installed to /opt/cni/bin (installed automatically by talosctl)
iptables
/var/run/netns directory should exist
Installation
How to get QEMU
Install QEMU with your operating system package manager.
For example, on Ubuntu for x86:
Before the first cluster is created, talosctl will download the CNI bundle for the VM provisioning and install it to the ~/.talos/cni directory.
Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster, and kubeconfig will be downloaded and merged into the default kubectl config location (~/.kube/config).
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl -n 10.5.0.2 containers for a list of containers in the system namespace, or talosctl -n 10.5.0.2 containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl -n 10.5.0.2 logs <container> or talosctl -n 10.5.0.2 logs -k <container>.
A bridge interface will be created, and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A loadbalancer runs on 10.5.0.1 by default, which handles loadbalancing for the Kubernetes APIs.
You can see a summary of the cluster state by running:
$ talosctl cluster show --provisioner qemu
PROVISIONER qemu
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
NODES:
NAME TYPE IP CPU RAM DISK
talos-default-master-1 Init 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-master-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-master-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Join 10.5.0.5 1.00 1.6 GB 4.3 GB
Note: In the case that the host machine is rebooted before destroying the cluster, you may need to manually remove ~/.talos/clusters/talos-default.
Manual Clean Up
The talosctl cluster destroy command depends heavily on the cluster’s state directory.
It contains all information related to the cluster, including the PIDs and network associated with the cluster nodes.
If you have deleted the state folder by mistake, or you would like to clean up the environment, here are the steps to do it manually:
Remove VM Launchers
To find the talosctl qemu-launch processes, execute:
ps -elf | grep '[t]alosctl qemu-launch'
To remove the VMs manually, execute:
sudo kill -s SIGTERM <PID>
Example output, where VMs are running with PIDs 157615 and 157617
This is the trickier part if you have already deleted the state folder.
If you have not, it is recorded in state.yaml in the ~/.talos/clusters/<cluster-name> directory.
In order to install Talos in VirtualBox, you will need the ISO image from the Talos release page.
You can download talos-amd64.iso from github.com/talos-systems/talos/releases.
Start by creating a new VM by clicking the “New” button in the VirtualBox UI:
Supply a name for this VM, and specify the Type and Version:
Edit the memory to supply at least 2GB of RAM for the VM:
Proceed through the disk settings, keeping the defaults.
You can increase the disk space if desired.
Once created, select the VM and hit “Settings”:
In the “System” section, supply at least 2 CPUs:
In the “Network” section, switch the network “Attached To” section to “Bridged Adapter”:
Finally, in the “Storage” section, select the optical drive and, on the right, select the ISO by browsing your filesystem:
Repeat this process for a second VM to use as a worker node.
You can also repeat this for any additional nodes desired.
Start Control Plane Node
Once the VMs have been created and updated, start the VM that will be the first control plane node.
This VM will boot the ISO image specified earlier and enter “maintenance mode”.
Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received.
Take note of this IP address, which will be referred to as $CONTROL_PLANE_IP for the rest of this guide.
If you wish to export this IP as a bash variable, simply issue a command like export CONTROL_PLANE_IP=1.2.3.4.
Generate Machine Configurations
With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes.
Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:
talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out
This will create several files in the _out directory: init.yaml, controlplane.yaml, join.yaml, and talosconfig.
Create Control Plane Node
Using the init.yaml generated above, you can now apply this config using talosctl.
Issue:
You should now see some action in the VirtualBox console for this VM.
Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.
Note: This process can be repeated multiple times to create an HA control plane.
Simply apply controlplane.yaml instead of init.yaml for subsequent nodes.
Create Worker Node
Create at least a single worker node using a process similar to the control plane creation above.
Start the worker node VM and wait for it to enter “maintenance mode”.
Take note of the worker node’s IP address, which will be referred to as $WORKER_IP.
Note: This process can be repeated multiple times to add additional workers.
Using the Cluster
Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster.
For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace.
To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.
First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
6.2 - Libre Computer Board ALL-H3-CC
Installing Talos on Libre Computer Board ALL-H3-CC SBC using raw disk image.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
6.3 - Pine64 Rock64
Installing Talos on Pine64 Rock64 SBC using raw disk image.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
6.4 - Radxa ROCK PI 4c
Installing Talos on Radxa ROCK PI 4c SBC using raw disk image.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
6.5 - Raspberry Pi 4 Model B
Installing Talos on Raspberry Pi 4 SBC using raw disk image.
At least version v2020.09.03-138a1 of the bootloader (rpi-eeprom) is required.
To update the bootloader we will need an SD card.
Insert the SD card into your computer and run the following:
The path to your SD card can be found using fdisk on Linux or diskutil on macOS.
In this example we will assume /dev/mmcblk0.
Remove the SD card from your local machine and insert it into the Raspberry Pi.
Power the Raspberry Pi on, and wait at least 10 seconds.
If successful, the green LED light will blink rapidly (forever); otherwise, an error pattern will be displayed.
If an HDMI display is attached then the screen will display green for success or red if a failure occurs.
Power off the Raspberry Pi and remove the SD card from it.
Note: Updating the bootloader only needs to be done once.
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>
Once the interactive installation is applied, the cluster will form and you can then use kubectl.
Retrieve the kubeconfig
Retrieve the admin kubeconfig by running:
talosctl kubeconfig
Troubleshooting
The following table can be used to troubleshoot booting issues:
Long Flashes   Short Flashes   Status
0              3               Generic failure to boot
0              4               start*.elf not found
0              7               Kernel image not found
0              8               SDRAM failure
0              9               Insufficient SDRAM
0              10              In HALT state
2              1               Partition not FAT
2              2               Failed to read from partition
2              3               Extended partition not FAT
2              4               File signature/hash mismatch - Pi 4
4              4               Unsupported board type
4              5               Fatal firmware error
4              6               Power failure type A
4              7               Power failure type B
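For quick lookups at the bench, the flash patterns listed above can be encoded in a small shell function. This is only an illustrative sketch: `led_status` is a hypothetical helper name, and the mapping is copied from the troubleshooting table, not from any Raspberry Pi tooling.

```shell
# Translate an LED pattern (long flashes, short flashes) into its status.
# Returns non-zero for patterns that are not in the table above.
led_status() {
  case "$1,$2" in
    0,3)  echo "Generic failure to boot" ;;
    0,4)  echo "start*.elf not found" ;;
    0,7)  echo "Kernel image not found" ;;
    0,8)  echo "SDRAM failure" ;;
    0,9)  echo "Insufficient SDRAM" ;;
    0,10) echo "In HALT state" ;;
    2,1)  echo "Partition not FAT" ;;
    2,2)  echo "Failed to read from partition" ;;
    2,3)  echo "Extended partition not FAT" ;;
    2,4)  echo "File signature/hash mismatch - Pi 4" ;;
    4,4)  echo "Unsupported board type" ;;
    4,5)  echo "Fatal firmware error" ;;
    4,6)  echo "Power failure type A" ;;
    4,7)  echo "Power failure type B" ;;
    *)    echo "Unknown pattern"; return 1 ;;
  esac
}

# Usage: led_status <long flashes> <short flashes>
led_status 0 7
```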
7 - Guides
7.1 - Advanced Networking
Static Addressing
Static addressing consists of specifying cidr, routes (remember to add your default gateway), and interface.
Most likely you’ll also want to define the nameservers so you have properly functioning DNS.
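As a rough sketch of what such a fragment might look like, the hypothetical helper below prints a machine configuration snippet that sets cidr, routes, and interface along with a nameserver. The addresses, the interface name eth0, and the exact nesting under machine.network are assumptions for illustration; check them against the configuration reference for your Talos version.

```shell
# Print an example static-addressing fragment (all values are placeholders).
static_network_patch() {
  cat <<'EOF'
machine:
  network:
    nameservers:
      - 10.0.0.1
    interfaces:
      - interface: eth0
        cidr: 10.0.0.201/8
        routes:
          - network: 0.0.0.0/0   # default route
            gateway: 10.0.0.1
EOF
}

static_network_patch
```

The 0.0.0.0/0 route is the default gateway entry that the text above reminds you to add.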
In some environments you may need to set additional addresses on an interface.
In the following example, we set two additional addresses on the loopback interface.
In this guide we will create a Talos cluster running in an air-gapped environment with all the required images being pulled from an internal registry.
We will use the QEMU provisioner available in talosctl to create a local cluster, but the same approach could be used to deploy Talos in bigger air-gapped networks.
In air-gapped environments, access to the public Internet is restricted, so Talos can’t pull images from public Docker registries (docker.io, ghcr.io, etc.).
We need to identify the images required to install and run Talos.
The same strategy can be used for images required by custom workloads running on the cluster.
The talosctl images command provides a list of default images used by the Talos cluster (with default configuration settings).
To print the list of images, run:
talosctl images
This list contains images required by a default deployment of Talos.
There might be additional images required for the workloads running on this cluster, and those should be added to this list.
Preparing the Internal Registry
As access to the public registries is restricted, we have to run an internal Docker registry.
In this guide, we will launch the registry on the same machine using Docker:
This registry will be accepting connections on port 6000 on the host IPs.
The registry is empty by default, so we have to fill it with the images required by Talos.
First, we pull all the images to our local Docker daemon:
$ for image in `talosctl images`; do docker pull $image; done
v0.12.0-amd64: Pulling from coreos/flannel
Digest: sha256:6d451d92c921f14bfb38196aacb6e506d4593c5b3c9d40a8b8a2506010dc3e10
...
All images are now stored in the Docker daemon store:
$ docker images
ghcr.io/talos-systems/install-cni v0.3.0-12-g90722c3 980d36ee2ee1 5 days ago 79.7MB
k8s.gcr.io/kube-proxy-amd64 v1.20.0 33c60812eab8 2 weeks ago 118MB
...
Now we need to re-tag them so that we can push them to our local registry.
We are going to replace the first component of the image name (before the first slash) with our registry endpoint 127.0.0.1:6000:
$ for image in `talosctl images`; do \
    docker tag $image `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done
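To see what the sed expression does without touching Docker, the sketch below applies it to two of the image references from the listing above. The `retag` function name is a hypothetical helper introduced only for this illustration.

```shell
# Replace the registry host (everything before the first slash) with the
# internal registry endpoint used in this guide.
retag() {
  echo "$1" | sed -E 's#^[^/]+/#127.0.0.1:6000/#'
}

retag ghcr.io/talos-systems/install-cni:v0.3.0-12-g90722c3
retag k8s.gcr.io/kube-proxy-amd64:v1.20.0
# prints:
#   127.0.0.1:6000/talos-systems/install-cni:v0.3.0-12-g90722c3
#   127.0.0.1:6000/kube-proxy-amd64:v1.20.0
```

Note that a reference without any slash would pass through unchanged, since the pattern requires a host component followed by `/`.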
As the next step, we push images to the internal registry:
$ for image in `talosctl images`; do \
    docker push `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done
We can now verify that the images are pushed to the registry:
Note: images in the registry don’t have the registry endpoint prefix anymore.
Launching Talos in an Air-gapped Environment
For Talos to use the internal registry, we use the registry mirror feature to redirect all the image pull requests to the internal registry.
This means that the registry endpoint (as the first component of the image reference) gets ignored, and all pull requests are sent directly to the specified endpoint.
We are going to use a QEMU-based Talos cluster for this guide, but the same approach works with Docker-based clusters as well.
As QEMU-based clusters go through the Talos install process, they better model a real air-gapped environment.
The talosctl cluster create command provides conveniences for common configuration options.
The only required flag for this guide is --registry-mirror '*'=http://10.5.0.1:6000 which redirects every pull request to the internal registry.
The endpoint being used is 10.5.0.1, as this is the default bridge interface address which will be routable from the QEMU VMs (127.0.0.1 IP will be pointing to the VM itself).
$ sudo -E talosctl cluster create --provisioner=qemu --registry-mirror '*'=http://10.5.0.1:6000 --install-image=ghcr.io/talos-systems/installer:v0.9.0
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/smira/.talos/clusters/talos-default"
creating network talos-default
creating load balancer
creating dhcpd
creating master nodes
creating worker nodes
waiting for API
...
Note: --install-image should match the image which was copied into the internal registry in the previous step.
You can verify that the cluster is air-gapped by inspecting the registry logs: docker logs -f registry-airgapped.
Closing Notes
Running in an air-gapped environment might require additional configuration changes, for example using custom settings for DNS and NTP servers.
When scaling this guide to the bare-metal environment, the following Talos config snippet could be used as an equivalent of the --registry-mirror flag above:
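A minimal sketch of such a fragment is printed by the hypothetical helper below. It assumes the machine configuration accepts registry mirrors under machine.registries.mirrors with a list of endpoints, and it reuses the '*' wildcard and the 10.5.0.1:6000 endpoint from this guide; verify the field names against the configuration reference for your Talos version before relying on it.

```shell
# Print a config fragment that redirects pulls for every registry ('*')
# to the internal registry (field layout is an assumption, see above).
registry_mirror_patch() {
  cat <<'EOF'
machine:
  registries:
    mirrors:
      '*':
        endpoints:
          - http://10.5.0.1:6000
EOF
}

registry_mirror_patch
```

On bare metal, replace 10.5.0.1 with an address of the internal registry that is routable from the nodes.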
Other implementations of Docker registry can be used in place of the Docker registry image used above to run the registry.
If required, auth can be configured for the internal registry (and custom TLS certificates if needed).
7.3 - Configuring Certificate Authorities
Appending the Certificate Authority
Put the PEM-encoded certificate into each machine:
Create the cluster as usual and see that metrics are now present on this port:
$ curl 127.0.0.1:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
7.5 - Configuring Corporate Proxies
Appending the Certificate Authority of MITM Proxies
Put the PEM-encoded certificate into each machine:
The simplest way to deploy Talos is by ensuring that all the remote components of the system (talosctl, the control plane nodes, and worker nodes) have layer 2 connectivity.
This is not always possible, however, so this page lays out the minimal network access that is required to configure and operate a Talos cluster.
Note: These are the ports required for Talos specifically, and should be configured in addition to the ports required by Kubernetes.
See the Kubernetes docs for information on the ports used by Kubernetes itself.
Ports marked with a * are not currently configurable, but that may change in the future.
Follow along here.
7.7 - Configuring Pull Through Cache
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from Docker registries might take a significant amount of time.
We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies.
A similar approach might be used to run Talos in production in air-gapped environments.
It can also be used to verify that all the images are available in local registries.
Requirements
The following are the requirements for creating the set of caching proxies:
Docker 18.03 or greater
Local cluster requirements for either docker or QEMU.
Launch the Caching Docker Registry Proxies
Talos pulls from docker.io, k8s.gcr.io, gcr.io, ghcr.io and quay.io by default.
If your configuration is different, you might need to modify the commands below:
Note: Proxies are started as Docker containers and are automatically configured to start with the Docker daemon.
Note that the quay.io proxy doesn’t support the recent Docker image schema, so we run an older registry image version (2.5).
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own
host port (5000, 5001, 5002, 5003 and 5004).
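The launch commands can be generated from a small table of upstreams. The following is a minimal dry-run sketch: it only prints one docker run command per upstream and does not execute anything. The assignment of upstreams to host ports, the container names (modeled on registry-docker.io, which is used later in this guide), and the registry-1.docker.io upstream URL for docker.io are assumptions; REGISTRY_PROXY_REMOTEURL is the environment override for the registry image's proxy.remoteurl setting.

```shell
# Print (not run) one `docker run` command per upstream registry.
# The upstream -> host port mapping below is an assumption; adjust as needed.
print_registry_commands() {
  port=5000
  for upstream in docker.io k8s.gcr.io gcr.io ghcr.io quay.io; do
    # docker.io images are served from registry-1.docker.io.
    case "$upstream" in
      docker.io) url="https://registry-1.docker.io" ;;
      *)         url="https://$upstream" ;;
    esac
    # quay.io needs the older registry image, as noted above.
    case "$upstream" in
      quay.io) image="registry:2.5" ;;
      *)       image="registry:2" ;;
    esac
    # --restart always makes the proxy start together with the Docker daemon.
    echo "docker run -d -p ${port}:5000 -e REGISTRY_PROXY_REMOTEURL=${url} --restart always --name registry-${upstream} ${image}"
    port=$((port + 1))
  done
}

print_registry_commands
```

Once the printed commands look right, they can be run by piping the output to sh.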
Using Caching Registries with QEMU Local Cluster
With a QEMU local cluster, a bridge interface is created on the host.
As registry containers expose their ports on the host, we can use the bridge IP to direct proxy requests.
The Talos local cluster should now start pulling via caching registries.
This can be verified via registry logs, e.g. docker logs -f registry-docker.io.
The first time the cluster boots, images are pulled and cached, so the next cluster boot should be much faster.
Note: 10.5.0.1 is the bridge IP of the default network (10.5.0.0/24); if using a custom --cidr, the value should be adjusted accordingly.
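As a sketch of how the proxies map onto cluster creation, the snippet below prints a talosctl cluster create invocation with one --registry-mirror flag per proxy. It assumes the default bridge IP 10.5.0.1 from the note above and a hypothetical upstream-to-port assignment (docker.io on 5000 through quay.io on 5004); it prints the command and does not run it.

```shell
# Build (and print) a `talosctl cluster create` command that points each
# upstream registry at its local caching proxy.
BRIDGE_IP="10.5.0.1"   # default QEMU bridge IP; differs with a custom --cidr

build_cluster_create() {
  cmd="talosctl cluster create --provisioner qemu"
  port=5000
  # Assumed order of upstreams; it must match the ports the proxies listen on.
  for upstream in docker.io k8s.gcr.io gcr.io ghcr.io quay.io; do
    cmd="$cmd --registry-mirror ${upstream}=http://${BRIDGE_IP}:${port}"
    port=$((port + 1))
  done
  echo "$cmd"
}

build_cluster_create
```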
Using Caching Registries with docker Local Cluster
With a docker local cluster we can use the docker bridge IP; the default value for that IP is 172.17.0.1.
On Linux, the docker bridge address can be inspected with ip addr show docker0.
Note: Removing docker registry containers also removes the image cache.
So if you plan to use caching registries, keep the containers running.
7.8 - Configuring the Cluster Endpoint
In this section, we will step through the configuration of a Talos based Kubernetes cluster.
There are three major components we will configure:
apid and talosctl
the master nodes
the worker nodes
Talos enforces a high level of security by using mutual TLS for authentication and authorization.
We recommend that the configuration of Talos be performed by a cluster owner.
A cluster owner should be a person of authority within an organization, perhaps a director, manager, or senior member of a team.
They are responsible for storing the root CA, and distributing the PKI for authorized cluster administrators.
Recommended settings
Talos runs great out of the box, but if you tweak some minor settings it will make your life
a lot easier in the future.
This is not a requirement, but rather a document to explain some key settings.
Endpoint
To configure the talosctl endpoint, it is recommended you use a resolvable DNS name.
This way, if you decide to upgrade to a multi-controlplane cluster, you only have to add the IP address to the hostname configuration.
The configuration can be done either on a load balancer or simply through DNS.
For example:
This is set in the config files for the cluster, e.g. init.yaml, controlplane.yaml and join.yaml.
For more details, please see: v1alpha1 endpoint configuration
If you use a DNS name as the endpoint, you can later upgrade your Talos cluster to multiple control plane nodes (if you don’t have a multi-controlplane setup from the start).
Using a DNS name generates the corresponding Certificates (Kubernetes and Talos) for the correct hostname.
7.9 - Configuring Wireguard Network
In this guide you will learn how to set up a Wireguard network using the kernel module.
Configuring Wireguard Network
Quick Start
The quickest way to try out Wireguard is to use the talosctl cluster create command:
It will automatically generate Wireguard network configuration for each node with the following network topology:
All controlplane nodes will be used as Wireguard servers which listen on port 51111.
All controlplanes and workers will connect to all controlplanes.
It also sets PersistentKeepalive to 5 seconds to establish the connection from controlplanes to workers.
After the cluster is deployed it should be possible to verify Wireguard network connectivity.
It is possible to deploy a container with hostNetwork enabled, then do kubectl exec <container> /bin/bash and either do:
ping 10.1.0.2
Or install wireguard-tools package and run:
wg show
wg show should output something like this:
interface: wg0
  public key: OMhgEvNIaEN7zeCLijRh4c+0Hwh3erjknzdyvVlrkGM=
  private key: (hidden)
  listening port: 47946

peer: 1EsxUygZo8/URWs18tqB5FW2cLVlaTA+lUisKIf8nh4=
  endpoint: 10.5.0.2:51111
  allowed ips: 10.1.0.0/24
  latest handshake: 1 minute, 55 seconds ago
  transfer: 3.17 KiB received, 3.55 KiB sent
  persistent keepalive: every 5 seconds
It is also possible to use generated configuration as a reference by pulling generated config files using:
All Wireguard configuration can be done by changing Talos machine config files.
As an example we will use this official Wireguard quick start tutorial.
Key Generation
This part is exactly the same:
wg genkey | tee privatekey | wg pubkey > publickey
Setting up Device
Inline comments show relations between configs and wg quickstart tutorial commands:
...
network:
  interfaces:
    ...
      # ip link add dev wg0 type wireguard
    - interface: wg0
      mtu: 1500
      # ip address add dev wg0 192.168.2.1/24
      cidr: 192.168.2.1/24
      # wg set wg0 listen-port 51820 private-key /path/to/private-key peer ABCDEF... allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
      wireguard:
        privateKey: <privatekey file contents>
        listenPort: 51820
        peers:
          - allowedIPs:
              - 192.168.88.0/24
            endpoint: 209.202.254.14:8172
            publicKey: ABCDEF...
...
When networkd gets this configuration it will create the device, configure it and will bring it up (equivalent to ip link set up dev wg0).
How to convert a Talos self-hosted Kubernetes control plane (pre-0.9) to one based on static pods.
Talos version 0.9 runs the Kubernetes control plane in a new way: static pods managed by Talos.
Talos versions 0.8 and below run a self-hosted control plane.
After the Talos OS upgrade to version 0.9, the Kubernetes control plane should be converted to run as static pods.
This guide describes the automated conversion script and also shows the detailed manual conversion process.
Automated Conversion
First, make sure all nodes are updated to Talos 0.9:
$ talosctl -n <IP> convert-k8s
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
current self-hosted status: true
gathering control plane configuration
aggregator CA key can't be recovered from bootkube-boostrapped control plane, generating new CA
patching master node "172.20.0.2" configuration
patching master node "172.20.0.3" configuration
patching master node "172.20.0.4" configuration
waiting for static pod definitions to be generated
waiting for manifests to be generated
Talos generated control plane static pod definitions and bootstrap manifests, please verify them with commands:
talosctl -n <master node IP> get StaticPods.kubernetes.talos.dev
talosctl -n <master node IP> get Manifests.kubernetes.talos.dev
in order to remove self-hosted control plane, pod-checkpointer component needs to be disabled
once pod-checkpointer is disabled, the cluster shouldn't be rebooted until the entire conversion process is complete
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]:
Script stops at this point waiting for confirmation.
Talos still runs self-hosted control plane, and static pods were not rendered yet.
As instructed by the script, please verify that static pod definitions are correct:
Static pod definitions are generated from the machine configuration and should match pod template as generated by Talos on bootstrap of self-hosted control plane unless there were some manual changes applied to the daemonset specs after bootstrap.
Talos patches the machine configuration with the container image versions scraped from the daemonset definition and fetches the service account key from Kubernetes secrets.
The aggregator CA can’t be recovered from the self-hosted control plane, so a new CA is generated.
This is generally harmless and not visible from outside the cluster.
The Aggregator CA is not the same CA as is used by Talos or Kubernetes standard API.
It is a special PKI used for aggregating API extension services inside your cluster.
If you have non-standard apiserver aggregations (fairly rare, and you should know if you do), then you may need to restart these services after the new CA is in place.
$ talosctl -n <IP> get manifests --namespace=extras
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 extras Manifest 05-https://docs.projectcalico.org/manifests/calico.yaml 1
Make sure that manifests and static pods are correct across all control plane nodes, as each node reconciles
control plane state on its own.
For example, CNI configuration in machine config should be in sync across all the nodes.
Talos nodes try to create any missing Kubernetes resources from the manifests, but they never
update or delete existing resources.
If something looks wrong, the script can be aborted and the machine configuration updated to fix the problem.
Once configuration is updated, the script can be restarted.
If static pod definitions and manifests look good, confirm next step to disable pod-checkpointer:
$ talosctl -n <IP> convert-k8s
...
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]: yes
disabling pod-checkpointer
deleting daemonset "pod-checkpointer"
checking for active pod checkpoints
2021/03/09 23:37:25 retrying error: found 3 active pod checkpoints: [pod-checkpointer-655gc-talos-default-master-3 pod-checkpointer-pw6mv-talos-default-master-1 pod-checkpointer-zdw9z-talos-default-master-2]
2021/03/09 23:42:25 retrying error: found 1 active pod checkpoints: [pod-checkpointer-pw6mv-talos-default-master-1]
confirm applying static pod definitions and manifests [yes/no]:
Self-hosted control plane runs pod-checkpointer to work around issues with control plane availability.
It should be disabled before conversion starts to allow self-hosted control plane to be removed.
It takes around 5 minutes for the pod-checkpointer to be fully disabled.
Script verifies that all checkpoints are removed before proceeding.
This is the last confirmation before the point at which the self-hosted control plane can no longer be kept running:
static pods are released, bootstrap manifests are applied, and the self-hosted control plane is removed.
$ talosctl -n <IP> convert-k8s
...
confirm applying static pod definitions and manifests [yes/no]: yes
removing self-hosted initialized key
waiting for static pods for "kube-apiserver" to be present in the API server state
waiting for static pods for "kube-controller-manager" to be present in the API server state
waiting for static pods for "kube-scheduler" to be present in the API server state
deleting daemonset "kube-apiserver"
waiting for static pods for "kube-apiserver" to be present in the API server state
deleting daemonset "kube-controller-manager"
waiting for static pods for "kube-controller-manager" to be present in the API server state
deleting daemonset "kube-scheduler"
waiting for static pods for "kube-scheduler" to be present in the API server state
conversion process completed successfully
As soon as the control plane static pods are rendered, the kubelet starts them.
It is expected that the pods for kube-apiserver will crash initially.
Only one kube-apiserver can be bound to the host Node’s port 6443 at a time.
Eventually, the old kube-apiserver will be killed, and the new one will be able to start.
This is all handled automatically.
The script will continue by removing each self-hosted daemonset and verifying that static pods are ready and healthy.
Manual Conversion
Check that Talos runs self-hosted control plane:
$ talosctl -n <CONTROL_PLANE_IP> get bs
NODE NAMESPACE TYPE ID VERSION SELF HOSTED
172.20.0.2 runtime BootstrapStatus control-plane 2 true
The Talos machine configuration needs to be updated to the 0.9 format; there are two new required machine configuration settings:
.cluster.serviceAccount is the service account PEM-encoded private key.
.cluster.aggregatorCA is the aggregator CA for kube-apiserver (certificate and private key).
Current service account can be fetched from the Kubernetes secrets:
$ kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}'
LS0tLS1CRUdJTiBSU0EgUFJJVkFURS...
All control plane node machine configurations should be patched with the service account key:
$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURS..."}}]'
patched mc at the node 172.20.0.2
Aggregator CA can be generated using OpenSSL or any other certificate generation tools: RSA or ECDSA certificate with CN front-proxy valid for 10 years.
PEM-encoded CA certificate and key should be base64-encoded and patched into the machine config at path /cluster/aggregatorCA:
$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ...", "key": "LS0tLS1CRUdJTiBFQy..."}}]'
patched mc at the node 172.20.0.2
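As an illustration of the OpenSSL route, the following sketch generates an ECDSA aggregator CA with CN front-proxy valid for 10 years, base64-encodes both PEM files, and writes the JSON patch to a file. The file names, the P-256 curve and the temporary working directory are assumptions made for this example; depending on the local OpenSSL configuration, the certificate extensions that end up in the result may differ.

```shell
# Generate a self-signed aggregator CA (ECDSA P-256, CN=front-proxy, ~10 years).
workdir="$(mktemp -d)"

# Private key for the CA.
openssl ecparam -name prime256v1 -genkey -noout -out "$workdir/aggregator-ca.key"

# Self-signed CA certificate valid for 3650 days.
openssl req -x509 -new -key "$workdir/aggregator-ca.key" -sha256 -days 3650 \
  -subj "/CN=front-proxy" -out "$workdir/aggregator-ca.crt"

# Base64-encode the PEM files as single-line strings for the JSON patch.
crt_b64="$(base64 < "$workdir/aggregator-ca.crt" | tr -d '\n')"
key_b64="$(base64 < "$workdir/aggregator-ca.key" | tr -d '\n')"

# Write the JSON patch in the same shape as the command above.
printf '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "%s", "key": "%s"}}]\n' \
  "$crt_b64" "$key_b64" > "$workdir/aggregator-ca-patch.json"
echo "patch written to $workdir/aggregator-ca-patch.json"
```

The contents of the generated file can then be passed as the -p argument of the patch mc command shown above.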
At this point static pod definitions and bootstrap manifests should be rendered; see “Automated Conversion” for how to verify the generated objects.
Feel free to continue to refine your machine configuration until the generated static pod definitions and bootstrap manifests look good.
If static pod definitions are not generated, check logs with talosctl -n <IP> logs controller-runtime.
Note: You can use the --squash flag to create smaller images.
Now that we have a custom installer we can build Talos for the specific platform we wish to deploy to.
7.12 - Customizing the Root Filesystem
The installer image contains ONBUILD instructions that handle the following:
the decompression, and unpacking of the initramfs.xz
the unsquashing of the rootfs
the copying of new rootfs files
the squashing of the new rootfs
and the packing, and compression of the new initramfs.xz
When used as a base image, the installer will perform the above steps automatically with the requirement that a customization stage be defined in the Dockerfile.
For example, say we have an image that contains the contents of a library we wish to add to the Talos rootfs.
We need to define a stage with the name customization:
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
Using a multi-stage Dockerfile we can define the customization stage and build FROM the installer image:
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
FROM ghcr.io/talos-systems/installer:latest
When building the image, the customization stage will automatically be copied into the rootfs.
The customization stage is not limited to a single COPY instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in / will be copied into the rootfs.
Note: <dest> is the path, relative to the rootfs, where you wish to place the contents of <src>.
This will perform a rm -rf on the specified paths relative to the rootfs.
Note: RM must be a whitespace delimited list.
The resulting image can be used to:
generate an image for any of the supported providers
perform bare-metal installs
perform upgrades
We will step through common customizations in the remainder of this section.
7.13 - Disk Encryption
Guide on using system disk encryption
It is possible to enable encryption for system disks at the OS level.
As of this writing, only STATE and EPHEMERAL partitions can be encrypted.
STATE contains the most sensitive node data: secrets and certs.
EPHEMERAL partition may contain some sensitive workload data.
Data is encrypted using LUKS2, which is provided by the Linux kernel modules and cryptsetup utility.
The operating system will run additional setup steps when encryption is enabled.
If the disk encryption is enabled for the STATE partition, the system will:
Save STATE encryption config as JSON in the META partition.
Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition.
Note that the machine config is always preferred over the META one.
Before mounting the STATE partition, format and encrypt it.
This occurs only if the STATE partition is empty and has no filesystem.
If the disk encryption is enabled for the EPHEMERAL partition, the system will:
Get the encryption config from the machine config.
Before mounting the EPHEMERAL partition, encrypt and format it.
This occurs only if the EPHEMERAL partition is empty and has no filesystem.
Configuration
Right now this encryption is disabled by default.
To enable disk encryption you should modify the machine configuration with the following options:
Note: What the LUKS2 docs call “keys” are, in reality, a passphrase.
When this passphrase is added, LUKS2 runs argon2 to create an actual key from that passphrase.
LUKS2 supports up to 32 encryption keys and it is possible to specify all of them in the machine configuration.
Talos always tries to sync the keys list defined in the machine config with the actual keys defined for the LUKS2 partition.
So if you update the keys list, at least one key must remain unchanged so that it can be used for key management.
When you define a key you should specify the key kind and the slot:
Note that key order plays no role in which key slot is used.
Every key must always have a slot defined.
Encryption Key Kinds
Talos supports two kinds of keys:
nodeID which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
static which you define right in the configuration.
Note: Use static keys only if your STATE partition is encrypted and only for the EPHEMERAL partition.
For the STATE partition it will be stored in the META partition, which is not encrypted.
Key Rotation
It is necessary to do talosctl apply-config a couple of times to rotate keys, since there is a need to always maintain a single working key while changing the other keys around it.
That’s it!
After you run the last command, the partition will be wiped and the node will reboot.
During the next boot the system will encrypt the partition.
State Partition
Calling wipe against the STATE partition will make the node lose the config, so the previous flow is not going to work.
The flow should be to first wipe the STATE partition:
talosctl reset --system-labels-to-wipe STATE -n <node ip> --reboot=true
The node will enter maintenance mode; then run apply-config with the --insecure flag:
After installation is complete the node should encrypt the STATE partition.
7.14 - Editing Machine Configuration
How to edit and patch Talos machine configuration, with reboot, immediately, or stage update on reboot.
Talos node state is fully defined by machine configuration.
Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.
Note: Be sure that config is persisted so that configuration updates are not overwritten on reboots.
Configuration persistence has been enabled by default since Talos 0.5 (persist: true in machine configuration).
There are three talosctl commands which facilitate machine configuration updates:
talosctl apply-config to apply configuration from the file
talosctl edit machineconfig to launch an editor with existing node configuration, make changes and apply configuration back
talosctl patch machineconfig to apply automated machine configuration changes via JSON patch
Each of these commands can operate in one of three modes:
apply change with a reboot (default): update configuration, reboot Talos node to apply configuration change
apply change immediately (--immediate flag): change is applied immediately without a reboot; only the .cluster sub-tree of the machine configuration can be updated this way in Talos 0.9
apply change on next reboot (--on-reboot): change is staged to be applied after a reboot, but node is not rebooted
Note: applying change on next reboot (--on-reboot) doesn’t modify the current node configuration, so the next call to
talosctl edit machineconfig --on-reboot will not see the changes.
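To make the three modes concrete, here is a small dry-run sketch that prints the apply-config command for a given mode. The helper name apply_config_command and the example node IP are hypothetical; only the --immediate and --on-reboot flags come from this guide. Nothing is sent to a node.

```shell
# Print the talosctl command for a given apply mode (nothing is executed).
#   $1: node IP, $2: config file, $3: mode (reboot | immediate | on-reboot)
apply_config_command() {
  node="$1"; file="$2"; mode="${3:-reboot}"
  case "$mode" in
    reboot)    flag="" ;;               # default: apply the change, then reboot
    immediate) flag=" --immediate" ;;   # no reboot; .cluster sub-tree only in 0.9
    on-reboot) flag=" --on-reboot" ;;   # staged until the next reboot
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "talosctl -n $node apply-config -f $file$flag"
}

apply_config_command 172.20.0.2 config.yaml immediate
```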
talosctl apply-config
This command is mostly used to submit initial machine configuration to the node (generated by talosctl gen config).
It can be used to apply new configuration from the file to the running node as well, but most of the time it’s not convenient, as it doesn’t operate on the current node machine configuration.
Example:
talosctl -n <IP> apply-config -f config.yaml
Command apply-config can also be invoked as apply machineconfig:
Applying machine configuration immediately (without a reboot):
talosctl -n <IP> apply machineconfig -f config.yaml --immediate
talosctl edit machineconfig
Command talosctl edit loads current machine configuration from the node and launches configured editor to modify the config.
If config hasn’t been changed in the editor (or if updated config is empty), update is not applied.
Note: Talos uses environment variables TALOS_EDITOR, EDITOR to pick up the editor preference.
If environment variables are missing, vi editor is used by default.
Example:
talosctl -n <IP> edit machineconfig
Configuration can be edited for multiple nodes if multiple IP addresses are specified:
talosctl -n <IP1>,<IP2>,... edit machineconfig
Applying machine configuration change immediately (without a reboot):
talosctl -n <IP> edit machineconfig --immediate
talosctl patch machineconfig
Command talosctl patch works similarly to the talosctl edit command: it loads the current machine configuration, but instead of launching the configured editor it applies a JSON patch to the configuration and writes the result back to the node.
Example, updating kubelet version (with a reboot):
$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.20.5"}]'
patched mc at the node <IP>
Updating kube-apiserver version in immediate mode (without a reboot):
$ talosctl -n <IP> patch machineconfig --immediate -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.20.5"}]'
patched mc at the node <IP>
Patch might be applied to multiple nodes when multiple IPs are specified:
If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue.
If the boot sequence is still running, Talos might refuse applying config in default mode.
In that case --on-reboot mode can be used coupled with talosctl reboot command to trigger a reboot and apply configuration update.
7.15 - Managing PKI
Generating an Administrator Key Pair
In order to create a key pair, you will need the root CA.
Save the CA public key, and CA private key as ca.crt, and ca.key respectively.
Now, run the following commands to generate a certificate:
talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
Now, base64 encode admin.crt, and admin.key:
cat admin.crt | base64
cat admin.key | base64
You can now set the crt and key fields in the talosconfig to the base64 encoded strings.
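One practical detail: GNU base64 wraps its output at 76 characters by default, which is awkward to paste into a single YAML field. The sketch below strips the line breaks so each value is a single line; encode_single_line is a hypothetical helper, and the demonstration uses a stand-in file, not a real certificate.

```shell
# Encode a file as a single-line base64 string (no line wrapping).
encode_single_line() {
  base64 < "$1" | tr -d '\n'
}

# Demonstration on a stand-in file (placeholder content, not a real certificate).
demo="$(mktemp)"
printf -- '-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n' > "$demo"
encoded="$(encode_single_line "$demo")"
echo "$encoded"
```

In practice the same helper would be called with admin.crt and admin.key, and the results set as the crt and key fields.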
Renewing an Expired Administrator Certificate
In order to renew the certificate, you will need the root CA, and the admin private key.
The base64 encoded key can be found in the configuration file of any one of the control plane nodes.
Where it is exactly will depend on the specific version of the configuration file you are using.
Save the CA public key, CA private key, and admin private key as ca.crt, ca.key, and admin.key respectively.
Now, run the following commands to generate a certificate:
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
You should see admin.crt in your current directory.
Now, base64 encode admin.crt:
cat admin.crt | base64
You can now set the certificate in the talosconfig to the base64 encoded string.
7.16 - Resetting a Machine
From time to time, it may be beneficial to reset a Talos machine to its “original” state.
Bear in mind that this is a destructive action for the given machine.
Doing this removes the machine from Kubernetes and etcd (if applicable), and clears any data on the machine that would normally persist across a reboot.
The API command for doing this is talosctl reset.
There are a couple of flags as part of this command:
Flags:
--graceful   if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
--reboot     if true, reboot the node after resetting instead of shutting down
The graceful flag is especially important when considering HA vs. non-HA Talos clusters.
If the machine is part of an HA cluster, a normal, graceful reset should work just fine right out of the box as long as the cluster is in a good state.
However, if this is a single node cluster being used for testing purposes, a graceful reset is not an option since Etcd cannot be “left” if there is only a single member.
In this case, reset should be used with --graceful=false to skip performing checks that would normally block the reset.
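The decision described above can be sketched as a small dry-run helper that prints the reset command for a given cluster size. The helper name reset_command and the example IP are hypothetical, and the control plane node count is supplied by the caller; the command is printed, not run.

```shell
# Print the reset command appropriate for the cluster size (nothing is executed).
#   $1: node IP, $2: number of control plane nodes in the cluster
reset_command() {
  node="$1"; control_planes="$2"
  if [ "$control_planes" -gt 1 ]; then
    # HA cluster: the node can leave etcd, so the default graceful reset applies.
    echo "talosctl -n $node reset"
  else
    # Single node: etcd cannot be left, so skip the graceful checks.
    echo "talosctl -n $node reset --graceful=false"
  fi
}

reset_command 172.20.0.2 1
```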
7.17 - Storage
Talos is known to work with Rook and NFS.
Rook
We recommend at least Rook v1.5.
NFS
The NFS client is part of the kubelet image maintained by the Talos team.
This means that the version installed in your running kubelet is the version of NFS supported by Talos.
7.18 - Troubleshooting Control Plane
Troubleshoot control plane failures for running cluster and bootstrap process.
This guide is written as a series of topics and detailed answers for each topic.
It starts with the basics of the control plane and goes into Talos specifics.
This document mostly applies only to the Talos 0.9 control plane based on static pods.
If Talos was upgraded from version 0.8, it might still be running a self-hosted control plane; the current status can
be checked with the command talosctl get bootstrapstatus:
$ talosctl -n <IP> get bs
NODE NAMESPACE TYPE ID VERSION SELF HOSTED
172.20.0.2 runtime BootstrapStatus control-plane 1 false
In this guide we assume that Talos client config is available and Talos API access is available.
Kubernetes client configuration can be pulled from control plane nodes with talosctl -n <IP> kubeconfig
(this command works before Kubernetes is fully booted).
What is a control plane node?
Talos nodes which have .machine.type of init or controlplane are control plane nodes.
The only difference between init and controlplane nodes is that an init node automatically
bootstraps a single-node etcd cluster on first boot if the etcd data directory is empty.
A node with type init can be replaced with a controlplane node which is triggered to run etcd bootstrap
with the talosctl --nodes <IP> bootstrap command.
Use of init type nodes is discouraged, as it might lead to a split-brain scenario if one node in
an existing cluster is reinstalled while its config type is still init.
It is critical to make sure only one control plane node runs in bootstrap mode (either with node type init or
via bootstrap API/talosctl bootstrap), as having more than one node in bootstrap mode leads to a split-brain
scenario (multiple etcd clusters are built instead of a single cluster).
What is special about control plane node?
Control plane nodes in Talos run etcd which provides data store for Kubernetes and Kubernetes control plane
components (kube-apiserver, kube-controller-manager and kube-scheduler).
Control plane nodes are tainted by default to prevent workloads from being scheduled to control plane nodes.
How many control plane nodes should be deployed?
With a single control plane node, cluster is not HA: if that single node experiences hardware failure, cluster
control plane is broken and can’t be recovered.
Single control plane node clusters are still used as test clusters and in edge deployments, but it should be noted that this setup is not HA.
The number of control plane nodes should be odd (1, 3, 5, …), as with an even number of nodes, etcd quorum doesn’t tolerate
failures correctly: e.g. with 2 control plane nodes quorum is 2, so failure of any node breaks quorum, so this
setup is almost equivalent to single control plane node cluster.
With three control plane nodes, the cluster can tolerate a failure of any single control plane node.
With five control plane nodes, the cluster can tolerate failure of any two control plane nodes.
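The numbers above follow from etcd requiring a strict majority of members. A minimal sketch of the arithmetic, with hypothetical helper names quorum and tolerated_failures:

```shell
# etcd quorum is a strict majority of members: floor(n / 2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Tolerated failures: members that can be lost while quorum still holds.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from three to four nodes raises the quorum from 2 to 3 while the number of tolerated failures stays at one, which is why even node counts bring no benefit.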
What is control plane endpoint?
Kubernetes requires having a control plane endpoint which points to any healthy API server running on a control plane node.
Control plane endpoint is specified as URL like https://endpoint:6443/.
At any point in time, even during failures control plane endpoint should point to a healthy API server instance.
As kube-apiserver runs with host network, control plane endpoint should point to one of the control plane node IPs: node1:6443, node2:6443, …
For single control plane node clusters, control plane endpoint might be https://IP:6443/ or https://DNS:6443/, where IP is the IP of the control plane node and DNS points to IP.
The DNS form of the endpoint allows the IP address of the control plane node to change over time.
For HA clusters, the control plane endpoint can be implemented as:
TCP L7 loadbalancer with active health checks against port 6443
round-robin DNS with active health checks against port 6443
BGP anycast IP with health checks
virtual shared L2 IP
It is critical that control plane endpoint works correctly during cluster bootstrap phase, as nodes discover
each other using control plane endpoint.
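A quick way to probe an endpoint from a workstation is to request the API server health check through it. This is a sketch under assumptions: check_endpoint is a hypothetical helper, the example URL is a placeholder, and it relies on the Kubernetes /healthz path being reachable without credentials, which depends on how the cluster is configured.

```shell
# Return success if the API server behind the endpoint answers its health check.
#   $1: control plane endpoint, e.g. https://endpoint:6443
check_endpoint() {
  # --insecure skips TLS verification; acceptable for a reachability probe only.
  curl --silent --fail --insecure --max-time 5 --output /dev/null "${1%/}/healthz"
}

if check_endpoint "https://127.0.0.1:6443"; then
  echo "endpoint healthy"
else
  echo "endpoint not reachable or unhealthy"
fi
```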
kubelet is not running on control plane node
Service kubelet should be running on control plane node as soon as networking is configured:
$ talosctl -n <IP> service kubelet
NODE 172.20.0.2
ID kubelet
STATE Running
HEALTH OK
EVENTS [Running]: Health check successful (2m54s ago)
       [Running]: Health check failed: Get "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused (3m4s ago)
       [Running]: Started task kubelet (PID 2334) for container kubelet (3m6s ago)
       [Preparing]: Creating service runner (3m6s ago)
       [Preparing]: Running pre state (3m15s ago)
       [Waiting]: Waiting for service "timed" to be "up" (3m15s ago)
       [Waiting]: Waiting for service "cri" to be "up", service "timed" to be "up" (3m16s ago)
       [Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m18s ago)
If kubelet is not running, it might be caused by a wrong configuration; check the kubelet logs
with talosctl logs:
etcd should be running on bootstrap node immediately (bootstrap node is either init node or controlplane node
after talosctl bootstrap command was issued).
When a node boots for the first time, the etcd data directory /var/lib/etcd is empty and Talos launches etcd in a mode to build the initial cluster of a single node.
At that point the /var/lib/etcd directory becomes non-empty and etcd runs as usual.
If etcd is not running, check service etcd state:
$ talosctl -n <IP> service etcd
NODE 172.20.0.2
ID etcd
STATE Running
HEALTH OK
EVENTS [Running]: Health check successful (3m21s ago)
       [Running]: Started task etcd (PID 2343) for container etcd (3m26s ago)
       [Preparing]: Creating service runner (3m26s ago)
       [Preparing]: Running pre state (3m26s ago)
       [Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m26s ago)
If service is stuck in Preparing state for bootstrap node, it might be related to slow network - at this stage
Talos pulls etcd image from the container registry.
If etcd service is crashing and restarting, check service logs with talosctl -n <IP> logs etcd.
Most common reasons for crashes are:
wrong arguments passed via extraArgs in the configuration;
booting Talos on non-empty disk with previous Talos installation, /var/lib/etcd contains data from old cluster.
etcd is not running on non-bootstrap control plane node
Service etcd on non-bootstrap control plane node waits for Kubernetes to boot successfully on bootstrap node to find
other peers to build a cluster.
As soon as bootstrap node boots Kubernetes control plane components, and kubectl get endpoints returns IP of bootstrap control plane node, other control plane nodes will start joining the cluster followed by Kubernetes control plane components on each control plane node.
Kubernetes static pod definitions are not generated
Talos should write down static pod definitions for the Kubernetes control plane:
$ talosctl -n <IP> ls /etc/kubernetes/manifests
NODE NAME
172.20.0.2 .
172.20.0.2 talos-kube-apiserver.yaml
172.20.0.2 talos-kube-controller-manager.yaml
172.20.0.2 talos-kube-scheduler.yaml
If static pod definitions are not rendered, check etcd and kubelet service health (see above),
and controller runtime logs (talosctl logs controller-runtime).
Talos prints error an error on the server ("") has prevented the request from succeeding
This is expected during initial cluster bootstrap and sometimes after a reboot:
[ 70.093289][talos] task labelNodeAsMaster (1/1): starting
[ 80.094038][talos] retrying error: an error on the server ("") has prevented the request from succeeding (get nodes talos-default-master-1)
Initially kube-apiserver component is not running yet, and it takes some time before it becomes fully up
during bootstrap (image should be pulled from the Internet, etc.)
Once control plane endpoint is up Talos should proceed.
If Talos doesn’t proceed further, it might be a configuration issue.
In any case, status of control plane components can be checked with talosctl containers -k:
If kube-apiserver shows as CONTAINER_EXITED, it might have exited due to configuration error.
Logs can be checked with talosctl logs --kubernetes (or with -k as a shorthand):
$ talosctl -n <IP> logs -k kube-system/kube-apiserver-talos-default-master-1:kube-apiserver
172.20.0.2: 2021-03-05T20:46:13.133902064Z stderr F 2021/03/05 20:46:13 Running command:
172.20.0.2: 2021-03-05T20:46:13.133933824Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.2: 2021-03-05T20:46:13.133938524Z stderr F Run from directory:
172.20.0.2: 2021-03-05T20:46:13.13394154Z stderr F Executable path: /usr/local/bin/kube-apiserver
...
Talos prints error nodes "talos-default-master-1" not found
This error means that kube-apiserver is up, and control plane endpoint is healthy, but kubelet hasn’t got
its client certificate yet and wasn’t able to register itself.
For the kubelet to get its client certificate, following conditions should apply:
control plane endpoint is healthy (kube-apiserver is running)
$ kubectl get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-jcn9j 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-p6b9q 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-sw6rm 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-vlghg 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
Talos prints error node not ready
Node in Kubernetes is marked as Ready once CNI is up.
It takes a minute or two for the CNI images to be pulled and for the CNI to start.
If the node is stuck in this state for too long, check CNI pods and logs with kubectl, usually
CNI resources are created in kube-system namespace.
For example, for Talos default Flannel CNI:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
...
kube-flannel-25drx 1/1 Running 0 23m
kube-flannel-8lmb6 1/1 Running 0 23m
kube-flannel-gl7nx 1/1 Running 0 23m
kube-flannel-jknt9 1/1 Running 0 23m
...
Talos prints error x509: certificate signed by unknown authority
Full error might look like:
x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Usually this means that the control plane endpoint points to a different cluster: the client certificate
generated by Talos doesn’t match the CA of the cluster at the control plane endpoint.
etcd is running on bootstrap node, but stuck in pre state on non-bootstrap nodes
Please see question etcd is not running on non-bootstrap control plane node.
Checking kube-controller-manager and kube-scheduler
If the control plane endpoint is up, the status of the pods can be checked with kubectl:
$ kubectl get pods -n kube-system -l k8s-app=kube-controller-manager
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-master-1 1/1 Running 0 28m
kube-controller-manager-talos-default-master-2 1/1 Running 0 28m
kube-controller-manager-talos-default-master-3 1/1 Running 0 28m
If control plane endpoint is not up yet, container status can be queried with
talosctl containers --kubernetes:
If some of the containers are not running, it could be that image is still being pulled.
Otherwise the process might be crashing; in that case, logs can be checked with talosctl logs --kubernetes <containerID>:
$ talosctl -n <IP> logs -k kube-system/kube-controller-manager-talos-default-master-1:kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291667526Z stderr F 2021/03/09 13:59:34 Running command:
172.20.0.3: 2021-03-09T13:59:34.291702262Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.3: 2021-03-09T13:59:34.291707121Z stderr F Run from directory:
172.20.0.3: 2021-03-09T13:59:34.291710908Z stderr F Executable path: /usr/local/bin/kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291719163Z stderr F Args (comma-delimited): /usr/local/bin/kube-controller-manager,--allocate-node-cidrs=true,--cloud-provider=,--cluster-cidr=10.244.0.0/16,--service-cluster-ip-range=10.96.0.0/12,--cluster-signing-cert-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--cluster-signing-key-file=/system/secrets/kubernetes/kube-controller-manager/ca.key,--configure-cloud-routes=false,--kubeconfig=/system/secrets/kubernetes/kube-controller-manager/kubeconfig,--leader-elect=true,--root-ca-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--service-account-private-key-file=/system/secrets/kubernetes/kube-controller-manager/service-account.key,--profiling=false
172.20.0.3: 2021-03-09T13:59:34.293870359Z stderr F 2021/03/09 13:59:34 Now listening for interrupts
172.20.0.3: 2021-03-09T13:59:34.761113762Z stdout F I0309 13:59:34.760982 10 serving.go:331] Generated self-signed cert in-memory
...
Checking controller runtime logs
Talos runs a set of controllers which work on resources to build and support Kubernetes control plane.
Some debugging information can be queried from the controller logs with talosctl logs controller-runtime:
Controllers run a reconcile loop, so they might be starting, failing, and restarting; that is expected behavior.
Things to look for:
v1alpha1.BootstrapStatusController: bootkube initialized status not found: control plane is not self-hosted, running with static pods.
k8s.KubeletStaticPodController: writing static pod "/etc/kubernetes/manifests/talos-kube-apiserver.yaml": static pod definitions were rendered successfully.
k8s.ManifestApplyController: controller failed: error creating mapping for object /v1/Secret/bootstrap-token-q9pyzr: an error on the server ("") has prevented the request from succeeding: control plane endpoint is not up yet, bootstrap manifests can’t be injected, controller is going to retry.
k8s.KubeletStaticPodController: controller failed: error refreshing pod status: error fetching pod status: an error on the server ("Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)") has prevented the request from succeeding: kubelet hasn’t been able to contact kube-apiserver yet to push pod status, controller
is going to retry.
k8s.ManifestApplyController: created rbac.authorization.k8s.io/v1/ClusterRole/psp:privileged: one of the bootstrap manifests got successfully applied.
secrets.KubernetesController: controller failed: missing cluster.aggregatorCA secret: Talos is running with 0.8 configuration, if the cluster was upgraded from 0.8, this is expected, and conversion process will fix machine config
automatically.
If this cluster was bootstrapped with version 0.9, machine configuration should be regenerated with 0.9 talosctl.
If there are no new messages in controller-runtime log, it means that controllers finished reconciling successfully.
Checking static pod definitions
Talos generates static pod definitions for kube-apiserver, kube-controller-manager, and kube-scheduler
components based on machine configuration.
These definitions can be checked as resources with talosctl get staticpods:
Status of the static pods can be queried with talosctl get staticpodstatus:
$ talosctl -n <IP> get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-master-1 1 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-master-1 1 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-master-1 1 True
Most important status is Ready printed as last column, complete status can be fetched by adding -o yaml flag.
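When checking several nodes, it can help to filter that table for pods that are not ready. The sketch below is a hypothetical helper, not part of talosctl: it assumes the plain-text column layout shown above (header row first, whitespace-separated columns) and that the output has been captured into a string. Parsing the -o yaml form would be more robust.

```python
def not_ready_static_pods(table_output):
    """Return the IDs of static pods whose READY column is not 'True'."""
    rows = [line.split() for line in table_output.strip().splitlines()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Look columns up by name so a changed column order does not break parsing.
    id_col = header.index("ID")
    ready_col = header.index("READY")
    return [row[id_col] for row in body if row[ready_col] != "True"]
```

An empty result means every static pod listed in the output reported Ready.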
Checking bootstrap manifests
As part of bootstrap process, Talos injects bootstrap manifests into Kubernetes API server.
There are two kinds of manifests: system manifests built into Talos and extra manifests downloaded (custom CNI, extra manifests in the machine config):
$ talosctl -n <IP> get manifests --namespace=extras
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 extras Manifest 05-https://docs.projectcalico.org/manifests/calico.yaml 1
Details of each manifest can be queried by adding -o yaml:
Worker node is stuck with apid health check failures
Control plane nodes have enough secret material to generate apid server certificates, but worker nodes
depend on control plane trustd services to generate certificates.
Worker nodes wait for kubelet to join the cluster, then apid queries Kubernetes endpoints via the control plane
endpoint to find trustd endpoints, and uses trustd to issue the certificate.
So if apid health checks are failing on a worker node:
make sure control plane endpoint is healthy
check that worker node kubelet joined the cluster
7.19 - Upgrading Kubernetes
This guide covers Kubernetes control plane upgrade for clusters running Talos-managed control plane.
If the cluster is still running self-hosted control plane (after upgrade from Talos 0.8), please
refer to 0.8 docs.
Automated Kubernetes Upgrade
To upgrade from Kubernetes v1.20.1 to v1.20.4 run:
$ talosctl --nodes <master node> upgrade-k8s --from 1.20.1 --to 1.20.4
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:55:01 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:55:05 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:55:07 retrying error: config version mismatch: got "2", expected "3"
updating "kube-controller-manager" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:55:27 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:55:47 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:56:07 retrying error: config version mismatch: got "2", expected "3"
updating "kube-scheduler" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:56:27 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:56:47 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:57:08 retrying error: config version mismatch: got "2", expected "3"
updating daemonset "kube-proxy" to version "1.20.4"
Script runs in two phases:
In the first phase every control plane node machine configuration is patched with new image version for each control plane component.
Talos renders new static pod definition on configuration update which is picked up by the kubelet.
Script waits for the change to propagate to the API server state.
Messages config version mismatch indicate that script is waiting for the updated container to be registered in the API server.
In the second phase script updates kube-proxy daemonset with the new image version.
If script fails for any reason, it can be safely restarted to continue upgrade process.
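The waiting step in the first phase amounts to a retry loop. The Python sketch below illustrates that pattern only; it is not the code talosctl runs. The function name, the `get_version` callback (standing in for whatever reads the config version annotation from the API server), and the attempt and delay defaults are all assumptions made for this example.

```python
import time


def wait_for_config_version(get_version, expected, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll get_version() until it returns expected; return the number of polls used."""
    for attempt in range(1, attempts + 1):
        got = get_version()
        if got == expected:
            return attempt
        # Mirrors the message shown in the transcript above.
        print(f'retrying error: config version mismatch: got "{got}", expected "{expected}"')
        sleep(delay)
    raise TimeoutError(f"config version {expected!r} not observed after {attempts} attempts")
```

Because the loop only compares the observed version with the expected one, running it again after an interruption simply resumes waiting, which matches the restart behavior described above.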
Manual Kubernetes Upgrade
Kubernetes can be upgraded manually as well by following the steps outlined below.
They are equivalent to the steps performed by the talosctl upgrade-k8s command.
Kubeconfig
In order to edit the control plane, we will need a working kubectl config.
If you don’t already have one, you can get one by running:
talosctl --nodes <master node> kubeconfig
API Server
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.20.4"}]'
patched mc at the node 172.20.0.2
JSON patch might need to be adjusted if current machine configuration is missing .cluster.apiServer.image key.
Also machine configuration can be edited manually with talosctl -n <IP> edit mc --immediate.
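The adjustment follows from JSON Patch semantics: a "replace" operation requires the target key to exist, while "add" creates it. The Python sketch below shows one way to pick the operation, assuming the machine configuration has already been loaded into nested dictionaries. The function name is hypothetical, and key names containing "/" or "~" (which need escaping in JSON Pointer paths) are not handled.

```python
import json


def image_patch(machine_config, pointer, value):
    """Build a one-operation JSON patch that sets pointer to value."""
    keys = pointer.strip("/").split("/")
    node = machine_config
    for depth, key in enumerate(keys):
        if not isinstance(node, dict):
            raise TypeError(f"expected a mapping above {key!r} in {pointer}")
        if key not in node:
            # Wrap the value in the missing intermediate objects, then add
            # everything at the first path segment that does not exist yet.
            nested = value
            for missing in reversed(keys[depth + 1:]):
                nested = {missing: nested}
            path = "/" + "/".join(keys[:depth + 1])
            return [{"op": "add", "path": path, "value": nested}]
        node = node[key]
    return [{"op": "replace", "path": pointer, "value": value}]


if __name__ == "__main__":
    config = {"cluster": {"apiServer": {}}}  # image key is missing
    patch = image_patch(config, "/cluster/apiServer/image",
                        "k8s.gcr.io/kube-apiserver:v1.20.4")
    print(json.dumps(patch))
```

The printed JSON can be passed as the -p argument of talosctl patch. The same helper applies to the controller manager and scheduler image paths used later in this guide.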
Capture new version of kube-apiserver config with:
In this example, new version is 5.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-master-1 1/1 Running 0 16m
Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
Controller Manager
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "k8s.gcr.io/kube-controller-manager:v1.20.4"}]'
patched mc at the node 172.20.0.2
JSON patch might need to be adjusted if current machine configuration is missing .cluster.controllerManager.image key.
Capture new version of kube-controller-manager config with:
In this example, new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-master-1 1/1 Running 0 35m
Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
Scheduler
Patch machine configuration using talosctl patch command:
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "k8s.gcr.io/kube-scheduler:v1.20.4"}]'
patched mc at the node 172.20.0.2
JSON patch might need to be adjusted if current machine configuration is missing .cluster.scheduler.image key.
Capture new version of kube-scheduler config with:
In this example, new version is 3.
Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
Check that the pod is running:
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-master-1 1/1 Running 0 39m
Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
Upgrading Kubelet version requires Talos node reboot after machine configuration change.
For every node, patch the machine configuration with the new kubelet version and wait for the node to reboot:
$ talosctl -n <IP> patch mc -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.20.4"}]'
patched mc at the node 172.20.0.2
Once node boots with the new configuration, confirm upgrade with kubectl get nodes <name>:
$ kubectl get nodes talos-default-master-1
NAME STATUS ROLES AGE VERSION
talos-default-master-1 Ready control-plane,master 123m v1.20.4
7.20 - Upgrading Talos
Talos upgrades are effected by an API call.
The talosctl CLI utility will facilitate this.
Upgrading from Talos 0.8
Talos 0.9 drops support for bootkube and self-hosted control plane.
Please make sure Talos is upgraded to the latest minor release of 0.8 first (0.8.4 at the moment
of this writing), then proceed with upgrading to the latest minor release of 0.9.
Before Upgrade to 0.9
If cluster was bootstrapped on Talos version < 0.8.3, add checkpointer annotations to
the kube-scheduler and kube-controller-manager daemonsets to improve resiliency of
self-hosted control plane to reboots (this is critical for single control-plane node clusters):
Talos 0.9 only supports Kubernetes versions 1.19.x and 1.20.x.
If running 1.18.x, please upgrade Kubernetes before upgrading Talos.
Make sure cluster is running latest minor release of Talos 0.8.
Prepare by downloading talosctl binary for Talos release 0.9.x.
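The version requirements above can be expressed as a small pre-flight check. The Python sketch below is illustrative only: the function names are hypothetical, the 0.8.4 default reflects the latest 0.8 release named in this guide, and version strings are assumed to be plain dotted numbers with an optional leading "v" (pre-release suffixes are not handled).

```python
def parse_version(text):
    """Turn 'v1.20.4' or '0.8.4' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def upgrade_blockers(talos_version, kubernetes_version, latest_0_8="0.8.4"):
    """Return a list of reasons why a node is not ready for the 0.9 upgrade."""
    blockers = []
    talos = parse_version(talos_version)
    if talos[:2] != (0, 8):
        blockers.append(f"expected a Talos 0.8.x node, got {talos_version}")
    elif talos < parse_version(latest_0_8):
        blockers.append(f"upgrade Talos to {latest_0_8} first")
    # Talos 0.9 only supports Kubernetes 1.19.x and 1.20.x.
    if parse_version(kubernetes_version)[:2] not in {(1, 19), (1, 20)}:
        blockers.append("upgrade Kubernetes to 1.19.x or 1.20.x first")
    return blockers
```

An empty list means the node meets the version requirements listed above; the checkpointer annotations still need to be verified separately.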
After Upgrade to 0.9
After the upgrade to 0.9, Talos will still be running self-hosted control plane until the conversion process is run.
Note: Talos 0.9 doesn’t include bootkube recovery option (talosctl recover), so
it’s not possible to recover self-hosted control plane after upgrading to 0.9.
As soon as all the nodes get upgraded to 0.9, run talosctl convert-k8s to convert the control plane
to the new static pod format for 0.9.
Once the conversion process is complete, Kubernetes can be upgraded.
talosctl Upgrade
To manually upgrade a Talos node, you will specify the node’s IP address and the
installer container image for the version of Talos to which you wish to upgrade.
For instance, if your Talos node has the IP address 10.20.30.40 and you want
to install the official version v0.9.0, you would enter a command such
as:
The --preserve option explicitly tells Talos whether to keep its ephemeral data intact.
In most cases, it is correct to just let Talos perform its default action.
However, if you are running a single-node control-plane, make sure to set --preserve=true.
If Talos fails to run the upgrade, the --stage flag may be used: the upgrade is then performed after a reboot,
which is followed by another reboot into the upgraded version.
Machine Configuration Changes
Talos 0.9 introduces new required parameters in machine configuration:
.cluster.aggregatorCA
.cluster.serviceAccount
Talos supports both ECDSA and RSA certificates and keys for Kubernetes and etcd, with ECDSA being default.
Talos <= 0.8 supports only RSA keys and certificates.
Utility talosctl gen config generates by default config in 0.9 format which is not compatible with
Talos 0.8, but old format can be generated with talosctl gen config --talos-version=v0.8.
7.21 - Virtual (shared) IP
One of the biggest pain points when building a high-availability controlplane
is giving clients a single IP or URL at which they can reach any of the controlplane nodes.
The most common approaches all require external resources: reverse proxy, load
balancer, BGP, and DNS.
Using a “Virtual” IP address, on the other hand, provides high availability
without external coordination or resources, so long as the controlplane members
share a layer 2 network.
In practical terms, this means that they are all connected via a switch, with no
router in between them.
The term “virtual” is misleading here.
The IP address is real, and it is assigned to an interface.
What actually happens is that the controlplane machines vie for
control of the shared IP address.
There can be only one owner of the IP address at any given time, but if that
owner disappears or becomes non-responsive, another owner will be chosen,
and it will take up the mantle: the IP address.
Talos has (as of version 0.9) built-in support for this form of shared IP address,
and it can utilize this for both the Kubernetes API server and the Talos endpoint set.
Talos uses etcd for elections and leadership (control) of the IP address.
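The ownership handover can be pictured with a toy model. The Python sketch below is a simplified, in-memory illustration of election-style ownership in which the longest-waiting live candidate holds the address. It is not Talos code and does not talk to etcd; the class and method names are invented for this example.

```python
class SharedIPElection:
    """Toy model: the candidate that has campaigned longest owns the shared IP."""

    def __init__(self):
        self._revision = 0
        self._candidates = {}  # node -> revision at which it started campaigning

    def campaign(self, node):
        """Register node as a candidate; already registered nodes keep their place."""
        if node not in self._candidates:
            self._revision += 1
            self._candidates[node] = self._revision

    def withdraw(self, node):
        """Remove a node that resigned, disappeared or became non-responsive."""
        self._candidates.pop(node, None)

    def owner(self):
        """Return the single current owner, or None if nobody is campaigning."""
        if not self._candidates:
            return None
        return min(self._candidates, key=self._candidates.get)
```

With three candidates, withdrawing the current owner hands the address to the next one in line, and a node that returns later queues behind the nodes that stayed up.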
Choose your Shared IP
To begin with, you should choose your shared IP address.
It should generally be a reserved, unused IP address in the same subnet as
your controlplane nodes.
It should not be assigned or assignable by your DHCP server.
For our example, we will assume that the controlplane nodes have the following
IP addresses:
192.168.0.10
192.168.0.11
192.168.0.12
We then choose our shared IP to be:
192.168.0.15
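A candidate address can be sanity-checked against these rules before it goes into the machine configuration. The Python sketch below uses the standard ipaddress module; the function name is hypothetical, and the /24 prefix in the usage note is an assumption, since the example above does not state the subnet size. Whether the DHCP server could hand out the address cannot be checked this way.

```python
import ipaddress


def shared_ip_problems(shared_ip, node_ips, subnet):
    """Return a list of problems with the chosen shared IP (empty if none found)."""
    network = ipaddress.ip_network(subnet)
    vip = ipaddress.ip_address(shared_ip)
    nodes = [ipaddress.ip_address(ip) for ip in node_ips]
    problems = []
    if vip not in network:
        problems.append(f"{vip} is outside {network}")
    if vip in nodes:
        problems.append(f"{vip} is already assigned to a node")
    for node in nodes:
        if node not in network:
            problems.append(f"node {node} is outside {network}")
    return problems
```

For the example addresses, `shared_ip_problems("192.168.0.15", ["192.168.0.10", "192.168.0.11", "192.168.0.12"], "192.168.0.0/24")` reports no problems.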
Configure your Talos Machines
The shared IP setting is only valid for controlplane nodes.
For the example above, each of the controlplane nodes should have the following
Machine Config snippet:
Obviously, for your own environment, the interface and the DHCP setting may
differ.
You are free to use static addressing (cidr) instead of DHCP.
Caveats
In general, the shared IP should just work.
However, since it relies on etcd for elections, the shared IP will not come
alive until after you have bootstrapped Kubernetes.
In general, this is not a problem, but it does mean that you cannot use the
shared IP when issuing the talosctl bootstrap command.
Instead, that command will need to target one of the controlplane nodes
discretely.
Modified indicates the UNIX timestamp at which the file was last modified
TODO: unix timestamp or include proto’s Date type |
| is_dir | bool | | IsDir indicates that the file is a directory |
| error | string | | Error describes any error encountered while trying to read the file information. |
| link | string | | Link is filled with symlink target |
| relative_name | string | | RelativeName is the name of the file or directory relative to the RootPath |
GenerateConfiguration
GenerateConfiguration describes the response to a generate configuration request.
System_partitions_to_wipe lists specific system disk partitions to be reset (wiped). If system_partitions_to_wipe is empty, all the partitions are erased.
--cert-fingerprint strings list of server certificate fingerprints to accept (defaults to no check)
-f, --file string the filename of the updated configuration
-h, --help help for apply-config
--immediate apply the config immediately (without a reboot)
-i, --insecure apply the config using the insecure (encrypted with no auth) maintenance service
--interactive apply the config using text based interactive mode
--on-reboot apply the config on reboot
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl bootstrap
Bootstrap the cluster
talosctl bootstrap [flags]
Options
-h, --help help for bootstrap
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl cluster create
Creates a local docker-based or QEMU-based kubernetes cluster
talosctl cluster create [flags]
Options
--arch string cluster architecture (default "amd64")
--cidr string CIDR of the cluster network (IPv4, ULA network for IPv6 is derived in automated way) (default "10.5.0.0/24")
--cni-bin-path strings search path for CNI binaries (VM only) (default [/home/user/.talos/cni/bin])
--cni-bundle-url string URL to download CNI bundle from (VM only) (default "https://github.com/siderolabs/talos/releases/download/v0.10.0-alpha.0/talosctl-cni-bundle-${ARCH}.tar.gz")
--cni-cache-dir string CNI cache directory path (VM only) (default "/home/user/.talos/cni/cache")
--cni-conf-dir string CNI config directory path (VM only) (default "/home/user/.talos/cni/conf.d")
--config-patch string patch generated machineconfigs
--cpus string the share of CPUs as fraction (each container/VM) (default "2.0")
--crashdump print debug crashdump to stderr when cluster startup fails
--custom-cni-url string install custom CNI from the URL (Talos cluster)
--disk int default limit on disk size in MB (each VM) (default 6144)
--disk-image-path string disk image to use
--dns-domain string the dns domain to use for cluster (default "cluster.local")
--docker-host-ip string Host IP to forward exposed ports to (Docker provisioner only) (default "0.0.0.0")
--encrypt-ephemeral enable ephemeral partition encryption
--encrypt-state enable state partition encryption
--endpoint string use endpoint instead of provider defaults
-p, --exposed-ports string Comma-separated list of ports/protocols to expose on init node. Ex -p <hostPort>:<containerPort>/<protocol (tcp or udp)> (Docker provisioner only)
-h, --help help for create
--image string the image to use (default "ghcr.io/talos-systems/talos:latest")
--init-node-as-endpoint use init node as endpoint instead of any load balancer endpoint
--initrd-path string the uncompressed kernel image to use (default "_out/initramfs-${ARCH}.xz")
-i, --input-dir string location of pre-generated config files
--install-image string the installer image to use (default "ghcr.io/talos-systems/installer:latest")
--ipv4 enable IPv4 network in the cluster (default true)
--ipv6 enable IPv6 network in the cluster (QEMU provisioner only)
--iso-path string the ISO path to use for the initial boot (VM only)
--kubernetes-version string desired kubernetes version to run (default "1.20.5")
--masters int the number of masters to create (default 1)
--memory int the limit on memory usage in MB (each container/VM) (default 2048)
--mtu int MTU of the cluster network (default 1500)
--nameservers strings list of nameservers to use (default [8.8.8.8,1.1.1.1,2001:4860:4860::8888,2606:4700:4700::1111])
--registry-insecure-skip-verify strings list of registry hostnames to skip TLS verification for
--registry-mirror strings list of registry mirrors to use in format: <registry host>=<mirror URL>
--skip-injecting-config skip injecting config from embedded metadata server, write config files to current directory
--skip-kubeconfig skip merging kubeconfig from the created cluster
--talos-version string the desired Talos version to generate config for (if not set, defaults to image version)
--use-vip use a virtual IP for the controlplane endpoint instead of the loadbalancer
--user-disk strings list of disks to create for each VM in format: <mount_point1>:<size1>:<mount_point2>:<size2>
--vmlinuz-path string the compressed kernel image to use (default "_out/vmlinuz-${ARCH}")
--wait wait for the cluster to be ready before returning (default true)
--wait-timeout duration timeout to wait for the cluster to be ready (default 20m0s)
--wireguard-cidr string CIDR of the wireguard network
--with-apply-config enable apply config when the VM is starting in maintenance mode
--with-bootloader enable bootloader to load kernel and initramfs from disk image after install (default true)
--with-debug enable debug in Talos config to send service logs to the console
--with-init-node create the cluster with an init node
--with-uefi enable UEFI on x86_64 architecture (always enabled for arm64)
--workers int the number of workers to create (default 1)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or firecracker-based clusters
talosctl cluster destroy
Destroys a local docker-based or firecracker-based kubernetes cluster
talosctl cluster destroy [flags]
Options
-h, --help help for destroy
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or firecracker-based clusters
talosctl cluster show
Shows info about a local provisioned kubernetes cluster
talosctl cluster show [flags]
Options
-h, --help help for show
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
--name string the name of the cluster (default "talos-default")
-n, --nodes strings target the specified nodes
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl cluster - A collection of commands for managing local docker-based or firecracker-based clusters
talosctl cluster
A collection of commands for managing local docker-based or firecracker-based clusters
Options
-h, --help help for cluster
--name string the name of the cluster (default "talos-default")
--provisioner string Talos cluster provisioner to use (default "docker")
--state string directory path to store cluster state (default "/home/user/.talos/clusters")
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl completion
Output shell completion code for the specified shell (bash or zsh)
Synopsis
Output shell completion code for the specified shell (bash or zsh).
The shell code must be evaluated to provide interactive
completion of talosctl commands. This can be done by sourcing it from
the .bash_profile.
Note for zsh users: [1] zsh completions are only supported in versions of zsh >= 5.2
talosctl completion SHELL [flags]
Examples
# Installing bash completion on macOS using homebrew
## If running Bash 3.2 included with macOS
brew install bash-completion
## or, if running Bash 4.1+
brew install bash-completion@2
## If talosctl is installed via homebrew, this should start working immediately.
## If you've installed via other means, you may need to add the completion to your completion directory
talosctl completion bash > $(brew --prefix)/etc/bash_completion.d/talosctl
# Installing bash completion on Linux
## If bash-completion is not installed on Linux, please install the 'bash-completion' package
## via your distribution's package manager.
## Load the talosctl completion code for bash into the current shell
source <(talosctl completion bash)
## Write bash completion code to a file and source it from .bash_profile
talosctl completion bash > ~/.talos/completion.bash.inc
printf "
# talosctl shell completion
source '$HOME/.talos/completion.bash.inc'
" >> $HOME/.bash_profile
source $HOME/.bash_profile
# Load the talosctl completion code for zsh[1] into the current shell
source <(talosctl completion zsh)
# Set the talosctl completion code for zsh[1] to autoload on startup
talosctl completion zsh > "${fpath[1]}/_talosctl"
Options
-h, --help help for completion
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl config add
Add a new context
talosctl config add <context> [flags]
Options
--ca string the path to the CA certificate
--crt string the path to the certificate
-h, --help help for add
--key string the path to the key
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
talosctl config merge
Merge additional contexts from another Talos config into the default config
Synopsis
Contexts with the same name are renamed while merging configs.
talosctl config merge <from> [flags]
Options
-h, --help help for merge
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl containers
Options
-h, --help help for containers
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl convert-k8s
Convert Kubernetes control plane from self-hosted (bootkube) to Talos-managed (static pods).
Synopsis
This command converts a control plane bootstrapped on Talos <= 0.8 to a Talos-managed control plane (Talos >= 0.9).
As part of the conversion process, the tool reads the existing configuration of the control plane and updates
the Talos node configuration to reflect changes made since bootstrap time. Once the config is updated,
the tool releases static pods and deletes self-hosted DaemonSets.
talosctl convert-k8s [flags]
Options
--endpoint string the cluster control plane endpoint
--force skip prompts, assume yes
-h, --help help for convert-k8s
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl copy
Copy data out from the node
Synopsis
Creates a .tar.gz archive at the node starting at <src-path> and
streams it back to the client.
If ‘-’ is given for <local-path>, the archive is written to stdout.
Otherwise the archive is extracted to <local-path>, which should be an empty directory;
talosctl creates the directory if <local-path> doesn’t exist. The command doesn’t preserve
ownership and access mode for the files in extract mode, while the streamed .tar archive
captures ownership and permission bits.
talosctl copy <src-path> -|<local-path> [flags]
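The two modes described in the synopsis can be sketched as small shell helpers. The helper names, the node address, and the paths are illustrative assumptions; only the copy arguments and the -n flag come from this reference.

```shell
# Hypothetical helper: stream a path from a node into a local .tar.gz file.
# $1 = node IP, $2 = path on the node, $3 = local archive file.
copy_as_archive() {
  # '-' writes the archive stream to stdout instead of extracting it.
  talosctl -n "$1" copy "$2" - > "$3"
}

# Hypothetical helper: extract a path from a node into a local directory.
# $1 = node IP, $2 = path on the node, $3 = local directory.
copy_and_extract() {
  talosctl -n "$1" copy "$2" "$3"
}

# Example invocations (all values are placeholders):
# copy_as_archive 10.5.0.2 /var/log ./var-log.tar.gz
# copy_and_extract 10.5.0.2 /var/log ./var-log
```

The archive form is the one to prefer when ownership and permission bits matter, since extract mode does not preserve them.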
Options
-h, --help help for copy
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl crashdump
Dump debug information about the cluster
talosctl crashdump [flags]
Options
--control-plane-nodes strings specify IPs of control plane nodes
-h, --help help for crashdump
--init-node string specify IPs of init node
--worker-nodes strings specify IPs of worker nodes
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl dashboard
Cluster dashboard with real-time metrics
Synopsis
Provide quick UI to navigate through node real-time metrics.
Keyboard shortcuts:
h, <Left>: switch one node to the left
l, <Right>: switch one node to the right
j, <Down>: scroll process list down
k, <Up>: scroll process list up
<C-d>: scroll process list half page down
<C-u>: scroll process list half page up
<C-f>: scroll process list one page down
<C-b>: scroll process list one page up
talosctl dashboard [flags]
Options
-h, --help help for dashboard
-d, --update-interval duration interval between updates (default 3s)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl dmesg
Retrieve kernel logs
talosctl dmesg [flags]
Options
-f, --follow specify if the kernel log should be streamed
-h, --help help for dmesg
--tail specify if only new messages should be sent (makes sense only when combined with --follow)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl edit
Edit a resource from the default editor.
Synopsis
The edit command allows you to directly edit any API resource
you can retrieve via the command line tools.
It will open the editor defined by your TALOS_EDITOR
or EDITOR environment variables, or fall back to ‘vi’ for Linux
or ‘notepad’ for Windows.
talosctl edit <type> [<id>] [flags]
Options
-h, --help help for edit
--immediate apply the change immediately (without a reboot)
--namespace string resource namespace (default is to use default namespace per resource)
--on-reboot apply the change on next reboot
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl etcd forfeit-leadership
Tell node to forfeit etcd cluster leadership
talosctl etcd forfeit-leadership [flags]
Options
-h, --help help for forfeit-leadership
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
talosctl etcd remove-member
Synopsis
Use this command only to remove a member that is in a broken state,
that is, when there is no access to the node or the node can’t access etcd to call etcd leave.
Always prefer etcd leave over this command.
talosctl etcd remove-member <hostname> [flags]
Options
-h, --help help for remove-member
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl events
Options
--duration duration show events for the past duration interval (one second resolution, default is to show no history)
-h, --help help for events
--since string show events after the specified event ID (default is to show no history)
--tail int32 show specified number of past events (use -1 to show full history, default is to show no history)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl gen ca
Generates a self-signed X.509 certificate authority
talosctl gen ca [flags]
Options
-h, --help help for ca
--hours int the hours from now on which the certificate validity period ends (default 87600)
--organization string X.509 distinguished name for the Organization
--rsa generate in RSA format
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen config
Generates a set of configuration files for Talos cluster
Synopsis
The cluster endpoint is the URL for the Kubernetes API. If you decide to use
a control plane node, common in a single node control plane setup, use port 6443 as
this is the port that the API server binds to on every control plane node. For an HA
setup, usually involving a load balancer, use the IP and port of the load balancer.
talosctl gen config <cluster name> <cluster endpoint> [flags]
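A minimal sketch of the single-node case described in the synopsis, where the endpoint points at the control plane node itself on port 6443. The helper name, cluster name, node address, and output directory are illustrative assumptions; the positional arguments and --output-dir are taken from this reference.

```shell
# Hypothetical helper: generate configs for a single control plane node.
# $1 = cluster name, $2 = control plane node IP, $3 = output directory.
gen_single_node_config() {
  # The cluster endpoint is the Kubernetes API URL; 6443 is the port
  # the API server binds to on every control plane node.
  talosctl gen config "$1" "https://$2:6443" --output-dir "$3"
}

# Example invocation (all values are placeholders):
# gen_single_node_config my-cluster 192.168.0.10 ./talos-config
```

For an HA setup, the same call would take the load balancer's IP and port in place of the node address.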
Options
--additional-sans strings additional Subject-Alt-Names for the APIServer certificate
--dns-domain string the dns domain to use for cluster (default "cluster.local")
-h, --help help for config
--install-disk string the disk to install to (default "/dev/sda")
--install-image string the image used to perform an installation (default "ghcr.io/talos-systems/installer:latest")
--kubernetes-version string desired kubernetes version to run
-o, --output-dir string destination to output generated files
-p, --persist the desired persist value for configs (default true)
--registry-mirror strings list of registry mirrors to use in format: <registry host>=<mirror URL>
--talos-version string the desired Talos version to generate config for (backwards compatibility, e.g. v0.8)
--version string the desired machine config version to generate (default "v1alpha1")
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen crt
Generates an X.509 Ed25519 certificate
talosctl gen crt [flags]
Options
--ca string path to the PEM encoded CERTIFICATE
--csr string path to the PEM encoded CERTIFICATE REQUEST
-h, --help help for crt
--hours int the hours from now on which the certificate validity period ends (default 24)
--name string the basename of the generated file
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen csr
Generates a CSR using an Ed25519 private key
talosctl gen csr [flags]
Options
-h, --help help for csr
--ip string generate the certificate for this IP address
--key string path to the PEM encoded EC or RSA PRIVATE KEY
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen key
Generates an Ed25519 private key
talosctl gen key [flags]
Options
-h, --help help for key
--name string the basename of the generated file
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen keypair
Generates an X.509 Ed25519 key pair
talosctl gen keypair [flags]
Options
-h, --help help for keypair
--ip string generate the certificate for this IP address
--organization string X.509 distinguished name for the Organization
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl gen - Generate CAs, certificates, and private keys
talosctl gen
Generate CAs, certificates, and private keys
Options
-h, --help help for gen
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl gen ca - Generates a self-signed X.509 certificate authority
talosctl gen config - Generates a set of configuration files for Talos cluster
talosctl get
Options
-h, --help help for get
--namespace string resource namespace (default is to use default namespace per resource)
-o, --output string output mode (table, yaml) (default "table")
-w, --watch watch resource changes
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl health
Check cluster health
talosctl health [flags]
Options
--control-plane-nodes strings specify IPs of control plane nodes
-h, --help help for health
--init-node string specify IPs of init node
--k8s-endpoint string use endpoint instead of kubeconfig default
--run-e2e run Kubernetes e2e test
--server run server-side check (default true)
--wait-timeout duration timeout to wait for the cluster to be ready (default 20m0s)
--worker-nodes strings specify IPs of worker nodes
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl images
List the default images used by Talos
talosctl images [flags]
Options
-h, --help help for images
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl inspect dependencies
Inspect controller-resource dependencies as graphviz graph.
Synopsis
Inspect controller-resource dependencies as graphviz graph.
Pipe the output of the command through the “dot” program (part of graphviz package)
to render the graph:
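One way to do this, as a sketch: the helper below pipes the command output into graphviz and writes a PNG. The helper name, node address, and output file are illustrative assumptions; -Tpng is the standard graphviz option for PNG output.

```shell
# Hypothetical helper: render the controller-resource graph to a PNG file.
# $1 = node IP, $2 = output image path. Requires the graphviz 'dot' program.
render_dependencies() {
  talosctl -n "$1" inspect dependencies | dot -Tpng > "$2"
}

# Example invocation (values are placeholders):
# render_dependencies 10.5.0.2 ./dependencies.png
```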
Options
-h, --help help for dependencies
--with-resources display live resource information with dependencies
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl kubeconfig
Download the admin kubeconfig from the node
Synopsis
Download the admin kubeconfig from the node.
If merge flag is defined, config will be merged with ~/.kube/config or [local-path] if specified.
Otherwise kubeconfig will be written to PWD or [local-path] if specified.
talosctl kubeconfig [local-path] [flags]
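The merge and non-merge behaviors can be sketched as follows. The helper names, node address, and file path are illustrative assumptions; --merge=false uses the usual boolean flag syntax to turn off the default.

```shell
# Hypothetical helper: merge the admin kubeconfig into ~/.kube/config.
# Merging is the default, so no extra flag is needed. $1 = node IP.
kubeconfig_merge() {
  talosctl -n "$1" kubeconfig
}

# Hypothetical helper: write a standalone kubeconfig without merging.
# $1 = node IP, $2 = local path to write to.
kubeconfig_standalone() {
  talosctl -n "$1" kubeconfig "$2" --merge=false
}

# Example invocations (values are placeholders):
# kubeconfig_merge 10.5.0.2
# kubeconfig_standalone 10.5.0.2 ./kubeconfig
```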
Options
-f, --force Force overwrite of kubeconfig if already present, force overwrite on kubeconfig merge
--force-context-name string Force context name for kubeconfig merge
-h, --help help for kubeconfig
-m, --merge Merge with existing kubeconfig (default true)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl list
Retrieve a directory listing
talosctl list [path] [flags]
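As a sketch of how the options below combine, the helper here lists only regular files under a path, recursing to a limited depth and printing long, humanized output. The helper name and the example values are illustrative assumptions.

```shell
# Hypothetical helper: recursive long listing of regular files.
# $1 = node IP, $2 = path on the node, $3 = maximum recursion depth.
list_regular_files() {
  talosctl -n "$1" list "$2" --long --humanize --recurse --depth "$3" --type f
}

# Example invocation (values are placeholders):
# list_regular_files 10.5.0.2 /var/log 2
```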
Options
-d, --depth int32 maximum recursion depth
-h, --help help for list
-H, --humanize humanize size and time in the output
-l, --long display additional file details
-r, --recurse recurse into subdirectories
-t, --type strings filter by specified types:
f regular file
d directory
l, L symbolic link
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl logs
Retrieve logs for a service
talosctl logs <service name> [flags]
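A short sketch of a common pattern: show the last few lines of a service log and keep streaming. The helper name and the example values (including the service name kubelet) are illustrative assumptions.

```shell
# Hypothetical helper: follow a service log, starting from the last N lines.
# $1 = node IP, $2 = service name, $3 = number of lines to start from.
follow_service_logs() {
  talosctl -n "$1" logs "$2" --follow --tail "$3"
}

# Example invocation (values are placeholders):
# follow_service_logs 10.5.0.2 kubelet 50
```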
Options
-f, --follow specify if the logs should be streamed
-h, --help help for logs
-k, --kubernetes use the k8s.io containerd namespace
--tail int32 lines of log file to display (default is to show from the beginning) (default -1)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl memory
Show memory usage
talosctl memory [flags]
Options
-h, --help help for memory
-v, --verbose display extended memory statistics
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl mounts
List mounts
talosctl mounts [flags]
Options
-h, --help help for mounts
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl patch
Update field(s) of a resource using a JSON patch.
talosctl patch <type> [<id>] [flags]
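A JSON patch is a list of operations, each with an op, a path, and usually a value. The sketch below assumes a resource type named machineconfig and a hostname field at /machine/network/hostname that already exists; both are assumptions for illustration, as are the helper name and the example values.

```shell
# Hypothetical helper: replace the hostname in the machine configuration.
# $1 = node IP, $2 = new hostname.
# Assumes the 'machineconfig' resource type and the JSON path shown here;
# a "replace" operation fails if the target field does not exist.
patch_hostname() {
  talosctl -n "$1" patch machineconfig \
    --patch "[{\"op\": \"replace\", \"path\": \"/machine/network/hostname\", \"value\": \"$2\"}]"
}

# Example invocation (values are placeholders):
# patch_hostname 10.5.0.2 node-1
```

For longer patches, writing the JSON to a file and passing it with --patch-file avoids the shell quoting.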
Options
-h, --help help for patch
--immediate apply the change immediately (without a reboot)
--namespace string resource namespace (default is to use default namespace per resource)
--on-reboot apply the change on next reboot
-p, --patch string the patch to be applied to the resource file.
--patch-file string a file containing a patch to be applied to the resource.
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl processes
List running processes
talosctl processes [flags]
Options
-h, --help help for processes
-s, --sort string Column to sort output by. [rss|cpu] (default "rss")
-w, --watch Stream running processes
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl read
Read a file on the machine
talosctl read <path> [flags]
Options
-h, --help help for read
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl reboot
Reboot a node
talosctl reboot [flags]
Options
-h, --help help for reboot
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl recover
Recover a control plane
talosctl recover [flags]
Options
-h, --help help for recover
-s, --source string The data source for restoring the control plane manifests from (valid options are "apiserver" and "etcd") (default "apiserver")
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl reset
Reset a node
talosctl reset [flags]
Options
--graceful if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
-h, --help help for reset
--reboot if true, reboot the node after resetting instead of shutting down
--system-labels-to-wipe strings if set, just wipe selected system disk partitions by label but keep other partitions intact
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl restart
Restart a process
talosctl restart <id> [flags]
Options
-h, --help help for restart
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl rollback
Rollback a node to the previous installation
talosctl rollback [flags]
Options
-h, --help help for rollback
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl routes
List network routes
talosctl routes [flags]
Options
-h, --help help for routes
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl service
Retrieve the state of a service (or all services), control service state
Synopsis
Service control command. If run without arguments, lists all the services and their state.
If a service ID is specified, the default action ‘status’ is executed, which shows the status of that single service.
With actions ‘start’, ‘stop’, ‘restart’, service state is updated respectively.
talosctl service [<id> [start|stop|restart|status]] [flags]
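The three forms of the command can be sketched as helpers. The helper names, node address, and the service name in the example are illustrative assumptions.

```shell
# Hypothetical helper: list all services and their state. $1 = node IP.
list_services() {
  talosctl -n "$1" service
}

# Hypothetical helper: show the status of one service.
# 'status' is the default action, so it is spelled out only for clarity.
# $1 = node IP, $2 = service ID.
service_status() {
  talosctl -n "$1" service "$2" status
}

# Hypothetical helper: restart one service. $1 = node IP, $2 = service ID.
service_restart() {
  talosctl -n "$1" service "$2" restart
}

# Example invocations (values are placeholders):
# list_services 10.5.0.2
# service_status 10.5.0.2 kubelet
```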
Options
-h, --help help for service
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl shutdown
Shutdown a node
talosctl shutdown [flags]
Options
-h, --help help for shutdown
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl stats
Get container stats
talosctl stats [flags]
Options
-h, --help help for stats
-k, --kubernetes use the k8s.io containerd namespace
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl time
Gets current server time
talosctl time [--check server] [flags]
Options
-c, --check string checks server time against specified ntp server
-h, --help help for time
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl upgrade
Upgrade Talos on the target node
talosctl upgrade [flags]
Options
-f, --force force the upgrade (skip checks on etcd health and members, might lead to data loss)
-h, --help help for upgrade
-i, --image string the container image to use for performing the install
-p, --preserve preserve data
-s, --stage stage the upgrade to perform it after a reboot
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl upgrade-k8s
Upgrade Kubernetes control plane in the Talos cluster.
Synopsis
This command upgrades the Kubernetes control plane components between the specified versions. Pod-checkpointer is handled in a special way to speed up kube-apiserver upgrades.
talosctl upgrade-k8s [flags]
Options
--endpoint string the cluster control plane endpoint
--from string the Kubernetes control plane version to upgrade from
-h, --help help for upgrade-k8s
--to string the Kubernetes control plane version to upgrade to (default "1.20.5")
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl usage
Options
-a, --all write counts for all files, not just directories
-d, --depth int32 maximum recursion depth
-h, --help help for usage
-H, --humanize humanize size and time in the output
-t, --threshold int threshold exclude entries smaller than SIZE if positive, or entries greater than SIZE if negative
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl validate
Validate config
talosctl validate [flags]
Options
-c, --config string the path of the config file
-h, --help help for validate
-m, --mode string the mode to validate the config for (valid values are metal, cloud, and container)
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl version
Prints the version
talosctl version [flags]
Options
--client Print client version only
-h, --help help for version
--short Print the short version
Options inherited from parent commands
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
SEE ALSO
talosctl - A CLI for out-of-band management of Kubernetes nodes created by Talos
talosctl
A CLI for out-of-band management of Kubernetes nodes created by Talos
Options
--context string Context to be used in command
-e, --endpoints strings override default endpoints in Talos configuration
-h, --help help for talosctl
-n, --nodes strings target the specified nodes
--talosconfig string The path to the Talos configuration file (default "/home/user/.talos/config")
type: controlplane
# InstallConfig represents the installation options for preparing a node.
install:
  disk: /dev/sda # The disk used for installations.
  # Allows for supplying extra kernel args via the bootloader.
  extraKernelArgs:
    - console=ttyS1
    - panic=10
  image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
  bootloader: true # Indicates if a bootloader should be installed.
  wipe: false # Indicates if the installation disk should be wiped at installation time.
type string
Defines the role of the machine within the cluster.
Init
Init node type designates the first control plane node to come up.
You can think of it like a bootstrap node.
This node will perform the initial steps to bootstrap the cluster – generation of TLS assets, starting of the control plane, etc.
Control Plane
Control Plane node type designates the node as a control plane member.
This means it will host etcd along with the Kubernetes master components such as API Server, Controller Manager, Scheduler.
Worker
Worker node type designates the node as a worker node.
This means it will be an available compute node for scheduling workloads.
Valid values:
init
controlplane
join
token string
The token is used by a machine to join the PKI of the cluster.
Using this token, a machine will create a certificate signing request (CSR), and request a certificate that will be used as its identity.
Warning: It is important to ensure that this token is correct since a machine’s certificate has a short TTL by default.
Examples:
token: 328hom.uqjzh6jnn2eie9oi
ca PEMEncodedCertificateAndKey
The root certificate authority of the PKI.
It is composed of a base64 encoded crt and key.
Extra certificate subject alternative names for the machine’s certificate.
By default, all non-loopback interface IPs are automatically added to the certificate’s SANs.
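As a minimal sketch of the extra SANs described above, the fragment below adds two addresses to the machine's certificate. It assumes the field is named certSANs and sits under the machine section, mirroring the certSANs key shown for the API server elsewhere in this reference; the addresses are placeholders.

```yaml
machine:
  # Assumed field name (certSANs); the values below are placeholder addresses.
  certSANs:
    - 10.0.0.10
    - 172.16.0.10
```

Interface addresses that are already non-loopback do not need to be listed, since they are added automatically.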
Used to provide additional options to the kubelet.
Examples:
kubelet:
  image: ghcr.io/talos-systems/kubelet:v1.20.5 # The `image` field is an optional reference to an alternative kubelet image.
  # The `extraArgs` field is used to provide additional flags to the kubelet.
  extraArgs:
    feature-gates: ServerSideApply=true
  # # The `extraMounts` field is used to add additional mounts to the kubelet container.
  # extraMounts:
  #   - destination: /var/lib/example
  #     type: bind
  #     source: /var/lib/example
  #     options:
  #       - rshared
  #       - rw
Provides machine specific network configuration options.
Examples:
network:
  hostname: worker-1 # Used to statically set the hostname for the machine.
  # `interfaces` is used to define the network interface configuration.
  interfaces:
    - interface: eth0 # The interface name.
      cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
      # A list of routes associated with the interface.
      routes:
        - network: 0.0.0.0/0 # The route's network.
          gateway: 192.168.2.1 # The route's gateway.
          metric: 1024 # The optional metric for the route.
      mtu: 1500 # The interface's MTU.
      # # Bond specific options.
      # bond:
      #   # The interfaces that make up the bond.
      #   interfaces:
      #     - eth0
      #     - eth1
      #   mode: 802.3ad # A bond option.
      #   lacpRate: fast # A bond option.
      # # Indicates if DHCP should be used to configure the interface.
      # dhcp: true
      # # DHCP specific options.
      # dhcpOptions:
      #   routeMetric: 1024 # The priority of all routes received via DHCP.
      # # Wireguard specific configuration.
      # # wireguard server example
      # wireguard:
      #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #   listenPort: 51111 # Specifies a device's listening port.
      #   # Specifies a list of peer configurations to apply to a device.
      #   peers:
      #     - publicKey: ABCDEF... # Specifies the public key of this peer.
      #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #       allowedIPs:
      #         - 192.168.1.0/24
      # # wireguard peer example
      # wireguard:
      #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #   # Specifies a list of peer configurations to apply to a device.
      #   peers:
      #     - publicKey: ABCDEF... # Specifies the public key of this peer.
      #       endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
      #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #       allowedIPs:
      #         - 192.168.1.0/24
      # # Virtual (shared) IP address configuration.
      # vip:
      #   ip: 172.16.199.55 # Specifies the IP address to be used.
  # Used to statically set the nameservers for the machine.
  nameservers:
    - 9.8.7.6
    - 8.7.6.5
  # # Allows for extra entries to be added to the `/etc/hosts` file
  # extraHostEntries:
  #   - ip: 192.168.1.100 # The IP of the host.
  #     # The host alias.
  #     aliases:
  #       - example
  #       - example.domain.tld
Used to partition, format and mount additional disks.
Since the rootfs is read only with the exception of /var, mounts are only valid if they are under /var.
Note that partitioning and formatting are done only once, if and only if no existing partitions are found.
If size: is omitted, the partition is sized to occupy the full disk.
Note: size is in units of bytes.
Examples:
disks:
  - device: /dev/sdb # The name of the disk to use.
    # A list of partitions to create on the disk.
    partitions:
      - mountpoint: /var/mnt/extra # Where to mount the partition.
        # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.
        # # Human readable representation.
        # size: 100 MB
        # # Precise value in bytes.
        # size: 1073741824
install:
  disk: /dev/sda # The disk used for installations.
  # Allows for supplying extra kernel args via the bootloader.
  extraKernelArgs:
    - console=ttyS1
    - panic=10
  image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
  bootloader: true # Indicates if a bootloader should be installed.
  wipe: false # Indicates if the installation disk should be wiped at installation time.
Allows the addition of user specified files.
The value of op can be create, overwrite, or append.
In the case of create, path must not exist.
In the case of overwrite and append, path must be a valid file.
If an op value of append is used, the content is appended to the existing file.
Note that the file contents are not required to be base64 encoded.
Note: The specified path is relative to /var.
Examples:
files:
  - content: '...' # The contents of the file.
    permissions: 0o666 # The file's permissions in octal.
    path: /tmp/file.txt # The path of the file.
    op: append # The operation to use
env Env
The env field allows for the addition of environment variables.
All environment variables are set on PID 1 in addition to every service.
Valid values:
GRPC_GO_LOG_VERBOSITY_LEVEL
GRPC_GO_LOG_SEVERITY_LEVEL
http_proxy
https_proxy
no_proxy
Examples:
env:
  GRPC_GO_LOG_SEVERITY_LEVEL: info
  GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
  https_proxy: http://SERVER:PORT/
time:
  disabled: false # Indicates if the time service is disabled for the machine.
  # Specifies time (NTP) servers to use for setting the system time.
  servers:
    - time.cloudflare.com
Used to configure the machine’s container image registry mirrors.
Automatically generates matching CRI configuration for registry mirrors.
The mirrors section allows image requests to be redirected to a non-default registry,
which might be a local registry or a caching mirror.
The config section provides a way to authenticate to the registry with a TLS client
identity, or to provide a registry CA or authentication information.
Authentication information has the same meaning as the corresponding field in .docker/config.json.
registries:
  # Specifies mirror configuration for each registry.
  mirrors:
    docker.io:
      # List of endpoints (URLs) for registry mirrors to use.
      endpoints:
        - https://registry.local
  # Specifies TLS & auth configuration for HTTPS image registries.
  config:
    registry.local:
      # The TLS configuration for the registry.
      tls:
        # Enable mutual TLS authentication with the registry.
        clientIdentity:
          crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
          key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
      # The auth configuration for this registry.
      auth:
        username: username # Optional registry authentication.
        password: password # Optional registry authentication.
Machine system disk encryption configuration.
Defines each system partition encryption parameters.
Examples:
systemDiskEncryption:
  # Ephemeral partition encryption.
  ephemeral:
    provider: luks2 # Encryption provider to use for the encryption.
    # Defines the encryption keys generation and storage method.
    keys:
      - # Deterministically generated key from the node UUID and PartitionLabel.
        nodeID: {}
        slot: 0 # Key slot number for luks2 encryption.
ClusterConfig
ClusterConfig represents the cluster-wide config values.
# ControlPlaneConfig represents the control plane configuration options.
controlPlane:
  endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
  localAPIServerPort: 443 # The port that the API server listens on internally.
clusterName: talos.local
# ClusterNetworkConfig represents kube networking configuration options.
network:
  # The CNI used.
  cni:
    name: flannel # Name of CNI to use.
  dnsDomain: cluster.local # The domain used by Kubernetes DNS.
  # The pod subnet CIDR.
  podSubnets:
    - 10.244.0.0/16
  # The service subnet CIDR.
  serviceSubnets:
    - 10.96.0.0/12
Provides control plane specific configuration options.
Examples:
controlPlane:
  endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
  localAPIServerPort: 443 # The port that the API server listens on internally.
Provides cluster specific network configuration options.
Examples:
network:
  # The CNI used.
  cni:
    name: flannel # Name of CNI to use.
  dnsDomain: cluster.local # The domain used by Kubernetes DNS.
  # The pod subnet CIDR.
  podSubnets:
    - 10.244.0.0/16
  # The service subnet CIDR.
  serviceSubnets:
    - 10.96.0.0/12
apiServer:
  image: k8s.gcr.io/kube-apiserver:v1.20.5 # The container image used in the API server manifest.
  # Extra arguments to supply to the API server.
  extraArgs:
    feature-gates: ServerSideApply=true
    http2-max-streams-per-connection: "32"
  # Extra certificate subject alternative names for the API server's certificate.
  certSANs:
    - 1.2.3.4
    - 4.5.6.7
Controller manager server specific configuration options.
Examples:
controllerManager:
  image: k8s.gcr.io/kube-controller-manager:v1.20.5 # The container image used in the controller manager manifest.
  # Extra arguments to supply to the controller manager.
  extraArgs:
    feature-gates: ServerSideApply=true
proxy:
  image: k8s.gcr.io/kube-proxy:v1.20.5 # The container image used in the kube-proxy manifest.
  mode: ipvs # proxy mode of kube-proxy.
  # Extra arguments to supply to kube-proxy.
  extraArgs:
    proxy-mode: iptables
scheduler:
  image: k8s.gcr.io/kube-scheduler:v1.20.5 # The container image used in the scheduler manifest.
  # Extra arguments to supply to the scheduler.
  extraArgs:
    feature-gates: AllBeta=true
etcd:
  image: gcr.io/etcd-development/etcd:v3.4.15 # The container image used to create the etcd service.
  # The `ca` is the root certificate authority of the PKI.
  ca:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
  # Extra arguments to supply to etcd.
  extraArgs:
    election-timeout: "5000"
image: ghcr.io/talos-systems/kubelet:v1.20.5 # The `image` field is an optional reference to an alternative kubelet image.
# The `extraArgs` field is used to provide additional flags to the kubelet.
extraArgs:
  feature-gates: ServerSideApply=true
# # The `extraMounts` field is used to add additional mounts to the kubelet container.
# extraMounts:
#   - destination: /var/lib/example
#     type: bind
#     source: /var/lib/example
#     options:
#       - rshared
#       - rw
image string
The image field is an optional reference to an alternative kubelet image.
Examples:
image: ghcr.io/talos-systems/kubelet:v1.20.5
extraArgs map[string]string
The extraArgs field is used to provide additional flags to the kubelet.
Examples:
extraArgs:
key: value
extraMounts []Mount
The extraMounts field is used to add additional mounts to the kubelet container.
hostname: worker-1 # Used to statically set the hostname for the machine.
# `interfaces` is used to define the network interface configuration.
interfaces:
  - interface: eth0 # The interface name.
    cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
    # A list of routes associated with the interface.
    routes:
      - network: 0.0.0.0/0 # The route's network.
        gateway: 192.168.2.1 # The route's gateway.
        metric: 1024 # The optional metric for the route.
    mtu: 1500 # The interface's MTU.
    # # Bond specific options.
    # bond:
    #   # The interfaces that make up the bond.
    #   interfaces:
    #     - eth0
    #     - eth1
    #   mode: 802.3ad # A bond option.
    #   lacpRate: fast # A bond option.
    # # Indicates if DHCP should be used to configure the interface.
    # dhcp: true
    # # DHCP specific options.
    # dhcpOptions:
    #   routeMetric: 1024 # The priority of all routes received via DHCP.
    # # Wireguard specific configuration.
    # # wireguard server example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   listenPort: 51111 # Specifies a device's listening port.
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # wireguard peer example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
    #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # Virtual (shared) IP address configuration.
    # vip:
    #   ip: 172.16.199.55 # Specifies the IP address to be used.
# Used to statically set the nameservers for the machine.
nameservers:
  - 9.8.7.6
  - 8.7.6.5
# # Allows for extra entries to be added to the `/etc/hosts` file
# extraHostEntries:
#   - ip: 192.168.1.100 # The IP of the host.
#     # The host alias.
#     aliases:
#       - example
#       - example.domain.tld
hostname string
Used to statically set the hostname for the machine.
interfaces is used to define the network interface configuration.
By default all network interfaces will attempt a DHCP discovery.
This can be further tuned through this configuration parameter.
Examples:
interfaces:
  - interface: eth0 # The interface name.
    cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
    # A list of routes associated with the interface.
    routes:
      - network: 0.0.0.0/0 # The route's network.
        gateway: 192.168.2.1 # The route's gateway.
        metric: 1024 # The optional metric for the route.
    mtu: 1500 # The interface's MTU.
    # # Bond specific options.
    # bond:
    #   # The interfaces that make up the bond.
    #   interfaces:
    #     - eth0
    #     - eth1
    #   mode: 802.3ad # A bond option.
    #   lacpRate: fast # A bond option.
    # # Indicates if DHCP should be used to configure the interface.
    # dhcp: true
    # # DHCP specific options.
    # dhcpOptions:
    #   routeMetric: 1024 # The priority of all routes received via DHCP.
    # # Wireguard specific configuration.
    # # wireguard server example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   listenPort: 51111 # Specifies a device's listening port.
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # wireguard peer example
    # wireguard:
    #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    #   # Specifies a list of peer configurations to apply to a device.
    #   peers:
    #     - publicKey: ABCDEF... # Specifies the public key of this peer.
    #       endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
    #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
    #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
    #       allowedIPs:
    #         - 192.168.1.0/24
    # # Virtual (shared) IP address configuration.
    # vip:
    #   ip: 172.16.199.55 # Specifies the IP address to be used.
nameservers []string
Used to statically set the nameservers for the machine.
Defaults to 1.1.1.1 and 8.8.8.8.
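A short sketch of overriding those defaults, reusing the placeholder addresses from the network example in this reference. The nesting under machine.network is an assumption based on how the network section is laid out; substitute the resolvers used in your environment.

```yaml
machine:
  network:
    # Replaces the default resolvers (1.1.1.1 and 8.8.8.8).
    nameservers:
      - 9.8.7.6
      - 8.7.6.5
```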
disk: /dev/sda # The disk used for installations.
# Allows for supplying extra kernel args via the bootloader.
extraKernelArgs:
  - console=ttyS1
  - panic=10
image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
bootloader: true # Indicates if a bootloader should be installed.
wipe: false # Indicates if the installation disk should be wiped at installation time.
disk string
The disk used for installations.
Examples:
disk: /dev/sda
disk: /dev/nvme0
extraKernelArgs []string
Allows for supplying extra kernel args via the bootloader.
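For illustration, the fragment below passes two kernel arguments, reusing the values from the install example in this reference. The nesting under machine.install is an assumption; each list entry is one kernel command-line argument.

```yaml
machine:
  install:
    disk: /dev/sda
    extraKernelArgs:
      - console=ttyS1 # send console output to the second serial port
      - panic=10 # reboot 10 seconds after a kernel panic
```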
disabled: false # Indicates if the time service is disabled for the machine.
# Specifies time (NTP) servers to use for setting the system time.
servers:
  - time.cloudflare.com
disabled bool
Indicates if the time service is disabled for the machine.
Defaults to false.
servers []string
Specifies time (NTP) servers to use for setting the system time.
Defaults to pool.ntp.org.
This parameter only supports a single time server.
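A minimal sketch of pointing the machine at a specific NTP server, using the server name from the time example in this reference. The nesting under machine.time is an assumption; the list holds exactly one entry because of the single-server limitation noted above.

```yaml
machine:
  time:
    disabled: false
    servers:
      - time.cloudflare.com # one entry only
```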
RegistriesConfig
RegistriesConfig represents the image pull options.
# Specifies mirror configuration for each registry.
mirrors:
  docker.io:
    # List of endpoints (URLs) for registry mirrors to use.
    endpoints:
      - https://registry.local
# Specifies TLS & auth configuration for HTTPS image registries.
config:
  registry.local:
    # The TLS configuration for the registry.
    tls:
      # Enable mutual TLS authentication with the registry.
      clientIdentity:
        crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
        key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
    # The auth configuration for this registry.
    auth:
      username: username # Optional registry authentication.
      password: password # Optional registry authentication.
Specifies mirror configuration for each registry.
This setting allows the use of local pull-through caching registries,
air-gapped installations, etc.
The registry name is the first segment of the image identifier, with ‘docker.io’
being the default.
To catch any registry names not specified explicitly, use ‘*’.
Examples:
mirrors:
  ghcr.io:
    # List of endpoints (URLs) for registry mirrors to use.
    endpoints:
      - https://registry.insecure
      - https://ghcr.io/v2/
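The catch-all entry mentioned above could be sketched as follows. This is an illustrative fragment rather than a recommended setup: the nesting under machine.registries is an assumption, and https://registry.local is the placeholder mirror used elsewhere in this reference. The asterisk is quoted because a bare * introduces an alias in YAML.

```yaml
machine:
  registries:
    mirrors:
      # Explicit entry for images whose first segment is ghcr.io.
      ghcr.io:
        endpoints:
          - https://ghcr.io/v2/
      # Catch-all for every registry name not listed explicitly.
      "*":
        endpoints:
          - https://registry.local
```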
endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
localAPIServerPort: 443 # The port that the API server listens on internally.
Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
It is single-valued, and may optionally include a port number.
Examples:
endpoint: https://1.2.3.4:6443
endpoint: https://cluster1.internal:6443
localAPIServerPort int
The port that the API server listens on internally.
This may be different than the port portion listed in the endpoint field above.
The default is 6443.
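To show how the two ports can differ, the sketch below assumes a hypothetical front end (for example a load balancer) that accepts connections on 443 and forwards them to the API server's own port. The hostname comes from the endpoint examples above; the nesting under cluster and the forwarding setup are assumptions.

```yaml
cluster:
  controlPlane:
    # What clients connect to; the port here belongs to the hypothetical front end.
    endpoint: https://cluster1.internal:443
    # What the API server binds to on each control plane node (the default value).
    localAPIServerPort: 6443
```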
APIServerConfig
APIServerConfig represents the kube apiserver configuration options.
image: k8s.gcr.io/kube-apiserver:v1.20.5 # The container image used in the API server manifest.
# Extra arguments to supply to the API server.
extraArgs:
  feature-gates: ServerSideApply=true
  http2-max-streams-per-connection: "32"
# Extra certificate subject alternative names for the API server's certificate.
certSANs:
  - 1.2.3.4
  - 4.5.6.7
image string
The container image used in the API server manifest.
image: k8s.gcr.io/kube-controller-manager:v1.20.5 # The container image used in the controller manager manifest.
# Extra arguments to supply to the controller manager.
extraArgs:
  feature-gates: ServerSideApply=true
image string
The container image used in the controller manager manifest.
Examples:
image: k8s.gcr.io/kube-controller-manager:v1.20.5
extraArgs map[string]string
Extra arguments to supply to the controller manager.
image: k8s.gcr.io/kube-proxy:v1.20.5 # The container image used in the kube-proxy manifest.
mode: ipvs # proxy mode of kube-proxy.
# Extra arguments to supply to kube-proxy.
extraArgs:
  proxy-mode: iptables
disabled bool
Disable kube-proxy deployment on cluster bootstrap.
Examples:
disabled: false
image string
The container image used in the kube-proxy manifest.
Examples:
image: k8s.gcr.io/kube-proxy:v1.20.5
mode string
Proxy mode of kube-proxy.
The default is ‘iptables’.
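A brief sketch of switching away from the default mode, using the ipvs value and image tag that appear in the proxy example in this reference. The nesting under cluster.proxy is an assumption.

```yaml
cluster:
  proxy:
    image: k8s.gcr.io/kube-proxy:v1.20.5
    mode: ipvs # overrides the default of iptables
```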
extraArgs map[string]string
Extra arguments to supply to kube-proxy.
SchedulerConfig
SchedulerConfig represents the kube scheduler configuration options.
image: k8s.gcr.io/kube-scheduler:v1.20.5 # The container image used in the scheduler manifest.
# Extra arguments to supply to the scheduler.
extraArgs:
  feature-gates: AllBeta=true
image string
The container image used in the scheduler manifest.
image: gcr.io/etcd-development/etcd:v3.4.15 # The container image used to create the etcd service.
# The `ca` is the root certificate authority of the PKI.
ca:
  crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
  key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
# Extra arguments to supply to etcd.
extraArgs:
  election-timeout: "5000"
image string
The container image used to create the etcd service.
Examples:
image: gcr.io/etcd-development/etcd:v3.4.15
ca PEMEncodedCertificateAndKey
The ca is the root certificate authority of the PKI.
It is composed of a base64 encoded crt and key.
# The CNI used.
cni:
  name: flannel # Name of CNI to use.
dnsDomain: cluster.local # The domain used by Kubernetes DNS.
# The pod subnet CIDR.
podSubnets:
  - 10.244.0.0/16
# The service subnet CIDR.
serviceSubnets:
  - 10.96.0.0/12
The CNI used.
Composed of “name” and “url”.
The “name” key only supports options of “flannel” or “custom”.
URLs are only used if name is equal to “custom”.
URLs should point to the set of YAML files to be deployed.
An empty struct or any other name will default to Flannel CNI.
Examples:
cni:
  name: custom # Name of CNI to use.
  # URLs containing manifests to apply for the CNI.
  urls:
    - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml
dnsDomain string
The domain used by Kubernetes DNS.
The default is cluster.local.
Examples:
dnsDomain: cluster.local
podSubnets []string
The pod subnet CIDR.
Examples:
podSubnets:
- 10.244.0.0/16
serviceSubnets []string
The service subnet CIDR.
Examples:
serviceSubnets:
- 10.96.0.0/12
CNIConfig
CNIConfig represents the CNI configuration options.
name: custom # Name of CNI to use.
# URLs containing manifests to apply for the CNI.
urls:
  - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml
Admin kubeconfig certificate lifetime (default is 1 year).
Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).
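As a hedged sketch, the fragment below sets the lifetime explicitly to one year. The field path cluster.adminKubeconfig.certLifetime is an assumption, since the field name itself is not shown in this part of the reference. Go durations have no day or year unit (the largest unit is h), so a year is written in hours.

```yaml
cluster:
  # Assumed field path; the value is 365 days expressed in hours.
  adminKubeconfig:
    certLifetime: 8760h
```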
MachineDisk
MachineDisk represents the options available for partitioning, formatting, and
mounting extra disks.
- device: /dev/sdb # The name of the disk to use.
  # A list of partitions to create on the disk.
  partitions:
    - mountpoint: /var/mnt/extra # Where to mount the partition.
      # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.
      # # Human readable representation.
      # size: 100 MB
      # # Precise value in bytes.
      # size: 1073741824
- content: '...' # The contents of the file.
  permissions: 0o666 # The file's permissions in octal.
  path: /tmp/file.txt # The path of the file.
  op: append # The operation to use
- interface: eth0 # The interface name.
  cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
  # A list of routes associated with the interface.
  routes:
    - network: 0.0.0.0/0 # The route's network.
      gateway: 192.168.2.1 # The route's gateway.
      metric: 1024 # The optional metric for the route.
  mtu: 1500 # The interface's MTU.
  # # Bond specific options.
  # bond:
  #   # The interfaces that make up the bond.
  #   interfaces:
  #     - eth0
  #     - eth1
  #   mode: 802.3ad # A bond option.
  #   lacpRate: fast # A bond option.
  # # Indicates if DHCP should be used to configure the interface.
  # dhcp: true
  # # DHCP specific options.
  # dhcpOptions:
  #   routeMetric: 1024 # The priority of all routes received via DHCP.
  # # Wireguard specific configuration.
  # # wireguard server example
  # wireguard:
  #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #   listenPort: 51111 # Specifies a device's listening port.
  #   # Specifies a list of peer configurations to apply to a device.
  #   peers:
  #     - publicKey: ABCDEF... # Specifies the public key of this peer.
  #       endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
  #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #       allowedIPs:
  #         - 192.168.1.0/24
  # # wireguard peer example
  # wireguard:
  #   privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #   # Specifies a list of peer configurations to apply to a device.
  #   peers:
  #     - publicKey: ABCDEF... # Specifies the public key of this peer.
  #       endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
  #       persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
  #       # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #       allowedIPs:
  #         - 192.168.1.0/24
  # # Virtual (shared) IP address configuration.
  # vip:
  #   ip: 172.16.199.55 # Specifies the IP address to be used.
interface string
The interface name.
Examples:
interface: eth0
cidr string
Assigns a static IP address to the interface.
This should be in proper CIDR notation.
Note: This option is mutually exclusive with DHCP option.
The interface’s MTU.
If used in combination with DHCP, this will override any MTU settings returned from DHCP server.
dhcp (bool)
Indicates if DHCP should be used to configure the interface.
The following DHCP options are supported:
OptionClasslessStaticRoute
OptionDomainNameServer
OptionDNSDomainSearchList
OptionHostName
Note: This option is mutually exclusive with CIDR.
Note: To configure an interface with only IPv6 SLAAC addressing, CIDR should be set to "" and DHCP to false
in order for Talos to skip configuration of addresses.
All other options will still apply.
Examples:
dhcp: true
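The mutual exclusivity of cidr and dhcp can be checked before a configuration is applied. The following is a minimal sketch, not Talos' own validation: check_addressing is a hypothetical helper that takes one interface entry as a plain dictionary and reports problems with its addressing fields.

```python
import ipaddress


def check_addressing(device):
    """Return a list of problems found in one interface entry (hypothetical helper)."""
    problems = []
    cidr = device.get("cidr", "")
    dhcp = device.get("dhcp", False)

    # cidr and dhcp are documented as mutually exclusive.
    if cidr and dhcp:
        problems.append("cidr and dhcp are mutually exclusive")

    # A non-empty cidr should be an address followed by a prefix length.
    if cidr:
        if "/" not in cidr:
            problems.append(f"cidr {cidr!r} has no prefix length")
        else:
            try:
                ipaddress.ip_interface(cidr)
            except ValueError:
                problems.append(f"cidr {cidr!r} is not valid CIDR notation")

    return problems
```

An entry with cidr set to "" and dhcp set to false passes this check, which matches the IPv6 SLAAC-only case described in the note above.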
ignore (bool)
Indicates if the interface should be ignored (skips configuration).
dummy (bool)
Indicates if the interface is a dummy interface.
dummy is used to specify that this interface should be a virtual-only, dummy interface.
Wireguard specific configuration.
Includes things like private key, listen port, peers.
Examples:
wireguard:
    privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    listenPort: 51111 # Specifies a device's listening port.
    # Specifies a list of peer configurations to apply to a device.
    peers:
        - publicKey: ABCDEF... # Specifies the public key of this peer.
          endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
          # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          allowedIPs:
            - 192.168.1.0/24

wireguard:
    privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    # Specifies a list of peer configurations to apply to a device.
    peers:
        - publicKey: ABCDEF... # Specifies the public key of this peer.
          endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
          persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
          # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          allowedIPs:
            - 192.168.1.0/24
privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
listenPort: 51111 # Specifies a device's listening port.
# Specifies a list of peer configurations to apply to a device.
peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24

privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
# Specifies a list of peer configurations to apply to a device.
peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
      persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24
privateKey (string)
Specifies a private key configuration (base64 encoded).
Can be generated by wg genkey.
Specifies the public key of this peer.
Can be extracted from private key by running wg pubkey < private.key > public.key && cat public.key.
endpoint (string)
Specifies the endpoint of this peer entry.
persistentKeepaliveInterval (Duration)
Specifies the persistent keepalive interval for this peer.
Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).
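To make the duration format concrete, here is a small sketch of a parser for strings in the style of Go's time.Duration. It is an illustration written for this page, not the parser Talos uses. The function name parse_go_duration is hypothetical, and the unit table follows the units documented for Go's time.ParseDuration.

```python
import re
from datetime import timedelta

# Units accepted by Go's time.ParseDuration, expressed in seconds.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One number (optionally fractional) followed directly by a unit.
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Parse strings such as '10s', '10m' or '1h30m' into a timedelta."""
    sign = 1
    body = text
    if body and body[0] in "+-":
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body == "0":
        return timedelta(0)
    if not body:
        raise ValueError(f"invalid duration: {text!r}")

    position, seconds = 0, 0.0
    while position < len(body):
        match = _TOKEN.match(body, position)
        if match is None:
            # Covers both garbage input and a number without a unit.
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    return timedelta(seconds=sign * seconds)
```

Note that a bare number such as 10 is rejected, because every component needs a unit.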
allowedIPs ([]string)
AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
DeviceVIPConfig
DeviceVIPConfig contains settings for configuring a Virtual Shared IP on an interface.
ghcr.io:
    # List of endpoints (URLs) for registry mirrors to use.
    endpoints:
        - https://registry.insecure
        - https://ghcr.io/v2/
endpoints ([]string)
List of endpoints (URLs) for registry mirrors to use.
Endpoint configures HTTP/HTTPS access mode, host name,
port and path (if path is not set, it defaults to /v2).
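The path defaulting described above can be sketched as follows. This is an assumption about how the rule behaves, not the actual implementation: with_default_path is a hypothetical helper, and it treats only a completely empty path as "not set".

```python
from urllib.parse import urlsplit, urlunsplit


def with_default_path(endpoint, default_path="/v2"):
    """Return the endpoint URL, adding the default path when none is given."""
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP(S) endpoint: {endpoint!r}")
    if parts.path == "":
        parts = parts._replace(path=default_path)
    return urlunsplit(parts)
```

Applied to the example above, https://registry.insecure becomes https://registry.insecure/v2, while https://ghcr.io/v2/ is returned unchanged.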
RegistryConfig
RegistryConfig specifies auth & TLS config per registry.
# Ephemeral partition encryption.
ephemeral:
    provider: luks2 # Encryption provider to use for the encryption.
    # Defines the encryption keys generation and storage method.
    keys:
        - # Deterministically generated key from the node UUID and PartitionLabel.
          nodeID: {}
          slot: 0 # Key slot number for luks2 encryption.
Talos supports a number of kernel commandline parameters. Some are required for
it to operate. Others are optional and useful in certain circumstances.
Several of these are enforced by the Kernel Self Protection Project (KSPP).
Required parameters:
talos.config: the HTTP(S) URL at which the machine configuration data can be found
talos.platform: can be one of aws, azure, container, digitalocean, gcp, metal, packet, or vmware
init_on_alloc=1: required by KSPP
slab_nomerge: required by KSPP
pti=on: required by KSPP
Recommended parameters:
init_on_free=1: advised by KSPP if minimizing stale data lifetime is
important
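As an illustration of the required parameters, the sketch below checks a kernel command line string against the list above. It is a simplified example and not part of Talos: the helper names are hypothetical, the parser splits on whitespace only (it does not handle quoted values), and the configuration URL used in the usage note is a placeholder.

```python
# Required parameters from the list above. None means "any value is accepted".
REQUIRED = {
    "talos.config": None,
    "talos.platform": None,
    "init_on_alloc": "1",
    "slab_nomerge": "",   # a bare flag carries no value
    "pti": "on",
}
PLATFORMS = {"aws", "azure", "container", "digitalocean",
             "gcp", "metal", "packet", "vmware"}


def parse_cmdline(cmdline):
    """Map each parameter name to the list of values it was given."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params.setdefault(name, []).append(value)
    return params


def cmdline_problems(cmdline):
    """Return human-readable problems with the required parameters."""
    params = parse_cmdline(cmdline)
    problems = []
    for name, expected in REQUIRED.items():
        values = params.get(name)
        if values is None:
            problems.append(f"missing {name}")
        elif expected is not None and expected not in values:
            problems.append(f"{name} must be {expected!r}")
    for platform in params.get("talos.platform", []):
        if platform not in PLATFORMS:
            problems.append(f"unknown platform {platform!r}")
    return problems
```

For example, cmdline_problems("talos.platform=metal talos.config=https://example.com/config.yaml init_on_alloc=1 slab_nomerge pti=on") returns an empty list. Because parse_cmdline keeps every value for a repeated name, it also works for parameters that may be given more than once.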
Available Talos-specific parameters
panic
The amount of time to wait after a panic before a reboot is issued.
Talos will always reboot if it encounters an unrecoverable error.
However, when collecting debug information, it may reboot too quickly for
humans to read the logs.
This option allows the user to delay the reboot to give time to collect debug
information from the console screen.
A value of 0 disables automatic rebooting entirely.
talos.config
The URL at which the machine configuration data may be found.
The board name, if Talos is being used on an ARM64 SBC.
Supported boards are:
- bananapi_m64: Banana Pi M64
- libretech_all_h3_cc_h5: Libre Computer ALL-H3-CC
- rock64: Pine64 Rock64
- rpi_4: Raspberry Pi 4, Model B
talos.hostname
The hostname to be used.
The hostname is generally specified in the machine config.
However, in some cases, the DHCP server needs to know the hostname
before the machine configuration has been acquired.
Unless specifically required, the machine configuration should be used
instead.
talos.interface
The network interface to use for pre-configuration booting.
If the node has multiple network interfaces, you may specify which interface
to use by setting this option.
Keep in mind that Talos uses indexed interface names (eth0, eth1, etc.) and not
“predictable” interface names (enp2s0) or BIOS-enumerated (eno1) names.
talos.shutdown
The type of shutdown to use when Talos is told to shut down.
Valid options are:
- halt
- poweroff
talos.network.interface.ignore
A network interface which should be ignored and not configured by Talos.
Before a configuration is applied (early on each boot), Talos attempts to
configure each network interface by DHCP.
If there are many network interfaces on the machine which have link but no
DHCP server, this can add significant boot delays.
This option may be specified multiple times for multiple network interfaces.
8.5 - Platform
Metal
Below is an image to visualize the process of bootstrapping nodes.
9 - Learn More
9.1 - Philosophy
Distributed
Talos is intended to be operated in a distributed manner.
That is, it is built for a high-availability dataplane first.
Its etcd cluster is built in an ad-hoc manner, with each appointed node joining on its own directive (with proper security validations enforced, of course).
Like Kubernetes itself, workloads are intended to be distributed across any number of compute nodes.
There should be no single points of failure, and the level of required coordination is as low as each platform allows.
Immutable
Talos takes immutability very seriously.
Talos itself, even when installed on a disk, always runs from a SquashFS image, meaning that even if a directory is mounted to be writable, the image itself is never modified.
All images are signed and delivered as single, versioned files.
We can always run integrity checks on our image to verify that it has not been modified.
While Talos does allow a few, highly-controlled write points to the filesystem, we strive to make them as non-unique and non-critical as possible.
In fact, we call the writable partition the “ephemeral” partition precisely because we want to make sure none of us ever uses it for unique, non-replicated, non-recreatable data.
Thus, if all else fails, we can always wipe the disk and get back up and running.
Minimal
We are always trying to reduce Talos’ footprint and keep it small.
Because nearly the entire OS is built from scratch in Go, we are already
starting out in a good position.
We have no shell.
We have no SSH.
We have none of the GNU utilities, not even a rollup tool such as busybox.
Everything which is included in Talos is there because it is necessary, and
nothing is included which isn’t.
As a result, the OS right now produces a SquashFS image size of less than 80 MB.
Ephemeral
Everything Talos writes to its disk is either replicated or reconstructable.
Since the controlplane is high availability, the loss of any node will cause
neither service disruption nor loss of data.
No writes are even allowed to the vast majority of the filesystem.
We even call the writable partition “ephemeral” to keep this idea always in
focus.
Secure
Talos has always been designed with security in mind.
With its immutability, its minimalism, its signing, and its componenture, we are
able to simply bypass huge classes of vulnerabilities.
Moreover, because of the way we have designed Talos, we are able to take
advantage of a number of additional settings, such as the recommendations of the Kernel Self Protection Project (KSPP) and the complete disablement of dynamic modules.
There are no passwords in Talos.
All networked communication is encrypted and key-authenticated.
The Talos certificates are short-lived and automatically-rotating.
Kubernetes is always constructed with its own separate PKI structure which is
enforced.
Declarative
Everything which can be configured in Talos is done so through a single YAML
manifest.
There is no scripting and no procedural steps.
Everything is defined by the one declarative YAML file.
This configuration includes that of both Talos itself and the Kubernetes which
it forms.
This is achievable because Talos is tightly focused to do one thing: run
Kubernetes, in the easiest, most secure, most reliable way it can.
9.2 - Concepts
Platform
Mode
Endpoint
Node
9.3 - Architecture
Talos is designed to be atomic in deployment and modular in composition.
It is atomic in the sense that the entirety of Talos is distributed as a
single, self-contained image, which is versioned, signed, and immutable.
It is modular in the sense that it is composed of many separate components
which have clearly defined gRPC interfaces which facilitate internal flexibility
and external operational guarantees.
There are a number of components which comprise Talos.
All of the main Talos components communicate with each other by gRPC, through a socket on the local machine.
This imposes a clear separation of concerns and ensures that changes over time which affect the interoperation of components are a part of the public git record.
The benefit is that each component may be iterated and changed as its needs dictate, so long as the external API is controlled.
This is a key component in reducing coupling and maintaining modularity.
The File System
One of the more unique design decisions in Talos is the layout of the root file system.
There are three “layers” to the Talos root file system.
At its core, the rootfs is a read-only squashfs.
The squashfs is then mounted as a loop device into memory.
This provides Talos with an immutable base.
The next layer is a set of tmpfs file systems for runtime specific needs.
Aside from the standard pseudo file systems such as /dev, /proc, /run, /sys and /tmp, a special /system is created for internal needs.
One reason for this is that we need special files such as /etc/hosts, and /etc/resolv.conf to be writable (remember that the rootfs is read-only).
For example, at boot Talos will write /system/etc/hosts and then bind mount it over /etc/hosts.
This means that instead of making all of /etc writable, Talos only makes very specific files writable under /etc.
All files under /system are completely reproducible.
For files and directories that need to persist across boots, Talos creates overlayfs file systems.
The /etc/kubernetes is a good example of this.
Directories like this are overlayfs backed by an XFS file system mounted at /var.
The /var directory is owned by Kubernetes with the exception of the above overlayfs file systems.
This directory is writable and used by etcd (in the case of control plane nodes), the kubelet, and the CRI (containerd).
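The layering described above can be summarized as a lookup from a path to the file system that backs it. The sketch below is only a mental model built from this section, not something Talos exposes. The table and the layer_for helper are hypothetical, and /etc/resolv.conf is assumed to be handled the same way as /etc/hosts.

```python
from pathlib import PurePosixPath

# (prefix, backing layer), ordered so that more specific prefixes come first.
LAYERS = [
    ("/etc/kubernetes", "overlayfs"),  # persists across boots, backed by /var
    ("/etc/hosts", "tmpfs"),           # bind-mounted from /system/etc/hosts
    ("/etc/resolv.conf", "tmpfs"),     # assumed to be handled like /etc/hosts
    ("/system", "tmpfs"),              # runtime files, reproducible on every boot
    ("/var", "xfs"),                   # writable, used by etcd, kubelet and the CRI
    ("/", "squashfs"),                 # read-only base image
]


def layer_for(path):
    """Return the layer that backs an absolute path, first match wins."""
    target = PurePosixPath(path)
    for prefix, layer in LAYERS:
        root = PurePosixPath(prefix)
        if target == root or root in target.parents:
            return layer
    raise ValueError(f"not an absolute path: {path!r}")
```

Anything that is not explicitly listed, such as /etc/passwd or /usr/bin, falls through to the read-only squashfs.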
9.4 - Components
In this section, we discuss the various components that underpin Talos.
Components
Component
Description
apid
When interacting with Talos, the gRPC API endpoint you interact with directly is provided by apid. apid acts as the gateway for all component interactions and forwards the requests to routerd.
containerd
An industry-standard container runtime with an emphasis on simplicity, robustness, and portability. To learn more, see the containerd website.
machined
Talos replacement for the traditional Linux init-process. Specially designed to run Kubernetes and does not allow starting arbitrary user services.
networkd
Handles all of the host level network configuration. The configuration is defined under the networking key.
timed
Handles the host time synchronization by acting as a NTP-client.
kernel
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project.
routerd
Responsible for routing an incoming API request from apid to the appropriate backend (e.g. networkd, machined and timed).
trustd
To run and operate a Kubernetes cluster, a certain level of trust is required. Based on the concept of a ‘Root of Trust’, trustd is a simple daemon responsible for establishing trust within the system.
udevd
Implementation of eudev into machined. eudev is Gentoo’s fork of udev, systemd’s device file manager for the Linux kernel. It manages device nodes in /dev and handles all user space actions when adding or removing devices. To learn more, see the Gentoo Wiki.
apid
When interacting with Talos, the gRPC API endpoint you will interact with directly is apid.
Apid acts as the gateway for all component interactions.
Apid provides a mechanism to route requests to the appropriate destination when running on a control plane node.
We’ll use some examples below to illustrate what apid is doing.
When a user wants to interact with a Talos component via talosctl, there are two flags that control the interaction with apid.
The -e | --endpoints flag specifies which Talos node (via apid) should handle the connection.
Typically this is a public-facing server.
The -n | --nodes flag specifies which Talos node(s) should respond to the request.
If --nodes is omitted, the first endpoint will be used.
Note: Typically, there will be an endpoint already defined in the Talos config file.
Optionally, nodes can be included here as well.
For example, if a user wants to interact with machined, a command like talosctl -e cluster.talos.dev memory may be used.
$ talosctl -e cluster.talos.dev memory
NODE                TOTAL   USED   FREE   SHARED   BUFFERS   CACHE   AVAILABLE
cluster.talos.dev   7938    1768   2390   145      53        3724    6571
In this case, talosctl is interacting with apid running on cluster.talos.dev, which forwards the request to the machined API.
If we wanted to extend our example to retrieve memory from another node in our cluster, we could use the command talosctl -e cluster.talos.dev -n node02 memory.
$ talosctl -e cluster.talos.dev -n node02 memory
NODE     TOTAL   USED   FREE   SHARED   BUFFERS   CACHE   AVAILABLE
node02   7938    1768   2390   145      53        3724    6571
The apid instance on cluster.talos.dev receives the request and forwards it to apid running on node02, which forwards the request to the machined API.
We can further extend our example to retrieve memory for all nodes in our cluster by appending additional -n node flags or using a comma-separated list of nodes (-n node01,node02,node03):
$ talosctl -e cluster.talos.dev -n node01 -n node02 -n node03 memory
NODE     TOTAL    USED    FREE     SHARED   BUFFERS   CACHE   AVAILABLE
node01   7938     871     4071     137      49        2945    7042
node02   257844   14408   190796   18138    49        52589   227492
node03   257844   1830    255186   125      49        777     254556
The apid instance on cluster.talos.dev receives the request and forwards it to node01, node02, and node03, each of which then forwards the request to its local machined API.
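The interplay between the two flags in the examples above can be sketched as a small function. This is an illustration of the documented behaviour only, not talosctl's implementation: resolve_targets is a hypothetical name, and for simplicity it always contacts the first endpoint.

```python
def resolve_targets(endpoints, node_flags=()):
    """Return (endpoint to contact, nodes that should answer) for one request.

    endpoints:  values given to -e / --endpoints, in order.
    node_flags: raw values of each -n / --nodes flag; each value may itself
                be a comma-separated list.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")

    nodes = [
        name.strip()
        for flag in node_flags
        for name in flag.split(",")
        if name.strip()
    ]
    # Documented fallback: when --nodes is omitted, the first endpoint is used.
    if not nodes:
        nodes = [endpoints[0]]
    return endpoints[0], nodes
```

With this model, -n node01 -n node02 -n node03 and -n node01,node02,node03 resolve to the same list of target nodes.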
containerd
Containerd provides the container runtime to launch workloads on Talos and Kubernetes.
Talos services are namespaced under the system namespace in containerd, whereas the Kubernetes services are namespaced under the k8s.io namespace.
machined
A common theme throughout the design of Talos is minimalism.
We believe strongly in the UNIX philosophy that each program should do one job well.
The init included in Talos is one example of this, and we are calling it “machined”.
We wanted to create a focused init that had one job - run Kubernetes.
To that extent, machined is relatively static in that it does not allow for arbitrary user-defined services.
Only the services necessary to run Kubernetes and manage the node are available.
This includes:
Networkd handles all of the host level network configuration.
The configuration is defined under the networking key.
By default, we attempt to issue a DHCP request for every interface on the server.
This can be overridden by supplying one of the following kernel arguments:
talos.network.interface.ignore - specify a list of interfaces to skip discovery on
ip - ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip> as documented in the kernel here
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project (KSPP).
trustd
Security is one of the highest priorities within Talos.
Running and operating a Kubernetes cluster requires a certain level of trust.
For example, orchestrating the bootstrap of a highly available control plane requires sensitive PKI data distribution.
To that end, we created trustd.
Based on a Root of Trust concept, trustd is a simple daemon responsible for establishing trust within the system.
Once trust is established, various methods become available to the trustee.
For example, it can accept a write request from another node to place a file on disk.
Additional methods and capabilities will be added to the trustd component to support new functionality in the rest of the Talos environment.
udevd
Udevd handles the kernel device notifications and sets up the necessary links in /dev.
9.5 - Upgrades
Talos
The upgrade process for Talos, like everything else, begins with an API call.
This call tells a node the installer image to use to perform the upgrade.
Each Talos version corresponds to an installer with the same version, such that the
version of the installer is the version of Talos which will be installed.
Because Talos is image based, even at run-time, upgrading Talos is almost
exactly the same set of operations as installing Talos, with the difference that
the system has already been initialized with a configuration.
An upgrade makes use of an A-B image scheme in order to facilitate rollbacks.
This scheme retains the one previous Talos kernel and OS image following each upgrade.
If an upgrade fails to boot, Talos will roll back to the previous version.
Likewise, Talos may be manually rolled back via API (or talosctl rollback).
This will simply update the boot reference and reboot.
An upgrade can preserve data or not.
If Talos is told to NOT preserve data, it will wipe its ephemeral partition, remove itself from the etcd cluster (if it is a control node), and generally make itself as pristine as is possible.
There are likely to be changes to the default option here over time, so if your setup has a preference to one way or the other, it is better to specify it explicitly, but we try to always be “safe” with this setting.
Sequence
When a Talos node receives the upgrade command, the first thing it does is cordon
itself in Kubernetes, to avoid receiving any new workload.
It then starts to drain away its existing workload.
NOTE: If any of your workloads are sensitive to being shut down ungracefully, be sure to use the lifecycle.preStop Pod spec.
Once all of the workload Pods are drained, Talos will start shutting down its
internal processes.
If it is a control node, this will include etcd.
If preserve is not enabled, Talos will even leave etcd membership.
(Don’t worry about this; we make sure the etcd cluster is healthy and that it will remain healthy after our node departs, before we allow this to occur.)
Once all the processes are stopped and the services are shut down, all of the
filesystems will be unmounted.
This allows Talos to produce a very clean upgrade, as close as possible to a pristine system.
We verify the disk and then perform the actual image upgrade.
Finally, we tell the bootloader to boot once with the new kernel and OS image.
Then we reboot.
After the node comes back up and Talos verifies itself, it will make permanent
the bootloader change, rejoin the cluster, and finally uncordon itself to receive new workloads.
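The boot-once step and the A-B scheme can be pictured with a toy state machine. This is a conceptual sketch under simplifying assumptions, not how the Talos bootloader integration is written: the class, its slot names, and the version strings used in the usage note are hypothetical.

```python
class BootState:
    """Toy model of an A-B image scheme with a boot-once flag."""

    def __init__(self, version):
        self.slots = {"A": version, "B": None}
        self.default = "A"     # slot the bootloader uses permanently
        self.boot_once = None  # slot to try on the next boot only

    @staticmethod
    def _other(slot):
        return "B" if slot == "A" else "A"

    def upgrade(self, version):
        """Write the new image to the inactive slot and try it once."""
        target = self._other(self.default)
        self.slots[target] = version
        self.boot_once = target

    def boot(self, healthy):
        """Simulate a boot attempt and return the version left running.

        If the trial slot comes up healthy, the change is made permanent.
        Otherwise the node falls back to the unchanged default slot.
        """
        trial, self.boot_once = self.boot_once, None
        if trial is not None and healthy:
            self.default = trial
        return self.slots[self.default]

    def rollback(self):
        """Point the boot reference back at the previous image."""
        previous = self._other(self.default)
        if self.slots[previous] is None:
            raise RuntimeError("no previous image to roll back to")
        self.default = previous
```

For example, after BootState("v0.8.0").upgrade("v0.9.0"), a healthy boot returns "v0.9.0" and keeps it, while an unhealthy boot returns "v0.8.0" because the default slot was never changed.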
FAQs
Q. What happens if an upgrade fails?
A. There are many potential ways an upgrade can fail, but we always try to do
the safe thing.
The most common first failure is an invalid installer image reference.
In this case, Talos will fail to download the upgraded image and will abort the upgrade.
Sometimes, Talos is unable to successfully kill off all of the disk access points, in which case it cannot safely unmount all filesystems to effect the upgrade.
In this case, it will abort the upgrade and reboot.
It is possible (especially with test builds) that the upgraded Talos system will fail to start.
In this case, the node will be rebooted, and the bootloader will automatically use the previous Talos kernel and image, thus effectively aborting the upgrade.
Lastly, it is possible that Talos itself will upgrade successfully, start up, and rejoin the cluster but your workload will fail to run on it, for whatever reason.
This is when you would use the talosctl rollback command to revert to the previous Talos version.
Q. Can upgrades be scheduled?
A. We provide the Talos Controller Manager to coordinate upgrades of a cluster.
Additionally, because the upgrade sequence is API-driven, you can easily tie this in to your own business logic to schedule and coordinate your upgrades.
Q. Can the upgrade process be observed?
A. The Talos Controller Manager does this internally, watching the logs of
the node being upgraded, using the streaming log API of Talos.
You can do the same thing using the talosctl logs --follow machined command.
Q. Are worker node upgrades handled differently from control plane node upgrades?
A. Short answer: no.
Long answer: Both node types follow the same set procedure.
However, since control plane nodes run additional services, such as etcd, there are some extra steps and checks performed on them.
From the user’s standpoint, however, the processes are identical.
There are also additional restrictions on upgrading control plane nodes.
For instance, Talos will refuse to upgrade a control plane node if that upgrade will cause a loss of quorum for etcd.
This can generally be worked around by setting preserve to true.
Q. Will an upgrade try to do the whole cluster at once?
Can I break my cluster by upgrading everything?
A. No.
Nothing prevents the user from sending any number of near-simultaneous upgrades to each node of the cluster.
While most people would not attempt to do this, it may be the desired behaviour in certain situations.
If, however, multiple control plane nodes are asked to upgrade at the same time, Talos will protect itself by making sure only one control plane node upgrades at any time, through its checking of etcd quorum.
A lease is taken out by the winning control plane node, and no other control plane node is allowed to execute the upgrade until the lease is released and the etcd cluster is healthy and will be healthy when the next node performs its upgrade.
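The quorum reasoning behind this protection comes down to majority arithmetic. The sketch below is a simplified model of that arithmetic only. It does not reproduce Talos' actual check or its lease handling, the function names are hypothetical, and it assumes the upgrading node stays an etcd member but is offline while it upgrades.

```python
def quorum(members):
    """Smallest number of etcd members that forms a majority."""
    return members // 2 + 1


def upgrade_keeps_quorum(members, healthy):
    """True if a majority remains while one healthy member is offline."""
    return healthy - 1 >= quorum(members)
```

In a healthy three-member cluster one node can be taken down at a time (2 of 3 remain), but not if another member is already unhealthy.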
Q. Is there an operator or controller which will keep my nodes updated
automatically?
A. Yes.
We provide the Talos Controller Manager to perform this maintenance in a simple, controllable fashion.
9.6 - FAQs
How is Talos different from other container optimized Linux distros?
Talos shares a lot of attributes with other distros, but there are some important differences.
Talos integrates tightly with Kubernetes, and is not meant to be a general-purpose operating system.
The most important difference is that Talos is fully controlled by an API via a gRPC interface, instead of an ordinary shell.
We don’t ship SSH, and there is no console access.
Removing components such as these has allowed us to dramatically reduce the footprint of Talos, and in turn, improve a number of other areas like security, predictability, reliability, and consistency across platforms.
It’s a big change from how operating systems have been managed in the past, but we believe that API-driven OSes are the future.
Why no shell or SSH?
Since Talos is fully API-driven, all maintenance and debugging operations should be possible via the OS API.
We would like for Talos users to start thinking about what a “machine” is in the context of a Kubernetes cluster.
That is, that a Kubernetes cluster can be thought of as one massive machine, and the nodes are merely additional, undifferentiated resources.
We don’t want humans to focus on the nodes, but rather on the machine that is the Kubernetes cluster.
Should an issue arise at the node level, talosctl should provide the necessary tooling to assist in the identification, debugging, and remediation of the issue.
However, the API is based on the Principle of Least Privilege, and exposes only a limited set of methods.
We envision Talos being a great place for the application of control theory in order to provide a self-healing platform.
Why the name “Talos”?
Talos was an automaton created by the Greek God of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land.
We felt it was a fitting name for a security focused operating system designed to run Kubernetes.
Why does Talos query pool.ntp.org on boot even if configured to use a different time server?
When Talos boots, before the config is loaded, Talos performs a non-blocking attempt to sync the time with the default time server (pool.ntp.org).
This initial time sync is required if the node doesn’t have an RTC or the RTC is out of sync because TLS (e.g. HTTPS) requires time to be in sync for certificate validation.
As soon as the config is available, Talos starts syncing the time with the configured time server.
Time sync errors on initial boot can be safely ignored.
9.7 - talosctl
The talosctl tool packs a lot of power into a small package.
It acts as a reference implementation for the Talos API, but it also handles a lot of
conveniences for the use of Talos and its clusters.
Video Walkthrough
To see some live examples of talosctl usage, view the following video:
Client Configuration
Talosctl configuration is located in $XDG_CONFIG_HOME/talos/config.yaml if $XDG_CONFIG_HOME is defined.
Otherwise it is in $HOME/.talos/config.
The location can always be overridden by the TALOSCONFIG environment variable or the --talosconfig parameter.
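The lookup order above can be written down as a short function. This is a sketch of the documented rules rather than talosctl's code: talosconfig_path is a hypothetical name, and the relative precedence of the --talosconfig parameter over the TALOSCONFIG variable is an assumption, since the text only says that either can override the default.

```python
import os
import posixpath


def talosconfig_path(flag=None, env=None):
    """Return the client configuration path for the given flag and environment."""
    env = os.environ if env is None else env

    # Explicit overrides come first (flag before variable is an assumption).
    if flag:
        return flag
    if env.get("TALOSCONFIG"):
        return env["TALOSCONFIG"]

    # Documented defaults.
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return posixpath.join(xdg, "talos", "config.yaml")
    return posixpath.join(env.get("HOME", ""), ".talos", "config")
```

Passing env explicitly, as in talosconfig_path(env={"HOME": "/home/u"}), makes the function easy to exercise without touching the real environment.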
Like kubectl, talosctl uses the concept of configuration contexts, so any number of Talos clusters can be managed with a single configuration file.
Unlike kubectl, it also comes with some intelligent tooling to manage the merging of new contexts into the config.
The default operation is a non-destructive merge, where if a context of the same name already exists in the file, the context to be added is renamed by appending an index number.
You can easily overwrite instead, as well.
See the talosctl config help for more information.
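The non-destructive merge can be illustrated with plain dictionaries. The sketch below is not talosctl's implementation: merge_contexts is a hypothetical helper, and the "-1", "-2" suffix format is an assumption, since the text only says that an index number is appended.

```python
def merge_contexts(existing, incoming, overwrite=False):
    """Merge incoming contexts into existing ones and return a new dict.

    On a name conflict the incoming context is renamed by appending an
    index, unless overwrite is True, in which case it replaces the old one.
    """
    merged = dict(existing)
    for name, context in incoming.items():
        if name in merged and not overwrite:
            index = 1
            while f"{name}-{index}" in merged:
                index += 1
            name = f"{name}-{index}"
        merged[name] = context
    return merged
```

Merging a context named prod into a file that already has one leaves the original untouched and adds the new one as prod-1. With overwrite=True the original is replaced instead.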
Endpoints and Nodes
The endpoints are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
Further, if multiple endpoints are specified, the client will automatically load
balance and fail over between them.
In general, it is recommended that these point to the set of control plane nodes, either directly or through a reverse proxy or load balancer.
Each endpoint will automatically proxy requests destined to another node through it, so it is not necessary to change the endpoint configuration just because you wish to talk to a different node within the cluster.
Endpoints do, however, need to be members of the same Talos cluster as the target node, because these proxied connections rely on certificate-based authentication.
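Failover between endpoints can be pictured as trying each one in turn until a call succeeds. The sketch below models only that part (it does not load balance) and is not the client's actual logic. The function names are hypothetical and the IP addresses are placeholders.

```python
def call_with_failover(endpoints, call):
    """Try each endpoint in order and return the first successful result."""
    errors = []
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except ConnectionError as exc:
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def fake_call(endpoint):
    """Stand-in for an API call: the first control plane node is down."""
    if endpoint == "10.0.0.1":
        raise ConnectionError("connection refused")
    return f"handled by {endpoint}"
```

Here call_with_failover(["10.0.0.1", "10.0.0.2"], fake_call) skips the unreachable endpoint and is answered by the second one.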
The node is the target node on which you wish to perform the API call.
While you can configure the target node (or even set of target nodes) inside the ’talosctl’ configuration file, it is often useful to simply and explicitly declare the target node(s) using the -n or --nodes command-line parameter.
Keep in mind when specifying nodes that their IPs and/or hostnames are as seen by the endpoint servers, not as seen from the client.
This is because all connections are proxied first through the endpoints.
Kubeconfig
The configuration for accessing a Talos Kubernetes cluster is obtained with talosctl.
By default, talosctl will safely merge the cluster into the default kubeconfig.
Like talosctl itself, in the event of a naming conflict, the new context name will be index-appended before insertion.
The --force option can be used to overwrite instead.
You can also specify an alternate path by supplying it as a positional parameter.
Thus, like Talos clusters themselves, talosctl makes it easy to manage any
number of Kubernetes clusters from the same workstation.
Commands
Please see the CLI reference for the entire list of commands which are available from talosctl.
9.8 - Control Plane
This guide provides details on how Talos runs and bootstraps the Kubernetes control plane.
High-level Overview
Talos cluster bootstrap flow:
The etcd service is started on control plane nodes.
Instances of etcd on control plane nodes build the etcd cluster.
The kubelet service is started.
Control plane components are started as static pods via the kubelet, and the kube-apiserver component connects to the local (running on the same node) etcd instance.
The kubelet requests a client certificate using the bootstrap token via the control plane endpoint (through kube-apiserver and kube-controller-manager).
The kubelet registers the node in the API server.
Kubernetes control plane schedules pods on the nodes.
Cluster Bootstrapping
All nodes start the kubelet service.
The kubelet tries to contact the control plane endpoint, but as it is not up yet, it keeps retrying.
One of the control plane nodes is chosen as the bootstrap node.
The node’s type can be either init or controlplane, where the controlplane type is promoted using the bootstrap API (talosctl bootstrap).
The bootstrap node initiates the etcd bootstrap process by initializing etcd as the first member of the cluster.
Note: there should be only one bootstrap node for the cluster lifetime.
Once etcd is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.
The etcd services on non-bootstrap nodes try to get the Endpoints resource via the control plane endpoint, but that request fails as the control plane endpoint is not up yet.
As soon as etcd is up on the bootstrap node, static pod definitions for the Kubernetes control plane components (kube-apiserver, kube-controller-manager, kube-scheduler) are rendered to disk.
The kubelet service on the bootstrap node picks up the static pod definitions and starts the Kubernetes control plane components.
As soon as kube-apiserver is launched, the control plane endpoint comes up.
The bootstrap node acquires an etcd mutex and injects the bootstrap manifests into the API server.
The set of bootstrap manifests specifies the Kubernetes join token and kubelet CSR auto-approval.
The kubelet service on all the nodes is now able to issue client certificates for themselves and register nodes in the API server.
Other bootstrap manifests specify additional resources critical for Kubernetes operations (i.e. CNI, PSP, etc.)
The etcd service on non-bootstrap nodes is now able to discover other members of the etcd cluster via the Kubernetes Endpoints resource.
The etcd cluster is now formed and consists of all control plane nodes.
All control plane nodes render static pod manifests for the control plane components.
Each node now runs a full set of components to make the control plane HA.
The kubelet service on worker nodes is now able to issue the client certificate and register itself with the API server.
Scaling Up the Control Plane
When new nodes are added to the control plane, the process is the same as the bootstrap process above: the etcd service discovers existing members of the control plane via the control plane endpoint, joins the etcd cluster, and the control plane components are scheduled on the node.
Scaling Down the Control Plane
Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the etcd cluster.
When the talosctl reset command is used, the targeted control plane node leaves the etcd cluster as part of the reset sequence.
Upgrading Control Plane Nodes
When a control plane node is upgraded, Talos leaves etcd, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the etcd cluster on reboot.
Upgrading a control plane node is therefore equivalent to scaling the control plane down by that node and then scaling it back up with a new version of Talos.
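The equivalence above can be illustrated with a toy model of etcd membership. This is only a sketch: the node names and the helper functions are hypothetical and not part of Talos. The only outside fact it relies on is that etcd needs a majority of its members to accept writes.

```python
def quorum(members):
    """Number of members that must agree for etcd (Raft) to commit a write."""
    return len(members) // 2 + 1


def scale_down(members, node):
    """Model of a node leaving the etcd cluster (as during a reset)."""
    return members - {node}


def scale_up(members, node):
    """Model of a node joining the etcd cluster after discovering its members."""
    return members | {node}


def upgrade(members, node):
    """Upgrade modelled as leave (wipe, install, reboot) followed by join."""
    during = scale_down(members, node)
    after = scale_up(during, node)
    return during, after


# Hypothetical three-node control plane.
members = frozenset({"cp-1", "cp-2", "cp-3"})
during, after = upgrade(members, "cp-2")
print(sorted(during), "quorum:", quorum(during))
print(sorted(after), "quorum:", quorum(after))
```

In this model, the remaining two members must both stay healthy while the third node is being upgraded, which is one reason to upgrade control plane nodes one at a time.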
9.9 - Controllers and Resources
Talos implements concepts of resources and controllers to facilitate internal operations of the operating system.
Talos resources and controllers are very similar to Kubernetes resources and controllers, but there are some differences.
The content of this document is not required to operate Talos, but it is useful for troubleshooting.
Starting with Talos 0.9, most of the Kubernetes control plane bootstrapping and operations are implemented via controllers and resources, which allows Talos to react to configuration changes and environment changes (e.g. time sync).
Resources
A resource captures a piece of system state.
Each resource belongs to a “Type” which defines resource contents.
Resource state can be split into two parts:
metadata: a fixed set of fields describing the resource (namespace, type, ID, etc.)
spec: the contents of the resource (depends on the resource type).
A resource is uniquely identified by (namespace, type, id).
Namespaces provide a way to avoid conflicts on duplicate resource IDs.
At the time of writing, all resources are local to the node and stored in memory.
So on every reboot the resource state is rebuilt from scratch (the only exception is the MachineConfig resource, which reflects the current machine config).
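The resource model described above can be sketched as a small in-memory store. This is an illustration only and not the Talos implementation: the class names (Metadata, Resource, InMemoryState) and the Service resource placed in the extras namespace are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metadata:
    """Fixed set of fields describing a resource."""
    namespace: str
    type: str
    id: str
    version: int = 1


@dataclass
class Resource:
    """Metadata plus a type-specific spec."""
    metadata: Metadata
    spec: dict = field(default_factory=dict)


class InMemoryState:
    """Hypothetical node-local store; its contents do not survive a reboot."""

    def __init__(self):
        self._items = {}

    def create(self, resource):
        meta = resource.metadata
        key = (meta.namespace, meta.type, meta.id)  # the unique identity
        if key in self._items:
            raise KeyError(f"resource {key} already exists")
        self._items[key] = resource

    def get(self, namespace, type_, id_):
        return self._items[(namespace, type_, id_)]

    def list_resources(self, namespace, type_):
        return [r for (ns, t, _), r in self._items.items()
                if ns == namespace and t == type_]


state = InMemoryState()
state.create(Resource(Metadata("runtime", "Service", "timed"), {"running": True}))
# Same type and ID in another namespace does not conflict (illustrative only).
state.create(Resource(Metadata("extras", "Service", "timed"), {"running": False}))
print(state.get("runtime", "Service", "timed").spec)
```

Keying the store on the full (namespace, type, id) tuple is what lets two resources share an ID as long as they live in different namespaces.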
Controllers
Controllers run as independent lightweight threads in Talos.
The goal of the controller is to reconcile the state based on inputs and eventually update outputs.
A controller can have any number of resource types (and namespaces) as inputs.
In other words, it watches specified resources for changes and reconciles when these changes occur.
A controller might also have additional inputs: running reconcile on schedule, watching etcd keys, etc.
A controller has a single output: a set of resources of fixed type in a fixed namespace.
Only one controller can manage a given resource type in a given namespace, so conflicts are avoided.
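The controller contract described above (any number of inputs, a single output type, one owner per output) can be sketched as follows. Everything here is hypothetical: the Runtime class, the ServiceHealthController, and the HealthSummary resource type are invented for illustration, and the sketch dispatches synchronously instead of running each controller in its own lightweight thread.

```python
class Runtime:
    """Hypothetical controller runtime: tracks output ownership and dispatches changes."""

    def __init__(self):
        self.resources = {}    # (namespace, type, id) -> spec
        self.controllers = []
        self.owners = {}       # (namespace, type) -> controller name

    def register(self, controller):
        output = controller.output
        if output in self.owners:
            # Only one controller may manage a resource type in a namespace.
            raise ValueError(f"{output} is already managed by {self.owners[output]}")
        self.owners[output] = controller.name
        self.controllers.append(controller)

    def update(self, namespace, type_, id_, spec):
        """Store a resource and reconcile every controller that watches its type."""
        self.resources[(namespace, type_, id_)] = spec
        for controller in self.controllers:
            if (namespace, type_) in controller.inputs:
                controller.reconcile(self)


class ServiceHealthController:
    """Hypothetical controller deriving one output resource from Service inputs."""
    name = "ServiceHealthController"
    inputs = {("runtime", "Service")}
    output = ("runtime", "HealthSummary")  # invented type, for illustration only

    def reconcile(self, runtime):
        services = {key[2]: spec for key, spec in runtime.resources.items()
                    if key[:2] == ("runtime", "Service")}
        unhealthy = sorted(name for name, spec in services.items()
                           if not spec["healthy"])
        # The output type is not among the inputs, so writing it does not re-trigger reconcile.
        runtime.resources[self.output + ("summary",)] = {"unhealthy": unhealthy}


runtime = Runtime()
runtime.register(ServiceHealthController())
runtime.update("runtime", "Service", "timed", {"healthy": False})
runtime.update("runtime", "Service", "trustd", {"healthy": True})
print(runtime.resources[("runtime", "HealthSummary", "summary")])
```

Registering a second controller with the same output is rejected up front, which mirrors how exclusive ownership of an output type avoids conflicting writes.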
Querying Resources
The Talos CLI tool talosctl provides read-only access to the resource API, which includes getting a specific resource, listing resources, and watching for changes.
Talos stores resources describing resource types and namespaces in the meta namespace:
$ talosctl get resourcedefinitions
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta ResourceDefinition bootstrapstatuses.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition etcdsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetescontrolplaneconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetessecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition machineconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition machinetypes.config.talos.dev 1
172.20.0.2 meta ResourceDefinition manifests.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition manifeststatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition namespaces.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition resourcedefinitions.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition rootsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition secretstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition services.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpods.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpodstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition timestatuses.v1alpha1.talos.dev 1
$ talosctl get namespaces
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
172.20.0.2 meta Namespace controlplane 1
172.20.0.2 meta Namespace extras 1
172.20.0.2 meta Namespace meta 1
172.20.0.2 meta Namespace runtime 1
172.20.0.2 meta Namespace secrets 1
Most of the time the namespace flag (--namespace) can be omitted, as ResourceDefinition contains a default namespace which is used if no namespace is given.
A resource definition also contains type aliases, which can be used interchangeably with the canonical resource name:
$ talosctl get ns config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
Output
The talosctl get command supports the following output modes:
table (default) prints the resource list as a table.
yaml prints pretty-formatted resources with details, including full metadata and spec.
This format carries the most details from the backend resource (e.g. comments in the MachineConfig resource).
json prints the same information as yaml, but some additional details (e.g. comments) might be lost.
This format is useful for automated processing with tools like jq.
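As a rough illustration of automated processing, the sketch below parses a JSON document and extracts the resource identity. The sample is hypothetical: its field layout simply mirrors the YAML machine config example in this document and is an assumption, not captured talosctl output, so the real JSON may be shaped differently.

```python
import json

# Hypothetical sample; layout assumed from the YAML example, not real talosctl output.
sample = """
{
  "node": "172.20.0.2",
  "metadata": {
    "namespace": "config",
    "type": "MachineConfigs.config.talos.dev",
    "id": "v1alpha1",
    "version": 2,
    "phase": "running"
  },
  "spec": {"version": "v1alpha1", "debug": false, "persist": true}
}
"""


def resource_key(document):
    """Return the (namespace, type, id) tuple that identifies a resource."""
    meta = document["metadata"]
    return meta["namespace"], meta["type"], meta["id"]


resource = json.loads(sample)
print(resource_key(resource))
print("persist:", resource["spec"]["persist"])
```

JSON has no comment syntax, which is consistent with the note above that comments present in the yaml output can be lost in json.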
Watching Changes
If the --watch flag is appended to the talosctl get command, the command switches to watch mode.
If a list of resources was requested, talosctl prints the initial contents of the list and then appends resource information for every change:
$ talosctl get svc -w
NODE         *   NAMESPACE   TYPE      ID       VERSION   RUNNING   HEALTHY
172.20.0.2   +   runtime     Service   timed    2         true      true
172.20.0.2   +   runtime     Service   trustd   2         true      true
172.20.0.2   +   runtime     Service   udevd    2         true      true
172.20.0.2   -   runtime     Service   timed    2         true      true
172.20.0.2   +   runtime     Service   timed    1         true      false
172.20.0.2       runtime     Service   timed    2         true      true
Column * specifies event type:
+ is created
- is deleted
(blank) is updated
In YAML/JSON output, an event field is added to the resource representation to describe the event type.
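A consumer of the watch stream can keep a current view of the resources by applying each event in order. The sketch below replays the events from the table above. The apply_event helper and the tuple layout are hypothetical, and the blank event marker is represented as a single space.

```python
def apply_event(view, event, resource_id, fields):
    """Apply one watch event to a dict keyed by resource ID."""
    if event == "-":
        view.pop(resource_id, None)   # deleted
    else:
        view[resource_id] = fields    # "+" created, " " updated
    return view


# Events transcribed from the watch output shown above.
events = [
    ("+", "timed", {"version": 2, "running": True, "healthy": True}),
    ("+", "trustd", {"version": 2, "running": True, "healthy": True}),
    ("+", "udevd", {"version": 2, "running": True, "healthy": True}),
    ("-", "timed", {"version": 2, "running": True, "healthy": True}),
    ("+", "timed", {"version": 1, "running": True, "healthy": False}),
    (" ", "timed", {"version": 2, "running": True, "healthy": True}),
]

view = {}
for event, resource_id, fields in events:
    apply_event(view, event, resource_id, fields)

for resource_id in sorted(view):
    print(resource_id, view[resource_id])
```

After replaying the stream, the view holds the three services, with timed back at version 2 and healthy again.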
Examples
Getting machine config:
$ talosctl get machineconfig -o yaml
node: 172.20.0.2
metadata:
    namespace: config
    type: MachineConfigs.config.talos.dev
    id: v1alpha1
    version: 2
    phase: running
spec:
    version: v1alpha1 # Indicates the schema used to decode the contents.
    debug: false # Enable verbose logging to the console.
    persist: true # Indicates whether to pull the machine config upon every boot.
    # Provides machine specific configuration options.
...
Getting control plane static pod statuses:
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-master-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-master-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-master-1 4 True