What is Talos: a quick description of Talos
Quickstart: the fastest way to get a Talos cluster up and running
Getting Started: a long-form, guided tour of getting a full Talos cluster deployed

Open Source

Community

GitHub: repo
Slack: Join our slack channel
Support: questions, bugs, and feature requests via GitHub Discussions
Forum: community
Twitter: @SideroLabs
Email: info@SideroLabs.com

If you’re interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you! We hold a weekly meeting that all audiences are welcome to attend.

We would appreciate your feedback so that we can make Talos even better! To do so, you can take our survey.

Office Hours

When: Mondays at 16:30 UTC.
Where: Google Meet.

You can subscribe to this meeting by joining the community forum above.

Enterprise

If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help. Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.

Learn More

1 - Introduction

1.1 - What is Talos?

Talos is a container-optimized Linux distro: a reimagining of Linux for distributed systems such as Kubernetes. It is designed to be as minimal as possible while still maintaining practicality. For these reasons, Talos has a number of features unique to it:

it is immutable
it is atomic
it is ephemeral
it is minimal
it is secure by default
it is managed via a single declarative configuration file and gRPC API

Talos can be deployed on container, cloud, virtualized, and bare metal platforms.

Why Talos

In having less, Talos offers more. Security. Efficiency. Resiliency. Consistency.

All of these areas are improved simply by having less.

1.2 - Quickstart

The easiest way to try Talos is by using the CLI (talosctl) to create a cluster on a machine with Docker installed.

Prerequisites

`talosctl`

Download talosctl:

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

`kubectl`

Download kubectl via one of the methods outlined in the documentation.

Create the Cluster

Now run the following:

talosctl cluster create

Verify that you can reach Kubernetes:

$ kubectl get nodes -o wide
NAME                     STATUS   ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-default-master-1   Ready    master   115s   v1.20.2   10.5.0.2      <none>        Talos (v0.9.0)   <host kernel>    containerd://1.4.3
talos-default-worker-1   Ready    <none>   115s   v1.20.2   10.5.0.3      <none>        Talos (v0.9.0)   <host kernel>    containerd://1.4.3

Destroy the Cluster

When you are all done, remove the cluster:

talosctl cluster destroy

1.3 - Getting Started

This document will walk you through installing a full Talos Cluster. You may wish to read through the Quickstart first, to quickly create a local virtual cluster on your workstation.

Regardless of where you run Talos, you will find that there is a pattern to deploying it.

In general you will need to:

acquire the installation image
decide on the endpoint for Kubernetes
- optionally create a load balancer
configure Talos
configure talosctl
bootstrap Kubernetes

Prerequisites

`talosctl`

The talosctl tool is a CLI which interfaces with the Talos API. It also includes a number of useful tools for creating and managing your clusters.

You should install talosctl before continuing:

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

Acquire the installation image

The easiest way to install Talos is to use the ISO image.

The latest ISO image can be found on the GitHub Releases page.

For self-built media and network booting, you can use the kernel and initramfs.

When booted from the ISO, Talos will run in RAM, and it will not install itself until it is provided a configuration. Thus, it is safe to boot the ISO onto any machine.

Alternative Booting

If you wish to use a different boot mechanism (such as network boot or a custom ISO), there are a number of required kernel parameters.

Please see the kernel docs for more information.

Decide the Kubernetes Endpoint

In order to configure Kubernetes and bootstrap the cluster, Talos needs to know what the endpoint (DNS name or IP address) of the Kubernetes API Server will be.

The endpoint should be the fully-qualified HTTP(S) URL for the Kubernetes API Server, which (by default) runs on port 6443 using HTTPS.

Thus, the format of the endpoint may be something like:

https://192.168.0.10:6443
https://kube.mycluster.mydomain.com:6443
https://[2001:db8:1234::80]:6443
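The three formats above differ only in how the host part is written. As a minimal sketch, the helper below (kube_endpoint is a hypothetical name, not part of talosctl) builds the URL from a host and an optional port, adding the brackets that an IPv6 literal needs:

```shell
# kube_endpoint HOST [PORT]
# Hypothetical helper: print the Kubernetes endpoint URL for a host.
kube_endpoint() {
  host=$1
  port=${2:-6443}            # 6443 is the default API server port
  case $host in
    \[*) ;;                  # already bracketed, leave as-is
    *:*) host="[$host]" ;;   # bare IPv6 literal: wrap in brackets
  esac
  printf 'https://%s:%s\n' "$host" "$port"
}
```

For example, `kube_endpoint 2001:db8:1234::80` prints the third URL shown above.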

Because the Kubernetes controlplane is meant to be supplied in a high availability manner, we must also choose how to bind it to the servers themselves. There are three common ways to do this.

Dedicated Load-balancer

If you are using a cloud provider or have your own load-balancer available (such as HAProxy, nginx reverse proxy, or an F5 load-balancer), using a dedicated load balancer is a natural choice. Just create an appropriate frontend matching the endpoint, and point the backends at each of the addresses of the Talos controlplane nodes.

This is convenient if a load-balancer is available, but don’t worry if that is not the case.

Layer 2 Shared IP

Talos has integrated support for serving Kubernetes from a shared (sometimes called “virtual”) IP address. This method relies on OSI Layer 2 connectivity between controlplane Talos nodes.

In this case, we may choose an IP address on the same subnet as the Talos controlplane nodes which is not otherwise assigned to any machine. For instance, if your controlplane node IPs are:

192.168.0.10
192.168.0.11
192.168.0.12

You could choose the IP 192.168.0.15 as your shared IP address. Just make sure that 192.168.0.15 is not used by any other machine and that your DHCP will not serve it to any other machine.

Once chosen, form the full HTTPS URL from this IP:

https://192.168.0.15:6443

You are also free to set a DNS record to this IP address instead, but you will still need to use the IP address to set up the shared IP (machine.network.interfaces[].vip.ip) inside the Talos configuration.

For more information about using a shared IP, see the related guide.

DNS records

If neither of the other methods works for you, you can instead use DNS records to provide a measure of redundancy. In this case, you would add multiple A or AAAA records for a DNS name.

For instance, you could add:

kube.cluster1.mydomain.com  IN  A  192.168.0.10
kube.cluster1.mydomain.com  IN  A  192.168.0.11
kube.cluster1.mydomain.com  IN  A  192.168.0.12

Then, your endpoint would be:

https://kube.cluster1.mydomain.com:6443

Decide how to access the Talos API

Since Talos is entirely API-driven, it is important to know how you are going to access that API. Talos comes with a number of mechanisms to make that easier.

Controlplane nodes can proxy requests for worker nodes. This means that you only need access to the controlplane nodes in order to access the rest of the network. This is useful for security (your worker nodes do not need to have public IPs or be otherwise connected to the Internet), and it also makes working with highly-variable clusters easier, since you only need to know the controlplane nodes in advance.

Even better, the talosctl tool will automatically load balance and fail over between all of your controlplane nodes, so long as it is informed of each of the controlplane node IPs.

That does, of course, present the problem that you need to know how to talk to the controlplane nodes. In some environments, it is easy to be able to forecast, prescribe, or discover the controlplane node IP addresses. For others, though, even the controlplane nodes are dynamic, unpredictable, and undiscoverable.

The dynamic options above for the Kubernetes API endpoint also apply to the Talos API endpoints. The difference is that the Talos API runs on port 50000/tcp.

Whichever way you wish to access the Talos API, be sure to note the IP(s) or hostname(s) so that you can configure your talosctl tool’s endpoints below.

Configure Talos

When Talos boots without a configuration, such as when using the Talos ISO, it enters a limited maintenance mode and waits for a configuration to be provided.

Alternatively, the Talos installer can be booted with the talos.config kernel commandline argument set to an HTTP(S) URL from which it should receive its configuration. In cases where a PXE server is available, this is much more efficient than manually configuring each node. If you do use this method, just note that Talos does require a number of other kernel commandline parameters. See the required kernel parameters for more information.

In either case, we need to generate the configuration which is to be provided. Luckily, the talosctl tool comes with a configuration generator for exactly this purpose.

  talosctl gen config "cluster-name" "cluster-endpoint"

Here, cluster-name is an arbitrary name for the cluster which will be used in your local client configuration as a label. It does not affect anything in the cluster itself. It is arbitrary, but it should be unique in the configuration on your local workstation.

The cluster-endpoint is where you insert the Kubernetes Endpoint you selected from above. This is the Kubernetes API URL, and it should be a complete URL, with https:// and port, if not 443. The default port is 6443, so the port is almost always required.

When you run this command, you will receive a number of files in your current directory:

controlplane.yaml
init.yaml
join.yaml
talosconfig

The three .yaml files are what we call Machine Configs. They are installed onto the Talos servers to act as their complete configuration, describing everything from what disk Talos should be installed to, to what sysctls to set, to what network settings it should have. In the case of the controlplane.yaml and init.yaml, it even describes how Talos should form its Kubernetes cluster.

The talosconfig file (which is also YAML) is your local client configuration file.

Controlplane, Init, and Join

The three types of Machine Configs correspond to the three roles of Talos nodes. For our purposes, you can ignore the Init type. It is a legacy type which will go away eventually. Its purpose was to self-bootstrap. Instead, we now use an API call to bootstrap the cluster, which is much more robust.

That leaves us with Controlplane and Join.

The Controlplane Machine Config describes the configuration of a Talos server on which the Kubernetes Controlplane should run. The Join Machine Config describes everything else: workload servers.

The main difference between Controlplane Machine Config files and Join Machine Config files is that the former contains information about how to form the Kubernetes cluster.

Templates

The generated files can be thought of as templates. Individual machines may need specific settings (for instance, each may have a different static IP address). When different files are needed for machines of the same type, simply copy the source template (controlplane.yaml or join.yaml) and make whatever modifications need to be done.

For instance, if you had three controlplane nodes and three worker nodes, you may do something like this:

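  # one copy of the controlplane template per controlplane node
  for i in $(seq 0 2); do
    cp controlplane.yaml cp$i.yaml
  done
  # one copy of the join template per worker node
  for i in $(seq 0 2); do
    cp join.yaml w$i.yaml
  done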

In cases where there is no special configuration needed, you may use the same file for each machine of the same type.
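After editing the per-machine copies, it can be worth checking them before they are sent to any node. The sketch below loops `talosctl validate` over a list of files. The `--config` and `--mode` flags are the ones used in the bare-metal guides later in this document; whether `metal` is the right mode for your platform is an assumption to verify. The run and validate_configs helpers are hypothetical names.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# validate_configs MODE FILE...
# Validate each machine config, stopping at the first failure.
validate_configs() {
  mode=$1
  shift
  for f in "$@"; do
    run talosctl validate --config "$f" --mode "$mode" || return 1
  done
}
```

A call such as `validate_configs metal cp0.yaml cp1.yaml cp2.yaml` would check the three controlplane copies; setting `DRY_RUN=1` first only prints the commands.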

Apply Configuration

After you have generated each machine’s Machine Config, you need to load them into the machines themselves. For that, you need to know their IP addresses.

If you have access to the console or console logs of the machines, you can read them to find the IP address(es). Talos will print them out during the boot process:

[    4.605369] [talos] task loadConfig (1/1): this machine is reachable at:
[    4.607358] [talos] task loadConfig (1/1):   192.168.0.2
[    4.608766] [talos] task loadConfig (1/1): server certificate fingerprint:
[    4.611106] [talos] task loadConfig (1/1):   xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE=
[    4.613822] [talos] task loadConfig (1/1):
[    4.614985] [talos] task loadConfig (1/1): upload configuration using talosctl:
[    4.616978] [talos] task loadConfig (1/1):   talosctl apply-config --insecure --nodes 192.168.0.2 --file <config.yaml>
[    4.620168] [talos] task loadConfig (1/1): or apply configuration using talosctl interactive installer:
[    4.623046] [talos] task loadConfig (1/1):   talosctl apply-config --insecure --nodes 192.168.0.2 --interactive
[    4.626365] [talos] task loadConfig (1/1): optionally with node fingerprint check:
[    4.628692] [talos] task loadConfig (1/1):   talosctl apply-config --insecure --nodes 192.168.0.2 --cert-fingerprint 'xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE=' --file <config.yaml>

If you do not have console access, the IP address may also be discoverable from your DHCP server.

Once you have the IP address, you can then apply the correct configuration.

  talosctl apply-config --insecure \
    --nodes 192.168.0.2 \
    --file cp0.yaml

The --insecure flag is necessary at this point because the PKI has not yet been made available to the node. Note that the connection will still be encrypted; it is just unauthenticated.

If you have console access, though, you can extract the server certificate fingerprint and use it for an additional layer of validation:

  talosctl apply-config --insecure \
    --nodes 192.168.0.2 \
    --cert-fingerprint xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE= \
    --file cp0.yaml

Using the fingerprint allows you to be sure you are sending the configuration to the right machine, but it is completely optional.

After the configuration is applied to a node, it will reboot.

You may repeat this process for each of the nodes in your cluster.
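Repeating the command by hand is fine for a few nodes. For more, a small loop over "IP FILE" pairs may be easier to review. This is only a sketch: run and apply_all are hypothetical helper names, and the pairs you feed in depend on your own network. The `talosctl apply-config` flags are the ones shown above.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# apply_all: read "IP FILE" pairs from stdin, one per line, and apply
# each Machine Config to its node while the node is in maintenance mode.
apply_all() {
  while read -r ip file; do
    [ -n "$ip" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the remaining pairs
    run talosctl apply-config --insecure --nodes "$ip" --file "$file" </dev/null || return 1
  done
}
```

For instance, piping the line `192.168.0.2 cp0.yaml` into apply_all issues the same command as the example above. Running with `DRY_RUN=1` first lets you check that each file is going to the machine you intended.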

Configure your talosctl client

Now that the nodes are running Talos with its full PKI security suite, you need to use that PKI to talk to the machines. That means configuring your client, and that is what that talosconfig file is for.

Endpoints

Endpoints are the communication endpoints to which the client directly talks. These can be load balancers, DNS hostnames, a list of IPs, etc. In general, it is recommended that these point to the set of control plane nodes, either directly or through a reverse proxy or load balancer.

Each endpoint will automatically proxy requests destined to another node through it, so it is not necessary to change the endpoint configuration just because you wish to talk to a different node within the cluster.

Endpoints do, however, need to be members of the same Talos cluster as the target node, because these proxied connections rely on certificate-based authentication.

We need to set the endpoints in your talosconfig. talosctl will automatically load balance and fail over among the endpoints, so no external load balancer or DNS abstraction is required (though you are free to use them, if desired).

As an example, if the IP addresses of our controlplane nodes are:

192.168.0.2
192.168.0.3
192.168.0.4

We would set those in the talosconfig with:

  talosctl --talosconfig=./talosconfig \
    config endpoint 192.168.0.2 192.168.0.3 192.168.0.4

Nodes

The node is the target node on which you wish to perform the API call.

Keep in mind when specifying nodes that their IPs and/or hostnames are as seen by the endpoint servers, not as seen from the client. This is because all connections are proxied first through the endpoints.

Some people also like to set a default set of nodes in the talosconfig. This can be done in the same manner, replacing endpoint with node. If you do this, however, know that you could easily reboot the wrong machine by forgetting to declare the right one explicitly. Worse, if you set several nodes as defaults, you could, with one talosctl upgrade command, upgrade your whole cluster all at the same time. It’s a powerful tool, and with that comes great responsibility. The author of this document does not set a default node.

You may simply provide -n or --nodes to any talosctl command to supply the node or (comma-delimited) nodes on which you wish to perform the operation. Supplying the commandline parameter will override any default nodes in the configuration file.

To verify the default node(s) you’re currently configured to use, you can run:

$ talosctl version
Client:
        ...
Server:
        NODE:        <node>
        ...

For a more in-depth discussion of Endpoints and Nodes, please see talosctl.

Default configuration file

You can reference which configuration file to use directly with the --talosconfig parameter:

  talosctl --talosconfig=./talosconfig \
    --nodes 192.168.0.2 version

However, talosctl comes with tooling to help you integrate and merge this configuration into the default talosctl configuration file. This is done with the merge option.

  talosctl config merge ./talosconfig

This will merge your new talosconfig into the default configuration file ($XDG_CONFIG_HOME/talos/config.yaml), creating it if necessary. Like Kubernetes, the talosconfig configuration file has multiple “contexts” which correspond to multiple clusters. The <cluster-name> you chose above will be used as the context name.

Kubernetes Bootstrap

All of your machines are configured, and your talosctl client is set up. Now, you are ready to bootstrap your Kubernetes cluster. If that sounds daunting, you haven’t used Talos before.

Bootstrapping your Kubernetes cluster with Talos is as simple as:

  talosctl bootstrap --nodes 192.168.0.2

The IP there can be that of any of your controlplane nodes (or the load balancer, if you have one). The bootstrap command should only be issued once.

At this point, Talos will form an etcd cluster, generate all of the core Kubernetes assets, and start the Kubernetes controlplane components.

After a few moments, you will be able to download your Kubernetes client configuration and get started:

  talosctl kubeconfig

Running this command will add (merge) your new cluster into your local Kubernetes configuration in the same way as talosctl config merge merged the Talos client configuration into your local Talos client configuration file.

If you would prefer for the configuration to not be merged into your default Kubernetes configuration file, simply tell it a filename:

  talosctl kubeconfig alternative-kubeconfig

If all goes well, you should now be able to connect to Kubernetes and see your nodes:

  kubectl get nodes
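Because the controlplane takes a few moments to come up after bootstrap, the first attempts at these commands may fail. A generic retry loop is one way to wait it out. The sketch below is not part of talosctl; retry is a hypothetical helper, and the attempt count and delay are illustrative.

```shell
# retry ATTEMPTS DELAY COMMAND...
# Run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

With this in place, `retry 30 10 kubectl get nodes` would keep trying for roughly five minutes before giving up.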

1.4 - System Requirements

Minimum Requirements

Role	Memory	Cores
Init/Control Plane	2GB	2
Worker	1GB	1

Recommended

Role	Memory	Cores
Init/Control Plane	4GB	4
Worker	2GB	2
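If you provision machines from a script, the minimum figures from the first table in this section can be encoded as a quick pre-flight check. This is only a sketch: meets_minimum is a hypothetical helper, and it expects memory in whole gigabytes.

```shell
# meets_minimum ROLE MEMORY_GB CORES
# Return 0 when the machine meets the minimum requirements for its role,
# 1 when it does not, and 2 for an unknown role.
meets_minimum() {
  case $1 in
    init|controlplane) min_mem=2 min_cores=2 ;;
    worker)            min_mem=1 min_cores=1 ;;
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
  [ "$2" -ge "$min_mem" ] && [ "$3" -ge "$min_cores" ]
}
```

For example, `meets_minimum controlplane 1 2` fails because the machine has less than 2GB of memory.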

1.5 - What's New in Talos 0.9

Control Plane as Static Pods

Talos now runs the Kubernetes control plane as static pods managed via machine configuration. This change makes the bootstrap process much more stable and resilient to failures. For single control plane node clusters it eliminates bugs with the control plane being unavailable after a reboot. As the control plane configuration is managed via the Talos API, even if the control plane configuration was wrong and the API server is not available, the change can be rolled back using talosctl to bring the control plane back up. When upgrading from Talos 0.8, the control plane can be converted to run as static pods.

ECDSA Certificates and Keys for Kubernetes

Talos now generates and uses ECDSA keys for Kubernetes and etcd PKI. ECDSA keys are much smaller than RSA keys, and all PKI operations are much faster (for example, generating a certificate from the CA), which leads to much faster bootstrap and boot times.

Immediate Machine Configuration Updates

Changes to the .cluster part of Talos machine configuration can now be applied immediately (without a reboot). This allows, for example, updating versions of control plane components, adding additional arguments or modifying bootstrap manifests. Future versions of Talos will expand on this to allow most of the machine configuration to be applied without a reboot.

Disk Encryption

Talos now supports encryption for STATE and EPHEMERAL partitions of the system disk. The STATE partition holds machine configuration and the EPHEMERAL partition is mounted as /var which stores container runtime state, and configuration files laid on top of Talos read-only immutable root filesystem. The encryption key in Talos 0.9 is derived from the Node UUID which is a unique machine identifier provided by the manufacturer. Disk encryption is not enabled by default: it needs to be enabled via machine configuration.

Virtual IP for the Control Plane Endpoint

Talos adds support for a virtual L2 shared IP for the control plane: control plane nodes ensure that only one of the nodes advertises the shared IP via ARP. If one of the control plane nodes goes down, another node takes over the shared IP.

Updated Components

Linux: 5.10.1 -> 5.10.19

Kubernetes: 1.20.1 -> 1.20.5

CoreDNS: 1.7.0 -> 1.8.0

etcd: 3.4.14 -> 3.4.15

containerd: 1.4.3 -> 1.4.4

2 - Bare Metal Platforms

2.1 - Digital Rebar

In this guide we will create a Kubernetes cluster with 1 worker node and 2 controlplane nodes using an existing Digital Rebar deployment.

Prerequisites

3 nodes (please see hardware requirements)
Loadbalancer
Digital Rebar Server
Talosctl access (see talosctl setup)

Creating a Cluster

In this guide we will create a Kubernetes cluster with 1 worker node and 2 controlplane nodes. We assume an existing Digital Rebar deployment, and some familiarity with iPXE.

We leave it up to the user to decide if they would like to use static networking, or DHCP. The setup and configuration of DHCP will not be covered.

Create the Machine Configuration Files

Generating Base Configurations

Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:

$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

The load balancer is used to distribute the load across multiple controlplane nodes. This isn’t covered in detail, because we assume some load balancing knowledge beforehand. If you think this should be added to the docs, please create an issue.

At this point, you can modify the generated configs to your liking.

Validate the Configuration Files

$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode

Publishing the Machine Configuration Files

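Digital Rebar has a built-in file server, which we can use to expose the Talos configuration files. We will place init.yaml, controlplane.yaml, and worker.yaml into the Digital Rebar file server by using the drpcli tools. Note that talosctl gen config names the worker configuration join.yaml, so it needs to be copied or renamed to worker.yaml to match the file names used here.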

Copy the generated files from the step above into your Digital Rebar installation.

drpcli file upload <file>.yaml as <file>.yaml

Replacing <file> with init, controlplane or worker.
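The three uploads can be scripted as a sketch like the one below. The drpcli command form is copied from the line above. Storing the generated join.yaml under the name worker.yaml is an assumption, made because the worker role is the name used in this guide. run and upload_configs are hypothetical helper names.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload the machine configs under the names this guide refers to.
upload_configs() {
  run drpcli file upload init.yaml as init.yaml || return 1
  run drpcli file upload controlplane.yaml as controlplane.yaml || return 1
  # Assumption: the generated join.yaml is served as worker.yaml.
  run drpcli file upload join.yaml as worker.yaml
}
```

Running `upload_configs` with `DRY_RUN=1` set prints the three commands without contacting Digital Rebar.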

Download the boot files

Download a recent version of boot.tar.gz from GitHub.

Upload to Digital Rebar:

$ drpcli isos upload boot.tar.gz as talos.tar.gz
{
  "Path": "talos.tar.gz",
  "Size": 96470072
}

We have some Digital Rebar example files in the Git repo you can use to provision Digital Rebar with drpcli.

To apply these configs you need to create them, and then apply them as follows:

$ drpcli bootenvs create talos
{
  "Available": true,
  "BootParams": "",
  "Bundle": "",
  "Description": "",
  "Documentation": "",
  "Endpoint": "",
  "Errors": [],
  "Initrds": [],
  "Kernel": "",
  "Meta": {},
  "Name": "talos",
  "OS": {
    "Codename": "",
    "Family": "",
    "IsoFile": "",
    "IsoSha256": "",
    "IsoUrl": "",
    "Name": "",
    "SupportedArchitectures": {},
    "Version": ""
  },
  "OnlyUnknown": false,
  "OptionalParams": [],
  "ReadOnly": false,
  "RequiredParams": [],
  "Templates": [],
  "Validated": true
}

drpcli bootenvs update talos - < bootenv.yaml

You need to do this for all files in the example directory. If you don’t have access to the drpcli tools, you can also use the web interface.

It’s important to have a corresponding SHA256 hash matching the boot.tar.gz file.
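One way to obtain that hash is to compute it locally from the file you uploaded. The sketch below uses sha256sum where it is installed and falls back to shasum otherwise; sha256_of is a hypothetical helper name. Which bootenv field should carry the value is not stated here; the IsoSha256 field in the output above looks like the natural place, but treat that as an assumption.

```shell
# sha256_of FILE
# Print the SHA256 hex digest of FILE using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}
```

Running `sha256_of boot.tar.gz` prints the digest to compare against the one in your boot environment definition.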

Bootenv BootParams

We’re using some of Digital Rebar’s built-in templating to make sure the machine gets the correct role assigned.

talos.platform=metal talos.config={{ .ProvisionerURL }}/files/{{.Param \"talos/role\"}}.yaml"

This is why we also include a params.yaml in the example directory to make sure the role is set to one of the following:

controlplane
init
worker

The {{.Param \"talos/role\"}} then gets populated with one of the above roles.

Boot the Machines

In the UI of Digital Rebar you need to select the machines you want to provision. Once selected, you need to assign the following:

Profile
Workflow

This will provision the Stage and Bootenv with the talos values. Once this is done, you can boot the machine.

To understand the boot process, we have a higher level overview located at metal overview.

Retrieve the `kubeconfig`

Once everything is running we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

2.2 - Equinix Metal

Creating a Talos cluster using Equinix Metal.

Prerequisites

This guide assumes the user has a working API token, the Equinix Metal CLI installed, and some familiarity with the CLI.

Network Booting

To install Talos to a server, a working TFTP and iPXE server are needed. How this is done varies and is left as an exercise for the user. In general this requires a Talos kernel vmlinuz and initramfs. These assets can be downloaded from a given release.

Special Considerations

PXE Boot Kernel Parameters

The following is a list of kernel parameters required by Talos:

talos.platform: set this to packet
init_on_alloc=1: required by KSPP
slab_nomerge: required by KSPP
pti=on: required by KSPP

User Data

To configure Talos, you can use the metadata service provided by Equinix Metal. It is required to add a shebang to the top of the configuration file. The shebang is arbitrary in the case of Talos, and the convention we use is #!talos.

Creating a Cluster via the Equinix Metal CLI

Control Plane Endpoint

The strategy used for an HA cluster varies and is left as an exercise for the user. Some of the known ways are:

DNS
Load Balancer
BGP

Create the Machine Configuration Files

Generating Base Configurations

Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:

$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

Now add the required shebang (e.g. #!talos) at the top of init.yaml, controlplane.yaml, and join.yaml. At this point, you can modify the generated configs to your liking.
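Adding the shebang by hand works, but a small helper avoids adding it twice when the files are regenerated or edited repeatedly. This is a minimal sketch; add_shebang is a hypothetical name, and it rewrites the file in place.

```shell
# add_shebang FILE
# Prepend "#!talos" to FILE unless it already starts with a shebang.
add_shebang() {
  file=$1
  if [ "$(head -c 2 "$file")" = '#!' ]; then
    return 0   # already has a shebang, leave the file untouched
  fi
  tmp=$(mktemp) || return 1
  # Build the new content in a temp file, then write it back in place
  # so that the original file keeps its permissions.
  { printf '#!talos\n'; cat "$file"; } > "$tmp" &&
    cat "$tmp" > "$file" &&
    rm -f "$tmp"
}
```

It could be applied to all three files with `for f in init.yaml controlplane.yaml join.yaml; do add_shebang "$f"; done`.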

Validate the Configuration Files

talosctl validate --config init.yaml --mode metal
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config join.yaml --mode metal

Note: Validation of the install disk could potentially fail as the validation is performed on your local machine and the specified disk may not exist.

Create the Bootstrap Node

packet device create \
  --project-id $PROJECT_ID \
  --facility $FACILITY \
  --ipxe-script-url $PXE_SERVER \
  --operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
  --userdata-file init.yaml

Create the Remaining Control Plane Nodes

packet device create \
  --project-id $PROJECT_ID \
  --facility $FACILITY \
  --ipxe-script-url $PXE_SERVER \
  --operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
  --userdata-file controlplane.yaml

Note: The above should be invoked at least twice in order for etcd to form quorum.
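Since the command has to be run more than once, a loop with a distinct hostname per device may be convenient. The sketch below reuses the flags from the command above. The talos-cp-N hostname pattern and the run and create_control_planes helpers are hypothetical. PROJECT_ID, FACILITY, PXE_SERVER, and PLAN are expected to be set as before.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_control_planes COUNT
# Create COUNT control plane devices named talos-cp-1 .. talos-cp-COUNT.
create_control_planes() {
  i=1
  while [ "$i" -le "$1" ]; do
    run packet device create \
      --project-id "$PROJECT_ID" \
      --facility "$FACILITY" \
      --ipxe-script-url "$PXE_SERVER" \
      --operating-system "custom_ipxe" \
      --plan "$PLAN" \
      --hostname "talos-cp-$i" \
      --userdata-file controlplane.yaml || return 1
    i=$((i + 1))
  done
}
```

Calling `create_control_planes 2` issues the command twice, which matches the note above.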

Create the Worker Nodes

packet device create \
  --project-id $PROJECT_ID \
  --facility $FACILITY \
  --ipxe-script-url $PXE_SERVER \
  --operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
  --userdata-file join.yaml

Retrieve the `kubeconfig`

At this point we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

2.3 - Matchbox

In this guide we will create an HA Kubernetes cluster with 3 worker nodes using an existing load balancer and matchbox deployment.

Creating a Cluster

In this guide we will create an HA Kubernetes cluster with 3 worker nodes. We assume an existing load balancer, matchbox deployment, and some familiarity with iPXE.

We leave it up to the user to decide if they would like to use static networking, or DHCP. The setup and configuration of DHCP will not be covered.

Create the Machine Configuration Files

Generating Base Configurations

Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:

$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

At this point, you can modify the generated configs to your liking.

Validate the Configuration Files

$ talosctl validate --config init.yaml --mode metal
init.yaml is valid for metal mode
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config join.yaml --mode metal
join.yaml is valid for metal mode

Publishing the Machine Configuration Files

In bare-metal setups it is up to the user to provide the configuration files over HTTP(S). A special kernel parameter (talos.config) must be used to inform Talos about where it should retrieve its configuration file. To keep things simple we will place init.yaml, controlplane.yaml, and join.yaml into Matchbox’s assets directory. This directory is automatically served by Matchbox.

Create the Matchbox Configuration Files

The profiles we will create will reference vmlinuz, and initramfs.xz. Download these files from the release of your choice, and place them in /var/lib/matchbox/assets.

Profiles

The Bootstrap Node

{
  "id": "init",
  "name": "init",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://matchbox.talos.dev/assets/init.yaml"
    ]
  }
}

Note: Be sure to change http://matchbox.talos.dev to the endpoint of your matchbox server.

Additional Control Plane Nodes

{
  "id": "control-plane",
  "name": "control-plane",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://matchbox.talos.dev/assets/controlplane.yaml"
    ]
  }
}

Worker Nodes

{
  "id": "default",
  "name": "default",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://matchbox.talos.dev/assets/join.yaml"
    ]
  }
}
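The three profiles differ only in their id and in the talos.config URL, so they can be generated from one template. The following is a minimal sketch; print_profile is a hypothetical helper, and where you save its output depends on how your Matchbox instance is configured.

```shell
# print_profile ID CONFIG_FILE MATCHBOX_URL
# Prints a Matchbox profile with the same kernel arguments as the
# profiles in this guide, varying only the id and the talos.config URL.
print_profile() {
  cat <<EOF
{
  "id": "$1",
  "name": "$1",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=$3/assets/$2"
    ]
  }
}
EOF
}
```

For example, print_profile control-plane controlplane.yaml http://matchbox.talos.dev reproduces the control plane profile; substitute the endpoint of your own Matchbox server.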

Groups

Now, create the following groups, and ensure that the selectors are accurate for your specific setup.

{
  "id": "control-plane-1",
  "name": "control-plane-1",
  "profile": "init",
  "selector": {
    ...
  }
}

{
  "id": "control-plane-2",
  "name": "control-plane-2",
  "profile": "control-plane",
  "selector": {
    ...
  }
}

{
  "id": "control-plane-3",
  "name": "control-plane-3",
  "profile": "control-plane",
  "selector": {
    ...
  }
}

{
  "id": "default",
  "name": "default",
  "profile": "default"
}
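The selectors are specific to each environment, so they are left elided here. Purely as an illustration, the sketch below assumes that machines are matched by MAC address through a mac selector label; print_group is a hypothetical helper and the MAC address in the usage note is made up.

```shell
# print_group ID PROFILE MAC
# Prints a Matchbox group that selects one machine by MAC address.
# Matching on "mac" is an assumption; use the labels that fit your setup.
print_group() {
  cat <<EOF
{
  "id": "$1",
  "name": "$1",
  "profile": "$2",
  "selector": {
    "mac": "$3"
  }
}
EOF
}
```

A call such as print_group control-plane-1 init 52:54:00:00:00:01 would produce the first group with a concrete selector filled in.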

Boot the Machines

Now that we have our configuration files in place, boot all the machines. Talos will come up on each machine, grab its configuration file, and bootstrap itself.

Retrieve the `kubeconfig`

At this point we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

2.4 - Sidero

Sidero is a project created by the Talos team that has native support for Talos.

The best way to get started with Sidero is to visit the website.

3 - Virtualized Platforms

3.1 - Hyper-V

Talos is known to work on Hyper-V; however, it is currently undocumented.

3.2 - KVM

Talos is known to work on KVM; however, it is currently undocumented.

3.3 - Proxmox

Creating a Talos Kubernetes cluster using Proxmox.

In this guide we will create a Kubernetes cluster using Proxmox.

Video Walkthrough

To see a live demo of this writeup, visit Youtube here:

Installation

How to Get Proxmox

It is assumed that you have already installed Proxmox onto the server you wish to create Talos VMs on. Visit the Proxmox downloads page if necessary.

Install talosctl

You can download talosctl via github.com/talos-systems/talos/releases

curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl

For example, to download the latest release for Linux on amd64:

curl https://github.com/talos-systems/talos/releases/latest/download/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl
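To avoid filling in the URL template by hand, the pieces can be assembled in the shell. This is only a sketch: talosctl_url and map_arch are hypothetical helpers, and the mapping from uname -m output to release architecture names covers just the two common cases.

```shell
# talosctl_url VERSION PLATFORM ARCH
# Fills in the download URL template shown above.
talosctl_url() {
  echo "https://github.com/siderolabs/talos/releases/download/$1/talosctl-$2-$3"
}

# map_arch MACHINE
# Translates `uname -m` output into the architecture name used in
# release asset names. Only amd64 and arm64 are handled in this sketch.
map_arch() {
  case $1 in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Example (not run here):
#   platform=$(uname -s | tr '[:upper:]' '[:lower:]')
#   arch=$(map_arch "$(uname -m)")
#   curl -L "$(talosctl_url v0.9.0 "$platform" "$arch")" -o talosctl
```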

Download ISO Image

In order to install Talos in Proxmox, you will need the ISO image from the Talos release page. You can download talos-amd64.iso via github.com/talos-systems/talos/releases

mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/talos-<arch>.iso -L -o _out/talos-<arch>.iso

For example, to download the latest release for amd64:

mkdir -p _out/
curl https://github.com/talos-systems/talos/releases/latest/download/talos-amd64.iso -L -o _out/talos-amd64.iso

Upload ISO

From the Proxmox UI, select the “local” storage and enter the “Content” section. Click the “Upload” button:

Select the ISO you downloaded previously, then hit “Upload”.

Create VMs

Start by creating a new VM by clicking the “Create VM” button in the Proxmox UI:

Fill out a name for the new VM:

In the OS tab, select the ISO we uploaded earlier:

Keep the defaults set in the “System” tab.

Keep the defaults in the “Hard Disk” tab as well, only changing the size if desired.

In the “CPU” section, give at least 2 cores to the VM:

Verify that the RAM is set to at least 2GB:

Keep the default values for networking, verifying that the VM is set to come up on the bridge interface:

Finish creating the VM by clicking through the “Confirm” tab and then “Finish”.

Repeat this process for a second VM to use as a worker node. You can also repeat it for any additional nodes you want.

Start Control Plane Node

Once the VMs have been created and updated, start the VM that will be the first control plane node. This VM will boot the ISO image specified earlier and enter “maintenance mode”. Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received. Take note of this IP address, which will be referred to as $CONTROL_PLANE_IP for the rest of this guide. If you wish to export this IP as a bash variable, simply issue a command like export CONTROL_PLANE_IP=1.2.3.4.

Generate Machine Configurations

With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes. Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:

talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out

This will create several files in the _out directory: init.yaml, controlplane.yaml, join.yaml, and talosconfig.

Create Control Plane Node

Using the init.yaml generated above, you can now apply this config using talosctl. Issue:

talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file _out/init.yaml

You should now see some action in the Proxmox console for this VM. Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.

Note: This process can be repeated multiple times to create an HA control plane. Simply apply controlplane.yaml instead of init.yaml for subsequent nodes.

Create Worker Node

Create at least a single worker node using a process similar to the control plane creation above. Start the worker node VM and wait for it to enter “maintenance mode”. Take note of the worker node’s IP address, which will be referred to as $WORKER_IP.

Issue:

talosctl apply-config --insecure --nodes $WORKER_IP --file _out/join.yaml

Note: This process can be repeated multiple times to add additional workers.
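When several workers are waiting in maintenance mode, the same command can be applied in a loop. A minimal sketch, assuming the configuration was generated into _out as above; apply_worker_configs is a hypothetical helper name.

```shell
# apply_worker_configs IP...
# Applies _out/join.yaml to each worker IP given on the command line,
# stopping at the first failure.
apply_worker_configs() {
  for ip in "$@"; do
    talosctl apply-config --insecure --nodes "$ip" --file _out/join.yaml || return 1
  done
}
```

For example, apply_worker_configs "$WORKER_IP" 1.2.3.5 would configure two workers, where the second address stands in for another worker's IP.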

Using the Cluster

Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster. For example, to view current running containers, run talosctl containers for a list of containers in the system namespace, or talosctl containers -k for the k8s.io namespace. To view the logs of a container, use talosctl logs <container> or talosctl logs -k <container>.

First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:

export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP

Retrieve and Configure the `kubeconfig`

Fetch the kubeconfig file from the control plane node by issuing:

talosctl kubeconfig

You can then use kubectl in this fashion:

kubectl get nodes

Cleaning Up

To clean up, simply stop and delete the virtual machines from the Proxmox UI.

3.4 - VMware

Creating a Talos Kubernetes cluster using VMware.

Creating a Cluster via the `govc` CLI

In this guide we will create an HA Kubernetes cluster with 2 worker nodes. We will use the govc CLI, which can be downloaded here.

Prerequisites

Prior to starting, it is important to have the following infrastructure in place and available:

DHCP server
Load Balancer or DNS address for cluster endpoint
- If using a load balancer, the most common setup is to balance tcp/443 across the control plane nodes tcp/6443
- If using a DNS address, the A record should return back the addresses of the control plane nodes

Create the Machine Configuration Files

Generating Base Configurations

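Using the DNS name or the address of the load balancer from the prerequisites, generate the base configuration files for the Talos machines. The first command below is the form for a load balancer endpoint, and the second is the form for a DNS address: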

$ talosctl gen config talos-k8s-vmware-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

$ talosctl gen config talos-k8s-vmware-tutorial https://<DNS name>:6443
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

At this point, you can modify the generated configs to your liking.

Validate the Configuration Files

$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode

Set Environment Variables

govc makes use of the following environment variables:

export GOVC_URL=<vCenter url>
export GOVC_USERNAME=<vCenter username>
export GOVC_PASSWORD=<vCenter password>

Note: If your vCenter installation makes use of self-signed certificates, you’ll want to export GOVC_INSECURE=true.

There are some additional variables that you may need to set:

export GOVC_DATACENTER=<vCenter datacenter>
export GOVC_RESOURCE_POOL=<vCenter resource pool>
export GOVC_DATASTORE=<vCenter datastore>
export GOVC_NETWORK=<vCenter network>

Download the OVA

A talos.ova asset is published with each release. We will refer to the version of the release as $TALOS_VERSION below. It can be easily exported with export TALOS_VERSION="v0.3.0-alpha.10" or similar.

curl -LO https://github.com/siderolabs/talos/releases/download/$TALOS_VERSION/talos.ova

Import the OVA into vCenter

We’ll need to repeat this step for each Talos node we want to create. In a typical HA setup, we’ll have 3 control plane nodes and N workers. In the following example, we’ll set up an HA control plane with two worker nodes.

govc import.ova -name talos-$TALOS_VERSION /path/to/downloaded/talos.ova

Create the Bootstrap Node

We’ll clone the OVA to create the bootstrap node (our first control plane node).

govc vm.clone -on=false -vm talos-$TALOS_VERSION control-plane-1

Talos makes use of the guestinfo facility of VMware to provide the machine/cluster configuration. This can be set using the govc vm.change command. To facilitate persistent storage using the vSphere cloud provider integration with Kubernetes, disk.enableUUID=1 is used.

govc vm.change \
  -e "guestinfo.talos.config=$(base64 init.yaml)" \
  -e "disk.enableUUID=1" \
  -vm /ha-datacenter/vm/control-plane-1
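Some base64 implementations wrap their output across several lines by default. This guide does not say whether wrapped input is a problem for guestinfo, so the sketch below simply sidesteps the question by stripping newlines from the encoded value; encode_config is a hypothetical helper.

```shell
# encode_config FILE
# Prints FILE base64-encoded on a single line. Any newlines inserted by
# the base64 tool are removed so the value fits in one argument.
encode_config() {
  base64 < "$1" | tr -d '\n'
}
```

It can be dropped into the command above as -e "guestinfo.talos.config=$(encode_config init.yaml)".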

Update Hardware Resources for the Bootstrap Node

-c is used to configure the number of CPUs
-m is used to configure the amount of memory (in MB)

govc vm.change \
  -c 2 \
  -m 4096 \
  -vm /ha-datacenter/vm/control-plane-1

The following can be used to adjust the ephemeral disk size.

govc vm.disk.change -vm control-plane-1 -disk.name disk-1000-0 -size 10G

govc vm.power -on control-plane-1

Create the Remaining Control Plane Nodes

govc vm.clone -on=false -vm talos-$TALOS_VERSION control-plane-2
govc vm.change \
  -e "guestinfo.talos.config=$(base64 controlplane.yaml)" \
  -e "disk.enableUUID=1" \
  -vm /ha-datacenter/vm/control-plane-2
govc vm.clone -on=false -vm talos-$TALOS_VERSION control-plane-3
govc vm.change \
  -e "guestinfo.talos.config=$(base64 controlplane.yaml)" \
  -e "disk.enableUUID=1" \
  -vm /ha-datacenter/vm/control-plane-3

govc vm.change \
  -c 2 \
  -m 4096 \
  -vm /ha-datacenter/vm/control-plane-2
govc vm.change \
  -c 2 \
  -m 4096 \
  -vm /ha-datacenter/vm/control-plane-3

govc vm.disk.change -vm control-plane-2 -disk.name disk-1000-0 -size 10G
govc vm.disk.change -vm control-plane-3 -disk.name disk-1000-0 -size 10G

govc vm.power -on control-plane-2
govc vm.power -on control-plane-3

Update Settings for the Worker Nodes

govc vm.clone -on=false -vm talos-$TALOS_VERSION worker-1
govc vm.change \
  -e "guestinfo.talos.config=$(base64 join.yaml)" \
  -e "disk.enableUUID=1" \
  -vm /ha-datacenter/vm/worker-1
govc vm.clone -on=false -vm talos-$TALOS_VERSION worker-2
govc vm.change \
  -e "guestinfo.talos.config=$(base64 join.yaml)" \
  -e "disk.enableUUID=1" \
  -vm /ha-datacenter/vm/worker-2

govc vm.change \
  -c 4 \
  -m 8192 \
  -vm /ha-datacenter/vm/worker-1
govc vm.change \
  -c 4 \
  -m 8192 \
  -vm /ha-datacenter/vm/worker-2

govc vm.disk.change -vm worker-1 -disk.name disk-1000-0 -size 50G
govc vm.disk.change -vm worker-2 -disk.name disk-1000-0 -size 50G

govc vm.power -on worker-1
govc vm.power -on worker-2
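The clone, configure, resize, and power-on sequence is the same for every node, so it can be wrapped in one function. This is a sketch under the assumptions of this guide: TALOS_VERSION is exported, the OVA was imported as talos-$TALOS_VERSION, and VMs live under /ha-datacenter/vm. The name create_talos_vm is hypothetical.

```shell
# create_talos_vm NAME CONFIG_FILE CPUS MEMORY_MB DISK_SIZE
# Runs the per-node govc steps from this guide for a single VM.
create_talos_vm() {
  name=$1 config=$2 cpus=$3 memory=$4 disk=$5
  # Clone the imported OVA without powering it on.
  govc vm.clone -on=false -vm "talos-$TALOS_VERSION" "$name" || return 1
  # Pass the machine configuration through guestinfo.
  govc vm.change \
    -e "guestinfo.talos.config=$(base64 "$config")" \
    -e "disk.enableUUID=1" \
    -vm "/ha-datacenter/vm/$name" || return 1
  # Set CPU count and memory (in MB).
  govc vm.change \
    -c "$cpus" \
    -m "$memory" \
    -vm "/ha-datacenter/vm/$name" || return 1
  # Resize the ephemeral disk, then boot the VM.
  govc vm.disk.change -vm "$name" -disk.name disk-1000-0 -size "$disk" || return 1
  govc vm.power -on "$name"
}
```

With this in place, the two workers above correspond to create_talos_vm worker-1 join.yaml 4 8192 50G and create_talos_vm worker-2 join.yaml 4 8192 50G.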

Retrieve the `kubeconfig`

At this point we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

3.5 - Xen

Talos is known to work on Xen; however, it is currently undocumented.

4 - Cloud Platforms

4.1 - AWS

Creating a cluster via the AWS CLI.

Creating a Cluster via the AWS CLI

In this guide we will create an HA Kubernetes cluster with 3 worker nodes. We assume an existing VPC, and some familiarity with AWS. If you need more information on AWS specifics, please see the official AWS documentation.

Create the Subnet

aws ec2 create-subnet \
    --region $REGION \
    --vpc-id $VPC \
    --cidr-block ${CIDR_BLOCK}

Create the AMI

Prepare the Import Prerequisites

Create the S3 Bucket

aws s3api create-bucket \
    --bucket $BUCKET \
    --create-bucket-configuration LocationConstraint=$REGION \
    --acl private

Create the `vmimport` Role

In order to create an AMI, ensure that the vmimport role exists as described in the official AWS documentation.

Note that the role should be associated with the S3 bucket we created above.

Create the Image Snapshot

First, download the AWS image from a Talos release:

curl -L https://github.com/talos-systems/talos/releases/latest/download/aws-amd64.tar.gz | tar -xzv

Copy the RAW disk to S3 and import it as a snapshot:

aws s3 cp disk.raw s3://$BUCKET/talos-aws-tutorial.raw
aws ec2 import-snapshot \
    --region $REGION \
    --description "Talos kubernetes tutorial" \
    --disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}"

Save the SnapshotId, as we will need it once the import is done. To check on the status of the import, run:

aws ec2 describe-import-snapshot-tasks \
    --region $REGION \
    --import-task-ids <import task ID>

Once the SnapshotTaskDetail.Status indicates completed, we can register the image.
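Rather than re-running the status command by hand, the check can be polled. The sketch below is an assumption-laden helper, not part of the Talos tooling: wait_for_snapshot_import is a hypothetical name, it expects REGION to be set, and it treats the statuses deleting and deleted as failures.

```shell
# wait_for_snapshot_import TASK_ID [INTERVAL_SECONDS]
# Polls the import task until its status is "completed", then prints
# the resulting snapshot ID.
wait_for_snapshot_import() {
  task_id=$1
  interval=${2:-15}
  while :; do
    status=$(aws ec2 describe-import-snapshot-tasks \
      --region "$REGION" \
      --import-task-ids "$task_id" \
      --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.Status' \
      --output text) || return 1
    case $status in
      completed) break ;;
      # Treated as terminal failures in this sketch (an assumption).
      deleting|deleted) echo "import ended with status: $status" >&2; return 1 ;;
      *) sleep "$interval" ;;
    esac
  done
  aws ec2 describe-import-snapshot-tasks \
    --region "$REGION" \
    --import-task-ids "$task_id" \
    --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId' \
    --output text
}
```

The printed ID can be captured directly, for example SNAPSHOT=$(wait_for_snapshot_import <import task ID>), ready for the register step.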

Register the Image

aws ec2 register-image \
    --region $REGION \
    --block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT,VolumeSize=4,VolumeType=gp2}" \
    --root-device-name /dev/xvda \
    --virtualization-type hvm \
    --architecture x86_64 \
    --ena-support \
    --name talos-aws-tutorial-ami

We now have an AMI we can use to create our cluster. Save the AMI ID, as we will need it when we create EC2 instances.

Create a Security Group

aws ec2 create-security-group \
    --region $REGION \
    --group-name talos-aws-tutorial-sg \
    --description "Security Group for EC2 instances to allow ports required by Talos"

Using the security group ID from above, allow all internal traffic within the same security group:

aws ec2 authorize-security-group-ingress \
    --region $REGION \
    --group-name talos-aws-tutorial-sg \
    --protocol all \
    --port 0 \
    --source-group $SECURITY_GROUP

and expose the Talos and Kubernetes APIs:

aws ec2 authorize-security-group-ingress \
    --region $REGION \
    --group-name talos-aws-tutorial-sg \
    --protocol tcp \
    --port 6443 \
    --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
    --region $REGION \
    --group-name talos-aws-tutorial-sg \
    --protocol tcp \
    --port 50000-50001 \
    --cidr 0.0.0.0/0

Create a Load Balancer

aws elbv2 create-load-balancer \
    --region $REGION \
    --name talos-aws-tutorial-lb \
    --type network --subnets $SUBNET

Take note of the DNS name and ARN. We will need these soon.

Create the Machine Configuration Files

Generating Base Configurations

Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:

$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

Note that in this version of Talos, the generated configs are too long for the AWS user data field. As a workaround, comments can be removed with a sed command like:

cat init.yaml | sed 's/ #.*//' > temp.yaml; mv temp.yaml init.yaml

cat controlplane.yaml | sed 's/ #.*//' > temp.yaml; mv temp.yaml controlplane.yaml
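The comment stripping can be combined with a size check against the 16 KB EC2 user data limit. This is a minimal sketch with hypothetical helper names; note that the expression removes everything from the first " #" on each line, so it would also truncate a value that legitimately contains that sequence.

```shell
# strip_comments FILE
# Removes everything from the first " #" to the end of each line,
# rewriting FILE in place.
strip_comments() {
  sed 's/ #.*//' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# fits_user_data FILE
# Succeeds if FILE is no larger than 16384 bytes (16 KB).
fits_user_data() {
  size=$(wc -c < "$1")
  [ "$size" -le 16384 ]
}
```

A typical use would be strip_comments init.yaml && fits_user_data init.yaml || echo "init.yaml is still too large".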

At this point, you can modify the generated configs to your liking.

Validate the Configuration Files

$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode

Create the EC2 Instances

Note: There is a known issue that prevents Talos from running on T2 instance types. Please use T3 if you need burstable instance types.

Create the Bootstrap Node

aws ec2 run-instances \
    --region $REGION \
    --image-id $AMI \
    --count 1 \
    --instance-type t3.small \
    --user-data file://init.yaml \
    --subnet-id $SUBNET \
    --security-group-ids $SECURITY_GROUP \
    --associate-public-ip-address \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-0}]"

Create the Remaining Control Plane Nodes

CP_COUNT=1
while [[ "$CP_COUNT" -lt 3 ]]; do
  aws ec2 run-instances \
    --region $REGION \
    --image-id $AMI \
    --count 1 \
    --instance-type t3.small \
    --user-data file://controlplane.yaml \
    --subnet-id $SUBNET \
    --security-group-ids $SECURITY_GROUP \
    --associate-public-ip-address \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$CP_COUNT}]"
  ((CP_COUNT++))
done

Make a note of the resulting PrivateIpAddress from the init and controlplane nodes for later use.

Create the Worker Nodes

aws ec2 run-instances \
    --region $REGION \
    --image-id $AMI \
    --count 3 \
    --instance-type t3.small \
    --user-data file://join.yaml \
    --subnet-id $SUBNET \
    --security-group-ids $SECURITY_GROUP \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-worker}]"

Configure the Load Balancer

aws elbv2 create-target-group \
    --region $REGION \
    --name talos-aws-tutorial-tg \
    --protocol TCP \
    --port 6443 \
    --target-type ip \
    --vpc-id $VPC

Now, using the target group’s ARN and the PrivateIpAddress of each control plane instance that you created, register the targets:

aws elbv2 register-targets \
    --region $REGION \
    --target-group-arn $TARGET_GROUP_ARN \
    --targets Id=$CP_NODE_1_IP  Id=$CP_NODE_2_IP  Id=$CP_NODE_3_IP
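If you would rather not copy the private IPs by hand, they can be looked up by the Name tags assigned when the instances were created. The following is a sketch, not a tested procedure: both function names are hypothetical, REGION must be set, and the tag filter assumes the talos-aws-tutorial-cp-N names used above.

```shell
# control_plane_ips
# Prints the private IPs of the control plane instances, found by tag.
control_plane_ips() {
  aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=tag:Name,Values=talos-aws-tutorial-cp-*" \
              "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}

# register_control_plane TARGET_GROUP_ARN
# Registers every control plane IP with the given target group.
register_control_plane() {
  targets=""
  for ip in $(control_plane_ips); do
    targets="$targets Id=$ip"
  done
  [ -n "$targets" ] || { echo "no control plane instances found" >&2; return 1; }
  # $targets is unquoted on purpose so each Id=... is a separate argument.
  aws elbv2 register-targets \
    --region "$REGION" \
    --target-group-arn "$1" \
    --targets $targets
}
```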

Using the ARNs of the load balancer and target group from previous steps, create the listener:

aws elbv2 create-listener \
    --region $REGION \
    --load-balancer-arn $LOAD_BALANCER_ARN \
    --protocol TCP \
    --port 443 \
    --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN

Retrieve the `kubeconfig`

At this point we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

4.2 - Azure

Creating a cluster via the CLI on Azure.

Creating a Cluster via the CLI

In this guide we will create an HA Kubernetes cluster with 1 worker node. We assume existing Blob Storage, and some familiarity with Azure. If you need more information on Azure specifics, please see the official Azure documentation.

Environment Setup

We’ll make use of the following environment variables throughout the setup. Edit the variables below with your correct information.

# Storage account to use
export STORAGE_ACCOUNT="StorageAccountName"

# Storage container to upload to
export STORAGE_CONTAINER="StorageContainerName"

# Resource group name
export GROUP="ResourceGroupName"

# Location
export LOCATION="centralus"

# Get storage account connection string based on info above
export CONNECTION=$(az storage account show-connection-string \
                    -n $STORAGE_ACCOUNT \
                    -g $GROUP \
                    -o tsv)
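Because CONNECTION is filled in by a command substitution, a failed az call can leave it empty without any visible error. A small guard like the one below, with the hypothetical name require_vars, can catch that before the later steps run.

```shell
# require_vars NAME...
# Fails with a message naming the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "environment variable $name is not set" >&2
      return 1
    fi
  done
}
```

For this guide the call would be require_vars STORAGE_ACCOUNT STORAGE_CONTAINER GROUP LOCATION CONNECTION.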

Create the Image

First, download the Azure image from a Talos release. Once downloaded, untar with tar -xvf /path/to/azure-amd64.tar.gz.

Upload the VHD

Once you have pulled down the image, you can upload it to blob storage with:

az storage blob upload \
  --connection-string $CONNECTION \
  --container-name $STORAGE_CONTAINER \
  -f /path/to/extracted/talos-azure.vhd \
  -n talos-azure.vhd

Register the Image

Now that the image is present in our blob storage, we’ll register it.

az image create \
  --name talos \
  --source https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER/talos-azure.vhd \
  --os-type linux \
  -g $GROUP

Network Infrastructure

Virtual Networks and Security Groups

Once the image is prepared, we’ll want to work through setting up the network. Issue the following to create a virtual network and a network security group, and to add rules to the security group.

# Create vnet
az network vnet create \
  --resource-group $GROUP \
  --location $LOCATION \
  --name talos-vnet \
  --subnet-name talos-subnet

# Create network security group
az network nsg create -g $GROUP -n talos-sg

# Client -> apid
az network nsg rule create \
  -g $GROUP \
  --nsg-name talos-sg \
  -n apid \
  --priority 1001 \
  --destination-port-ranges 50000 \
  --direction inbound

# Trustd
az network nsg rule create \
  -g $GROUP \
  --nsg-name talos-sg \
  -n trustd \
  --priority 1002 \
  --destination-port-ranges 50001 \
  --direction inbound

# etcd
az network nsg rule create \
  -g $GROUP \
  --nsg-name talos-sg \
  -n etcd \
  --priority 1003 \
  --destination-port-ranges 2379-2380 \
  --direction inbound

# Kubernetes API Server
az network nsg rule create \
  -g $GROUP \
  --nsg-name talos-sg \
  -n kube \
  --priority 1004 \
  --destination-port-ranges 6443 \
  --direction inbound

Load Balancer

We will create a public IP, a load balancer, and a health check that we will use for our control plane.

# Create public ip
az network public-ip create \
  --resource-group $GROUP \
  --name talos-public-ip \
  --allocation-method static

# Create lb
az network lb create \
  --resource-group $GROUP \
  --name talos-lb \
  --public-ip-address talos-public-ip \
  --frontend-ip-name talos-fe \
  --backend-pool-name talos-be-pool

# Create health check
az network lb probe create \
  --resource-group $GROUP \
  --lb-name talos-lb \
  --name talos-lb-health \
  --protocol tcp \
  --port 6443

# Create lb rule for 6443
az network lb rule create \
  --resource-group $GROUP \
  --lb-name talos-lb \
  --name talos-6443 \
  --protocol tcp \
  --frontend-ip-name talos-fe \
  --frontend-port 6443 \
  --backend-pool-name talos-be-pool \
  --backend-port 6443 \
  --probe-name talos-lb-health

Network Interfaces

In Azure, we have to pre-create the NICs for our control plane so that they can be associated with our load balancer.

for i in $( seq 0 1 2 ); do
  # Create public IP for each nic
  az network public-ip create \
    --resource-group $GROUP \
    --name talos-controlplane-public-ip-$i \
    --allocation-method static


  # Create nic
  az network nic create \
    --resource-group $GROUP \
    --name talos-controlplane-nic-$i \
    --vnet-name talos-vnet \
    --subnet talos-subnet \
    --network-security-group talos-sg \
    --public-ip-address talos-controlplane-public-ip-$i \
    --lb-name talos-lb \
    --lb-address-pools talos-be-pool
done

Cluster Configuration

With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.

LB_PUBLIC_IP=$(az network public-ip show \
              --resource-group $GROUP \
              --name talos-public-ip \
              --query "[ipAddress]" \
              --output tsv)

talosctl gen config talos-k8s-azure-tutorial https://${LB_PUBLIC_IP}:6443

Compute Creation

We are now ready to create our Azure nodes.

# Create availability set
az vm availability-set create \
  --name talos-controlplane-av-set \
  -g $GROUP

# Create controlplane 0
az vm create \
  --name talos-controlplane-0 \
  --image talos \
  --custom-data ./init.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --os-disk-size-gb 20 \
  --nics talos-controlplane-nic-0 \
  --availability-set talos-controlplane-av-set \
  --no-wait

# Create 2 more controlplane nodes
for i in $( seq 1 2 ); do
  az vm create \
    --name talos-controlplane-$i \
    --image talos \
    --custom-data ./controlplane.yaml \
    -g $GROUP \
    --admin-username talos \
    --generate-ssh-keys \
    --verbose \
    --boot-diagnostics-storage $STORAGE_ACCOUNT \
    --os-disk-size-gb 20 \
    --nics talos-controlplane-nic-$i \
    --availability-set talos-controlplane-av-set \
    --no-wait
done

# Create worker node
az vm create \
  --name talos-worker-0 \
  --image talos \
  --vnet-name talos-vnet \
  --subnet talos-subnet \
  --custom-data ./join.yaml \
  -g $GROUP \
  --admin-username talos \
  --generate-ssh-keys \
  --verbose \
  --boot-diagnostics-storage $STORAGE_ACCOUNT \
  --nsg talos-sg \
  --os-disk-size-gb 20 \
  --no-wait

# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting

Retrieve the `kubeconfig`

You should now be able to interact with your cluster with talosctl. We will need to discover the public IP for our first control plane node first.

CONTROL_PLANE_0_IP=$(az network public-ip show \
                    --resource-group $GROUP \
                    --name talos-controlplane-public-ip-0 \
                    --query "[ipAddress]" \
                    --output tsv)
talosctl --talosconfig ./talosconfig config endpoint $CONTROL_PLANE_0_IP
talosctl --talosconfig ./talosconfig config node $CONTROL_PLANE_0_IP
talosctl --talosconfig ./talosconfig kubeconfig .
kubectl --kubeconfig ./kubeconfig get nodes

4.3 - DigitalOcean

Creating a cluster via the CLI on DigitalOcean.

Creating a Cluster via the CLI

In this guide we will create an HA Kubernetes cluster with 1 worker node. We assume an existing Space, and some familiarity with DigitalOcean. If you need more information on DigitalOcean specifics, please see the official DigitalOcean documentation.

Create the Image

First, download the DigitalOcean image from a Talos release. Extract the archive to get the disk.raw file, then compress it with gzip to produce disk.raw.gz.
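The extract-and-compress step might look like the sketch below. The archive file name is not given in this guide, so the helper takes it as an argument; prepare_image is a hypothetical name.

```shell
# prepare_image ARCHIVE
# Extracts ARCHIVE into the current directory and compresses the
# resulting disk.raw to disk.raw.gz (gzip removes disk.raw afterwards).
prepare_image() {
  tar -xzf "$1" || return 1
  [ -f disk.raw ] || { echo "disk.raw not found in $1" >&2; return 1; }
  gzip -f disk.raw
}
```

Run it as prepare_image /path/to/downloaded/archive.tar.gz in the directory where you want disk.raw.gz to end up.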

Using an upload method of your choice (doctl does not have Spaces support), upload the image to a space. Now, create an image using the URL of the uploaded image:

doctl compute image create \
    --region $REGION \
    --image-description talos-digital-ocean-tutorial \
    --image-url https://talos-tutorial.$REGION.digitaloceanspaces.com/disk.raw.gz \
    Talos

Save the image ID. We will need it when creating droplets.

Create a Load Balancer

doctl compute load-balancer create \
    --region $REGION \
    --name talos-digital-ocean-tutorial-lb \
    --tag-name talos-digital-ocean-tutorial-control-plane \
    --health-check protocol:tcp,port:6443,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:5,unhealthy_threshold:3 \
    --forwarding-rules entry_protocol:tcp,entry_port:443,target_protocol:tcp,target_port:6443

We will need the IP of the load balancer. Using the ID of the load balancer, run:

doctl compute load-balancer get --format IP <load balancer ID>

Save it, as we will need it in the next step.
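The lookup can also be scripted so that the endpoint URL is built directly from the load balancer ID. This is a sketch: lb_endpoint is a hypothetical helper, it relies on doctl's --no-header output flag to suppress the column heading, and it uses port 443 to match the forwarding rule created above.

```shell
# lb_endpoint LOAD_BALANCER_ID
# Prints the cluster endpoint URL for the given load balancer.
lb_endpoint() {
  ip=$(doctl compute load-balancer get --format IP --no-header "$1") || return 1
  # The IP can be empty if the load balancer is not yet provisioned.
  [ -n "$ip" ] || { echo "load balancer has no IP yet" >&2; return 1; }
  echo "https://$ip:443"
}
```

Its output is in the form expected as the endpoint argument of talosctl gen config.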

Create the Machine Configuration Files

Generating Base Configurations

Using the IP address of the load balancer created earlier, generate the base configuration files for the Talos machines:

$ talosctl gen config talos-k8s-digital-ocean-tutorial https://<load balancer IP or DNS>:<port>
created init.yaml
created controlplane.yaml
created join.yaml
created talosconfig

At this point, you can modify the generated configs to your liking.

Validate the Configuration Files

$ talosctl validate --config init.yaml --mode cloud
init.yaml is valid for cloud mode
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config join.yaml --mode cloud
join.yaml is valid for cloud mode

Create the Droplets

Create the Bootstrap Node

doctl compute droplet create \
    --region $REGION \
    --image <image ID> \
    --size s-2vcpu-4gb \
    --enable-private-networking \
    --tag-names talos-digital-ocean-tutorial-control-plane \
    --user-data-file init.yaml \
    --ssh-keys <ssh key fingerprint> \
    talos-control-plane-1

Note: Although SSH is not used by Talos, DigitalOcean still requires that an SSH key be associated with the droplet. Create a dummy key that can be used to satisfy this requirement.

Create the Remaining Control Plane Nodes

Run the following two commands to give ourselves three control plane nodes in total:

doctl compute droplet create \
    --region $REGION \
    --image <image ID> \
    --size s-2vcpu-4gb \
    --enable-private-networking \
    --tag-names talos-digital-ocean-tutorial-control-plane \
    --user-data-file controlplane.yaml \
    --ssh-keys <ssh key fingerprint> \
    talos-control-plane-2
doctl compute droplet create \
    --region $REGION \
    --image <image ID> \
    --size s-2vcpu-4gb \
    --enable-private-networking \
    --tag-names talos-digital-ocean-tutorial-control-plane \
    --user-data-file controlplane.yaml \
    --ssh-keys <ssh key fingerprint> \
    talos-control-plane-3
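Since the two commands differ only in the droplet name, they can be expressed as a loop. A minimal sketch, assuming REGION is set and controlplane.yaml is in the current directory; the function name is hypothetical.

```shell
# create_remaining_control_plane IMAGE_ID SSH_KEY_FINGERPRINT
# Creates talos-control-plane-2 and talos-control-plane-3 with the same
# settings as the commands above.
create_remaining_control_plane() {
  image_id=$1
  ssh_key=$2
  for n in 2 3; do
    doctl compute droplet create \
      --region "$REGION" \
      --image "$image_id" \
      --size s-2vcpu-4gb \
      --enable-private-networking \
      --tag-names talos-digital-ocean-tutorial-control-plane \
      --user-data-file controlplane.yaml \
      --ssh-keys "$ssh_key" \
      "talos-control-plane-$n" || return 1
  done
}
```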

Create the Worker Nodes

Run the following to create a worker node:

doctl compute droplet create \
    --region $REGION \
    --image <image ID> \
    --size s-2vcpu-4gb \
    --enable-private-networking \
    --user-data-file join.yaml \
    --ssh-keys <ssh key fingerprint> \
    talos-worker-1

Retrieve the `kubeconfig`

To configure talosctl we will need the first control plane node’s IP:

doctl compute droplet get --format PublicIPv4 <droplet ID>

At this point we can retrieve the admin kubeconfig by running:

talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
talosctl --talosconfig talosconfig kubeconfig .

4.4 - GCP

Creating a cluster via the CLI on Google Cloud Platform.

Creating a Cluster via the CLI

In this guide, we will create an HA Kubernetes cluster in GCP with 1 worker node. We will assume an existing Cloud Storage bucket, and some familiarity with Google Cloud. If you need more information on Google Cloud specifics, please see the official Google documentation.

Environment Setup

We’ll make use of the following environment variables throughout the setup. Edit the variables below with your correct information.

# Storage bucket to use
export STORAGE_BUCKET="StorageBucketName"
# Region
export REGION="us-central1"

Create the Image

First, download the Google Cloud image from a Talos release. These images are called gcp-$ARCH.tar.gz.

Upload the Image

Once you have downloaded the image, you can upload it to your storage bucket with:

gsutil cp /path/to/gcp-amd64.tar.gz gs://$STORAGE_BUCKET

Register the Image

Now that the image is present in our bucket, we’ll register it.

gcloud compute images create talos \
 --source-uri=gs://$STORAGE_BUCKET/gcp-amd64.tar.gz \
 --guest-os-features=VIRTIO_SCSI_MULTIQUEUE

Network Infrastructure

Load Balancers and Firewalls

Once the image is prepared, we’ll want to work through setting up the network. Issue the following to create a firewall, load balancer, and their required components.

# Create Instance Group
gcloud compute instance-groups unmanaged create talos-ig \
  --zone $REGION-b

# Create port for IG
gcloud compute instance-groups set-named-ports talos-ig \
    --named-ports tcp6443:6443 \
    --zone $REGION-b

# Create health check
gcloud compute health-checks create tcp talos-health-check --port 6443

# Create backend
gcloud compute backend-services create talos-be \
    --global \
    --protocol TCP \
    --health-checks talos-health-check \
    --timeout 5m \
    --port-name tcp6443

# Add instance group to backend
gcloud compute backend-services add-backend talos-be \
    --global \
    --instance-group talos-ig \
    --instance-group-zone $REGION-b

# Create tcp proxy
gcloud compute target-tcp-proxies create talos-tcp-proxy \
    --backend-service talos-be \
    --proxy-header NONE

# Create LB IP
gcloud compute addresses create talos-lb-ip --global

# Forward 443 from LB IP to tcp proxy
gcloud compute forwarding-rules create talos-fwd-rule \
    --global \
    --ports 443 \
    --address talos-lb-ip \
    --target-tcp-proxy talos-tcp-proxy

# Create firewall rule for health checks
gcloud compute firewall-rules create talos-controlplane-firewall \
     --source-ranges 130.211.0.0/22,35.191.0.0/16 \
     --target-tags talos-controlplane \
     --allow tcp:6443

# Create firewall rule to allow talosctl access
gcloud compute firewall-rules create talos-controlplane-talosctl \
  --source-ranges 0.0.0.0/0 \
  --target-tags talos-controlplane \
  --allow tcp:50000

Cluster Configuration

With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.

LB_PUBLIC_IP=$(gcloud compute forwarding-rules describe talos-fwd-rule \
               --global \
               --format json \
               | jq -r .IPAddress)

talosctl gen config talos-k8s-gcp-tutorial https://${LB_PUBLIC_IP}:443
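When the requested field is absent, `jq -r` prints the literal string `null`, so it can help to check the value before generating configuration with it. The following is a minimal sketch of such a check. The helper name `is_ipv4` is hypothetical, and it only validates the shape of the address.

```shell
# Return 0 if the argument looks like a dotted-quad IPv4 address, 1 otherwise.
# This is a shape check only; it does not test that the address is reachable.
is_ipv4() {
  case "${1:-}" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;  # empty, stray characters, or empty octet
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Usage (not executed here):
#   is_ipv4 "$LB_PUBLIC_IP" || { echo "unexpected load balancer IP: $LB_PUBLIC_IP" >&2; exit 1; }
```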

Compute Creation

We are now ready to create our GCP nodes.

# Create control plane 0
gcloud compute instances create talos-controlplane-0 \
  --image talos \
  --zone $REGION-b \
  --tags talos-controlplane \
  --boot-disk-size 20GB \
  --metadata-from-file=user-data=./init.yaml

# Create control plane 1/2
for i in $( seq 1 2 ); do
  gcloud compute instances create talos-controlplane-$i \
    --image talos \
    --zone $REGION-b \
    --tags talos-controlplane \
    --boot-disk-size 20GB \
    --metadata-from-file=user-data=./controlplane.yaml
done

# Add control plane nodes to instance group
for i in $( seq 0 2 ); do
  gcloud compute instance-groups unmanaged add-instances talos-ig \
      --zone $REGION-b \
      --instances talos-controlplane-$i
done

# Create worker
gcloud compute instances create talos-worker-0 \
  --image talos \
  --zone $REGION-b \
  --boot-disk-size 20GB \
  --metadata-from-file=user-data=./join.yaml

Retrieve the `kubeconfig`

You should now be able to interact with your cluster with talosctl. We will need to discover the public IP for our first control plane node first.

CONTROL_PLANE_0_IP=$(gcloud compute instances describe talos-controlplane-0 \
                     --zone $REGION-b \
                     --format json \
                     | jq -r '.networkInterfaces[0].accessConfigs[0].natIP')

talosctl --talosconfig ./talosconfig config endpoint $CONTROL_PLANE_0_IP
talosctl --talosconfig ./talosconfig config node $CONTROL_PLANE_0_IP
talosctl --talosconfig ./talosconfig kubeconfig .
kubectl --kubeconfig ./kubeconfig get nodes
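The API may not answer immediately after the instances are created, so the final command can fail on the first try. A small retry helper is one way to handle this. The function below is a generic sketch, and the attempt count and delay in the usage line are arbitrary assumptions.

```shell
# Retry a command until it succeeds or the attempt budget is exhausted.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Note: the helper uses the global variables attempts, delay and n.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0            # command succeeded
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1            # out of attempts
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (not executed here):
#   retry 30 10 kubectl --kubeconfig ./kubeconfig get nodes
```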

4.5 - OpenStack

Creating a cluster via the CLI on OpenStack.

Creating a Cluster via the CLI

In this guide, we will create an HA Kubernetes cluster in OpenStack with 1 worker node. We will assume some familiarity with OpenStack. If you need more information on OpenStack specifics, please see the official OpenStack documentation.

Environment Setup

You should have an existing openrc file. This file will provide environment variables necessary to talk to your OpenStack cloud. See here for instructions on fetching this file.

Create the Image

First, download the OpenStack image from a Talos release. These images are called openstack-$ARCH.tar.gz. Untar this file with tar -xvf openstack-$ARCH.tar.gz. The resulting file will be called disk.raw.

Upload the Image

Once you have the image, you can upload it to OpenStack with:

openstack image create --public --disk-format raw --file disk.raw talos

Network Infrastructure

Load Balancer and Network Ports

Once the image is prepared, you will need to work through setting up the network. Issue the following to create a load balancer, the necessary network ports for each control plane node, and associations between the two.

Creating loadbalancer:

# Create load balancer, updating vip-subnet-id if necessary
openstack loadbalancer create --name talos-control-plane --vip-subnet-id public

# Create listener
openstack loadbalancer listener create --name talos-control-plane-listener --protocol TCP --protocol-port 6443 talos-control-plane

# Pool and health monitoring
openstack loadbalancer pool create --name talos-control-plane-pool --lb-algorithm ROUND_ROBIN --listener talos-control-plane-listener --protocol TCP
openstack loadbalancer healthmonitor create --delay 5 --max-retries 4 --timeout 10 --type TCP talos-control-plane-pool

Creating ports:

# Create ports for control plane nodes, updating network name if necessary
openstack port create --network shared talos-control-plane-1
openstack port create --network shared talos-control-plane-2
openstack port create --network shared talos-control-plane-3

# Create floating IPs for the ports, so that you will have talosctl connectivity to each control plane
openstack floating ip create --port talos-control-plane-1 public
openstack floating ip create --port talos-control-plane-2 public
openstack floating ip create --port talos-control-plane-3 public

Note: Take note of the private and public IPs associated with each of these ports, as they will be used in the next step. Additionally, take note of the port ID, as it will be used in server creation.

Associate port’s private IPs to loadbalancer:

# Create members for each port IP, updating subnet-id and address as necessary.
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-1 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-2 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-3 PORT> --protocol-port 6443 talos-control-plane-pool
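The three member commands above differ only in the address, so they can be generated from a list of private IPs. The sketch below prints the commands instead of running them, so they can be reviewed first. The function name `print_member_commands` and the addresses in the usage line are placeholders.

```shell
# Print one `openstack loadbalancer member create` command per private IP.
# The commands are printed, not executed.
# The subnet, port and pool name are copied from the commands above.
print_member_commands() {
  for ip in "$@"; do
    printf 'openstack loadbalancer member create --subnet-id shared-subnet --address %s --protocol-port 6443 talos-control-plane-pool\n' "$ip"
  done
}

# Usage (not executed here; the addresses are placeholders):
#   print_member_commands 10.0.0.11 10.0.0.12 10.0.0.13
```

Piping the output to `sh` would run the commands once you are satisfied with them.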

Security Groups

This example uses the default security group in OpenStack. Ports have been opened to ensure that connectivity from both inside and outside the group is possible. You will want to allow, at a minimum, ports 6443 (Kubernetes API server) and 50000 (Talos API) from external sources. It is also recommended to allow communication over all ports from within the subnet.

Cluster Configuration

With our networking bits set up, we’ll fetch the IP for our load balancer and create our configuration files.

LB_PUBLIC_IP=$(openstack loadbalancer show talos-control-plane -f json | jq -r .vip_address)

talosctl gen config talos-k8s-openstack-tutorial https://${LB_PUBLIC_IP}:6443

Compute Creation

We are now ready to create our OpenStack nodes.

Create control plane:

# Create control plane 1. Substitute the correct path to configuration files and the desired flavor.
openstack server create talos-control-plane-1 --flavor m1.small --nic port-id=talos-control-plane-1 --image talos --user-data /path/to/init.yaml

# Create control planes 2 and 3, substituting the same info.
for i in $( seq 2 3 ); do
  openstack server create talos-control-plane-$i --flavor m1.small --nic port-id=talos-control-plane-$i --image talos --user-data /path/to/controlplane.yaml
done

Create worker:

# Update network name as necessary.
openstack server create talos-worker-1 --flavor m1.small --network shared --image talos --user-data /path/to/join.yaml

Note: This step can be repeated to add more workers.

Retrieve the `kubeconfig`

You should now be able to interact with your cluster with talosctl. We will use one of the floating IPs we allocated earlier. It does not matter which one.

talosctl --talosconfig ./talosconfig config endpoint <FLOATING_IP>
talosctl --talosconfig ./talosconfig config node <FLOATING_IP>
talosctl --talosconfig ./talosconfig kubeconfig .
kubectl --kubeconfig ./kubeconfig get nodes

5 - Local Platforms

5.1 - Docker

Creating Talos Kubernetes cluster using Docker.

In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.

Running Talos in Docker is intended to be used in CI pipelines, and local testing when you need a quick and easy cluster. Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.

Requirements

The following are the requirements for running Talos in Docker:

Docker 18.03 or greater
a recent version of talosctl

Caveats

Because Talos runs in a container, certain APIs are not available when running in Docker. For example, upgrade, reset, and similar APIs do not apply in container mode.

Create the Cluster

Creating a local cluster is as simple as:

talosctl cluster create --wait

Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.

If you are running on macOS, an additional command is required:

talosctl config endpoint 127.0.0.1

Note: Startup can take up to a minute before the cluster is available.

Retrieve and Configure the `kubeconfig`

talosctl kubeconfig .
kubectl --kubeconfig kubeconfig config set-cluster talos-default --server https://127.0.0.1:6443

Using the Cluster

Cleaning Up

To clean up, run:

talosctl cluster destroy

5.2 - Firecracker

Creating Talos Kubernetes cluster using Firecracker VMs.

In this guide we will create a Kubernetes cluster using Firecracker.

Note: Talos on QEMU offers an easier way to run Talos in a set of VMs.

Requirements

Linux
a kernel with
- KVM enabled (/dev/kvm must exist)
- CONFIG_NET_SCH_NETEM enabled
- CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
firecracker (v0.21.0 or higher)
bridge, static and firewall CNI plugins from the standard CNI plugins, and tc-redirect-tap CNI plugin from the awslabs tc-redirect-tap installed to /opt/cni/bin
iptables
/etc/cni/conf.d directory should exist
/var/run/netns directory should exist
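The requirements above can be checked before creating a cluster. The following is a minimal preflight sketch. The helper names are hypothetical, and the paths and commands in the usage lines are taken from the list above. It does not check kernel configuration options or capabilities.

```shell
# Print every path from the argument list that does not exist.
# Returns 0 when nothing is missing, 1 otherwise.
missing_paths() {
  status=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      printf 'missing: %s\n' "$p"
      status=1
    fi
  done
  return "$status"
}

# Print every command from the argument list that is not found on PATH.
missing_commands() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Usage (not executed here):
#   missing_paths /dev/kvm /etc/cni/conf.d /var/run/netns \
#     /opt/cni/bin/bridge /opt/cni/bin/static /opt/cni/bin/firewall /opt/cni/bin/tc-redirect-tap
#   missing_commands firecracker iptables talosctl
```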

Installation

How to get firecracker (v0.21.0 or higher)

You can download the firecracker binary from github.com/firecracker-microvm/firecracker/releases

curl https://github.com/firecracker-microvm/firecracker/releases/download/<version>/firecracker-<version>-<arch> -L -o firecracker

For example, version v0.21.1 for the linux platform:

curl https://github.com/firecracker-microvm/firecracker/releases/download/v0.21.1/firecracker-v0.21.1-x86_64 -L -o firecracker
sudo cp firecracker /usr/local/bin
sudo chmod +x /usr/local/bin/firecracker

Install talosctl

You can download talosctl and all required binaries via github.com/talos-systems/talos/releases

curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl

For example, the latest release for the linux platform on amd64:

curl https://github.com/talos-systems/talos/releases/latest/download/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl

Install bridge, firewall and static required CNI plugins

You can download standard CNI required plugins via github.com/containernetworking/plugins/releases

curl https://github.com/containernetworking/plugins/releases/download/<version>/cni-plugins-<platform>-<arch>-<version>.tgz -L -o cni-plugins-<platform>-<arch>-<version>.tgz

For example, version v0.8.5 for the linux platform:

curl https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz -L -o cni-plugins-linux-amd64-v0.8.5.tgz
mkdir cni-plugins-linux
tar -xf cni-plugins-linux-amd64-v0.8.5.tgz -C cni-plugins-linux
sudo mkdir -p /opt/cni/bin
sudo cp cni-plugins-linux/{bridge,firewall,static} /opt/cni/bin

Install tc-redirect-tap CNI plugin

Install the CNI plugin from the tc-redirect-tap repository, github.com/awslabs/tc-redirect-tap:

go get -d github.com/awslabs/tc-redirect-tap/cmd/tc-redirect-tap
cd $GOPATH/src/github.com/awslabs/tc-redirect-tap
make all
sudo cp tc-redirect-tap /opt/cni/bin

Note: if $GOPATH is not set, it defaults to ~/go.

Install Talos kernel and initramfs

Firecracker provisioner depends on Talos uncompressed kernel (vmlinuz) and initramfs (initramfs.xz). These files can be downloaded from the Talos release:

mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/vmlinuz -L -o _out/vmlinuz
curl https://github.com/siderolabs/talos/releases/download/<version>/initramfs.xz -L -o _out/initramfs.xz

For example, from the latest release:

curl https://github.com/talos-systems/talos/releases/latest/download/vmlinuz -L -o _out/vmlinuz
curl https://github.com/talos-systems/talos/releases/latest/download/initramfs.xz -L -o _out/initramfs.xz

Create the Cluster

sudo talosctl cluster create --provisioner firecracker

Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster.

Retrieve and Configure the `kubeconfig`

talosctl kubeconfig .

Using the Cluster

A bridge interface will be created and assigned the default IP 10.5.0.1. Each node will be directly accessible on the subnet specified at cluster creation time. A load balancer runs on 10.5.0.1 by default and handles load balancing for the Talos and Kubernetes APIs.

You can see a summary of the cluster state by running:

$ talosctl cluster show --provisioner firecracker
PROVISIONER       firecracker
NAME              talos-default
NETWORK NAME      talos-default
NETWORK CIDR      10.5.0.0/24
NETWORK GATEWAY   10.5.0.1
NETWORK MTU       1500

NODES:

NAME                     TYPE           IP         CPU    RAM      DISK
talos-default-master-1   Init           10.5.0.2   1.00   1.6 GB   4.3 GB
talos-default-master-2   ControlPlane   10.5.0.3   1.00   1.6 GB   4.3 GB
talos-default-master-3   ControlPlane   10.5.0.4   1.00   1.6 GB   4.3 GB
talos-default-worker-1   Join           10.5.0.5   1.00   1.6 GB   4.3 GB

Cleaning Up

To clean up, run:

sudo talosctl cluster destroy --provisioner firecracker

Note: If the host machine is rebooted before the cluster is destroyed, you may need to manually remove ~/.talos/clusters/talos-default.

Manual Clean Up

The talosctl cluster destroy command depends heavily on the cluster's state directory, which contains all information related to the cluster, including the PIDs and network associated with the cluster nodes.

If you deleted the state folder by mistake, or you would like to clean up the environment by hand, follow the steps below:

Stopping VMs

Find the firecracker --api-sock processes:

ps -elf | grep '[f]irecracker --api-sock'

To stop the VMs manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where VMs are running with PIDs 158065 and 158216

ps -elf | grep '[f]irecracker --api-sock'
4 S root      158065  157615 44  80   0 - 264152 -     07:54 ?        00:34:25 firecracker --api-sock /root/.talos/clusters/k8s/k8s-master-1.sock
4 S root      158216  157617 18  80   0 - 264152 -     07:55 ?        00:14:47 firecracker --api-sock /root/.talos/clusters/k8s/k8s-worker-1.sock
sudo kill -s SIGTERM 158065
sudo kill -s SIGTERM 158216
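The PID is the fourth column of `ps -elf` output, so the lookup and the kill can be combined. The helper below is a sketch with a hypothetical name, `pids_matching`. It takes the same bracketed pattern used with grep above, which keeps the filter process itself out of the results.

```shell
# Read `ps -elf` output on stdin and print the PID (4th column) of every
# line that matches the given regular expression.
pids_matching() {
  awk -v pat="$1" '$0 ~ pat { print $4 }'
}

# Usage (not executed here; xargs -r is a GNU extension that skips the
# command when there is no input):
#   ps -elf | pids_matching '[f]irecracker --api-sock' | xargs -r sudo kill -s SIGTERM
```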

Remove VMs

Find the talosctl firecracker-launch processes:

ps -elf | grep '[t]alosctl firecracker-launch'

To remove the VMs manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where VMs are running with PIDs 157615 and 157617

ps -elf | grep '[t]alosctl firecracker-launch'
0 S root      157615    2835  0  80   0 - 184934 -     07:53 ?        00:00:00 talosctl firecracker-launch
0 S root      157617    2835  0  80   0 - 185062 -     07:53 ?        00:00:00 talosctl firecracker-launch
sudo kill -s SIGTERM 157615
sudo kill -s SIGTERM 157617

Remove load balancer

Find the talosctl loadbalancer-launch process:

ps -elf | grep '[t]alosctl loadbalancer-launch'

To remove the LB manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where loadbalancer is running with PID 157609

ps -elf | grep '[t]alosctl loadbalancer-launch'
4 S root      157609    2835  0  80   0 - 184998 -     07:53 ?        00:00:07 talosctl loadbalancer-launch --loadbalancer-addr 10.5.0.1 --loadbalancer-upstreams 10.5.0.2
sudo kill -s SIGTERM 157609

Remove network

This is the trickier part if you have already deleted the state folder. If you have not, the bridge name is recorded in state.yaml in the /root/.talos/clusters/<cluster-name> directory.

sudo cat /root/.talos/clusters/<cluster-name>/state.yaml | grep bridgename
bridgename: talos<uuid>

If you only had one cluster, then it will be the interface with name talos<uuid>

46: talos<uuid>: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether a6:72:f4:0a:d3:9c brd ff:ff:ff:ff:ff:ff
    inet 10.5.0.1/24 brd 10.5.0.255 scope global talos17c13299
       valid_lft forever preferred_lft forever
    inet6 fe80::a472:f4ff:fe0a:d39c/64 scope link
       valid_lft forever preferred_lft forever

To remove this interface:

sudo ip link del talos<uuid>

Remove state directory

To remove the state directory execute:

sudo rm -Rf /root/.talos/clusters/<cluster-name>

Troubleshooting

Logs

Inspect logs directory

sudo cat /root/.talos/clusters/<cluster-name>/*.log

Logs are saved under <cluster-name>-<role>-<node-id>.log

For example, for a cluster named k8s:

sudo ls -la /root/.talos/clusters/k8s | grep log
-rw-r--r--. 1 root root      69415 Apr 26 20:58 k8s-master-1.log
-rw-r--r--. 1 root root      68345 Apr 26 20:58 k8s-worker-1.log
-rw-r--r--. 1 root root      24621 Apr 26 20:59 lb.log

Inspect logs during the installation

sudo su -
tail -f /root/.talos/clusters/<cluster-name>/*.log

Post-installation

After executing these steps, you should be able to use kubectl:

sudo talosctl kubeconfig .
mv kubeconfig $HOME/.kube/config
sudo chown $USER:$USER $HOME/.kube/config

5.3 - QEMU

Creating Talos Kubernetes cluster using QEMU VMs.

In this guide we will create a Kubernetes cluster using QEMU.

Video Walkthrough

To see a live demo of this writeup, see the video below:

Requirements

Linux
a kernel with
- KVM enabled (/dev/kvm must exist)
- CONFIG_NET_SCH_NETEM enabled
- CONFIG_NET_SCH_INGRESS enabled
at least CAP_SYS_ADMIN and CAP_NET_ADMIN capabilities
QEMU
bridge, static and firewall CNI plugins from the standard CNI plugins, and tc-redirect-tap CNI plugin from the awslabs tc-redirect-tap installed to /opt/cni/bin (installed automatically by talosctl)
iptables
/var/run/netns directory should exist

Installation

How to get QEMU

Install QEMU with your operating system package manager. For example, on Ubuntu for x86:

apt install qemu-system-x86 qemu-kvm

Install talosctl

You can download talosctl and all required binaries via github.com/talos-systems/talos/releases

curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl

For example, the latest release for the linux platform on amd64:

curl https://github.com/talos-systems/talos/releases/latest/download/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl

Install Talos kernel and initramfs

QEMU provisioner depends on Talos kernel (vmlinuz) and initramfs (initramfs.xz). These files can be downloaded from the Talos release:

mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/vmlinuz-<arch> -L -o _out/vmlinuz-<arch>
curl https://github.com/siderolabs/talos/releases/download/<version>/initramfs-<arch>.xz -L -o _out/initramfs-<arch>.xz

For example version v0.9.0:

curl https://github.com/siderolabs/talos/releases/download/v0.9.0/vmlinuz-amd64 -L -o _out/vmlinuz-amd64
curl https://github.com/siderolabs/talos/releases/download/v0.9.0/initramfs-amd64.xz -L -o _out/initramfs-amd64.xz

Create the Cluster

Before creating your first cluster, create the root state directory as your own user so that you can inspect the logs as a non-root user:

mkdir -p ~/.talos/clusters

Create the cluster:

sudo -E talosctl cluster create --provisioner qemu

Before the first cluster is created, talosctl will download the CNI bundle for the VM provisioning and install it to ~/.talos/cni directory.

Once the above finishes successfully, your talosconfig (~/.talos/config) will be configured to point to the new cluster, and kubeconfig will be downloaded and merged into default kubectl config location (~/.kube/config).

Cluster provisioning process can be optimized with registry pull-through caches.

Using the Cluster

Once the cluster is available, you can make use of talosctl and kubectl to interact with the cluster. For example, to view current running containers, run talosctl -n 10.5.0.2 containers for a list of containers in the system namespace, or talosctl -n 10.5.0.2 containers -k for the k8s.io namespace. To view the logs of a container, use talosctl -n 10.5.0.2 logs <container> or talosctl -n 10.5.0.2 logs -k <container>.

You can see a summary of the cluster state by running:

$ talosctl cluster show --provisioner qemu
PROVISIONER       qemu
NAME              talos-default
NETWORK NAME      talos-default
NETWORK CIDR      10.5.0.0/24
NETWORK GATEWAY   10.5.0.1
NETWORK MTU       1500

NODES:

NAME                     TYPE           IP         CPU    RAM      DISK
talos-default-master-1   Init           10.5.0.2   1.00   1.6 GB   4.3 GB
talos-default-master-2   ControlPlane   10.5.0.3   1.00   1.6 GB   4.3 GB
talos-default-master-3   ControlPlane   10.5.0.4   1.00   1.6 GB   4.3 GB
talos-default-worker-1   Join           10.5.0.5   1.00   1.6 GB   4.3 GB

Cleaning Up

To cleanup, run:

sudo -E talosctl cluster destroy --provisioner qemu

Note: If the host machine is rebooted before the cluster is destroyed, you may need to manually remove ~/.talos/clusters/talos-default.

Manual Clean Up

The talosctl cluster destroy command depends heavily on the cluster's state directory, which contains all information related to the cluster, including the PIDs and network associated with the cluster nodes.

If you deleted the state folder by mistake, or you would like to clean up the environment by hand, follow the steps below:

Remove VM Launchers

Find the talosctl qemu-launch processes:

ps -elf | grep '[t]alosctl qemu-launch'

To remove the VMs manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where VMs are running with PIDs 157615 and 157617

ps -elf | grep '[t]alosctl qemu-launch'
0 S root      157615    2835  0  80   0 - 184934 -     07:53 ?        00:00:00 talosctl qemu-launch
0 S root      157617    2835  0  80   0 - 185062 -     07:53 ?        00:00:00 talosctl qemu-launch
sudo kill -s SIGTERM 157615
sudo kill -s SIGTERM 157617

Stopping VMs

Find the qemu-system processes:

ps -elf | grep '[q]emu-system'

To stop the VMs manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where the PID appears in the fourth column (1061663 here); use the PIDs reported on your own system:

ps -elf | grep '[q]emu-system'
2 S root     1061663 1061168 26  80   0 - 1786238 -    14:05 ?        01:53:56 qemu-system-x86_64 -m 2048 -drive format=raw,if=virtio,file=/home/username/.talos/clusters/talos-default/bootstrap-master.disk -smp cpus=2 -cpu max -nographic -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,mac=1e:86:c6:b4:7c:c4 -device virtio-rng-pci -no-reboot -boot order=cn,reboot-timeout=5000 -smbios type=1,uuid=7ec0a73c-826e-4eeb-afd1-39ff9f9160ca -machine q35,accel=kvm
2 S root     1061663 1061170 67  80   0 - 621014 -     21:23 ?        00:00:07 qemu-system-x86_64 -m 2048 -drive format=raw,if=virtio,file=/home/username/.talos/clusters/talos-default/pxe-1.disk -smp cpus=2 -cpu max -nographic -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,mac=36:f3:2f:c3:9f:06 -device virtio-rng-pci -no-reboot -boot order=cn,reboot-timeout=5000 -smbios type=1,uuid=ce12a0d0-29c8-490f-b935-f6073ab916a6 -machine q35,accel=kvm
sudo kill -s SIGTERM 1061663
sudo kill -s SIGTERM 1061663

Remove load balancer

Find the process of talosctl loadbalancer-launch:

ps -elf | grep 'talosctl loadbalancer-launch'

To remove the LB manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where loadbalancer is running with PID 157609

ps -elf | grep '[t]alosctl loadbalancer-launch'
4 S root      157609    2835  0  80   0 - 184998 -     07:53 ?        00:00:07 talosctl loadbalancer-launch --loadbalancer-addr 10.5.0.1 --loadbalancer-upstreams 10.5.0.2
sudo kill -s SIGTERM 157609

Remove DHCP server

Find the talosctl dhcpd-launch process:

ps -elf | grep '[t]alosctl dhcpd-launch'

To remove the DHCP server manually, execute:

sudo kill -s SIGTERM <PID>

Example output, where the DHCP server is running with PID 157609

ps -elf | grep '[t]alosctl dhcpd-launch'
4 S root      157609    2835  0  80   0 - 184998 -     07:53 ?        00:00:07 talosctl dhcpd-launch --state-path /home/username/.talos/clusters/talos-default --addr 10.5.0.1 --interface talosbd9c32bc
sudo kill -s SIGTERM 157609

Remove network

This is the trickier part if you have already deleted the state folder. If you have not, the bridge name is recorded in state.yaml in the ~/.talos/clusters/<cluster-name> directory.

sudo cat ~/.talos/clusters/<cluster-name>/state.yaml | grep bridgename
bridgename: talos<uuid>
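To reuse the bridge name in later commands, it can be extracted from the state file instead of copied by hand. The sketch below assumes the key appears on its own line as `bridgename: <name>`, as in the grep output above. The function name `bridge_name` is hypothetical.

```shell
# Print the value of the `bridgename` key from a cluster state.yaml file.
# Assumption: the key appears on its own line as `bridgename: <name>`.
bridge_name() {
  awk '$1 == "bridgename:" { print $2; exit }' "$1"
}

# Usage (not executed here):
#   sudo ip link del "$(bridge_name ~/.talos/clusters/talos-default/state.yaml)"
```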

If you only had one cluster, then it will be the interface with name talos<uuid>

46: talos<uuid>: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether a6:72:f4:0a:d3:9c brd ff:ff:ff:ff:ff:ff
    inet 10.5.0.1/24 brd 10.5.0.255 scope global talos17c13299
       valid_lft forever preferred_lft forever
    inet6 fe80::a472:f4ff:fe0a:d39c/64 scope link
       valid_lft forever preferred_lft forever

To remove this interface:

sudo ip link del talos<uuid>

Remove state directory

To remove the state directory execute:

sudo rm -Rf /home/$USER/.talos/clusters/<cluster-name>

Troubleshooting

Logs

Inspect logs directory

sudo cat ~/.talos/clusters/<cluster-name>/*.log

Logs are saved under <cluster-name>-<role>-<node-id>.log

For example, for a cluster named k8s:

ls -la ~/.talos/clusters/k8s | grep log
-rw-r--r--. 1 root root      69415 Apr 26 20:58 k8s-master-1.log
-rw-r--r--. 1 root root      68345 Apr 26 20:58 k8s-worker-1.log
-rw-r--r--. 1 root root      24621 Apr 26 20:59 lb.log

Inspect logs during the installation

tail -f ~/.talos/clusters/<cluster-name>/*.log

5.4 - VirtualBox

Creating Talos Kubernetes cluster using VirtualBox VMs.

In this guide we will create a Kubernetes cluster using VirtualBox.

Video Walkthrough

To see a live demo of this writeup, visit Youtube here:

Installation

How to Get VirtualBox

Install VirtualBox with your operating system package manager or from the website. For example, on Ubuntu for x86:

apt install virtualbox

Install talosctl

You can download talosctl via github.com/talos-systems/talos/releases

curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl

For example, the latest release for the linux platform on amd64:

curl https://github.com/talos-systems/talos/releases/latest/download/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl

Download ISO Image

In order to install Talos in VirtualBox, you will need the ISO image from the Talos release page. You can download talos-amd64.iso via github.com/talos-systems/talos/releases

mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/talos-<arch>.iso -L -o _out/talos-<arch>.iso

For example, the amd64 ISO from the latest release:

mkdir -p _out/
curl https://github.com/talos-systems/talos/releases/latest/download/talos-amd64.iso -L -o _out/talos-amd64.iso

Create VMs

Start by creating a new VM by clicking the “New” button in the VirtualBox UI:

Supply a name for this VM, and specify the Type and Version:

Edit the memory to supply at least 2GB of RAM for the VM:

Proceed through the disk settings, keeping the defaults. You can increase the disk space if desired.

Once created, select the VM and hit “Settings”:

In the “System” section, supply at least 2 CPUs:

In the “Network” section, switch the network “Attached To” section to “Bridged Adapter”:

Finally, in the “Storage” section, select the optical drive and, on the right, select the ISO by browsing your filesystem:

Repeat this process for a second VM to use as a worker node. You can also repeat it for any additional nodes you want.

Start Control Plane Node

Generate Machine Configurations

talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out

This will create several files in the _out directory: init.yaml, controlplane.yaml, join.yaml, and talosconfig.
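Before applying any configuration, it can be worth confirming that all four files are present. The following is a minimal sketch of such a check. The function name `missing_configs` is hypothetical, and the file list is the one given in the sentence above.

```shell
# List which of the files expected from `talosctl gen config` are absent
# from the given directory (defaults to _out).
# Returns 0 when all four files exist, 1 otherwise.
missing_configs() {
  dir="${1:-_out}"
  status=0
  for f in init.yaml controlplane.yaml join.yaml talosconfig; do
    if [ ! -f "$dir/$f" ]; then
      printf 'missing: %s/%s\n' "$dir" "$f"
      status=1
    fi
  done
  return "$status"
}

# Usage (not executed here):
#   missing_configs _out && echo "all configuration files are present"
```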

Create Control Plane Node

Using the init.yaml generated above, you can now apply this config using talosctl. Issue:

talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file _out/init.yaml

You should now see some action in the VirtualBox console for this VM. Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.

Note: This process can be repeated multiple times to create an HA control plane. Simply apply controlplane.yaml instead of init.yaml for subsequent nodes.

Create Worker Node

Issue:

talosctl apply-config --insecure --nodes $WORKER_IP --file _out/join.yaml

Note: This process can be repeated multiple times to add additional workers.

Using the Cluster

First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:

export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP

Retrieve and Configure the `kubeconfig`

Fetch the kubeconfig file from the control plane node by issuing:

talosctl kubeconfig

You can then use kubectl in this fashion:

kubectl get nodes

Cleaning Up

To clean up, simply stop and delete the virtual machines from the VirtualBox UI.

6 - Single Board Computers

6.1 - Banana Pi M64

Installing Talos on Banana Pi M64 SBC using raw disk image.

Prerequisites

You will need

talosctl
an SD card

Download the latest alpha talosctl.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
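The download command above derives the operating system from `uname -s` but hard-codes the `amd64` suffix. The sketch below shows one way to derive the architecture as well. It assumes that arm64 builds of talosctl are published alongside the amd64 ones, which should be confirmed on the release page. The function name `talos_arch` is hypothetical.

```shell
# Map a machine name (default: `uname -m`) to the architecture suffix
# used in release asset names.
# Assumption: arm64 builds exist next to amd64; verify on the release page.
talos_arch() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Usage (not executed here):
#   os=$(uname -s | tr "[:upper:]" "[:lower:]")
#   echo "talosctl-${os}-$(talos_arch)"
```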

Download the Image

Download the image and decompress it:

curl -LO https://github.com/talos-systems/talos/releases/latest/download/metal-bananapi_m64-arm64.img.xz
xz -d metal-bananapi_m64-arm64.img.xz

Writing the Image

The path to your SD card can be found using fdisk on Linux or diskutil on macOS. In this example we will assume /dev/mmcblk0.

Now dd the image to your SD card:

sudo dd if=metal-bananapi_m64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
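Because dd overwrites its target without asking, a guard in front of it can catch a mistyped path. The following is a minimal sketch. The function name `check_dd_args` is hypothetical, and it only checks that the image file exists and that the target is a block device.

```shell
# Succeed only if the image is a regular file and the target is a block device.
check_dd_args() {
  image="$1"
  target="$2"
  [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
  [ -b "$target" ] || { echo "not a block device: $target" >&2; return 1; }
}

# Usage (not executed here):
#   check_dd_args metal-bananapi_m64-arm64.img /dev/mmcblk0 &&
#     sudo dd if=metal-bananapi_m64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```

The check does not confirm that the block device is the SD card, so the path should still be verified with fdisk or diskutil.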

Bootstrapping the Node

Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node. Follow the instructions in the console output to connect to the interactive installer:

talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>

Once the interactive installation is applied, the cluster will form and you can then use kubectl.

Retrieve the `kubeconfig`

Retrieve the admin kubeconfig by running:

talosctl kubeconfig

6.2 - Libre Computer Board ALL-H3-CC

Installing Talos on Libre Computer Board ALL-H3-CC SBC using raw disk image.

Prerequisites

You will need

talosctl
an SD card

Download the latest alpha talosctl.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

Download the Image

Download the image and decompress it:

curl -LO https://github.com/talos-systems/talos/releases/latest/download/metal-libretech_all_h3_cc_h5-arm64.img.xz
xz -d metal-libretech_all_h3_cc_h5-arm64.img.xz

Writing the Image

The path to your SD card can be found using fdisk on Linux or diskutil on macOS. In this example we will assume /dev/mmcblk0.

Now dd the image to your SD card:

sudo dd if=metal-libretech_all_h3_cc_h5-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M

Bootstrapping the Node

talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>

Once the interactive installation is applied, the cluster will form and you can then use kubectl.

Retrieve the `kubeconfig`

Retrieve the admin kubeconfig by running:

talosctl kubeconfig

6.3 - Pine64 Rock64

Installing Talos on Pine64 Rock64 SBC using raw disk image.

Prerequisites

You will need

talosctl
an SD card

Download the latest alpha talosctl.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

Download the Image

Download the image and decompress it:

curl -LO https://github.com/talos-systems/talos/releases/latest/download/metal-rock64-arm64.img.xz
xz -d metal-rock64-arm64.img.xz

Writing the Image

The path to your SD card can be found using fdisk on Linux or diskutil on macOS. In this example we will assume /dev/mmcblk0.

Now dd the image to your SD card:

sudo dd if=metal-rock64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M

Bootstrapping the Node

talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>

Once the interactive installation is applied, the cluster will form and you can then use kubectl.

Retrieve the `kubeconfig`

Retrieve the admin kubeconfig by running:

talosctl kubeconfig

6.4 - Radxa ROCK PI 4c

Installing Talos on Radxa ROCK PI 4c SBC using raw disk image.

Prerequisites

You will need

talosctl
an SD card

Download the latest alpha talosctl.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

Download the Image

Download the image and decompress it:

curl -LO https://github.com/talos-systems/talos/releases/latest/download/metal-rockpi_4-arm64.img.xz
xz -d metal-rockpi_4-arm64.img.xz

Writing the Image

The path to your SD card can be found using fdisk on Linux or diskutil on macOS. In this example we will assume /dev/mmcblk0.

Now dd the image to your SD card:

sudo dd if=metal-rockpi_4-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M

Bootstrapping the Node

talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>

Once the interactive installation is applied, the cluster will form and you can then use kubectl.

Retrieve the `kubeconfig`

Retrieve the admin kubeconfig by running:

talosctl kubeconfig

6.5 - Raspberry Pi 4 Model B

Installing Talos on Rpi4 SBC using raw disk image.

Prerequisites

You will need

talosctl
an SD card

Download the latest alpha talosctl.

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

Updating the EEPROM

At least version v2020.09.03-138a1 of the bootloader (rpi-eeprom) is required. To update the bootloader we will need an SD card. The path to your SD card can be found using fdisk on Linux or diskutil on macOS; in this example we will assume /dev/mmcblk0. Insert the SD card into your computer and run the following:

curl -LO https://github.com/raspberrypi/rpi-eeprom/releases/download/v2020.09.03-138a1/rpi-boot-eeprom-recovery-2020-09-03-vl805-000138a1.zip
sudo mkfs.fat -I /dev/mmcblk0
sudo mount /dev/mmcblk0 /mnt
sudo bsdtar -xf rpi-boot-eeprom-recovery-2020-09-03-vl805-000138a1.zip -C /mnt

Remove the SD card from your local machine and insert it into the Raspberry Pi. Power the Raspberry Pi on, and wait at least 10 seconds. If successful, the green LED light will blink rapidly (forever), otherwise an error pattern will be displayed. If an HDMI display is attached then the screen will display green for success or red if a failure occurs. Power off the Raspberry Pi and remove the SD card from it.

Note: Updating the bootloader only needs to be done once.

Download the Image

Download the image and decompress it:

curl -LO https://github.com/talos-systems/talos/releases/latest/download/metal-rpi_4-arm64.img.xz
xz -d metal-rpi_4-arm64.img.xz

Writing the Image

Now dd the image to your SD card:

sudo dd if=metal-rpi_4-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M

Bootstrapping the Node

talosctl apply-config --insecure --interactive --nodes <node IP or DNS name>

Once the interactive installation is applied, the cluster will form and you can then use kubectl.

Retrieve the `kubeconfig`

Retrieve the admin kubeconfig by running:

talosctl kubeconfig

Troubleshooting

The following table can be used to troubleshoot booting issues:

Long Flashes	Short Flashes	Status
0	3	Generic failure to boot
0	4	start*.elf not found
0	7	Kernel image not found
0	8	SDRAM failure
0	9	Insufficient SDRAM
0	10	In HALT state
2	1	Partition not FAT
2	2	Failed to read from partition
2	3	Extended partition not FAT
2	4	File signature/hash mismatch - Pi 4
4	4	Unsupported board type
4	5	Fatal firmware error
4	6	Power failure type A
4	7	Power failure type B
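
When counting flashes on a headless board, a small lookup can save a trip back to the table. The sketch below encodes the rows above as a case statement; the function name led_status is hypothetical, and the patterns are copied from the table rather than from any firmware source.

```shell
# Sketch: translate (long flashes, short flashes) into the status text
# from the troubleshooting table above.
led_status() {
  case "$1-$2" in
    0-3)  echo "Generic failure to boot" ;;
    0-4)  echo "start*.elf not found" ;;
    0-7)  echo "Kernel image not found" ;;
    0-8)  echo "SDRAM failure" ;;
    0-9)  echo "Insufficient SDRAM" ;;
    0-10) echo "In HALT state" ;;
    2-1)  echo "Partition not FAT" ;;
    2-2)  echo "Failed to read from partition" ;;
    2-3)  echo "Extended partition not FAT" ;;
    2-4)  echo "File signature/hash mismatch - Pi 4" ;;
    4-4)  echo "Unsupported board type" ;;
    4-5)  echo "Fatal firmware error" ;;
    4-6)  echo "Power failure type A" ;;
    4-7)  echo "Power failure type B" ;;
    *)    echo "Unknown pattern"; return 1 ;;
  esac
}

# Usage: led_status <long flashes> <short flashes>
led_status 2 1
```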

7 - Guides

7.1 - Advanced Networking

Static Addressing

Static addressing consists of specifying the cidr, routes (remember to add your default gateway), and interface. Most likely you’ll also want to define the nameservers so you have properly functioning DNS.

machine:
  network:
    hostname: talos
    nameservers:
      - 10.0.0.1
    interfaces:
      - interface: eth0
        cidr: 10.0.0.201/8
        mtu: 8765
        routes:
          - network: 0.0.0.0/0
            gateway: 10.0.0.1
      - interface: eth1
        ignore: true
  time:
    servers:
      - time.cloudflare.com
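
A common mistake with static addressing is a default gateway that is not reachable from the interface's own subnet. The sketch below checks that a gateway address falls inside the network implied by a cidr value. The helper names ip_to_int and in_cidr are hypothetical, only IPv4 is handled, and the check is a sanity test rather than anything Talos itself performs.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1            # split the address into four octets
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed when address $1 lies inside the network of CIDR $2
# (for example: in_cidr 10.0.0.1 10.0.0.201/8).
in_cidr() (
  addr=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( base & mask )) ]
)

# Values taken from the static addressing example above.
if in_cidr 10.0.0.1 10.0.0.201/8; then
  echo "gateway is on-link"
else
  echo "gateway is outside the interface subnet"
fi
```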

Additional Addresses for an Interface

In some environments you may need to set additional addresses on an interface. In the following example, we set two additional addresses on the loopback interface.

machine:
  network:
    interfaces:
      - interface: lo0
        cidr: 192.168.0.21/24
      - interface: lo0
        cidr: 10.2.2.2/24

Bonding

The following example shows how to create a bonded interface.

machine:
  network:
    interfaces:
      - interface: bond0
        dhcp: true
        bond:
          mode: 802.3ad
          lacpRate: fast
          xmitHashPolicy: layer3+4
          miimon: 100
          updelay: 200
          downdelay: 200
          interfaces:
            - eth0
            - eth1

VLANs

To set up VLANs on a specific device, list them in the vlans array. The master device may be configured without addressing by setting dhcp to false.

machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: false
        vlans:
          - vlanId: 100
            cidr: "192.168.2.10/28"
            routes:
              - network: 0.0.0.0/0
                gateway: 192.168.2.1

7.2 - Air-gapped Environments

In this guide we will create a Talos cluster running in an air-gapped environment with all the required images being pulled from an internal registry. We will use the QEMU provisioner available in talosctl to create a local cluster, but the same approach could be used to deploy Talos in bigger air-gapped networks.

Requirements

The following are requirements for this guide:

Docker 18.03 or greater
Requirements for the Talos QEMU cluster

Identifying Images

In air-gapped environments, access to the public Internet is restricted, so Talos can’t pull images from public Docker registries (docker.io, ghcr.io, etc.). We need to identify the images required to install and run Talos. The same strategy can be used for images required by custom workloads running on the cluster.

The talosctl images command provides a list of default images used by the Talos cluster (with default configuration settings). To print the list of images, run:

talosctl images

This list contains images required by a default deployment of Talos. There might be additional images required for the workloads running on this cluster, and those should be added to this list.

Preparing the Internal Registry

As access to the public registries is restricted, we have to run an internal Docker registry. In this guide, we will launch the registry on the same machine using Docker:

$ docker run -d -p 6000:5000 --restart always --name registry-airgapped registry:2
1bf09802bee1476bc463d972c686f90a64640d87dacce1ac8485585de69c91a5

This registry will be accepting connections on port 6000 on the host IPs. The registry is empty by default, so we have to fill it with the images required by Talos.

First, we pull all the images to our local Docker daemon:

$ for image in `talosctl images`; do docker pull $image; done
v0.12.0-amd64: Pulling from coreos/flannel
Digest: sha256:6d451d92c921f14bfb38196aacb6e506d4593c5b3c9d40a8b8a2506010dc3e10
...

All images are now stored in the Docker daemon store:

$ docker images
ghcr.io/talos-systems/install-cni    v0.3.0-12-g90722c3      980d36ee2ee1        5 days ago          79.7MB
k8s.gcr.io/kube-proxy-amd64          v1.20.0                 33c60812eab8        2 weeks ago         118MB
...

Now we need to re-tag them so that we can push them to our local registry. We are going to replace the first component of the image name (before the first slash) with our registry endpoint 127.0.0.1:6000:

$ for image in `talosctl images`; do \
    docker tag $image `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done

As the next step, we push images to the internal registry:

$ for image in `talosctl images`; do \
    docker push `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
  done

We can now verify that the images are pushed to the registry:

$ curl  http://127.0.0.1:6000/v2/_catalog
{"repositories":["autonomy/kubelet","coredns","coreos/flannel","etcd-development/etcd","kube-apiserver-amd64","kube-controller-manager-amd64","kube-proxy-amd64","kube-scheduler-amd64","talos-systems/install-cni","talos-systems/installer"]}

Note: images in the registry don’t have the registry endpoint prefix anymore.
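
The renaming rule used in the tag and push loops can be isolated into a small function, which makes the resulting repository names easier to predict. This is a sketch only: internal_name is a hypothetical helper, and REGISTRY is an assumed variable that defaults to the endpoint used in this guide.

```shell
# Sketch: compute the internal-registry name for an image by replacing
# everything before the first slash with the registry endpoint.
internal_name() {
  printf '%s\n' "$1" | sed -E "s#^[^/]+/#${REGISTRY:-127.0.0.1:6000}/#"
}

# The first path component is swapped, the rest of the reference is kept.
internal_name ghcr.io/talos-systems/installer:v0.9.0
internal_name k8s.gcr.io/kube-proxy-amd64:v1.20.0

# Usage with the loops above (not executed here):
#   for image in $(talosctl images); do
#     docker tag "$image" "$(internal_name "$image")"
#     docker push "$(internal_name "$image")"
#   done
```

An image reference without any slash is returned unchanged by this expression, so such images would need the registry prefix added separately.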

Launching Talos in an Air-gapped Environment

For Talos to use the internal registry, we use the registry mirror feature to redirect all the image pull requests to the internal registry. This means that the registry endpoint (as the first component of the image reference) gets ignored, and all pull requests are sent directly to the specified endpoint.

We are going to use a QEMU-based Talos cluster for this guide, but the same approach works with Docker-based clusters as well. As QEMU-based clusters go through the Talos install process, they model a real air-gapped environment more closely.

The talosctl cluster create command provides conveniences for common configuration options. The only required flag for this guide is --registry-mirror '*'=http://10.5.0.1:6000 which redirects every pull request to the internal registry. The endpoint being used is 10.5.0.1, as this is the default bridge interface address which will be routable from the QEMU VMs (127.0.0.1 IP will be pointing to the VM itself).

$ sudo -E talosctl cluster create --provisioner=qemu --registry-mirror '*'=http://10.5.0.1:6000 --install-image=ghcr.io/talos-systems/installer:v0.9.0
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/smira/.talos/clusters/talos-default"
creating network talos-default
creating load balancer
creating dhcpd
creating master nodes
creating worker nodes
waiting for API
...

Note: --install-image should match the image which was copied into the internal registry in the previous step.

You can verify that the cluster is air-gapped by inspecting the registry logs: docker logs -f registry-airgapped.

Closing Notes

Running in an air-gapped environment might require additional configuration changes, for example using custom settings for DNS and NTP servers.

When scaling this guide to a bare-metal environment, the following Talos config snippet could be used as an equivalent of the --registry-mirror flag above:

machine:
  ...
  registries:
    mirrors:
      '*':
        endpoints:
          - http://10.5.0.1:6000/
...

Other implementations of Docker registry can be used in place of the Docker registry image used above to run the registry. If required, auth can be configured for the internal registry (and custom TLS certificates if needed).

7.3 - Configuring Certificate Authorities

Appending the Certificate Authority

Add the PEM-encoded certificate to each machine's configuration:

machine:
  ...
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      permissions: 0644
      path: /etc/ssl/certs/ca-certificates
      op: append

7.4 - Configuring Containerd

The base containerd configuration expects to merge in any additional configs present in /var/cri/conf.d/*.toml.

An example of exposing metrics

Into each machine config, add the following:

machine:
  ...
  files:
    - content: |
        [metrics]
          address = "0.0.0.0:11234"
      path: /var/cri/conf.d/metrics.toml
      op: create

Create the cluster as usual and verify that metrics are now exposed on this port:

$ curl 127.0.0.1:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
...

7.5 - Configuring Corporate Proxies

Appending the Certificate Authority of MITM Proxies

Add the PEM-encoded certificate to each machine's configuration:

machine:
  ...
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      permissions: 0644
      path: /etc/ssl/certs/ca-certificates
      op: append

Configuring a Machine to Use the Proxy

To make use of a proxy:

machine:
  env:
    http_proxy: <http proxy>
    https_proxy: <https proxy>
    no_proxy: <no proxy>

Additionally, configure the DNS nameservers and NTP servers:

machine:
  env:
  ...
  time:
    servers:
      - <server 1>
      - <server ...>
      - <server n>
  ...
  network:
    nameservers:
      - <ip 1>
      - <ip ...>
      - <ip n>

7.6 - Configuring Network Connectivity

The simplest way to deploy Talos is by ensuring that all the remote components of the system (talosctl, the control plane nodes, and worker nodes) have layer 2 connectivity. This is not always possible, however, so this page lays out the minimal network access that is required to configure and operate a Talos cluster.

Note: These are the ports required for Talos specifically, and should be configured in addition to the ports required by Kubernetes. See the Kubernetes docs for information on the ports used by Kubernetes itself.

Control plane node(s)

Protocol	Direction	Port Range	Purpose	Used By
TCP	Inbound	50000*	apid	talosctl
TCP	Inbound	50001*	trustd	Control plane nodes, worker nodes

Ports marked with a * are not currently configurable, but that may change in the future. Follow along here.

Worker node(s)

Protocol	Direction	Port Range	Purpose	Used By
TCP	Inbound	50001*	trustd	Control plane nodes

Ports marked with a * are not currently configurable, but that may change in the future. Follow along here.

7.7 - Configuring Pull Through Cache

In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.

When running Talos locally, pulling images from Docker registries might take a significant amount of time. We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies. A similar approach might be used to run Talos in production in air-gapped environments. It can also be used to verify that all the images are available in local registries.

Requirements

The following are requirements for creating the set of caching proxies:

Docker 18.03 or greater
Local cluster requirements for either docker or QEMU.

Launch the Caching Docker Registry Proxies

Talos pulls from docker.io, k8s.gcr.io, gcr.io, ghcr.io and quay.io by default. If your configuration is different, you might need to modify the commands below:

docker run -d -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    --restart always \
    --name registry-docker.io registry:2

docker run -d -p 5001:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://k8s.gcr.io \
    --restart always \
    --name registry-k8s.gcr.io registry:2

docker run -d -p 5002:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://quay.io \
    --restart always \
    --name registry-quay.io registry:2.5

docker run -d -p 5003:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://gcr.io \
    --restart always \
    --name registry-gcr.io registry:2

docker run -d -p 5004:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://ghcr.io \
    --restart always \
    --name registry-ghcr.io registry:2

Note: Proxies are started as Docker containers, and they’re automatically configured to start with the Docker daemon. The quay.io proxy doesn’t support the recent Docker image schema, so we run an older registry image version (2.5).

As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own host port (5000, 5001, 5002, 5003 and 5004).

Using Caching Registries with `QEMU` Local Cluster

With a QEMU local cluster, a bridge interface is created on the host. As registry containers expose their ports on the host, we can use bridge IP to direct proxy requests.

sudo talosctl cluster create --provisioner qemu \
    --registry-mirror docker.io=http://10.5.0.1:5000 \
    --registry-mirror k8s.gcr.io=http://10.5.0.1:5001 \
    --registry-mirror quay.io=http://10.5.0.1:5002 \
    --registry-mirror gcr.io=http://10.5.0.1:5003 \
    --registry-mirror ghcr.io=http://10.5.0.1:5004

The Talos local cluster should now start pulling via caching registries. This can be verified via registry logs, e.g. docker logs -f registry-docker.io. The first time the cluster boots, images are pulled and cached, so the next cluster boot should be much faster.

Note: 10.5.0.1 is the bridge IP of the default network (10.5.0.0/24). If you use a custom --cidr, adjust the value accordingly.
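
Because the mirror flags differ only in the bridge IP, they can be generated rather than typed out for each provisioner. The sketch below assumes the five proxies were started on host ports 5000 to 5004 in the order shown earlier; mirror_flags is a hypothetical helper name.

```shell
# Sketch: print one --registry-mirror flag per upstream registry for the
# given bridge IP. Ports follow the order used when starting the proxies.
mirror_flags() (
  bridge_ip=$1
  port=5000
  for registry in docker.io k8s.gcr.io quay.io gcr.io ghcr.io; do
    printf '%s %s=http://%s:%s\n' --registry-mirror "$registry" "$bridge_ip" "$port"
    port=$((port + 1))
  done
)

mirror_flags 10.5.0.1

# Usage (not executed here); the unquoted substitution is split into words:
#   sudo talosctl cluster create --provisioner qemu $(mirror_flags 10.5.0.1)
```

Passing a different address, such as the docker bridge IP, yields the equivalent flags for a docker local cluster.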

Using Caching Registries with `docker` Local Cluster

With a docker local cluster we can use the docker bridge IP; the default value for that IP is 172.17.0.1. On Linux, the docker bridge address can be inspected with ip addr show docker0.

talosctl cluster create --provisioner docker \
    --registry-mirror docker.io=http://172.17.0.1:5000 \
    --registry-mirror k8s.gcr.io=http://172.17.0.1:5001 \
    --registry-mirror quay.io=http://172.17.0.1:5002 \
    --registry-mirror gcr.io=http://172.17.0.1:5003 \
    --registry-mirror ghcr.io=http://172.17.0.1:5004

Cleaning Up

To clean up, run:

docker rm -f registry-docker.io
docker rm -f registry-k8s.gcr.io
docker rm -f registry-quay.io
docker rm -f registry-gcr.io
docker rm -f registry-ghcr.io

Note: Removing docker registry containers also removes the image cache. So if you plan to use caching registries, keep the containers running.

7.8 - Configuring the Cluster Endpoint

In this section, we will step through the configuration of a Talos based Kubernetes cluster. There are three major components we will configure:

apid and talosctl
the master nodes
the worker nodes

Talos enforces a high level of security by using mutual TLS for authentication and authorization.

We recommend that the configuration of Talos be performed by a cluster owner. A cluster owner should be a person of authority within an organization, perhaps a director, manager, or senior member of a team. They are responsible for storing the root CA, and distributing the PKI for authorized cluster administrators.

Recommended settings

Talos runs great out of the box, but if you tweak some minor settings it will make your life a lot easier in the future. This is not a requirement, but rather a document to explain some key settings.

Endpoint

To configure the talosctl endpoint, it is recommended you use a resolvable DNS name. This way, if you decide to upgrade to a multi-controlplane cluster, you only have to add the IP address to the hostname configuration. The configuration can be done either on a load balancer or simply through DNS.

For example, the endpoint is set in the cluster config files (e.g. init.yaml, controlplane.yaml and join.yaml). For more details, see the v1alpha1 endpoint configuration.

.....
cluster:
  controlPlane:
    endpoint: https://endpoint.example.local:6443
.....

If you have a DNS name as the endpoint, you can upgrade your Talos cluster to multiple controlplanes in the future (if you don’t have a multi-controlplane setup from the start). Using a DNS name also generates the corresponding certificates (Kubernetes and Talos) for the correct hostname.
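
Since the endpoint name ends up in generated certificates, it can be worth confirming that it resolves before generating the configuration. The sketch below only extracts the host part of an endpoint URL; endpoint_host is a hypothetical helper, IPv6 literals are not handled, and the resolution step in the usage comment assumes a Linux host with getent available.

```shell
# Sketch: extract the host name from an endpoint URL such as
# https://endpoint.example.local:6443
endpoint_host() (
  rest=${1#*://}      # drop the scheme
  rest=${rest%%/*}    # drop any path
  printf '%s\n' "${rest%%:*}"   # drop the port
)

endpoint_host https://endpoint.example.local:6443

# Usage (not executed here):
#   getent hosts "$(endpoint_host https://endpoint.example.local:6443)" \
#     || echo "endpoint does not resolve yet"
```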

7.9 - Configuring Wireguard Network

In this guide you will learn how to set up a Wireguard network using the kernel module.

Quick Start

The quickest way to try out Wireguard is to use the talosctl cluster create command:

talosctl cluster create --wireguard-cidr 10.1.0.0/24

It will automatically generate a Wireguard network configuration for each node. In the resulting topology, all controlplane nodes act as Wireguard servers listening on port 51111, and all controlplanes and workers connect to all controlplanes. It also sets PersistentKeepalive to 5 seconds so that connections from controlplanes to workers can be established.

After the cluster is deployed, you can verify Wireguard network connectivity. Deploy a container with hostNetwork enabled, run kubectl exec -it <container> -- /bin/bash, and then either run:

ping 10.1.0.2

Or install wireguard-tools package and run:

wg show

The wg show command should output something like this:

interface: wg0
  public key: OMhgEvNIaEN7zeCLijRh4c+0Hwh3erjknzdyvVlrkGM=
  private key: (hidden)
  listening port: 47946

peer: 1EsxUygZo8/URWs18tqB5FW2cLVlaTA+lUisKIf8nh4=
  endpoint: 10.5.0.2:51111
  allowed ips: 10.1.0.0/24
  latest handshake: 1 minute, 55 seconds ago
  transfer: 3.17 KiB received, 3.55 KiB sent
  persistent keepalive: every 5 seconds

It is also possible to use generated configuration as a reference by pulling generated config files using:

talosctl read -n 10.5.0.2 /system/state/config.yaml > controlplane.yaml
talosctl read -n 10.5.0.3 /system/state/config.yaml > join.yaml

Manual Configuration

All Wireguard configuration can be done by changing Talos machine config files. As an example we will use this official Wireguard quick start tutorial.

Key Generation

This part is exactly the same:

wg genkey | tee privatekey | wg pubkey > publickey
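
Keys are pasted by hand into the machine config, so a truncated copy is easy to miss. Wireguard keys are 32 bytes encoded as base64, which gives 43 base64 characters followed by a single = sign. The sketch below checks only that shape; is_wg_key is a hypothetical helper and it does not validate that the key is cryptographically meaningful.

```shell
# Sketch: succeed when the argument looks like a base64-encoded 32-byte key
# (43 base64 characters plus one '=' of padding).
is_wg_key() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9+/]{43}=$'
}

# Usage (not executed here):
#   is_wg_key "$(cat publickey)" || echo "publickey looks malformed"
```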

Setting up Device

Inline comments show relations between configs and wg quickstart tutorial commands:

...
network:
  interfaces:
    ...
      # ip link add dev wg0 type wireguard
    - interface: wg0
      mtu: 1500
      # ip address add dev wg0 192.168.2.1/24
      cidr: 192.168.2.1/24
      # wg set wg0 listen-port 51820 private-key /path/to/private-key peer ABCDEF... allowed-ips 192.168.88.0/24 endpoint 209.202.254.14:8172
      wireguard:
        privateKey: <privatekey file contents>
        listenPort: 51820
        peers:
          - allowedIPs:
              - 192.168.88.0/24
            endpoint: 209.202.254.14:8172
            publicKey: ABCDEF...
...

When networkd gets this configuration, it will create the device, configure it, and bring it up (equivalent to ip link set up dev wg0).

All supported config parameters are described in the Machine Config Reference.

7.10 - Converting Control Plane

How to convert a Talos self-hosted Kubernetes control plane (pre-0.9) to a static-pod-based one.

Talos version 0.9 runs the Kubernetes control plane in a new way: as static pods managed by Talos. Talos version 0.8 and below runs a self-hosted control plane. After the Talos OS upgrade to version 0.9, the Kubernetes control plane should be converted to run as static pods.

This guide describes the automated conversion script and also shows the detailed manual conversion process.

Automated Conversion

First, make sure all nodes are updated to Talos 0.9:

$ kubectl get nodes -o wide
NAME                     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-default-master-1   Ready    control-plane,master   58m   v1.20.4   172.20.0.2    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-2   Ready    control-plane,master   58m   v1.20.4   172.20.0.3    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-master-3   Ready    control-plane,master   58m   v1.20.4   172.20.0.4    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4
talos-default-worker-1   Ready    <none>                 58m   v1.20.4   172.20.0.5    <none>        Talos (v0.9.0)   5.10.19-talos    containerd://1.4.4

Start the conversion script:

$ talosctl -n <IP> convert-k8s
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
current self-hosted status: true
gathering control plane configuration
aggregator CA key can't be recovered from bootkube-boostrapped control plane, generating new CA
patching master node "172.20.0.2" configuration
patching master node "172.20.0.3" configuration
patching master node "172.20.0.4" configuration
waiting for static pod definitions to be generated
waiting for manifests to be generated
Talos generated control plane static pod definitions and bootstrap manifests, please verify them with commands:
    talosctl -n <master node IP> get StaticPods.kubernetes.talos.dev
    talosctl -n <master node IP> get Manifests.kubernetes.talos.dev

in order to remove self-hosted control plane, pod-checkpointer component needs to be disabled
once pod-checkpointer is disabled, the cluster shouldn't be rebooted until the entire conversion process is complete
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]:

The script stops at this point and waits for confirmation. Talos still runs the self-hosted control plane, and static pods have not been rendered yet.

As instructed by the script, please verify that static pod definitions are correct:

$ talosctl -n <IP> get staticpods -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: StaticPods.kubernetes.talos.dev
    id: kube-apiserver
    version: 1
    phase: running
spec:
    apiVersion: v1
    kind: Pod
    metadata:
        annotations:
            talos.dev/config-version: "2"
            talos.dev/secrets-version: "1"
        creationTimestamp: null
        labels:
            k8s-app: kube-apiserver
            tier: control-plane
        name: kube-apiserver
        namespace: kube-system
    spec:
        containers:
            - command:
...

Static pod definitions are generated from the machine configuration and should match the pod template as generated by Talos on bootstrap of the self-hosted control plane, unless manual changes were applied to the daemonset specs after bootstrap. Talos patches the machine configuration with the container image versions scraped from the daemonset definition and fetches the service account key from Kubernetes secrets.

The aggregator CA can’t be recovered from the self-hosted control plane, so a new CA is generated. This is generally harmless and not visible from outside the cluster. The aggregator CA is not the same CA as is used by Talos or the Kubernetes standard API. It is a special PKI used for aggregating API extension services inside your cluster. If you have non-standard apiserver aggregations (fairly rare, and you should know if you do), then you may need to restart these services after the new CA is in place.

Verify that bootstrap manifests are correct:

$ talosctl -n <IP> get manifests --namespace controlplane
NODE         NAMESPACE      TYPE       ID                               VERSION
172.20.0.2   controlplane   Manifest   00-kubelet-bootstrapping-token   1
172.20.0.2   controlplane   Manifest   01-csr-approver-role-binding     1
172.20.0.2   controlplane   Manifest   01-csr-node-bootstrap            1
172.20.0.2   controlplane   Manifest   01-csr-renewal-role-binding      1
172.20.0.2   controlplane   Manifest   02-kube-system-sa-role-binding   1
172.20.0.2   controlplane   Manifest   03-default-pod-security-policy   1
172.20.0.2   controlplane   Manifest   10-kube-proxy                    1
172.20.0.2   controlplane   Manifest   11-core-dns                      1
172.20.0.2   controlplane   Manifest   11-core-dns-svc                  1
172.20.0.2   controlplane   Manifest   11-kube-config-in-cluster        1

$ talosctl -n <IP> get manifests --namespace=extras
NODE         NAMESPACE   TYPE       ID                                                        VERSION
172.20.0.2   extras      Manifest   05-https://docs.projectcalico.org/manifests/calico.yaml   1

Make sure that manifests and static pods are correct across all control plane nodes, as each node reconciles control plane state on its own. For example, CNI configuration in machine config should be in sync across all the nodes. Talos nodes try to create any missing Kubernetes resources from the manifests, but they never update or delete existing resources.

If something looks wrong, the script can be aborted and the machine configuration updated to fix the problem. Once the configuration is updated, the script can be restarted.

If static pod definitions and manifests look good, confirm the next step to disable pod-checkpointer:

$ talosctl -n <IP> convert-k8s
...
confirm disabling pod-checkpointer to proceed with control plane update [yes/no]: yes
disabling pod-checkpointer
deleting daemonset "pod-checkpointer"
checking for active pod checkpoints
2021/03/09 23:37:25 retrying error: found 3 active pod checkpoints: [pod-checkpointer-655gc-talos-default-master-3 pod-checkpointer-pw6mv-talos-default-master-1 pod-checkpointer-zdw9z-talos-default-master-2]
2021/03/09 23:42:25 retrying error: found 1 active pod checkpoints: [pod-checkpointer-pw6mv-talos-default-master-1]
confirm applying static pod definitions and manifests [yes/no]:

The self-hosted control plane runs pod-checkpointer to work around issues with control plane availability. It has to be disabled before the conversion starts so that the self-hosted control plane can be removed. It takes around 5 minutes for the pod-checkpointer to be fully disabled. The script verifies that all checkpoints are removed before proceeding.

This is the last confirmation before proceeding. After this point there is no way to keep running the self-hosted control plane: static pods are released, bootstrap manifests are applied, and the self-hosted control plane is removed.

$ talosctl -n <IP> convert-k8s
...
confirm applying static pod definitions and manifests [yes/no]: yes
removing self-hosted initialized key
waiting for static pods for "kube-apiserver" to be present in the API server state
waiting for static pods for "kube-controller-manager" to be present in the API server state
waiting for static pods for "kube-scheduler" to be present in the API server state
deleting daemonset "kube-apiserver"
waiting for static pods for "kube-apiserver" to be present in the API server state
deleting daemonset "kube-controller-manager"
waiting for static pods for "kube-controller-manager" to be present in the API server state
deleting daemonset "kube-scheduler"
waiting for static pods for "kube-scheduler" to be present in the API server state
conversion process completed successfully

As soon as the control plane static pods are rendered, the kubelet starts the control plane static pods. It is expected that the pods for kube-apiserver will crash initially. Only one kube-apiserver can be bound to the host Node’s port 6443 at a time. Eventually, the old kube-apiserver will be killed, and the new one will be able to start. This is all handled automatically. The script will continue by removing each self-hosted daemonset and verifying that static pods are ready and healthy.

Manual Conversion

Check that Talos runs self-hosted control plane:

$ talosctl -n <CONTROL_PLANE_IP> get bs
NODE         NAMESPACE   TYPE              ID              VERSION   SELF HOSTED
172.20.0.2   runtime     BootstrapStatus   control-plane   2         true

Talos machine configuration needs to be updated to the 0.9 format; there are two new required machine configuration settings:

.cluster.serviceAccount is the service account PEM-encoded private key.
.cluster.aggregatorCA is the aggregator CA for kube-apiserver (certificate and private key).

Current service account can be fetched from the Kubernetes secrets:

$ kubectl -n kube-system get secrets kube-controller-manager -o jsonpath='{.data.service\-account\.key}'
LS0tLS1CRUdJTiBSU0EgUFJJVkFURS...

All control plane node machine configurations should be patched with the service account key:

$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/serviceAccount", "value": {"key": "LS0tLS1CRUdJTiBSU0EgUFJJVkFURS..."}}]'
patched mc at the node 172.20.0.2
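The value passed with -p is an ordinary JSON Patch document, so it can be generated rather than typed by hand. A minimal Python sketch, assuming the base64 string printed by the kubectl command above is already at hand; the helper name service_account_patch is hypothetical and the key shown is a placeholder, not a real key:

```python
import json


def service_account_patch(key_b64):
    """Return the JSON Patch that adds /cluster/serviceAccount (hypothetical helper)."""
    patch = [
        {
            "op": "add",
            "path": "/cluster/serviceAccount",
            "value": {"key": key_b64},
        }
    ]
    return json.dumps(patch)


# Placeholder only: substitute the full value returned by kubectl.
print(service_account_patch("<BASE64_SERVICE_ACCOUNT_KEY>"))
```

The printed string can be passed as the argument of -p, quoted for your shell.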

Aggregator CA can be generated using OpenSSL or any other certificate generation tools: RSA or ECDSA certificate with CN front-proxy valid for 10 years. PEM-encoded CA certificate and key should be base64-encoded and patched into the machine config at path /cluster/aggregatorCA:

$ talosctl -n <CONTROL_PLANE_IP1>,<CONTROL_PLANE_IP2>,... patch mc --immediate -p '[{"op": "add", "path": "/cluster/aggregatorCA", "value": {"crt": "LS0tLS1CRUdJTiBDRVJUSUZJQ...", "key": "LS0tLS1CRUdJTiBFQy..."}}]'
patched mc at the node 172.20.0.2
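The crt and key values are the PEM files encoded once more with base64; a PEM header beginning with -----BEGIN followed by a space always encodes to a string starting with LS0tLS1CRUdJTi. A short Python sketch of that encoding step, assuming the PEM data has already been produced by your certificate tool; the helper name and the dummy PEM bodies are illustrative only:

```python
import base64
import json


def aggregator_ca_patch(crt_pem, key_pem):
    """Build the JSON Patch for /cluster/aggregatorCA from PEM bytes (hypothetical helper)."""
    value = {
        "crt": base64.b64encode(crt_pem).decode("ascii"),
        "key": base64.b64encode(key_pem).decode("ascii"),
    }
    return json.dumps([{"op": "add", "path": "/cluster/aggregatorCA", "value": value}])


# Dummy PEM bodies for illustration; real input comes from the generated CA files.
crt = b"-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"
key = b"-----BEGIN EC PRIVATE KEY-----\nMHcC...\n-----END EC PRIVATE KEY-----\n"
print(aggregator_ca_patch(crt, key))
```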

At this point static pod definitions and bootstrap manifests should be rendered; please see “Automated Conversion” for how to verify the generated objects. Feel free to continue to refine your machine configuration until the generated static pod definitions and bootstrap manifests look good.

If static pod definitions are not generated, check logs with talosctl -n <IP> logs controller-runtime.

Disable pod-checkpointer with:

$ kubectl -n kube-system delete ds pod-checkpointer
daemonset.apps "pod-checkpointer" deleted

Wait for all pod checkpoints to be removed:

$ kubectl -n kube-system get pods
NAME                                            READY   STATUS    RESTARTS   AGE
...
pod-checkpointer-8q2lh-talos-default-master-2   1/1     Running   0          3m34s
pod-checkpointer-nnm5w-talos-default-master-3   1/1     Running   0          3m24s
pod-checkpointer-qnmdt-talos-default-master-1   1/1     Running   0          2m21s

Pod checkpoints have annotation checkpointer.alpha.coreos.com/checkpoint-of.
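That annotation makes it possible to tell checkpoints apart from ordinary pods programmatically. A minimal Python sketch, assuming the pod list has been obtained as JSON (for example with kubectl -n kube-system get pods -o json) and parsed into a dictionary; the sample data and the annotation value are made up for illustration:

```python
CHECKPOINT_ANNOTATION = "checkpointer.alpha.coreos.com/checkpoint-of"


def active_checkpoints(pod_list):
    """Return the names of pods that carry the pod-checkpointer annotation."""
    names = []
    for pod in pod_list.get("items", []):
        annotations = pod["metadata"].get("annotations") or {}
        if CHECKPOINT_ANNOTATION in annotations:
            names.append(pod["metadata"]["name"])
    return names


# Hypothetical, heavily trimmed pod list; the annotation value is illustrative.
sample = {
    "items": [
        {
            "metadata": {
                "name": "pod-checkpointer-8q2lh-talos-default-master-2",
                "annotations": {CHECKPOINT_ANNOTATION: "pod-checkpointer-8q2lh"},
            }
        },
        {"metadata": {"name": "kube-flannel-25drx"}},
    ]
}
print(active_checkpoints(sample))
```

An empty result means no pod checkpoints remain.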

Once all the pod checkpoints are removed (it takes 5 minutes for the checkpoints to be removed), proceed by removing self-hosted initialized key:

talosctl -n <CONTROL_PLANE_IP> convert-k8s --remove-initialized-key

Talos controllers will now render static pod definitions, and the kubelet will launch any resulting static pods.

Once static pods are visible in kubectl get pods -n kube-system output, proceed by removing each of the self-hosted daemonsets:

$ kubectl -n kube-system delete daemonset kube-apiserver
daemonset.apps "kube-apiserver" deleted

Make sure static pods for kube-apiserver got started successfully, pods are running and ready.

Proceed by deleting kube-controller-manager and kube-scheduler daemonsets, verifying that static pods are running between each step:

$ kubectl -n kube-system delete daemonset kube-controller-manager
daemonset.apps "kube-controller-manager" deleted

$ kubectl -n kube-system delete daemonset kube-scheduler
daemonset.apps "kube-scheduler" deleted

7.11 - Customizing the Kernel

FROM scratch AS customization
COPY --from=<custom kernel image> /lib/modules /lib/modules

FROM docker.io/andrewrynhard/installer:latest
COPY --from=<custom kernel image> /boot/vmlinuz /usr/install/vmlinuz

docker build --build-arg RM="/lib/modules" -t talos-installer .

Note: You can use the --squash flag to create smaller images.

Now that we have a custom installer we can build Talos for the specific platform we wish to deploy to.

7.12 - Customizing the Root Filesystem

The installer image contains ONBUILD instructions that handle the following:

the decompression, and unpacking of the initramfs.xz
the unsquashing of the rootfs
the copying of new rootfs files
the squashing of the new rootfs
and the packing, and compression of the new initramfs.xz

When used as a base image, the installer will perform the above steps automatically with the requirement that a customization stage be defined in the Dockerfile.

For example, say we have an image that contains the contents of a library we wish to add to the Talos rootfs. We need to define a stage with the name customization:

FROM scratch AS customization
COPY --from=<name|index> <src> <dest>

Using a multi-stage Dockerfile we can define the customization stage and build FROM the installer image:

FROM scratch AS customization
COPY --from=<name|index> <src> <dest>

FROM ghcr.io/talos-systems/installer:latest

When building the image, the customization stage will automatically be copied into the rootfs. The customization stage is not limited to a single COPY instruction. In fact, you can do whatever you would like in this stage, but keep in mind that everything in / will be copied into the rootfs.

Note: <dest> is the path, relative to the rootfs, where you wish to place the contents of <src>.

To build the image, run:

docker build --squash -t <organization>/installer:latest .

In the case that you need to perform some cleanup before adding additional files to the rootfs, you can specify the RM build-time variable:

docker build --squash --build-arg RM="[<path> ...]" -t <organization>/installer:latest .

This will perform a rm -rf on the specified paths relative to the rootfs.

Note: RM must be a whitespace delimited list.
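Because RM is one whitespace-delimited string, several paths go into a single --build-arg rather than one argument per path. A small Python sketch that assembles the command shown above; the helper name, the image tag, and the second path are hypothetical:

```python
import shlex


def installer_build_command(tag, rm_paths=()):
    """Assemble the docker build argv for a customized installer (hypothetical helper)."""
    cmd = ["docker", "build", "--squash"]
    if rm_paths:
        # RM is a single whitespace-delimited string, not one --build-arg per path.
        cmd += ["--build-arg", "RM=" + " ".join(rm_paths)]
    cmd += ["-t", tag, "."]
    return cmd


# Example paths and tag are illustrative only.
argv = installer_build_command("example/installer:latest", ["/lib/modules", "/usr/lib/firmware"])
print(shlex.join(argv))
```

One consequence of the whitespace-delimited format is that a path containing spaces cannot be expressed in RM.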

The resulting image can be used to:

generate an image for any of the supported providers
perform bare-metal installs
perform upgrades

We will step through common customizations in the remainder of this section.

7.13 - Disk Encryption

Guide on using system disk encryption

It is possible to enable encryption for system disks at the OS level. As of this writing, only STATE and EPHEMERAL partitions can be encrypted. STATE contains the most sensitive node data: secrets and certs. EPHEMERAL partition may contain some sensitive workload data. Data is encrypted using LUKS2, which is provided by the Linux kernel modules and cryptsetup utility. The operating system will run additional setup steps when encryption is enabled.

If the disk encryption is enabled for the STATE partition, the system will:

Save STATE encryption config as JSON in the META partition.
Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition. Note that the machine config is always preferred over the META one.
Before mounting the STATE partition, format and encrypt it. This occurs only if the STATE partition is empty and has no filesystem.

If the disk encryption is enabled for the EPHEMERAL partition, the system will:

Get the encryption config from the machine config.
Before mounting the EPHEMERAL partition, encrypt and format it. This occurs only if the EPHEMERAL partition is empty and has no filesystem.

Configuration

Right now this encryption is disabled by default. To enable disk encryption you should modify the machine configuration with the following options:

machine:
  ...
  systemDiskEncryption:
    ephemeral:
      keys:
        - nodeID: {}
          slot: 0
    state:
      keys:
        - nodeID: {}
          slot: 0

Encryption Keys

Note: What the LUKS2 docs call “keys” are, in reality, a passphrase. When this passphrase is added, LUKS2 runs argon2 to create an actual key from that passphrase.

LUKS2 supports up to 32 encryption keys, and all of them can be specified in the machine configuration. Talos always tries to sync the key list defined in the machine config with the actual keys defined for the LUKS2 partition. So when you update the key list, keep at least one key unchanged so that it can be used for key management.

When you define a key you should specify the key kind and the slot:

machine:
  ...
  systemDiskEncryption:
    state:
      keys:
        - nodeID: {} # key kind
          slot: 1

    ephemeral:
      keys:
        - static:
            passphrase: supersecret
          slot: 0

Note that the order of the keys has no effect on which key slot is used. Every key must always have a slot defined.
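The slot rule can be illustrated with a short Python sketch that maps each slot to its key kind. It assumes the keys list has been parsed from the machine config into dictionaries; the function name is hypothetical, and rejecting a slot that is defined twice is an assumption of this sketch, not documented Talos behavior:

```python
def slot_map(keys):
    """Map slot number -> key kind for a `keys` list parsed from the machine config."""
    result = {}
    for key in keys:
        if "slot" not in key:
            raise ValueError("every key must define a slot")
        slot = key["slot"]
        if slot in result:
            # Assumption: two keys cannot share one LUKS2 slot.
            raise ValueError(f"slot {slot} is defined twice")
        # The key kind is the only other field, e.g. "nodeID" or "static".
        kind = next(name for name in key if name != "slot")
        result[slot] = kind
    return result


keys = [
    {"static": {"passphrase": "supersecret"}, "slot": 0},
    {"nodeID": {}, "slot": 1},
]
# Listing the same keys in a different order yields the same slot assignment.
assert slot_map(keys) == slot_map(list(reversed(keys)))
print(slot_map(keys))
```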

Encryption Key Kinds

Talos supports two kinds of keys:

nodeID which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
static which you define right in the configuration.

Note: Use static keys only for the EPHEMERAL partition, and only if your STATE partition is encrypted. A static key for the STATE partition would be stored in the META partition, which is not encrypted.

Key Rotation

It is necessary to run talosctl apply-config more than once to rotate keys, because at least one working key must remain in place while the other keys are changed.

So, for example, first add a new key:

machine:
  ...
  systemDiskEncryption:
    ephemeral:
      keys:
        - static:
            passphrase: oldkey
          slot: 0
        - static:
            passphrase: newkey
          slot: 1
  ...

Run:

talosctl apply-config -n <node> -f config.yaml

Then remove the old key:

machine:
  ...
  systemDiskEncryption:
    ephemeral:
      keys:
        - static:
            passphrase: newkey
          slot: 1
  ...

Run:

talosctl apply-config -n <node> -f config.yaml
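The reason the rotation takes two applies can be checked mechanically: each step must share at least one unchanged key with the configuration before it. A minimal Python sketch of that check, using dictionaries that mirror the keys lists above; the function names are hypothetical and the comparison is a simplification of whatever Talos does internally:

```python
def unchanged_keys(old_keys, new_keys):
    """Return the keys that appear, with identical slot and content, in both lists."""
    return [key for key in new_keys if key in old_keys]


def is_safe_rotation_step(old_keys, new_keys):
    """A step is safe if at least one key survives unchanged to manage the others."""
    return len(unchanged_keys(old_keys, new_keys)) > 0


old = {"static": {"passphrase": "oldkey"}, "slot": 0}
new = {"static": {"passphrase": "newkey"}, "slot": 1}

print(is_safe_rotation_step([old], [old, new]))  # step 1: add the new key
print(is_safe_rotation_step([old, new], [new]))  # step 2: remove the old key
print(is_safe_rotation_step([old], [new]))       # both changes at once
```

Only the last call reports an unsafe step, which is why the old key is replaced in two applies rather than one.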

Going from Unencrypted to Encrypted and Vice Versa

Ephemeral Partition

There is no in-place encryption support for the partitions right now, so to avoid losing any data only empty partitions can be encrypted.

As such, migration from unencrypted to encrypted needs some additional handling, especially around explicitly wiping partitions.

apply-config should be called with --on-reboot flag.
Partition should be wiped after apply-config, but before the reboot.

Edit your machine config and add the encryption configuration:

vim config.yaml

Apply the configuration with --on-reboot flag:

talosctl apply-config -f config.yaml -n <node ip> --on-reboot

Wipe the partition you’re going to encrypt:

talosctl reset --system-labels-to-wipe EPHEMERAL -n <node ip> --reboot=true

That’s it! After you run the last command, the partition will be wiped and the node will reboot. During the next boot the system will encrypt the partition.

State Partition

Calling wipe against the STATE partition will make the node lose the config, so the previous flow is not going to work.

The flow should be to first wipe the STATE partition:

talosctl reset  --system-labels-to-wipe STATE -n <node ip> --reboot=true

The node will enter maintenance mode; then run apply-config with the --insecure flag:

talosctl apply-config --insecure -n <node ip> -f config.yaml

After installation is complete the node should encrypt the STATE partition.

7.14 - Editing Machine Configuration

How to edit and patch Talos machine configuration, with reboot, immediately, or stage update on reboot.

Talos node state is fully defined by machine configuration. Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.

Note: Be sure that config is persisted so that configuration updates are not overwritten on reboots. Configuration persistence has been enabled by default since Talos 0.5 (persist: true in machine configuration).

There are three talosctl commands which facilitate machine configuration updates:

talosctl apply-config to apply configuration from the file
talosctl edit machineconfig to launch an editor with existing node configuration, make changes and apply configuration back
talosctl patch machineconfig to apply automated machine configuration via JSON patch

Each of these commands can operate in one of three modes:

apply change with a reboot (default): update configuration, reboot Talos node to apply configuration change
apply change immediately (--immediate flag): change is applied immediately without a reboot, only .cluster sub-tree of the machine configuration can be updated in Talos 0.9
apply change on next reboot (--on-reboot): change is staged to be applied after a reboot, but node is not rebooted

Note: applying change on next reboot (--on-reboot) doesn’t modify current node configuration, so next call to talosctl edit machineconfig --on-reboot will not see changes
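The three modes differ only in the flag added to the command. A small Python sketch that assembles a patch command for a given mode; the mode names used as dictionary keys and the helper name are made up for this example, while the flags themselves are the ones listed above:

```python
MODE_FLAGS = {
    "reboot": [],                  # default: apply the change and reboot the node
    "immediate": ["--immediate"],  # apply without a reboot
    "on-reboot": ["--on-reboot"],  # stage the change for the next reboot
}


def patch_command(nodes, patch_json, mode="reboot"):
    """Assemble a talosctl patch machineconfig argv for one of the three modes."""
    if mode not in MODE_FLAGS:
        raise ValueError("unknown mode: " + mode)
    return [
        "talosctl", "-n", ",".join(nodes),
        "patch", "machineconfig",
        *MODE_FLAGS[mode],
        "-p", patch_json,
    ]


# Node addresses are illustrative.
print(patch_command(["172.20.0.2", "172.20.0.3"], "[{...}]", mode="immediate"))
```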

`talosctl apply-config`

This command is mostly used to submit initial machine configuration to the node (generated by talosctl gen config). It can be used to apply new configuration from the file to the running node as well, but most of the time it’s not convenient, as it doesn’t operate on the current node machine configuration.

Example:

talosctl -n <IP> apply-config -f config.yaml

Command apply-config can also be invoked as apply machineconfig:

talosctl -n <IP> apply machineconfig -f config.yaml

Applying machine configuration immediately (without a reboot):

talosctl -n <IP> apply machineconfig -f config.yaml --immediate

`talosctl edit machineconfig`

Command talosctl edit loads current machine configuration from the node and launches configured editor to modify the config. If config hasn’t been changed in the editor (or if updated config is empty), update is not applied.

Note: Talos uses environment variables TALOS_EDITOR, EDITOR to pick up the editor preference. If environment variables are missing, vi editor is used by default.

Example:

talosctl -n <IP> edit machineconfig

Configuration can be edited for multiple nodes if multiple IP addresses are specified:

talosctl -n <IP1>,<IP2>,... edit machineconfig

Applying machine configuration change immediately (without a reboot):

talosctl -n <IP> edit machineconfig --immediate

`talosctl patch machineconfig`

Command talosctl patch works similarly to the talosctl edit command: it loads the current machine configuration, but instead of launching the configured editor it applies a JSON patch to the configuration and writes the result back to the node.

Example, updating kubelet version (with a reboot):

$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.20.5"}]'
patched mc at the node <IP>

Updating kube-apiserver version in immediate mode (without a reboot):

$ talosctl -n <IP> patch machineconfig --immediate -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.20.5"}]'
patched mc at the node <IP>

Patch might be applied to multiple nodes when multiple IPs are specified:

talosctl -n <IP1>,<IP2>,... patch machineconfig --immediate -p '[{...}]'
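To see what such a patch does to the configuration, the following Python sketch applies the "add" and "replace" operations of JSON Patch (RFC 6902) to a nested dictionary. It is a deliberately small subset (object members only, no arrays, no other operations) and is not how Talos itself implements patching; the sample configuration is a trimmed, hypothetical fragment:

```python
import copy


def apply_patch(config, patch):
    """Apply a JSON Patch that contains only "add"/"replace" ops on object members."""
    result = copy.deepcopy(config)
    for op in patch:
        if op["op"] not in ("add", "replace"):
            raise ValueError("unsupported op: " + op["op"])
        # JSON Pointer: split on "/" and undo the ~1 and ~0 escapes, in that order.
        parts = [p.replace("~1", "/").replace("~0", "~") for p in op["path"].split("/")[1:]]
        parent = result
        for part in parts[:-1]:
            parent = parent[part]
        # "replace" requires the target to exist; "add" creates or overwrites it.
        if op["op"] == "replace" and parts[-1] not in parent:
            raise KeyError(op["path"])
        parent[parts[-1]] = op["value"]
    return result


# Hypothetical, trimmed machine configuration.
config = {"machine": {"kubelet": {"image": "ghcr.io/talos-systems/kubelet:v1.20.4"}}}
patch = [{"op": "replace", "path": "/machine/kubelet/image",
          "value": "ghcr.io/talos-systems/kubelet:v1.20.5"}]
print(apply_patch(config, patch))
```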

Recovering from Node Boot Failures

If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue. If the boot sequence is still running, Talos might refuse to apply the config in default mode. In that case --on-reboot mode can be used together with the talosctl reboot command to trigger a reboot and apply the configuration update.

7.15 - Managing PKI

Generating an Administrator Key Pair

In order to create a key pair, you will need the root CA.

Save the CA public key, and CA private key as ca.crt, and ca.key respectively. Now, run the following commands to generate a certificate:

talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin

Now, base64 encode admin.crt, and admin.key:

cat admin.crt | base64
cat admin.key | base64

You can now set the crt and key fields in the talosconfig to the base64 encoded strings.

Renewing an Expired Administrator Certificate

In order to renew the certificate, you will need the root CA, and the admin private key. The base64 encoded key can be found in the configuration file of any control plane node. Where exactly it is will depend on the specific version of the configuration file you are using.

Save the CA public key, CA private key, and admin private key as ca.crt, ca.key, and admin.key respectively. Now, run the following commands to generate a certificate:

talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin

You should see admin.crt in your current directory. Now, base64 encode admin.crt:

cat admin.crt | base64

You can now set the certificate in the talosconfig to the base64 encoded string.

7.16 - Resetting a Machine

From time to time, it may be beneficial to reset a Talos machine to its “original” state. Bear in mind that this is a destructive action for the given machine. Doing this means removing the machine from Kubernetes and Etcd (if applicable), and clearing any data on the machine that would normally persist across a reboot.

The API command for doing this is talosctl reset. There are a couple of flags as part of this command:

Flags:
      --graceful   if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
      --reboot     if true, reboot the node after resetting instead of shutting down

The graceful flag is especially important when considering HA vs. non-HA Talos clusters. If the machine is part of an HA cluster, a normal, graceful reset should work just fine right out of the box as long as the cluster is in a good state. However, if this is a single node cluster being used for testing purposes, a graceful reset is not an option since Etcd cannot be “left” if there is only a single member. In this case, reset should be used with --graceful=false to skip performing checks that would normally block the reset.
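The choice described above can be written down as a small Python sketch. The helper name and its parameters are hypothetical; the flags are the two listed for talosctl reset:

```python
def reset_command(node, single_node_cluster=False, reboot=False):
    """Assemble a talosctl reset argv (hypothetical helper)."""
    cmd = ["talosctl", "-n", node, "reset"]
    if single_node_cluster:
        # Etcd cannot be left when it has a single member, so skip the graceful path.
        cmd.append("--graceful=false")
    if reboot:
        cmd.append("--reboot")
    return cmd


# Node address is illustrative.
print(reset_command("172.20.0.2", single_node_cluster=True, reboot=True))
```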

7.17 - Storage

Talos is known to work with Rook and NFS.

Rook

We recommend at least Rook v1.5.

NFS

The NFS client is part of the kubelet image maintained by the Talos team. This means that the version installed in your running kubelet is the version of NFS supported by Talos.

7.18 - Troubleshooting Control Plane

Troubleshoot control plane failures for running cluster and bootstrap process.

This guide is written as a series of topics with detailed answers for each topic. It starts with the basics of the control plane and goes into Talos specifics.

This document applies mostly to the Talos 0.9 control plane based on static pods. If Talos was upgraded from version 0.8, it might still be running the self-hosted control plane; the current status can be checked with the command talosctl get bootstrapstatus:

$ talosctl -n <IP> get bs
NODE         NAMESPACE   TYPE              ID              VERSION   SELF HOSTED
172.20.0.2   runtime     BootstrapStatus   control-plane   1         false

In this guide we assume that Talos client config is available and Talos API access is available. Kubernetes client configuration can be pulled from control plane nodes with talosctl -n <IP> kubeconfig (this command works before Kubernetes is fully booted).

What is a control plane node?

Talos nodes which have .machine.type of init and controlplane are control plane nodes.

The only difference between init and controlplane nodes is that init node automatically bootstraps a single-node etcd cluster on a first boot if the etcd data directory is empty. A node with type init can be replaced with a controlplane node which is triggered to run etcd bootstrap with talosctl --nodes <IP> bootstrap command.

Use of init type nodes is discouraged, as it might lead to split-brain scenario if one node in existing cluster is reinstalled while config type is still init.

It is critical to make sure only one control plane node runs in bootstrap mode (either with node type init or via bootstrap API/talosctl bootstrap), as having more than one node in bootstrap mode leads to a split-brain scenario (multiple etcd clusters are built instead of a single cluster).

What is special about control plane node?

Control plane nodes in Talos run etcd which provides data store for Kubernetes and Kubernetes control plane components (kube-apiserver, kube-controller-manager and kube-scheduler).

Control plane nodes are tainted by default to prevent workloads from being scheduled to control plane nodes.

How many control plane nodes should be deployed?

With a single control plane node, cluster is not HA: if that single node experiences hardware failure, cluster control plane is broken and can’t be recovered. Single control plane node clusters are still used as test clusters and in edge deployments, but it should be noted that this setup is not HA.

The number of control plane nodes should be odd (1, 3, 5, …), as an even number of nodes does not improve etcd failure tolerance: e.g. with 2 control plane nodes quorum is 2, so failure of either node breaks quorum, making this setup almost equivalent to a single control plane node cluster.

With three control plane nodes cluster can tolerate a failure of any single control plane node. With five control plane nodes cluster can tolerate failure of any two control plane nodes.
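The numbers above follow from the majority rule used by etcd: quorum is more than half of the members. A minimal Python sketch of the arithmetic (the function names are made up for this example):

```python
def quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in range(1, 6):
    print(f"{n} control plane node(s): quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes raises quorum from 2 to 3 but still tolerates only one failure, which is why even sizes bring no benefit.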

What is control plane endpoint?

Kubernetes requires having a control plane endpoint which points to any healthy API server running on a control plane node. Control plane endpoint is specified as URL like https://endpoint:6443/. At any point in time, even during failures control plane endpoint should point to a healthy API server instance. As kube-apiserver runs with host network, control plane endpoint should point to one of the control plane node IPs: node1:6443, node2:6443, …

For single control plane node clusters, control plane endpoint might be https://IP:6443/ or https://DNS:6443/, where IP is the IP of the control plane node and DNS points to IP. The DNS form of the endpoint allows the control plane IP address to change over time without changing the endpoint.

For HA clusters, the control plane endpoint can be implemented as:

TCP (L4) load balancer with active health checks against port 6443
round-robin DNS with active health checks against port 6443
BGP anycast IP with health checks
virtual shared L2 IP

It is critical that control plane endpoint works correctly during cluster bootstrap phase, as nodes discover each other using control plane endpoint.
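A quick way to rule out basic connectivity problems is to check that the endpoint accepts TCP connections on port 6443. A minimal Python sketch; the function name is hypothetical, and a successful connection only shows that something is listening, not that the API server behind it is healthy:

```python
import socket


def endpoint_accepts_tcp(host, port=6443, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, endpoint_accepts_tcp("endpoint") probes the host name used in https://endpoint:6443/.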

kubelet is not running on control plane node

Service kubelet should be running on control plane node as soon as networking is configured:

$ talosctl -n <IP> service kubelet
NODE     172.20.0.2
ID       kubelet
STATE    Running
HEALTH   OK
EVENTS   [Running]: Health check successful (2m54s ago)
         [Running]: Health check failed: Get "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused (3m4s ago)
         [Running]: Started task kubelet (PID 2334) for container kubelet (3m6s ago)
         [Preparing]: Creating service runner (3m6s ago)
         [Preparing]: Running pre state (3m15s ago)
         [Waiting]: Waiting for service "timed" to be "up" (3m15s ago)
         [Waiting]: Waiting for service "cri" to be "up", service "timed" to be "up" (3m16s ago)
         [Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m18s ago)

If kubelet is not running, it might be caused by wrong configuration; check kubelet logs with talosctl logs:

$ talosctl -n <IP> logs kubelet
172.20.0.2: I0305 20:45:07.756948    2334 controller.go:101] kubelet config controller: starting controller
172.20.0.2: I0305 20:45:07.756995    2334 controller.go:267] kubelet config controller: ensuring filesystem is set up correctly
172.20.0.2: I0305 20:45:07.757000    2334 fsstore.go:59] kubelet config controller: initializing config checkpoints directory "/etc/kubernetes/kubelet/store"

etcd is not running on bootstrap node

etcd should be running on bootstrap node immediately (bootstrap node is either init node or controlplane node after talosctl bootstrap command was issued). When the node boots for the first time, the etcd data directory /var/lib/etcd is empty and Talos launches etcd in a mode to build the initial cluster of a single node. At this point /var/lib/etcd becomes non-empty and etcd runs as usual.

If etcd is not running, check service etcd state:

$ talosctl -n <IP> service etcd
NODE     172.20.0.2
ID       etcd
STATE    Running
HEALTH   OK
EVENTS   [Running]: Health check successful (3m21s ago)
         [Running]: Started task etcd (PID 2343) for container etcd (3m26s ago)
         [Preparing]: Creating service runner (3m26s ago)
         [Preparing]: Running pre state (3m26s ago)
         [Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m26s ago)

If service is stuck in Preparing state for bootstrap node, it might be related to slow network - at this stage Talos pulls etcd image from the container registry.

If etcd service is crashing and restarting, check service logs with talosctl -n <IP> logs etcd. Most common reasons for crashes are:

wrong arguments passed via extraArgs in the configuration;
booting Talos on non-empty disk with previous Talos installation, /var/lib/etcd contains data from old cluster.

etcd is not running on non-bootstrap control plane node

Service etcd on non-bootstrap control plane node waits for Kubernetes to boot successfully on bootstrap node to find other peers to build a cluster. As soon as bootstrap node boots Kubernetes control plane components, and kubectl get endpoints returns IP of bootstrap control plane node, other control plane nodes will start joining the cluster followed by Kubernetes control plane components on each control plane node.

Kubernetes static pod definitions are not generated

Talos should write down static pod definitions for the Kubernetes control plane:

$ talosctl -n <IP> ls /etc/kubernetes/manifests
NODE         NAME
172.20.0.2   .
172.20.0.2   talos-kube-apiserver.yaml
172.20.0.2   talos-kube-controller-manager.yaml
172.20.0.2   talos-kube-scheduler.yaml

If static pod definitions are not rendered, check etcd and kubelet service health (see above), and controller runtime logs (talosctl logs controller-runtime).

Talos prints error `an error on the server ("") has prevented the request from succeeding`

This is expected during initial cluster bootstrap and sometimes after a reboot:

[   70.093289] [talos] task labelNodeAsMaster (1/1): starting
[   80.094038] [talos] retrying error: an error on the server ("") has prevented the request from succeeding (get nodes talos-default-master-1)

Initially kube-apiserver component is not running yet, and it takes some time before it becomes fully up during bootstrap (image should be pulled from the Internet, etc.). Once control plane endpoint is up, Talos should proceed.

If Talos doesn’t proceed further, it might be a configuration issue.

In any case, status of control plane components can be checked with talosctl containers -k:

$ talosctl -n <IP> containers --kubernetes
NODE         NAMESPACE   ID                                                                                      IMAGE                                        PID    STATUS
172.20.0.2   k8s.io      kube-system/kube-apiserver-talos-default-master-1                                       k8s.gcr.io/pause:3.2                         2539   SANDBOX_READY
172.20.0.2   k8s.io      └─ kube-system/kube-apiserver-talos-default-master-1:kube-apiserver                     k8s.gcr.io/kube-apiserver:v1.20.4            2572   CONTAINER_RUNNING

If kube-apiserver shows as CONTAINER_EXITED, it might have exited due to a configuration error. Logs can be checked with talosctl logs --kubernetes (or with -k as a shorthand):

$ talosctl -n <IP> logs -k kube-system/kube-apiserver-talos-default-master-1:kube-apiserver
172.20.0.2: 2021-03-05T20:46:13.133902064Z stderr F 2021/03/05 20:46:13 Running command:
172.20.0.2: 2021-03-05T20:46:13.133933824Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.2: 2021-03-05T20:46:13.133938524Z stderr F Run from directory:
172.20.0.2: 2021-03-05T20:46:13.13394154Z stderr F Executable path: /usr/local/bin/kube-apiserver
...
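When many containers are listed, the exited ones can be picked out of the talosctl containers -k output with a few lines of Python. This is a sketch that assumes the whitespace-separated column layout shown above (status last, container ID fourth from the end); the CONTAINER_EXITED line in the sample is invented for illustration:

```python
def exited_containers(output):
    """Return the IDs of containers whose STATUS column is CONTAINER_EXITED."""
    exited = []
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: NODE NAMESPACE [tree marker] ID IMAGE PID STATUS
        if len(fields) >= 6 and fields[-1] == "CONTAINER_EXITED":
            exited.append(fields[-4])
    return exited


# Sample output; the second container line is hypothetical.
sample = """\
NODE         NAMESPACE   ID   IMAGE   PID    STATUS
172.20.0.2   k8s.io      kube-system/kube-apiserver-talos-default-master-1   k8s.gcr.io/pause:3.2   2539   SANDBOX_READY
172.20.0.2   k8s.io      └─ kube-system/kube-apiserver-talos-default-master-1:kube-apiserver   k8s.gcr.io/kube-apiserver:v1.20.4   0   CONTAINER_EXITED
"""
print(exited_containers(sample))
```

Each returned ID can be passed to talosctl logs -k to inspect why the container exited.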

Talos prints error `nodes "talos-default-master-1" not found`

This error means that kube-apiserver is up, and control plane endpoint is healthy, but kubelet hasn’t got its client certificate yet and wasn’t able to register itself.

For the kubelet to get its client certificate, following conditions should apply:

control plane endpoint is healthy (kube-apiserver is running)
bootstrap manifests got successfully deployed (for CSR auto-approval)
kube-controller-manager is running

CSR state can be checked with kubectl get csr:

$ kubectl get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                 CONDITION
csr-jcn9j   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-p6b9q   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-sw6rm   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued
csr-vlghg   14m   kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:q9pyzr   Approved,Issued

Talos prints error `node not ready`

Node in Kubernetes is marked as Ready once CNI is up. It takes a minute or two for the CNI images to be pulled and for the CNI to start. If the node is stuck in this state for too long, check CNI pods and logs with kubectl. CNI resources are usually created in the kube-system namespace. For example, for Talos default Flannel CNI:

$ kubectl -n kube-system get pods
NAME                                             READY   STATUS    RESTARTS   AGE
...
kube-flannel-25drx                               1/1     Running   0          23m
kube-flannel-8lmb6                               1/1     Running   0          23m
kube-flannel-gl7nx                               1/1     Running   0          23m
kube-flannel-jknt9                               1/1     Running   0          23m
...

Talos prints error `x509: certificate signed by unknown authority`

Full error might look like:

x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

The most common cause is that the control plane endpoint points to a different cluster, so the client certificate generated by Talos doesn’t match the CA of the cluster at the control plane endpoint.

etcd is running on bootstrap node, but stuck in `pre` state on non-bootstrap nodes

Please see question etcd is not running on non-bootstrap control plane node.

Checking `kube-controller-manager` and `kube-scheduler`

If the control plane endpoint is up, the status of the pods can be checked with kubectl:

$ kubectl get pods -n kube-system -l k8s-app=kube-controller-manager
NAME                                             READY   STATUS    RESTARTS   AGE
kube-controller-manager-talos-default-master-1   1/1     Running   0          28m
kube-controller-manager-talos-default-master-2   1/1     Running   0          28m
kube-controller-manager-talos-default-master-3   1/1     Running   0          28m

If the control plane endpoint is not up yet, the container status can be queried with talosctl containers --kubernetes:

$ talosctl -n <IP> c -k
NODE         NAMESPACE   ID                                                                                      IMAGE                                        PID    STATUS
...
172.20.0.2   k8s.io      kube-system/kube-controller-manager-talos-default-master-1                              k8s.gcr.io/pause:3.2                         2547   SANDBOX_READY
172.20.0.2   k8s.io      └─ kube-system/kube-controller-manager-talos-default-master-1:kube-controller-manager   k8s.gcr.io/kube-controller-manager:v1.20.4   2580   CONTAINER_RUNNING
172.20.0.2   k8s.io      kube-system/kube-scheduler-talos-default-master-1                                       k8s.gcr.io/pause:3.2                         2638   SANDBOX_READY
172.20.0.2   k8s.io      └─ kube-system/kube-scheduler-talos-default-master-1:kube-scheduler                     k8s.gcr.io/kube-scheduler:v1.20.4            2670   CONTAINER_RUNNING
...

If some of the containers are not running, the image might still be downloading. Otherwise, the process might be crashing; in that case, the logs can be checked with talosctl logs --kubernetes <containerID>:

$ talosctl -n <IP> logs -k kube-system/kube-controller-manager-talos-default-master-1:kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291667526Z stderr F 2021/03/09 13:59:34 Running command:
172.20.0.3: 2021-03-09T13:59:34.291702262Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.3: 2021-03-09T13:59:34.291707121Z stderr F Run from directory:
172.20.0.3: 2021-03-09T13:59:34.291710908Z stderr F Executable path: /usr/local/bin/kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291719163Z stderr F Args (comma-delimited): /usr/local/bin/kube-controller-manager,--allocate-node-cidrs=true,--cloud-provider=,--cluster-cidr=10.244.0.0/16,--service-cluster-ip-range=10.96.0.0/12,--cluster-signing-cert-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--cluster-signing-key-file=/system/secrets/kubernetes/kube-controller-manager/ca.key,--configure-cloud-routes=false,--kubeconfig=/system/secrets/kubernetes/kube-controller-manager/kubeconfig,--leader-elect=true,--root-ca-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--service-account-private-key-file=/system/secrets/kubernetes/kube-controller-manager/service-account.key,--profiling=false
172.20.0.3: 2021-03-09T13:59:34.293870359Z stderr F 2021/03/09 13:59:34 Now listening for interrupts
172.20.0.3: 2021-03-09T13:59:34.761113762Z stdout F I0309 13:59:34.760982      10 serving.go:331] Generated self-signed cert in-memory
...

Checking controller runtime logs

Talos runs a set of controllers which operate on resources to build and support the Kubernetes control plane.

Some debugging information can be queried from the controller logs with talosctl logs controller-runtime:

$ talosctl -n <IP> logs controller-runtime
172.20.0.2: 2021/03/09 13:57:11  secrets.EtcdController: controller starting
172.20.0.2: 2021/03/09 13:57:11  config.MachineTypeController: controller starting
172.20.0.2: 2021/03/09 13:57:11  k8s.ManifestApplyController: controller starting
172.20.0.2: 2021/03/09 13:57:11  v1alpha1.BootstrapStatusController: controller starting
172.20.0.2: 2021/03/09 13:57:11  v1alpha1.TimeStatusController: controller starting
...

Controllers run a reconcile loop, so they might start, fail, and restart repeatedly; this is expected behavior. Things to look for:

v1alpha1.BootstrapStatusController: bootkube initialized status not found: control plane is not self-hosted, running with static pods.

k8s.KubeletStaticPodController: writing static pod "/etc/kubernetes/manifests/talos-kube-apiserver.yaml": static pod definitions were rendered successfully.

k8s.ManifestApplyController: controller failed: error creating mapping for object /v1/Secret/bootstrap-token-q9pyzr: an error on the server ("") has prevented the request from succeeding: control plane endpoint is not up yet, bootstrap manifests can’t be injected, controller is going to retry.

k8s.KubeletStaticPodController: controller failed: error refreshing pod status: error fetching pod status: an error on the server ("Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)") has prevented the request from succeeding: kubelet hasn’t been able to contact kube-apiserver yet to push pod status, controller is going to retry.

k8s.ManifestApplyController: created rbac.authorization.k8s.io/v1/ClusterRole/psp:privileged: one of the bootstrap manifests got successfully applied.

secrets.KubernetesController: controller failed: missing cluster.aggregatorCA secret: Talos is running with 0.8 configuration, if the cluster was upgraded from 0.8, this is expected, and conversion process will fix machine config automatically. If this cluster was bootstrapped with version 0.9, machine configuration should be regenerated with 0.9 talosctl.

If there are no new messages in the controller-runtime log, the controllers have finished reconciling successfully.
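
When the log is long, it can help to filter it down to failure messages before reading it. The following is a minimal sketch, not part of Talos: the helper name `controller_failures` is hypothetical, and it assumes the `<node>: <date> <time>  <controller>: <message>` line layout shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not a talosctl feature): read controller-runtime
# log lines on stdin and count "controller failed" messages per controller.
# Assumes the controller name is the fourth whitespace-separated field,
# as in the sample log output.
controller_failures() {
  grep 'controller failed' |
    awk '{ sub(/:$/, "", $4); count[$4]++ }
         END { for (name in count) print count[name], name }' |
    sort -rn
}

# Usage (requires access to a node):
#   talosctl -n <IP> logs controller-runtime | controller_failures
```

A controller that appears here once or twice during startup is likely just retrying; a count that keeps growing across repeated runs is a better hint that something is actually stuck.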

Checking static pod definitions

Talos generates static pod definitions for the kube-apiserver, kube-controller-manager, and kube-scheduler components based on the machine configuration. These definitions can be checked as resources with talosctl get staticpods:

$ talosctl -n <IP> get staticpods -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: StaticPods.kubernetes.talos.dev
    id: kube-apiserver
    version: 2
    phase: running
    finalizers:
        - k8s.StaticPodStatus("kube-apiserver")
spec:
    apiVersion: v1
    kind: Pod
    metadata:
        annotations:
            talos.dev/config-version: "1"
            talos.dev/secrets-version: "1"
        creationTimestamp: null
        labels:
            k8s-app: kube-apiserver
            tier: control-plane
        name: kube-apiserver
        namespace: kube-system
...

The status of the static pods can be queried with talosctl get staticpodstatus:

$ talosctl -n <IP> get staticpodstatus
NODE         NAMESPACE      TYPE              ID                                                           VERSION   READY
172.20.0.2   controlplane   StaticPodStatus   kube-system/kube-apiserver-talos-default-master-1            1         True
172.20.0.2   controlplane   StaticPodStatus   kube-system/kube-controller-manager-talos-default-master-1   1         True
172.20.0.2   controlplane   StaticPodStatus   kube-system/kube-scheduler-talos-default-master-1            1         True

The most important status is Ready, printed in the last column. The complete status can be fetched by adding the -o yaml flag.
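
To spot problem pods quickly, the table can be filtered on the READY column. This is only a sketch under the assumption that the output has the six columns shown above with READY last; the helper name `not_ready_static_pods` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by
# `talosctl get staticpodstatus` on stdin and print the ID (4th column)
# of every static pod whose READY value (last column) is not "True".
# The header line and empty lines are skipped.
not_ready_static_pods() {
  awk 'NR > 1 && NF > 0 && $NF != "True" { print $4 }'
}

# Usage (requires access to a node):
#   talosctl -n <IP> get staticpodstatus | not_ready_static_pods
```

An empty result means every listed static pod reports Ready.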

Checking bootstrap manifests

As part of the bootstrap process, Talos injects bootstrap manifests into the Kubernetes API server. There are two kinds of manifests: system manifests built into Talos, and extra manifests which are downloaded (custom CNI, extra manifests in the machine config):

$ talosctl -n <IP> get manifests --namespace=controlplane
NODE         NAMESPACE      TYPE       ID                               VERSION
172.20.0.2   controlplane   Manifest   00-kubelet-bootstrapping-token   1
172.20.0.2   controlplane   Manifest   01-csr-approver-role-binding     1
172.20.0.2   controlplane   Manifest   01-csr-node-bootstrap            1
172.20.0.2   controlplane   Manifest   01-csr-renewal-role-binding      1
172.20.0.2   controlplane   Manifest   02-kube-system-sa-role-binding   1
172.20.0.2   controlplane   Manifest   03-default-pod-security-policy   1
172.20.0.2   controlplane   Manifest   10-kube-proxy                    1
172.20.0.2   controlplane   Manifest   11-core-dns                      1
172.20.0.2   controlplane   Manifest   11-core-dns-svc                  1
172.20.0.2   controlplane   Manifest   11-kube-config-in-cluster        1

$ talosctl -n <IP> get manifests --namespace=extras
NODE         NAMESPACE   TYPE       ID                                                        VERSION
172.20.0.2   extras      Manifest   05-https://docs.projectcalico.org/manifests/calico.yaml   1

Details of each manifest can be queried by adding -o yaml:

$ talosctl -n <IP> get manifests 01-csr-approver-role-binding --namespace=controlplane -o yaml
node: 172.20.0.2
metadata:
    namespace: controlplane
    type: Manifests.kubernetes.talos.dev
    id: 01-csr-approver-role-binding
    version: 1
    phase: running
spec:
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: system-bootstrap-approve-node-client-csr
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
      subjects:
        - apiGroup: rbac.authorization.k8s.io
          kind: Group
          name: system:bootstrappers

Worker node is stuck with `apid` health check failures

Control plane nodes have enough secret material to generate apid server certificates, but worker nodes depend on the control plane trustd services to generate certificates. Worker nodes wait for the kubelet to join the cluster, then apid queries Kubernetes endpoints via the control plane endpoint to find trustd endpoints, and uses trustd to issue the certificate.

So if apid health checks are failing on a worker node:

make sure control plane endpoint is healthy
check that worker node kubelet joined the cluster

7.19 - Upgrading Kubernetes

This guide covers upgrading the Kubernetes control plane for clusters running a Talos-managed control plane. If the cluster is still running a self-hosted control plane (after an upgrade from Talos 0.8), please refer to the 0.8 docs.

Automated Kubernetes Upgrade

To upgrade from Kubernetes v1.20.1 to v1.20.4 run:

$ talosctl --nodes <master node> upgrade-k8s --from 1.20.1 --to 1.20.4
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:55:01 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:55:05 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:55:07 retrying error: config version mismatch: got "2", expected "3"
updating "kube-controller-manager" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:55:27 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:55:47 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:56:07 retrying error: config version mismatch: got "2", expected "3"
updating "kube-scheduler" to version "1.20.4"
 > updating node "172.20.0.2"
2021/03/09 19:56:27 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.3"
2021/03/09 19:56:47 retrying error: config version mismatch: got "2", expected "3"
 > updating node "172.20.0.4"
2021/03/09 19:57:08 retrying error: config version mismatch: got "2", expected "3"
updating daemonset "kube-proxy" to version "1.20.4"

The script runs in two phases:

In the first phase, the machine configuration of every control plane node is patched with the new image version for each control plane component. On a configuration update, Talos renders a new static pod definition, which is picked up by the kubelet. The script waits for the change to propagate to the API server state. The config version mismatch messages indicate that the script is waiting for the updated container to be registered in the API server.
In the second phase, the script updates the kube-proxy daemonset with the new image version.

If the script fails for any reason, it can be safely restarted to continue the upgrade process.

Manual Kubernetes Upgrade

Kubernetes can be upgraded manually as well by following the steps outlined below. They are equivalent to the steps performed by the talosctl upgrade-k8s command.

Kubeconfig

In order to edit the control plane, we will need a working kubectl config. If you don’t already have one, you can get one by running:

talosctl --nodes <master node> kubeconfig

API Server

Patch the machine configuration using the talosctl patch command:

$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v1.20.4"}]'
patched mc at the node 172.20.0.2

The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.apiServer.image key.
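
One possible adjustment, sketched here as an assumption rather than as the documented procedure: in JSON Patch (RFC 6902), `replace` requires the target key to exist, while `add` creates the key if it is missing, provided the parent object exists. The helper `image_patch` below is hypothetical and only builds the patch string.

```shell
#!/bin/sh
# Hypothetical helper: print a single-operation JSON patch (RFC 6902).
#   $1 = operation: "replace" (key must exist) or "add" (key may be missing)
#   $2 = JSON pointer to the image key
#   $3 = image reference
# Values are inserted as-is, so they must not contain double quotes.
image_patch() {
  printf '[{"op": "%s", "path": "%s", "value": "%s"}]\n' "$1" "$2" "$3"
}

# Usage (requires access to a node):
#   talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate \
#     -p "$(image_patch add /cluster/apiServer/image k8s.gcr.io/kube-apiserver:v1.20.4)"
```

If the parent object (here `.cluster.apiServer`) is missing as well, an `add` on the image key will not work and the whole parent object has to be added instead.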

The machine configuration can also be edited manually with talosctl -n <IP> edit mc --immediate.

Capture the new version of the kube-apiserver config with:

$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-apiserver -o yaml
node: 172.20.0.2
metadata:
    namespace: config
    type: KubernetesControlPlaneConfigs.config.talos.dev
    id: kube-apiserver
    version: 5
    phase: running
spec:
    image: k8s.gcr.io/kube-apiserver:v1.20.4
    cloudProvider: ""
    controlPlaneEndpoint: https://172.20.0.1:6443
    etcdServers:
        - https://127.0.0.1:2379
    localPort: 6443
    serviceCIDR: 10.96.0.0/12
    extraArgs: {}
    extraVolumes: []

In this example, the new version is 5. Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):

$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5

Check that the pod is running:

$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1
NAME                                    READY   STATUS    RESTARTS   AGE
kube-apiserver-talos-default-master-1   1/1     Running   0          16m

Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.
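
The wait step above can be scripted as a polling loop around the same kubectl query. This is a minimal sketch; the function name `wait_for_config_version`, the default of 60 attempts, and the 5 second delay are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll the talos.dev/config-version annotation of a
# control plane pod until it matches the expected version.
#   $1 = k8s-app label value (e.g. kube-apiserver)
#   $2 = node name
#   $3 = expected config version
#   $4 = maximum attempts (assumed default: 60)
#   $5 = seconds between attempts (assumed default: 5)
wait_for_config_version() {
  app=$1 node=$2 want=$3 attempts=${4:-60} delay=${5:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing query (e.g. API server restarting) counts as "not yet".
    got=$(kubectl get pod -n kube-system -l "k8s-app=$app" \
      --field-selector "spec.nodeName=$node" \
      -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' \
      2>/dev/null) || got=""
    if [ "$got" = "$want" ]; then
      echo "$app on $node is at config version $got"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $app on $node to reach config version $want" >&2
  return 1
}

# Usage (requires a working kubeconfig):
#   wait_for_config_version kube-apiserver talos-default-master-1 5
```

The same function can be reused for kube-controller-manager and kube-scheduler by changing the first argument and the expected version.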

Controller Manager

Patch the machine configuration using the talosctl patch command:

$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "k8s.gcr.io/kube-controller-manager:v1.20.4"}]'
patched mc at the node 172.20.0.2

The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.controllerManager.image key.

Capture the new version of the kube-controller-manager config with:

$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-controller-manager -o yaml
node: 172.20.0.2
metadata:
    namespace: config
    type: KubernetesControlPlaneConfigs.config.talos.dev
    id: kube-controller-manager
    version: 3
    phase: running
spec:
    image: k8s.gcr.io/kube-controller-manager:v1.20.4
    cloudProvider: ""
    podCIDR: 10.244.0.0/16
    serviceCIDR: 10.96.0.0/12
    extraArgs: {}
    extraVolumes: []

In this example, the new version is 3. Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):

$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3

Check that the pod is running:

$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1
NAME                                             READY   STATUS    RESTARTS   AGE
kube-controller-manager-talos-default-master-1   1/1     Running   0          35m

Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.

Scheduler

Patch the machine configuration using the talosctl patch command:

$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --immediate -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "k8s.gcr.io/kube-scheduler:v1.20.4"}]'
patched mc at the node 172.20.0.2

The JSON patch might need to be adjusted if the current machine configuration is missing the .cluster.scheduler.image key.

Capture the new version of the kube-scheduler config with:

$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-scheduler -o yaml
node: 172.20.0.2
metadata:
    namespace: config
    type: KubernetesControlPlaneConfigs.config.talos.dev
    id: kube-scheduler
    version: 3
    phase: running
spec:
    image: k8s.gcr.io/kube-scheduler:v1.20.4
    extraArgs: {}
    extraVolumes: []

In this example, the new version is 3. Wait for the new pod definition to propagate to the API server state (replace talos-default-master-1 with the node name):

$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3

Check that the pod is running:

$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1
NAME                                    READY   STATUS    RESTARTS   AGE
kube-scheduler-talos-default-master-1   1/1     Running   0          39m

Repeat this process for every control plane node, verifying that state got propagated successfully between each node update.

Proxy

In the proxy’s DaemonSet, change:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: k8s.gcr.io/kube-proxy:v1.20.1
      tolerations:
        - ...

to:

kind: DaemonSet
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: kube-proxy
          image: k8s.gcr.io/kube-proxy:v1.20.4
      tolerations:
        - ...
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

To edit the DaemonSet, run:

kubectl edit daemonsets -n kube-system kube-proxy
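
As a non-interactive alternative to editing by hand, the same two changes could be applied with `kubectl set image` and a JSON patch. This is a sketch, not the procedure used by talosctl upgrade-k8s; the function name `update_kube_proxy` is hypothetical, and it assumes the DaemonSet already has a tolerations list, as in the manifest above.

```shell
#!/bin/sh
# Hypothetical helper: update the kube-proxy image and append the
# control-plane toleration shown above.
#   $1 = new kube-proxy image reference
# Note: the "add" operation with path ".../tolerations/-" appends to the
# list each time it runs, so run this only once per upgrade.
update_kube_proxy() {
  image=$1
  kubectl -n kube-system set image daemonset/kube-proxy "kube-proxy=$image" &&
    kubectl -n kube-system patch daemonset kube-proxy --type json -p \
      '[{"op": "add", "path": "/spec/template/spec/tolerations/-", "value": {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"}}]'
}

# Usage (requires a working kubeconfig):
#   update_kube_proxy k8s.gcr.io/kube-proxy:v1.20.4
```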

Kubelet

Upgrading the kubelet version requires a Talos node reboot after the machine configuration change.

For every node, patch the machine configuration with the new kubelet version and wait for the node to reboot:

$ talosctl -n <IP> patch mc -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/talos-systems/kubelet:v1.20.4"}]'
patched mc at the node 172.20.0.2

Once the node boots with the new configuration, confirm the upgrade with kubectl get nodes <name>:

$ kubectl get nodes talos-default-master-1
NAME                     STATUS   ROLES                  AGE    VERSION
talos-default-master-1   Ready    control-plane,master   123m   v1.20.4

7.20 - Upgrading Talos

Talos upgrades are effected by an API call. The talosctl CLI utility will facilitate this.

Upgrading from Talos 0.8

Talos 0.9 drops support for bootkube and the self-hosted control plane.

Please make sure Talos is upgraded to the latest minor release of 0.8 first (0.8.4 at the moment of this writing), then proceed with upgrading to the latest minor release of 0.9.

Before Upgrade to 0.9

If the cluster was bootstrapped on a Talos version < 0.8.3, add checkpointer annotations to the kube-scheduler and kube-controller-manager daemonsets to improve the resiliency of the self-hosted control plane to reboots (this is critical for single control-plane node clusters):

$ kubectl -n kube-system patch daemonset kube-controller-manager --type json -p '[{"op": "add", "path":"/spec/template/metadata/annotations", "value": {"checkpointer.alpha.coreos.com/checkpoint": "true"}}]'
daemonset.apps/kube-controller-manager patched
$ kubectl -n kube-system patch daemonset kube-scheduler --type json -p '[{"op": "add", "path":"/spec/template/metadata/annotations", "value": {"checkpointer.alpha.coreos.com/checkpoint": "true"}}]'
daemonset.apps/kube-scheduler patched

Talos 0.9 only supports Kubernetes versions 1.19.x and 1.20.x. If running 1.18.x, please upgrade Kubernetes before upgrading Talos.

Make sure the cluster is running the latest minor release of Talos 0.8.

Prepare by downloading the talosctl binary for Talos release 0.9.x.

After Upgrade to 0.9

After the upgrade to 0.9, Talos will still be running the self-hosted control plane until the conversion process is run.

Note: Talos 0.9 doesn’t include the bootkube recovery option (talosctl recover), so it’s not possible to recover a self-hosted control plane after upgrading to 0.9.

As soon as all the nodes have been upgraded to 0.9, run talosctl convert-k8s to convert the control plane to the new static pod format for 0.9.

Once the conversion process is complete, Kubernetes can be upgraded.

`talosctl` Upgrade

To manually upgrade a Talos node, you will specify the node’s IP address and the installer container image for the version of Talos to which you wish to upgrade.

For instance, if your Talos node has the IP address 10.20.30.40 and you want to install the official version v0.9.0, you would enter a command such as:

  $ talosctl upgrade --nodes 10.20.30.40 \
      --image ghcr.io/talos-systems/installer:v0.9.0

There is an option to this command: --preserve, which can be used to explicitly tell Talos to either keep intact its ephemeral data or not. In most cases, it is correct to just let Talos perform its default action. However, if you are running a single-node control-plane, you will want to make sure that --preserve=true.

If Talos fails to run the upgrade, the --stage flag may be used: the upgrade is then performed after a reboot, which is followed by another reboot into the upgraded version.
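
The flags described above are passed alongside the node and image arguments. As an illustration only, the wrapper below (the name `upgrade_node` is hypothetical) assembles the command from a node IP and a version tag and forwards any extra flags unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around `talosctl upgrade` as described above.
#   $1 = node IP address
#   $2 = Talos version tag (e.g. v0.9.0)
#   remaining arguments = extra flags such as --preserve=true or --stage
upgrade_node() {
  node=$1 version=$2
  shift 2
  talosctl upgrade --nodes "$node" \
    --image "ghcr.io/talos-systems/installer:$version" "$@"
}

# Usage (requires access to the node):
#   upgrade_node 10.20.30.40 v0.9.0 --preserve=true   # single-node control plane
#   upgrade_node 10.20.30.40 v0.9.0 --stage           # retry as a staged upgrade
```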

Machine Configuration Changes

Talos 0.9 introduces new required parameters in the machine configuration:

.cluster.aggregatorCA
.cluster.serviceAccount

Talos supports both ECDSA and RSA certificates and keys for Kubernetes and etcd, with ECDSA being the default. Talos <= 0.8 supports only RSA keys and certificates.

By default, talosctl gen config generates configuration in the 0.9 format, which is not compatible with Talos 0.8. The old format can be generated with talosctl gen config --talos-version=v0.8.

7.21 - Virtual (shared) IP

One of the biggest pain points when building a high-availability controlplane is giving clients a single IP or URL at which they can reach any of the controlplane nodes. The most common approaches all require external resources: reverse proxy, load balancer, BGP, and DNS.

Using a “Virtual” IP address, on the other hand, provides high availability without external coordination or resources, so long as the controlplane members share a layer 2 network. In practical terms, this means that they are all connected via a switch, with no router in between them.

The term “virtual” is misleading here. The IP address is real, and it is assigned to an interface. Instead, what actually happens is that the controlplane machines vie for control of the shared IP address. There can be only one owner of the IP address at any given time, but if that owner disappears or becomes non-responsive, another owner will be chosen, and it will take up the mantle: the IP address.

Talos has (as of version 0.9) built-in support for this form of shared IP address, and it can utilize this for both the Kubernetes API server and the Talos endpoint set. Talos uses etcd for elections and leadership (control) of the IP address.

Choose your Shared IP

To begin with, you should choose your shared IP address. It should generally be a reserved, unused IP address in the same subnet as your controlplane nodes. It should not be assigned or assignable by your DHCP server.

For our example, we will assume that the controlplane nodes have the following IP addresses:

192.168.0.10
192.168.0.11
192.168.0.12

We then choose our shared IP to be:

192.168.0.15

Configure your Talos Machines

The shared IP setting is only valid for controlplane nodes.

For the example above, each of the controlplane nodes should have the following Machine Config snippet:

machine:
  network:
    interfaces:
    - interface: eth0
      dhcp: true
      vip:
        ip: 192.168.0.15

Obviously, for your own environment, the interface and the DHCP setting may differ. You are free to use static addressing (cidr) instead of DHCP.

Caveats

In general, the shared IP should just work. However, since it relies on etcd for elections, the shared IP will not come alive until after you have bootstrapped Kubernetes. Usually this is not a problem, but it does mean that you cannot use the shared IP when issuing the talosctl bootstrap command. Instead, that command will need to target one of the controlplane nodes directly.

8 - Reference

8.1 - API

Talos gRPC API reference.

common/common.proto
health/health.proto
inspect/inspect.proto
machine/machine.proto
network/network.proto
resource/resource.proto
security/security.proto
storage/storage.proto
time/time.proto
Scalar Value Types

common/common.proto

Data

Field	Type	Label	Description
metadata	Metadata
bytes	bytes

DataResponse

Field	Type	Label	Description
messages	Data	repeated

Empty

Field	Type	Label	Description
metadata	Metadata

EmptyResponse

Field	Type	Label	Description
messages	Empty	repeated

Error

Field	Type	Label
code	Code
message	string
details	google.protobuf.Any	repeated

Metadata

Common metadata message nested in all reply message types

Field	Type	Description
hostname	string	hostname of the server response comes from (injected by proxy)
error	string	error is set if request failed to the upstream (rest of response is undefined)
status	google.rpc.Status	error as gRPC Status

Code

Name	Number	Description
FATAL	0
LOCKED	1

ContainerDriver

Name	Number	Description
CONTAINERD	0
CRI	1

health/health.proto

HealthCheck

Field	Type	Label	Description
status	HealthCheck.ServingStatus

HealthCheckResponse

Field	Type	Label	Description
messages	HealthCheck	repeated

HealthWatchRequest

Field	Type	Label	Description
interval_seconds	int64

ReadyCheck

Field	Type	Label	Description
status	ReadyCheck.ReadyStatus

ReadyCheckResponse

Field	Type	Label	Description
messages	ReadyCheck	repeated

HealthCheck.ServingStatus

Name	Number	Description
UNKNOWN	0
SERVING	1
NOT_SERVING	2

ReadyCheck.ReadyStatus

Name	Number	Description
UNKNOWN	0
READY	1
NOT_READY	2

Health

Method Name	Request Type	Response Type
Check	.google.protobuf.Empty	HealthCheckResponse
Watch	HealthWatchRequest	HealthCheckResponse stream
Ready	.google.protobuf.Empty	ReadyCheckResponse

inspect/inspect.proto

ControllerDependencyEdge

Field	Type	Label	Description
controller_name	string
edge_type	DependencyEdgeType
resource_namespace	string
resource_type	string
resource_id	string

ControllerRuntimeDependenciesResponse

Field	Type	Label	Description
messages	ControllerRuntimeDependency	repeated

ControllerRuntimeDependency

The ControllerRuntimeDependency message contains the graph of controller-resource dependencies.

Field	Type	Label	Description
metadata	common.Metadata
edges	ControllerDependencyEdge	repeated

DependencyEdgeType

Name	Number	Description
MANAGES	0
STRONG	1
WEAK	2

InspectService

The inspect service definition.

InspectService provides auxiliary API to inspect OS internals.

Method Name	Request Type	Response Type	Description
ControllerRuntimeDependencies	.google.protobuf.Empty	ControllerRuntimeDependenciesResponse

machine/machine.proto

ApplyConfiguration

ApplyConfigurationResponse describes the response to a configuration request.

Field	Type	Label	Description
metadata	common.Metadata

ApplyConfigurationRequest

rpc applyConfiguration

ApplyConfiguration describes a request to assert a new configuration upon a node.

Field	Type	Label	Description
data	bytes
on_reboot	bool
immediate	bool

ApplyConfigurationResponse

Field	Type	Label	Description
messages	ApplyConfiguration	repeated

Bootstrap

The bootstrap message containing the bootstrap status.

Field	Type	Label	Description
metadata	common.Metadata

BootstrapRequest

rpc bootstrap

BootstrapResponse

Field	Type	Label	Description
messages	Bootstrap	repeated

CNIConfig

Field	Type	Label	Description
name	string
urls	string	repeated

CPUInfo

Field	Type	Label
processor	uint32
vendor_id	string
cpu_family	string
model	string
model_name	string
stepping	string
microcode	string
cpu_mhz	double
cache_size	string
physical_id	string
siblings	uint32
core_id	string
cpu_cores	uint32
apic_id	string
initial_apic_id	string
fpu	string
fpu_exception	string
cpu_id_level	uint32
wp	string
flags	string	repeated
bugs	string	repeated
bogo_mips	double
cl_flush_size	uint32
cache_alignment	uint32
address_sizes	string
power_management	string

CPUInfoResponse

Field	Type	Label	Description
messages	CPUsInfo	repeated

CPUStat

Field	Type	Label	Description
user	double
nice	double
system	double
idle	double
iowait	double
irq	double
soft_irq	double
steal	double
guest	double
guest_nice	double

CPUsInfo

Field	Type	Label	Description
metadata	common.Metadata
cpu_info	CPUInfo	repeated

ClusterConfig

Field	Type	Label	Description
name	string
control_plane	ControlPlaneConfig
cluster_network	ClusterNetworkConfig
allow_scheduling_on_masters	bool

ClusterNetworkConfig

Field	Type	Label	Description
dns_domain	string
cni_config	CNIConfig

Container

The messages message containing the requested containers.

Field	Type	Label	Description
metadata	common.Metadata
containers	ContainerInfo	repeated

ContainerInfo

The messages message containing the requested containers.

Field	Type	Label	Description
namespace	string
id	string
image	string
pid	uint32
status	string
pod_id	string
name	string

ContainersRequest

Field	Type	Label	Description
namespace	string
driver	common.ContainerDriver		driver might be default “containerd” or “cri”

ContainersResponse

Field	Type	Label	Description
messages	Container	repeated

ControlPlaneConfig

Field	Type	Label	Description
endpoint	string

CopyRequest

CopyRequest describes a request to copy data out of Talos node

Copy produces .tar.gz archive which is streamed back to the caller

Field	Type	Label	Description
root_path	string		Root path to start copying data out, it might be either a file or directory

DHCPOptionsConfig

Field	Type	Label	Description
route_metric	uint32

DiskStat

Field	Type	Label	Description
name	string
read_completed	uint64
read_merged	uint64
read_sectors	uint64
read_time_ms	uint64
write_completed	uint64
write_merged	uint64
write_sectors	uint64
write_time_ms	uint64
io_in_progress	uint64
io_time_ms	uint64
io_time_weighted_ms	uint64
discard_completed	uint64
discard_merged	uint64
discard_sectors	uint64
discard_time_ms	uint64

DiskStats

Field	Type	Label
metadata	common.Metadata
total	DiskStat
devices	DiskStat	repeated

DiskStatsResponse

Field	Type	Label	Description
messages	DiskStats	repeated

DiskUsageInfo

DiskUsageInfo describes a file or directory’s information for du command

Field	Type	Description
metadata	common.Metadata
name	string	Name is the name (including prefixed path) of the file or directory
size	int64	Size indicates the number of bytes contained within the file
error	string	Error describes any error encountered while trying to read the file information.
relative_name	string	RelativeName is the name of the file or directory relative to the RootPath

DiskUsageRequest

DiskUsageRequest describes a request to list disk usage of directories and regular files

Field	Type	Label	Description
recursion_depth	int32		RecursionDepth indicates how many levels of subdirectories should be recursed. The default (0) indicates that no limit should be enforced.
all	bool		All write sizes for all files, not just directories.
threshold	int64		Threshold exclude entries smaller than SIZE if positive, or entries greater than SIZE if negative.
paths	string	repeated	DiskUsagePaths is the list of directories to calculate disk usage for.
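
The threshold rule above can be read as a simple size filter. The following is a minimal sketch of that rule only, not the Talos implementation; the function name `passes_threshold` is hypothetical, and treating a threshold of 0 as "no filtering" is an assumption.

```python
def passes_threshold(size: int, threshold: int) -> bool:
    """Return True if an entry of `size` bytes survives the threshold filter.

    Hypothetical helper mirroring the documented rule:
      * positive threshold: exclude entries smaller than the threshold
      * negative threshold: exclude entries greater than its absolute value
      * zero: assumed to mean that nothing is excluded
    """
    if threshold > 0:
        return size >= threshold
    if threshold < 0:
        return size <= -threshold
    return True


entries = {"small.log": 10, "large.img": 4096}

# Keep only entries of at least 1024 bytes.
big = [name for name, size in entries.items() if passes_threshold(size, 1024)]

# Keep only entries of at most 1024 bytes.
little = [name for name, size in entries.items() if passes_threshold(size, -1024)]
```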

DmesgRequest

dmesg

Field	Type	Label	Description
follow	bool
tail	bool

EtcdForfeitLeadership

Field	Type	Label	Description
metadata	common.Metadata
member	string

EtcdForfeitLeadershipRequest

EtcdForfeitLeadershipResponse

Field	Type	Label	Description
messages	EtcdForfeitLeadership	repeated

EtcdLeaveCluster

Field	Type	Label	Description
metadata	common.Metadata

EtcdLeaveClusterRequest

EtcdLeaveClusterResponse

Field	Type	Label	Description
messages	EtcdLeaveCluster	repeated

EtcdMemberList

Field	Type	Label	Description
metadata	common.Metadata
members	string	repeated

EtcdMemberListRequest

Field	Type	Label	Description
query_local	bool

EtcdMemberListResponse

Field	Type	Label	Description
messages	EtcdMemberList	repeated

EtcdRemoveMember

Field	Type	Label	Description
metadata	common.Metadata

EtcdRemoveMemberRequest

Field	Type	Label	Description
member	string

EtcdRemoveMemberResponse

Field	Type	Label	Description
messages	EtcdRemoveMember	repeated

Event

Field	Type	Label	Description
metadata	common.Metadata
data	google.protobuf.Any
id	string

EventsRequest

Field	Type	Label	Description
tail_events	int32
tail_id	string
tail_seconds	int32

FileInfo

FileInfo describes a file or directory’s information

Field	Type	Description
metadata	common.Metadata
name	string	Name is the name (including prefixed path) of the file or directory
size	int64	Size indicates the number of bytes contained within the file
mode	uint32	Mode is the bitmap of UNIX mode/permission flags of the file
modified	int64	Modified indicates the UNIX timestamp at which the file was last modified

GenerateConfiguration

GenerateConfiguration describes the response to a generate configuration request.

Field	Type	Label
metadata	common.Metadata
data	bytes	repeated
talosconfig	bytes

GenerateConfigurationRequest

GenerateConfigurationRequest describes a request to generate a new configuration on a node.

Field	Type	Label	Description
config_version	string
cluster_config	ClusterConfig
machine_config	MachineConfig
override_time	google.protobuf.Timestamp

GenerateConfigurationResponse

Field	Type	Label	Description
messages	GenerateConfiguration	repeated

Hostname

Field	Type	Label	Description
metadata	common.Metadata
hostname	string

HostnameResponse

Field	Type	Label	Description
messages	Hostname	repeated

InstallConfig

Field	Type	Label	Description
install_disk	string
install_image	string

ListRequest

ListRequest describes a request to list the contents of a directory.

Field	Type	Label	Description
root	string		Root indicates the root directory for the list. If not indicated, ‘/’ is presumed.
recurse	bool		Recurse indicates that subdirectories should be recursed.
recursion_depth	int32		RecursionDepth indicates how many levels of subdirectories should be recursed. The default (0) indicates that no limit should be enforced.
types	ListRequest.Type	repeated	Types indicates what file type should be returned. If not indicated, all files will be returned.
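
As an illustration of how root, recurse, and recursion_depth interact, here is a minimal local sketch built on `os.walk`. It is not the Talos implementation: the function name `list_paths` is hypothetical, the `types` filter is omitted, and counting the direct children of the root as depth 1 is an assumption.

```python
import os


def list_paths(root="/", recurse=False, recursion_depth=0):
    """Yield paths under `root`, following the documented ListRequest fields.

    Assumptions (not confirmed by the reference): direct children of `root`
    are at depth 1, and recursion_depth only matters when recurse is True.
    """
    root = root or "/"  # an unset root is presumed to be '/'
    base = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of the entries contained directly in dirpath.
        depth = dirpath.rstrip(os.sep).count(os.sep) - base + 1
        for name in sorted(dirnames + filenames):
            yield os.path.join(dirpath, name)
        # Stop descending when recursion is off or the depth limit is reached.
        # A recursion_depth of 0 means that no limit is enforced.
        if not recurse or (recursion_depth and depth >= recursion_depth):
            dirnames[:] = []
```

For example, `list(list_paths("/etc", recurse=True, recursion_depth=2))` would return the entries of `/etc` and of its immediate subdirectories, but nothing deeper.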

LoadAvg

Field	Type	Label	Description
metadata	common.Metadata
load1	double
load5	double
load15	double

LoadAvgResponse

Field	Type	Label	Description
messages	LoadAvg	repeated

LogsRequest

rpc logs The request message containing the process name.

Field	Type	Description
namespace	string
id	string
driver	common.ContainerDriver	driver might be default “containerd” or “cri”
follow	bool
tail_lines	int32

MachineConfig

Field	Type	Label	Description
type	MachineConfig.MachineType
install_config	InstallConfig
network_config	NetworkConfig
kubernetes_version	string

MemInfo

Field	Type	Label	Description
memtotal	uint64
memfree	uint64
memavailable	uint64
buffers	uint64
cached	uint64
swapcached	uint64
active	uint64
inactive	uint64
activeanon	uint64
inactiveanon	uint64
activefile	uint64
inactivefile	uint64
unevictable	uint64
mlocked	uint64
swaptotal	uint64
swapfree	uint64
dirty	uint64
writeback	uint64
anonpages	uint64
mapped	uint64
shmem	uint64
slab	uint64
sreclaimable	uint64
sunreclaim	uint64
kernelstack	uint64
pagetables	uint64
nfsunstable	uint64
bounce	uint64
writebacktmp	uint64
commitlimit	uint64
committedas	uint64
vmalloctotal	uint64
vmallocused	uint64
vmallocchunk	uint64
hardwarecorrupted	uint64
anonhugepages	uint64
shmemhugepages	uint64
shmempmdmapped	uint64
cmatotal	uint64
cmafree	uint64
hugepagestotal	uint64
hugepagesfree	uint64
hugepagesrsvd	uint64
hugepagessurp	uint64
hugepagesize	uint64
directmap4k	uint64
directmap2m	uint64
directmap1g	uint64

Memory

Field	Type	Label	Description
metadata	common.Metadata
meminfo	MemInfo

MemoryResponse

Field	Type	Label	Description
messages	Memory	repeated

MountStat

The message containing the stats for a single mount.

Field	Type	Label	Description
filesystem	string
size	uint64
available	uint64
mounted_on	string

Mounts

The messages message containing the requested df stats.

Field	Type	Label	Description
metadata	common.Metadata
stats	MountStat	repeated

MountsResponse

Field	Type	Label	Description
messages	Mounts	repeated

NetDev

Field	Type	Label	Description
name	string
rx_bytes	uint64
rx_packets	uint64
rx_errors	uint64
rx_dropped	uint64
rx_fifo	uint64
rx_frame	uint64
rx_compressed	uint64
rx_multicast	uint64
tx_bytes	uint64
tx_packets	uint64
tx_errors	uint64
tx_dropped	uint64
tx_fifo	uint64
tx_collisions	uint64
tx_carrier	uint64
tx_compressed	uint64

NetworkConfig

Field	Type	Label	Description
hostname	string
interfaces	NetworkDeviceConfig	repeated

NetworkDeviceConfig

Field	Type	Label
interface	string
cidr	string
mtu	int32
dhcp	bool
ignore	bool
dhcp_options	DHCPOptionsConfig
routes	RouteConfig	repeated

NetworkDeviceStats

Field	Type	Label
metadata	common.Metadata
total	NetDev
devices	NetDev	repeated

NetworkDeviceStatsResponse

Field	Type	Label	Description
messages	NetworkDeviceStats	repeated

PhaseEvent

Field	Type	Label	Description
phase	string
action	PhaseEvent.Action

PlatformInfo

Field	Type	Label	Description
name	string
mode	string

Process

Field	Type	Label	Description
metadata	common.Metadata
processes	ProcessInfo	repeated

ProcessInfo

Field	Type	Label	Description
pid	int32
ppid	int32
state	string
threads	int32
cpu_time	double
virtual_memory	uint64
resident_memory	uint64
command	string
executable	string
args	string

ProcessesRequest

rpc processes

ProcessesResponse

Field	Type	Label	Description
messages	Process	repeated

ReadRequest

Field	Type	Label	Description
path	string

Reboot

rpc reboot The reboot message containing the reboot status.

Field	Type	Label	Description
metadata	common.Metadata

RebootResponse

Field	Type	Label	Description
messages	Reboot	repeated

Recover

The recover message containing the recover status.

Field	Type	Label	Description
metadata	common.Metadata

RecoverRequest

Field	Type	Label	Description
source	RecoverRequest.Source

RecoverResponse

Field	Type	Label	Description
messages	Recover	repeated

RemoveBootkubeInitializedKey

RemoveBootkubeInitializedKeyResponse describes the response to a RemoveBootkubeInitializedKey request.

Field	Type	Label	Description
metadata	common.Metadata

RemoveBootkubeInitializedKeyResponse

Field	Type	Label	Description
messages	RemoveBootkubeInitializedKey	repeated

Reset

The reset message containing the restart status.

Field	Type	Label	Description
metadata	common.Metadata

ResetPartitionSpec

rpc reset

Field	Type	Label	Description
label	string
wipe	bool

ResetRequest

Field	Type	Label	Description
graceful	bool		Graceful indicates whether the node should leave etcd before the reset; it also enforces etcd checks before leaving.
reboot	bool		Reboot indicates whether node should reboot or halt after resetting.
system_partitions_to_wipe	ResetPartitionSpec	repeated	System_partitions_to_wipe lists specific system disk partitions to be reset (wiped). If system_partitions_to_wipe is empty, all the partitions are erased.

ResetResponse

Field	Type	Label	Description
messages	Reset	repeated

Restart

Field	Type	Label	Description
metadata	common.Metadata

RestartEvent

Field	Type	Label	Description
cmd	int64

RestartRequest

rpc restart The request message containing the process to restart.

Field	Type	Description
namespace	string
id	string
driver	common.ContainerDriver	driver might be default “containerd” or “cri”

RestartResponse

The messages message containing the restart status.

Field	Type	Label	Description
messages	Restart	repeated

Rollback

Field	Type	Label	Description
metadata	common.Metadata

RollbackRequest

rpc rollback

RollbackResponse

Field	Type	Label	Description
messages	Rollback	repeated

RouteConfig

Field	Type	Label	Description
network	string
gateway	string
metric	uint32

SequenceEvent

rpc events

Field	Type	Label	Description
sequence	string
action	SequenceEvent.Action
error	common.Error

ServiceEvent

Field	Type	Label	Description
msg	string
state	string
ts	google.protobuf.Timestamp

ServiceEvents

Field	Type	Label	Description
events	ServiceEvent	repeated

ServiceHealth

Field	Type	Label	Description
unknown	bool
healthy	bool
last_message	string
last_change	google.protobuf.Timestamp

ServiceInfo

Field	Type	Label	Description
id	string
state	string
events	ServiceEvents
health	ServiceHealth

ServiceList

rpc servicelist

Field	Type	Label	Description
metadata	common.Metadata
services	ServiceInfo	repeated

ServiceListResponse

Field	Type	Label	Description
messages	ServiceList	repeated

ServiceRestart

Field	Type	Label	Description
metadata	common.Metadata
resp	string

ServiceRestartRequest

Field	Type	Label	Description
id	string

ServiceRestartResponse

Field	Type	Label	Description
messages	ServiceRestart	repeated

ServiceStart

Field	Type	Label	Description
metadata	common.Metadata
resp	string

ServiceStartRequest

rpc servicestart

Field	Type	Label	Description
id	string

ServiceStartResponse

Field	Type	Label	Description
messages	ServiceStart	repeated

ServiceStateEvent

Field	Type	Label	Description
service	string
action	ServiceStateEvent.Action
message	string
health	ServiceHealth

ServiceStop

Field	Type	Label	Description
metadata	common.Metadata
resp	string

ServiceStopRequest

Field	Type	Label	Description
id	string

ServiceStopResponse

Field	Type	Label	Description
messages	ServiceStop	repeated

Shutdown

rpc shutdown The messages message containing the shutdown status.

Field	Type	Label	Description
metadata	common.Metadata

ShutdownResponse

Field	Type	Label	Description
messages	Shutdown	repeated

SoftIRQStat

Field	Type	Label	Description
hi	uint64
timer	uint64
net_tx	uint64
net_rx	uint64
block	uint64
block_io_poll	uint64
tasklet	uint64
sched	uint64
hrtimer	uint64
rcu	uint64

StartRequest

Field	Type	Label	Description
id	string

StartResponse

Field	Type	Label	Description
resp	string

Stat

The messages message containing the requested stat.

Field	Type	Label	Description
namespace	string
id	string
memory_usage	uint64
cpu_usage	uint64
pod_id	string
name	string

Stats

The messages message containing the requested stats.

Field	Type	Label	Description
metadata	common.Metadata
stats	Stat	repeated

StatsRequest

The request message containing the containerd namespace.

Field	Type	Label	Description
namespace	string
driver	common.ContainerDriver		driver might be default “containerd” or “cri”

StatsResponse

Field	Type	Label	Description
messages	Stats	repeated

StopRequest

Field	Type	Label	Description
id	string

StopResponse

Field	Type	Label	Description
resp	string

SystemStat

Field	Type	Label
metadata	common.Metadata
boot_time	uint64
cpu_total	CPUStat
cpu	CPUStat	repeated
irq_total	uint64
irq	uint64	repeated
context_switches	uint64
process_created	uint64
process_running	uint64
process_blocked	uint64
soft_irq_total	uint64
soft_irq	SoftIRQStat

SystemStatResponse

Field	Type	Label	Description
messages	SystemStat	repeated

TaskEvent

Field	Type	Label	Description
task	string
action	TaskEvent.Action

Upgrade

Field	Type	Label	Description
metadata	common.Metadata
ack	string

UpgradeRequest

rpc upgrade

Field	Type	Label	Description
image	string
preserve	bool
stage	bool
force	bool

UpgradeResponse

Field	Type	Label	Description
messages	Upgrade	repeated

Version

Field	Type	Label	Description
metadata	common.Metadata
version	VersionInfo
platform	PlatformInfo

VersionInfo

Field	Type	Label	Description
tag	string
sha	string
built	string
go_version	string
os	string
arch	string

VersionResponse

Field	Type	Label	Description
messages	Version	repeated

ListRequest.Type

File type.

Name	Number	Description
REGULAR	0	Regular file (not directory, symlink, etc).
DIRECTORY	1	Directory.
SYMLINK	2	Symbolic link.

MachineConfig.MachineType

Name	Number	Description
TYPE_UNKNOWN	0
TYPE_INIT	1
TYPE_CONTROL_PLANE	2
TYPE_JOIN	3

PhaseEvent.Action

Name	Number	Description
START	0
STOP	1

RecoverRequest.Source

Name	Number	Description
ETCD	0
APISERVER	1

SequenceEvent.Action

Name	Number	Description
NOOP	0
START	1
STOP	2

ServiceStateEvent.Action

Name	Number	Description
INITIALIZED	0
PREPARING	1
WAITING	2
RUNNING	3
STOPPING	4
FINISHED	5
FAILED	6
SKIPPED	7

TaskEvent.Action

Name	Number	Description
START	0
STOP	1

MachineService

The machine service definition.

Method Name	Request Type	Response Type
ApplyConfiguration	ApplyConfigurationRequest	ApplyConfigurationResponse
Bootstrap	BootstrapRequest	BootstrapResponse
Containers	ContainersRequest	ContainersResponse
Copy	CopyRequest	.common.Data stream
CPUInfo	.google.protobuf.Empty	CPUInfoResponse
DiskStats	.google.protobuf.Empty	DiskStatsResponse
Dmesg	DmesgRequest	.common.Data stream
Events	EventsRequest	Event stream
EtcdMemberList	EtcdMemberListRequest	EtcdMemberListResponse
EtcdRemoveMember	EtcdRemoveMemberRequest	EtcdRemoveMemberResponse
EtcdLeaveCluster	EtcdLeaveClusterRequest	EtcdLeaveClusterResponse
EtcdForfeitLeadership	EtcdForfeitLeadershipRequest	EtcdForfeitLeadershipResponse
GenerateConfiguration	GenerateConfigurationRequest	GenerateConfigurationResponse
Hostname	.google.protobuf.Empty	HostnameResponse
Kubeconfig	.google.protobuf.Empty	.common.Data stream
List	ListRequest	FileInfo stream
DiskUsage	DiskUsageRequest	DiskUsageInfo stream
LoadAvg	.google.protobuf.Empty	LoadAvgResponse
Logs	LogsRequest	.common.Data stream
Memory	.google.protobuf.Empty	MemoryResponse
Mounts	.google.protobuf.Empty	MountsResponse
NetworkDeviceStats	.google.protobuf.Empty	NetworkDeviceStatsResponse
Processes	.google.protobuf.Empty	ProcessesResponse
Read	ReadRequest	.common.Data stream
Reboot	.google.protobuf.Empty	RebootResponse
Restart	RestartRequest	RestartResponse
Rollback	RollbackRequest	RollbackResponse
Reset	ResetRequest	ResetResponse
Recover	RecoverRequest	RecoverResponse
RemoveBootkubeInitializedKey	.google.protobuf.Empty	RemoveBootkubeInitializedKeyResponse
ServiceList	.google.protobuf.Empty	ServiceListResponse
ServiceRestart	ServiceRestartRequest	ServiceRestartResponse
ServiceStart	ServiceStartRequest	ServiceStartResponse
ServiceStop	ServiceStopRequest	ServiceStopResponse
Shutdown	.google.protobuf.Empty	ShutdownResponse
Stats	StatsRequest	StatsResponse
SystemStat	.google.protobuf.Empty	SystemStatResponse
Upgrade	UpgradeRequest	UpgradeResponse
Version	.google.protobuf.Empty	VersionResponse

network/network.proto

Interface

Interface represents a net.Interface

Field	Type	Label
index	uint32
mtu	uint32
name	string
hardwareaddr	string
flags	InterfaceFlags
ipaddress	string	repeated

Interfaces

Field	Type	Label	Description
metadata	common.Metadata
interfaces	Interface	repeated

InterfacesResponse

Field	Type	Label	Description
messages	Interfaces	repeated

Route

The messages message containing a route.

Field	Type	Description
interface	string	Interface is the interface over which traffic to this destination should be sent
destination	string	Destination is the network prefix CIDR which this route provides
gateway	string	Gateway is the gateway address to which traffic to this destination should be sent
metric	uint32	Metric is the priority of the route, where lower metrics have higher priorities
scope	uint32	Scope describes the scope of this route
source	string	Source is the source prefix CIDR for the route, if one is defined
family	AddressFamily	Family is the address family of the route. Currently, the only options are AF_INET (IPV4) and AF_INET6 (IPV6).
protocol	RouteProtocol	Protocol is the protocol by which this route came to be in place
flags	uint32	Flags indicate any special flags on the route
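
To make the metric field concrete, the sketch below picks a route for a destination address from a list of Route-like records. It is an illustration under stated assumptions, not Talos or kernel code: the `Route` dataclass and `pick_route` function are hypothetical, only four of the fields are modelled, and preferring the longest matching prefix before comparing metrics is the usual Linux behaviour rather than something stated in this reference.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical subset of the Route message fields."""
    interface: str
    destination: str  # network prefix in CIDR notation
    gateway: str
    metric: int


def pick_route(routes, address):
    """Return the preferred route for `address`, or None if nothing matches.

    Assumption: the most specific prefix wins first; among equally specific
    routes, the lowest metric has the highest priority.
    """
    ip = ipaddress.ip_address(address)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r.destination)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.destination).prefixlen, r.metric),
    )


routes = [
    Route("eth0", "0.0.0.0/0", "10.5.0.1", 1024),
    Route("eth1", "0.0.0.0/0", "192.168.1.1", 100),
    Route("eth0", "10.5.0.0/24", "", 0),
]
```

With these sample routes, traffic to `8.8.8.8` would use `eth1` because its default route has the lower metric, while traffic to `10.5.0.7` would use the more specific `10.5.0.0/24` route.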

Routes

Field	Type	Label	Description
metadata	common.Metadata
routes	Route	repeated

RoutesResponse

The messages message containing the routes.

Field	Type	Label	Description
messages	Routes	repeated

AddressFamily

Name	Number	Description
AF_UNSPEC	0
AF_INET	2
IPV4	2
AF_INET6	10
IPV6	10
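
Note that two names share each non-zero number in this enum, so IPV4 and IPV6 act as aliases of AF_INET and AF_INET6. The values 2 and 10 match the Linux AF_INET and AF_INET6 constants. The sketch below mirrors the table with a Python `IntEnum` to show how such aliases behave; it is an illustration only and is not generated from the .proto file.

```python
from enum import IntEnum


class AddressFamily(IntEnum):
    """Mirror of the AddressFamily table; duplicate values become aliases."""
    AF_UNSPEC = 0
    AF_INET = 2
    IPV4 = 2      # alias of AF_INET
    AF_INET6 = 10
    IPV6 = 10     # alias of AF_INET6


# Looking a number up yields the first name declared for it.
family = AddressFamily(10)
```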

InterfaceFlags

Name	Number	Description
FLAG_UNKNOWN	0
FLAG_UP	1
FLAG_BROADCAST	2
FLAG_LOOPBACK	3
FLAG_POINT_TO_POINT	4
FLAG_MULTICAST	5

RouteProtocol

Name	Number	Description
RTPROT_UNSPEC	0
RTPROT_REDIRECT	1	Route installed by ICMP redirects
RTPROT_KERNEL	2	Route installed by kernel
RTPROT_BOOT	3	Route installed during boot
RTPROT_STATIC	4	Route installed by administrator
RTPROT_GATED	8	Route installed by gated
RTPROT_RA	9	Route installed by router advertisement
RTPROT_MRT	10	Route installed by Merit MRT
RTPROT_ZEBRA	11	Route installed by Zebra/Quagga
RTPROT_BIRD	12	Route installed by Bird
RTPROT_DNROUTED	13	Route installed by DECnet routing daemon
RTPROT_XORP	14	Route installed by XORP
RTPROT_NTK	15	Route installed by Netsukuku
RTPROT_DHCP	16	Route installed by DHCP
RTPROT_MROUTED	17	Route installed by Multicast daemon
RTPROT_BABEL	42	Route installed by Babel daemon

NetworkService

The network service definition.

Method Name	Request Type	Response Type	Description
Routes	.google.protobuf.Empty	RoutesResponse
Interfaces	.google.protobuf.Empty	InterfacesResponse

resource/resource.proto

Get

The GetResponse message contains the Resource returned.

Field	Type	Label	Description
metadata	common.Metadata
definition	Resource
resource	Resource

GetRequest

rpc Get

Field	Type	Label	Description
namespace	string
type	string
id	string

GetResponse

Field	Type	Label	Description
messages	Get	repeated

ListRequest

rpc List The ListResponse message contains the Resource returned.

Field	Type	Label	Description
namespace	string
type	string

ListResponse

Field	Type	Label	Description
metadata	common.Metadata
definition	Resource
resource	Resource

Metadata

Field	Type	Label
namespace	string
type	string
id	string
version	string
phase	string
finalizers	string	repeated

Resource

Field	Type	Label	Description
metadata	Metadata
spec	Spec

Spec

Field	Type	Label	Description
yaml	bytes

WatchRequest

rpc Watch The WatchResponse message contains the Resource returned.

Field	Type	Label	Description
namespace	string
type	string
id	string

WatchResponse

Field	Type	Label	Description
metadata	common.Metadata
event_type	EventType
definition	Resource
resource	Resource

EventType

Name	Number	Description
CREATED	0
UPDATED	1
DESTROYED	2

ResourceService

The resource service definition.

ResourceService provides user-facing API for the Talos resources.

Method Name	Request Type	Response Type
Get	GetRequest	GetResponse
List	ListRequest	ListResponse stream
Watch	WatchRequest	WatchResponse stream

security/security.proto

CertificateRequest

The request message containing the certificate signing request (CSR).

Field	Type	Label	Description
csr	bytes

CertificateResponse

The response message containing the CA and the signed certificate.

Field	Type	Label	Description
ca	bytes
crt	bytes

ReadFileRequest

The request message for reading a file on disk.

Field	Type	Label	Description
path	string

ReadFileResponse

The response message for reading a file on disk.

Field	Type	Label	Description
data	bytes

WriteFileRequest

The request message for writing a file on disk.

Field	Type	Label	Description
path	string
data	bytes
perm	int32

WriteFileResponse

The response message for writing a file on disk.

SecurityService

The security service definition.

Method Name	Request Type	Response Type
Certificate	CertificateRequest	CertificateResponse
ReadFile	ReadFileRequest	ReadFileResponse
WriteFile	WriteFileRequest	WriteFileResponse

storage/storage.proto

Disk

Disk represents a disk.

Field	Type	Description
size	uint64	Size indicates the disk size in bytes.
model	string	Model indicates the disk model.
device_name	string	DeviceName indicates the disk name (e.g. `sda`).

Disks

DisksResponse represents the response of the Disks RPC.

Field	Type	Label	Description
metadata	common.Metadata
disks	Disk	repeated

DisksResponse

Field	Type	Label	Description
messages	Disks	repeated

StorageService

StorageService represents the storage service.

Method Name	Request Type	Response Type	Description
Disks	.google.protobuf.Empty	DisksResponse

time/time.proto

Time

Field	Type	Label	Description
metadata	common.Metadata
server	string
localtime	google.protobuf.Timestamp
remotetime	google.protobuf.Timestamp

TimeRequest

The request message containing the ntp server

Field	Type	Label	Description
server	string

TimeResponse

The response message containing the ntp server, time, and offset

Field	Type	Label	Description
messages	Time	repeated

TimeService

The time service definition.

Method Name	Request Type	Response Type	Description
Time	.google.protobuf.Empty	TimeResponse
TimeCheck	TimeRequest	TimeResponse

Scalar Value Types

.proto Type	Notes	C++	Java	Python	Go	C#	PHP	Ruby
double		double	double	float	float64	double	float	Float
float		float	float	float	float32	float	float	Float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	long	int/long	int64	long	integer/string	Bignum
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	uint	integer	Bignum or Fixnum (as required)
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	ulong	integer/string	Bignum or Fixnum (as required)
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	long	int/long	int64	long	integer/string	Bignum
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int	uint32	uint	integer	Bignum or Fixnum (as required)
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long	uint64	ulong	integer/string	Bignum
sfixed32	Always four bytes.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sfixed64	Always eight bytes.	int64	long	int/long	int64	long	integer/string	Bignum
bool		bool	boolean	boolean	bool	bool	boolean	TrueClass/FalseClass
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode	string	string	string	String (UTF-8)
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str	[]byte	ByteString	string	String (ASCII-8BIT)
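
The size notes in the table follow from how variable-length (varint) encoding works. The sketch below is a small, self-contained illustration of the standard protobuf varint and ZigZag schemes; the helper names are hypothetical and this is not a full protobuf encoder.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32/int64 style: negatives are sign-extended to 64 bits first."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map signed to unsigned so that small magnitudes stay small."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32 style: ZigZag mapping followed by a varint."""
    return encode_varint(zigzag32(value))


negative_as_int32 = len(encode_int32(-1))    # sign extension fills all 64 bits
negative_as_sint32 = len(encode_sint32(-1))  # ZigZag maps -1 to 1
large_as_varint = len(encode_varint(2**28))  # compare with 4 bytes for fixed32
```

Comparing the three lengths shows why sint32 suits fields that are often negative and why fixed32 becomes the smaller choice once values reach 2^28.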

8.2 - CLI

talosctl apply-config

Apply a new configuration to a node

talosctl apply-config [flags]

Options

      --cert-fingerprint strings   list of server certificate fingerprints to accept (defaults to no check)
  -f, --file string                the filename of the updated configuration
  -h, --help                       help for apply-config
      --immediate                  apply the config immediately (without a reboot)
  -i, --insecure                   apply the config using the insecure (encrypted with no auth) maintenance service
      --interactive                apply the config using text based interactive mode
      --on-reboot                  apply the config on reboot

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl bootstrap

Bootstrap the cluster

talosctl bootstrap [flags]

Options

  -h, --help   help for bootstrap

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl cluster create

Creates a local docker-based or QEMU-based kubernetes cluster

talosctl cluster create [flags]

Options

      --arch string                             cluster architecture (default "amd64")
      --cidr string                             CIDR of the cluster network (IPv4, ULA network for IPv6 is derived in automated way) (default "10.5.0.0/24")
      --cni-bin-path strings                    search path for CNI binaries (VM only) (default [/home/user/.talos/cni/bin])
      --cni-bundle-url string                   URL to download CNI bundle from (VM only) (default "https://github.com/siderolabs/talos/releases/download/v0.10.0-alpha.0/talosctl-cni-bundle-${ARCH}.tar.gz")
      --cni-cache-dir string                    CNI cache directory path (VM only) (default "/home/user/.talos/cni/cache")
      --cni-conf-dir string                     CNI config directory path (VM only) (default "/home/user/.talos/cni/conf.d")
      --config-patch string                     patch generated machineconfigs
      --cpus string                             the share of CPUs as fraction (each container/VM) (default "2.0")
      --crashdump                               print debug crashdump to stderr when cluster startup fails
      --custom-cni-url string                   install custom CNI from the URL (Talos cluster)
      --disk int                                default limit on disk size in MB (each VM) (default 6144)
      --disk-image-path string                  disk image to use
      --dns-domain string                       the dns domain to use for cluster (default "cluster.local")
      --docker-host-ip string                   Host IP to forward exposed ports to (Docker provisioner only) (default "0.0.0.0")
      --encrypt-ephemeral                       enable ephemeral partition encryption
      --encrypt-state                           enable state partition encryption
      --endpoint string                         use endpoint instead of provider defaults
  -p, --exposed-ports string                    Comma-separated list of ports/protocols to expose on init node. Ex -p <hostPort>:<containerPort>/<protocol (tcp or udp)> (Docker provisioner only)
  -h, --help                                    help for create
      --image string                            the image to use (default "ghcr.io/talos-systems/talos:latest")
      --init-node-as-endpoint                   use init node as endpoint instead of any load balancer endpoint
      --initrd-path string                      the uncompressed kernel image to use (default "_out/initramfs-${ARCH}.xz")
  -i, --input-dir string                        location of pre-generated config files
      --install-image string                    the installer image to use (default "ghcr.io/talos-systems/installer:latest")
      --ipv4                                    enable IPv4 network in the cluster (default true)
      --ipv6                                    enable IPv6 network in the cluster (QEMU provisioner only)
      --iso-path string                         the ISO path to use for the initial boot (VM only)
      --kubernetes-version string               desired kubernetes version to run (default "1.20.5")
      --masters int                             the number of masters to create (default 1)
      --memory int                              the limit on memory usage in MB (each container/VM) (default 2048)
      --mtu int                                 MTU of the cluster network (default 1500)
      --nameservers strings                     list of nameservers to use (default [8.8.8.8,1.1.1.1,2001:4860:4860::8888,2606:4700:4700::1111])
      --registry-insecure-skip-verify strings   list of registry hostnames to skip TLS verification for
      --registry-mirror strings                 list of registry mirrors to use in format: <registry host>=<mirror URL>
      --skip-injecting-config                   skip injecting config from embedded metadata server, write config files to current directory
      --skip-kubeconfig                         skip merging kubeconfig from the created cluster
      --talos-version string                    the desired Talos version to generate config for (if not set, defaults to image version)
      --use-vip                                 use a virtual IP for the controlplane endpoint instead of the loadbalancer
      --user-disk strings                       list of disks to create for each VM in format: <mount_point1>:<size1>:<mount_point2>:<size2>
      --vmlinuz-path string                     the compressed kernel image to use (default "_out/vmlinuz-${ARCH}")
      --wait                                    wait for the cluster to be ready before returning (default true)
      --wait-timeout duration                   timeout to wait for the cluster to be ready (default 20m0s)
      --wireguard-cidr string                   CIDR of the wireguard network
      --with-apply-config                       enable apply config when the VM is starting in maintenance mode
      --with-bootloader                         enable bootloader to load kernel and initramfs from disk image after install (default true)
      --with-debug                              enable debug in Talos config to send service logs to the console
      --with-init-node                          create the cluster with an init node
      --with-uefi                               enable UEFI on x86_64 architecture (always enabled for arm64)
      --workers int                             the number of workers to create (default 1)
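
Two of the options above take structured string values: --user-disk (`<mount_point1>:<size1>:<mount_point2>:<size2>`) and --registry-mirror (`<registry host>=<mirror URL>`). The sketch below shows one way such values could be split apart. It is an illustration of the documented formats only, not talosctl's own parsing; the function names are hypothetical, and sizes are kept as opaque strings because the accepted size syntax is not described here.

```python
def parse_user_disks(spec: str):
    """Split '<mount_point1>:<size1>:<mount_point2>:<size2>' into pairs.

    Hypothetical helper; sizes are returned unparsed.
    """
    parts = spec.split(":")
    if len(parts) % 2 != 0 or not all(parts):
        raise ValueError(f"expected mount_point:size pairs, got {spec!r}")
    return list(zip(parts[0::2], parts[1::2]))


def parse_registry_mirror(spec: str):
    """Split '<registry host>=<mirror URL>' at the first '=' sign."""
    host, sep, url = spec.partition("=")
    if not sep or not host or not url:
        raise ValueError(f"expected host=url, got {spec!r}")
    return host, url


# Sample values are made up for illustration.
disks = parse_user_disks("/mnt/data:10GB:/mnt/logs:2GB")
mirror = parse_registry_mirror("docker.io=http://10.5.0.1:5000")
```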

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
      --name string          the name of the cluster (default "talos-default")
  -n, --nodes strings        target the specified nodes
      --provisioner string   Talos cluster provisioner to use (default "docker")
      --state string         directory path to store cluster state (default "/home/user/.talos/clusters")
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl cluster destroy

Destroys a local docker-based or firecracker-based kubernetes cluster

talosctl cluster destroy [flags]

Options

  -h, --help   help for destroy

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
      --name string          the name of the cluster (default "talos-default")
  -n, --nodes strings        target the specified nodes
      --provisioner string   Talos cluster provisioner to use (default "docker")
      --state string         directory path to store cluster state (default "/home/user/.talos/clusters")
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl cluster show

Shows info about a local provisioned kubernetes cluster

talosctl cluster show [flags]

Options

  -h, --help   help for show

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
      --name string          the name of the cluster (default "talos-default")
  -n, --nodes strings        target the specified nodes
      --provisioner string   Talos cluster provisioner to use (default "docker")
      --state string         directory path to store cluster state (default "/home/user/.talos/clusters")
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl cluster

A collection of commands for managing local docker-based or firecracker-based clusters

Options

  -h, --help                 help for cluster
      --name string          the name of the cluster (default "talos-default")
      --provisioner string   Talos cluster provisioner to use (default "docker")
      --state string         directory path to store cluster state (default "/home/user/.talos/clusters")

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
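
The cluster subcommands above share the --name and --provisioner flags. As a minimal sketch (the helper name teardown_cluster is hypothetical and not part of talosctl), showing a local cluster and then destroying it could look like this:

```shell
#!/bin/sh
# Sketch: print information about a local cluster, then destroy it.
# teardown_cluster is a hypothetical helper name, not a talosctl command.
teardown_cluster() {
    name="${1:-talos-default}"    # documented default for --name
    provisioner="${2:-docker}"    # documented default for --provisioner

    # Show what is about to be removed; stop if the cluster cannot be shown.
    talosctl cluster show --name "$name" --provisioner "$provisioner" || return 1
    # Remove the cluster.
    talosctl cluster destroy --name "$name" --provisioner "$provisioner"
}
```

Calling teardown_cluster with no arguments targets the default cluster name and provisioner listed in the options above.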

talosctl completion

Output shell completion code for the specified shell (bash or zsh)

Synopsis

Output shell completion code for the specified shell (bash or zsh). The shell code must be evaluated to provide interactive completion of talosctl commands. This can be done by sourcing it from the .bash_profile.

Note for zsh users: [1] zsh completions are only supported in versions of zsh >= 5.2

talosctl completion SHELL [flags]

Examples

# Installing bash completion on macOS using homebrew
## If running Bash 3.2 included with macOS
	brew install bash-completion
## or, if running Bash 4.1+
	brew install bash-completion@2
## If talosctl is installed via homebrew, this should start working immediately.
## If you've installed via other means, you may need to add the completion to your completion directory
	talosctl completion bash > $(brew --prefix)/etc/bash_completion.d/talosctl

# Installing bash completion on Linux
## If bash-completion is not installed on Linux, please install the 'bash-completion' package
## via your distribution's package manager.
## Load the talosctl completion code for bash into the current shell
	source <(talosctl completion bash)
## Write bash completion code to a file and source it from .bash_profile
	talosctl completion bash > ~/.talos/completion.bash.inc
	printf "
		# talosctl shell completion
		source '$HOME/.talos/completion.bash.inc'
		" >> $HOME/.bash_profile
	source $HOME/.bash_profile
# Load the talosctl completion code for zsh[1] into the current shell
	source <(talosctl completion zsh)
# Set the talosctl completion code for zsh[1] to autoload on startup
	talosctl completion zsh > "${fpath[1]}/_talosctl"

Options

  -h, --help   help for completion

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config add

Add a new context

talosctl config add <context> [flags]

Options

      --ca string    the path to the CA certificate
      --crt string   the path to the certificate
  -h, --help         help for add
      --key string   the path to the key

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config context

Set the current context

talosctl config context <context> [flags]

Options

  -h, --help   help for context

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config contexts

List contexts defined in Talos config

talosctl config contexts [flags]

Options

  -h, --help   help for contexts

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config endpoint

Set the endpoint(s) for the current context

talosctl config endpoint <endpoint>... [flags]

Options

  -h, --help   help for endpoint

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config merge

Merge additional contexts from another Talos config into the default config

Synopsis

Contexts with the same name are renamed while merging configs.

talosctl config merge <from> [flags]

Options

  -h, --help   help for merge

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config node

Set the node(s) for the current context

talosctl config node <endpoint>... [flags]

Options

  -h, --help   help for node

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl config

Manage the client configuration

Options

  -h, --help   help for config

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
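
The config subcommands above can be combined to register a new context and point it at a cluster. The following is a sketch only: the helper name add_talos_context and the file names ca.crt, admin.crt and admin.key are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: create a context from existing client credentials and configure it.
# add_talos_context and the credential file names are hypothetical.
add_talos_context() {
    ctx="$1"         # name of the new context
    endpoint="$2"    # endpoint to use for the context
    node="$3"        # default node to target
    pki_dir="$4"     # directory holding the CA, certificate and key

    talosctl config add "$ctx" \
        --ca "$pki_dir/ca.crt" \
        --crt "$pki_dir/admin.crt" \
        --key "$pki_dir/admin.key" || return 1
    # Select the new context, then set the endpoint and node for it.
    talosctl config context "$ctx" || return 1
    talosctl config endpoint "$endpoint" || return 1
    talosctl config node "$node"
}
```

Because config endpoint and config node act on the current context, the sketch selects the new context explicitly before calling them.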

talosctl containers

List containers

talosctl containers [flags]

Options

  -h, --help         help for containers
  -k, --kubernetes   use the k8s.io containerd namespace

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl convert-k8s

Convert Kubernetes control plane from self-hosted (bootkube) to Talos-managed (static pods).

Synopsis

Converts a control plane bootstrapped on Talos <= 0.8 to a Talos-managed control plane (Talos >= 0.9). As part of the conversion, the tool reads the existing configuration of the control plane and updates the Talos node configuration to reflect changes made since bootstrap time. Once the config is updated, the tool releases static pods and deletes the self-hosted DaemonSets.

talosctl convert-k8s [flags]

Options

      --endpoint string   the cluster control plane endpoint
      --force             skip prompts, assume yes
  -h, --help              help for convert-k8s

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl copy

Copy data out from the node

Synopsis

Creates a .tar.gz archive at the node starting at <src-path> and streams it back to the client.

If ‘-’ is given for <local-path>, the archive is written to stdout. Otherwise the archive is extracted to <local-path>, which should be an empty directory; talosctl creates the directory if <local-path> doesn’t exist. The command doesn’t preserve ownership and access mode for the files in extract mode, while the streamed .tar archive captures ownership and permission bits.

talosctl copy <src-path> -|<local-path> [flags]

Options

  -h, --help   help for copy

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
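
The two modes of talosctl copy described above can be sketched as follows. The helper names are hypothetical, and the node address and paths are supplied by the caller.

```shell
#!/bin/sh
# Sketch: the two modes of `talosctl copy`. Helper names are hypothetical.

# Archive mode: '-' streams the .tar.gz archive to stdout, which is
# redirected into a local file. Ownership and permission bits are kept
# inside the archive.
copy_as_archive() {
    node="$1"; src="$2"; archive="$3"
    talosctl --nodes "$node" copy "$src" - > "$archive"
}

# Extract mode: files are unpacked into a local directory. Ownership and
# access mode are not preserved in this mode.
copy_and_extract() {
    node="$1"; src="$2"; dest="$3"
    talosctl --nodes "$node" copy "$src" "$dest"
}
```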

talosctl crashdump

Dump debug information about the cluster

talosctl crashdump [flags]

Options

      --control-plane-nodes strings   specify IPs of control plane nodes
  -h, --help                          help for crashdump
      --init-node string              specify IPs of init node
      --worker-nodes strings          specify IPs of worker nodes

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl dashboard

Cluster dashboard with real-time metrics

Synopsis

Provide quick UI to navigate through node real-time metrics.

Keyboard shortcuts:

h, <Left>: switch one node to the left
l, <Right>: switch one node to the right
j, <Down>: scroll process list down
k, <Up>: scroll process list up
<C-d>: scroll process list half page down
<C-u>: scroll process list half page up
<C-f>: scroll process list one page down
<C-b>: scroll process list one page up

talosctl dashboard [flags]

Options

  -h, --help                       help for dashboard
  -d, --update-interval duration   interval between updates (default 3s)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl dmesg

Retrieve kernel logs

talosctl dmesg [flags]

Options

  -f, --follow   specify if the kernel log should be streamed
  -h, --help     help for dmesg
      --tail     specify if only new messages should be sent (makes sense only when combined with --follow)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl edit

Edit a resource from the default editor.

Synopsis

The edit command allows you to directly edit any API resource you can retrieve via the command line tools.

It will open the editor defined by your TALOS_EDITOR or EDITOR environment variables, or fall back to ‘vi’ for Linux or ‘notepad’ for Windows.

talosctl edit <type> [<id>] [flags]

Options

  -h, --help               help for edit
      --immediate          apply the change immediately (without a reboot)
      --namespace string   resource namespace (default is to use default namespace per resource)
      --on-reboot          apply the change on next reboot

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
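
Since talosctl edit reads TALOS_EDITOR from the environment, the editor can be chosen for a single invocation. A minimal sketch, assuming a hypothetical helper named edit_with:

```shell
#!/bin/sh
# Sketch: run `talosctl edit` with a specific editor for this call only.
# edit_with is a hypothetical helper name.
edit_with() {
    editor="$1"    # editor command to place in TALOS_EDITOR
    node="$2"      # node to target
    shift 2        # remaining arguments: <type> [<id>] [flags]
    TALOS_EDITOR="$editor" talosctl --nodes "$node" edit "$@"
}
```

The remaining arguments are passed through unchanged, so a call such as edit_with nano <node> <type> --on-reboot defers the change to the next reboot.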

talosctl etcd forfeit-leadership

Tell node to forfeit etcd cluster leadership

talosctl etcd forfeit-leadership [flags]

Options

  -h, --help   help for forfeit-leadership

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl etcd leave

Tell nodes to leave etcd cluster

talosctl etcd leave [flags]

Options

  -h, --help   help for leave

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl etcd members

Get the list of etcd cluster members

talosctl etcd members [flags]

Options

  -h, --help   help for members

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl etcd remove-member

Remove the node from etcd cluster

Synopsis

Use this command only to remove a member that is in a broken state, for example when there is no access to the node, or when the node can’t access etcd to call etcd leave. Always prefer etcd leave over this command.

talosctl etcd remove-member <hostname> [flags]

Options

  -h, --help   help for remove-member

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
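
The preference for etcd leave over etcd remove-member can be sketched as a fallback. The helper name is hypothetical, and issuing remove-member through a different, healthy control plane node is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: try a graceful `etcd leave` first and fall back to
# `etcd remove-member` only if that fails.
# remove_etcd_member is a hypothetical helper name.
remove_etcd_member() {
    node="$1"        # address of the member that should leave
    hostname="$2"    # hostname of that member
    healthy="$3"     # address of another control plane node (assumption)

    if talosctl --nodes "$node" etcd leave; then
        return 0
    fi
    echo "etcd leave failed on $node; removing $hostname via $healthy" >&2
    talosctl --nodes "$healthy" etcd remove-member "$hostname"
}
```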

talosctl etcd

Manage etcd

Options

  -h, --help   help for etcd

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl events

Stream runtime events

talosctl events [flags]

Options

      --duration duration   show events for the past duration interval (one second resolution, default is to show no history)
  -h, --help                help for events
      --since string        show events after the specified event ID (default is to show no history)
      --tail int32          show specified number of past events (use -1 to show full history, default is to show no history)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
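
By default talosctl events shows no history. The sketch below uses the --tail and --duration flags to include past events; the helper names and the default count of 20 are assumptions.

```shell
#!/bin/sh
# Sketch: include past events when streaming. Helper names are hypothetical.

# Show the last N events (default 20 here, an arbitrary choice).
recent_events() {
    node="$1"; count="${2:-20}"
    talosctl --nodes "$node" events --tail "$count"
}

# Show events from the past interval, e.g. "10m".
events_for_last() {
    node="$1"; interval="$2"
    talosctl --nodes "$node" events --duration "$interval"
}
```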

talosctl gen ca

Generates a self-signed X.509 certificate authority

talosctl gen ca [flags]

Options

  -h, --help                  help for ca
      --hours int             the hours from now on which the certificate validity period ends (default 87600)
      --organization string   X.509 distinguished name for the Organization
      --rsa                   generate in RSA format

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl gen config

Generates a set of configuration files for Talos cluster

Synopsis

The cluster endpoint is the URL for the Kubernetes API. If you decide to use a control plane node, common in a single node control plane setup, use port 6443 as this is the port that the API server binds to on every control plane node. For an HA setup, usually involving a load balancer, use the IP and port of the load balancer.

talosctl gen config <cluster name> <cluster endpoint> [flags]

Options

      --additional-sans strings     additional Subject-Alt-Names for the APIServer certificate
      --dns-domain string           the dns domain to use for cluster (default "cluster.local")
  -h, --help                        help for config
      --install-disk string         the disk to install to (default "/dev/sda")
      --install-image string        the image used to perform an installation (default "ghcr.io/talos-systems/installer:latest")
      --kubernetes-version string   desired kubernetes version to run
  -o, --output-dir string           destination to output generated files
  -p, --persist                     the desired persist value for configs (default true)
      --registry-mirror strings     list of registry mirrors to use in format: <registry host>=<mirror URL>
      --talos-version string        the desired Talos version to generate config for (backwards compatibility, e.g. v0.8)
      --version string              the desired machine config version to generate (default "v1alpha1")

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
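
The endpoint rules from the synopsis can be sketched as two small helpers: one for a single control plane node on port 6443, one for an HA setup behind a load balancer. The helper names are hypothetical, and the https scheme is an assumption based on the endpoint being the Kubernetes API URL.

```shell
#!/bin/sh
# Sketch: building the <cluster endpoint> argument for `talosctl gen config`.
# Helper names are hypothetical.

# Single control plane node: use the node address with port 6443.
gen_single_node_config() {
    cluster="$1"; cp_addr="$2"; out_dir="$3"
    talosctl gen config "$cluster" "https://$cp_addr:6443" --output-dir "$out_dir"
}

# HA setup: use the address and port of the load balancer.
gen_ha_config() {
    cluster="$1"; lb_addr="$2"; lb_port="$3"; out_dir="$4"
    talosctl gen config "$cluster" "https://$lb_addr:$lb_port" --output-dir "$out_dir"
}
```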

talosctl gen crt

Generates an X.509 Ed25519 certificate

talosctl gen crt [flags]

Options

      --ca string     path to the PEM encoded CERTIFICATE
      --csr string    path to the PEM encoded CERTIFICATE REQUEST
  -h, --help          help for crt
      --hours int     the hours from now on which the certificate validity period ends (default 24)
      --name string   the basename of the generated file

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl gen csr

Generates a CSR using an Ed25519 private key

talosctl gen csr [flags]

Options

  -h, --help         help for csr
      --ip string    generate the certificate for this IP address
      --key string   path to the PEM encoded EC or RSA PRIVATE KEY

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl gen key

Generates an Ed25519 private key

talosctl gen key [flags]

Options

  -h, --help          help for key
      --name string   the basename of the generated file

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl gen keypair

Generates an X.509 Ed25519 key pair

talosctl gen keypair [flags]

Options

  -h, --help                  help for keypair
      --ip string             generate the certificate for this IP address
      --organization string   X.509 distinguished name for the Organization

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl gen

Generate CAs, certificates, and private keys

Options

  -h, --help   help for gen

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl get

Get a specific resource or list of resources.

talosctl get <type> [<id>] [flags]

Options

  -h, --help               help for get
      --namespace string   resource namespace (default is to use default namespace per resource)
  -o, --output string      output mode (table, yaml) (default "table")
  -w, --watch              watch resource changes

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
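
A short sketch of the two output-related options of talosctl get: YAML output and watching for changes. The helper names are hypothetical, and the resource type is passed in by the caller.

```shell
#!/bin/sh
# Sketch: common ways to call `talosctl get`. Helper names are hypothetical.

# Print a resource (or list of resources) as YAML instead of a table.
get_resource_yaml() {
    node="$1"; rtype="$2"
    talosctl --nodes "$node" get "$rtype" --output yaml
}

# Keep running and report changes to the resource.
watch_resource() {
    node="$1"; rtype="$2"
    talosctl --nodes "$node" get "$rtype" --watch
}
```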

talosctl health

Check cluster health

talosctl health [flags]

Options

      --control-plane-nodes strings   specify IPs of control plane nodes
  -h, --help                          help for health
      --init-node string              specify IPs of init node
      --k8s-endpoint string           use endpoint instead of kubeconfig default
      --run-e2e                       run Kubernetes e2e test
      --server                        run server-side check (default true)
      --wait-timeout duration         timeout to wait for the cluster to be ready (default 20m0s)
      --worker-nodes strings          specify IPs of worker nodes

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
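
As an illustration of the node flags above, a health check with an explicit node list and a shorter timeout might be wrapped like this. The helper name and the 10m default are assumptions; the IP lists are comma-separated values supplied by the caller.

```shell
#!/bin/sh
# Sketch: check cluster health for an explicit set of nodes.
# check_health is a hypothetical helper name.
check_health() {
    control_planes="$1"    # comma-separated control plane IPs
    workers="$2"           # comma-separated worker IPs
    timeout="${3:-10m}"    # overrides the documented 20m0s default

    talosctl health \
        --control-plane-nodes "$control_planes" \
        --worker-nodes "$workers" \
        --wait-timeout "$timeout"
}
```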

talosctl images

List the default images used by Talos

talosctl images [flags]

Options

  -h, --help   help for images

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl inspect dependencies

Inspect controller-resource dependencies as graphviz graph.

Synopsis

Inspect controller-resource dependencies as graphviz graph.

Pipe the output of the command through the “dot” program (part of graphviz package) to render the graph:

talosctl inspect dependencies | dot -Tpng > graph.png

talosctl inspect dependencies [flags]

Options

  -h, --help             help for dependencies
      --with-resources   display live resource information with dependencies

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl inspect

Inspect internals of Talos

Options

  -h, --help   help for inspect

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl interfaces

List network interfaces

talosctl interfaces [flags]

Options

  -h, --help   help for interfaces

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl kubeconfig

Download the admin kubeconfig from the node

Synopsis

Download the admin kubeconfig from the node. If merge flag is defined, config will be merged with ~/.kube/config or [local-path] if specified. Otherwise kubeconfig will be written to PWD or [local-path] if specified.

talosctl kubeconfig [local-path] [flags]

Options

  -f, --force                       Force overwrite of kubeconfig if already present, force overwrite on kubeconfig merge
      --force-context-name string   Force context name for kubeconfig merge
  -h, --help                        help for kubeconfig
  -m, --merge                       Merge with existing kubeconfig (default true)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
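
The merge behaviour described in the synopsis can be sketched with two helpers: one that merges into the default kubeconfig under a chosen context name, and one that disables merging and writes to a given path. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: merging versus writing a standalone kubeconfig.
# Helper names are hypothetical.

# Merge into ~/.kube/config (merge is on by default) using a fixed
# context name.
merge_kubeconfig() {
    node="$1"; context_name="$2"
    talosctl --nodes "$node" kubeconfig --force-context-name "$context_name"
}

# Turn merging off and write the kubeconfig to [local-path] instead.
write_kubeconfig() {
    node="$1"; local_path="$2"
    talosctl --nodes "$node" kubeconfig "$local_path" --merge=false
}
```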

talosctl list

Retrieve a directory listing

talosctl list [path] [flags]

Options

  -d, --depth int32    maximum recursion depth
  -h, --help           help for list
  -H, --humanize       humanize size and time in the output
  -l, --long           display additional file details
  -r, --recurse        recurse into subdirectories
  -t, --type strings   filter by specified types:
                       f	regular file
                       d	directory
                       l, L	symbolic link

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl logs

Retrieve logs for a service

talosctl logs <service name> [flags]

Options

  -f, --follow       specify if the logs should be streamed
  -h, --help         help for logs
  -k, --kubernetes   use the k8s.io containerd namespace
      --tail int32   lines of log file to display (default is to show from the beginning) (default -1)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
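
A minimal sketch of following a service log without printing it from the beginning. The helper name and the default of 100 lines are assumptions; the service name is supplied by the caller.

```shell
#!/bin/sh
# Sketch: stream a service log, starting from the last N lines.
# tail_service_logs is a hypothetical helper name.
tail_service_logs() {
    node="$1"; service="$2"; lines="${3:-100}"
    talosctl --nodes "$node" logs "$service" --follow --tail "$lines"
}
```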

talosctl memory

Show memory usage

talosctl memory [flags]

Options

  -h, --help      help for memory
  -v, --verbose   display extended memory statistics

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl mounts

List mounts

talosctl mounts [flags]

Options

  -h, --help   help for mounts

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl patch

Update field(s) of a resource using a JSON patch.

talosctl patch <type> [<id>] [flags]

Options

  -h, --help                help for patch
      --immediate           apply the change immediately (without a reboot)
      --namespace string    resource namespace (default is to use default namespace per resource)
      --on-reboot           apply the change on next reboot
  -p, --patch string        the patch to be applied to the resource file.
      --patch-file string   a file containing a patch to be applied to the resource.

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")
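
A JSON patch is a JSON array of operations, each with an op, a path and (for most operations) a value. The sketch below builds a single replace operation and passes it with --patch. The helper name is hypothetical, the resource type and path come from the caller, and the value is inserted as a plain string without JSON escaping, so it must not contain quotes or backslashes.

```shell
#!/bin/sh
# Sketch: apply a one-operation JSON patch with `talosctl patch`.
# patch_replace is a hypothetical helper name.
patch_replace() {
    node="$1"     # node to target
    rtype="$2"    # resource type
    path="$3"     # JSON pointer to the field, e.g. /some/field
    value="$4"    # new string value (not JSON-escaped in this sketch)
    shift 4       # any remaining arguments are passed on as extra flags

    patch="[{\"op\": \"replace\", \"path\": \"$path\", \"value\": \"$value\"}]"
    talosctl --nodes "$node" patch "$rtype" --patch "$patch" "$@"
}
```

Extra flags such as --immediate or --on-reboot can be appended after the four positional arguments.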

talosctl processes

List running processes

talosctl processes [flags]

Options

  -h, --help          help for processes
  -s, --sort string   Column to sort output by. [rss|cpu] (default "rss")
  -w, --watch         Stream running processes

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl read

Read a file on the machine

talosctl read <path> [flags]

Options

  -h, --help   help for read

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl reboot

Reboot a node

talosctl reboot [flags]

Options

  -h, --help   help for reboot

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl recover

Recover a control plane

talosctl recover [flags]

Options

  -h, --help            help for recover
  -s, --source string   The data source for restoring the control plane manifests from (valid options are "apiserver" and "etcd") (default "apiserver")

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl reset

Reset a node

talosctl reset [flags]

Options

      --graceful                        if true, attempt to cordon/drain node and leave etcd (if applicable) (default true)
  -h, --help                            help for reset
      --reboot                          if true, reboot the node after resetting instead of shutting down
      --system-labels-to-wipe strings   if set, just wipe selected system disk partitions by label but keep other partitions intact

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl restart

Restart a process

talosctl restart <id> [flags]

Options

  -h, --help         help for restart
  -k, --kubernetes   use the k8s.io containerd namespace

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl rollback

Rollback a node to the previous installation

talosctl rollback [flags]

Options

  -h, --help   help for rollback

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl routes

List network routes

talosctl routes [flags]

Options

  -h, --help   help for routes

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl service

Retrieve the state of a service (or all services), control service state

Synopsis

Service control command. If run without arguments, lists all services and their state. If a service ID is specified, the default action ‘status’ is executed, which shows the status of that single service. With the actions ‘start’, ‘stop’, and ‘restart’, the service state is updated accordingly.

talosctl service [<id> [start|stop|restart|status]] [flags]

Options

  -h, --help   help for service

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl shutdown

Shutdown a node

talosctl shutdown [flags]

Options

  -h, --help   help for shutdown

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl stats

Get container stats

talosctl stats [flags]

Options

  -h, --help         help for stats
  -k, --kubernetes   use the k8s.io containerd namespace

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl time

Gets current server time

talosctl time [--check server] [flags]

Options

  -c, --check string   checks server time against specified ntp server
  -h, --help           help for time

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl upgrade

Upgrade Talos on the target node

talosctl upgrade [flags]

Options

  -f, --force          force the upgrade (skip checks on etcd health and members, might lead to data loss)
  -h, --help           help for upgrade
  -i, --image string   the container image to use for performing the install
  -p, --preserve       preserve data
  -s, --stage          stage the upgrade to perform it after a reboot

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl upgrade-k8s

Upgrade Kubernetes control plane in the Talos cluster.

Synopsis

This command upgrades the Kubernetes control plane components between the specified versions. Pod-checkpointer is handled in a special way to speed up kube-apiserver upgrades.

talosctl upgrade-k8s [flags]

Options

      --endpoint string   the cluster control plane endpoint
      --from string       the Kubernetes control plane version to upgrade from
  -h, --help              help for upgrade-k8s
      --to string         the Kubernetes control plane version to upgrade to (default "1.20.5")

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl usage

Retrieve disk usage

talosctl usage [path1] [path2] ... [pathN] [flags]

Options

  -a, --all             write counts for all files, not just directories
  -d, --depth int32     maximum recursion depth
  -h, --help            help for usage
  -H, --humanize        humanize size and time in the output
  -t, --threshold int   exclude entries smaller than SIZE if positive, or entries greater than SIZE if negative

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl validate

Validate config

talosctl validate [flags]

Options

  -c, --config string   the path of the config file
  -h, --help            help for validate
  -m, --mode string     the mode to validate the config for (valid values are metal, cloud, and container)

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl version

Prints the version

talosctl version [flags]

Options

      --client   Print client version only
  -h, --help     help for version
      --short    Print the short version

Options inherited from parent commands

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

talosctl

A CLI for out-of-band management of Kubernetes nodes created by Talos

Options

      --context string       Context to be used in command
  -e, --endpoints strings    override default endpoints in Talos configuration
  -h, --help                 help for talosctl
  -n, --nodes strings        target the specified nodes
      --talosconfig string   The path to the Talos configuration file (default "/home/user/.talos/config")

8.3 - Configuration

The v1alpha1 configuration file contains all the options available for configuring a machine.

To generate a set of basic configuration files, run:

talosctl gen config --version v1alpha1 <cluster name> <cluster endpoint>

This will generate a machine config for each node type, and a talosconfig for the CLI.

Config

Config defines the v1alpha1 configuration file.

version: v1alpha1
persist: true
machine: # ...
cluster: # ...

version string

Indicates the schema used to decode the contents.

Valid values:

v1alpha1

debug bool

Enable verbose logging to the console. All system container logs will flow into the serial console.

Note: To avoid breaking the Talos bootstrap flow, enable this option only if the serial console can handle high message throughput.

Valid values:

true
yes
false
no

persist bool

Indicates whether to pull the machine config upon every boot.

Valid values:

true
yes
false
no

machine MachineConfig

Provides machine specific configuration options.

cluster ClusterConfig

Provides cluster specific configuration options.

MachineConfig

MachineConfig represents the machine-specific config values.

Appears in:

Config.machine

type: controlplane
# InstallConfig represents the installation options for preparing a node.
install:
    disk: /dev/sda # The disk used for installations.
    # Allows for supplying extra kernel args via the bootloader.
    extraKernelArgs:
        - console=ttyS1
        - panic=10
    image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
    bootloader: true # Indicates if a bootloader should be installed.
    wipe: false # Indicates if the installation disk should be wiped at installation time.

type string

Defines the role of the machine within the cluster.

Init

Init node type designates the first control plane node to come up. You can think of it like a bootstrap node. This node will perform the initial steps to bootstrap the cluster – generation of TLS assets, starting of the control plane, etc.

Control Plane

Control Plane node type designates the node as a control plane member. This means it will host etcd along with the Kubernetes master components such as the API Server, Controller Manager, and Scheduler.

Worker

Worker node type designates the node as a worker node. This means it will be an available compute node for scheduling workloads.

Valid values:

init
controlplane
join
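
The role names described above do not all match the accepted values one to one. As a minimal sketch, a worker node would be declared with the join type; only the type field is shown here and the rest of the machine configuration is omitted.

```yaml
# Worker node: the accepted value is `join`, not `worker`.
machine:
    type: join
```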

token string

The token is used by a machine to join the PKI of the cluster. Using this token, a machine will create a certificate signing request (CSR), and request a certificate that will be used as its identity.

Warning: It is important to ensure that this token is correct since a machine’s certificate has a short TTL by default.

Examples:

token: 328hom.uqjzh6jnn2eie9oi

ca PEMEncodedCertificateAndKey

The root certificate authority of the PKI. It is composed of a base64 encoded crt and key.

Examples:

ca:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

certSANs []string

Extra certificate subject alternative names for the machine’s certificate. By default, all non-loopback interface IPs are automatically added to the certificate’s SANs.

Examples:

certSANs:
    - 10.0.0.10
    - 172.16.0.10
    - 192.168.0.10

kubelet KubeletConfig

Used to provide additional options to the kubelet.

Examples:

kubelet:
    image: ghcr.io/talos-systems/kubelet:v1.20.5 # The `image` field is an optional reference to an alternative kubelet image.
    # The `extraArgs` field is used to provide additional flags to the kubelet.
    extraArgs:
        feature-gates: ServerSideApply=true

    # # The `extraMounts` field is used to add additional mounts to the kubelet container.
    # extraMounts:
    #     - destination: /var/lib/example
    #       type: bind
    #       source: /var/lib/example
    #       options:
    #         - rshared
    #         - rw

network NetworkConfig

Provides machine specific network configuration options.

Examples:

network:
    hostname: worker-1 # Used to statically set the hostname for the machine.
    # `interfaces` is used to define the network interface configuration.
    interfaces:
        - interface: eth0 # The interface name.
          cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
          # A list of routes associated with the interface.
          routes:
            - network: 0.0.0.0/0 # The route's network.
              gateway: 192.168.2.1 # The route's gateway.
              metric: 1024 # The optional metric for the route.
          mtu: 1500 # The interface's MTU.

          # # Bond specific options.
          # bond:
          #     # The interfaces that make up the bond.
          #     interfaces:
          #         - eth0
          #         - eth1
          #     mode: 802.3ad # A bond option.
          #     lacpRate: fast # A bond option.

          # # Indicates if DHCP should be used to configure the interface.
          # dhcp: true

          # # DHCP specific options.
          # dhcpOptions:
          #     routeMetric: 1024 # The priority of all routes received via DHCP.

          # # Wireguard specific configuration.

          # # wireguard server example
          # wireguard:
          #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
          #     listenPort: 51111 # Specifies a device's listening port.
          #     # Specifies a list of peer configurations to apply to a device.
          #     peers:
          #         - publicKey: ABCDEF... # Specifies the public key of this peer.
          #           endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
          #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          #           allowedIPs:
          #             - 192.168.1.0/24
          # # wireguard peer example
          # wireguard:
          #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
          #     # Specifies a list of peer configurations to apply to a device.
          #     peers:
          #         - publicKey: ABCDEF... # Specifies the public key of this peer.
          #           endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
          #           persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
          #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          #           allowedIPs:
          #             - 192.168.1.0/24

          # # Virtual (shared) IP address configuration.
          # vip:
          #     ip: 172.16.199.55 # Specifies the IP address to be used.
    # Used to statically set the nameservers for the machine.
    nameservers:
        - 9.8.7.6
        - 8.7.6.5

    # # Allows for extra entries to be added to the `/etc/hosts` file
    # extraHostEntries:
    #     - ip: 192.168.1.100 # The IP of the host.
    #       # The host alias.
    #       aliases:
    #         - example
    #         - example.domain.tld

disks []MachineDisk

Used to partition, format, and mount additional disks. Since the rootfs is read-only with the exception of /var, mounts are only valid if they are under /var. Note that the partitioning and formatting is done only once, if and only if no existing partitions are found. If size: is omitted, the partition is sized to occupy the full disk.

Note: size is given in bytes unless a human-readable unit (for example, 100 MB) is used.

Examples:

disks:
    - device: /dev/sdb # The name of the disk to use.
      # A list of partitions to create on the disk.
      partitions:
        - mountpoint: /var/mnt/extra # Where to mount the partition.

          # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.

          # # Human readable representation.
          # size: 100 MB
          # # Precise value in bytes.
          # size: 1073741824
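
The example above leaves size commented out, so the partition takes the full disk. As a sketch of the explicit form, the fragment below nests the same fields under machine and sets the byte value from the example (1073741824 bytes, which is 1 GiB). The device name and mount point are reused from the example and are placeholders for a real system.

```yaml
machine:
    disks:
        - device: /dev/sdb # The name of the disk to use.
          # A list of partitions to create on the disk.
          partitions:
            - mountpoint: /var/mnt/extra # Must be under /var.
              size: 1073741824 # Precise value in bytes (1 GiB).
```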

install InstallConfig

Used to provide instructions for installations.

Examples:

install:
    disk: /dev/sda # The disk used for installations.
    # Allows for supplying extra kernel args via the bootloader.
    extraKernelArgs:
        - console=ttyS1
        - panic=10
    image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
    bootloader: true # Indicates if a bootloader should be installed.
    wipe: false # Indicates if the installation disk should be wiped at installation time.

files []MachineFile

Allows the addition of user specified files. The value of op can be create, overwrite, or append. In the case of create, path must not exist. In the case of overwrite and append, path must be a valid file. If an op value of append is used, the content is appended to the existing file. Note that the file contents are not required to be base64 encoded.

Note: The specified path is relative to /var.

Examples:

files:
    - content: '...' # The contents of the file.
      permissions: 0o666 # The file's permissions in octal.
      path: /tmp/file.txt # The path of the file.
      op: append # The operation to use
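
For comparison with the append example above, the following sketch uses the create operation, which requires that the path does not exist yet. The path is reused from the example; the content and permission values are illustrative only.

```yaml
machine:
    files:
        - content: 'hello' # Illustrative file contents (plain text, not base64).
          permissions: 0o644 # The file's permissions in octal.
          path: /tmp/file.txt # Must not exist yet when op is create.
          op: create # Fails the precondition if the path already exists.
```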

env Env

The env field allows for the addition of environment variables. All environment variables are set on PID 1 in addition to every service.

Valid values:

GRPC_GO_LOG_VERBOSITY_LEVEL
GRPC_GO_LOG_SEVERITY_LEVEL
http_proxy
https_proxy
no_proxy

Examples:

env:
    GRPC_GO_LOG_SEVERITY_LEVEL: info
    GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
    https_proxy: http://SERVER:PORT/

env:
    GRPC_GO_LOG_SEVERITY_LEVEL: error
    https_proxy: https://USERNAME:PASSWORD@SERVER:PORT/

env:
    https_proxy: http://DOMAIN\USERNAME:PASSWORD@SERVER:PORT/

time TimeConfig

Used to configure the machine’s time settings.

Examples:

time:
    disabled: false # Indicates if the time service is disabled for the machine.
    # Specifies time (NTP) servers to use for setting the system time.
    servers:
        - time.cloudflare.com

sysctls map[string]string

Used to configure the machine’s sysctls.

Examples:

sysctls:
    kernel.domainname: talos.dev
    net.ipv4.ip_forward: "0"

registries RegistriesConfig

Used to configure the machine’s container image registry mirrors.

Automatically generates matching CRI configuration for registry mirrors.

The mirrors section allows redirecting requests for images to a non-default registry, which might be a local registry or a caching mirror.

The config section provides a way to authenticate to the registry with a TLS client identity, provide a registry CA, or supply authentication information. Authentication information has the same meaning as the corresponding field in .docker/config.json.

See also the matching configuration for the CRI containerd plugin.

Examples:

registries:
    # Specifies mirror configuration for each registry.
    mirrors:
        docker.io:
            # List of endpoints (URLs) for registry mirrors to use.
            endpoints:
                - https://registry.local
    # Specifies TLS & auth configuration for HTTPS image registries.
    config:
        registry.local:
            # The TLS configuration for the registry.
            tls:
                # Enable mutual TLS authentication with the registry.
                clientIdentity:
                    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
                    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
            # The auth configuration for this registry.
            auth:
                username: username # Optional registry authentication.
                password: password # Optional registry authentication.

systemDiskEncryption SystemDiskEncryptionConfig

Machine system disk encryption configuration. Defines the encryption parameters for each system partition.

Examples:

systemDiskEncryption:
    # Ephemeral partition encryption.
    ephemeral:
        provider: luks2 # Encryption provider to use for the encryption.
        # Defines the encryption keys generation and storage method.
        keys:
            - # Deterministically generated key from the node UUID and PartitionLabel.
              nodeID: {}
              slot: 0 # Key slot number for luks2 encryption.

ClusterConfig

ClusterConfig represents the cluster-wide config values.

Appears in:

Config.cluster

# ControlPlaneConfig represents the control plane configuration options.
controlPlane:
    endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
    localAPIServerPort: 443 # The port that the API server listens on internally.
clusterName: talos.local
# ClusterNetworkConfig represents kube networking configuration options.
network:
    # The CNI used.
    cni:
        name: flannel # Name of CNI to use.
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
        - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
        - 10.96.0.0/12

controlPlane ControlPlaneConfig

Provides control plane specific configuration options.

Examples:

controlPlane:
    endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
    localAPIServerPort: 443 # The port that the API server listens on internally.

clusterName string

Configures the cluster’s name.

network ClusterNetworkConfig

Provides cluster specific network configuration options.

Examples:

network:
    # The CNI used.
    cni:
        name: flannel # Name of CNI to use.
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
        - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
        - 10.96.0.0/12

token string

The bootstrap token used to join the cluster.

Examples:

token: wlzjyw.bei2zfylhs2by0wd

aescbcEncryptionSecret string

The key used for the encryption of secret data at rest.

Examples:

aescbcEncryptionSecret: z01mye6j16bspJYtTB/5SFX8j7Ph4JXxM2Xuu4vsBPM=

ca PEMEncodedCertificateAndKey

The base64 encoded root certificate authority used by Kubernetes.

Examples:

ca:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

aggregatorCA PEMEncodedCertificateAndKey

The base64 encoded aggregator certificate authority used by Kubernetes for front-proxy certificate generation.

This CA can be self-signed.

Examples:

aggregatorCA:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

serviceAccount PEMEncodedKey

The base64 encoded private key for service account token generation.

Examples:

serviceAccount:
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

apiServer APIServerConfig

API server specific configuration options.

Examples:

apiServer:
    image: k8s.gcr.io/kube-apiserver:v1.20.5 # The container image used in the API server manifest.
    # Extra arguments to supply to the API server.
    extraArgs:
        feature-gates: ServerSideApply=true
        http2-max-streams-per-connection: "32"
    # Extra certificate subject alternative names for the API server's certificate.
    certSANs:
        - 1.2.3.4
        - 4.5.6.7

controllerManager ControllerManagerConfig

Controller manager server specific configuration options.

Examples:

controllerManager:
    image: k8s.gcr.io/kube-controller-manager:v1.20.5 # The container image used in the controller manager manifest.
    # Extra arguments to supply to the controller manager.
    extraArgs:
        feature-gates: ServerSideApply=true

proxy ProxyConfig

Kube-proxy server specific configuration options.

Examples:

proxy:
    image: k8s.gcr.io/kube-proxy:v1.20.5 # The container image used in the kube-proxy manifest.
    mode: ipvs # proxy mode of kube-proxy.
    # Extra arguments to supply to kube-proxy.
    extraArgs:
        proxy-mode: iptables

scheduler SchedulerConfig

Scheduler server specific configuration options.

Examples:

scheduler:
    image: k8s.gcr.io/kube-scheduler:v1.20.5 # The container image used in the scheduler manifest.
    # Extra arguments to supply to the scheduler.
    extraArgs:
        feature-gates: AllBeta=true

etcd EtcdConfig

Etcd specific configuration options.

Examples:

etcd:
    image: gcr.io/etcd-development/etcd:v3.4.15 # The container image used to create the etcd service.
    # The `ca` is the root certificate authority of the PKI.
    ca:
        crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
        key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
    # Extra arguments to supply to etcd.
    extraArgs:
        election-timeout: "5000"

podCheckpointer PodCheckpointer

Pod Checkpointer specific configuration options.

Examples:

podCheckpointer:
    image: '...' # The `image` field is an override to the default pod-checkpointer image.

coreDNS CoreDNS

Core DNS specific configuration options.

Examples:

coreDNS:
    image: docker.io/coredns/coredns:1.8.0 # The `image` field is an override to the default coredns image.

extraManifests []string

A list of URLs that point to additional manifests. These will get automatically deployed as part of the bootstrap.

Examples:

extraManifests:
    - https://www.example.com/manifest1.yaml
    - https://www.example.com/manifest2.yaml

extraManifestHeaders map[string]string

A map of key-value pairs that will be added as headers while fetching the extraManifests.

Examples:

extraManifestHeaders:
    Token: "1234567"
    X-ExtraInfo: info

adminKubeconfig AdminKubeconfigConfig

Settings for admin kubeconfig generation. Certificate lifetime can be configured.

Examples:

adminKubeconfig:
    certLifetime: 1h0m0s # Admin kubeconfig certificate lifetime (default is 1 year).

allowSchedulingOnMasters bool

Allows running workloads on master nodes.

Valid values:

true
yes
false
no
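
As a minimal sketch, the fragment below enables scheduling on master nodes, which can be useful for a small test cluster with no dedicated workers. Only this one field is shown; the rest of the cluster configuration is omitted.

```yaml
cluster:
    allowSchedulingOnMasters: true # Allow regular workloads on control plane nodes.
```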

KubeletConfig

KubeletConfig represents the kubelet config values.

Appears in:

MachineConfig.kubelet

image: ghcr.io/talos-systems/kubelet:v1.20.5 # The `image` field is an optional reference to an alternative kubelet image.
# The `extraArgs` field is used to provide additional flags to the kubelet.
extraArgs:
    feature-gates: ServerSideApply=true

# # The `extraMounts` field is used to add additional mounts to the kubelet container.
# extraMounts:
#     - destination: /var/lib/example
#       type: bind
#       source: /var/lib/example
#       options:
#         - rshared
#         - rw

image string

The image field is an optional reference to an alternative kubelet image.

Examples:

image: ghcr.io/talos-systems/kubelet:v1.20.5

extraArgs map[string]string

The extraArgs field is used to provide additional flags to the kubelet.

Examples:

extraArgs:
    key: value

extraMounts []Mount

The extraMounts field is used to add additional mounts to the kubelet container.

Examples:

extraMounts:
    - destination: /var/lib/example
      type: bind
      source: /var/lib/example
      options:
        - rshared
        - rw

registerWithFQDN bool

The registerWithFQDN field is used to force kubelet to use the node FQDN for registration. This is required in clouds like AWS.

Valid values:

true
yes
false
no
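
Since KubeletConfig appears under MachineConfig.kubelet, the field is set inside the machine section. A minimal sketch, with all other kubelet options omitted:

```yaml
machine:
    kubelet:
        registerWithFQDN: true # Register the node using its FQDN.
```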

NetworkConfig

NetworkConfig represents the machine’s networking config values.

Appears in:

MachineConfig.network

hostname: worker-1 # Used to statically set the hostname for the machine.
# `interfaces` is used to define the network interface configuration.
interfaces:
    - interface: eth0 # The interface name.
      cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
      # A list of routes associated with the interface.
      routes:
        - network: 0.0.0.0/0 # The route's network.
          gateway: 192.168.2.1 # The route's gateway.
          metric: 1024 # The optional metric for the route.
      mtu: 1500 # The interface's MTU.

      # # Bond specific options.
      # bond:
      #     # The interfaces that make up the bond.
      #     interfaces:
      #         - eth0
      #         - eth1
      #     mode: 802.3ad # A bond option.
      #     lacpRate: fast # A bond option.

      # # Indicates if DHCP should be used to configure the interface.
      # dhcp: true

      # # DHCP specific options.
      # dhcpOptions:
      #     routeMetric: 1024 # The priority of all routes received via DHCP.

      # # Wireguard specific configuration.

      # # wireguard server example
      # wireguard:
      #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #     listenPort: 51111 # Specifies a device's listening port.
      #     # Specifies a list of peer configurations to apply to a device.
      #     peers:
      #         - publicKey: ABCDEF... # Specifies the public key of this peer.
      #           endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #           allowedIPs:
      #             - 192.168.1.0/24
      # # wireguard peer example
      # wireguard:
      #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #     # Specifies a list of peer configurations to apply to a device.
      #     peers:
      #         - publicKey: ABCDEF... # Specifies the public key of this peer.
      #           endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
      #           persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #           allowedIPs:
      #             - 192.168.1.0/24

      # # Virtual (shared) IP address configuration.
      # vip:
      #     ip: 172.16.199.55 # Specifies the IP address to be used.
# Used to statically set the nameservers for the machine.
nameservers:
    - 9.8.7.6
    - 8.7.6.5

# # Allows for extra entries to be added to the `/etc/hosts` file
# extraHostEntries:
#     - ip: 192.168.1.100 # The IP of the host.
#       # The host alias.
#       aliases:
#         - example
#         - example.domain.tld

hostname string

Used to statically set the hostname for the machine.

interfaces []Device

interfaces is used to define the network interface configuration. By default, all network interfaces will attempt DHCP discovery. This can be further tuned through this configuration parameter.

Examples:

interfaces:
    - interface: eth0 # The interface name.
      cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
      # A list of routes associated with the interface.
      routes:
        - network: 0.0.0.0/0 # The route's network.
          gateway: 192.168.2.1 # The route's gateway.
          metric: 1024 # The optional metric for the route.
      mtu: 1500 # The interface's MTU.

      # # Bond specific options.
      # bond:
      #     # The interfaces that make up the bond.
      #     interfaces:
      #         - eth0
      #         - eth1
      #     mode: 802.3ad # A bond option.
      #     lacpRate: fast # A bond option.

      # # Indicates if DHCP should be used to configure the interface.
      # dhcp: true

      # # DHCP specific options.
      # dhcpOptions:
      #     routeMetric: 1024 # The priority of all routes received via DHCP.

      # # Wireguard specific configuration.

      # # wireguard server example
      # wireguard:
      #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #     listenPort: 51111 # Specifies a device's listening port.
      #     # Specifies a list of peer configurations to apply to a device.
      #     peers:
      #         - publicKey: ABCDEF... # Specifies the public key of this peer.
      #           endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #           allowedIPs:
      #             - 192.168.1.0/24
      # # wireguard peer example
      # wireguard:
      #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
      #     # Specifies a list of peer configurations to apply to a device.
      #     peers:
      #         - publicKey: ABCDEF... # Specifies the public key of this peer.
      #           endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
      #           persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      #           allowedIPs:
      #             - 192.168.1.0/24

      # # Virtual (shared) IP address configuration.
      # vip:
      #     ip: 172.16.199.55 # Specifies the IP address to be used.

nameservers []string

Used to statically set the nameservers for the machine. Defaults to 1.1.1.1 and 8.8.8.8.

Examples:

nameservers:
    - 8.8.8.8
    - 1.1.1.1

extraHostEntries []ExtraHost

Allows for extra entries to be added to the /etc/hosts file.

Examples:

extraHostEntries:
    - ip: 192.168.1.100 # The IP of the host.
      # The host alias.
      aliases:
        - example
        - example.domain.tld

InstallConfig

InstallConfig represents the installation options for preparing a node.

Appears in:

MachineConfig.install

disk: /dev/sda # The disk used for installations.
# Allows for supplying extra kernel args via the bootloader.
extraKernelArgs:
    - console=ttyS1
    - panic=10
image: ghcr.io/talos-systems/installer:latest # Allows for supplying the image used to perform the installation.
bootloader: true # Indicates if a bootloader should be installed.
wipe: false # Indicates if the installation disk should be wiped at installation time.

disk string

The disk used for installations.

Examples:

disk: /dev/sda

disk: /dev/nvme0

extraKernelArgs []string

Allows for supplying extra kernel args via the bootloader.

Examples:

extraKernelArgs:
    - talos.platform=metal
    - reboot=k

image string

Allows for supplying the image used to perform the installation. The image reference for each Talos release can be found on the GitHub releases page.

Examples:

image: ghcr.io/talos-systems/installer:latest

bootloader bool

Indicates if a bootloader should be installed.

Valid values:

true
yes
false
no

wipe bool

Indicates if the installation disk should be wiped at installation time. Defaults to true.

Valid values:

true
yes
false
no

TimeConfig

TimeConfig represents the options for configuring time on a machine.

Appears in:

MachineConfig.time

disabled: false # Indicates if the time service is disabled for the machine.
# Specifies time (NTP) servers to use for setting the system time.
servers:
    - time.cloudflare.com

disabled bool

Indicates if the time service is disabled for the machine. Defaults to false.

servers []string

Specifies time (NTP) servers to use for setting the system time. Defaults to pool.ntp.org.

This parameter only supports a single time server.

RegistriesConfig

RegistriesConfig represents the image pull options.

Appears in:

MachineConfig.registries

# Specifies mirror configuration for each registry.
mirrors:
    docker.io:
        # List of endpoints (URLs) for registry mirrors to use.
        endpoints:
            - https://registry.local
# Specifies TLS & auth configuration for HTTPS image registries.
config:
    registry.local:
        # The TLS configuration for the registry.
        tls:
            # Enable mutual TLS authentication with the registry.
            clientIdentity:
                crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
                key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
        # The auth configuration for this registry.
        auth:
            username: username # Optional registry authentication.
            password: password # Optional registry authentication.

mirrors map[string]RegistryMirrorConfig

Specifies mirror configuration for each registry. This setting allows the use of local pull-through caching registries, air-gapped installations, etc.

The registry name is the first segment of the image identifier, with ‘docker.io’ being the default. To catch any registry names not specified explicitly, use ‘*’.

Examples:

mirrors:
    ghcr.io:
        # List of endpoints (URLs) for registry mirrors to use.
        endpoints:
            - https://registry.insecure
            - https://ghcr.io/v2/
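
As a rough sketch of the catch-all behavior described above, the fragment below sends docker.io pulls to one mirror and every registry not listed explicitly to a second one. Both hostnames are placeholders assumed for illustration, not values shipped with Talos.

```yaml
mirrors:
    docker.io:
        # List of endpoints (URLs) for registry mirrors to use.
        endpoints:
            - https://registry.local # Hypothetical mirror for docker.io images.
    '*':
        # Applies to any registry name not specified explicitly.
        endpoints:
            - https://fallback.registry.local # Hypothetical catch-all mirror.
```

The `*` key is quoted because a bare asterisk is reserved in YAML for aliases.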

config map[string]RegistryConfig

Specifies TLS & auth configuration for HTTPS image registries. Mutual TLS can be enabled with ‘clientIdentity’ option.

TLS configuration can be skipped if the registry has a trusted server certificate.

Examples:

config:
    registry.insecure:
        # The TLS configuration for the registry.
        tls:
            insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).

            # # Enable mutual TLS authentication with the registry.
            # clientIdentity:
            #     crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
            #     key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

        # # The auth configuration for this registry.
        # auth:
        #     username: username # Optional registry authentication.
        #     password: password # Optional registry authentication.

PodCheckpointer

PodCheckpointer represents the pod-checkpointer config values.

Appears in:

ClusterConfig.podCheckpointer

image: '...' # The `image` field is an override to the default pod-checkpointer image.

image string

The image field is an override to the default pod-checkpointer image.

CoreDNS

CoreDNS represents the CoreDNS config values.

Appears in:

ClusterConfig.coreDNS

image: docker.io/coredns/coredns:1.8.0 # The `image` field is an override to the default coredns image.

image string

The image field is an override to the default coredns image.

Endpoint

Endpoint represents the endpoint URL parsed out of the machine config.

Appears in:

ControlPlaneConfig.endpoint

https://1.2.3.4:6443

https://cluster1.internal:6443

ControlPlaneConfig

ControlPlaneConfig represents the control plane configuration options.

Appears in:

ClusterConfig.controlPlane

endpoint: https://1.2.3.4 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
localAPIServerPort: 443 # The port that the API server listens on internally.

endpoint Endpoint

Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname. It is single-valued, and may optionally include a port number.

Examples:

endpoint: https://1.2.3.4:6443

endpoint: https://cluster1.internal:6443

localAPIServerPort int

The port that the API server listens on internally. This may be different than the port portion listed in the endpoint field above. The default is 6443.
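
A minimal sketch of a case where the two ports differ, assuming a load balancer (not described in this reference) that accepts client traffic on 443 and forwards it to the API server on its default internal port:

```yaml
endpoint: https://cluster1.internal:443 # Port used by clients; assumed to be served by a load balancer.
localAPIServerPort: 6443 # Port the API server listens on internally (the documented default).
```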

APIServerConfig

APIServerConfig represents the kube apiserver configuration options.

Appears in:

ClusterConfig.apiServer

image: k8s.gcr.io/kube-apiserver:v1.20.5 # The container image used in the API server manifest.
# Extra arguments to supply to the API server.
extraArgs:
    feature-gates: ServerSideApply=true
    http2-max-streams-per-connection: "32"
# Extra certificate subject alternative names for the API server's certificate.
certSANs:
    - 1.2.3.4
    - 4.5.6.7

image string

The container image used in the API server manifest.

Examples:

image: k8s.gcr.io/kube-apiserver:v1.20.5

extraArgs map[string]string

Extra arguments to supply to the API server.

extraVolumes []VolumeMountConfig

Extra volumes to mount to the API server static pod.

certSANs []string

Extra certificate subject alternative names for the API server’s certificate.

ControllerManagerConfig

ControllerManagerConfig represents the kube controller manager configuration options.

Appears in:

ClusterConfig.controllerManager

image: k8s.gcr.io/kube-controller-manager:v1.20.5 # The container image used in the controller manager manifest.
# Extra arguments to supply to the controller manager.
extraArgs:
    feature-gates: ServerSideApply=true

image string

The container image used in the controller manager manifest.

Examples:

image: k8s.gcr.io/kube-controller-manager:v1.20.5

extraArgs map[string]string

Extra arguments to supply to the controller manager.

extraVolumes []VolumeMountConfig

Extra volumes to mount to the controller manager static pod.

ProxyConfig

ProxyConfig represents the kube proxy configuration options.

Appears in:

ClusterConfig.proxy

image: k8s.gcr.io/kube-proxy:v1.20.5 # The container image used in the kube-proxy manifest.
mode: ipvs # proxy mode of kube-proxy.
# Extra arguments to supply to kube-proxy.
extraArgs:
    proxy-mode: iptables

disabled bool

Disable kube-proxy deployment on cluster bootstrap.

Examples:

disabled: false

image string

The container image used in the kube-proxy manifest.

Examples:

image: k8s.gcr.io/kube-proxy:v1.20.5

mode string

Proxy mode of kube-proxy. The default is ‘iptables’.

extraArgs map[string]string

Extra arguments to supply to kube-proxy.

SchedulerConfig

SchedulerConfig represents the kube scheduler configuration options.

Appears in:

ClusterConfig.scheduler

image: k8s.gcr.io/kube-scheduler:v1.20.5 # The container image used in the scheduler manifest.
# Extra arguments to supply to the scheduler.
extraArgs:
    feature-gates: AllBeta=true

image string

The container image used in the scheduler manifest.

Examples:

image: k8s.gcr.io/kube-scheduler:v1.20.5

extraArgs map[string]string

Extra arguments to supply to the scheduler.

extraVolumes []VolumeMountConfig

Extra volumes to mount to the scheduler static pod.

EtcdConfig

EtcdConfig represents the etcd configuration options.

Appears in:

ClusterConfig.etcd

image: gcr.io/etcd-development/etcd:v3.4.15 # The container image used to create the etcd service.
# The `ca` is the root certificate authority of the PKI.
ca:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
# Extra arguments to supply to etcd.
extraArgs:
    election-timeout: "5000"

image string

The container image used to create the etcd service.

Examples:

image: gcr.io/etcd-development/etcd:v3.4.15

ca PEMEncodedCertificateAndKey

The ca is the root certificate authority of the PKI. It is composed of a base64 encoded crt and key.

Examples:

ca:
    crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
    key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u

extraArgs map[string]string

Extra arguments to supply to etcd. Note that the following args are not allowed:

name
data-dir
initial-cluster-state
listen-peer-urls
listen-client-urls
cert-file
key-file
trusted-ca-file
peer-client-cert-auth
peer-cert-file
peer-trusted-ca-file
peer-key-file
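
For contrast with the disallowed list above, the following sketch passes only tuning flags that are not on that list. The `heartbeat-interval` flag and both values are assumptions chosen for illustration, not recommended settings. Values are quoted because the field is a map of strings.

```yaml
extraArgs:
    election-timeout: "5000" # Milliseconds; quoted so the value stays a string.
    heartbeat-interval: "500" # Milliseconds; illustrative value only.
```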

ClusterNetworkConfig

ClusterNetworkConfig represents kube networking configuration options.

Appears in:

ClusterConfig.network

# The CNI used.
cni:
    name: flannel # Name of CNI to use.
dnsDomain: cluster.local # The domain used by Kubernetes DNS.
# The pod subnet CIDR.
podSubnets:
    - 10.244.0.0/16
# The service subnet CIDR.
serviceSubnets:
    - 10.96.0.0/12

cni CNIConfig

The CNI used. Composed of “name” and “urls”. The “name” key only supports “flannel” or “custom”. The “urls” key is only used if “name” is “custom”, and should point to the set of YAML files to be deployed. An empty struct or any other name will default to Flannel CNI.

Examples:

cni:
    name: custom # Name of CNI to use.
    # URLs containing manifests to apply for the CNI.
    urls:
        - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml

dnsDomain string

The domain used by Kubernetes DNS. The default is cluster.local.

Examples:

dnsDomain: cluster.local

podSubnets []string

The pod subnet CIDR.

Examples:

podSubnets:
    - 10.244.0.0/16

serviceSubnets []string

The service subnet CIDR.

Examples:

serviceSubnets:
    - 10.96.0.0/12

CNIConfig

CNIConfig represents the CNI configuration options.

Appears in:

ClusterNetworkConfig.cni

name: custom # Name of CNI to use.
# URLs containing manifests to apply for the CNI.
urls:
    - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml

name string

Name of CNI to use.

urls []string

URLs containing manifests to apply for the CNI.

AdminKubeconfigConfig

AdminKubeconfigConfig contains admin kubeconfig settings.

Appears in:

ClusterConfig.adminKubeconfig

certLifetime: 1h0m0s # Admin kubeconfig certificate lifetime (default is 1 year).

certLifetime Duration

Admin kubeconfig certificate lifetime (default is 1 year). Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).
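
Go's time.Duration syntax has no day or year unit (the largest unit is `h`), so longer lifetimes have to be written in hours. A small sketch, assuming a 365-day year: 365 × 24 = 8760 hours.

```yaml
certLifetime: 8760h0m0s # 365 days * 24 hours, roughly one year.
```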

MachineDisk

MachineDisk represents the options available for partitioning, formatting, and mounting extra disks.

Appears in:

MachineConfig.disks

- device: /dev/sdb # The name of the disk to use.
  # A list of partitions to create on the disk.
  partitions:
    - mountpoint: /var/mnt/extra # Where to mount the partition.

      # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.

      # # Human readable representation.
      # size: 100 MB
      # # Precise value in bytes.
      # size: 1073741824

device string

The name of the disk to use.

partitions []DiskPartition

A list of partitions to create on the disk.

DiskPartition

DiskPartition represents the options for a disk partition.

Appears in:

MachineDisk.partitions

size DiskSize

The size of partition: either bytes or human readable representation. If size: is omitted, the partition is sized to occupy the full disk.

Examples:

size: 100 MB

size: 1073741824

mountpoint string

Where to mount the partition.
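
Putting the two DiskPartition fields together inside a MachineDisk entry might look like the sketch below. The device name and mount point reuse the values from the MachineDisk example; the size is illustrative.

```yaml
- device: /dev/sdb # The name of the disk to use.
  # A list of partitions to create on the disk.
  partitions:
    - mountpoint: /var/mnt/extra # Where to mount the partition.
      size: 100 MB # Human readable representation; omit to occupy the full disk.
```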

EncryptionConfig

EncryptionConfig represents partition encryption settings.

Appears in:

SystemDiskEncryptionConfig.state
SystemDiskEncryptionConfig.ephemeral

provider string

Encryption provider to use for the encryption.

Examples:

provider: luks2

keys []EncryptionKey

Defines the encryption keys generation and storage method.

cipher string

Cipher kind to use for the encryption. Depends on the encryption provider.

EncryptionKey

EncryptionKey represents the configuration for a disk encryption key.

Appears in:

EncryptionConfig.keys

static EncryptionKeyStatic

Key whose value is stored in the configuration file.

nodeID EncryptionKeyNodeID

Deterministically generated key from the node UUID and PartitionLabel.

slot int

Key slot number for luks2 encryption.
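
As a sketch of how these fields combine, an EncryptionConfig using a node-derived key could be written as follows. The empty mapping for `nodeID` reflects that EncryptionKeyNodeID lists no fields of its own; the slot number is an illustrative choice.

```yaml
provider: luks2 # Encryption provider to use for the encryption.
# Defines the encryption keys generation and storage method.
keys:
    - nodeID: {} # Key derived from the node UUID and PartitionLabel.
      slot: 0 # Key slot number for luks2 encryption (illustrative).
```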

EncryptionKeyStatic

EncryptionKeyStatic represents a throwaway key type.

Appears in:

EncryptionKey.static

passphrase string

Defines the static passphrase value.
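
A minimal sketch of a static key inside an EncryptionConfig, assuming a placeholder passphrase and slot number. Because the passphrase is stored in the configuration file, anyone who can read the file can read the key.

```yaml
provider: luks2 # Encryption provider to use for the encryption.
keys:
    - static:
          passphrase: example-passphrase # Placeholder value; replace with your own.
      slot: 1 # Key slot number for luks2 encryption (illustrative).
```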

EncryptionKeyNodeID

EncryptionKeyNodeID represents a deterministically generated key from the node UUID and PartitionLabel.

Appears in:

EncryptionKey.nodeID

MachineFile

MachineFile represents a file to write to disk.

Appears in:

MachineConfig.files

- content: '...' # The contents of the file.
  permissions: 0o666 # The file's permissions in octal.
  path: /tmp/file.txt # The path of the file.
  op: append # The operation to use

content string

The contents of the file.

permissions FileMode

The file’s permissions in octal.

path string

The path of the file.

op string

The operation to use.

Valid values:

create
append
overwrite
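
For comparison with the `append` example above, a sketch of an entry using `create` is shown here. The content, permissions, and path are illustrative assumptions only.

```yaml
- content: 'example contents' # The contents of the file (placeholder text).
  permissions: 0o644 # The file's permissions in octal.
  path: /tmp/example.txt # The path of the file (hypothetical).
  op: create # One of the valid values: create, append, overwrite.
```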

ExtraHost

ExtraHost represents a host entry in /etc/hosts.

Appears in:

NetworkConfig.extraHostEntries

- ip: 192.168.1.100 # The IP of the host.
  # The host alias.
  aliases:
    - example
    - example.domain.tld

ip string

The IP of the host.

aliases []string

The host alias.

Device

Device represents a network interface.

Appears in:

NetworkConfig.interfaces

- interface: eth0 # The interface name.
  cidr: 192.168.2.0/24 # Assigns a static IP address to the interface.
  # A list of routes associated with the interface.
  routes:
    - network: 0.0.0.0/0 # The route's network.
      gateway: 192.168.2.1 # The route's gateway.
      metric: 1024 # The optional metric for the route.
  mtu: 1500 # The interface's MTU.

  # # Bond specific options.
  # bond:
  #     # The interfaces that make up the bond.
  #     interfaces:
  #         - eth0
  #         - eth1
  #     mode: 802.3ad # A bond option.
  #     lacpRate: fast # A bond option.

  # # Indicates if DHCP should be used to configure the interface.
  # dhcp: true

  # # DHCP specific options.
  # dhcpOptions:
  #     routeMetric: 1024 # The priority of all routes received via DHCP.

  # # Wireguard specific configuration.

  # # wireguard server example
  # wireguard:
  #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #     listenPort: 51111 # Specifies a device's listening port.
  #     # Specifies a list of peer configurations to apply to a device.
  #     peers:
  #         - publicKey: ABCDEF... # Specifies the public key of this peer.
  #           endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
  #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #           allowedIPs:
  #             - 192.168.1.0/24
  # # wireguard peer example
  # wireguard:
  #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #     # Specifies a list of peer configurations to apply to a device.
  #     peers:
  #         - publicKey: ABCDEF... # Specifies the public key of this peer.
  #           endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
  #           persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
  #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #           allowedIPs:
  #             - 192.168.1.0/24

  # # Virtual (shared) IP address configuration.
  # vip:
  #     ip: 172.16.199.55 # Specifies the IP address to be used.

interface string

The interface name.

Examples:

interface: eth0

cidr string

Assigns a static IP address to the interface. This should be in proper CIDR notation.

Note: This option is mutually exclusive with the DHCP option.

Examples:

cidr: 10.5.0.0/16

routes []Route

A list of routes associated with the interface. If used in combination with DHCP, these routes will be appended to the routes returned by the DHCP server.

Examples:

routes:
    - network: 0.0.0.0/0 # The route's network.
      gateway: 10.5.0.1 # The route's gateway.
    - network: 10.2.0.0/16 # The route's network.
      gateway: 10.2.0.1 # The route's gateway.

bond Bond

Bond specific options.

Examples:

bond:
    # The interfaces that make up the bond.
    interfaces:
        - eth0
        - eth1
    mode: 802.3ad # A bond option.
    lacpRate: fast # A bond option.

vlans []Vlan

VLAN specific options.

mtu int

The interface’s MTU. If used in combination with DHCP, this will override any MTU settings returned from the DHCP server.

dhcp bool

Indicates if DHCP should be used to configure the interface. The following DHCP options are supported:

OptionClasslessStaticRoute
OptionDomainNameServer
OptionDNSDomainSearchList
OptionHostName

Note: This option is mutually exclusive with CIDR.
Note: To configure an interface with only IPv6 SLAAC addressing, CIDR should be set to "" and DHCP to false in order for Talos to skip configuration of addresses. All other options will still apply.

Examples:

dhcp: true
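
The IPv6 SLAAC note above can be sketched as the following interface entry. The interface name and MTU are illustrative; only the empty `cidr` and `dhcp: false` come from the note itself.

```yaml
- interface: eth0 # The interface name (illustrative).
  cidr: "" # Left empty so that no static address is assigned.
  dhcp: false # DHCP disabled; Talos skips address configuration.
  mtu: 1500 # Other options still apply.
```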

ignore bool

Indicates if the interface should be ignored (skips configuration).
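
A short sketch, assuming a machine with two interfaces where only the first should be configured. The interface names are illustrative.

```yaml
interfaces:
    - interface: eth0 # Configured via DHCP.
      dhcp: true
    - interface: eth1 # Skipped entirely.
      ignore: true
```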

dummy bool

Indicates if the interface is a dummy interface, that is, a virtual-only interface.

dhcpOptions DHCPOptions

DHCP specific options. dhcp must be set to true for these to take effect.

Examples:

dhcpOptions:
    routeMetric: 1024 # The priority of all routes received via DHCP.

wireguard DeviceWireguardConfig

Wireguard specific configuration. Includes things like private key, listen port, peers.

Examples:

wireguard:
    privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    listenPort: 51111 # Specifies a device's listening port.
    # Specifies a list of peer configurations to apply to a device.
    peers:
        - publicKey: ABCDEF... # Specifies the public key of this peer.
          endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
          # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          allowedIPs:
            - 192.168.1.0/24

wireguard:
    privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
    # Specifies a list of peer configurations to apply to a device.
    peers:
        - publicKey: ABCDEF... # Specifies the public key of this peer.
          endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
          persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
          # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
          allowedIPs:
            - 192.168.1.0/24

vip DeviceVIPConfig

Virtual (shared) IP address configuration.

Examples:

vip:
    ip: 172.16.199.55 # Specifies the IP address to be used.

DHCPOptions

DHCPOptions contains options for configuring the DHCP settings for a given interface.

Appears in:

Device.dhcpOptions

routeMetric: 1024 # The priority of all routes received via DHCP.

routeMetric uint32

The priority of all routes received via DHCP.

ipv4 bool

Enables DHCPv4 protocol for the interface (default is enabled).

ipv6 bool

Enables DHCPv6 protocol for the interface (default is disabled).
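
Since DHCPv6 is disabled by default, running both protocols on one interface requires turning it on explicitly. A sketch in the context of a Device entry, with an illustrative interface name and metric:

```yaml
- interface: eth0 # The interface name (illustrative).
  dhcp: true # Must be true for dhcpOptions to take effect.
  dhcpOptions:
      routeMetric: 1024 # The priority of all routes received via DHCP.
      ipv4: true # Enabled by default; shown for clarity.
      ipv6: true # Disabled by default; enabled here.
```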

DeviceWireguardConfig

DeviceWireguardConfig contains settings for configuring a Wireguard network interface.

Appears in:

Device.wireguard

privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
listenPort: 51111 # Specifies a device's listening port.
# Specifies a list of peer configurations to apply to a device.
peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24

privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
# Specifies a list of peer configurations to apply to a device.
peers:
    - publicKey: ABCDEF... # Specifies the public key of this peer.
      endpoint: 192.168.1.2 # Specifies the endpoint of this peer entry.
      persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
      # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
      allowedIPs:
        - 192.168.1.0/24

privateKey string

Specifies a private key configuration (base64 encoded). Can be generated by wg genkey.

listenPort int

Specifies a device’s listening port.

firewallMark int

Specifies a device’s firewall mark.

peers []DeviceWireguardPeer

Specifies a list of peer configurations to apply to a device.

DeviceWireguardPeer

DeviceWireguardPeer is a WireGuard device peer configuration.

Appears in:

DeviceWireguardConfig.peers

publicKey string

Specifies the public key of this peer. Can be derived from the private key by running wg pubkey < private.key > public.key && cat public.key.

endpoint string

Specifies the endpoint of this peer entry.

persistentKeepaliveInterval Duration

Specifies the persistent keepalive interval for this peer. Field format accepts any Go time.Duration format (‘1h’ for one hour, ‘10m’ for ten minutes).

allowedIPs []string

AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.

DeviceVIPConfig

DeviceVIPConfig contains settings for configuring a Virtual Shared IP on an interface.

Appears in:

Device.vip

ip: 172.16.199.55 # Specifies the IP address to be used.

ip string

Specifies the IP address to be used.
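
In context, the `vip` block sits under a Device entry alongside that interface's own addressing. A minimal sketch, assuming a DHCP-configured interface and the example address used throughout this reference:

```yaml
- interface: eth0 # The interface name (illustrative).
  dhcp: true # The interface's own address comes from DHCP in this sketch.
  # Virtual (shared) IP address configuration.
  vip:
      ip: 172.16.199.55 # Specifies the IP address to be used.
```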

Bond

Bond contains the various options for configuring a bonded interface.

Appears in:

Device.bond

# The interfaces that make up the bond.
interfaces:
    - eth0
    - eth1
mode: 802.3ad # A bond option.
lacpRate: fast # A bond option.

interfaces []string

The interfaces that make up the bond.

arpIPTarget []string