Everything you ever wanted to know about using etcd with Kubernetes v1.6 (but were afraid to ask)
etcd data store format: v2 or v3?
etcd version 3.0.0 and up supports two different data stores, v2 and v3, and it's important to know which one you're using because it affects your ability to back up your data.
In Kubernetes 1.5, the default data store format was v2, but v3 was available if you set it explicitly. In Kubernetes v1.6, the default data store is etcd v3, but you still need to think about which format the components that surround it use.
For example, Calico, Canal, and Flannel can only write data to the etcd v2 data store, so combining their etcd data with the Kubernetes etcd data store can complicate the maintenance of etcd if Kubernetes is using v3.
Users blindly upgrading from Kubernetes v1.5 to v1.6 may be in for a surprise. (Just one reason it's important to always read the release notes!) Kubernetes v1.6 changes the default etcd backend from v2 to v3, so before you start, make sure you manually migrate etcd to v3. To ensure data consistency, the migration requires shutting down all kube-apiservers.
If you don't want to migrate just yet, you can pin kube-apiserver back to v2 etcd with the following option:
--storage-backend=etcd2
Backing up etcd
All configuration data for Kubernetes is stored inside etcd, so in the event of an irrecoverable disaster, an operator can use an etcd backup to recover all data. Etcd creates snapshots regularly on its own, but daily backups stored on a separate host are a good strategy for disaster recovery for Kubernetes.
Backup methods
etcd has different backup methods for v2 and v3, and each has its own advantages and disadvantages. The v3 backup is much cleaner and consists of a single, compact file, but it has one major drawback: it won't back up or restore v2 data.
This means that if you have only etcd v3 data (for example, if your network plugin doesn't consume etcd), you can use the v3 backup, but if you have any v2 data, even if it's mixed with v3 data, you must use the v2 backup method.
Let's look at each of these methods.
Etcd v2 backups
The etcd v2 backup method creates a directory structure with a single WAL file. You can perform a backup online without interrupting etcd cluster operations. To back up an etcd v2+v3 data store, use the following command:
etcdctl backup --data-dir /var/lib/etcd/ --backup-dir /backupdir
The official etcd documentation describes the full procedure for an etcd v2 restore; here is an overview of the basic steps. The challenging part is to rebuild the cluster one node at a time.
#!/bin/bash -e
Etcd v3 backups
The etcd v3 backup creates a single compressed file. Remember, while v2 backups surprisingly also copy v3 data, the v3 backup cannot be used to back up etcd v2 data, so be careful before using this method. To create a v3 backup, run the command:
ETCDCTL_API=3 etcdctl snapshot save /backupdir
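The rule for choosing between the two backup methods, combined with the daily-backup advice earlier, can be sketched as a small shell function. This is an illustration under stated assumptions rather than a tested production script: the function names `run` and `backup_etcd`, the `DRY_RUN`, `DATA_DIR`, and `BACKUP_ROOT` variables, and the dated file names are all hypothetical. Note that `etcdctl snapshot save` takes the path of the output file, so the sketch writes to a file inside the backup directory.

```shell
#!/bin/bash
# Sketch: pick the etcd backup method and write to a dated location.
# Assumed defaults: data in /var/lib/etcd, backups under /backupdir.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: "yes" if the store holds any v2 data, "no" if it is v3-only.
# $2: date stamp for the backup name (defaults to today, YYYY-MM-DD).
backup_etcd() {
  local has_v2="$1"
  local stamp="${2:-$(date +%F)}"
  local data_dir="${DATA_DIR:-/var/lib/etcd}"
  local backup_root="${BACKUP_ROOT:-/backupdir}"

  if [ "$has_v2" = "yes" ]; then
    # Any v2 data, even mixed with v3, requires the v2 method.
    run env ETCDCTL_API=2 etcdctl backup --data-dir "$data_dir" \
      --backup-dir "$backup_root/etcd-v2-$stamp"
  else
    # v3-only stores can use the single-file snapshot.
    run env ETCDCTL_API=3 etcdctl snapshot save \
      "$backup_root/etcd-v3-$stamp.db"
  fi
}
```

Calling `DRY_RUN=1 backup_etcd yes` prints the command without running it, which is a cheap way to check paths before wiring the function into a daily job. The resulting directory or file still has to be copied to a separate host.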
To restore from a v3 snapshot, the steps required are as follows:
- Stop etcd on all hosts
- Purge /var/lib/etcd/member on all hosts
- Copy the backup file to each etcd host
- source /etc/default/etcd on each host and run the following command:
ETCDCTL_API=3 etcdctl snapshot restore BACKUP_FILE \
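The restore steps listed above can be sketched as two per-host functions. This is not a reconstruction of the truncated command above. It assumes etcd runs as a systemd unit named `etcd`, and that /etc/default/etcd defines the standard `ETCD_NAME`, `ETCD_INITIAL_CLUSTER`, `ETCD_INITIAL_CLUSTER_TOKEN`, `ETCD_INITIAL_ADVERTISE_PEER_URLS`, and `ETCD_DATA_DIR` variables. The function names and the `DRY_RUN` and `ENV_FILE` variables are hypothetical.

```shell
#!/bin/bash
# Sketch of the per-host v3 restore steps.
# Run prepare_host on ALL etcd hosts before running restore_host on any.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop etcd and purge the old member directory on this host.
prepare_host() {
  run systemctl stop etcd || return 1
  run rm -rf /var/lib/etcd/member
}

# $1: path to the snapshot file already copied to this host.
restore_host() {
  local backup_file="$1"
  local env_file="${ENV_FILE:-/etc/default/etcd}"

  # Load this host's ETCD_* settings if the file is present.
  if [ -r "$env_file" ]; then
    . "$env_file"
  else
    echo "warning: $env_file not readable, using current environment" >&2
  fi

  run env ETCDCTL_API=3 etcdctl snapshot restore "$backup_file" \
    --name "$ETCD_NAME" \
    --initial-cluster "$ETCD_INITIAL_CLUSTER" \
    --initial-cluster-token "$ETCD_INITIAL_CLUSTER_TOKEN" \
    --initial-advertise-peer-urls "$ETCD_INITIAL_ADVERTISE_PEER_URLS" \
    --data-dir "$ETCD_DATA_DIR"
}
```

Splitting the work in two keeps the ordering from the list intact: every host is stopped and purged first, and only then is the snapshot restored on each one. Running with `DRY_RUN=1` prints the commands so they can be reviewed before anything is deleted.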
Tuning etcd
Because etcd is used to store Kubernetes' configuration information, its performance is crucial to the efficient performance of your cluster. Fortunately, etcd can be tuned to better operate under various deployment conditions. All write operations require synchronization between all etcd nodes, which leads us to the following functional requirements:
- etcd needs fast access to disk
- etcd needs low latency to other etcd nodes, and thus fast networking
- etcd needs to synchronize data across all etcd nodes before writing data to disk
- The etcd store should not be located on the same disk as a disk-intensive service (such as Ceph)
- etcd nodes should not be spread across datacenters or, in the case of public clouds, availability zones
- The number of etcd nodes should be 3; you need an odd number to prevent "split brain" problems, but more than 3 can be a drag on performance
ETCD_ELECTION_TIMEOUT=5000 # default 1000ms
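When raising the election timeout, it can help to check the value against the latency you actually measure between etcd nodes. The sketch below applies a rule of thumb from etcd's tuning guidance as we understand it: the election timeout should be at least 10 times the round-trip time between members, and the heartbeat interval must be shorter than the election timeout. The function name and the round-trip input are hypothetical, and all values are in milliseconds.

```shell
#!/bin/bash
# Sketch: sanity-check etcd timing settings against measured latency.
# The 10x factor is a rule of thumb (an assumption here), not a value
# read from your etcd configuration.

# $1: election timeout in ms (ETCD_ELECTION_TIMEOUT)
# $2: heartbeat interval in ms (ETCD_HEARTBEAT_INTERVAL)
# $3: measured round-trip time between etcd nodes in ms
check_etcd_timeouts() {
  local election="$1"
  local heartbeat="$2"
  local rtt="$3"

  if [ "$heartbeat" -ge "$election" ]; then
    echo "heartbeat interval must be shorter than the election timeout"
    return 1
  fi
  if [ "$election" -lt $((rtt * 10)) ]; then
    echo "election timeout is below 10x the round-trip time"
    return 1
  fi
  echo "ok"
}
```

For example, `check_etcd_timeouts 5000 250 20` checks the 5000 ms election timeout shown above against an illustrative 250 ms heartbeat and a 20 ms round trip.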
Troubleshooting etcd
Here are some problems we've run into with etcd, and the solutions we came up with to fix them.
Problem | Solution |
My restore fails and I see “etcdmain: database file (/var/lib/etcd/member/snap/db) of the backend is missing” in my etcd log. | The etcd v2 backup took place while etcd was writing a snapshot file. This backup file is not usable. The only solution is to restore from another backup file. |
Why is etcd not listening on port 2379? | There are several possible reasons. First, ensure that the etcd service is running. Next, check the etcd service logs on each host to see if there are issues with election and/or quorum. A majority of the cluster must be online in order for any data to be read or written; the formula is floor(N/2) + 1, where N is the number of nodes. This prevents split brain problems, where different data is written across the cluster. That means a 3 node cluster must have at least 2 functional nodes. |
Why does etcd perform so many re-elections? | Try raising ETCD_ELECTION_TIMEOUT and ETCD_HEARTBEAT_INTERVAL. Also, try reducing the amount of load on the host. You can find more information here. |
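The quorum formula from the table above is easy to check with shell arithmetic. The following sketch uses hypothetical function names and shows how many nodes must be up, and how many can fail, for a given cluster size.

```shell
#!/bin/bash
# Quorum arithmetic for an etcd cluster of N nodes.

# Minimum number of healthy nodes needed: N/2 + 1.
# Shell integer division rounds down, which is what the formula needs.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the cluster keeps quorum.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print one line per cluster size from 1 to 5 nodes.
print_quorum_table() {
  local n
  for n in 1 2 3 4 5; do
    echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(failure_tolerance "$n")"
  done
}
```

Running `print_quorum_table` shows that a 4 node cluster needs 3 healthy nodes and tolerates only 1 failure, the same as a 3 node cluster, which is why adding a node to make an even-sized cluster buys no extra resilience.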