How to Migrate an Instance with Zero Downtime: OpenStack Live Migration with KVM Hypervisor and NFS Shared Storage
Editor’s note: We will be talking briefly about live migration in the What’s New in OpenStack Havana webcast next week, but Damian had such a great explanation of how to actually do it that we wanted to put it out here so you can see it in action.
Live migration is the movement of a live instance from one compute node to another. A feature hugely sought after by cloud administrators, it's used primarily to achieve zero downtime during cloud maintenance, and it can also help with performance, since live instances can be moved from a heavily loaded compute node to a less loaded one.
Live migration has to be planned for at the initial stage of designing an OpenStack deployment. Some things to take into consideration are as follows:
At the moment, not all hypervisors support live migration in OpenStack, so it's best to check the HypervisorSupportMatrix to see whether yours does. KVM, QEMU, XenServer/XCP, and Hyper-V are among the currently supported hypervisors.
In a typical OpenStack deployment, every compute node manages its instances locally in a dedicated directory (for example, /var/lib/nova/instances/), but for live migration this directory has to be in a centralized location and shared across all of the compute nodes. Hence, a shared file system or block storage is an important requirement for enabling live migration. A distributed file system such as GlusterFS or NFS needs to be properly configured and running before live migration can be performed; SAN storage protocols such as Fibre Channel (FC) and iSCSI can also be used for shared storage.
As for file permissions on the centralized shared storage, you must ensure that the UID and GID of the Compute (nova) user are the same on the controller node and on all of the compute nodes (the assumption here is that the shared storage lives on the controller node). Likewise, the UID and GID of libvirt-qemu must be the same on all compute nodes.
It’s important to specify vncserver_listen=0.0.0.0 so that the VNC server can accept connections from all of the compute nodes, regardless of where the instances are running. If this is not set, accessing migrated instances through VNC can be a problem, because the destination compute node’s IP address does not match that of the source compute node.
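For reference, here is a minimal sketch of the relevant VNC settings in /etc/nova/nova.conf on a compute node. The controller address 10.0.0.51 matches the lab used below; the compute node address 10.0.0.53 is an assumption, so substitute your own values:
vncserver_listen=0.0.0.0 # accept VNC connections on all interfaces
vncserver_proxyclient_address=10.0.0.53 # this compute node's own management IP (assumed)
novncproxy_base_url=http://10.0.0.51:6080/vnc_auto.html # noVNC proxy on the controller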
The following instructions enable live migration on an OpenStack multinode deployment using the KVM hypervisor, running Ubuntu 12.04 LTS with NFS shared storage. This tutorial assumes that a working multinode deployment has already been configured using a deployment tool such as Mirantis Fuel. The lab used for this tutorial consists of a cloud controller node, a network node utilizing Neutron networking, and two compute nodes.
Please note that this tutorial does not consider the security aspects of live migration. You will have to research this area of concern yourself, so do not take this tutorial as production ready from a security standpoint.
This tutorial is presented in two parts: first, the NFS shared storage implementation procedure, and then a demo of live migration.
Part 1: Implementing NFS shared storage
The cloud controller node is the NFS server. The aim is to share /var/lib/nova/instances across all of the compute nodes in your OpenStack cluster. This directory contains the libvirt KVM file-based disk images for the instances hosted on each compute node. If you are not running your cloud in a shared storage environment, this directory will be unique across all compute nodes. Note that if you already have instances running in your cloud before configuring live migration, you need to take precautions so that the existing instances are not overwritten.
On the NFS server/controller node, take the following steps:
- Install the NFS server.
root@vmcon-mn:~# apt-get install nfs-kernel-server
IDMAPD provides functionality to the NFSv4 kernel client and server by translating user and group IDs to names, and vice versa. Edit /etc/default/nfs-kernel-server and set the indicated option to yes. This file must be the same on both the client and the NFS server.
NEED_IDMAPD=yes # only needed for Ubuntu 11.10 and earlier
- Ensure that the file /etc/idmapd.conf has the following:
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
- To share /var/lib/nova/instances, add the following to /etc/exports:
/var/lib/nova/instances 192.168.122.0/24(rw,fsid=0,insecure,no_subtree_check,async,no_root_squash)
Where 192.168.122.0/24 is the network address of your compute nodes (usually called the data network) for your OpenStack cluster.
- Set the ‘execute’ bit on your shared directory as follows, so that qemu can use the images within the directories when they are exported to the compute nodes.
root@vmcon-mn:~# chmod o+x /var/lib/nova/instances
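You can verify the result on the controller before exporting; the output shown is what you'd expect in this lab:
root@vmcon-mn:~# ls -ld /var/lib/nova/instances/
drwxr-xr-x 8 nova nova 4096 Oct 3 12:41 /var/lib/nova/instances/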
- Restart the services.
root@vmcon-mn:~# service nfs-kernel-server restart
root@vmcon-mn:~# /etc/init.d/idmapd restart
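To confirm that the directory is now being exported, list the active exports; the output shown is what you'd expect given the /etc/exports entry above:
root@vmcon-mn:~# showmount -e localhost
Export list for localhost:
/var/lib/nova/instances 192.168.122.0/24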
On each of the compute nodes, take the following steps:
- Install the NFS client services.
root@vmcom1-mn:~# apt-get install nfs-common
- Edit /etc/default/nfs-common and set the indicated option to yes:
NEED_IDMAPD=yes # only needed for Ubuntu 11.10 or earlier
- Mount the shared file system from the NFS server.
mount NFS-SERVER:/var/lib/nova/instances /var/lib/nova/instances
Where NFS-SERVER is the hostname/IP address of the NFS server.
- To avoid retyping this after every reboot, add the following line to /etc/fstab:
nfs-server:/ /var/lib/nova/instances nfs auto 0 0
Because the export above uses fsid=0, the shared directory is the NFSv4 pseudo-root, which is why it can be mounted here as nfs-server:/ rather than by its full path.
Check all of the compute nodes and ensure the permissions are set as listed below. This confirms that the correct permissions were set on the controller node with the chmod o+x command above.
root@vmcom1-mn:~# ls -ld /var/lib/nova/instances/
drwxr-xr-x 8 nova nova 4096 Oct 3 12:41 /var/lib/nova/instances/
- Ensure that the exported directory can be mounted, and check that it's mounted.
root@vmcom1-mn:~# mount -a -v
root@vmcom1-mn:~# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 6192704 1732332 4145800 30% /
udev 1991628 4 1991624 1% /dev
tmpfs 800176 284 799892 1% /run
none 5120 0 5120 0% /run/lock
none 2000432 0 2000432 0% /run/shm
cgroup 2000432 0 2000432 0% /sys/fs/cgroup
vmcon-mn:/var/lib/nova/instances 6192896 2773760 3104512 48% /var/lib/nova/instances
- Ensure that the last line above is present as indicated. This line shows that /var/lib/nova/instances is correctly exported from the NFS server. If this line is missing, your NFS share may not be working properly, and you need to fix it before you proceed.
- Update the libvirt configuration by modifying /etc/libvirt/libvirtd.conf. To see all of the available options, please see the libvirtd configuration documentation.
before : #listen_tls = 0
after : listen_tls = 0
before : #listen_tcp = 1
after : listen_tcp = 1
add : auth_tcp = "none"
- Modify /etc/init/libvirt-bin.conf.
before : exec /usr/sbin/libvirtd -d
after : exec /usr/sbin/libvirtd -d -l
(-l is short for --listen)
- Modify /etc/default/libvirt-bin.
before : libvirtd_opts=" -d"
after : libvirtd_opts=" -d -l"
- Restart libvirt. After executing the command, ensure that libvirt is successfully restarted.
$ stop libvirt-bin && start libvirt-bin
$ ps -ef | grep libvirt
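As a quick sanity check, confirm that libvirtd is now listening for TCP connections and that one compute node can reach another's hypervisor. This is a sketch that assumes libvirt's default TCP port (16509) and the lab hostname vmcom2-mn:
root@vmcom1-mn:~# netstat -lnpt | grep libvirtd # expect a LISTEN entry on port 16509
root@vmcom1-mn:~# virsh -c qemu+tcp://vmcom2-mn/system list # should list the domains running on vmcom2-mn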
Miscellaneous configurations
You may skip the steps below if live migration was designed in from the start and the basic requirements stated in the introduction are therefore already in place. These steps ensure that the nova UID and GID are the same on the controller node and on all of the compute nodes, and that the libvirt-qemu UID and GID are the same on all compute nodes. This involves manually changing the GIDs and UIDs to make them uniform across the compute and controller nodes.
The steps are as follows:
- On the controller node, check the nova id and then implement the same on all of the compute nodes:
[root@vmcon-mn ~]# id nova
uid=110(nova) gid=117(nova) groups=117(nova),113(libvirtd)
- Now that we know the nova UID and GID, we can change them on all of the compute nodes as follows:
[root@vmcom1-mn ~]# usermod -u 110 nova
[root@vmcom1-mn ~]# groupmod -g 117 nova
- Follow the same procedure on all of the compute nodes.
- Repeat the same for libvirt-qemu, but keep in mind that the controller node does not have this user, because the controller node does not run a hypervisor. Ensure that all of the compute nodes have the same UID and GID for the libvirt-qemu user.
- Since we have changed the UIDs and GIDs of the nova and libvirt-qemu users, we need to ensure that the change is reflected across all of the files owned by these users. We achieve this through the next step.
- Stop the nova-api and libvirt-bin services on the compute node, then change all of the files owned by the old UIDs and GIDs of nova and libvirt-qemu to the new ones. For example:
[root@vmcom1-mn ~]# service nova-api stop
[root@vmcom1-mn ~]# service libvirt-bin stop
[root@vmcom1-mn ~]# find / -uid 106 -exec chown nova {} \; # note: 106 is the old nova UID before the change
[root@vmcom1-mn ~]# find / -uid 104 -exec chown libvirt-qemu {} \; # note: 104 is the old libvirt-qemu UID before the change
[root@vmcom1-mn ~]# find / -gid 107 -exec chgrp nova {} \; # note: 107 is the old nova GID before the change
[root@vmcom1-mn ~]# find / -gid 104 -exec chgrp libvirt-qemu {} \; # note: 104 is the old libvirt-qemu GID before the change
[root@vmcom1-mn ~]# service nova-api restart
[root@vmcom1-mn ~]# service libvirt-bin restart
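As a final check, a small loop like the following confirms that the IDs now match on every compute node. This is a sketch that assumes passwordless SSH from the controller and the lab hostnames vmcom1-mn and vmcom2-mn:
root@vmcon-mn:~# for host in vmcom1-mn vmcom2-mn; do ssh $host 'hostname; id nova; id libvirt-qemu'; done
Every node should report uid=110(nova) gid=117(nova) and identical libvirt-qemu IDs.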
Part 2: Live migration of an OpenStack virtual machine
Now that the OpenStack cluster and the NFS shared file system have been properly set up, it's time to attempt a live migration. Perform the following steps on the controller node:
- Check the running instances to determine their IDs.
root@vmcon-mn:~# nova list
+--------------------------------------+------+--------+------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------+--------+------------------------+
| 0bb04bc1-5535-49e2-8769-53fa42e184c8 | vm1 | ACTIVE | net_proj_one=10.10.1.4 |
| d93572ec-4796-4795-ade8-cfeb2a770cf2 | vm2 | ACTIVE | net_proj_one=10.10.1.5 |
+--------------------------------------+------+--------+------------------------+
- Check to see the compute nodes where the instances are running.
root@vmcon-mn:~# nova-manage vm list
instance node type state launched image kernel ramdisk project user zone index
vm1 vmcom2-mn m1.tiny active 2013-10-03 13:33:52 b353319f-efef-4f1a-a20c-23949c82abd8 419303e31d40475a9c5b7d554b28a22f cd516c290d4e437d8605b411af4108fe None 0
vm2 vmcom1-mn m1.tiny active 2013-10-03 13:34:33 b353319f-efef-4f1a-a20c-23949c82abd8 419303e31d40475a9c5b7d554b28a22f cd516c290d4e437d8605b411af4108fe None 0
- Here we observe that vm1 is running on compute node 2 (vmcom2-mn) and vm2 is running on compute node 1 (vmcom1-mn).
- Perform live migration.
- We will migrate vm1, with ID 0bb04bc1-5535-49e2-8769-53fa42e184c8 (obtained using nova list above), from compute node 2 to compute node 1, vmcom1-mn (see the nova-manage vm list output above).
- Note that this is an administrative function, so you typically first want to export the variables or source an admin credentials file.
root@vmcon-mn:~# export OS_TENANT_NAME=admin
root@vmcon-mn:~# export OS_USERNAME=admin
root@vmcon-mn:~# export OS_PASSWORD=admin
root@vmcon-mn:~# export OS_AUTH_URL="http://10.0.0.51:5000/v2.0/"
root@vmcon-mn:~# nova live-migration 0bb04bc1-5535-49e2-8769-53fa42e184c8 vmcom1-mn
- If successful, the nova live-migration command produces no output.
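While the migration is in flight, the instance's status changes briefly; one way to watch it (illustrative; during a live migration the status typically shows MIGRATING before returning to ACTIVE) is:
root@vmcon-mn:~# watch -n 2 nova list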
- Verify that migration has been performed by running:
root@vmcon-mn:~# nova-manage vm list
instance node type state launched image kernel ramdisk project user zone index
vm1 vmcom1-mn m1.tiny active 2013-10-03 13:33:52 b353319f-efef-4f1a-a20c-23949c82abd8 419303e31d40475a9c5b7d554b28a22f cd516c290d4e437d8605b411af4108fe None 0
vm2 vmcom1-mn m1.tiny active 2013-10-03 13:34:33 b353319f-efef-4f1a-a20c-23949c82abd8 419303e31d40475a9c5b7d554b28a22f cd516c290d4e437d8605b411af4108fe None 0
- We can see that both instances are now running on the same node.
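You can also confirm the new host for a single instance with nova show; the OS-EXT-SRV-ATTR:hypervisor_hostname field (visible to admin users) should now report vmcom1-mn:
root@vmcon-mn:~# nova show vm1 | grep hypervisor_hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname | vmcom1-mn |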
Conclusion
Live migration is an indispensable feature for achieving zero downtime during OpenStack cloud maintenance when some compute nodes need to be shut down. The steps above, implementing shared storage and then migrating a live instance, were followed to get a working live migration on an OpenStack Grizzly cloud running Ubuntu 12.04, using NFS shared storage.