Mirantis OpenStack 7.0: NFVI Deployment Guide -- NUMA/CPU pinning
As we hinted in the section about Huge pages, NUMA means that memory is broken up into pools, with each physical CPU (NUMA node) having its own pool of "local" memory. Best performance comes from making sure a process and its memory stay in the same NUMA cell. Unfortunately, the nature of virtualization means that a VM's vCPUs typically run on whatever physical CPU is available, whether or not it is local to the VM's memory. This is partially because, when deciding how to place VMs on CPUs, the default OpenStack scheduler optimizes for heavily contended systems rather than guaranteeing the performance of individual VMs. You can override this behavior using CPU Pinning.
CPU Pinning enables you to establish a mapping between a virtual CPU and a physical core so that the virtual CPU always runs on the same physical one. By exposing the NUMA topology to the VM and pinning each vCPU to a specific core, it is possible to improve VM performance by ensuring that memory access is always local in terms of NUMA topology.
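To see how much locality matters on a given host, you can inspect its NUMA layout before configuring anything. This is a quick sketch assuming the numactl package is installed on the compute host; the node distances table shows the relative cost of local versus remote memory access (the output below is illustrative, matching the 24-core node used in the examples that follow):
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node distances:
node   0   1
  0:  10  21
  1:  21  10
A distance of 21 versus 10 means that a memory access crossing NUMA cells is roughly twice as expensive as a local one, which is exactly what CPU pinning helps you avoid.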
Compute hosts configuration
To enable CPU Pinning, perform the following steps on every compute host where you want to use it.
- Upgrade QEMU to a version that supports NUMA CPU pinning (see Appendix A1, "Installing qemu 2.1").
- Get the NUMA topology for the node:
# lscpu | grep NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23
- Tell the system which cores should be used only by virtual machines and not by the host operating system by adding the following to the end of /etc/default/grub:
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX isolcpus=1-5,7-23"
- Then add the same list to vcpu_pin_set in /etc/nova/nova.conf:
vcpu_pin_set=1-5,7-23
In this example we ensured that cores 0 and 6 are dedicated to the host system. Virtual machines will use cores 1-5 and 12-17 on NUMA node 0, and cores 7-11 and 18-23 on NUMA node 1.
- Update the boot record and reboot the compute node (a quick check of the result is shown after this list):
update-grub
reboot
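After the node comes back up, you can confirm that the kernel honored the isolcpus setting. A minimal check; the affinity of PID 1 should list only the cores left to the host operating system, which is 0 and 6 in this example (the output shown is what you would expect on such a node):
# grep -o 'isolcpus=[0-9,-]*' /proc/cmdline
isolcpus=1-5,7-23
# taskset -cp 1
pid 1's current affinity list: 0,6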
Nova configuration
Now that you've enabled CPU Pinning on the system, you need to configure nova to use it. Perform these steps from any node where the OpenStack command-line clients are configured:
- On the command line, create host aggregates for instances with and without CPU pinning:
# nova aggregate-create performance
# nova aggregate-set-metadata performance pinned=true
# nova aggregate-create normal
# nova aggregate-set-metadata normal pinned=false
- Add one or more hosts to the new aggregates:
# nova aggregate-add-host performance node-9.domain.tld
# nova aggregate-add-host normal node-10.domain.tld
- Create a new flavor for VMs that require CPU pinning:
# nova flavor-create m1.small.performance auto 2048 20 2
# nova flavor-key m1.small.performance set hw:cpu_policy=dedicated
# nova flavor-key m1.small.performance set aggregate_instance_extra_specs:pinned=true
- To be thorough, you should update all other flavors so they will start only on hosts without CPU pinning:
# openstack flavor list -f csv|grep -v performance |cut -f1 -d,| \
tail -n +2 | xargs -I% -n 1 nova flavor-key % set aggregate_instance_extra_specs:pinned=false
- On every controller, add AggregateInstanceExtraSpecsFilter and NUMATopologyFilter to the scheduler_default_filters parameter in /etc/nova/nova.conf:
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter
- Restart the nova-scheduler service on all controllers (a quick way to verify the whole setup is shown after this list):
restart nova-scheduler
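Before moving on, it is worth double-checking that the metadata landed where the scheduler expects it. A minimal verification sketch, using the aggregate and flavor names created above (run the last command on a controller):
# nova aggregate-details performance
# nova flavor-show m1.small.performance | grep extra_specs
# grep scheduler_default_filters /etc/nova/nova.conf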
Using CPU pinning
Once you've done this configuration, using CPU Pinning is straightforward. Follow these steps:
- Start a new VM with a flavor that requires pinning ...
# nova boot --image TestVM --nic net-id=`openstack network show net04 -f value | head -n1` --flavor m1.small.performance test1
… and check its vCPU configuration:
# hypervisor=`nova show test1 | grep OS-EXT-SRV-ATTR:host | cut -d\| -f3`
# instance=`nova show test1 | grep OS-EXT-SRV-ATTR:instance_name | cut -d\| -f3`
# ssh $hypervisor virsh dumpxml $instance | awk '/vcpu placement/ {p=1}; p; /\/numatune/ {p=0}'
You should see that each vCPU is pinned to a dedicated CPU core that is not used by the host operating system, and that these cores are inside the same host NUMA node (in this example, cores 4 and 16 in NUMA node 0):
<vcpu placement='static'>2</vcpu>
<cputune>
<shares>2048</shares>
<vcpupin vcpu='0' cpuset='16'/>
<vcpupin vcpu='1' cpuset='4'/>
<emulatorpin cpuset='4,16'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
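You can also query the pinning of a running instance directly instead of reading the full domain XML. A short sketch, reusing the $hypervisor and $instance variables set above; calling virsh vcpupin and virsh numatune with only the domain name prints the current settings (output shown is illustrative for this example):
# ssh $hypervisor virsh vcpupin $instance
VCPU: CPU Affinity
----------------------------------
   0: 16
   1: 4
# ssh $hypervisor virsh numatune $instance
numa_mode      : strict
numa_nodeset   : 0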
- Repeat the test for an instance with two NUMA cells:
# nova flavor-create m1.small.performance-2 auto 2048 20 2
# nova flavor-key m1.small.performance-2 set hw:cpu_policy=dedicated
# nova flavor-key m1.small.performance-2 set aggregate_instance_extra_specs:pinned=true
# nova flavor-key m1.small.performance-2 set hw:numa_nodes=2
# nova boot --image TestVM --nic net-id=`openstack network show net04 -f value | head -n1` --flavor m1.small.performance-2 test2
# hypervisor=`nova show test2 | grep OS-EXT-SRV-ATTR:host | cut -d\| -f3`
# instance=`nova show test2 | grep OS-EXT-SRV-ATTR:instance_name | cut -d\| -f3`
# ssh $hypervisor virsh dumpxml $instance | awk '/vcpu placement/ {p=1}; p; /\/numatune/ {p=0}'
<vcpu placement='static'>2</vcpu>
<cputune>
<shares>2048</shares>
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='10'/>
<emulatorpin cpuset='2,10'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0-1'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
<memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>
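To confirm that the guest itself sees two NUMA cells, you can also check the topology from inside test2. This is a sketch assuming a Linux guest image that ships lscpu; with 2 vCPUs spread over 2 virtual NUMA nodes, you would expect something like:
# lscpu | grep NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0
NUMA node1 CPU(s): 1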
Troubleshooting
You might run into the following errors:
internal error: No PCI buses available
In this case, you've specified the wrong hw_machine_type in /etc/nova/nova.conf.
libvirtError: unsupported configuration: Per-node memory binding is not supported with this QEMU
In this case, you may have an older version of qemu, or a stale libvirt capabilities cache.
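For the latter error, it helps to check which QEMU and libvirt versions the hypervisor is actually using, and to restart libvirt so that it re-detects QEMU's capabilities after an upgrade. The service name below assumes the Ubuntu 14.04 base used by Mirantis OpenStack 7.0 compute nodes:
# virsh version
# qemu-system-x86_64 --version
# restart libvirt-bin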