================================================
Testing NUMA related hardware setup with libvirt
================================================

This page describes how to test the libvirt driver's handling of the NUMA
placement, large page allocation and CPU pinning features. It relies on setting
up a virtual machine as the test environment and requires support for nested
virtualization, since plain QEMU is not sufficiently functional. The virtual
machine will itself be given a NUMA topology, so that it can act as a virtual
"host" for testing purposes.

------------------------------------------
Provisioning a virtual machine for testing
------------------------------------------

The entire test process will take place inside a large virtual machine running
Fedora 21. The instructions should work for any other Linux distribution which
includes libvirt >= 1.2.9 and QEMU >= 2.1.2.

The tests require support for nested KVM, which is not enabled by default on
hypervisor hosts. It must be explicitly turned on in the host when loading the
kvm-intel/kvm-amd kernel modules.

On Intel hosts, verify it with

.. code-block:: bash

  # cat /sys/module/kvm_intel/parameters/nested
  N

  # rmmod kvm-intel
  # echo "options kvm-intel nested=y" > /etc/modprobe.d/dist.conf
  # modprobe kvm-intel

  # cat /sys/module/kvm_intel/parameters/nested
  Y

While on AMD hosts, verify it with

.. code-block:: bash

  # cat /sys/module/kvm_amd/parameters/nested
  0

  # rmmod kvm-amd
  # echo "options kvm-amd nested=1" > /etc/modprobe.d/dist.conf
  # modprobe kvm-amd

  # cat /sys/module/kvm_amd/parameters/nested
  1

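If nested support still reports as disabled after reloading the modules, it is
worth confirming that the host CPU advertises hardware virtualization at all; a
non-zero count from this quick check means the VMX/SVM extensions are present:

.. code-block:: bash

  # egrep -c '(vmx|svm)' /proc/cpuinfo
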
The virt-install command below shows how to provision a basic Fedora 21 x86_64
guest with 8 virtual CPUs, 8 GB of RAM and 20 GB of disk space:

.. code-block:: bash

  # cd /var/lib/libvirt/images
  # wget http://download.fedoraproject.org/pub/fedora/linux/releases/test/21-Alpha/Server/x86_64/iso/Fedora-Server-netinst-x86_64-21_Alpha.iso

  # virt-install \
      --name f21x86_64 \
      --ram 8000 \
      --vcpus 8 \
      --file /var/lib/libvirt/images/f21x86_64.img \
      --file-size 20 \
      --cdrom /var/lib/libvirt/images/Fedora-Server-netinst-x86_64-21_Alpha.iso \
      --os-variant fedora20

When the virt-viewer application displays the installer, follow the defaults
for the installation, with a couple of exceptions:

* The automatic disk partition setup can optionally be tweaked to reduce the
  swap space allocated. No more than 500 MB is required, freeing up an extra
  1.5 GB for the root disk.

* Select "Minimal install" when asked for the installation type, since a
  desktop environment is not required.

* When creating a user account, be sure to select the option "Make this user
  administrator" so it gets 'sudo' rights.

Once the installation process has completed, the virtual machine will reboot
into the final operating system. It is now ready to deploy an OpenStack
development environment.

---------------------------------
Setting up a devstack environment
---------------------------------

For later ease of use, copy your SSH public key into the virtual machine

.. code-block:: bash

  # ssh-copy-id <IP of VM>

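If the guest's IP address is not known yet, and assuming it is attached to
libvirt's default network, one way to discover it is from the DHCP leases
(this needs a reasonably recent libvirt on the physical host):

.. code-block:: bash

  # virsh net-dhcp-leases default
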
Now login to the virtual machine

.. code-block:: bash

  # ssh <IP of VM>

We'll install devstack under `$HOME/src/cloud/`.

.. code-block:: bash

  # mkdir -p $HOME/src/cloud
  # cd $HOME/src/cloud
  # chmod go+rx $HOME

The Fedora minimal install does not contain git and only has the crude &
old-fashioned "vi" editor, so install git and a more comfortable editor.

.. code-block:: bash

  # sudo yum -y install git emacs

At this point a fairly standard devstack setup can be done. The config below is
just an example that conveniently places everything in `$HOME` instead of
`/opt/stack`. Change the IP addresses to something appropriate for your
environment, of course.

.. code-block:: bash

  # git clone git://github.com/openstack-dev/devstack.git
  # cd devstack
  # cat >>local.conf <<EOF
  [[local|localrc]]
  DEST=$HOME/src/cloud
  DATA_DIR=$DEST/data
  SERVICE_DIR=$DEST/status

  LOGFILE=$DATA_DIR/logs/stack.log
  SCREEN_LOGDIR=$DATA_DIR/logs
  VERBOSE=True

  disable_service neutron

  HOST_IP=192.168.122.50
  FLAT_INTERFACE=eth0
  FIXED_RANGE=192.168.128.0/24
  FIXED_NETWORK_SIZE=256
  FLOATING_RANGE=192.168.129.0/24

  MYSQL_PASSWORD=123456
  SERVICE_TOKEN=123456
  SERVICE_PASSWORD=123456
  ADMIN_PASSWORD=123456
  RABBIT_PASSWORD=123456

  IMAGE_URLS="http://download.cirros-cloud.net/0.3.2/cirros-0.3.2-x86_64-uec.tar.gz"
  EOF

  # FORCE=yes ./stack.sh

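If stack.sh completes without errors, it is worth a quick smoke test that the
compute and scheduler services are up before continuing; a minimal check:

.. code-block:: bash

  # . openrc admin
  # nova service-list
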
Unfortunately, while devstack starts various system services and changes
various system settings, it doesn't make the changes persistent. Fix that now
to avoid later surprises after reboots

.. code-block:: bash

  # sudo systemctl enable mysqld.service
  # sudo systemctl enable rabbitmq-server.service
  # sudo systemctl enable httpd.service

  # sudo emacs /etc/sysconfig/selinux
  SELINUX=permissive

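Note that the edit to /etc/sysconfig/selinux only takes effect at the next
boot; to switch the running system to permissive mode immediately:

.. code-block:: bash

  # sudo setenforce 0
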
----------------------------
Testing basic non-NUMA usage
----------------------------

First, to confirm that nothing has disturbed the traditional operation of Nova
libvirt guests, boot a tiny instance

.. code-block:: bash

  # . openrc admin
  # nova boot --image cirros-0.3.2-x86_64-uec --flavor m1.tiny cirros1

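The instance should reach the ACTIVE state within a short time; check its
status with

.. code-block:: bash

  # nova list
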
The host will be reporting NUMA topology, but there should only be a single
NUMA cell at this point.

.. code-block:: bash

  # mysql -u root -p nova
  MariaDB [nova]> select numa_topology from compute_nodes;
  +----------------------------------------------------------------------------+
  | numa_topology                                                              |
  +----------------------------------------------------------------------------+
  | {
  |   "nova_object.name": "NUMATopology",
  |   "nova_object.data": {
  |     "cells": [{
  |       "nova_object.name": "NUMACell",
  |       "nova_object.data": {
  |         "cpu_usage": 0,
  |         "memory_usage": 0,
  |         "cpuset": [0, 1, 2, 3, 4, 5, 6, 7],
  |         "pinned_cpus": [],
  |         "siblings": [],
  |         "memory": 7793,
  |         "mempages": [
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 987430,
  |               "used": 0,
  |               "size_kb": 4
  |             },
  |           },
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 0,
  |               "used": 0,
  |               "size_kb": 2048
  |             },
  |           }
  |         ],
  |         "id": 0
  |       },
  |     },
  |     ]
  |   },
  | }
  +----------------------------------------------------------------------------+

Meanwhile, the guest instance should not have any NUMA configuration recorded

.. code-block:: bash

  MariaDB [nova]> select numa_topology from instance_extra;
  +---------------+
  | numa_topology |
  +---------------+
  | NULL          |
  +---------------+

-----------------------------------------------------
Reconfiguring the test instance to have NUMA topology
-----------------------------------------------------

Now that devstack is proven operational, it is time to configure some NUMA
topology for the test VM, so that it can be used to verify the OpenStack NUMA
support. To make the changes, the VM instance that is running devstack must be
shut down.

.. code-block:: bash

  # sudo shutdown -h now

Now, back on the physical host, edit the guest config as root

.. code-block:: bash

  # sudo virsh edit f21x86_64

The first thing is to change the `<cpu>` block to do passthrough of the host
CPU. In particular this exposes the "SVM" or "VMX" feature bits to the guest so
that "Nested KVM" can work. At the same time we want to define the NUMA
topology of the guest. To make things interesting we're going to give the
guest an asymmetric topology with 4 CPUs and 4 GB of RAM in the first NUMA
node and 2 CPUs and 2 GB of RAM in each of the second and third NUMA nodes. So
modify the guest XML to include the following CPU XML

.. code-block:: bash

  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-3' memory='4096000'/>
      <cell id='1' cpus='4-5' memory='2048000'/>
      <cell id='2' cpus='6-7' memory='2048000'/>
    </numa>
  </cpu>

The guest can now be started again; once it has booted, ssh back into it

.. code-block:: bash

  # virsh start f21x86_64

  ...wait for it to finish booting

  # ssh <IP of VM>

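At this point it is worth verifying, from inside the test VM, that the new
topology and the virtualization extensions are actually visible; a quick check
(the numactl package may need to be installed first):

.. code-block:: bash

  # egrep -c '(vmx|svm)' /proc/cpuinfo
  # sudo yum -y install numactl
  # numactl --hardware
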
Before starting OpenStack services again, it is necessary to reconfigure Nova
to enable the NUMA scheduler filter. The libvirt virtualization type must also
be explicitly set to KVM, so that guests can take advantage of nested KVM.

.. code-block:: bash

  # sudo emacs /etc/nova/nova.conf

Set the following parameters:

.. code-block:: bash

  [DEFAULT]
  scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter

  [libvirt]
  virt_type = kvm

With that done, OpenStack can be started again

.. code-block:: bash

  # cd $HOME/src/cloud/devstack
  # ./rejoin-stack.sh

The first thing is to check that the compute node picked up the new NUMA
topology setup for the guest

.. code-block:: bash

  # mysql -u root -p nova
  MariaDB [nova]> select numa_topology from compute_nodes;
  +----------------------------------------------------------------------------+
  | numa_topology                                                              |
  +----------------------------------------------------------------------------+
  | {
  |   "nova_object.name": "NUMATopology",
  |   "nova_object.data": {
  |     "cells": [{
  |       "nova_object.name": "NUMACell",
  |       "nova_object.data": {
  |         "cpu_usage": 0,
  |         "memory_usage": 0,
  |         "cpuset": [0, 1, 2, 3],
  |         "pinned_cpus": [],
  |         "siblings": [],
  |         "memory": 3857,
  |         "mempages": [
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 987430,
  |               "used": 0,
  |               "size_kb": 4
  |             },
  |           },
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 0,
  |               "used": 0,
  |               "size_kb": 2048
  |             },
  |           }
  |         ],
  |         "id": 0
  |       },
  |     },
  |     {
  |       "nova_object.name": "NUMACell",
  |       "nova_object.data": {
  |         "cpu_usage": 0,
  |         "memory_usage": 0,
  |         "cpuset": [4, 5],
  |         "pinned_cpus": [],
  |         "siblings": [],
  |         "memory": 1969,
  |         "mempages": [
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 504216,
  |               "used": 0,
  |               "size_kb": 4
  |             },
  |           },
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 0,
  |               "used": 0,
  |               "size_kb": 2048
  |             },
  |           }
  |         ],
  |         "id": 1
  |       },
  |     },
  |     {
  |       "nova_object.name": "NUMACell",
  |       "nova_object.data": {
  |         "cpu_usage": 0,
  |         "memory_usage": 0,
  |         "cpuset": [6, 7],
  |         "pinned_cpus": [],
  |         "siblings": [],
  |         "memory": 1967,
  |         "mempages": [
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 503575,
  |               "used": 0,
  |               "size_kb": 4
  |             },
  |           },
  |           {
  |             "nova_object.name": "NUMAPagesTopology",
  |             "nova_object.data": {
  |               "total": 0,
  |               "used": 0,
  |               "size_kb": 2048
  |             },
  |           }
  |         ],
  |         "id": 2
  |       },
  |     }
  |     ]
  |   },
  | }
  +----------------------------------------------------------------------------+

This indeed shows that there are now 3 NUMA nodes for the "host" machine, the
first with 4 GB of RAM and 4 CPUs, and the others with 2 GB of RAM and 2 CPUs
each.

-----------------------------------------------------
Testing instance boot with no NUMA topology requested
-----------------------------------------------------

For the sake of backwards compatibility, if the NUMA filter is enabled, but the
flavor/image does not have any NUMA settings requested, it should be assumed
that the guest will have a single NUMA node. The guest should be locked to a
single host NUMA node too. Boot a guest with the `m1.tiny` flavor to test this
condition

.. code-block:: bash

  # . openrc admin admin
  # nova boot --image cirros-0.3.2-x86_64-uec --flavor m1.tiny cirros1

Now look at the libvirt guest XML. It should show that the vCPUs are locked to
pCPUs within a particular node.

.. code-block:: bash

  # virsh -c qemu:///system list
  ....
  # virsh -c qemu:///system dumpxml instanceXXXXXX
  ...
  <vcpu placement='static' cpuset='6-7'>1</vcpu>
  ...

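To cross-check which of the three NUMA nodes those pCPUs belong to, the NUMA
summary printed by lscpu inside the test VM is one quick option:

.. code-block:: bash

  # lscpu | grep -i numa
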
This example shows that the guest has been locked to the 3rd NUMA node (which
contains pCPUs 6 and 7). Note that there is no explicit NUMA topology listed in
the guest XML.

------------------------------------------------
Testing instance boot with 1 NUMA cell requested
------------------------------------------------

Moving forward a little, explicitly tell Nova that the NUMA topology for the
guest should have a single NUMA node. This should operate in an identical
manner to the default behavior where no NUMA policy is set. To define the
topology we will create a new flavor (the positional arguments are the flavor
name, ID, RAM in MB, disk in GB and vCPU count)

.. code-block:: bash

  # nova flavor-create m1.numa 999 1024 1 4
  # nova flavor-key m1.numa set hw:numa_nodes=1
  # nova flavor-show m1.numa

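As an aside, the same request can also be expressed as an image property rather
than a flavor extra spec; a sketch, assuming the cirros image uploaded earlier
(hw_numa_nodes is the image-property counterpart of hw:numa_nodes; exact CLI
syntax varies with the glance client version):

.. code-block:: bash

  # glance image-update --property hw_numa_nodes=1 <image id>
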
Now boot the guest using this new flavor

.. code-block:: bash

  # nova boot --image cirros-0.3.2-x86_64-uec --flavor m1.numa cirros2

Looking at the resulting guest XML from libvirt

.. code-block:: bash

  # virsh -c qemu:///system dumpxml instanceXXXXXX
  ...
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-3'/>
    <vcpupin vcpu='1' cpuset='0-3'/>
    <vcpupin vcpu='2' cpuset='0-3'/>
    <vcpupin vcpu='3' cpuset='0-3'/>
    <emulatorpin cpuset='0-3'/>
  </cputune>
  ...
  <cpu>
    <topology sockets='4' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-3' memory='1048576'/>
    </numa>
  </cpu>
  ...
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

The XML shows:

* Each guest CPU has been pinned to the physical CPUs associated with a
  particular NUMA node

* The emulator threads have been pinned to the union of all physical CPUs in
  the host NUMA node that the guest is placed on

* The guest has been given a virtual NUMA topology with a single node holding
  all RAM and CPUs

* The guest NUMA node has been strictly pinned to a host NUMA node.

As a further sanity test, check what Nova recorded for the instance in the
database. This should match the <numatune> information

.. code-block:: bash

  MariaDB [nova]> select numa_topology from instance_extra;
  +----------------------------------------------------------------------------+
  | numa_topology                                                              |
  +----------------------------------------------------------------------------+
  | {
  |   "nova_object.name": "InstanceNUMATopology",
  |   "nova_object.data": {
  |     "instance_uuid": "4c2302fe-3f0f-46f1-9f3e-244011f6e03a",
  |     "cells": [
  |       {
  |         "nova_object.name": "InstanceNUMACell",
  |         "nova_object.data": {
  |           "cpu_topology": null,
  |           "pagesize": null,
  |           "cpuset": [
  |             0,
  |             1,
  |             2,
  |             3
  |           ],
  |           "memory": 1024,
  |           "cpu_pinning_raw": null,
  |           "id": 0
  |         },
  |       }
  |     ]
  |   },
  | }
  +----------------------------------------------------------------------------+

-------------------------------------------------
Testing instance boot with 2 NUMA cells requested
-------------------------------------------------

Getting more advanced, we now tell Nova that the guest will have two NUMA
nodes. To define the topology we will change the previously defined flavor

.. code-block:: bash

  # nova flavor-key m1.numa set hw:numa_nodes=2
  # nova flavor-show m1.numa

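For reference, the two cells do not have to be symmetric: the hw:numa_cpus.N
and hw:numa_mem.N extra specs can map specific guest CPUs and amounts of RAM
to each cell. A hedged sketch (the values must add up to the flavor's vCPU and
RAM totals, here 4 vCPUs and 1024 MB):

.. code-block:: bash

  # nova flavor-key m1.numa set hw:numa_cpus.0=0,1,2 hw:numa_cpus.1=3
  # nova flavor-key m1.numa set hw:numa_mem.0=768 hw:numa_mem.1=256
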
Now boot the guest using this changed flavor

.. code-block:: bash

  # nova boot --image cirros-0.3.2-x86_64-uec --flavor m1.numa cirros2

Looking at the resulting guest XML from libvirt

.. code-block:: bash

  # virsh -c qemu:///system dumpxml instanceXXXXXX
  ...
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-3'/>
    <vcpupin vcpu='1' cpuset='0-3'/>
    <vcpupin vcpu='2' cpuset='4-5'/>
    <vcpupin vcpu='3' cpuset='4-5'/>
    <emulatorpin cpuset='0-5'/>
  </cputune>
  ...
  <cpu>
    <topology sockets='4' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='524288'/>
      <cell id='1' cpus='2-3' memory='524288'/>
    </numa>
  </cpu>
  ...
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>

The XML shows:

* Each guest CPU has been pinned to the physical CPUs associated with
  particular NUMA nodes

* The emulator threads have been pinned to the union of all physical CPUs in
  the host NUMA nodes that the guest is placed on

* The guest has been given a virtual NUMA topology with two nodes, each holding
  half the RAM and CPUs

* The guest NUMA nodes have been strictly pinned to different host NUMA nodes

As a further sanity test, check what Nova recorded for the instance in the
database. This should match the <numatune> information

.. code-block:: bash

  MariaDB [nova]> select numa_topology from instance_extra;
  +----------------------------------------------------------------------------+
  | numa_topology                                                              |
  +----------------------------------------------------------------------------+
  | {
  |   "nova_object.name": "InstanceNUMATopology",
  |   "nova_object.data": {
  |     "instance_uuid": "a14fcd68-567e-4d71-aaa4-a12f23f16d14",
  |     "cells": [
  |       {
  |         "nova_object.name": "InstanceNUMACell",
  |         "nova_object.data": {
  |           "cpu_topology": null,
  |           "pagesize": null,
  |           "cpuset": [
  |             0,
  |             1
  |           ],
  |           "memory": 512,
  |           "cpu_pinning_raw": null,
  |           "id": 0
  |         },
  |       },
  |       {
  |         "nova_object.name": "InstanceNUMACell",
  |         "nova_object.data": {
  |           "cpu_topology": null,
  |           "pagesize": null,
  |           "cpuset": [
  |             2,
  |             3
  |           ],
  |           "memory": 512,
  |           "cpu_pinning_raw": null,
  |           "id": 1
  |         },
  |       }
  |     ]
  |   },
  | }
  +----------------------------------------------------------------------------+
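
The same flavor-driven workflow extends to the CPU pinning and large page
features mentioned at the start of this page. As a hedged sketch only, they
are requested with extra specs along these lines (the test VM would also need
a 2 MB huge page pool reserved before such a flavor could boot):

.. code-block:: bash

  # nova flavor-key m1.numa set hw:cpu_policy=dedicated
  # nova flavor-key m1.numa set hw:mem_page_size=2048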