Rename vgpu options to mdev

As a prerequisite for blueprint generic-mdevs we need to rename the
existing enabled_vgpu_types option and its dynamically generated
[vgpu_$(TYPE)] groups to enabled_mdev_types and [mdev_$(TYPE)].
There is no upgrade impact for existing users, as the original
option names are still accepted as deprecated aliases.

NOTE(sbauza): As we have a lot of methods and objects named gpu-ish
let's just change what we need here and provide followups for
fixing internal tech debt later.

Change-Id: Idba094f6366a24965804b88da0bc1b9754549c99
Partially-Implements: blueprint generic-mdevs
Author: Sylvain Bauza
Date: 2021-07-21 11:03:27 +02:00
Commit: ff4d0d002a (parent: 3545356ae3)
8 changed files with 122 additions and 105 deletions
+5 -5
@@ -33,12 +33,12 @@ Enable GPU types (Compute)
 #. Specify which specific GPU type(s) the instances would get.
-Edit :oslo.config:option:`devices.enabled_vgpu_types`:
+Edit :oslo.config:option:`devices.enabled_mdev_types`:
 .. code-block:: ini
 [devices]
-enabled_vgpu_types = nvidia-35
+enabled_mdev_types = nvidia-35
 If you want to support more than a single GPU type, you need to provide a
 separate configuration section for each device. For example:
@@ -46,12 +46,12 @@ Enable GPU types (Compute)
 .. code-block:: ini
 [devices]
-enabled_vgpu_types = nvidia-35, nvidia-36
+enabled_mdev_types = nvidia-35, nvidia-36
-[vgpu_nvidia-35]
+[mdev_nvidia-35]
 device_addresses = 0000:84:00.0,0000:85:00.0
-[vgpu_nvidia-36]
+[mdev_nvidia-36]
 device_addresses = 0000:86:00.0
 where you have to define which physical GPUs are supported per GPU type.
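Because the new option is registered with a deprecated alias for the old name and each dynamic group keeps its old ``vgpu_`` spelling as a deprecated group (see the ``deprecated_name``/``deprecated_group`` changes in nova/conf/devices.py below), a pre-existing configuration file continues to work unchanged during the interim period. A legacy file like the following, reusing the addresses from the example above, is still accepted:

```ini
# Legacy spelling, still accepted through the deprecated aliases
[devices]
enabled_vgpu_types = nvidia-35, nvidia-36

[vgpu_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[vgpu_nvidia-36]
device_addresses = 0000:86:00.0
```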
+26 -22
@@ -16,36 +16,39 @@ devices_group = cfg.OptGroup(
 name='devices',
 title='physical or virtual device options')
-vgpu_opts = [
-cfg.ListOpt('enabled_vgpu_types',
+mdev_opts = [
+cfg.ListOpt('enabled_mdev_types',
 default=[],
+deprecated_name='enabled_vgpu_types',
 help="""
-The vGPU types enabled in the compute node.
-Some pGPUs (e.g. NVIDIA GRID K1) support different vGPU types. User can use
-this option to specify a list of enabled vGPU types that may be assigned to a
+The mdev types enabled in the compute node.
+Some hardware (e.g. NVIDIA GRID K1) support different mdev types. User can use
+this option to specify a list of enabled mdev types that may be assigned to a
 guest instance.
-If more than one single vGPU type is provided, then for each *vGPU type* an
-additional section, ``[vgpu_$(VGPU_TYPE)]``, must be added to the configuration
+If more than one single mdev type is provided, then for each *mdev type* an
+additional section, ``[mdev_$(MDEV_TYPE)]``, must be added to the configuration
 file. Each section then **must** be configured with a single configuration
 option, ``device_addresses``, which should be a list of PCI addresses
-corresponding to the physical GPU(s) to assign to this type.
+corresponding to the physical GPU(s) or mdev-capable hardware to assign to this
+type.
 If one or more sections are missing (meaning that a specific type is not wanted
-to use for at least one physical GPU) or if no device addresses are provided,
-then Nova will only use the first type that was provided by
-``[devices]/enabled_vgpu_types``.
+to use for at least one physical device) or if no device addresses are provided,
+then Nova will only use the first type that was provided by
+``[devices]/enabled_mdev_types``.
 If the same PCI address is provided for two different types, nova-compute will
 return an InvalidLibvirtGPUConfig exception at restart.
-An example is as the following::
+As an interim period, old configuration groups named ``[vgpu_$(MDEV_TYPE)]``
+will be accepted. A valid configuration could then be::
 [devices]
-enabled_vgpu_types = nvidia-35, nvidia-36
+enabled_mdev_types = nvidia-35, nvidia-36
-[vgpu_nvidia-35]
+[mdev_nvidia-35]
 device_addresses = 0000:84:00.0,0000:85:00.0
 [vgpu_nvidia-36]
@@ -57,7 +60,7 @@ An example is as the following::
 def register_opts(conf):
 conf.register_group(devices_group)
-conf.register_opts(vgpu_opts, group=devices_group)
+conf.register_opts(mdev_opts, group=devices_group)
 def register_dynamic_opts(conf):
@@ -66,14 +69,15 @@ def register_dynamic_opts(conf):
 This must be called by the service that wishes to use the options **after**
 the initial configuration has been loaded.
 """
-opt = cfg.ListOpt('device_addresses', default=[],
-item_type=cfg.types.String())
-# Register the '[vgpu_$(VGPU_TYPE)]/device_addresses' opts, implicitly
-# registering the '[vgpu_$(VGPU_TYPE)]' groups in the process
-for vgpu_type in conf.devices.enabled_vgpu_types:
-conf.register_opt(opt, group='vgpu_%s' % vgpu_type)
+# Register the '[mdev_$(MDEV_TYPE)]/device_addresses' opts, implicitly
+# registering the '[mdev_$(MDEV_TYPE)]' groups in the process
+for mdev_type in conf.devices.enabled_mdev_types:
+opt = cfg.ListOpt('device_addresses', default=[],
+item_type=cfg.types.String(),
+deprecated_group='vgpu_%s' % mdev_type)
+conf.register_opt(opt, group='mdev_%s' % mdev_type)
 def list_opts():
-return {devices_group: vgpu_opts}
+return {devices_group: mdev_opts}
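The compatibility story above rests on oslo.config's deprecated-alias lookup: the new `[mdev_<type>]` group is consulted first, and the legacy `[vgpu_<type>]` group is still honoured. The stdlib sketch below emulates just that fallback with `configparser` (it is not oslo.config itself; the helper name is illustrative):

```python
import configparser

LEGACY_PREFIX = "vgpu_"   # deprecated group prefix
CURRENT_PREFIX = "mdev_"  # new group prefix

def device_addresses(conf: configparser.ConfigParser, mdev_type: str):
    """Return device_addresses for a type, honouring the legacy group.

    Mimics oslo.config's deprecated_group behaviour: a new
    [mdev_<type>] section wins, but an old [vgpu_<type>] section
    is still accepted.
    """
    for prefix in (CURRENT_PREFIX, LEGACY_PREFIX):
        section = prefix + mdev_type
        # has_option() returns False for a missing section too
        if conf.has_option(section, "device_addresses"):
            return conf.get(section, "device_addresses").split(",")
    return []

cfg = configparser.ConfigParser()
cfg.read_string("""
[devices]
enabled_mdev_types = nvidia-35, nvidia-36

[mdev_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[vgpu_nvidia-36]
device_addresses = 0000:86:00.0
""")
```

With this mixed file, `nvidia-35` resolves through the new group and `nvidia-36` through the legacy one.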
@@ -79,7 +79,7 @@ class VGPUReshapeTests(base.ServersTestBase):
 # start a compute with vgpu support disabled so the driver will
 # ignore the content of the above HostMdevDeviceInfo
-self.flags(enabled_vgpu_types='', group='devices')
+self.flags(enabled_mdev_types='', group='devices')
 hostname = self.start_compute(
 hostname='compute1',
@@ -106,7 +106,7 @@ class VGPUReshapeTests(base.ServersTestBase):
 # enabled vgpu support
 self.flags(
-enabled_vgpu_types=fakelibvirt.NVIDIA_11_VGPU_TYPE,
+enabled_mdev_types=fakelibvirt.NVIDIA_11_VGPU_TYPE,
 group='devices')
 # We don't want to restart the compute service or it would call for
 # a reshape but we still want to accept some vGPU types so we call
+4 -4
@@ -132,7 +132,7 @@ class VGPUTests(VGPUTestBase):
 # Start compute1 supporting only nvidia-11
 self.flags(
-enabled_vgpu_types=fakelibvirt.NVIDIA_11_VGPU_TYPE,
+enabled_mdev_types=fakelibvirt.NVIDIA_11_VGPU_TYPE,
 group='devices')
 # for the sake of resizing, we need to patch the two methods below
@@ -293,7 +293,7 @@ class VGPUMultipleTypesTests(VGPUTestBase):
 self.flavor = self._create_flavor(extra_spec=extra_spec)
 self.flags(
-enabled_vgpu_types=[fakelibvirt.NVIDIA_11_VGPU_TYPE,
+enabled_mdev_types=[fakelibvirt.NVIDIA_11_VGPU_TYPE,
 fakelibvirt.NVIDIA_12_VGPU_TYPE],
 group='devices')
 # we need to call the below again to ensure the updated
@@ -304,8 +304,8 @@ class VGPUMultipleTypesTests(VGPUTestBase):
 # - 0000:81:01.0 will only support nvidia-12
 pgpu1_pci_addr = self.libvirt2pci_address(fakelibvirt.PGPU1_PCI_ADDR)
 pgpu2_pci_addr = self.libvirt2pci_address(fakelibvirt.PGPU2_PCI_ADDR)
-self.flags(device_addresses=[pgpu1_pci_addr], group='vgpu_nvidia-11')
-self.flags(device_addresses=[pgpu2_pci_addr], group='vgpu_nvidia-12')
+self.flags(device_addresses=[pgpu1_pci_addr], group='mdev_nvidia-11')
+self.flags(device_addresses=[pgpu2_pci_addr], group='mdev_nvidia-12')
 # Prepare traits for later on
 self._create_trait('CUSTOM_NVIDIA_11')
+7 -7
@@ -20,15 +20,15 @@ CONF = nova.conf.CONF
 class DevicesConfTestCase(test.NoDBTestCase):
 def test_register_dynamic_opts(self):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
-self.assertNotIn('vgpu_nvidia-11', CONF)
-self.assertNotIn('vgpu_nvidia-12', CONF)
+self.assertNotIn('mdev_nvidia-11', CONF)
+self.assertNotIn('mdev_nvidia-12', CONF)
 nova.conf.devices.register_dynamic_opts(CONF)
-self.assertIn('vgpu_nvidia-11', CONF)
-self.assertIn('vgpu_nvidia-12', CONF)
-self.assertEqual([], getattr(CONF, 'vgpu_nvidia-11').device_addresses)
-self.assertEqual([], getattr(CONF, 'vgpu_nvidia-12').device_addresses)
+self.assertIn('mdev_nvidia-11', CONF)
+self.assertIn('mdev_nvidia-12', CONF)
+self.assertEqual([], getattr(CONF, 'mdev_nvidia-11').device_addresses)
+self.assertEqual([], getattr(CONF, 'mdev_nvidia-12').device_addresses)
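The test above checks the dynamic-registration contract: no per-type group exists until `register_dynamic_opts` runs, after which one `[mdev_<type>]` group per enabled type appears with an empty `device_addresses` default. A toy model of that contract (plain dicts standing in for oslo.config groups; the function name is illustrative):

```python
def register_dynamic_groups(enabled_mdev_types):
    """Toy model of register_dynamic_opts: create one 'mdev_<type>'
    group per enabled type, each defaulting device_addresses to []."""
    return {"mdev_%s" % t: {"device_addresses": []}
            for t in enabled_mdev_types}

conf = register_dynamic_groups(["nvidia-11", "nvidia-12"])
```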
+41 -41
@@ -20601,7 +20601,7 @@ class TestUpdateProviderTree(test.NoDBTestCase):
 def _test_update_provider_tree(
 self, mock_gpu_invs, gpu_invs=None, vpmems=None):
 if gpu_invs:
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 mock_gpu_invs.return_value = gpu_invs
 if vpmems:
 self.driver._vpmems_by_rc = vpmems
@@ -20855,7 +20855,7 @@ class TestUpdateProviderTree(test.NoDBTestCase):
 def test_update_provider_tree_for_vgpu_reshape(
 self, mock_gpus, mock_get_devs, mock_get_mdev_info):
 """Tests the VGPU reshape scenario."""
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 # Let's assume we have two PCI devices each having 4 pGPUs for this
 # type
 pci_devices = ['pci_0000_06_00_0', 'pci_0000_07_00_0']
@@ -20987,7 +20987,7 @@ class TestUpdateProviderTree(test.NoDBTestCase):
 """Tests the VGPU reshape failure scenario where VGPU allocations
 are not on the root compute node provider as expected.
 """
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 # Let's assume we have two PCI devices each having 4 pGPUs for this
 # type
 pci_devices = ['pci_0000_06_00_0', 'pci_0000_07_00_0']
@@ -25181,7 +25181,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 self.assertEqual({}, drvr._get_gpu_inventories())
 # Now, set a specific GPU type and restart the driver
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 expected = {
 # the first GPU also has one mdev allocated against it
@@ -25206,13 +25206,13 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 @mock.patch('nova.virt.libvirt.driver.LibvirtDriver'
 '._get_mdev_capable_devices')
 def test_get_gpu_inventories_with_two_types(self, get_mdev_capable_devs):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:06:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:07:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:06:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:07:00.0'], group='mdev_nvidia-12')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 expected = {
 # the first GPU supports nvidia-11 and has one mdev with this type
@@ -25244,7 +25244,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 self.assertEqual([], drvr._get_supported_vgpu_types())
 # Now, provide only one supported vGPU type
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 self.assertEqual(['nvidia-11'], drvr._get_supported_vgpu_types())
 # Given we only support one vGPU type, we don't have any map for PCI
 # devices *yet*
@@ -25256,18 +25256,18 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 mock_warning.reset_mock()
 # Now two types without forgetting to provide the pGPU addresses
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
 self.assertEqual(['nvidia-11'], drvr._get_supported_vgpu_types())
 self.assertEqual({}, drvr.pgpu_type_mapping)
-msg = ("The vGPU type '%(type)s' was listed in '[devices] "
-"enabled_vgpu_types' but no corresponding "
-"'[vgpu_%(type)s]' group or "
-"'[vgpu_%(type)s] device_addresses' "
+msg = ("The mdev type '%(type)s' was listed in '[devices] "
+"enabled_mdev_types' but no corresponding "
+"'[mdev_%(type)s]' group or "
+"'[mdev_%(type)s] device_addresses' "
 "option was defined. Only the first type '%(ftype)s' "
 "will be used." % {'type': 'nvidia-12',
 'ftype': 'nvidia-11'})
@@ -25276,8 +25276,8 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 mock_warning.reset_mock()
 # And now do it correctly !
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:85:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:85:00.0'], group='mdev_nvidia-12')
 self.assertEqual(['nvidia-11', 'nvidia-12'],
 drvr._get_supported_vgpu_types())
 self.assertEqual({'0000:84:00.0': 'nvidia-11',
@@ -25285,32 +25285,32 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 mock_warning.assert_not_called()
 def test_get_supported_vgpu_types_with_duplicate_types(self):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
 # Provide the same pGPU PCI ID for two different types
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-12')
 self.assertRaises(exception.InvalidLibvirtGPUConfig,
 libvirt_driver.LibvirtDriver,
 fake.FakeVirtAPI(), False)
 def test_get_supported_vgpu_types_with_invalid_pci_address(self):
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
 # Fat-finger the PCI address
-self.flags(device_addresses=['whoops'], group='vgpu_nvidia-11')
+self.flags(device_addresses=['whoops'], group='mdev_nvidia-11')
 self.assertRaises(exception.InvalidLibvirtGPUConfig,
 libvirt_driver.LibvirtDriver,
 fake.FakeVirtAPI(), False)
 @mock.patch.object(nova.conf.devices, 'register_dynamic_opts')
 def test_get_supported_vgpu_types_registering_dynamic_opts(self, rdo):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
@@ -25328,42 +25328,42 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 self.assertIsNone(drvr._get_vgpu_type_per_pgpu(device))
 # BY default, we return the first type if we only support one.
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 self.assertEqual('nvidia-11', drvr._get_vgpu_type_per_pgpu(device))
 # Now, make sure we provide the right vGPU type for the device
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:85:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:85:00.0'], group='mdev_nvidia-12')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 # the libvirt name pci_0000_84_00_0 matches 0000:84:00.0
 self.assertEqual('nvidia-11', drvr._get_vgpu_type_per_pgpu(device))
 def test_get_vgpu_type_per_pgpu_with_incorrect_pci_addr(self):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:85:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:85:00.0'], group='mdev_nvidia-12')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 # 'whoops' is not a correct libvirt name corresponding to a PCI address
 self.assertIsNone(drvr._get_vgpu_type_per_pgpu('whoops'))
 def test_get_vgpu_type_per_pgpu_with_unconfigured_pgpu(self):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:84:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:85:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:84:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:85:00.0'], group='mdev_nvidia-12')
 drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
 # 0000:86:00.0 wasn't configured
 self.assertIsNone(drvr._get_vgpu_type_per_pgpu('pci_0000_86_00_0'))
@@ -25560,7 +25560,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 @mock.patch.object(libvirt_driver.LibvirtDriver,
 '_get_existing_mdevs_not_assigned')
 def test_allocate_mdevs_with_available_mdevs(self, get_unassigned_mdevs):
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 allocations = {
 uuids.rp1: {
 'resources': {
@@ -25586,13 +25586,13 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 unallocated_mdevs,
 get_mdev_capable_devs,
 privsep_create_mdev):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:06:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:07:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:06:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:07:00.0'], group='mdev_nvidia-12')
 allocations = {
 uuids.rp1: {
 'resources': {
@@ -25633,7 +25633,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 unallocated_mdevs,
 get_mdev_capable_devs,
 privsep_create_mdev):
-self.flags(enabled_vgpu_types=['nvidia-11'], group='devices')
+self.flags(enabled_mdev_types=['nvidia-11'], group='devices')
 allocations = {
 uuids.rp1: {
 'resources': {
@@ -25743,13 +25743,13 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 def test_recreate_mediated_device_on_init_host(
 self, get_all_assigned_mdevs, exists, mock_get_mdev_info,
 get_mdev_capable_devs, privsep_create_mdev):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 # we need to call the below again to ensure the updated
 # 'device_addresses' value is read and the new groups created
 nova.conf.devices.register_dynamic_opts(CONF)
-self.flags(device_addresses=['0000:06:00.0'], group='vgpu_nvidia-11')
-self.flags(device_addresses=['0000:07:00.0'], group='vgpu_nvidia-12')
+self.flags(device_addresses=['0000:06:00.0'], group='mdev_nvidia-11')
+self.flags(device_addresses=['0000:07:00.0'], group='mdev_nvidia-12')
 get_all_assigned_mdevs.return_value = {uuids.mdev1: uuids.inst1,
 uuids.mdev2: uuids.inst2}
@@ -25793,7 +25793,7 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
 '_get_all_assigned_mediated_devices')
 def test_recreate_mediated_device_on_init_host_with_wrong_config(
 self, get_all_assigned_mdevs, exists, mock_get_mdev_info):
-self.flags(enabled_vgpu_types=['nvidia-11', 'nvidia-12'],
+self.flags(enabled_mdev_types=['nvidia-11', 'nvidia-12'],
 group='devices')
 get_all_assigned_mdevs.return_value = {uuids.mdev1: uuids.inst1}
 # We pretend this mdev doesn't exist hence it needs recreation
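Two of the tests above exercise the configuration validation: configuring the same PCI address under two type groups, or an unparsable address, makes the driver raise InvalidLibvirtGPUConfig at startup. A minimal sketch of the duplicate-address check (plain dicts in place of oslo.config groups; `ValueError` stands in for the nova exception, and the helper name is illustrative):

```python
def build_pgpu_type_mapping(groups):
    """Map each configured PCI address to its mdev type.

    A PCI address listed under two different 'mdev_<type>' groups is
    rejected, mirroring the InvalidLibvirtGPUConfig behaviour the
    tests above assert.
    """
    mapping = {}
    for group_name, addresses in groups.items():
        mdev_type = group_name[len("mdev_"):]
        for addr in addresses:
            if addr in mapping:
                raise ValueError("duplicate device address: %s" % addr)
            mapping[addr] = mdev_type
    return mapping
```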
+24 -24
@@ -7382,26 +7382,26 @@ class LibvirtDriver(driver.ComputeDriver):
return total return total
def _get_supported_vgpu_types(self): def _get_supported_vgpu_types(self):
if not CONF.devices.enabled_vgpu_types: if not CONF.devices.enabled_mdev_types:
return [] return []
# Make sure we register all the types as the compute service could # Make sure we register all the types as the compute service could
# be calling this method before init_host() # be calling this method before init_host()
if len(CONF.devices.enabled_vgpu_types) > 1: if len(CONF.devices.enabled_mdev_types) > 1:
nova.conf.devices.register_dynamic_opts(CONF) nova.conf.devices.register_dynamic_opts(CONF)
for vgpu_type in CONF.devices.enabled_vgpu_types: for vgpu_type in CONF.devices.enabled_mdev_types:
group = getattr(CONF, 'vgpu_%s' % vgpu_type, None) group = getattr(CONF, 'mdev_%s' % vgpu_type, None)
if group is None or not group.device_addresses: if group is None or not group.device_addresses:
first_type = CONF.devices.enabled_vgpu_types[0] first_type = CONF.devices.enabled_mdev_types[0]
if len(CONF.devices.enabled_vgpu_types) > 1: if len(CONF.devices.enabled_mdev_types) > 1:
# Only provide the warning if the operator provided more # Only provide the warning if the operator provided more
# than one type as it's not needed to provide groups # than one type as it's not needed to provide groups
# if you only use one vGPU type. # if you only use one vGPU type.
msg = ("The vGPU type '%(type)s' was listed in '[devices] " msg = ("The mdev type '%(type)s' was listed in '[devices] "
"enabled_vgpu_types' but no corresponding " "enabled_mdev_types' but no corresponding "
"'[vgpu_%(type)s]' group or " "'[mdev_%(type)s]' group or "
"'[vgpu_%(type)s] device_addresses' " "'[mdev_%(type)s] device_addresses' "
"option was defined. Only the first type " "option was defined. Only the first type "
"'%(ftype)s' will be used." % {'type': vgpu_type, "'%(ftype)s' will be used." % {'type': vgpu_type,
'ftype': first_type}) 'ftype': first_type})
@@ -7426,7 +7426,7 @@ class LibvirtDriver(driver.ComputeDriver):
reason="incorrect PCI address: %s" % device_address reason="incorrect PCI address: %s" % device_address
) )
self.pgpu_type_mapping[device_address] = vgpu_type self.pgpu_type_mapping[device_address] = vgpu_type
return CONF.devices.enabled_vgpu_types return CONF.devices.enabled_mdev_types
def _get_vgpu_type_per_pgpu(self, device_address): def _get_vgpu_type_per_pgpu(self, device_address):
"""Provides the vGPU type the pGPU supports. """Provides the vGPU type the pGPU supports.
@@ -7464,19 +7464,19 @@ class LibvirtDriver(driver.ComputeDriver):
         # in case we can't find a specific pGPU
         return

-    def _count_mediated_devices(self, enabled_vgpu_types):
+    def _count_mediated_devices(self, enabled_mdev_types):
         """Counts the sysfs objects (handles) that represent a mediated device
-        and filtered by $enabled_vgpu_types.
+        and filtered by $enabled_mdev_types.

         Those handles can be in use by a libvirt guest or not.

-        :param enabled_vgpu_types: list of enabled VGPU types on this host
+        :param enabled_mdev_types: list of enabled VGPU types on this host
         :returns: dict, keyed by parent GPU libvirt PCI device ID, of number of
             mdev device handles for that GPU
         """
         counts_per_parent: ty.Dict[str, int] = collections.defaultdict(int)
-        mediated_devices = self._get_mediated_devices(types=enabled_vgpu_types)
+        mediated_devices = self._get_mediated_devices(types=enabled_mdev_types)
         for mdev in mediated_devices:
             parent_vgpu_type = self._get_vgpu_type_per_pgpu(mdev['parent'])
             if mdev['type'] != parent_vgpu_type:
@@ -7487,16 +7487,16 @@ class LibvirtDriver(driver.ComputeDriver):
             counts_per_parent[mdev['parent']] += 1
         return counts_per_parent

-    def _count_mdev_capable_devices(self, enabled_vgpu_types):
+    def _count_mdev_capable_devices(self, enabled_mdev_types):
         """Counts the mdev-capable devices on this host filtered by
-        $enabled_vgpu_types.
+        $enabled_mdev_types.

-        :param enabled_vgpu_types: list of enabled VGPU types on this host
+        :param enabled_mdev_types: list of enabled VGPU types on this host
         :returns: dict, keyed by device name, to an integer count of available
             instances of each type per device
         """
         mdev_capable_devices = self._get_mdev_capable_devices(
-            types=enabled_vgpu_types)
+            types=enabled_mdev_types)
         counts_per_dev: ty.Dict[str, int] = collections.defaultdict(int)
         for dev in mdev_capable_devices:
             # dev_id is the libvirt name for the PCI device,
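Both `_count_mediated_devices` and `_count_mdev_capable_devices` rely on the same pattern: a `defaultdict(int)` keyed by device, incremented only for mdevs whose type is enabled. A minimal sketch of that pattern, with hypothetical sample data in place of the driver's libvirt lookups:

```python
import collections
import typing as ty

def count_mdevs_per_parent(mdevs, enabled_mdev_types):
    """Count mediated device handles per parent PCI device, skipping
    any mdev whose type is not in the enabled list."""
    counts: ty.Dict[str, int] = collections.defaultdict(int)
    for mdev in mdevs:
        if mdev['type'] not in enabled_mdev_types:
            # Types the operator did not enable are ignored entirely.
            continue
        counts[mdev['parent']] += 1
    return counts
```

The real methods additionally cross-check each mdev's type against the type configured for its parent pGPU; the sketch keeps only the counting core.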
@@ -7516,7 +7516,7 @@ class LibvirtDriver(driver.ComputeDriver):
     def _get_gpu_inventories(self):
         """Returns the inventories for each physical GPU for a specific type
-        supported by the enabled_vgpu_types CONF option.
+        supported by the enabled_mdev_types CONF option.

         :returns: dict, keyed by libvirt PCI name, of dicts like:
             {'pci_0000_84_00_0':
@@ -7531,16 +7531,16 @@ class LibvirtDriver(driver.ComputeDriver):
         """
         # Bail out early if operator doesn't care about providing vGPUs
-        enabled_vgpu_types = self.supported_vgpu_types
-        if not enabled_vgpu_types:
+        enabled_mdev_types = self.supported_vgpu_types
+        if not enabled_mdev_types:
             return {}
         inventories = {}
-        count_per_parent = self._count_mediated_devices(enabled_vgpu_types)
+        count_per_parent = self._count_mediated_devices(enabled_mdev_types)
         for dev_name, count in count_per_parent.items():
             inventories[dev_name] = {'total': count}
         # Filter how many available mdevs we can create for all the supported
         # types.
-        count_per_dev = self._count_mdev_capable_devices(enabled_vgpu_types)
+        count_per_dev = self._count_mdev_capable_devices(enabled_mdev_types)
         # Combine the counts into the dict that we return to the caller.
         for dev_name, count in count_per_dev.items():
             inv_per_parent = inventories.setdefault(
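The hunk above is cut off mid-statement, but the surrounding lines show the idea: `_get_gpu_inventories` merges the per-parent count of already-created mdevs with the per-device count of still-creatable ones via `dict.setdefault`. A sketch of that merge under that assumption, with hypothetical function names and sample data:

```python
def build_inventories(used_per_parent, available_per_dev):
    """Merge per-parent used-mdev counts and per-device available
    counts into one inventory dict: total = created + creatable."""
    inventories = {}
    for dev_name, count in used_per_parent.items():
        inventories[dev_name] = {'total': count}
    for dev_name, count in available_per_dev.items():
        # setdefault covers devices that have capacity but no mdevs yet.
        inv = inventories.setdefault(dev_name, {'total': 0})
        inv['total'] += count
    return inventories
```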
@@ -0,0 +1,13 @@
+---
+deprecations:
+  - |
+    The existing config options in the ``[devices]`` group for managing virtual
+    GPUs are now renamed in order to be more generic since the mediated devices
+    framework from the linux kernel can support other devices:
+
+    - ``enabled_vgpu_types`` is now deprecated in favour of
+      ``enabled_mdev_types``
+    - Dynamic configuration groups called ``[vgpu_*]`` are now deprecated in
+      favour of ``[mdev_*]``
+
+    Support for the deprecated options will be removed in a future release.
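Concretely, the release note describes a mechanical rename of existing configuration. A before/after sketch, reusing the example PCI addresses from the docs change in this commit:

```ini
# Deprecated spelling (still accepted for now):
[devices]
enabled_vgpu_types = nvidia-35

[vgpu_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

# New spelling:
[devices]
enabled_mdev_types = nvidia-35

[mdev_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0
```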