This adds support for devices that will be allocated to an instance
once and left in a reserved=total state. An external workflow can
put them back into allocatable state by dropping reserved back to
zero. Note that this requires PCI-in-placement tracking for the affected
devices and is only valid for type-PCI and type-PF devices.
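
The lifecycle described above can be sketched roughly as follows. This is
a simplified model for illustration only; the Inventory class and the
helper names are hypothetical and are not Nova's or placement's real data
structures.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Hypothetical stand-in for one device's placement inventory."""
    total: int = 1
    reserved: int = 0
    used: int = 0

    @property
    def allocatable(self) -> int:
        # Never report a negative amount of allocatable capacity.
        return max(0, self.total - self.reserved - self.used)


def claim_one_time_use(inv: Inventory) -> None:
    """Allocate the device and leave it in the reserved=total state."""
    if inv.allocatable < 1:
        raise RuntimeError("device is not allocatable")
    inv.used += 1
    inv.reserved = inv.total


def release(inv: Inventory) -> None:
    """Drop the allocation; reserved stays at total on purpose."""
    inv.used -= 1


def external_reset(inv: Inventory) -> None:
    """What an external workflow would do to make the device usable again."""
    inv.reserved = 0
```

In this model the device stays unallocatable after the instance releases
it, until external_reset() drops reserved back to zero.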
Related to blueprint one-time-use-devices
Depends-On: https://review.opendev.org/c/openstack/requirements/+/946181
Co-Authored-By: Balazs Gibizer <gibi@redhat.com>
Change-Id: Idfe8a746a97d68cd4eae30afb7d22f4e3af80327
This makes us invalidate our cache of the PCI-in-placement resource
providers when we go to do instance_claim(). This is not technically
required right now, but it sets up the next patch, where we will
update that inventory during claim and we need to make sure we are
working with the latest version. Without this, we may consider a
cached version of the inventory to be the same as the proposed one,
and thus not actually update placement when we need to. Since PCI-in-
placement was designed to tolerate external changes to the inventory
(especially/explicitly changing the reserved count), we need to be
careful not to allow our cache to prevent us from taking the action
we intend.
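
A minimal sketch of the failure mode, assuming a client that skips writes
when the cached value already equals the proposed one. FakePlacement and
CachingClient are hypothetical names used only for this illustration.

```python
class FakePlacement:
    """Hypothetical stand-in for the placement service."""

    def __init__(self):
        self.reserved = 0


class CachingClient:
    """Hypothetical client that caches the reserved count."""

    def __init__(self, placement):
        self.placement = placement
        self._cached = None

    def invalidate(self):
        self._cached = None

    def set_reserved(self, value):
        """Return True if placement was actually updated."""
        if self._cached is None:
            self._cached = self.placement.reserved
        if self._cached == value:
            # The cache says nothing changed, so the write is skipped.
            return False
        self.placement.reserved = value
        self._cached = value
        return True


placement = FakePlacement()
client = CachingClient(placement)
client.set_reserved(1)      # cache now holds 1
placement.reserved = 0      # an external workflow resets the device

stale_result = client.set_reserved(1)   # skipped: cache still says 1
client.invalidate()
fresh_result = client.set_reserved(1)   # cache refreshed, write happens
```

Here stale_result is False and placement is left at reserved=0, which is
the situation that invalidating the cache before the claim avoids.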
Related to blueprint one-time-use-devices
Change-Id: I89039328af7a2d2e6a4128dd08dbe8e97ecb16cd
This adds a cacheonly flag to invalidate_resource_provider(). When set,
only our cache is invalidated and the provider is left in the tree, which
is more efficient.
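
As a rough illustration of the flag's effect, here is a toy cache. The
ProviderCache class and its attributes are assumptions made for this
sketch; only the method name and the cacheonly parameter come from the
description above.

```python
class ProviderCache:
    """Toy model: a provider tree plus per-provider refresh times."""

    def __init__(self):
        self.tree = {}            # uuid -> provider data
        self.refresh_times = {}   # uuid -> time of last refresh

    def invalidate_resource_provider(self, uuid, cacheonly=False):
        # Forgetting the refresh time forces a re-read on next use.
        self.refresh_times.pop(uuid, None)
        if not cacheonly:
            # Full invalidation also drops the provider from the tree.
            self.tree.pop(uuid, None)
```

With cacheonly=True the provider entry survives, so the tree does not need
to be rebuilt before the next refresh.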
Related to blueprint one-time-use-devices
Change-Id: I04dd5e984c5671d866804c258422e4230fce37b7
This patch enhances the libvirt fixture to better align with the real
libvirt output when handling hostdevs.
It adds the alias tag, which libvirt provides to specify the hostdev
name, and the address tag, which indicates the address seen by
the guest.
These two fields will be used in a subsequent patch to improve the
comparison between source and destination XMLs during migration.
Example:
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
    </source>
    <alias name='hostdev0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
             function='0x0'/>
  </hostdev>
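
To show how the two new fields can be read back, here is a small sketch
that parses the example above with the standard library. The helper names
are hypothetical and are not part of the fixture.

```python
import xml.etree.ElementTree as ET

HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
           function='0x0'/>
</hostdev>
"""


def format_pci_address(addr):
    """Render an <address> element as domain:bus:slot.function."""
    return "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(addr.get("domain"), 16),
        int(addr.get("bus"), 16),
        int(addr.get("slot"), 16),
        int(addr.get("function"), 16),
    )


def describe_hostdev(xml_text):
    hostdev = ET.fromstring(xml_text)
    return {
        # Name libvirt assigned to the hostdev.
        "alias": hostdev.find("alias").get("name"),
        # Address of the device on the host.
        "host_address": format_pci_address(hostdev.find("source/address")),
        # Address the guest sees (direct child of <hostdev>).
        "guest_address": format_pci_address(hostdev.find("address")),
    }


info = describe_hostdev(HOSTDEV_XML)
```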
The goal of this patch series is to enable VFIO device migration with
kernel variant drivers.
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I3ee3923f990dd6522a11849551a9d49c9fad426c
This patch adds the necessary documentation to the following guides:
- pci-passthrough: Explaining live migration and known issues.
- virtual-gpu: Updating the caveats section to clarify what to do
when VF devices are available instead of `mdev`.
The goal of this patch series is to enable VFIO device migration with
kernel variant drivers.
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I41271a8af5687fb1d18f9d0852492756e096720d
Today, when a user does not request live-migratable devices, the
migration should fail.
However, this failure is hard to detect because the end result is a
NoValidHost error when Nova exhausts its reschedule attempts. As a
result, it is difficult to determine why scheduling failed.
This patch adds a warning to aid in debugging and identifying the
root cause more easily.
The goal of this patch series is to enable VFIO device migration with
kernel variant drivers.
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I64448f30e5d692396c129d9239679e74051cde7f
This patch updates an incorrect comment to reflect the correct
behavior. It also improves variable naming for tags that need to be
removed from the device specifications.
The goal of this patch series is to enable VFIO device migration with
kernel variant drivers.
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I0ae1da59014725aa0065a7f4cfa629367fa5eaeb
This patch removes the _test_pci() method, which is no longer necessary
since flavor-based requests can now be live migrated.
The related tests have also been removed.
This fixes a bug where a user requests a live migration with a
flavor-based request and NUMA constraints (e.g., CPU affinity). In
this case, the code encounters the _test_pci() method and fails because
the check was originally designed to enforce port-based requests only,
causing an unnecessary failure.
Note: This issue was discovered through functional tests that involve
a mix of port-based and flavor-based requests. The failure in this
scenario highlighted the unnecessary constraint.
A functional test reproducing this issue in a mixed-mode scenario
(port request + flavor-based request) will be provided in a subsequent
FUP patch.
The _test_pci() check was redundant, as a similar verification
is already performed earlier in the migration process.
Closes-Bug: 2103636
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: Icbeaadd94658ed44917d724446d484f6497f29e5
This change adds a latch_error_on_raise decorator, which
is applied to the init_application function in our
common wsgi_app module.
This decorator catches all non-retryable exceptions
and causes future invocations of the function to always
raise that same exception.
A reset function is also added to the decorated function;
it should be called in our base test class to
prevent cross-test interactions.
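
A minimal sketch of such a decorator, to illustrate the latching
behaviour. The RetryableError class, the default argument, and the demo
function are assumptions made for this example and may differ from the
actual implementation.

```python
import functools


class RetryableError(Exception):
    """Hypothetical stand-in for exceptions that are safe to retry."""


def latch_error_on_raise(retryable=(RetryableError,)):
    def outer(func):
        error = None

        def reset():
            nonlocal error
            error = None

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal error
            if error is not None:
                # A previous call failed for good: fail the same way again.
                raise error
            try:
                return func(*args, **kwargs)
            except retryable:
                # Retryable errors are passed through and not latched.
                raise
            except Exception as exc:
                error = exc
                raise

        wrapper.reset = reset
        return wrapper
    return outer


calls = []


@latch_error_on_raise()
def demo_init(fail):
    calls.append(fail)
    if fail:
        raise ValueError("boom")
    return "ok"
```

After demo_init(True) fails, demo_init(False) raises the same ValueError
without running the function body, until demo_init.reset() is called.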
Closes-Bug: #2103811
Related-Bug: #1882094
Change-Id: I44b1f7e2acc36a5b557d6d8788f6099f52bbdfb8
The stats module uses the _find_pool() call to find a matching pool for
a new device or a device that is being deallocated. If no existing pool
matches the dev then a new pool is created for it. The
pool matching logic was faulty as it did not remove all the metadata
keys from the pool like rp_uuid. So if the dev did not have that key but
the pool did then the dev did not match.
On the other hand the PCI allocation logic (when PCI in Placement is
enabled) assumed that devices from a single rp_uuid are always in a
single pool. As this assumption was broken by the above bug the PCI
allocation blindly tried to allocate resources for an rp_uuid from each
matching pool causing overallocation.
The main fix in this patch is to ignore the metadata tags in
_find_pool(). In addition, two safety nets are added to the allocation logic.
The logic now asserts that the assumption is correct and if not (i.e. it
found multiple pools with the same rp_uuid) then it bails out. It also
does not ever blindly allocate the same rp_uuid request from multiple
pools.
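
The matching fix and the first safety net can be sketched as below. Pools
are modelled as plain dicts, and the exact set of metadata keys is an
assumption for this illustration; only rp_uuid is named above.

```python
# Bookkeeping keys that should not take part in matching (assumed set).
METADATA_KEYS = {"count", "devices", "rp_uuid"}


def _match_keys(pool):
    return {k: v for k, v in pool.items() if k not in METADATA_KEYS}


def find_pool(pools, dev_pool):
    """Return the first pool matching dev_pool, ignoring metadata keys."""
    wanted = _match_keys(dev_pool)
    for pool in pools:
        if _match_keys(pool) == wanted:
            return pool
    return None


def pools_by_rp_uuid(pools):
    """Index pools by rp_uuid and bail out if one rp_uuid has two pools."""
    by_rp = {}
    for pool in pools:
        rp_uuid = pool.get("rp_uuid")
        if rp_uuid is None:
            continue
        if rp_uuid in by_rp:
            raise ValueError("multiple pools share rp_uuid %s" % rp_uuid)
        by_rp[rp_uuid] = pool
    return by_rp
```

With this, a device that lacks rp_uuid still matches a pool that carries
one, so no duplicate pool is created for the same provider.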
Closes-Bug: #2098496
Change-Id: I9678230397fa1a3c735ee01ed756d5af3b4e1191
Add file to the reno documentation build to show release notes for
stable/2025.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.
Sem-Ver: feature
Change-Id: Iba42aa129140dc494d99dede17f5ea7b44062d62
We agreed on the support envelope in
I2dd906f34118da02783bb7755e0d6c2a2b88eb5d.
Pre-RC1, we need to add a service version in the object.
Post-RC1, depending on whether the release is SLURP or not, we need to
bump the minimum version or not.
This patch only covers the pre-RC1 stage.
Given that Flamingo will be skippable, we will need a post-RC1 patch that
bumps the minimum version to Epoxy.
HTH.
Change-Id: Id74ebfeaaac7bd116b11ff7bdd86674feb825f0f
In oslo.limit 2.6.0 service endpoint discovery was added, provided by
three new config options:
[oslo_limit]
endpoint_service_type = ...
endpoint_service_name = ...
endpoint_region_name = ...
We can use the same config options, if they are present, to look up the
service ID and region ID we need when calling the
GET /registered_limits API as part of the resource limit enforcement
strategy. This way, the user will not have to configure endpoint_id.
This will look for [oslo_limit]endpoint_id first and, if it is not set,
it will do the discovery.
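
The lookup order can be sketched as follows. The conf dict stands in for
the [oslo_limit] group, and both callables are hypothetical placeholders
for the real Keystone lookups; each is assumed to return a
(service_id, region_id) tuple.

```python
def resolve_service_and_region(conf, lookup_endpoint, discover):
    """Prefer endpoint_id; otherwise fall back to service discovery."""
    endpoint_id = conf.get("endpoint_id")
    if endpoint_id:
        return lookup_endpoint(endpoint_id)
    return discover(
        service_type=conf.get("endpoint_service_type"),
        service_name=conf.get("endpoint_service_name"),
        region_name=conf.get("endpoint_region_name"),
    )
```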
Closes-Bug: #1931875
Change-Id: Ida14303115e00a1460e6bef4b6d25fc68f343a4e
I have chosen to clean up the lookup of minimum compute manager
versions. I didn't like how we looked up the minimum version several
times for a single parent call for both create and resize.
Change-Id: Ifc52d73b1328d3785e72be2c5cf741962c2b95da
There's a TODO to prevent passing random query strings to the
'/os-console-auth-tokens' API that should be addressed while we are
updating the API. Do it now.
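
The idea of rejecting unexpected query strings can be illustrated with a
small standalone check. This is only a sketch of the concept with a
hypothetical helper name; it is not the schema validation the API
actually uses.

```python
def reject_unknown_query_params(query, allowed=frozenset()):
    """Raise ValueError if the query has keys outside the allowed set."""
    unknown = sorted(set(query) - set(allowed))
    if unknown:
        raise ValueError(
            "Unexpected query parameters: " + ", ".join(unknown))
```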
Change-Id: Ic19f75b1e26ae048df110f6cd9217b706bf3c0a4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
These are *super* annoying (and useless to boot, since there is nothing
we can do about them in the near term). Shut them ⬇️⬇️⬇️ down ⬇️⬇️⬇️.
Change-Id: I469dafa243b95749b34503c1f3e905d9d8c780d4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
These were not added in change I1e701cbabc0e2c435685e31465159eec09e3b1a0
as they should have been. In addition, said change regressed some unit
tests by reverting values that should be UUIDs back to non-UUIDs.
A future change, Ia5e4c6cadb6c88ccdf7e89566573f1f89087fbe5, will prevent
this happening again.
Change-Id: I2a50750848f8571df7cdbaf39f2168e355220c25
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This patch checks the QEMU and libvirt versions to ensure support
for VFIO SR-IOV device migration.
It also updates the _live_migration_operation() function, particularly
the get_updated_guest_xml() function, to map source PCI addresses
to destination addresses in the destination XML file, using the data
provided by the LiveMigrateData object.
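
The address mapping step can be sketched with the standard library as
below. The address_map dict stands in for whatever LiveMigrateData
carries (its real field names are not shown here), and the function names
are hypothetical.

```python
import xml.etree.ElementTree as ET


def _format(addr):
    """Render an <address> element as domain:bus:slot.function."""
    return "{:04x}:{:02x}:{:02x}.{:x}".format(
        *(int(addr.get(key), 16)
          for key in ("domain", "bus", "slot", "function")))


def _apply(addr, pci_address):
    """Write a domain:bus:slot.function string back into the element."""
    domain, bus, rest = pci_address.split(":")
    slot, function = rest.split(".")
    addr.set("domain", "0x" + domain)
    addr.set("bus", "0x" + bus)
    addr.set("slot", "0x" + slot)
    addr.set("function", "0x" + function)


def remap_hostdev_sources(domain_xml, address_map):
    """Replace source PCI addresses using a source -> destination map."""
    root = ET.fromstring(domain_xml)
    for addr in root.findall("./devices/hostdev/source/address"):
        destination = address_map.get(_format(addr))
        if destination is not None:
            _apply(addr, destination)
    return ET.tostring(root, encoding="unicode")


SOURCE_XML = (
    "<domain><devices>"
    "<hostdev mode='subsystem' type='pci' managed='yes'><source>"
    "<address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>"
    "</source></hostdev>"
    "</devices></domain>"
)
updated = remap_hostdev_sources(
    SOURCE_XML, {"0000:82:00.1": "0000:83:00.2"})
```

Addresses that are missing from the map are left untouched in this
sketch.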
The goal of this patch series is to enable VFIO device migration with
kernel variant drivers.
Partially-Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I62ec475988eab8de948498f50d8d4c0d47321102