Commit Graph

61316 Commits

Author SHA1 Message Date
Dan Smith 28a266461a Support "one-time-use" PCI devices
This adds support for devices that will be allocated to an instance
once and left in a reserved=total state. An external workflow can
put them back into allocatable state by dropping reserved back to
zero. Note this requires PCI-in-placement tracking for the affected
devices and it is only valid for type-PCI and type-PF devices.

Related to blueprint one-time-use-devices

Depends-On: https://review.opendev.org/c/openstack/requirements/+/946181
Co-Authored-By: Balazs Gibizer <gibi@redhat.com>
Change-Id: Idfe8a746a97d68cd4eae30afb7d22f4e3af80327
2025-04-02 11:53:36 -07:00
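
The reserved=total lifecycle described in the commit above can be sketched as follows. This is a minimal illustration, not Nova's implementation: the Inventory class and the three helper functions are hypothetical names, and the allocation ratio is assumed to be 1.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Hypothetical stand-in for one device's placement inventory."""
    total: int = 1
    reserved: int = 0
    used: int = 0

    @property
    def allocatable(self) -> int:
        # Units still open to new allocations (allocation ratio assumed 1).
        return max(0, self.total - self.reserved - self.used)


def claim_one_time_use(inv: Inventory) -> None:
    """Allocate the device once and leave it in a reserved=total state."""
    if inv.allocatable < 1:
        raise RuntimeError("device is not allocatable")
    inv.used += 1
    inv.reserved = inv.total


def release(inv: Inventory) -> None:
    """Instance deleted: the allocation goes away, the reservation stays."""
    inv.used -= 1


def external_reset(inv: Inventory) -> None:
    """What an external workflow would do to make the device usable again."""
    inv.reserved = 0
```

In this model a released device stays unallocatable until external_reset() drops reserved back to zero.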
Dan Smith c5efabbd07 Invalidate PCI-in-placement cached RPs during claim
This makes us invalidate our cache of the PCI-in-placement resource
providers when we go to do instance_claim(). This is not technically
required right now, but it sets up the next patch, where we will
update that inventory during claim and we need to make sure we are
working with the latest version. Without this, we may consider a
cached version of the inventory to be the same as the proposed one,
and thus not actually update placement when we need to. Since PCI-in-
placement was designed to tolerate external changes to the inventory
(especially/explicitly changing the reserved count), we need to be
careful not to allow our cache to prevent us from taking the action
we intend.

Related to blueprint one-time-use-devices

Change-Id: I89039328af7a2d2e6a4128dd08dbe8e97ecb16cd
2025-04-01 07:42:33 -07:00
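
The stale-cache hazard described in the commit above can be illustrated with plain dictionaries. This is a sketch under stated assumptions: update_inventory() and claim() are hypothetical helpers, and the cache and placement are modelled as dicts keyed by provider UUID rather than the real report client.

```python
def update_inventory(cache, placement, rp_uuid, proposed):
    """Write `proposed` to placement only if it differs from our view."""
    if rp_uuid not in cache:
        # Cache miss: re-read the current inventory from placement.
        cache[rp_uuid] = dict(placement[rp_uuid])
    if cache[rp_uuid] == proposed:
        return False  # looks unchanged, so placement is not updated
    placement[rp_uuid] = dict(proposed)
    cache[rp_uuid] = dict(proposed)
    return True


def claim(cache, placement, rp_uuid, proposed, invalidate_first):
    """Hypothetical claim step that optionally drops the cached copy first."""
    if invalidate_first:
        cache.pop(rp_uuid, None)
    return update_inventory(cache, placement, rp_uuid, proposed)
```

If an external workflow has changed reserved in placement, the call without invalidation compares the proposal against the stale cached copy and skips the update, while the call with invalidation re-reads placement and writes the change.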
Dan Smith ba00d60b95 Extend invalidate_rp to only invalidate cache
This gives invalidate_resource_provider() a cacheonly flag that, for
efficiency, only invalidates our cache and does not remove the
provider from the tree.

Related to blueprint one-time-use-devices

Change-Id: I04dd5e984c5671d866804c258422e4230fce37b7
2025-03-27 08:54:11 -07:00
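
A rough sketch of what a cache-only invalidation could look like follows. The ProviderCache class and its attributes are hypothetical; only the cacheonly flag name comes from the commit message above, and the real method may track freshness differently.

```python
class ProviderCache:
    """Hypothetical model: a provider tree plus per-provider refresh times."""

    def __init__(self):
        self.tree = {}        # rp_uuid -> provider data
        self.refreshed = {}   # rp_uuid -> time the cached data was last refreshed

    def invalidate(self, rp_uuid, cacheonly=False):
        # Always forget the refresh time so the next access re-reads the data.
        self.refreshed.pop(rp_uuid, None)
        if not cacheonly:
            # Full invalidation also drops the provider from the tree.
            self.tree.pop(rp_uuid, None)
```

With cacheonly=True the provider stays in the tree, so it does not have to be rebuilt before the next refresh.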
René Ribaud 98226b60f3 FUP: Improve libvirt fixture for hostdevs
This patch enhances the libvirt fixture to better align with the real
libvirt output when handling hostdevs.

It adds the alias tag, which libvirt provides to specify the hostdev
name, and the address tag, which indicates the address seen by
the guest.

These two fields will be used in a subsequent patch to improve the
comparison between source and destination XMLs during migration.

Example:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
  function='0x0'/>
</hostdev>

The goal of this patch series is to enable VFIO device
migration with kernel variant drivers.

Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I3ee3923f990dd6522a11849551a9d49c9fad426c
2025-03-26 10:04:37 +01:00
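
As an illustration of the two fields the commit above adds to the fixture, the example hostdev element can be read with the Python standard library. The pci_address() helper is a hypothetical name introduced here, not part of Nova or libvirt.

```python
import xml.etree.ElementTree as ET

HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
  function='0x0'/>
</hostdev>
"""


def pci_address(elem):
    """Format an <address> element as domain:bus:slot.function."""
    parts = [elem.get(k)[2:] for k in ("domain", "bus", "slot", "function")]
    return "{}:{}:{}.{}".format(*parts)


hostdev = ET.fromstring(HOSTDEV_XML)
alias = hostdev.find("alias").get("name")                # device name
host_addr = pci_address(hostdev.find("source/address"))  # address on the host
guest_addr = pci_address(hostdev.find("address"))        # address seen by the guest
```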
René Ribaud c6a96a17db FUP Update pci-passthrough and virtual-gpu documentation
This patch adds the necessary documentation identified in:

- pci-passthrough: Explaining live migration and known issues.
- virtual-gpu: Updating the caveats section to clarify what to do
  when VF devices are available instead of `mdev`.

The goal of this patch series is to enable VFIO device
migration with kernel variant drivers.

Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I41271a8af5687fb1d18f9d0852492756e096720d
2025-03-26 10:02:41 +01:00
René Ribaud 28f82ba912 FUP Add a warning to make non-explicit live migration request debugging easier
Today, when a user does not request live-migratable devices, the
migration should fail.
However, this failure is hard to detect because the end result is a
NoValidHost error when Nova exhausts its reschedule attempts. As a
result, it is difficult to determine why scheduling failed.

This patch adds a warning to aid in debugging and identifying the
root cause more easily.

The goal of this patch series is to enable VFIO device
migration with kernel variant drivers.

Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I64448f30e5d692396c129d9239679e74051cde7f
2025-03-26 10:02:41 +01:00
René Ribaud 5ac94abfdb FUP improve comment accuracy and variable naming for tag removal
This patch updates an incorrect comment to reflect the correct
behavior. It also improves variable naming for tags that need to be
removed from the device specifications.

The goal of this patch series is to enable VFIO device
migration with kernel variant drivers.

Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I0ae1da59014725aa0065a7f4cfa629367fa5eaeb
2025-03-26 10:02:41 +01:00
René Ribaud 4e4262cd3d FUP Remove unnecessary PCI check
This patch removes the _test_pci() method, which is no longer necessary
since flavor-based requests can now be live migrated.

The related tests have also been removed.

This fixes a bug where a user requests a live migration with a
flavor-based request and NUMA constraints (e.g., CPU affinity). In
this case, the code encounters the _test_pci() method and fails because
the check was originally designed to enforce port-based requests only,
causing an unnecessary failure.

Note: This issue was discovered through functional tests that involve
a mix of port-based and flavor-based requests. The failure in this
scenario highlighted the unnecessary constraint.
A functional test reproducing this issue in a mixed-mode scenario
(port request + flavor-based request) will be provided in a subsequent
FUP patch.

The _test_pci() check was redundant, as a similar verification
is already performed earlier in the migration process.

Closes-Bug: 2103636
Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: Icbeaadd94658ed44917d724446d484f6497f29e5
2025-03-26 10:02:41 +01:00
Zuul 725a307693 Merge "Update master for stable/2025.1" 2025-03-25 15:57:34 +00:00
Zuul 778be04d4e Merge "Fix case-sensitivity for metadata keys" 2025-03-25 15:57:27 +00:00
Zuul 76e3b573ba Merge "Fix case sensitive comparison" 2025-03-25 15:56:01 +00:00
Zuul caa379116e Merge "wrap wsgi_app.init_application with latch_error_on_raise" 2025-03-25 04:35:20 +00:00
Zuul 9d05000bb8 Merge "unified limits: discover service ID and region ID" 2025-03-25 01:33:36 +00:00
Sean Mooney 8dcbbe43e7 wrap wsgi_app.init_application with latch_error_on_raise
This change adds a latch_error_on_raise decorator which
is applied to the init_application function in our
common wsgi_app module.

This decorator will catch all non-retryable exceptions
and cause future invocations of the function to always
return that same exception forever.

A reset function is also added to the decorated function,
which should be called in our base test class to
prevent cross-test interactions.

Closes-Bug: #2103811
Related-Bug: #1882094
Change-Id: I44b1f7e2acc36a5b557d6d8788f6099f52bbdfb8
2025-03-24 23:37:12 +00:00
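
The latching behaviour described in the commit above could be sketched like this. It is not Nova's implementation: the retryable parameter name is an assumption, and re-raising the stored exception is one reading of "return that same exception".

```python
import functools


def latch_error_on_raise(retryable=()):
    """Decorator factory: remember the first non-retryable exception."""

    def decorator(func):
        latched = None

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal latched
            if latched is not None:
                # A previous call failed for good: fail the same way again.
                raise latched
            try:
                return func(*args, **kwargs)
            except retryable:
                # Retryable errors are passed through without latching.
                raise
            except Exception as exc:
                latched = exc
                raise

        def reset():
            """Clear the latch, e.g. between tests."""
            nonlocal latched
            latched = None

        wrapper.reset = reset
        return wrapper

    return decorator
```

A test base class would call the decorated function's reset() during setup so that a failure latched by one test cannot leak into the next.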
Zuul 76c3c4c1bd Merge "Ignore metadata tags in pci/stats _find_pool logic" 2025-03-19 22:04:07 +00:00
Zuul c66d5735b0 Merge "Reproduce bug/2098496" 2025-03-19 19:26:09 +00:00
Balazs Gibizer 229fb3513a Ignore metadata tags in pci/stats _find_pool logic
The stats module uses the _find_pool() call to find a matching pool for
a new device or a device that is being deallocated. If no existing pool
matches with the dev then a new pool is created for it. The
pool matching logic was faulty as it did not remove all the metadata
keys from the pool like rp_uuid. So if the dev did not have that key but
the pool did then the dev did not match.

On the other hand the PCI allocation logic (when PCI in Placement is
enabled) assumed that devices from a single rp_uuid are always in a
single pool. As this assumption was broken by the above bug the PCI
allocation blindly tried to allocate resources for an rp_uuid from each
matching pool causing overallocation.

The main fix in this patch is to ignore the metadata tags in
_find_pool(). Two safety nets are also added to the allocation logic.
The logic now asserts that the assumption is correct and if not (i.e. it
found multiple pools with the same rp_uuid) then it bails out. It also
does not ever blindly allocate the same rp_uuid request from multiple
pools.

Closes-Bug: #2098496
Change-Id: I9678230397fa1a3c735ee01ed756d5af3b4e1191
2025-03-19 18:25:59 +01:00
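
The pool-matching fix in the commit above can be sketched with dictionaries. The set of metadata keys and both function names are assumptions made for illustration; only rp_uuid is named as a metadata key in the commit message.

```python
# Assumption: keys treated as metadata rather than as part of a pool's identity.
METADATA_KEYS = {"rp_uuid", "count", "devices"}


def find_pool(pools, dev_pool):
    """Return the first pool matching dev_pool, ignoring metadata keys."""
    wanted = {k: v for k, v in dev_pool.items() if k not in METADATA_KEYS}
    for pool in pools:
        identity = {k: v for k, v in pool.items() if k not in METADATA_KEYS}
        if identity == wanted:
            return pool
    return None


def assert_single_pool_per_rp(pools):
    """Safety net: bail out if two pools share the same rp_uuid."""
    seen = set()
    for pool in pools:
        rp_uuid = pool.get("rp_uuid")
        if rp_uuid is None:
            continue
        if rp_uuid in seen:
            raise ValueError("multiple pools for rp_uuid %s" % rp_uuid)
        seen.add(rp_uuid)
```

With the metadata keys ignored, a device that lacks rp_uuid still lands in the existing pool instead of triggering a second pool for the same provider.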
Pierre Riteau 9b7809b289 Fix missing backtick in configuration option help
Change-Id: I00207d1837ba419f0dd5325ee5cbaeb678ad541b
2025-03-19 14:42:51 +01:00
OpenStack Release Bot 932d2334c2 Update master for stable/2025.1
Add file to the reno documentation build to show release notes for
stable/2025.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.

Sem-Ver: feature
Change-Id: Iba42aa129140dc494d99dede17f5ea7b44062d62
2025-03-18 16:27:31 +00:00
Zuul 6042300453 Merge "Bump MIN_{LIBVIRT,QEMU} for "Epoxy"" 2025-03-18 12:43:44 +00:00
Zuul fc339b2559 Merge "Add Epoxy prelude section" 2025-03-18 10:06:15 +00:00
Sylvain Bauza 8197f7d5a6 Add Epoxy prelude section
Shamelessly copied from the cycle highlights

Change-Id: I9c949db80ad795d67e75c464eec6cc683e80f4af
2025-03-18 09:19:00 +01:00
Balazs Gibizer 32afd0c644 Reproduce bug/2098496
Related-Bug: #2098496
Change-Id: I9a3091662fb5d1d0a41dbeff56d9680a748fe312
2025-03-17 17:50:48 +01:00
Zuul 6b45672b23 Merge "Update compute rpc alias for epoxy" 2025-03-17 13:56:55 +00:00
Zuul 1e1b74467d Merge "doc: mark the maximum microversion for 2025.1 Epoxy" 2025-03-13 12:32:33 +00:00
Zuul f71a0a6204 Merge "Fix serial console for ironic" 2025-03-12 12:26:06 +00:00
Sylvain Bauza 0d484ce37d Add service version for Epoxy
We agreed on the support envelope in
I2dd906f34118da02783bb7755e0d6c2a2b88eb5d.
Pre-RC1, we need to add a service version in the object.
Post-RC1, depending on whether it's SLURP or not SLURP, we need to bump
the minimum version or not.

This patch only focuses on pre-RC1 stage.
Given that Flamingo will be skippable, we will need a post-RC1 patch
that bumps the minimum version to Epoxy.

HTH.

Change-Id: Id74ebfeaaac7bd116b11ff7bdd86674feb825f0f
2025-03-11 11:38:40 +01:00
Zuul a329c103cb Merge "Update driver to map the targeted address for SR-IOV PCI devices" 2025-03-10 20:20:19 +00:00
Zuul a0a83640b9 Merge "Update libvirt fixtures to support hostdevs" 2025-03-10 20:12:34 +00:00
Sylvain Bauza a1a118c9f0 Update compute rpc alias for epoxy
This adds an alias for Epoxy

Change-Id: I2d2b5f80c13524e7aa8278029d0343d12f6d61fd
2025-03-10 16:08:22 +01:00
Sylvain Bauza 4a5e67cff7 doc: mark the maximum microversion for 2025.1 Epoxy
We need it for this release.

Change-Id: Ibc70045dbdd1b28bf94fd1bec1fac033fae84e26
2025-03-10 16:05:28 +01:00
Zuul fd1ad4d582 Merge "Update conductor and filters allowing migration with SR-IOV devices" 2025-03-10 14:36:36 +00:00
Zuul 6e51c83d28 Merge "Fix parameter order in add_instance_info_to_node" 2025-03-10 14:09:22 +00:00
Zuul d1c94e25b6 Merge "api: Address TODO in microversion v2.99" 2025-03-10 13:46:44 +00:00
Zuul 5f3133efc0 Merge "api: project/tenant and user IDs are not UUIDs" 2025-03-10 13:46:38 +00:00
Zuul 2cf4667780 Merge "libvirt: fix maxphysaddr passthrough dom parsing" 2025-03-10 12:13:36 +00:00
melanie witt eb3a803cd7 unified limits: discover service ID and region ID
In oslo.limit 2.6.0 service endpoint discovery was added, provided by
three new config options:

  [oslo_limit]
  endpoint_service_type = ...
  endpoint_service_name = ...
  endpoint_region_name = ...

We can use the same config options if they are present to look up the
service ID and region ID we need when calling the
GET /registered_limits API as part of the resource limit enforcement
strategy. This way, the user will not have to configure endpoint_id.

This will look for [oslo_limit]endpoint_id first and if it is not set,
it will do the discovery.

Closes-Bug: #1931875

Change-Id: Ida14303115e00a1460e6bef4b6d25fc68f343a4e
2025-03-07 17:18:30 -08:00
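
The lookup order described in the commit above (endpoint_id first, then discovery) could be sketched as below. resolve_limit_filter() and the discover callable are hypothetical; the real code talks to the identity service, which this sketch replaces with an injected function.

```python
def resolve_limit_filter(conf, discover):
    """Pick the filter used when querying registered limits.

    conf is a dict of [oslo_limit] options; discover is a hypothetical
    callable taking (service_type, service_name, region_name) and
    returning a (service_id, region_id) tuple.
    """
    endpoint_id = conf.get("endpoint_id")
    if endpoint_id:
        # Explicit configuration wins; no discovery needed.
        return {"endpoint_id": endpoint_id}
    service_id, region_id = discover(
        conf.get("endpoint_service_type"),
        conf.get("endpoint_service_name"),
        conf.get("endpoint_region_name"),
    )
    return {"service_id": service_id, "region_id": region_id}
```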
Zuul 0bbb1d15f4 Merge "Update manager to allow vfio pci device live migration" 2025-03-07 20:17:49 +00:00
Zuul 276685b3db Merge "api: Add response body schemas for for console auth token APIs (v2.99)" 2025-03-06 20:37:31 +00:00
Michael Still 0954ec9e5c Don't calculate the minimum compute version repeatedly.
I have chosen to clean up the lookup of
minimum compute manager versions. I didn't like how we looked up
the minimum version several times for a single parent call for
both create and resize.

Change-Id: Ifc52d73b1328d3785e72be2c5cf741962c2b95da
2025-03-06 18:26:02 +11:00
Vasyl Saienko bf8883ca3b Fix serial console for ironic
Align the serial console code now that the ironic virt driver has
switched to openstacksdk.

Closes-Bug: #2099872

Depends-On: https://review.opendev.org/c/openstack/requirements/+/942889

Change-Id: Ic25c5e8b9ac9cf87f4f96c9956140aa4f6576ded
2025-03-05 05:07:55 +00:00
Zuul 29d17552a7 Merge "Add live_migratable flag to PCI device specification" 2025-03-04 20:24:52 +00:00
Zuul e1b33cdf0c Merge "Augment the LiveMigrateData object" 2025-03-04 20:24:46 +00:00
Stephen Finucane 244f9b0ad1 api: Address TODO in microversion v2.99
There's a TODO to prevent passing random query strings to the
'/os-console-auth-tokens' API that should be addressed while we are
updating the API. Do it now.

Change-Id: Ic19f75b1e26ae048df110f6cd9217b706bf3c0a4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-03-04 17:13:36 +00:00
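
Rejecting unexpected query strings, as the commit above does for '/os-console-auth-tokens', amounts to validating the query against an allowed set. The sketch below is a plain-Python illustration; validate_query() is a hypothetical helper and the empty allowed set is an assumption, not the actual schema.

```python
# Assumption: the endpoint accepts no query parameters at all.
ALLOWED_QUERY_PARAMS = frozenset()


def validate_query(params, allowed=ALLOWED_QUERY_PARAMS):
    """Raise ValueError if params contains keys outside the allowed set."""
    unexpected = sorted(set(params) - set(allowed))
    if unexpected:
        raise ValueError(
            "unexpected query parameters: " + ", ".join(unexpected))
```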
Stephen Finucane 244ff89060 tests: Filter out eventlet deprecation warnings
These are *super* annoying (and useless to boot, since there is nothing
we can do about them in the near term). Shut them ⬇️⬇️⬇️ down ⬇️⬇️⬇️.

Change-Id: I469dafa243b95749b34503c1f3e905d9d8c780d4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-03-04 15:44:44 +00:00
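
One way to silence such warnings with the standard library is shown below. It is a sketch, not the filter the commit above installs: the function name is hypothetical, and matching on the module that issues the warning is an assumption about how the warnings are identified.

```python
import warnings


def filter_eventlet_deprecations():
    """Ignore DeprecationWarning raised from eventlet modules."""
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        # Regex matched against the start of the issuing module's name.
        module=r"eventlet(\.|$)",
    )
```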
Stephen Finucane 8f6b14bada api: project/tenant and user IDs are not UUIDs
Who knew?

Change-Id: Id3366ce2897cfcb1678034c3d24d809d8c24c43a
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-03-04 13:19:56 +00:00
Stephen Finucane 401ca73c26 api: Add response body schemas for for console auth token APIs (v2.99)
These were not added in change I1e701cbabc0e2c435685e31465159eec09e3b1a0
as they should have been. In addition, said change regressed some unit
tests by reverting values that should be UUIDs back to non-UUIDs.

A future change, Ia5e4c6cadb6c88ccdf7e89566573f1f89087fbe5, will prevent
this happening again.

Change-Id: I2a50750848f8571df7cdbaf39f2168e355220c25
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-03-04 13:19:19 +00:00
Zuul 406eedb1ce Merge "Fix microversion 2.98 doc/tests for update/rebuild APIs" 2025-03-04 01:45:13 +00:00
Zuul cf326d4375 Merge "Fix microversion 2.96 for update/rebuild APIs" 2025-03-04 01:38:24 +00:00
René Ribaud fd656f3943 Update driver to map the targeted address for SR-IOV PCI devices
This patch checks the revision of QEMU and libvirt to ensure support
for VFIO SR-IOV device migration.
It also updates the _live_migration_operation() function, particularly
the get_updated_guest_xml() function, to map source PCI addresses
to destination addresses in the destination XML file, using the data
provided by the LiveMigrateData object.

The goal of this patch series is to enable VFIO device
migration with kernel variant drivers.

Partially-Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I62ec475988eab8de948498f50d8d4c0d47321102
2025-03-03 20:50:35 +01:00
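
The address mapping described in the commit above can be illustrated on a small domain XML. remap_hostdev_sources() and the shape of address_map are hypothetical; the real code takes its mapping from the LiveMigrateData object, whose layout is not shown in the commit message.

```python
import xml.etree.ElementTree as ET

FIELDS = ("domain", "bus", "slot", "function")


def remap_hostdev_sources(domain_xml, address_map):
    """Rewrite hostdev source addresses for the destination host.

    address_map maps a source (domain, bus, slot, function) tuple to the
    tuple that should replace it in the destination XML.
    """
    root = ET.fromstring(domain_xml)
    for addr in root.iterfind(".//hostdev/source/address"):
        key = tuple(addr.get(field) for field in FIELDS)
        if key in address_map:
            for field, value in zip(FIELDS, address_map[key]):
                addr.set(field, value)
    return ET.tostring(root, encoding="unicode")
```

Addresses that have no entry in the map are left untouched.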