61822 Commits

Author SHA1 Message Date
Zuul 23cad1dd7d Merge "Update start_service() function in test" 2025-10-08 11:35:30 +00:00
Julien Le Jeune 38d1b14170 Update start_service() function in test
Update the 'mapped' field of the created node to be consistent with
what is done in the _check_and_create_node_host_mappings function [1].

[1] https://opendev.org/openstack/nova/src/commit/cc742602bcdeff185ff120452e4f301398f6aa7b/nova/objects/host_mapping.py#L209

Related-Bug: #2085135
Change-Id: I9965932adc521756e4583d1bcfc75c83cc630626
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
2025-10-08 10:32:24 +02:00
Zuul 3ed740eabd Merge "[nova-tox-py312-threading]Ignore failing tests" 2025-10-07 20:36:17 +00:00
Zuul 7278e661a4 Merge "doc: Fix typo in nova-manage command" 2025-10-07 12:10:58 +00:00
Sylvain Bauza 04afc452b3 Add a regression test for ImagePropsWeigher
The weigher is unable to get the right image metadata for existing
instances if they are not already on the HostState.

Change-Id: I5bccf854662ecffe1d469bacc6e4afcb746d6b4d
Signed-off-by: Sylvain Bauza <sbauza@redhat.com>
2025-10-06 18:39:06 +02:00
Zuul cc742602bc Merge "Run nova-conductor in native threading mode" 2025-10-02 15:55:16 +00:00
Zuul 4ccdec1ac4 Merge "Switch nova-conductor to use ThreadPoolExecutor" 2025-10-02 13:40:26 +00:00
Zuul e8ebbd5417 Merge "tests: Replace keystoneclient with keystoneauth1" 2025-10-02 12:07:49 +00:00
Zuul 1508cb39a2 Merge "[hacking] N374 do not use time.sleep(0) to yield" 2025-10-01 20:24:32 +00:00
Zuul 2928d53dca Merge "Centralize cooperative yield" 2025-10-01 20:04:43 +00:00
melanie witt 787d2a1300 Move cleanup of vTPM secret from driver to compute
Currently, vTPM secrets are deleted from Barbican any time instance
disks are deleted when driver.destroy() is called. This is fine if the
instance is also being deleted but if it's not, such as during a resize
revert, it will fail with the following error:

  nova.exception.Invalid: Refusing to create an emulated TPM with no
    secret!

which will bubble up to the API as a HTTP 500.

This moves deletion of the vTPM secret from Barbican from the libvirt
driver destroy() path to the compute manager _delete_instance() path so
that the vTPM secret is deleted only if the instance is being deleted.

Closes-Bug: #2125030
Change-Id: I1a43dc0502e1e65b4ef0348610f5eddb43dbff02
Signed-off-by: melanie witt <melwittt@gmail.com>
2025-10-01 01:55:28 +00:00
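The ownership change described in the commit above can be illustrated with a small sketch. All classes here (SecretStore, Driver, ComputeManager) are hypothetical stand-ins and not nova's real classes or Barbican's API; the sketch only assumes that destroy() no longer touches the secret and that the delete path does.

```python
class SecretStore:
    """Hypothetical stand-in for the key manager holding vTPM secrets."""

    def __init__(self):
        self.secrets = {}

    def delete(self, instance_uuid):
        self.secrets.pop(instance_uuid, None)


class Driver:
    """Hypothetical virt driver: destroy() only removes the guest and disks."""

    def __init__(self):
        self.destroyed = []

    def destroy(self, instance_uuid):
        # After the change, no secret cleanup happens on this path.
        self.destroyed.append(instance_uuid)


class ComputeManager:
    """Hypothetical compute manager that owns the secret cleanup."""

    def __init__(self, driver, secret_store):
        self.driver = driver
        self.secret_store = secret_store

    def delete_instance(self, instance_uuid):
        self.driver.destroy(instance_uuid)
        # The instance is really going away, so the secret can go too.
        self.secret_store.delete(instance_uuid)

    def revert_resize(self, instance_uuid):
        # The guest on the destination is destroyed, but the instance lives
        # on at the source and still needs its vTPM secret.
        self.driver.destroy(instance_uuid)


store = SecretStore()
store.secrets["uuid-1"] = "tpm-secret"
manager = ComputeManager(Driver(), store)

manager.revert_resize("uuid-1")
secret_kept_after_revert = "uuid-1" in store.secrets

manager.delete_instance("uuid-1")
secret_kept_after_delete = "uuid-1" in store.secrets
```

In this sketch the secret survives the revert and disappears only when the instance itself is deleted, which is the behavior the commit message describes.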
Balazs Gibizer d265faed2c [hacking] N374 do not use time.sleep(0) to yield
We have a centralized nova.utils.cooperative_yield() to use instead of
time.sleep(0). It is better because it allows the sleep calls to be
turned off when the service runs in threaded mode.

Change-Id: I625daec79ee5b7f8b92116f450e21f997cef0546
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-24 15:38:26 +02:00
Pierre Riteau 7cae672a74 doc: Fix typo in nova-manage command
Change-Id: Id9ba4e984418b9da20b5be313485d8892ef98c0e
Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
2025-09-23 09:14:52 +02:00
melanie witt 650772d97e Add functional reproducer for bug 2125030
This reproduces the bug where an attempt to revert a resize from a
flavor with vTPM to a different flavor with vTPM results in the revert
failing and the instance going into ERROR state when storage is not
shared.

Because of the lack of test coverage of vTPM with non-shared storage,
this change also just adds a subclass to run all of the vTPM functional
tests with the test environment mocked to behave as though storage is
not shared between compute hosts.

A bug fix will follow these functional tests.

Related-Bug: #2125030

Change-Id: I49745a8ba78e1ea6a1b009bccab32a002cb6afb0
Signed-off-by: melanie witt <melwittt@gmail.com>
2025-09-23 06:02:36 +00:00
Balazs Gibizer ec426532c3 Run nova-conductor in native threading mode
Previous patches removed direct eventlet usage from nova-conductor so
now we can run it with native threading as well. This patch documents
the possibility and switches both nova-conductor processes to native
threading mode in the nova-next job.

Change-Id: If26c0c7199cbda157f24b99a419697ecb6618fa6
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-22 10:17:39 +00:00
Zuul b99a882366 Merge "Add admin context when filling metadata on ImagePropsWeigher" 2025-09-19 14:17:21 +00:00
Balazs Gibizer 858494997e Centralize cooperative yield
Replace the remaining time.sleep calls used to trigger an eventlet yield
with the existing nova.utils.cooperative_yield() call. This will help
us disable such yielding when the service is running in threading
mode and eventually drop the whole thing when nova removes eventlet.

Change-Id: I6b3fcba13f4d1c41d1fac2efe3cb4a943e66f8bb
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-19 14:35:15 +02:00
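The benefit of a single yield helper, as described in the commit above, can be sketched as follows. The function name cooperative_yield_sketch and its parameters are hypothetical; the real nova.utils.cooperative_yield() may have a different signature and decide the mode differently.

```python
import time


def cooperative_yield_sketch(threading_mode, sleep=time.sleep):
    """Yield to other green threads unless native threading is in use.

    Returns True if a yield was performed, False if it was skipped.
    The threading_mode flag and the injectable sleep are assumptions
    made for illustration only.
    """
    if threading_mode:
        # Native threads are preempted by the OS, so no explicit yield.
        return False
    # sleep(0) is the usual way to hand control to the eventlet hub.
    sleep(0)
    return True


calls = []
yielded_in_eventlet_mode = cooperative_yield_sketch(False, sleep=calls.append)
yielded_in_threading_mode = cooperative_yield_sketch(True, sleep=calls.append)
```

Because every caller goes through one function, turning the yield off (or deleting it later) is a one-place change.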
Balazs Gibizer 520057663a [nova-tox-py312-threading]Ignore failing tests
There are two intermittently failing tests we need to ignore for now, so
this patch extends the list.

Closes-Bug: #2125185

Change-Id: I8d440013c84ae1dac4e2a1f661fc31138944b032
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-19 10:58:36 +02:00
Zuul f8b72e964c Merge "Regression test for ImagePropsWeigher due to missing context" 2025-09-18 16:09:25 +00:00
Sylvain Bauza dedfc305dd Add admin context when filling metadata on ImagePropsWeigher
Create a new admin context, as we can't reuse the RequestSpec user
context: the InstanceList must be hydrated with the full list of
instances from the host, not only the ones from the user.

Closes-Bug: #2125052

Change-Id: Ibbd80324c17be6546ecd8b80f908ac5bbab5abd0
Signed-off-by: Sylvain Bauza <sbauza@redhat.com>
2025-09-18 17:25:10 +02:00
Sylvain Bauza 59224d1583 Regression test for ImagePropsWeigher due to missing context
Added a functional regression test that shows that a second instance
fails on a host.

Related-Bug: #2125052
Change-Id: I14c1464d638a8c0d55e6a69ec22e0b83567c1797
Signed-off-by: Sylvain Bauza <sbauza@redhat.com>
2025-09-18 16:31:22 +02:00
Zuul 1d317f043e Merge "nova-conductor puts instance in error state" 2025-09-17 22:28:17 +00:00
Zuul 4eea21199c Merge "Adds regression test for bug LP#2044235" 2025-09-17 11:00:45 +00:00
Kamil Sambor 9f58f596db Switch nova-conductor to use ThreadPoolExecutor
This is a pure refactor so not having any unit test change actually
signals that the refactor did not change the existing behavior which is
good.

The unit test run on this patch only covers the eventlet mode, but higher
in the series we run unit tests with native threading mode in a separate
job that will complement the coverage for this patch.

Change-Id: Iafc96c93a0d4c406b77902942b2940653441fe38
Signed-off-by: Kamil Sambor <kamil.sambor@gmail.com>
2025-09-17 11:35:35 +02:00
Rajesh Tailor ca158f2da3 Fix string format specifier
This change fixes string format specifier from $ to % for
correct formatting.

Closes-Bug: #2123840
Signed-off-by: Rajesh Tailor <ratailor@redhat.com>
Change-Id: I04f6e1ba3eff443d40a13c6fe2d0b77a78a020e6
2025-09-16 10:20:45 +05:30
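The effect of a "$" specifier in place of "%" can be shown with plain Python string formatting. The template text and the render helper below are invented for illustration and are not taken from the nova change.

```python
def render(template, value):
    """Format a template with a single value using the % operator."""
    return template % (value,)


# With the correct specifier the value is interpolated.
fixed = render("Host %s is down", "compute-1")

# With "$s" there is no conversion specifier, so the argument is unused
# and Python raises a TypeError.
try:
    render("Host $s is down", "compute-1")
    error = None
except TypeError as exc:
    error = str(exc)
```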
Zuul 640782207c Merge "Remove eventlet timer from multi_cell_list" 2025-09-15 17:15:03 +00:00
Julien Le Jeune dc51a4271b nova-conductor puts instance in error state
Nova-conductor puts the instance in error state if an unknown exception
is raised in _build_live_migrate_task during the live migration. [1]
The exception comes from _call_livem_checks_on_host, which raises
exception.MigrationPreCheckError if it encounters a
messaging.MessagingTimeout exception, for example. [2]
The function check_can_live_migrate_destination also runs a check on the
source host with check_can_live_migrate_source [3]. This check can also
raise exceptions like MessagingTimeout, and that one is not caught
properly because it is a remote error ("Remote error: MessagingTimeout"):
the dest host tries to contact the source host and the source host does
not reply.

[1] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L523
[2] https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L381
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9090

Closes-Bug: #2044235
Change-Id: Ie1f96fee743c235ab35113a9ad1549a67b975839
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
2025-09-15 16:41:01 +02:00
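The difference between a local timeout and a timeout wrapped in a remote error, as described in the commit above, can be sketched like this. The exception classes are self-contained stand-ins named after the ones in the message, and run_precheck is a hypothetical helper; the actual fix in nova may be structured differently.

```python
class MessagingTimeout(Exception):
    """Stand-in for a timeout raised locally by the RPC layer."""


class RemoteError(Exception):
    """Stand-in for an exception re-raised from the remote side of an RPC."""

    def __init__(self, exc_type, value):
        super().__init__("Remote error: %s %s" % (exc_type, value))
        self.exc_type = exc_type
        self.value = value


class MigrationPreCheckError(Exception):
    """Stand-in for the error the conductor already handles gracefully."""


def run_precheck(check):
    """Run a pre-check and map both timeout flavours to one error."""
    try:
        return check()
    except MessagingTimeout as exc:
        # Conductor timed out talking to the destination host.
        raise MigrationPreCheckError("timeout contacting host") from exc
    except RemoteError as exc:
        # Destination timed out talking to the source host; the timeout
        # arrives wrapped, so it must be recognised by its type name.
        if exc.exc_type == "MessagingTimeout":
            raise MigrationPreCheckError(
                "timeout between dest and source host") from exc
        raise
```

Any other remote error is re-raised unchanged, so only the timeout case is converted.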
Julien Le Jeune efc8a12421 Adds regression test for bug LP#2044235
Related-bug: #2044235

Change-Id: Ic63ac71c3253fb24ffef8c954bc86fcb46e59ad7
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
2025-09-15 16:36:37 +02:00
Zuul 87bf7700b8 Merge "reno: Update master for unmaintained/2023.1" 2025-09-12 10:55:00 +00:00
OpenStack Release Bot 71607ef8a5 Update master for stable/2025.2
Add file to the reno documentation build to show release notes for
stable/2025.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.

Sem-Ver: feature
Change-Id: I7d967c1d5b1ac7fa2e601acfa25c3b5c3880056e
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
2025-09-12 08:54:07 +00:00
Zuul 759e03c35d Merge "Add Flamingo prelude section" 2025-09-11 09:03:15 +00:00
Zuul 7e1d86bdff Merge "Fix bug 2114951" 2025-09-10 14:38:54 +00:00
Zuul ee0cb67782 Merge "Update Debian qemu/libvirt/libguestfs versions" 2025-09-10 11:37:14 +00:00
Zuul 36c63f1664 Merge "hypervisors: Optimize uptime retrieval for better performance" 2025-09-10 11:36:59 +00:00
Thomas Goirand 187ffa120f Update Debian qemu/libvirt/libguestfs versions
Change-Id: I99b742bd527672cb32dd7cf8e80c20aeb8b7a5b0
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-10 18:44:57 +09:00
René Ribaud 45ddbc2569 Add Flamingo prelude section
Shamelessly copied from the cycle highlights

Signed-off-by: René Ribaud <rribaud@redhat.com>
Change-Id: Ib9de63fe4ccce24921326ef3bcfc690fd4481687
2025-09-10 10:39:44 +02:00
Takashi Kajinami 51aceec3ab docs: Update libvirt version support matrix for Flamingo
Change-Id: I05ac8ec870e75d58095b9f34a63ce786a47c3922
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-10 02:25:08 +09:00
Zuul 2952c10948 Merge "Fix fast8 tox target" 2025-09-08 19:43:32 +00:00
Zuul 74673be235 Merge "Update compute rpc alias for epoxy" 2025-09-08 19:17:46 +00:00
Zuul 9efabaa993 Merge "Reproducer for bug 2114951" 2025-09-08 17:48:28 +00:00
Dan Smith 2bf2814add Fix fast8 tox target
This is a major timesaver for repos the size of nova. This broke
recently due to changes in flake8 itself. This removes some
needless complexity to make it work again. It also removes the
suggestion to use pre-commit which has nothing to do with this
target, which also stings more when pre-commit is breaking things
which is why you're using this in the first place.

Change-Id: Ieb150bf0931ad8031ca83bae1f206075a9f505e2
Signed-off-by: Dan Smith <dansmith@redhat.com>
2025-09-08 09:35:47 -07:00
Sean Mooney 567dbe1867 hypervisors: Optimize uptime retrieval for better performance
The /os-hypervisors/detail API endpoint was experiencing significant
performance issues in environments with many compute nodes when using
microversion 2.88 or higher, as it made sequential RPC calls to gather
uptime information from each compute node.

This change optimizes uptime retrieval by:

* Adding uptime to periodic resource updates sent by nova-compute to the
  database, eliminating synchronous RPC calls during API requests
* Restricting RPC-based uptime retrieval to hypervisor types that support
  it (libvirt and z/VM), avoiding unnecessary calls that would always fail
* Preferring cached database uptime data over RPC calls when available

Closes-Bug: #2122036
Assisted-By: Claude <noreply@anthropic.com>
Change-Id: I5723320f578192f7e0beead7d5df5d7e47d54d2b
Co-Authored-By: Sylvain Bauza <sbauza@redhat.com>
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-09-05 19:03:38 +01:00
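The lookup order described in the commit above (cached value first, RPC only for supporting hypervisor types) can be sketched as below. The dict-based node, the type labels "libvirt" and "zvm", and the rpc_get_uptime callable are all assumptions for illustration; nova's real objects and type strings differ.

```python
# Hypothetical labels for hypervisor types whose driver can report uptime.
RPC_UPTIME_TYPES = frozenset({"libvirt", "zvm"})


def get_uptime(node, rpc_get_uptime):
    """Return the node uptime, preferring the value cached in the database.

    node is a plain dict standing in for a compute node record and
    rpc_get_uptime is a callable standing in for the RPC call.
    """
    cached = node.get("uptime")
    if cached is not None:
        # Periodic resource updates already stored the value: no RPC needed.
        return cached
    if node.get("hypervisor_type") in RPC_UPTIME_TYPES:
        # Fall back to a synchronous call only where it can succeed.
        return rpc_get_uptime(node["host"])
    # Other hypervisor types would always fail, so skip the call.
    return None


rpc_calls = []


def fake_rpc(host):
    rpc_calls.append(host)
    return "up 3 days"


from_cache = get_uptime(
    {"host": "c1", "hypervisor_type": "libvirt", "uptime": "up 1 day"},
    fake_rpc)
from_rpc = get_uptime({"host": "c2", "hypervisor_type": "libvirt"}, fake_rpc)
unsupported = get_uptime({"host": "c3", "hypervisor_type": "other"}, fake_rpc)
```

Only the second lookup triggers the stand-in RPC call, which is the point of the optimization when listing many compute nodes.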
Zuul 0dd7cb1fb0 Merge "libvirt: Disable VMCoreInfo device for SEV-encrypted instances" 2025-09-05 16:32:24 +00:00
Zuul 7d5521ac84 Merge "[pci]Keep used dev in Placement regardless of dev_spec" 2025-09-05 15:36:30 +00:00
Zuul ab744e5040 Merge "[PCI tracker]Remove non configured devs when freed" 2025-09-05 15:36:13 +00:00
Zuul cb6ed0e3c0 Merge "Reproduce bug/2115905" 2025-09-05 14:40:57 +00:00
Balazs Gibizer 4495f1f019 [pci]Keep used dev in Placement regardless of dev_spec
This changes the PCI Placement translator edge case handling logic to
resolve a bug preventing VM deletion.

If a device is allocated but removed from the dev_spec then we need to
keep the device in Placement otherwise the Placement update will be
rejected as we are trying to delete an RP that has allocations. This
prevents the deletion of a VM that is using this removed device.

The alternative would be to not allow the nova-compute service to
start if it detects this situation. However this situation can
happen in at least two very different cases:
1. The admin removed a dev_spec. In this case adding the dev_spec back,
   removing the VM, then removing the dev_spec is the right course of
   action and nova-compute failing to start would be OK to enforce this.

2. A device disappeared as the HW died. In this case not allowing
   nova-compute to start up would prevent the admin from migrating the
   other VMs away from the host before doing a HW replacement.

Note that this is a fairly complex change due to the fact that, based
purely on the PciDevice object, we cannot differentiate between the two
cases:

1. A PciDevice object is being removed as the related device spec was
   removed from the configuration or the device disappeared from
   the hypervisor.

2. A PciDevice object was held back for a while as the device spec was
   removed (or the device disappeared from the hypervisor) while the
   device was allocated to a VM. And now that VM is undergoing deletion.

In both cases the PCI in Placement logic sees a PciDevice object in
dev.status.REMOVED and dev.instance_uuid = None. However the two cases
require different handling.

1. The related inventory can be removed from Placement

2. The related inventory cannot be removed from Placement as it is still
   being allocated to the VM that is undergoing deletion.

The second case is due to the sequence of events during a VM deletion
being:
* We destroy the VM on the hypervisor
* We update the PCI tracker to free the device. As the device was held back
  the tracker not only frees the device but removes it as well, as it is
  no longer configured in the dev_spec so it should not go to the
  AVAILABLE state.
* When the PCI tracker is updated it calls the PCI in Placement logic
  to update Placement inventories as well. At this point the VM deletion
  is still in progress and the VM's allocation hasn't been deleted in
  Placement, so the Placement inventory cannot be removed as it is still
  allocated.
* After the resource tracker update is finished the compute manager
  deletes the VM's allocation in Placement.

So in this edge case we temporarily keep the Placement inventory and
only remove that in a subsequent periodic run where we are sure the
VM's allocation is gone. This means there is a time window when
the Placement inventory shows an extra resource even though that
resource has already been removed from the PCI tracker. During this
window the scheduler might select a host based on this ghost inventory
and the compute resource tracker will reject the boot request forcing
a normal re-schedule.

Closes-Bug: #2115905
Change-Id: Ie9d311ea9f59ff49593003e3773b690dd36fdeb2
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:05:20 +00:00
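The deferred cleanup described in the commit above can be sketched as a two-pass decision. The function plan_rp_cleanup and the plain string RP names are hypothetical; the real PCI in Placement translator works on provider trees and PciDevice objects.

```python
def plan_rp_cleanup(rps_of_removed_devices, rps_with_allocations):
    """Split RPs of removed devices into deletable and deferred ones.

    An RP that still has allocations cannot be deleted in Placement, so
    it is kept until a later periodic run.
    """
    delete_now, keep_for_now = [], []
    for rp in rps_of_removed_devices:
        if rp in rps_with_allocations:
            keep_for_now.append(rp)
        else:
            delete_now.append(rp)
    return delete_now, keep_for_now


removed = ["rp-a", "rp-b"]

# During VM deletion the allocation on rp-b has not been deleted yet.
first_pass = plan_rp_cleanup(removed, rps_with_allocations={"rp-b"})

# In a later periodic run the allocation is gone and rp-b can be removed.
second_pass = plan_rp_cleanup(["rp-b"], rps_with_allocations=set())
```

Between the two passes rp-b is the "ghost inventory" the message mentions: visible in Placement but no longer backed by a tracked device.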
Balazs Gibizer f37cdf0c41 [PCI tracker]Remove non configured devs when freed
The PCI tracker handles the case when a device spec is removed from
the configuration while a device is still being allocated. It keeps the
device until the VM is deleted to avoid inconsistencies.

However the full removal of such a device needs not just the VM deletion,
but also a nova-compute restart. The device tracker just frees the
device during VM deletion but does not remove it until the next
nova-compute startup. This allows the device to be re-allocated by
another VM even though the device is not allowed by a device_spec.

This change adds yet another in-memory dict to the PCI tracker to track
these devices that are only kept until they are freed. Then during
free() this dict is consulted and if the device is in it then the
device is marked for removal as well.

This kills two birds with one stone:

* We prevent the re-allocation of the device as the state of the device
  will be set to REMOVED not AVAILABLE during VM deletion.

* As PCI in Placement relies on the state of the device to decide what
  to track in placement, this change makes sure that a device that
  needs to be removed is now removed from placement too. Note that we have
  another bug that prevents this removal for now. But at least the
  reproducers of that bug now start to behave the same regardless of
  how many devices belong to the same RP in placement.

Related-Bug: #2115905
Change-Id: I63c8fb2669a3c6b3adb77d210c0f9b39d3657c80
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:30:19 +02:00
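The extra bookkeeping described in the commit above can be sketched with a toy tracker. PciTrackerSketch, its attributes and the string statuses are hypothetical simplifications; the real tracker works with PciDevice objects and more states.

```python
class PciTrackerSketch:
    """Toy tracker that remembers devices to remove once they are freed."""

    def __init__(self, allocated, configured):
        # allocated: {pci_address: instance_uuid}
        # configured: set of addresses still allowed by the device spec
        self.status = {addr: "ALLOCATED" for addr in allocated}
        # In-memory dict of devices that are only kept until freed.
        self.remove_when_freed = {
            addr: uuid for addr, uuid in allocated.items()
            if addr not in configured
        }

    def free(self, address):
        """Free a device; unconfigured devices go straight to REMOVED."""
        if self.remove_when_freed.pop(address, None) is not None:
            self.status[address] = "REMOVED"
        else:
            self.status[address] = "AVAILABLE"
        return self.status[address]


tracker = PciTrackerSketch(
    allocated={"0000:81:00.1": "vm-1", "0000:81:00.2": "vm-2"},
    configured={"0000:81:00.2"},
)
unconfigured_result = tracker.free("0000:81:00.1")
configured_result = tracker.free("0000:81:00.2")
```

Since the unconfigured device never reaches an available state in this sketch, it cannot be handed to another VM before the next restart.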
Balazs Gibizer d86aa2d15a Reproduce bug/2115905
Both the PCI tracker and the PCI in Placement logic handle the case
when a device spec is removed from the configuration while a device
is still being allocated.

However there are edge cases in PCI in Placement that are not handled
well. Namely, if the VM with this allocation is deleted, then
depending on the number of VFs the PF had originally, the logic might
try to delete the RP before the allocation is removed. That is
rejected by Placement. This prevents the deletion of such a VM and
therefore prevents one of the ways the original inconsistency can be

Note that with this patch we see two additional behaviors worth
mentioning:
* When the VM is successfully deleted (in a single VF or PF case) the
  PCI tracker still keeps the now free device in the DB and therefore PCI
  in Placement also keeps the RP. This keeps the non whitelisted device
  available for allocations until the next nova-compute restart.

* The PCI in Placement logic is different between the case where
  the last device is removed from an RP and the case where there
  are other devices on the RP, some that can be removed and some that
  cannot due to allocation.

Related-Bug: #2115905
Change-Id: Ib3febb77299da65ada24ed49849c04cbf3c41af1
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:19:31 +02:00
Zuul 9f156aa954 Merge "Fix 'nova-manage image_property set' command" 2025-09-03 17:29:24 +00:00