62070 Commits

Author SHA1 Message Date
Ghanshyam Maan 48bdfc8b2f Fix the negative sleep value in graceful_shutdown()
This addresses the following review comment to avoid a negative
sleep value in the manager's graceful_shutdown():

- https://review.opendev.org/c/openstack/nova/+/975586/comment/d5e3a603_0c746704/

Change-Id: I07a994bd05ac1e7f734f2a2144327bd2559c1416
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-23 18:51:43 +00:00
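The actual graceful_shutdown() code is not shown in this log, so the following is only a minimal sketch of the general technique: clamp the remaining wait to zero before sleeping. The names remaining_wait and wait_until are hypothetical and are not taken from nova.

```python
import time


def remaining_wait(deadline, now):
    """Return the time left until deadline, never less than zero."""
    return max(0.0, deadline - now)


def wait_until(deadline, clock=time.monotonic, sleep=time.sleep):
    # time.sleep() raises ValueError for a negative argument, so the
    # remaining time is clamped before it is passed on.
    sleep(remaining_wait(deadline, clock()))
```

If the deadline has already passed, wait_until() sleeps for zero seconds instead of raising.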
Zuul 64d6ba34c5 Merge "Fix the flasky test test_submit_second_while_delaying_first" 2026-02-19 20:21:47 +00:00
Ghanshyam Maan 2fb9113ed2 Add manager graceful shutdown, timeout, and wait
As per part1 of the graceful shutdown timeouts[1], this commit adds
or modifies the following timeouts and waits needed for graceful
shutdown:

- Override the default of the graceful_shutdown_timeout to 180.
- Add a new config option for manager shutdown timeout.

It also adds a graceful_shutdown() method on the manager side, which
will be called by the nova/service.py->stop() method before it stops
the 2nd RPC server. In part1, this will wait for the configurable wait
time, but part2 will implement a better solution to track the
in-progress tasks. The idea is to have this single interface from the
service manager (graceful_shutdown()) that will be called during
graceful shutdown and is responsible for finishing the required tasks
and cleanup.

Partially implements blueprint nova-services-graceful-shutdown-part1

[1] https://specs.openstack.org/openstack/nova-specs/specs/2026.1/approved/nova-services-graceful-shutdown-part1.html#graceful-shutdown-timeouts

Change-Id: I7c1934d3ec7854feac3fc8432627c25eba963ddf
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-19 19:44:41 +00:00
Ghanshyam Maan b5c0a97582 Fix the flasky test test_submit_second_while_delaying_first
This test has failed a few times; the most recent failure is:
- https://review.opendev.org/c/openstack/nova/+/975586 (PS8 run)

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/test_utils.py", line 2185, in test_submit_second_while_delaying_first
    self.assertGreater(task2_runtime, 2.0)
  File "/usr/lib/python3.12/unittest/case.py", line 1269, in assertGreater
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib/python3.12/unittest/case.py", line 715, in fail
    raise self.failureException(msg)
AssertionError: 1.997275639999998 not greater than 2.0

From the error, it seems we capture the start time after we submit
the task to the executor, so the executor can start the task slightly
before the test records the task start time.

Let's capture the task start time before the task is submitted so that
we can compare the times more accurately.

Change-Id: I5a9845813b614c58e0f5a66e07f8a8c732f38eb3
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-19 19:06:07 +00:00
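The ordering issue described in the commit above can be illustrated with a small, self-contained sketch. This is not the nova test itself: task and measure_runtime are hypothetical names, and a plain ThreadPoolExecutor stands in for whatever executor the real test uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(delay):
    """Sleep for delay seconds and report when the task finished."""
    time.sleep(delay)
    return time.monotonic()


def measure_runtime(delay):
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Capture the start time BEFORE submit(): the worker thread may
        # begin running the task before submit() returns to the caller.
        start = time.monotonic()
        future = executor.submit(task, delay)
        finished = future.result()
    return finished - start
```

With the start time taken first, the measured runtime cannot come out shorter than the task's own sleep because of scheduling order.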
Zuul 7a303bc1e2 Merge "Add 2nd RPC server for compute service" 2026-02-19 17:36:34 +00:00
Zuul 7dd05b7af6 Merge "FairLockGuard: Support cross-thread sharing and nesting" 2026-02-18 17:54:57 +00:00
Zuul 0d074c6775 Merge "Make nova recognize amx-capabilities" 2026-02-18 17:22:49 +00:00
Zuul 716fa5edb9 Merge "Fix injection of [cors] allowed_origin" 2026-02-18 15:21:32 +00:00
Kamil Sambor 29e1dc8b43 Rename deadline parameter to more accurate timeout
Change-Id: If57fb3ada65b658bd4b5cca62ec22485f431d2a4
Signed-off-by: Kamil Sambor <kamil.sambor@gmail.com>
2026-02-18 14:57:04 +00:00
Zuul 00f554fd92 Merge "Fix full executor warning on noname executor" 2026-02-18 14:37:19 +00:00
Zuul 88adf2c870 Merge "Cleanup libvirt driver at service stop" 2026-02-18 14:37:06 +00:00
Johannes Kulik 2a51df2760 Attaching a volume returns HTTP 202
Instead of returning an HTTP 200 and a `volumeAttachment` object,
attaching a volume to an instance returns HTTP 202 starting from API
version 2.101.

To keep the functionality for older API versions, we move the
`_attach_volume()` method from n-api to n-conductor and either do a call
or a cast depending on whether the API needs to return a value.
n-conductor then handles reserving the block_device_mapping's device by
calling n-compute before it starts the previously-already-async volume
attachment.

We have to move `_check_attach_and_reserve_volume` into compute utils,
because it's getting called in both conductor and compute api (for the
shelved offloaded attach).

The new RPC method in the conductor needs a long timeout when used with
API versions less than the new 2.101, because it waits for the call to
`reserve_block_device_name()` in nova-compute which already needs a long
timeout.

Updating the functional tests' `post_server_volume()` and
`_attach_volume()` to not return the attachment anymore is possible, as
no test uses the returned values.

Change-Id: I4d38c2679f0e88cca30055a9c8c45ba1dd6fb5ef
Implements: blueprint async-volume-attachments
Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>
2026-02-18 15:02:21 +01:00
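The version-dependent behaviour described in the commit above can be sketched as follows. This is an illustration only, not nova's API code: the function names are hypothetical, and only the 2.101 threshold, the 200/202 status codes, and the call-versus-cast distinction come from the commit message.

```python
ASYNC_ATTACH_MIN_VERSION = (2, 101)


def parse_microversion(version):
    """Turn a string such as '2.101' into a comparable tuple (2, 101)."""
    major, minor = version.split('.')
    return int(major), int(minor)


def attach_is_async(version):
    # Compare as integer tuples; comparing the raw strings would
    # wrongly order '2.99' after '2.101'.
    return parse_microversion(version) >= ASYNC_ATTACH_MIN_VERSION


def attach_rpc_mode(version):
    # Older versions must return a volumeAttachment, so they block on
    # an RPC call; newer versions need no return value and can cast.
    return 'cast' if attach_is_async(version) else 'call'


def attach_status_code(version):
    return 202 if attach_is_async(version) else 200
```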
Zuul 3b23957f77 Merge "Remove spawn_after" 2026-02-18 14:01:19 +00:00
Takashi Kajinami e212a1e744 libvirt: Add capability to load smm feature from existing xml
Some firmware requires the smm feature. While the feature doesn't have to
be explicitly enabled when auto-selection is enabled, it should be
enabled explicitly when firmware files are pre-defined.

Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: Ia194dcfacd2b743761e720d947a6807689a96da3
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-18 22:44:12 +09:00
Takashi Kajinami 3136d594a4 libvirt: Add capability to load loader and nvram from xml
... so that we can load these from existing guest XML.

This is preparatory work to use firmware auto-selection by libvirt,
and is required to avoid re-selection during hard-reboot.

Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I899cb7d6ee364def8d1298b77c24cc5156c71126
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-18 22:44:10 +09:00
Takashi Kajinami 511518e493 libvirt: Add basic xml generation for firmware auto selection
Extend the existing (but unused) guest xml generation logic for
firmware detection, by adding the firmware features flags to require
secure boot support.

Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I907c9c88f370a52b54b98e1e1cbda6c21d2bff62
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-18 22:43:06 +09:00
Takashi Kajinami 0c939329c5 libvirt: Extend functional test coverage of UEFI boot guests
Adds a non-secure boot scenario and a stateless firmware scenario to
demonstrate what the guest XML contents look like when firmware files
are selected by libvirt.

Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I88f0b81c8455630145efca8c6349fc00a0c29835
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-18 22:43:06 +09:00
Takashi Kajinami 97beb983e6 Fix injection of [cors] allowed_origin
Using a string value was deprecated in oslo.middleware 3.15.0[1] which
was released 9 years ago. The value of this option has been treated as
a list value since then.

[1] 7e519d008f7743d75ec299095060a70d5fd00f99

The latest oslo.middleware release removed the deprecated handling.

Change-Id: Ib88c046af14f5d5de0d410a35a702b7a2322c832
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-18 22:08:54 +09:00
Sean Mooney 72132f89ee FairLockGuard: Support cross-thread sharing and nesting
This change improves FairLockGuard to properly support two previously
unsupported (or broken) usage patterns:

1. Cross-thread sharing: When threads share the same FairLockGuard
   instance, they now correctly wait for each other instead of raising
   TypeError. Fixed by:
   - Adding _active_thread tracking to identify the owning thread
   - Restructuring lock acquisition order: named locks are now acquired
     OUTSIDE of locks_lock to prevent deadlock when Thread-B waits on
     locks held by Thread-A
   - Only same-thread re-entry triggers the nesting logic, not
     cross-thread access

2. Same-thread nesting: The same FairLockGuard instance can now be
   nested within itself. Fixed by:
   - Adding _nesting_depth counter initialized to 0
   - Nested entries increment depth and return early (locks held)
   - Exits decrement depth; locks only released when depth reaches 0
   - This prevents lock leaks that would occur if inner exit cleared
     self.locks before outer exit could release them

Additional improvements:
- Exception handling during partial lock acquisition now properly
  releases any locks acquired before the failure
- Lock release moved outside locks_lock in __exit__ for consistency

The docstring has been updated to reflect that both patterns now work,
while continuing to discourage them in favor of creating separate
FairLockGuard instances for clarity.

New tests added:
- test_deep_nesting: Verifies 3+ levels of nesting
- test_nested_exception_outer_still_holds_locks: Verifies outer context
  retains locks when inner context raises an exception
- test_empty_lock_list: Verifies empty lock list edge case

Related-Bug: #2048837
Generated-By: claude-code opus 4.5
Change-Id: Ia937b0e2d76c814360f168d5f33b821bfc61aade
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-18 12:51:51 +00:00
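The nesting and cross-thread rules listed in the commit above can be sketched with a much simplified guard. This is not nova's FairLockGuard: NestingGuard is a hypothetical class built on plain threading.Lock objects, and the private gate lock is an assumption added here so the sketch stays correct for an empty lock list.

```python
import threading


class NestingGuard:
    """Simplified, hypothetical guard; not nova's FairLockGuard."""

    def __init__(self, locks):
        # A private gate lock serializes threads even if locks is empty.
        self._locks = [threading.Lock(), *locks]
        self._state_lock = threading.Lock()
        self._active_thread = None
        self._nesting_depth = 0

    def __enter__(self):
        me = threading.get_ident()
        with self._state_lock:
            if self._active_thread == me:
                # Same-thread re-entry: the locks are already held.
                self._nesting_depth += 1
                return self
        # Acquire the real locks outside _state_lock so that a waiting
        # thread cannot block the owner's __exit__.
        acquired = []
        try:
            for lock in self._locks:
                lock.acquire()
                acquired.append(lock)
        except BaseException:
            # Partial acquisition failed: release what was taken.
            for lock in reversed(acquired):
                lock.release()
            raise
        with self._state_lock:
            self._active_thread = me
            self._nesting_depth = 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._state_lock:
            self._nesting_depth -= 1
            if self._nesting_depth > 0:
                return False  # an outer context still needs the locks
            self._active_thread = None
        for lock in reversed(self._locks):
            lock.release()
        return False
```

Only the outermost exit releases the locks, so an exception raised in an inner context leaves the outer context's locks held.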
Zuul 0383085510 Merge "Preserve UEFI NVRAM variable store" 2026-02-18 06:36:45 +00:00
Zuul c7e40ef573 Merge "Make disk.extend() pass format to qemu-img" 2026-02-18 01:20:15 +00:00
Sean Mooney 6c9110bb8b Handle missing libvirt services in evacuate hook
On Debian 13 (Trixie), libvirt packaging is modularized and
the libvirt-daemon-lock package (providing virtlockd) is
optional. The evacuate hook previously assumed all libvirt
services were installed and failed when stopping/starting
missing units.

Extract a reusable manage_libvirt_service.yaml task file that
checks if a service exists via systemctl list-unit-files
before managing its units. This prevents failures when
optional libvirt packages are not installed and future-proofs
against further packaging changes.

Generated-By: claude-code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change-Id: Ie84e2e8ab2d3065b1562ee5e256fa163541955f7
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-17 18:22:22 +00:00
Dan Smith 3eba22ff09 Make disk.extend() pass format to qemu-img
This fixes an instance of us passing a disk image to qemu-img for
resize where we don't constrain the format. As has previously been
identified, it is never safe to do that when the image itself is not
trusted. In this case, an instance with a previously-raw disk image
being used by imagebackend.Flat is susceptible to the user writing a
qcow2 (or other) header to their disk causing the unconstrained
qemu-img resize operation to interpret it as a qcow2 file.

Since Flat maintains the intended disk format in the disk.info file,
and since we would have safety-checked images we got from glance,
we should be able to trust the image.format specifier, which comes
from driver_format in imagebackend, which is read from disk.info.
Since only raw or qcow2 files should be resized anyway, we can further
constrain it to those.

Notes:
 1. qemu-img refuses to resize some types of VMDK files, but it may
    be able to resize others (there are many subformats). Technically,
    Flat will allow running an instance directly from a VMDK file,
    and so this change _could_ be limiting existing "unintentionally
    works" behavior.
 2. This assumes that disk.info is correct, present, etc. The code to
    handle disk.info will regenerate the file if it's missing or
    unreadable by probing the image without a safety check, which
    would be unsafe. However, that is a much more sophisticated attack,
    requiring either access to the system to delete the file or an
    errant operator action in the first place.

Change-Id: I07cbe90b7a7a0a416ef13fbc3a1b7e2272c90951
Closes-Bug: #2137507
Signed-off-by: Dan Smith <dansmith@redhat.com>
2026-02-17 06:35:35 -08:00
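A minimal sketch of the constraint described in the commit above: pass an explicit, allow-listed format to qemu-img resize instead of letting it probe the file. The helper name and the allow-list constant are hypothetical and do not reflect nova's actual code; the -f option is qemu-img's documented way to name the image format.

```python
ALLOWED_RESIZE_FORMATS = ('raw', 'qcow2')


def build_resize_command(path, size_bytes, image_format):
    """Build a qemu-img resize argv with the image format pinned."""
    if image_format not in ALLOWED_RESIZE_FORMATS:
        raise ValueError('refusing to resize format %r' % image_format)
    # "-f" tells qemu-img which format to use, so it does not probe the
    # (untrusted) file content to guess one.
    return ['qemu-img', 'resize', '-f', image_format, path,
            str(size_bytes)]
```

The returned list could be handed to a process runner; any format outside the allow-list is rejected before a command is built.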
Kamil Sambor aa9ec17b72 Destroy scatter_gather in conductor
Ensure destroy_scatter_gather_executor() is invoked during conductor
startup to prevent reuse of a pre-fork scatter_gather executor.

Change-Id: I62a01f51877001f19605762a1b8a09913b441dd2
Signed-off-by: Kamil Sambor <kamil.sambor@gmail.com>
2026-02-17 10:52:10 +01:00
Zuul f3d9188a93 Merge "TPM: support live migration of host secret security" 2026-02-16 18:30:56 +00:00
Zuul c804c3ddb8 Merge "TPM: prepare to bump service version for live migration" 2026-02-16 18:30:42 +00:00
Zuul 2b30cafe98 Merge "Add vtpm_secret_(uuid|value) to LibvirtLiveMigrateData" 2026-02-16 18:28:29 +00:00
Nicolai Ruckel 35b1945522 Preserve UEFI NVRAM variable store
Preserve NVRAM variable store during stop/start, hard reboot, live
migration, and volume retype.

This does not affect cold migration or shelve.

For UEFI guests (hw_firmware_type=uefi), every time the instance is
started, the UEFI variable storage for that instance
(/var/lib/libvirt/qemu/nvram/instance-xxxxxxxx_VARS.fd) is deleted
and reinitialized from the default template.

The changes are based on this patch by Jonas Schäfer to preserve the
vTPM state:
https://review.opendev.org/c/openstack/nova/+/955657

Closes-Bug: #1633447
Closes-Bug: #2131730
Change-Id: I444a9285c07a04bf08a73772235f8dd73d75e513
Signed-off-by: Nicolai Ruckel <nicolai.ruckel@cloudandheat.com>
2026-02-13 23:55:41 +01:00
lajoskatona 873aee5e95 Fix for bug 2140537
If a guest has pinned CPUs, the domain XML's
<iothreadpin> should also have the iothread attribute.

Closes-Bug: #2140537
Change-Id: I5c2df747a3fdfbd2ee31d50a3d716a0ccc787e15
Signed-off-by: lajoskatona <lajos.katona@est.tech>
2026-02-13 17:17:17 +00:00
Stephen Finucane 6bc431bc52 tests: Invert validation check
Now that all of our controllers have full schema coverage, we can
assume that all controllers are validated and raise if that's not the
case.

Change-Id: I3a58be8551e7cf13835ad565aae4fc9dc4214bbd
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-13 16:51:40 +00:00
Stephen Finucane dab02447e6 api: Add response body schemas for server shares APIs
We had missed one.

Change-Id: Icc63959d73b1881b7db19b93cf8fb80dcb77cad8
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-13 16:51:40 +00:00
Stephen Finucane f80e4935e8 api: Add response body schemas for servers APIs (6/6)
The last one: delete. Very simple, as always.

Change-Id: I08a2dbcd86cf652e9cda193f64edfa655f986506
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-13 16:51:40 +00:00
Stephen Finucane 9fd431315c api: Add response body schemas for servers APIs (5/6)
The penultimate API: the update view. This is very similar to the
rebuild API so we are able to reuse much of that schema here.

We also move some code outside a try-except as the code in question
can't raise an InstanceNotFound exception.

Change-Id: I0e42de5074dcf699886b20dfd43306683e381ee2
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-13 16:51:34 +00:00
Stephen Finucane c9be8b9aba tests: Fix bound
Ensure we do not allow negative values except for -1 (unlimited).

Change-Id: I9a0184ed54054c6466833df24dfbe9ca7d1b454b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-13 16:17:31 +00:00
Eigil Obrestad bfaec08220 Make nova recognize amx-capabilities
Expands the CPU_TRAITS_MAPPING table to let nova report if a compute-node
supports AMX. This enables nova to pick the correct cpu_model when a
SapphireRapids (or newer) cpu is wanted by the flavor.

Implements: blueprint add-amx-traits
Change-Id: Ieaa2e1be9d3d3ae945ce28d778edc9729d2db9ba
Signed-off-by: Eigil Obrestad <eigil-git@obrestad.org>
Depends-On: https://review.opendev.org/c/openstack/requirements/+/976640
2026-02-12 20:51:02 +01:00
Zuul 4fec7fe09d Merge "Revert "Set openstacksdk-functional-devstack non voting"" 2026-02-11 20:09:06 +00:00
Zuul 420b02d6be Merge "Add regression test to repoduce bug 2140537" 2026-02-11 11:35:14 +00:00
lajoskatona 76d796193c Add regression test to repoduce bug 2140537
Related-Bug: #2140537
Change-Id: I8c7cf544d599d5a11a2ae898822c2bde36f1d52a
Signed-off-by: lajoskatona <lajos.katona@est.tech>
2026-02-09 13:11:53 +01:00
Balazs Gibizer 4227c9b14a Revert "Set openstacksdk-functional-devstack non voting"
This reverts commit 2f9f780a77.

Signed-off-by: Balazs Gibizer <gibi@redhat.com>
Change-Id: Ia3d01ba6da0ade10ad70de951cbcb72204fbce12
2026-02-09 10:19:56 +01:00
Balazs Gibizer 2f9f780a77 Set openstacksdk-functional-devstack non voting
There is a neutron issue in the job but its fix is being blocked by
multiple other issues in the sdk's gate. Let's keep our gate operational
until they fix the sdk gate.

[1] https://review.opendev.org/c/openstack/openstacksdk/+/976008
[2] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/5HHEYPZA6VIORX2XLBZGNMM2EVX2LR65/

Signed-off-by: Balazs Gibizer <gibi@redhat.com>
Change-Id: Ie2fe2ec18a0fe7dbbfe4fbb9094d9542c729122a
2026-02-09 10:16:10 +01:00
Sean Mooney 264e868d49 Support os-vif TAP pre-creation for OVS/OVN ports
Add support for os-vif TAP device pre-creation when Neutron sets
the 'ovs_create_tap' flag in vif_details. This reduces live
migration downtime by ensuring the network is fully wired before
the VM starts.

Changes:
- Add VIF_DETAILS_OVS_CREATE_TAP constant to model.py
- Propagate create_tap from binding details to os-vif port profile
  in os_vif_util.py
- Set managed='no' in libvirt XML when create_tap is enabled so
  libvirt uses the pre-created TAP device
- Set multiqueue on port profile in _plug_os_vif based on instance
  flavor/image hw:vif_multiqueue_enabled property

When checking oslo.versionedobjects fields for backward compat:
- Use 'field in obj.fields' to check if field exists in schema
- Use 'field in obj' to check if field value is set

Depends-On: https://review.opendev.org/c/openstack/os-vif/+/971231
Generated-By: Cursor claude-opus-4.5
Closes-Bug: #2069718
Change-Id: I32343658b53e317696d1bd8b984793bfeeccd409
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-05 18:55:06 +00:00
Zuul a17b44f3eb Merge "Use an executor to delay STOPPED events" 2026-02-05 17:38:28 +00:00
Zuul 75aed9a19d Merge "Live migration with iothreads" 2026-02-05 10:56:23 +00:00
Zuul c94d2eaedb Merge "Enable mypy on nova/utils.py" 2026-02-05 03:43:47 +00:00
Zuul 6b0bb735a6 Merge "SubclassSignatureTestCase to use NoDBTestCase as base" 2026-02-05 03:14:21 +00:00
Zuul 6a6e05d4d3 Merge "Libvirt event handling without eventlet" 2026-02-05 03:14:05 +00:00
Artom Lifshitz 3eae9477d2 TPM: support live migration of host secret security
This enables live migration for TPM instances with the ``host`` secret
security mode. The ``host`` security mode uses key manager service
secrets owned by the instance owner. The secret is persisted in
Libvirt and is sent over RPC to the destination during a live
migration.

The service version will be bumped in a separate patch.

Related to blueprint vtpm-live-migration

Change-Id: I97e9dd454c793abcb1a20579b1ceaec627be4813
Signed-off-by: melanie witt <melwittt@gmail.com>
2026-02-04 16:52:06 -08:00
melanie witt 2bdf12535c TPM: prepare to bump service version for live migration
This prepares for a service version bump and adds a minimum service
version check in the API to reject live migration requests for vTPM
instances until the entire cloud is upgraded to the new version.

The actual service version bump will be included in a later patch that
implements vTPM live migration.

Related to blueprint vtpm-live-migration

Change-Id: I7daef8037385a4077dc0a78f03ae4b34a57560b7
Signed-off-by: melanie witt <melwittt@gmail.com>
2026-02-04 15:49:06 -08:00
Balazs Gibizer 8b14a16c57 Fix full executor warning on noname executor
The warning log assumed all executors have a name. Our centrally managed
executors do, but the ad hoc ones do not, causing a stack trace in the
compute manager power_sync periodics.

Change-Id: I04620364439a6c377f5b8f8f68cbdd3c62c44562
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-04 12:15:19 +01:00
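The kind of defensive lookup this fix implies can be sketched as below. This is an assumption about the general approach, not the actual nova change: executor_label is a hypothetical helper, and a plain ThreadPoolExecutor (which has no name attribute) stands in for an ad hoc executor.

```python
from concurrent.futures import ThreadPoolExecutor


def executor_label(executor):
    """Return a printable label even for executors without a name."""
    name = getattr(executor, 'name', None)
    if name:
        return name
    # Fall back to the type so the log line never raises AttributeError.
    return '<unnamed %s>' % type(executor).__name__
```

A warning message can then be formatted with executor_label(executor) regardless of how the executor was created.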
Balazs Gibizer 8017b721fd Cleanup libvirt driver at service stop
As the libvirt driver's Host object has a new headless thread, we need
to make sure that thread exits cleanly when nova-compute is being
stopped.

At the same time, we make sure our unit tests are not leaking such
threads across test cases with a new fixture and fixes in the test code.

Change-Id: Ide274d6caa3314f9d25d51d1f72850cf77c9dee4
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-04 12:15:19 +01:00