womax/nova

Author	SHA1	Message	Date
Zuul	64d6ba34c5	Merge "Fix the flasky test test_submit_second_while_delaying_first"	2026-02-19 20:21:47 +00:00
Ghanshyam Maan	2fb9113ed2	Add manager graceful shutdown, timeout, and wait As per the part1 of the graceful shutdown timeouts[1], this commit adds/modifies the below timeout/wait needed for graceful shutdown: - Override the default of the graceful_shutdown_timeout to 180. - Add a new config option for manager shutdown timeout. It also adds a graceful_shutdown() method on the manager side, which will be called by the nova/service.py->stop() method before it stops the 2nd RPC server. In part1, this will wait for the configurable wait time, but part2 will implement a better solution to track the in-progress tasks. The idea is to have this single interface from the service manager (graceful_shutdown()) that will be called during graceful shutdown and is responsible for finishing the required tasks and cleanup. Partial implement blueprint nova-services-graceful-shutdown-part1 [1] https://specs.openstack.org/openstack/nova-specs/specs/2026.1/approved/nova-services-graceful-shutdown-part1.html#graceful-shutdown-timeouts Change-Id: I7c1934d3ec7854feac3fc8432627c25eba963ddf Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>	2026-02-19 19:44:41 +00:00
Ghanshyam Maan	b5c0a97582	Fix the flasky test test_submit_second_while_delaying_first This test failed a few times and the most recent failure is - https://review.opendev.org/c/openstack/nova/+/975586 (PS8 run) Traceback (most recent call last): File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/test_utils.py", line 2185, in test_submit_second_while_delaying_first self.assertGreater(task2_runtime, 2.0) File "/usr/lib/python3.12/unittest/case.py", line 1269, in assertGreater self.fail(self._formatMessage(msg, standardMsg)) File "/usr/lib/python3.12/unittest/case.py", line 715, in fail raise self.failureException(msg) AssertionError: 1.997275639999998 not greater than 2.0 From the error, it seems we are capturing the start time after we submit the task to the executor, so the executor records the task submit time slightly ahead of the start time captured by the test. Let's capture the task start time before the task is submitted so that we can compare the times more accurately. Change-Id: I5a9845813b614c58e0f5a66e07f8a8c732f38eb3 Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>	2026-02-19 19:06:07 +00:00
Zuul	7a303bc1e2	Merge "Add 2nd RPC server for compute service"	2026-02-19 17:36:34 +00:00
Zuul	7dd05b7af6	Merge "FairLockGuard: Support cross-thread sharing and nesting"	2026-02-18 17:54:57 +00:00
Zuul	0d074c6775	Merge "Make nova recognize amx-capabilities"	2026-02-18 17:22:49 +00:00
Zuul	716fa5edb9	Merge "Fix injection of [cors] allowed_origin"	2026-02-18 15:21:32 +00:00
Zuul	00f554fd92	Merge "Fix full executor warning on noname executor"	2026-02-18 14:37:19 +00:00
Zuul	88adf2c870	Merge "Cleanup libvirt driver at service stop"	2026-02-18 14:37:06 +00:00
Johannes Kulik	2a51df2760	Attaching a volume returns HTTP 202 Instead of returning an HTTP 200 and a `volumeAttachment` object, attaching a volume to an instance returns HTTP 202 starting from API version 2.101. To keep the functionality for older API versions, we move the `_attach_volume()` method from n-api to n-conductor and either do a call or a cast depending on whether the API needs to return a value. n-conductor then handles reserving the block_device_mapping's device by calling n-compute before it starts the previously-already-async volume attachment. We have to move `_check_attach_and_reserve_volume` into compute utils, because it's getting called in both conductor and compute api (for the shelved offloaded attach). The new RPC method in the conductor needs a long timeout when used with API versions less than the new 2.101, because it waits for the call to `reserve_block_device_name()` in nova-compute which already needs a long timeout. Updating the functional tests' `post_server_volume()` and `_attach_volume()` to not return the attachment anymore is possible, as no test uses the returned values. Change-Id: I4d38c2679f0e88cca30055a9c8c45ba1dd6fb5ef Implements: blueprint async-volume-attachments Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>	2026-02-18 15:02:21 +01:00
Zuul	3b23957f77	Merge "Remove spawn_after"	2026-02-18 14:01:19 +00:00
Takashi Kajinami	e212a1e744	libvirt: Add capability to load smm feature from existing xml Some firmwares require smm feature. While the feature doesn't have to be explicitly enabled when auto-selection is enabled, it should be enabled explicitly when firmware files are pre-defined. Partially-Implements: blueprint libvirt-firmware-auto-selection Change-Id: Ia194dcfacd2b743761e720d947a6807689a96da3 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2026-02-18 22:44:12 +09:00
Takashi Kajinami	3136d594a4	libvirt: Add capability to load loader and nvram from xml ... so that we can load these from existing guest XML. This is a preparation work to use firmware auto-selection by libvirt, and is required to avoid re-selection during hard-reboot. Partially-Implements: blueprint libvirt-firmware-auto-selection Change-Id: I899cb7d6ee364def8d1298b77c24cc5156c71126 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2026-02-18 22:44:10 +09:00
Takashi Kajinami	511518e493	libvirt: Add basic xml generation for firmware auto selection Extend the existing (but unused) guest xml generation logic for firmware detection, by adding the firmware features flags to require secure boot support. Partially-Implements: blueprint libvirt-firmware-auto-selection Change-Id: I907c9c88f370a52b54b98e1e1cbda6c21d2bff62 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2026-02-18 22:43:06 +09:00
Takashi Kajinami	0c939329c5	libvirt: Extend functional test coverage of UEFI boot guests Adds non-secure boot scenario and stateless firmware scenario to demonstrate what guest xml contents look like when firmware files are selected by libvirt. Partially-Implements: blueprint libvirt-firmware-auto-selection Change-Id: I88f0b81c8455630145efca8c6349fc00a0c29835 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2026-02-18 22:43:06 +09:00
Takashi Kajinami	97beb983e6	Fix injection of [cors] allowed_origin Using a string value was deprecated in oslo.middleware 3.15.0[1] which was released 9 years ago. The value of this option has been treated as a list value since then. [1] 7e519d008f7743d75ec299095060a70d5fd00f99 The latest oslo.middleware release removed the deprecated handling. Change-Id: Ib88c046af14f5d5de0d410a35a702b7a2322c832 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2026-02-18 22:08:54 +09:00
Sean Mooney	72132f89ee	FairLockGuard: Support cross-thread sharing and nesting This change improves FairLockGuard to properly support two previously unsupported (or broken) usage patterns: 1. Cross-thread sharing: When threads share the same FairLockGuard instance, they now correctly wait for each other instead of raising TypeError. Fixed by: - Adding _active_thread tracking to identify the owning thread - Restructuring lock acquisition order: named locks are now acquired OUTSIDE of locks_lock to prevent deadlock when Thread-B waits on locks held by Thread-A - Only same-thread re-entry triggers the nesting logic, not cross-thread access 2. Same-thread nesting: The same FairLockGuard instance can now be nested within itself. Fixed by: - Adding _nesting_depth counter initialized to 0 - Nested entries increment depth and return early (locks held) - Exits decrement depth; locks only released when depth reaches 0 - This prevents lock leaks that would occur if inner exit cleared self.locks before outer exit could release them Additional improvements: - Exception handling during partial lock acquisition now properly releases any locks acquired before the failure - Lock release moved outside locks_lock in __exit__ for consistency The docstring has been updated to reflect that both patterns now work, while continuing to discourage them in favor of creating separate FairLockGuard instances for clarity. New tests added: - test_deep_nesting: Verifies 3+ levels of nesting - test_nested_exception_outer_still_holds_locks: Verifies outer context retains locks when inner context raises an exception - test_empty_lock_list: Verifies empty lock list edge case Related-Bug: #2048837 Generated-By: claude-code opus 4.5 Change-Id: Ia937b0e2d76c814360f168d5f33b821bfc61aade Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-18 12:51:51 +00:00
Zuul	0383085510	Merge "Preserve UEFI NVRAM variable store"	2026-02-18 06:36:45 +00:00
Zuul	c7e40ef573	Merge "Make disk.extend() pass format to qemu-img"	2026-02-18 01:20:15 +00:00
Sean Mooney	6c9110bb8b	Handle missing libvirt services in evacuate hook On Debian 13 (Trixie), libvirt packaging is modularized and the libvirt-daemon-lock package (providing virtlockd) is optional. The evacuate hook previously assumed all libvirt services were installed and failed when stopping/starting missing units. Extract a reusable manage_libvirt_service.yaml task file that checks if a service exists via systemctl list-unit-files before managing its units. This prevents failures when optional libvirt packages are not installed and future-proofs against further packaging changes. Generated-By: claude-code Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Change-Id: Ie84e2e8ab2d3065b1562ee5e256fa163541955f7 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-17 18:22:22 +00:00
Dan Smith	3eba22ff09	Make disk.extend() pass format to qemu-img This fixes an instance of us passing a disk image to qemu-img for resize where we don't constrain the format. As has previously been identified, it is never safe to do that when the image itself is not trusted. In this case, an instance with a previously-raw disk image being used by imagebackend.Flat is susceptible to the user writing a qcow2 (or other) header to their disk causing the unconstrained qemu-img resize operation to interpret it as a qcow2 file. Since Flat maintains the intended disk format in the disk.info file, and since we would have safety-checked images we got from glance, we should be able to trust the image.format specifier, which comes from driver_format in imagebackend, which is read from disk.info. Since only raw or qcow2 files should be resized anyway, we can further constrain it to those. Notes: 1. qemu-img refuses to resize some types of VMDK files, but it may be able to resize others (there are many subformats). Technically, Flat will allow running an instance directly from a VMDK file, and so this change _could_ be limiting existing "unintentionally works" behavior. 2. This assumes that disk.info is correct, present, etc. The code to handle disk.info will regenerate the file if it's missing or unreadable by probing the image without a safety check, which would be unsafe. However, that is a much more sophisticated attack, requiring either access to the system to delete the file or an errant operator action in the first place. Change-Id: I07cbe90b7a7a0a416ef13fbc3a1b7e2272c90951 Closes-Bug: #2137507 Signed-off-by: Dan Smith <dansmith@redhat.com>	2026-02-17 06:35:35 -08:00
Zuul	f3d9188a93	Merge "TPM: support live migration of `host` secret security"	2026-02-16 18:30:56 +00:00
Zuul	c804c3ddb8	Merge "TPM: prepare to bump service version for live migration"	2026-02-16 18:30:42 +00:00
Zuul	2b30cafe98	Merge "Add vtpm_secret_(uuid\|value) to LibvirtLiveMigrateData"	2026-02-16 18:28:29 +00:00
Nicolai Ruckel	35b1945522	Preserve UEFI NVRAM variable store Preserve NVRAM variable store during stop/start, hard reboot, live migration, and volume retype. This does not affect cold migration or shelve. For UEFI guests (hw_firmware_type=uefi), every time the instance is started, the UEFI variable storage for that instance (/var/lib/libvirt/qemu/nvram/instance-xxxxxxxx_VARS.fd) is deleted and reinitialized from the default template. The changes are based on this patch by Jonas Schäfer to preserve the vTPM state: https://review.opendev.org/c/openstack/nova/+/955657 Closes-Bug: #1633447 Closes-Bug: #2131730 Change-Id: I444a9285c07a04bf08a73772235f8dd73d75e513 Signed-off-by: Nicolai Ruckel <nicolai.ruckel@cloudandheat.com>	2026-02-13 23:55:41 +01:00
lajoskatona	873aee5e95	Fix for bug 2140537 If a guest has pinned CPUs the domain XML's <iothreadpin> should have iothread attribute also. Closes-Bug: #2140537 Change-Id: I5c2df747a3fdfbd2ee31d50a3d716a0ccc787e15 Signed-off-by: lajoskatona <lajos.katona@est.tech>	2026-02-13 17:17:17 +00:00
Stephen Finucane	6bc431bc52	tests: Invert validation check Now that all of our controllers have full schema coverage, we can now assume that all controllers are validated and raise if that's not the case. Change-Id: I3a58be8551e7cf13835ad565aae4fc9dc4214bbd Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2026-02-13 16:51:40 +00:00
Stephen Finucane	dab02447e6	api: Add response body schemas for server shares APIs We had missed one. Change-Id: Icc63959d73b1881b7db19b93cf8fb80dcb77cad8 Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2026-02-13 16:51:40 +00:00
Stephen Finucane	f80e4935e8	api: Add response body schemas for servers APIs (6/6) The last one: delete. Very simple, as always. Change-Id: I08a2dbcd86cf652e9cda193f64edfa655f986506 Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2026-02-13 16:51:40 +00:00
Stephen Finucane	9fd431315c	api: Add response body schemas for servers APIs (5/6) The penultimate API: the update view. This is very similar to the rebuild API so we are able to reuse much of that schema here. We also move some code outside a try-except as the code in question can't raise an InstanceNotFound exception. Change-Id: I0e42de5074dcf699886b20dfd43306683e381ee2 Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2026-02-13 16:51:34 +00:00
Stephen Finucane	c9be8b9aba	tests: Fix bound Ensure we do not allow negative values except for -1 (unlimited). Change-Id: I9a0184ed54054c6466833df24dfbe9ca7d1b454b Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2026-02-13 16:17:31 +00:00
Eigil Obrestad	bfaec08220	Make nova recognize amx-capabilities Expands the CPU_TRAITS_MAPPING table to let nova report if a compute-node supports AMX. This enables nova to pick the correct cpu_model when a SapphireRapids (or newer) cpu is wanted by the flavor. Implements: blueprint add-amx-traits Change-Id: Ieaa2e1be9d3d3ae945ce28d778edc9729d2db9ba Signed-off-by: Eigil Obrestad <eigil-git@obrestad.org> Depends-On: https://review.opendev.org/c/openstack/requirements/+/976640	2026-02-12 20:51:02 +01:00
Zuul	4fec7fe09d	Merge "Revert "Set openstacksdk-functional-devstack non voting""	2026-02-11 20:09:06 +00:00
Zuul	420b02d6be	Merge "Add regression test to repoduce bug 2140537"	2026-02-11 11:35:14 +00:00
lajoskatona	76d796193c	Add regression test to repoduce bug 2140537 Related-Bug: #2140537 Change-Id: I8c7cf544d599d5a11a2ae898822c2bde36f1d52a Signed-off-by: lajoskatona <lajos.katona@est.tech>	2026-02-09 13:11:53 +01:00
Balazs Gibizer	4227c9b14a	Revert "Set openstacksdk-functional-devstack non voting" This reverts commit `2f9f780a77`. Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: Ia3d01ba6da0ade10ad70de951cbcb72204fbce12	2026-02-09 10:19:56 +01:00
Balazs Gibizer	2f9f780a77	Set openstacksdk-functional-devstack non voting There is a neutron issue in the job but its fix is being blocked by multiple other issues in the sdk's gate. Let's keep our gate operational until they fix the sdk gate. [1] https://review.opendev.org/c/openstack/openstacksdk/+/976008 [2] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/5HHEYPZA6VIORX2XLBZGNMM2EVX2LR65/ Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: Ie2fe2ec18a0fe7dbbfe4fbb9094d9542c729122a	2026-02-09 10:16:10 +01:00
Sean Mooney	264e868d49	Support os-vif TAP pre-creation for OVS/OVN ports Add support for os-vif TAP device pre-creation when Neutron sets the 'ovs_create_tap' flag in vif_details. This reduces live migration downtime by ensuring the network is fully wired before the VM starts. Changes: - Add VIF_DETAILS_OVS_CREATE_TAP constant to model.py - Propagate create_tap from binding details to os-vif port profile in os_vif_util.py - Set managed='no' in libvirt XML when create_tap is enabled so libvirt uses the pre-created TAP device - Set multiqueue on port profile in _plug_os_vif based on instance flavor/image hw:vif_multiqueue_enabled property When checking oslo.versionedobjects fields for backward compat: - Use 'field in obj.fields' to check if field exists in schema - Use 'field in obj' to check if field value is set Depends-On: https://review.opendev.org/c/openstack/os-vif/+/971231 Generated-By: Cursor claude-opus-4.5 Closes-Bug: #2069718 Change-Id: I32343658b53e317696d1bd8b984793bfeeccd409 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-05 18:55:06 +00:00
Zuul	a17b44f3eb	Merge "Use an executor to delay STOPPED events"	2026-02-05 17:38:28 +00:00
Zuul	75aed9a19d	Merge "Live migration with iothreads"	2026-02-05 10:56:23 +00:00
Zuul	c94d2eaedb	Merge "Enable mypy on nova/utils.py"	2026-02-05 03:43:47 +00:00
Zuul	6b0bb735a6	Merge "SubclassSignatureTestCase to use NoDBTestCase as base"	2026-02-05 03:14:21 +00:00
Zuul	6a6e05d4d3	Merge "Libvirt event handling without eventlet"	2026-02-05 03:14:05 +00:00
Artom Lifshitz	3eae9477d2	TPM: support live migration of `host` secret security This enables live migration for TPM instances with the ``host`` secret security mode. The ``host`` security mode uses key manager service secrets owned by the instance owner. The secret is persisted in Libvirt and is sent over RPC to the destination during a live migration. The service version will be bumped in a separate patch. Related to blueprint vtpm-live-migration Change-Id: I97e9dd454c793abcb1a20579b1ceaec627be4813 Signed-off-by: melanie witt <melwittt@gmail.com>	2026-02-04 16:52:06 -08:00
melanie witt	2bdf12535c	TPM: prepare to bump service version for live migration This prepares for a service version bump and adds a minimum service version check in the API to reject live migration requests for vTPM instances until the entire cloud is upgraded to the new version. The actual service version bump will be included in a later patch that implements vTPM live migration. Related to blueprint vtpm-live-migration Change-Id: I7daef8037385a4077dc0a78f03ae4b34a57560b7 Signed-off-by: melanie witt <melwittt@gmail.com>	2026-02-04 15:49:06 -08:00
Balazs Gibizer	8b14a16c57	Fix full executor warning on noname executor The warning log assumed all executors have a name. Our centrally managed executors have one, but the ad hoc ones do not, causing a stack trace in the compute manager power_sync periodics. Change-Id: I04620364439a6c377f5b8f8f68cbdd3c62c44562 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:19 +01:00
Balazs Gibizer	8017b721fd	Cleanup libvirt driver at service stop As libvirt driver's Host object has a new headless thread we need to make sure that thread is exiting cleanly when nova-compute is being stopped. Also at the same time we make sure our unit tests are not leaking such thread across test cases with a new fixture and fixes in the test code. Change-Id: Ide274d6caa3314f9d25d51d1f72850cf77c9dee4 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:19 +01:00
Balazs Gibizer	3216573655	Remove spawn_after It was a naive implementation; it is replaced with StaticallyDelayingCancellableTaskExecutorWrapper. Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: I5e8d496473d4ec167d1655368a00cbfa78d2c074	2026-02-04 12:15:19 +01:00
Balazs Gibizer	f16170695c	Use an executor to delay STOPPED events During the VM hard reboot there are 3 events coming from libvirt * STOPPED * RESUMED * STARTED The libvirt driver implements automatic power sync of the VM based on the STOPPED event. But it should not do a stop() compute api call if the STOPPED event is followed right after by a STARTED event during hard reboot. So the libvirt driver delays processing the STOPPED event by 15 seconds and cancels the event if another lifecycle event is received for the same domain during that delay. In eventlet mode this is implemented by scheduling a greenlet and cancelling it. With native threading we cannot cancel a running task / thread so we need a bit smarter solution than just adding a sleep to the event handler and putting it in a threadpool. So this patch introduces an Executor wrapper that allows delaying the submission of a task into a real Executor by a predefined delay and checks for cancellation before the real submission. The wrapper uses a single thread and a queue of tasks. As the delay is the same for every task, the order of execution of the tasks is the same as the order they were submitted to the wrapper. So the thread can process the queue of tasks one by one, check for the remaining time until the deadline of the oldest task then submit it to the real executor, then take the next task from the queue. Cancellation of a task is checked before any wait for a deadline and before the submission to the real executor. So a task is never executed if cancelled during its delay period. Change-Id: I8fb3bb1e5506f2792522bf822939e7e8ab68763d Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:12 +01:00
Sean Mooney	c8d34ed3dc	Fix blockio generation for LUN volumes QEMU's scsi-block device driver does not support physical_block_size and logical_block_size properties. When Cinder reports disk geometry for LUN volumes, Nova was incorrectly including a <blockio> element in the libvirt XML, causing QEMU to fail with: Property 'scsi-block.physical_block_size' not found This fix adds a check to skip blockio generation when source_device is 'lun', following the existing pattern used for serial at line 1356. Generated-By: claude-code (Claude Opus 4.5) Closes-Bug: #2127196 Change-Id: Idf87e936edd97aac719222942c9842a9aca4c270 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-03 22:15:19 +00:00
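
The delaying executor described in commit f16170695c ("Use an executor to delay STOPPED events") can be illustrated with a minimal sketch. This is not nova's implementation: the class names `DelayedTask` and `DelayingExecutorWrapper`, the `cancel()` handle, and the `shutdown()` sentinel are all hypothetical, and only standard-library calls are used. It assumes, as the commit message states, one worker thread, a FIFO queue, a single fixed delay, and cancellation checks before the wait and before the real submission.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DelayedTask:
    """Hypothetical handle that lets the caller cancel during the delay."""

    def __init__(self, fn, args, deadline):
        self.fn = fn
        self.args = args
        self.deadline = deadline
        self._cancelled = threading.Event()

    def cancel(self):
        self._cancelled.set()

    def cancelled(self):
        return self._cancelled.is_set()


class DelayingExecutorWrapper:
    """Hypothetical wrapper: delays every submission by one fixed delay."""

    def __init__(self, executor, delay):
        self._executor = executor
        self._delay = delay
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn, *args):
        # The deadline is fixed at submit time, so FIFO order is deadline order.
        task = DelayedTask(fn, args, time.monotonic() + self._delay)
        self._queue.put(task)
        return task

    def shutdown(self):
        # The sentinel is processed after every task queued before it.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            task = self._queue.get()
            if task is None:
                return
            # Check for cancellation before waiting for the deadline.
            if task.cancelled():
                continue
            remaining = task.deadline - time.monotonic()
            # Event.wait() returns True if cancel() was called during the wait.
            if remaining > 0 and task._cancelled.wait(remaining):
                continue
            # Check again right before handing over to the real executor.
            if task.cancelled():
                continue
            self._executor.submit(task.fn, *task.args)
```

In this sketch a task cancelled at any point before its deadline is never handed to the real executor, which mirrors the guarantee stated in the commit message. A task that has already been submitted to the real executor is not affected by `cancel()`.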

62016 Commits