Adding more tests for graceful shutdown:
- shutdown the destination compute and see how live and cold migrations
progress
- start building an instance and, once the compute starts building it,
shutdown the compute service and see whether the build finishes or not.
- revert resize server
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: I57132fb7b7fa614dfc138508581ff5a67aaed906
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
During graceful shutdown, the compute service keeps a 2nd RPC
server active which can be used to finish the in-progress
operations. Like live migration, resize and cold migration
also perform RPC calls between the source and destination computes.
For those operations too, we can use the 2nd RPC server and make
sure they will be completed during graceful shutdown.
A quick overview of which RPC methods are involved in
resize/cold migration and which will use the 2nd RPC server:
Resize/cold migration
- prep_resize: No, resize/migration is not started yet.
- resize_instance: Yes, here the resize/migration starts.
- finish_resize: Yes
- cross cell resize case:
- prep_snapshot_based_resize_at_dest: NO, this is an initial check and
the migration is not started
- prep_snapshot_based_resize_at_source: Yes, this starts the migration
Confirm resize:
- confirm_resize: NO
- cross cell confirm resize case:
- confirm_snapshot_based_resize - NO
Revert resize:
- revert_resize - NO
- check_instance_shared_storage: YES. This is called from dest to source
so we need source to respond to it so that revert can continue.
- finish_revert_resize on source - YES, at this stage, revert resize is
in progress and abandoning it here can leave the migration in an
unrecoverable state.
- cross cell revert case:
- revert_snapshot_based_resize_at_dest: NO
- finish_revert_snapshot_based_resize_at_source: YES
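The routing above can be condensed into a small lookup; the following is an illustrative Python sketch only (the `ALT_TOPIC_METHODS` set and `uses_alt_server()` helper are assumed names, not Nova's actual code; the method names come from the overview above):

```python
# RPC methods that must stay reachable during graceful shutdown because
# they run while a migration/resize is already in progress.
ALT_TOPIC_METHODS = {
    'resize_instance',
    'finish_resize',
    'prep_snapshot_based_resize_at_source',
    'check_instance_shared_storage',
    'finish_revert_resize',
    'finish_revert_snapshot_based_resize_at_source',
}


def uses_alt_server(method: str) -> bool:
    """Return True if the RPC method should target the 2nd RPC server."""
    return method in ALT_TOPIC_METHODS
```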
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: If08b698d012a75b587144501d829403ec616f685
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
For graceful shutdown of the compute service, it will have two RPC servers.
One RPC server is used for new requests and will be stopped during
graceful shutdown; the 2nd RPC server (listening on the 'compute-alt' topic)
will be used to complete the in-progress operations.
We select the operations (case by case) and their RPC methods to use
the 2nd RPC server so that they will not be interrupted when shutdown
is initiated, and graceful shutdown will keep the 2nd RPC server active
for graceful_shutdown_timeout. A new method 'prepare_for_alt_rpcserver'
is added which will fall back to the first RPC server if it detects an
old compute.
As this has an upgrade impact, it bumps the compute service version and
adds release notes for the same.
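The fallback behavior of 'prepare_for_alt_rpcserver' might look roughly like the sketch below. This is an assumption-heavy illustration: the topic names, the version threshold, and the function shape are all hypothetical stand-ins, not Nova's actual implementation.

```python
DEFAULT_TOPIC = 'compute'
ALT_TOPIC = 'compute-alt'
MIN_ALT_SERVICE_VERSION = 70  # hypothetical version adding the 2nd server


def pick_rpc_topic(remote_service_version: int) -> str:
    """Pick the RPC topic, falling back to the default topic when the
    remote compute is too old to run the 2nd RPC server."""
    if remote_service_version < MIN_ALT_SERVICE_VERSION:
        return DEFAULT_TOPIC
    return ALT_TOPIC
```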
The list of operations that should use the 2nd RPC server will grow
eventually; this commit moves the below operations to use the 2nd
RPC server:
* Live migration
- Live migration: It uses the 2nd RPC server and will try to complete
the operation during shutdown.
- live_migration_force_complete does not need to use the 2nd RPC server.
It is a direct RPC request from the API to the compute, and if it is
rejected during shutdown, that is fine; it can be initiated again
once the compute is up.
- live_migration_abort does not need to use the 2nd RPC server. Ditto,
it is a direct RPC request from the API to the compute. It cancels the
queued live migration, but if the migration has already started, the
driver cancels it. If it is rejected during shutdown because the RPC
server is stopped, that is fine and it can be initiated again.
* server external event
* Get server console
As graceful shutdown cannot be tested in tempest, this adds a new job
to test it. Currently it tests the live migration operation, and the job
can be extended to other operations that will use the 2nd RPC server.
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: I4de3afbcfaefbed909a29a831ac18060c4a73246
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
Unlike the check for extra specs, the check for whether to include a
description field or not is driven entirely by API version rather than
API version and policy. We can therefore move the checks inside the
functions that generate the response rather than duplicating them
elsewhere.
Change-Id: I86aa4e1c62a0b0e6fa4d27e559d3197fb73851ba
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Ahead of adding additional tests, we do the following:
* Move the controller to an instance attribute rather than a class
attribute
* Modify tests so they all call controller methods directly rather than
setting up a fake router (this is the cause of the largest changes)
* Remove unnecessary aliasing of exceptions
* Remove unnecessary setUp arguments
* Split a test into multiple tests
* Standardize test class names
Change-Id: I2cac4cc79288f7b3bacc4a63a1d36d4cf12013d7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
So that we can use them for API DB methods, which are found in
nova.objects instead of nova.db.
Change-Id: Ifb15ee90ac6a6400b7268ed80f727080e98c4cdf
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
A follow-up for Ia178c1314f99c719827e3eb78735d1019852a273 and
I0e42de5074dcf699886b20dfd43306683e381ee2. 'adminPass' is only
(optionally) returned in server create and rebuild responses, not in
server show or update responses.
Change-Id: I2c4ce7a2b1063d71561d6af95a58a36b39356879
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Previous patches removed direct eventlet usage from nova-compute so
now we can run it with native threading as well. This patch documents
the possibility and switches both nova-compute processes to native
threading mode in the nova-next job.
Change-Id: I7bb29c627326892d1cf628bbf57efbaedda12f1a
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
Move the execution of build_and_run_instance and snapshot_instance to
one common long task executor. Originally, snapshot ran
on the RPC pool and build_and_run_instance ran on the default pool.
Also each of these tasks had a separate concurrency limit enforced by a
semaphore.
After this patch, each of these tasks uses a common Executor. The size of
that executor and the way we limit concurrency differ in
eventlet and in native threading mode.
In eventlet mode we have one big Executor with "unlimited" size and
individual semaphores are used for each task type to enforce the
configured limits.
In threading mode we request the admin to configure the 2 limits to the
same number, and we warn if not. We use that limit (or the max of the 2
limits) as the size of the long task Executor. As the limits are the
same, we don't enforce individual limits anymore. The executor size will
ensure the shared limit is kept. As the limit is shared, a single
operation type can consume the whole limit.
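The threading-mode sizing rule above can be sketched as follows; the function name and warning text are illustrative, not Nova's actual config handling:

```python
import warnings


def long_task_executor_size(max_builds: int, max_snapshots: int) -> int:
    """In threading mode the two limits should be configured equal; if
    they differ, warn and use the larger one as the shared executor
    size, since the executor size itself enforces the shared limit."""
    if max_builds != max_snapshots:
        warnings.warn('build and snapshot limits differ; '
                      'using the larger one as the executor size')
    return max(max_builds, max_snapshots)
```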
Note that while live migration is a long-running task, we cannot put it
into the same long_task_executor as build and snapshot, as we need:
1. a very small limit of concurrent live migrations compared to
builds and snapshots
2. a way to cancel live migrations easily that are waiting due to the
limit
Change-Id: I88a6a593af8a5b518715e1245a76ee54752afe83
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
We already deprecated the unlimited max_concurrent_live_migrations
config value and now we do the same for max_concurrent_builds and
max_concurrent_snapshots as well. The reason is similar.
* The unlimited meaning was a lie; it was limited by other constructs in
the code. For these options the limit was the size of the RPC executor,
which defaulted to 64.
* In native threading mode having unlimited concurrent tasks is
unfeasible due to the memory cost of native threads for each task.
The deprecation is done in a way that in eventlet mode we keep a similar
behavior as before but in native threading mode we enforce a strict
maximum even if unlimited is requested.
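A hedged sketch of that deprecation behavior (the function and the "0 means unlimited" convention are assumptions for illustration; the 64 cap mirrors the old RPC executor default mentioned above):

```python
STRICT_MAX = 64  # matches the old RPC executor default size


def effective_limit(configured: int, threading_mode: bool) -> int:
    """Resolve a concurrency limit where 0 means 'unlimited'.

    In eventlet mode keep the old (effectively unlimited) behavior; in
    native threading mode enforce a strict maximum instead, because
    unbounded native threads are too costly in memory.
    """
    if configured == 0:  # 'unlimited' requested
        return STRICT_MAX if threading_mode else 0
    return configured
```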
Change-Id: Ibbf76c2c85729820035c9791719bf2c864bce12b
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
Use the firmware auto-selection feature in libvirt to find the best
UEFI firmware file according to the requested feature.
Firmware files may be reselected when a libvirt domain is created from
scratch, while these are kept during hard-reboot (or live migration
which preserves the loader/nvram elements filled by libvirt).
Closes-Bug: #2122296
Related-Bug: #2122288
Implements: blueprint libvirt-firmware-auto-selection
Change-Id: Ie48b020597a1a2fb3280815eec5ba3565e396f9b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
As per part1 of the graceful shutdown timeouts[1], this commit
adds/modifies the below timeouts/waits needed for graceful shutdown:
- Override the default of the graceful_shutdown_timeout to 180.
- Add a new config option for manager shutdown timeout.
It also adds a graceful_shutdown() method on the manager side, which
will be called by the nova/service.py->stop() method before it stops
the 2nd RPC server. In part1, this will wait for the configurable wait
time, but part2 will implement a better solution to track the
in-progress tasks. The idea is to have this single interface from the
service manager (graceful_shutdown()) that will be called during
graceful shutdown and is responsible for finishing the required tasks
and cleanup.
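In part1 the manager-side hook amounts to a plain configurable wait, which can be sketched minimally as below (the signature is an assumption; the real method is wired into nova/service.py->stop() and part2 will replace the sleep with in-progress task tracking):

```python
import time


def graceful_shutdown(wait_seconds: float) -> None:
    """Part1 behavior: wait a configurable time so in-progress
    operations can finish before the 2nd RPC server is stopped."""
    time.sleep(wait_seconds)
```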
Partial implement blueprint nova-services-graceful-shutdown-part1
[1] https://specs.openstack.org/openstack/nova-specs/specs/2026.1/approved/nova-services-graceful-shutdown-part1.html#graceful-shutdown-timeouts
Change-Id: I7c1934d3ec7854feac3fc8432627c25eba963ddf
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
This test failed a few times and the most recent failure is
- https://review.opendev.org/c/openstack/nova/+/975586 (PS8 run)
Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/test_utils.py", line 2185, in test_submit_second_while_delaying_first
self.assertGreater(task2_runtime, 2.0)
File "/usr/lib/python3.12/unittest/case.py", line 1269, in assertGreater
self.fail(self._formatMessage(msg, standardMsg))
File "/usr/lib/python3.12/unittest/case.py", line 715, in fail
raise self.failureException(msg)
AssertionError: 1.997275639999998 not greater than 2.0
From the error, it seems we are capturing the start time after we
submit the task to the executor, which counts the task start time a
little ahead of when the test captured it.
Let's capture the task start time before the task is submitted so that
we can compare the times in a more correct way.
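A minimal reproduction of the fix (not the actual test code): by capturing the start time before submit, the measured runtime can never undershoot the task's real runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_and_time(executor, fn):
    start = time.monotonic()   # captured BEFORE submit, per the fix
    future = executor.submit(fn)
    future.result()
    return time.monotonic() - start


with ThreadPoolExecutor(max_workers=1) as ex:
    runtime = run_and_time(ex, lambda: time.sleep(0.2))
# Elapsed time now always covers the whole task, including queueing.
assert runtime >= 0.2
```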
Change-Id: I5a9845813b614c58e0f5a66e07f8a8c732f38eb3
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
Instead of returning an HTTP 200 and a `volumeAttachment` object,
attaching a volume to an instance returns HTTP 202 starting from API
version 2.101.
To keep the functionality for older API versions, we move the
`_attach_volume()` method from n-api to n-conductor and either do a call
or a cast depending on whether the API needs to return a value.
n-conductor then handles reserving the block_device_mapping's device by
calling n-compute before it starts the previously-already-async volume
attachment.
We have to move `_check_attach_and_reserve_volume` into compute utils,
because it's getting called in both conductor and compute api (for the
shelved offloaded attach).
The new RPC method in the conductor needs a long timeout when used with
API versions less than the new 2.101, because it waits for the call to
`reserve_block_device_name()` in nova-compute which already needs a long
timeout.
Updating the functional tests' `post_server_volume()` and
`_attach_volume()` to not return the attachment anymore is possible, as
no test uses the returned values.
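The call-vs-cast decision described above might look roughly like this; the function and client shape are illustrative assumptions, not the actual conductor RPCAPI:

```python
def attach_volume_rpc(client, context, microversion, **kwargs):
    """Older microversions need the volumeAttachment back (blocking
    call with a long timeout, since it waits on
    reserve_block_device_name in nova-compute); from 2.101 the API
    returns HTTP 202 immediately, so a cast suffices."""
    if microversion < (2, 101):
        return client.call(context, 'attach_volume', **kwargs)
    client.cast(context, 'attach_volume', **kwargs)
```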
Change-Id: I4d38c2679f0e88cca30055a9c8c45ba1dd6fb5ef
Implements: blueprint async-volume-attachments
Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>
Some firmwares require the smm feature. While the feature doesn't have to
be explicitly enabled when auto-selection is enabled, it should be
enabled explicitly when firmware files are pre-defined.
Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: Ia194dcfacd2b743761e720d947a6807689a96da3
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
... so that we can load these from existing guest XML.
This is a preparation work to use firmware auto-selection by libvirt,
and is required to avoid re-selection during hard-reboot.
Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I899cb7d6ee364def8d1298b77c24cc5156c71126
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Extend the existing (but unused) guest xml generation logic for
firmware detection, by adding the firmware features flags to require
secure boot support.
Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I907c9c88f370a52b54b98e1e1cbda6c21d2bff62
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Adds a non-secure boot scenario and a stateless firmware scenario to
demonstrate what the guest xml contents look like when firmware files
are selected by libvirt.
Partially-Implements: blueprint libvirt-firmware-auto-selection
Change-Id: I88f0b81c8455630145efca8c6349fc00a0c29835
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Using a string value was deprecated in oslo.middleware 3.15.0[1], which
was released 9 years ago. The value of this option has been treated as
a list value since then.
[1] 7e519d008f7743d75ec299095060a70d5fd00f99
The latest oslo.middleware release removed the deprecated handling.
Change-Id: Ib88c046af14f5d5de0d410a35a702b7a2322c832
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This change improves FairLockGuard to properly support two previously
unsupported (or broken) usage patterns:
1. Cross-thread sharing: When threads share the same FairLockGuard
instance, they now correctly wait for each other instead of raising
TypeError. Fixed by:
- Adding _active_thread tracking to identify the owning thread
- Restructuring lock acquisition order: named locks are now acquired
OUTSIDE of locks_lock to prevent deadlock when Thread-B waits on
locks held by Thread-A
- Only same-thread re-entry triggers the nesting logic, not
cross-thread access
2. Same-thread nesting: The same FairLockGuard instance can now be
nested within itself. Fixed by:
- Adding _nesting_depth counter initialized to 0
- Nested entries increment depth and return early (locks held)
- Exits decrement depth; locks only released when depth reaches 0
- This prevents lock leaks that would occur if inner exit cleared
self.locks before outer exit could release them
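The ownership and nesting mechanics above can be sketched in condensed form. This is a simplified stand-in, not Nova's FairLockGuard: the real class also manages fair named locks and a locks_lock meta-lock that are omitted here.

```python
import threading


class ReentrantGuard:
    """Simplified guard: same-thread nesting via a depth counter,
    cross-thread callers block on the underlying locks."""

    def __init__(self, locks):
        self.locks = locks
        self._active_thread = None   # owning thread, if any
        self._nesting_depth = 0      # releases happen only at depth 0

    def __enter__(self):
        me = threading.get_ident()
        if self._active_thread == me:
            # Same-thread re-entry: just nest; locks are already held.
            self._nesting_depth += 1
            return self
        acquired = []
        try:
            for lock in self.locks:
                lock.acquire()       # other threads wait here
                acquired.append(lock)
        except BaseException:
            # Release any locks acquired before the failure.
            for lock in reversed(acquired):
                lock.release()
            raise
        self._active_thread = me
        self._nesting_depth = 1
        return self

    def __exit__(self, *exc):
        self._nesting_depth -= 1
        if self._nesting_depth == 0:
            # Only the outermost exit releases, preventing lock leaks.
            self._active_thread = None
            for lock in reversed(self.locks):
                lock.release()
```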
Additional improvements:
- Exception handling during partial lock acquisition now properly
releases any locks acquired before the failure
- Lock release moved outside locks_lock in __exit__ for consistency
The docstring has been updated to reflect that both patterns now work,
while continuing to discourage them in favor of creating separate
FairLockGuard instances for clarity.
New tests added:
- test_deep_nesting: Verifies 3+ levels of nesting
- test_nested_exception_outer_still_holds_locks: Verifies outer context
retains locks when inner context raises an exception
- test_empty_lock_list: Verifies empty lock list edge case
Related-Bug: #2048837
Generated-By: claude-code opus 4.5
Change-Id: Ia937b0e2d76c814360f168d5f33b821bfc61aade
Signed-off-by: Sean Mooney <work@seanmooney.info>
On Debian 13 (Trixie), libvirt packaging is modularized and
the libvirt-daemon-lock package (providing virtlockd) is
optional. The evacuate hook previously assumed all libvirt
services were installed and failed when stopping/starting
missing units.
Extract a reusable manage_libvirt_service.yaml task file that
checks if a service exists via systemctl list-unit-files
before managing its units. This prevents failures when
optional libvirt packages are not installed and future-proofs
against further packaging changes.
Generated-By: claude-code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change-Id: Ie84e2e8ab2d3065b1562ee5e256fa163541955f7
Signed-off-by: Sean Mooney <work@seanmooney.info>
This fixes an instance of us passing a disk image to qemu-img for
resize where we don't constrain the format. As has previously been
identified, it is never safe to do that when the image itself is not
trusted. In this case, an instance with a previously-raw disk image
being used by imagebackend.Flat is susceptible to the user writing a
qcow2 (or other) header to their disk causing the unconstrained
qemu-img resize operation to interpret it as a qcow2 file.
Since Flat maintains the intended disk format in the disk.info file,
and since we would have safety-checked images we got from glance,
we should be able to trust the image.format specifier, which comes
from driver_format in imagebackend, which is read from disk.info.
Since only raw or qcow2 files should be resized anyway, we can further
constrain it to those.
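A hedged sketch of the constrained command construction (Nova's real code goes through its image backend and processutils helpers rather than building argv directly; only the `qemu-img resize -f` usage is from qemu-img itself):

```python
ALLOWED_FORMATS = ('raw', 'qcow2')


def resize_cmd(path: str, fmt: str, new_size: int) -> list:
    """Build a qemu-img resize command that pins the input format, so a
    qcow2 header a user wrote inside a raw disk cannot be honored."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError('refusing to resize disk format %r' % fmt)
    return ['qemu-img', 'resize', '-f', fmt, path, str(new_size)]
```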
Notes:
1. qemu-img refuses to resize some types of VMDK files, but it may
be able to resize others (there are many subformats). Technically,
Flat will allow running an instance directly from a VMDK file,
and so this change _could_ be limiting existing "unintentionally
works" behavior.
2. This assumes that disk.info is correct, present, etc. The code to
handle disk.info will regenerate the file if it's missing or
unreadable by probing the image without a safety check, which
would be unsafe. However, that is a much more sophisticated attack,
requiring either access to the system to delete the file or an
errant operator action in the first place.
Change-Id: I07cbe90b7a7a0a416ef13fbc3a1b7e2272c90951
Closes-Bug: #2137507
Signed-off-by: Dan Smith <dansmith@redhat.com>
Ensure destroy_scatter_gather_executor() is invoked during conductor
startup to prevent reuse of a pre-fork scatter_gather executor.
Change-Id: I62a01f51877001f19605762a1b8a09913b441dd2
Signed-off-by: Kamil Sambor <kamil.sambor@gmail.com>