womax/nova - nova - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Balazs Gibizer	4bce4480b9	Run nova-compute in native threading mode Previous patches removed direct eventlet usage from nova-compute so now we can run it with native threading as well. This patch documents the possibility and switches both nova-compute processes to native threading mode in the nova-next job. Change-Id: I7bb29c627326892d1cf628bbf57efbaedda12f1a Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-24 16:28:06 +01:00
Balazs Gibizer	9e678b83eb	[compute]Use single long task executor Move the execution of build_and_run_instance and snapshot_instance to one common long task executor. Originally snapshot ran on the RPC pool, build_and_run_instance ran on the default pool. Also each of these tasks had a separate concurrency limit enforced by a semaphore. After this patch each of these tasks uses a common Executor. The size of that executor and the way we limit the concurrency differ between eventlet and native threading mode. In eventlet mode we have one big Executor with "unlimited" size and individual semaphores are used for each task type to enforce the configured limits. In threading mode we request that the admin configure the 2 limits to the same number, and we warn if not. We use that limit (or the max of the 2 limits) as the size of the long task Executor. As the limits are the same we don't enforce individual limits any more. The executor size will ensure the shared limit is kept. As the limit is shared a single operation type can consume the whole limit. Note that while live migration is a long-running task we cannot put it into the same long_task_executor as build and snapshot as we need: 1. a very small limit of concurrent live migrations compared to builds and snapshots 2. a way to easily cancel live migrations that are waiting due to the limit Change-Id: I88a6a593af8a5b518715e1245a76ee54752afe83 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-24 16:28:06 +01:00
Balazs Gibizer	efd61188c1	Deprecate unlimited compute actions We already deprecated the unlimited max_concurrent_live_migrations config value and now we do the same for max_concurrent_builds and max_concurrent_snapshots as well. The reason is similar. * The unlimited meaning was a lie; it was limited by other constructs in the code. For these options the limit was the size of the RPC executor, which defaulted to 64. * In native threading mode having unlimited concurrent tasks is infeasible due to the memory cost of a native thread for each task. The deprecation is done in a way that in eventlet mode we keep a similar behavior as before but in native threading mode we enforce a strict maximum even if unlimited is requested. Change-Id: Ibbf76c2c85729820035c9791719bf2c864bce12b Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-24 16:28:00 +01:00
Zuul	00f554fd92	Merge "Fix full executor warning on noname executor"	2026-02-18 14:37:19 +00:00
Zuul	88adf2c870	Merge "Cleanup libvirt driver at service stop"	2026-02-18 14:37:06 +00:00
Zuul	3b23957f77	Merge "Remove spawn_after"	2026-02-18 14:01:19 +00:00
Zuul	0383085510	Merge "Preserve UEFI NVRAM variable store"	2026-02-18 06:36:45 +00:00
Zuul	c7e40ef573	Merge "Make disk.extend() pass format to qemu-img"	2026-02-18 01:20:15 +00:00
Sean Mooney	6c9110bb8b	Handle missing libvirt services in evacuate hook On Debian 13 (Trixie), libvirt packaging is modularized and the libvirt-daemon-lock package (providing virtlockd) is optional. The evacuate hook previously assumed all libvirt services were installed and failed when stopping/starting missing units. Extract a reusable manage_libvirt_service.yaml task file that checks if a service exists via systemctl list-unit-files before managing its units. This prevents failures when optional libvirt packages are not installed and future-proofs against further packaging changes. Generated-By: claude-code Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Change-Id: Ie84e2e8ab2d3065b1562ee5e256fa163541955f7 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-17 18:22:22 +00:00
Dan Smith	3eba22ff09	Make disk.extend() pass format to qemu-img This fixes an instance of us passing a disk image to qemu-img for resize where we don't constrain the format. As has previously been identified, it is never safe to do that when the image itself is not trusted. In this case, an instance with a previously-raw disk image being used by imagebackend.Flat is susceptible to the user writing a qcow2 (or other) header to their disk causing the unconstrained qemu-img resize operation to interpret it as a qcow2 file. Since Flat maintains the intended disk format in the disk.info file, and since we would have safety-checked images we got from glance, we should be able to trust the image.format specifier, which comes from driver_format in imagebackend, which is read from disk.info. Since only raw or qcow2 files should be resized anyway, we can further constrain it to those. Notes: 1. qemu-img refuses to resize some types of VMDK files, but it may be able to resize others (there are many subformats). Technically, Flat will allow running an instance directly from a VMDK file, and so this change _could_ be limiting existing "unintentionally works" behavior. 2. This assumes that disk.info is correct, present, etc. The code to handle disk.info will regenerate the file if it's missing or unreadable by probing the image without a safety check, which would be unsafe. However, that is a much more sophisticated attack, requiring either access to the system to delete the file or an errant operator action in the first place. Change-Id: I07cbe90b7a7a0a416ef13fbc3a1b7e2272c90951 Closes-Bug: #2137507 Signed-off-by: Dan Smith <dansmith@redhat.com>	2026-02-17 06:35:35 -08:00
Zuul	f3d9188a93	Merge "TPM: support live migration of `host` secret security"	2026-02-16 18:30:56 +00:00
Zuul	c804c3ddb8	Merge "TPM: prepare to bump service version for live migration"	2026-02-16 18:30:42 +00:00
Zuul	2b30cafe98	Merge "Add vtpm_secret_(uuid\|value) to LibvirtLiveMigrateData"	2026-02-16 18:28:29 +00:00
Nicolai Ruckel	35b1945522	Preserve UEFI NVRAM variable store Preserve NVRAM variable store during stop/start, hard reboot, live migration, and volume retype. This does not affect cold migration or shelve. For UEFI guests (hw_firmware_type=uefi), every time the instance is started, the UEFI variable storage for that instance (/var/lib/libvirt/qemu/nvram/instance-xxxxxxxx_VARS.fd) is deleted and reinitialized from the default template. The changes are based on this patch by Jonas Schäfer to preserve the vTPM state: https://review.opendev.org/c/openstack/nova/+/955657 Closes-Bug: #1633447 Closes-Bug: #2131730 Change-Id: I444a9285c07a04bf08a73772235f8dd73d75e513 Signed-off-by: Nicolai Ruckel <nicolai.ruckel@cloudandheat.com>	2026-02-13 23:55:41 +01:00
Zuul	4fec7fe09d	Merge "Revert "Set openstacksdk-functional-devstack non voting""	2026-02-11 20:09:06 +00:00
Zuul	420b02d6be	Merge "Add regression test to repoduce bug 2140537"	2026-02-11 11:35:14 +00:00
lajoskatona	76d796193c	Add regression test to repoduce bug 2140537 Related-Bug: #2140537 Change-Id: I8c7cf544d599d5a11a2ae898822c2bde36f1d52a Signed-off-by: lajoskatona <lajos.katona@est.tech>	2026-02-09 13:11:53 +01:00
Balazs Gibizer	4227c9b14a	Revert "Set openstacksdk-functional-devstack non voting" This reverts commit `2f9f780a77`. Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: Ia3d01ba6da0ade10ad70de951cbcb72204fbce12	2026-02-09 10:19:56 +01:00
Balazs Gibizer	2f9f780a77	Set openstacksdk-functional-devstack non voting There is a neutron issue in the job but its fix is being blocked by multiple other issues in the sdk's gate. Let's keep our gate operational until they fix the sdk gate. [1] https://review.opendev.org/c/openstack/openstacksdk/+/976008 [2] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/5HHEYPZA6VIORX2XLBZGNMM2EVX2LR65/ Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: Ie2fe2ec18a0fe7dbbfe4fbb9094d9542c729122a	2026-02-09 10:16:10 +01:00
Sean Mooney	264e868d49	Support os-vif TAP pre-creation for OVS/OVN ports Add support for os-vif TAP device pre-creation when Neutron sets the 'ovs_create_tap' flag in vif_details. This reduces live migration downtime by ensuring the network is fully wired before the VM starts. Changes: - Add VIF_DETAILS_OVS_CREATE_TAP constant to model.py - Propagate create_tap from binding details to os-vif port profile in os_vif_util.py - Set managed='no' in libvirt XML when create_tap is enabled so libvirt uses the pre-created TAP device - Set multiqueue on port profile in _plug_os_vif based on instance flavor/image hw:vif_multiqueue_enabled property When checking oslo.versionedobjects fields for backward compat: - Use 'field in obj.fields' to check if field exists in schema - Use 'field in obj' to check if field value is set Depends-On: https://review.opendev.org/c/openstack/os-vif/+/971231 Generated-By: Cursor claude-opus-4.5 Closes-Bug: #2069718 Change-Id: I32343658b53e317696d1bd8b984793bfeeccd409 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-05 18:55:06 +00:00
Zuul	a17b44f3eb	Merge "Use an executor to delay STOPPED events"	2026-02-05 17:38:28 +00:00
Zuul	75aed9a19d	Merge "Live migration with iothreads"	2026-02-05 10:56:23 +00:00
Zuul	c94d2eaedb	Merge "Enable mypy on nova/utils.py"	2026-02-05 03:43:47 +00:00
Zuul	6b0bb735a6	Merge "SubclassSignatureTestCase to use NoDBTestCase as base"	2026-02-05 03:14:21 +00:00
Zuul	6a6e05d4d3	Merge "Libvirt event handling without eventlet"	2026-02-05 03:14:05 +00:00
Artom Lifshitz	3eae9477d2	TPM: support live migration of `host` secret security This enables live migration for TPM instances with the ``host`` secret security mode. The ``host`` security mode uses key manager service secrets owned by the instance owner. The secret is persisted in Libvirt and is sent over RPC to the destination during a live migration. The service version will be bumped in a separate patch. Related to blueprint vtpm-live-migration Change-Id: I97e9dd454c793abcb1a20579b1ceaec627be4813 Signed-off-by: melanie witt <melwittt@gmail.com>	2026-02-04 16:52:06 -08:00
melanie witt	2bdf12535c	TPM: prepare to bump service version for live migration This prepares for a service version bump and adds a minimum service version check in the API to reject live migration requests for vTPM instances until the entire cloud is upgraded to the new version. The actual service version bump will be included in a later patch that implements vTPM live migration. Related to blueprint vtpm-live-migration Change-Id: I7daef8037385a4077dc0a78f03ae4b34a57560b7 Signed-off-by: melanie witt <melwittt@gmail.com>	2026-02-04 15:49:06 -08:00
Balazs Gibizer	8b14a16c57	Fix full executor warning on noname executor The warning log assumed that all executors have a name. Our centrally managed executors do, but the ad hoc ones do not, which caused a stack trace in the compute manager power_sync periodic task. Change-Id: I04620364439a6c377f5b8f8f68cbdd3c62c44562 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:19 +01:00
Balazs Gibizer	8017b721fd	Cleanup libvirt driver at service stop As the libvirt driver's Host object has a new headless thread, we need to make sure that thread exits cleanly when nova-compute is being stopped. At the same time we make sure our unit tests are not leaking such threads across test cases with a new fixture and fixes in the test code. Change-Id: Ide274d6caa3314f9d25d51d1f72850cf77c9dee4 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:19 +01:00
Balazs Gibizer	3216573655	Remove spawn_after It was a naive implementation; it is replaced with StaticallyDelayingCancellableTaskExecutorWrapper. Signed-off-by: Balazs Gibizer <gibi@redhat.com> Change-Id: I5e8d496473d4ec167d1655368a00cbfa78d2c074	2026-02-04 12:15:19 +01:00
Balazs Gibizer	f16170695c	Use an executor to delay STOPPED events During the VM hard reboot there are 3 events coming from libvirt * STOPPED * RESUMED * STARTED The libvirt driver implements automatic power sync of the VM based on the STOPPED event. But it should not do a stop() compute api call if the STOPPED event is followed right after by a STARTED event during hard reboot. So the libvirt driver delays processing the STOPPED event by 15 seconds and cancels the event if another lifecycle event is received for the same domain during that delay. In eventlet mode this is implemented by scheduling a greenlet and cancelling it. With native threading we cannot cancel a running task / thread so we need a bit smarter solution than just adding a sleep to the event handler and putting it in a threadpool. So this patch introduces an Executor wrapper that allows delaying the submission of a task into a real Executor by a predefined delay and checks for cancellation right before the real submission. The wrapper uses a single thread and a queue of tasks. As the delay is the same for every task, the tasks are executed in the same order as they were submitted to the wrapper. So the thread can process the queue of tasks one by one: check the remaining time until the deadline of the oldest task, then submit it to the real executor, then take the next task from the queue. Cancellation of a task is checked before any wait for a deadline and before the submission to the real executor. So a task is never executed if cancelled during its delay period. Change-Id: I8fb3bb1e5506f2792522bf822939e7e8ab68763d Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-02-04 12:15:12 +01:00
Sean Mooney	c8d34ed3dc	Fix blockio generation for LUN volumes QEMU's scsi-block device driver does not support physical_block_size and logical_block_size properties. When Cinder reports disk geometry for LUN volumes, Nova was incorrectly including a <blockio> element in the libvirt XML, causing QEMU to fail with: Property 'scsi-block.physical_block_size' not found This fix adds a check to skip blockio generation when source_device is 'lun', following the existing pattern used for serial at line 1356. Generated-By: claude-code (Claude Opus 4.5) Closes-Bug: #2127196 Change-Id: Idf87e936edd97aac719222942c9842a9aca4c270 Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-02-03 22:15:19 +00:00
huanhongda	53a613d994	Live migration with iothreads In commit `76d64b9cb4` we enable one io-thread per qemu instance. Live migration should update this. Related-Bug: #2139351 Change-Id: I1476de288490c88a60db697fbb45b4f783821c14 Signed-off-by: hongda.xun <hongda.xun@easystack.cn>	2026-01-30 17:38:00 +08:00
Sean Mooney	ba24639b8d	Add regression test to repoduce bug 2139351 This test reproduces the current bug where the iothread pinning is not updated for NUMA instances on live migration, and enhances the libvirt fixture to make this possible. We also provide a sanity check for non-NUMA instances to show the vCPU cpuset is correct. Related-Bug: #2139351 Assisted-By: claude-code opus 4.5 Change-Id: Ib2c0d1f826ad4f31e3e9b3f61f2c9b2111bf7edd Signed-off-by: Sean Mooney <work@seanmooney.info>	2026-01-29 15:19:24 +00:00
Balazs Gibizer	9f74d1c5f2	Enable mypy on nova/utils.py As a follow up for a review comment in [1] this patch enables mypy for nova/utils, fixes the existing mypy findings, and adds some trivial type annotations where they make sense. [1] https://review.opendev.org/c/openstack/nova/+/956089/comment/caec94ed_4fdb16bf/ Change-Id: I29ca69bd1e583adc1b1f408bd45de183649986d2 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-01-29 11:57:04 +01:00
Balazs Gibizer	c89e54cedc	SubclassSignatureTestCase to use NoDBTestCase as base We have a list of fixtures included in the test.TestCase base class that prevent global data and thread leaking across test cases within the same process. The SubclassSignatureTestCase did not use our base class but it initializes a partial libvirt driver class that will soon use a ThreadPoolExecutor in native threading mode. So we need the leak protection here as well. So this patch moves SubclassSignatureTestCase to use the NoDBTestCase base class. Change-Id: I05e818e8e83757185e5af78a5a4771c90d9fa217 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-01-29 11:54:25 +01:00
Balazs Gibizer	a89c1b44c5	Libvirt event handling without eventlet Our libvirt interface is not eventlet aware and not pure python, so eventlet monkey patching is not enough. Therefore the libvirt driver implemented a native polling thread for libvirt and the queue + pipe mechanism to push events from the native polling thread to the main thread with the eventlet event loop. We don't need all of these complications in native thread mode. There we only need a single thread that polls libvirt for the events. The received events can be executed directly on the polling thread as that is no different from any other thread in the system now. To make the change more understandable the event handling logic is moved behind an abstraction that is implemented twice, once for eventlet with the existing implementation just moved around, and once for native threading with the simplified handling. Change-Id: If479574cd91975810098afa8e3c220c7316a9431 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-01-29 11:54:25 +01:00
melanie witt	8b3701490e	Add vtpm_secret_(uuid\|value) to LibvirtLiveMigrateData This is needed in order to pass TPM secret information to the destination over RPC to support the 'host' secret security mode. The fields are nullable so that secret security modes 'user' and 'deployment' may set them to None. A setting of None lets the other security modes convey that they are actively choosing not to pass any data in the vTPM fields. This is important for interacting with older compute hosts in the middle of a rolling upgrade. We do not want to backlevel new LibvirtLiveMigrateData objects involving vTPM because older compute hosts cannot support vTPM live migration in any capacity. Related to blueprint vtpm-live-migration Change-Id: If2ff2a7bb41dea6e0959c965477b79f3f7d633e7 Signed-off-by: melanie witt <melwittt@gmail.com>	2026-01-28 12:41:54 -08:00
Zuul	59a7093915	Merge "Use the correct name for the ironic check job"	2026-01-28 08:18:07 +00:00
Zuul	4112a4491c	Merge "Preserve vTPM state between power off and power on"	2026-01-28 01:43:47 +00:00
Zuul	ce286865f9	Merge "[hacking]Do not mock threading.Event"	2026-01-27 20:42:15 +00:00
Zuul	134d3ac476	Merge "api: Simplify servers views (3/3)"	2026-01-27 14:17:53 +00:00
Zuul	d3143aeec7	Merge "api: Simplify servers views (2/3)"	2026-01-27 14:13:32 +00:00
Zuul	2032cb2828	Merge "api: Simplify servers views (1/3)"	2026-01-27 13:53:24 +00:00
Steve Baker	1637397253	Use the correct name for the ironic check job The job name has been an alias for 6 years [1] and the accurate preferred name ironic-tempest-bios-ipmi-direct has been in place for 8 months [2]. The intent of job names is to accurately describe the configuration of the job, and the name ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa is now inaccurate - specifically the job no longer uses tinyipa. [1] https://opendev.org/openstack/ironic/commit/53f751dcfd86594160dc9be92b616ef5d0d70623 [2] https://opendev.org/openstack/ironic/blame/branch/master/zuul.d/ironic-jobs.yaml#L1210-L1236 Change-Id: I768a6d3c7f9f550a692dd1f6e0435228076f118f Signed-off-by: Steve Baker <sbaker@redhat.com>	2026-01-27 11:15:02 +13:00
Balazs Gibizer	19203d684d	[hacking]Do not mock threading.Event Such a mock is too broad and will cause issues with our basic libraries and test infrastructure leading to race conditions and threads leaked across tests. We needed to remove a bunch of such mocks found by the new rule. In some cases we needed to make the mocking more specific for a given Event instance; in other cases the mock was not needed at all and the test case was still not taking excessive time. Related-Bug: #2136815 Change-Id: I3ae3740eb07bade4e0883db3e02c0a81e92b9a36 Signed-off-by: Balazs Gibizer <gibi@redhat.com>	2026-01-26 20:26:56 +01:00
Zuul	d840c63a18	Merge "api: Add response body schemas for server metadata APIs"	2026-01-26 14:48:14 +00:00
Zuul	eabb1d1260	Merge "api: Remove networks key from quota schemas"	2026-01-26 14:48:01 +00:00
Zuul	e67372b33e	Merge "api: Add response body schemas for server tags API"	2026-01-25 03:50:50 +00:00
Zuul	d6d8f28640	Merge "api: Add response body schemas for server migrations API"	2026-01-25 03:50:32 +00:00
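Commit f16170695c describes an Executor wrapper that holds every task for the same fixed delay, runs a single thread over a FIFO queue, and drops a task if it is cancelled during its delay (used to let a STARTED event cancel a pending STOPPED event during hard reboot). The following is a minimal Python sketch of that idea, not nova's code: the names `DelayedTask` and `DelayingCancellableExecutor` are hypothetical, and waking early on cancellation via `threading.Event.wait()` is a choice made for this sketch.

```python
import queue
import threading
import time
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class DelayedTask:
    """Handle for a delayed task (hypothetical name, not nova's API)."""

    def __init__(self, deadline: float, fn: Callable[..., Any], args, kwargs):
        self.deadline = deadline
        self.fn = fn
        self.args = args
        self.kwargs = kwargs
        self.future: Optional[Future] = None  # set once handed to the real executor
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        # Only effective while the task is still waiting out its delay.
        self._cancelled.set()

    def cancelled(self) -> bool:
        return self._cancelled.is_set()

    def wait_for_cancel(self, timeout: float) -> bool:
        # True as soon as cancel() is called, False once the timeout expires.
        return self._cancelled.wait(timeout)


class DelayingCancellableExecutor:
    """Submit tasks to a real Executor after a fixed delay (sketch)."""

    _STOP = object()

    def __init__(self, executor: Executor, delay: float):
        self._executor = executor
        self._delay = delay
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn: Callable[..., Any], *args, **kwargs) -> DelayedTask:
        task = DelayedTask(time.monotonic() + self._delay, fn, args, kwargs)
        self._queue.put(task)
        return task

    def _run(self) -> None:
        while True:
            task = self._queue.get()
            if task is self._STOP:
                return
            # Same delay for every task, so FIFO order is also deadline order.
            if task.cancelled():
                continue
            remaining = task.deadline - time.monotonic()
            if remaining > 0 and task.wait_for_cancel(remaining):
                continue  # cancelled during its delay period
            # Final check right before handing off to the real executor.
            if task.cancelled():
                continue
            task.future = self._executor.submit(task.fn, *task.args, **task.kwargs)

    def shutdown(self) -> None:
        # Tasks queued before this call still wait out their delay first.
        self._queue.put(self._STOP)
        self._thread.join()


# Usage: a STOPPED event cancelled by a follow-up lifecycle event.
pool = ThreadPoolExecutor(max_workers=2)
delayer = DelayingCancellableExecutor(pool, delay=0.2)
fired = []
stopped = delayer.submit(fired.append, "STOPPED")
stopped.cancel()  # e.g. STARTED arrived for the same domain
other = delayer.submit(fired.append, "other")
delayer.shutdown()
pool.shutdown(wait=True)
print(fired)  # ['other']
```

Because every task gets the same delay, a plain FIFO queue is already sorted by deadline, so one worker thread suffices and no priority queue or per-task timer thread is needed.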

61941 Commits
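Commits 9e678b83eb and efd61188c1 describe how the shared long task executor is sized in native threading mode: the build and snapshot limits are expected to match, a warning is logged if they differ, the larger one becomes the executor size, and the deprecated "unlimited" value is replaced by a strict maximum. The Python sketch below illustrates that sizing rule under assumptions: it treats 0 as "unlimited" and uses 64 as the strict cap only because efd61188c1 mentions that as the old RPC executor default. The function names and the cap are hypothetical, not taken from nova.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)

# Hypothetical strict maximum applied when "unlimited" is requested; the
# commit message does not state the value nova actually enforces.
ASSUMED_UNLIMITED_CAP = 64


def effective_limit(configured: int, cap: int = ASSUMED_UNLIMITED_CAP) -> int:
    """Treat 0 (assumed to mean "unlimited") as the strict cap."""
    return cap if configured == 0 else configured


def long_task_executor_size(max_builds: int, max_snapshots: int) -> int:
    builds = effective_limit(max_builds)
    snapshots = effective_limit(max_snapshots)
    if builds != snapshots:
        LOG.warning(
            "max_concurrent_builds (%d) and max_concurrent_snapshots (%d) "
            "differ; using %d as the shared long task limit.",
            builds, snapshots, max(builds, snapshots))
    # One shared pool: its size is the only limit, so a single operation
    # type (e.g. only builds) may occupy every worker.
    return max(builds, snapshots)


def make_long_task_executor(max_builds: int, max_snapshots: int) -> ThreadPoolExecutor:
    return ThreadPoolExecutor(
        max_workers=long_task_executor_size(max_builds, max_snapshots),
        thread_name_prefix="long_task")


print(long_task_executor_size(10, 5))  # 10, and a warning is logged
print(long_task_executor_size(0, 0))   # 64
```

In eventlet mode, by contrast, 9e678b83eb keeps one large executor and enforces each configured limit with its own semaphore, so the per-type limits still apply there.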