Commit Graph

61661 Commits

Author SHA1 Message Date
Zuul 9efabaa993 Merge "Reproducer for bug 2114951" 2025-09-08 17:48:28 +00:00
Zuul 0dd7cb1fb0 Merge "libvirt: Disable VMCoreInfo device for SEV-encrypted instances" 2025-09-05 16:32:24 +00:00
Zuul 7d5521ac84 Merge "[pci]Keep used dev in Placement regardless of dev_spec" 2025-09-05 15:36:30 +00:00
Zuul ab744e5040 Merge "[PCI tracker]Remove non configured devs when freed" 2025-09-05 15:36:13 +00:00
Zuul cb6ed0e3c0 Merge "Reproduce bug/2115905" 2025-09-05 14:40:57 +00:00
Balazs Gibizer 4495f1f019 [pci]Keep used dev in Placement regardless of dev_spec
This changes the PCI Placement translator edge case handling logic to
resolve a bug preventing VM deletion.

If a device is allocated but removed from the dev_spec, then we need to
keep the device in Placement; otherwise the Placement update will be
rejected, as we would be trying to delete an RP that has allocations.
Such a rejection prevents the deletion of the VM that is using the
removed device.

The alternative would be to not allow the nova-compute service to
start if it detects this situation. However, this situation can
happen in at least two very different cases:
1. The admin removed a dev_spec. In this case adding the dev_spec back,
   removing the VM, then removing the dev_spec again is the right course
   of action, and nova-compute failing to start would be an acceptable
   way to enforce this.

2. A device disappeared because the HW has died. In this case not
   allowing nova-compute to start up would prevent the admin from
   migrating the other VMs away from the host before doing a HW
   replacement.

Note that this is a fairly complex change, because based purely on the
PciDevice object we cannot differentiate between two cases:

1. A PciDevice object is being removed because the related device spec
   was removed from the configuration or the device has disappeared
   from the hypervisor.

2. A PciDevice object was held back for a while because the device spec
   was removed (or the device disappeared from the hypervisor) while the
   device was allocated to a VM, and now that VM is undergoing deletion.

In both cases the PCI in Placement logic sees a PciDevice object in the
REMOVED status with dev.instance_uuid = None. However, the two cases
require different handling:

1. The related inventory can be removed from Placement

2. The related inventory cannot be removed from Placement, as it is
   still allocated to the VM that is undergoing deletion.

The second case is due to the sequence of events during a VM deletion:
* We destroy the VM on the hypervisor.
* We update the PCI tracker to free the device. As the device was held
  back, the tracker not only frees the device but also removes it,
  because it is no longer configured in the dev_spec and therefore must
  not go back to the AVAILABLE state.
* When the PCI tracker is updated it calls the PCI in Placement logic
  to update Placement inventories as well. At this point the VM deletion
  is still in progress and the VM's allocation hasn't been deleted in
  Placement, so the Placement inventory cannot be removed as it is still
  allocated.
* After the resource tracker update is finished the compute manager
  deletes the VM's allocation in Placement.

So in this edge case we temporarily keep the Placement inventory and
only remove it in a subsequent periodic run, once we are sure the
VM's allocation is gone. This means there is a time window when
the Placement inventory shows an extra resource even though that
resource has already been removed from the PCI tracker. During this
window the scheduler might select a host based on this ghost inventory,
and the compute resource tracker will reject the boot request, forcing
a normal re-schedule.
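The keep-then-remove behavior described above can be sketched roughly like this (a minimal illustration; the function name and data shapes are assumptions, not the actual nova PCI-in-Placement translator code):

```python
def surviving_inventories(configured_devs, allocated_devs):
    """Decide which device inventories must stay in Placement.

    configured_devs: addresses still present in the PCI tracker / dev_spec
    allocated_devs: addresses whose RPs still hold allocations in Placement
    """
    # A removed device whose VM deletion is still in flight keeps its
    # inventory; a later periodic run drops it once the allocation is gone.
    return set(configured_devs) | set(allocated_devs)
```

Once the compute manager deletes the VM's allocation, the address drops out of the allocated set and the next periodic run removes the ghost inventory.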

Closes-Bug: #2115905
Change-Id: Ie9d311ea9f59ff49593003e3773b690dd36fdeb2
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:05:20 +00:00
Balazs Gibizer f37cdf0c41 [PCI tracker]Remove non configured devs when freed
The PCI tracker handles the case when a device spec is removed from
the configuration while a device is still being allocated. It keeps the
device until the VM is deleted to avoid inconsistencies.

However, fully removing such a device requires not just the VM
deletion but also a nova-compute restart. The PCI tracker only frees
the device during VM deletion and does not remove it until the next
nova-compute startup. This allows the device to be re-allocated to
another VM even though the device is no longer allowed by a device_spec.

This change adds yet another in-memory dict to the PCI tracker to
track the devices that are only kept until they are freed. Then during
free() this dict is consulted, and if the device is found there, the
device is marked for removal as well.

This kills two birds with one stone:

* We prevent the re-allocation of the device, as the state of the
  device will be set to REMOVED, not AVAILABLE, during VM deletion.

* As PCI in Placement relies on the state of the device to decide what
  to track in Placement, this change makes sure that a device that
  needs to be removed is now removed from Placement too. Note that
  another bug currently prevents this removal, but at least the
  reproducers of that bug now behave the same regardless of how many
  devices belong to the same RP in Placement.
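The free-time removal can be sketched as follows (the class, dict, and method names here are illustrative assumptions, not the actual nova PCI tracker implementation):

```python
class PciTrackerSketch:
    """Toy model of tracking devices whose dev_spec entry is gone."""

    def __init__(self):
        self.devices = {}              # address -> device dict
        # devices kept only until their current VM frees them
        self.dev_pending_removal = {}  # address -> device dict

    def track_unconfigured_allocated(self, dev):
        # the device spec is gone but a VM still holds the device
        self.devices[dev['address']] = dev
        self.dev_pending_removal[dev['address']] = dev

    def free(self, address):
        dev = self.devices[address]
        if address in self.dev_pending_removal:
            # no longer whitelisted: remove instead of making it
            # AVAILABLE, so it can never be re-allocated
            dev['status'] = 'REMOVED'
            del self.dev_pending_removal[address]
        else:
            dev['status'] = 'AVAILABLE'
        dev['instance_uuid'] = None
        return dev
```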

Related-Bug: #2115905
Change-Id: I63c8fb2669a3c6b3adb77d210c0f9b39d3657c80
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:30:19 +02:00
Balazs Gibizer d86aa2d15a Reproduce bug/2115905
Both the PCI tracker and the PCI in Placement logic handle the case
when a device spec is removed from the configuration while a device
is still allocated.

However, there are edge cases in PCI in Placement that are not handled
well. Namely, if the VM with this allocation is deleted, then
depending on the number of VFs the PF originally had, the logic might
try to delete the RP before the allocation is removed. That is
rejected by Placement. This prevents the deletion of such a VM and
therefore blocks one of the ways the original inconsistency can be
resolved.

Note that with this patch we see two additional behaviors worth
mentioning:
* When the VM is successfully deleted (in the single VF or PF case) the
  PCI tracker still keeps the now-free device in the DB, and therefore
  PCI in Placement also keeps the RP. This keeps the non-whitelisted
  device available for allocation until the next nova-compute restart.

* The PCI in Placement logic behaves differently depending on whether
  the last device is removed from an RP, or other devices remain on the
  RP, some of which can be removed and some of which cannot due to
  allocations.

Related-Bug: #2115905
Change-Id: Ib3febb77299da65ada24ed49849c04cbf3c41af1
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-04 10:19:31 +02:00
Zuul 9f156aa954 Merge "Fix 'nova-manage image_property set' command" 2025-09-03 17:29:24 +00:00
Zuul 74e4ff46db Merge "Do not yield in threading mode" 2025-09-03 16:59:21 +00:00
René Ribaud aa59133626 Reproducer for bug 2114951
An ambiguous regexp prevents using a device_filename like
'mkwinimage-cdrom'. The regexp matches a single character in the range
between '_' (code point 95) and 'r' (code point 114), case sensitively.
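The pitfall can be illustrated with a small standalone example (the patterns below are illustrative, not the actual nova regexp):

```python
import re

# An unescaped '-' between two characters inside a character class
# creates a RANGE, here from '_' (code point 95) to 'r' (code point 114).
ambiguous = re.compile(r'[_-r]')

# Escaping the hyphen (or placing it first/last in the class) makes it
# a literal '-' instead of a range operator.
literal = re.compile(r'[_\-r]')
```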

Related-Bug: #2114951
Change-Id: I5c7ce18eb635a75d5aadc889e730ed77c9a10dc3
Signed-off-by: René Ribaud <rribaud@redhat.com>
2025-09-03 18:38:52 +02:00
Zuul fa31983299 Merge "[CI]Make nova-tox-py312-threading voting" 2025-09-03 10:40:39 +00:00
Zuul 6fa7f807ad Merge "Fix duplicate words" 2025-09-03 10:24:05 +00:00
Zuul a4df1dea8c Merge "Fix pci_tracker.save to delete all removed devs" 2025-09-02 20:20:45 +00:00
Zuul ba2d41e463 Merge "Add service version for Flamingo" 2025-09-02 20:20:10 +00:00
René Ribaud 60ba6afc49 Add service version for Flamingo
We agreed by I2dd906f34118da02783bb7755e0d6c2a2b88eb5d on the support
envelope.
Pre-RC1, we need to add a service version in the object.
Post-RC1, depending on whether it is a SLURP release or not, we need to
bump the minimum version or not.

This patch only focuses on the pre-RC1 stage.
Given that Gazpacho won't be skippable, we won't need a post-RC1 patch
to update the minimum version, which will continue to support Epoxy.

HTH.

Signed-off-by: René Ribaud <rribaud@redhat.com>
Change-Id: I5bf6ad1077fe62e6ff628d211b745857167280fb
2025-09-02 15:51:00 +02:00
René Ribaud 73724fef9a doc: mark the maximum microversion for 2025.2 Flamingo
Change-Id: I4158fc072ebeda7709bc08eb7d0b924cbc99ca5a
Signed-off-by: René Ribaud <rribaud@redhat.com>
2025-09-02 15:37:02 +02:00
Rajesh Tailor 68fbace8af Fix duplicate words
This change removes duplicated consecutive words from the docs
as well as from the code.

Signed-off-by: Rajesh Tailor <ratailor@redhat.com>
Change-Id: I236ff41fccf831023b6f85840097148a30e84743
2025-09-02 18:06:31 +05:30
Zuul 9c1d971f01 Merge "Reproduce that only half of the PCI devs are removed" 2025-09-02 11:08:42 +00:00
Rajesh Tailor 19f206f58c Fix 'nova-manage image_property set' command
Currently, if an operator wants to set traits using the 'nova-manage
image_property set' command, it fails with the error below, because
ImageMetaProps does not store traits as individual fields but in the
'traits_required' field, which is of type list.

'Invalid image property name trait:CUSTOM_XYZ'

The setting of traits is handled by the _set_attr_from_trait_names
method here [1].

This change handles the issue by continuing the loop if the property
name starts with 'trait'.

[1] https://opendev.org/openstack/nova/src/commit/725a307693806e6e32834198e23be75f771bebc1/nova/objects/image_meta.py#L708-L714
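A minimal sketch of the fix (the function shape here is an assumption; the real logic lives in nova-manage and delegates trait handling to _set_attr_from_trait_names):

```python
def set_properties(stored_props, updates, known_fields):
    """Apply image property updates, routing trait:* names into the
    list-typed 'traits_required' field instead of validating them as
    individual fields."""
    for name, value in updates.items():
        if name.startswith('trait'):
            # the fix: skip per-field validation and aggregate the trait
            stored_props.setdefault('traits_required', []).append(
                name.split(':', 1)[1])
            continue
        if name not in known_fields:
            raise ValueError('Invalid image property name %s' % name)
        stored_props[name] = value
    return stored_props
```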

Closes-Bug: #2096341
Change-Id: Ifc20894801f723627726e3c9bed7076144542660
Signed-off-by: Rajesh Tailor <ratailor@redhat.com>
2025-09-02 12:22:55 +05:30
Zuul 539e971126 Merge "Follow-up of AMD SEV-ES support" 2025-09-01 11:59:27 +00:00
Zuul aed238c064 Merge "Drop CentOS 8 Stream" 2025-09-01 11:30:40 +00:00
Zuul e700b18f2b Merge "Replace remaining usage of Ubuntu Jammy" 2025-09-01 11:30:28 +00:00
Zuul 8ddf918a0b Merge "[test]RPC using threading or eventlet selectively" 2025-09-01 10:11:38 +00:00
Zuul 023c1eab47 Merge "Run unit test with threading mode" 2025-09-01 10:11:11 +00:00
Zuul 29eaf28acc Merge "Update min support for Flamingo" 2025-08-31 18:13:06 +00:00
Zuul 4301fc390e Merge "api: Fix validators for hw:cpu_max_* extra specs" 2025-08-31 18:12:45 +00:00
Takashi Kajinami 583d88308f Replace remaining usage of Ubuntu Jammy
Ubuntu Jammy is no longer supported since 2025.2. Replace it with
Ubuntu Noble, which is used in the other jobs.

Change-Id: I790fb06ede2c41cb80b3d2e8ff7faa7315c84016
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-31 16:36:44 +09:00
Zuul 7b8e054bd2 Merge "api: Correct expected errors" 2025-08-29 21:12:29 +00:00
Takashi Kajinami 79846eb0d0 libvirt: Disable VMCoreInfo device for SEV-encrypted instances
When the VMCoreInfo device is enabled, the QEMU fw_cfg device in the
guest OS requires DMA between the host OS and the guest OS through the
device. However, DMA is prohibited when guest memory is encrypted using
SEV, and the attempt results in a kernel crash.

Do not add the VMCoreInfo device when memory encryption is enabled.
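The guard amounts to something like the following (the function name is an assumption; in nova the check sits in the libvirt driver when building the guest configuration):

```python
def should_add_vmcoreinfo(mem_encryption_enabled):
    """Decide whether the vmcoreinfo device may be attached to a guest.

    fw_cfg needs DMA between host and guest, which SEV-encrypted guest
    memory forbids, so skip the device for encrypted guests.
    """
    return not mem_encryption_enabled
```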

Closes-Bug: #2117170
Change-Id: I05c7b1ae46ccd8d9aa42456b493ac6ee7ddd8bae
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-29 21:19:10 +09:00
Zuul 07ab08aa69 Merge "Allow to start unit test without eventlet" 2025-08-29 04:57:32 +00:00
Takashi Kajinami 87385d2411 Follow-up of AMD SEV-ES support
Address a few improvements we agreed to cover in follow-ups.

Also fix a few problems detected during the code update:
 - Fix the SEV-ES RP not being purged when SEV and SEV-ES are disabled
   at the same time. The previous logic required 2 cycles, which is
   unnecessary.
 - Fix the lack of the NOKS policy in SEV-ES.

Change-Id: I59866d39fcc6720e338c6736dffab4fd56b853da
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-29 13:54:19 +09:00
Zuul dcf90dbb25 Merge "Ask for pre-prod testing for native threading" 2025-08-29 04:35:24 +00:00
Zuul ce9dcea024 Merge "Purge nested SEV RPs when SEV is disabled" 2025-08-28 23:27:04 +00:00
Zuul c6aa3a9fa9 Merge "Add functional test scenario for mixed SEV RPs" 2025-08-28 23:25:14 +00:00
Zuul 32d76d08cb Merge "libvirt: Launch instances with SEV-ES memory encryption" 2025-08-28 23:24:30 +00:00
Zuul f4ca2e3ef9 Merge "Add hw_mem_encryption_model image property" 2025-08-28 21:03:27 +00:00
Zuul d5134798de Merge "Detect AMD SEV-ES support" 2025-08-28 20:36:36 +00:00
Zuul a5670dc442 Merge "Migrate MEM_ENCRYPTION_CONTEXT from root provider" 2025-08-28 20:36:20 +00:00
Takashi Kajinami a8386bdab3 Purge nested SEV RPs when SEV is disabled
We can determine the exact names of these RPs from the compute node
name, independently of how nova is configured, so we can easily purge
these RPs.

Change-Id: I0a18e3a3750137061e04765f2feaf4889c6f5606
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:50:42 +09:00
Takashi Kajinami af287b71c4 Add functional test scenario for mixed SEV RPs
As a follow-up to change Iad51c32d0f64ef52513bd2f2b517c91f29c63787,
add a functional test scenario to ensure that new instances can be
created even when a cluster has both a compute node with the old SEV RP
and another with the reshaped SEV RP, simulating a real-world upgrade
of an existing cluster with the SEV feature enabled.

Change-Id: I2c576f8de05b69ab51743db53acf52bc2a35eb59
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:50:15 +09:00
Takashi Kajinami 4f5a3f3c00 libvirt: Launch instances with SEV-ES memory encryption
This is the last piece to allow users to request AMD SEV-ES for memory
encryption instead of AMD SEV. The CPU feature for memory encryption
can now be requested via the hw:mem_encryption_model flavor extra spec
or via the hw_mem_encryption_model image property.

Implements: blueprint amd-sev-es-libvirt-support
Change-Id: Ifc9b86ad7db887cc22b2cd252fe8adc81fdc29c6
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:47:49 +09:00
Takashi Kajinami dc6641baad Add hw_mem_encryption_model image property
This is prep work to support launching instances with AMD SEV-ES memory
encryption and adds the object field to select the CPU feature to
encrypt and protect memory data of instances.

Partially-Implements: blueprint amd-sev-es-libvirt-support
Change-Id: I71fde5438d4e22c9e2566f8a684c5a965a7f3dd3
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:47:49 +09:00
Takashi Kajinami 6c0a689d80 Detect AMD SEV-ES support
Detect AMD SEV-ES support by kernel/qemu/libvirt and generate a nested
RP for ASID slots for SEV-ES under the compute node RP.

Deprecate the [libvirt] num_memory_encryption_guests option because
the option is effective only for SEV, and now the maximum numbers for
SEV/SEV-ES guests can be detected by domain capabilities presented by
libvirt.

Note that creating an instance with memory encryption enabled now
requires the AMD SEV trait, because these instances can't run with the
SEV-ES slots, which are added by this change.

Partially-Implements: blueprint amd-sev-es-libvirt-support
Change-Id: I5968e75325b989225ed1fc6921257751ae227a0b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:47:45 +09:00
Ghanshyam Maan f914cb185c Add service role in Nova policy
The RBAC community-wide goal phase 2 [1] is to add the 'service'
role to the policy rules of the service APIs. This commit
defaults the service APIs to the 'service' role, so that
service APIs are allowed only for service users.

Tempest tests are also modified to simulate service-to-service
communication: Tempest sends a user with the service role to the nova
API.
- https://review.opendev.org/c/openstack/tempest/+/892639

Partially-Implements: blueprint policy-service-role-default

[1] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#phase-2

Change-Id: I1565ea163fa2c8212f71c9ba375654d2aab28330
Signed-off-by: Ghanshyam Maan <gmaan@ghanshyammann.com>
2025-08-27 19:34:04 +00:00
Balazs Gibizer ea50365cce Do not yield in threading mode
If a service runs in threading mode, nova.utils.cooperative_yield is a
no-op, as yielding is only necessary under eventlet.
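A rough sketch of this behavior (detecting threading mode via the env variable below is an assumption; nova's real helper is nova.utils.cooperative_yield and its mode detection differs in detail):

```python
import os

def cooperative_yield(threading_mode=None):
    """Yield to other greenthreads under eventlet; no-op under threading."""
    if threading_mode is None:
        # assumed signal: the same env variable the series uses to
        # disable eventlet monkey-patching
        threading_mode = os.environ.get(
            'OS_NOVA_DISABLE_EVENTLET_PATCHING', '').lower() == 'true'
    if threading_mode:
        return  # native threads are preempted by the OS scheduler
    import eventlet
    eventlet.sleep(0)  # give other greenthreads a chance to run
```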

Change-Id: I72a52262f5c501f77d23ed56cbcd1a9c2be72fa7
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 19:03:34 +02:00
Balazs Gibizer 350cdd1b5e [CI]Make nova-tox-py312-threading voting
Change-Id: I6a220d03f7c879af0d714740102b2d84ce61ca69
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 19:03:34 +02:00
Balazs Gibizer 1318cd48a1 [test]RPC using threading or eventlet selectively
The nova test code was hardcoded to run the RPC servers with the
eventlet executor. We change that to be dynamic: based on how the tests
were started, they can use either eventlet or threading.

This makes some of the so-far-hanging RPC-dependent unit tests pass.
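The selection can be sketched like this (the helper name is assumed; the env variable is the one this series uses to disable eventlet patching):

```python
import os

def rpc_executor():
    """Pick the RPC server executor matching how the process started."""
    disabled = os.environ.get(
        'OS_NOVA_DISABLE_EVENTLET_PATCHING', '').lower() == 'true'
    return 'threading' if disabled else 'eventlet'
```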

Change-Id: I5012122fe66d41459b68202e750391a1939d70d9
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 19:03:30 +02:00
Balazs Gibizer 83eed99a9f Run unit test with threading mode
The py312-threading tox target runs the currently working unit tests
in threading mode. We keep an exclude list of tests that are failing or
hanging, and the current test list might still contain unstable tests.

This also adds a non-voting zuul job to run the new target.

Change-Id: Ibf41fede996fbf2ebaf6ae83df8cfde35acb2b7e
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 19:01:35 +02:00
Balazs Gibizer b278240370 Allow to start unit test without eventlet
The end goal is to be able to run at least some of the unit tests
without eventlet, but a few things prevent that for now.

We need to make sure that the oslo.service backend is not initialized
to eventlet by any early import code, before our monkey_patch module
can do the selective backend selection based on the env variable.

The nova.tests.unit module had some import-time code execution that
forces imports that initialize the oslo.service backend too early,
way before nova would do it in normal execution. We could remove
objects.register_all() from nova/tests/unit/__init__.py, as tests seem
to pass without it; still, that alone would not be enough, so I
eventually decided to keep it.

The other issue is that unit test discovery imports all modules
under nova.tests.unit, which eventually imports oslo.messaging, and
that also forces oslo.service backend selection.

So we injected an early call to our smart monkey_patch module to
preempt that. This does not change the set of imported modules, as the
monkey_patch module is imported anyhow via the nova.test module; it
just changes the order so that the oslo.service backend selection
happens explicitly.

After this patch the unit test can be run via

  OS_NOVA_DISABLE_EVENTLET_PATCHING=true tox -e py312

Most of the tests will pass, but a bunch of tests time out or hang.

Change-Id: I210cb6a30deaee779d55f88f0f57584c65b0dc05
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 18:54:26 +02:00