2476 Commits

Author SHA1 Message Date
Johannes Kulik 2a51df2760 Attaching a volume returns HTTP 202
Instead of returning an HTTP 200 and a `volumeAttachment` object,
attaching a volume to an instance returns HTTP 202 starting from API
version 2.101.

To keep the functionality for older API versions, we move the
`_attach_volume()` method from n-api to n-conductor and either do a call
or a cast depending on whether the API needs to return a value.
n-conductor then handles reserving the block_device_mapping's device by
calling n-compute before it starts the previously-already-async volume
attachment.
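
The call-or-cast decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual Nova code: `FakeConductorRPC`, `attach_volume()` and the response body are hypothetical stand-ins, and only the 2.101 threshold comes from the commit message.

```python
from dataclasses import dataclass, field

# Microversion named in the commit message.
ASYNC_ATTACH_MICROVERSION = (2, 101)


@dataclass
class FakeConductorRPC:
    """Hypothetical stand-in for an RPC client; records how it was used."""
    log: list = field(default_factory=list)

    def call(self, method, **kwargs):
        # Blocking: waits for the remote side and returns its result.
        self.log.append(("call", method))
        return {"volumeAttachment": {"volumeId": kwargs["volume_id"]}}

    def cast(self, method, **kwargs):
        # Fire-and-forget: returns immediately without a result.
        self.log.append(("cast", method))
        return None


def attach_volume(rpc, requested_version, volume_id):
    """Return (http_status, body) for an attach request (illustrative)."""
    if requested_version >= ASYNC_ATTACH_MICROVERSION:
        # Newer clients do not need a body, so a cast is enough.
        rpc.cast("attach_volume", volume_id=volume_id)
        return 202, None
    # Older clients expect the attachment back, so we must wait.
    body = rpc.call("attach_volume", volume_id=volume_id)
    return 200, body
```

The blocking branch is the one that needs the long RPC timeout, because it waits for the device reservation on the compute host.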

We have to move `_check_attach_and_reserve_volume` into compute utils,
because it's getting called in both conductor and compute api (for the
shelved offloaded attach).

The new RPC method in the conductor needs a long timeout when used with
API versions less than the new 2.101, because it waits for the call to
`reserve_block_device_name()` in nova-compute which already needs a long
timeout.

Updating the functional tests' `post_server_volume()` and
`_attach_volume()` to not return the attachment anymore is possible, as
no test uses the returned values.

Change-Id: I4d38c2679f0e88cca30055a9c8c45ba1dd6fb5ef
Implements: blueprint async-volume-attachments
Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>
2026-02-18 15:02:21 +01:00
Nicolai Ruckel 35b1945522 Preserve UEFI NVRAM variable store
Preserve NVRAM variable store during stop/start, hard reboot, live
migration, and volume retype.

This does not affect cold migration or shelve.

For UEFI guests (hw_firmware_type=uefi), every time the instance is
started, the UEFI variable storage for that instance
(/var/lib/libvirt/qemu/nvram/instance-xxxxxxxx_VARS.fd) is deleted
and reinitialized from the default template.

The changes are based on this patch by Jonas Schäfer to preserve the
vTPM state:
https://review.opendev.org/c/openstack/nova/+/955657

Closes-Bug: #1633447
Closes-Bug: #2131730
Change-Id: I444a9285c07a04bf08a73772235f8dd73d75e513
Signed-off-by: Nicolai Ruckel <nicolai.ruckel@cloudandheat.com>
2026-02-13 23:55:41 +01:00
Sean Mooney 264e868d49 Support os-vif TAP pre-creation for OVS/OVN ports
Add support for os-vif TAP device pre-creation when Neutron sets
the 'ovs_create_tap' flag in vif_details. This reduces live
migration downtime by ensuring the network is fully wired before
the VM starts.

Changes:
- Add VIF_DETAILS_OVS_CREATE_TAP constant to model.py
- Propagate create_tap from binding details to os-vif port profile
  in os_vif_util.py
- Set managed='no' in libvirt XML when create_tap is enabled so
  libvirt uses the pre-created TAP device
- Set multiqueue on port profile in _plug_os_vif based on instance
  flavor/image hw:vif_multiqueue_enabled property

When checking oslo.versionedobjects fields for backward compat:
- Use 'field in obj.fields' to check if field exists in schema
- Use 'field in obj' to check if field value is set
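
The two membership checks above can be illustrated with a small stand-in class. This sketch does not import oslo.versionedobjects; `FakeVersionedObject` and `read_optional_field()` are hypothetical and only mimic the behaviour the commit message describes.

```python
class FakeVersionedObject:
    """Hypothetical stand-in mimicking the two checks from the commit."""

    # Schema: which fields this object version knows about.
    fields = {"create_tap": bool, "datapath_type": str}

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)

    def __contains__(self, name):
        # True only when a value has actually been set on this instance.
        return name in self.__dict__


def read_optional_field(obj, name, default=None):
    """Read a field that may be missing from the schema or simply unset."""
    if name not in obj.fields:
        # Older object version: the field is not part of the schema.
        return default
    if name not in obj:
        # The field exists in the schema but no value was set.
        return default
    return getattr(obj, name)
```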

Depends-On: https://review.opendev.org/c/openstack/os-vif/+/971231
Generated-By: Cursor claude-opus-4.5
Closes-Bug: #2069718
Change-Id: I32343658b53e317696d1bd8b984793bfeeccd409
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-05 18:55:06 +00:00
Sean Mooney c8d34ed3dc Fix blockio generation for LUN volumes
QEMU's scsi-block device driver does not support physical_block_size
and logical_block_size properties. When Cinder reports disk geometry
for LUN volumes, Nova was incorrectly including a <blockio> element
in the libvirt XML, causing QEMU to fail with:

    Property 'scsi-block.physical_block_size' not found

This fix adds a check to skip blockio generation when source_device
is 'lun', following the existing pattern used for serial at line 1356.
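
The check described above might look roughly like this. It is a simplified sketch built on `xml.etree.ElementTree`, not Nova's libvirt config classes; `build_disk_element()` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_disk_element(source_device, logical_block_size=None,
                       physical_block_size=None):
    """Build a minimal <disk> element (illustrative, not Nova's code)."""
    disk = ET.Element("disk", type="block", device=source_device)
    has_geometry = (logical_block_size is not None
                    or physical_block_size is not None)
    # Per the commit message, scsi-block (used for LUNs) rejects the
    # block size properties, so <blockio> is skipped for 'lun' devices.
    if has_geometry and source_device != "lun":
        blockio = ET.SubElement(disk, "blockio")
        if logical_block_size is not None:
            blockio.set("logical_block_size", str(logical_block_size))
        if physical_block_size is not None:
            blockio.set("physical_block_size", str(physical_block_size))
    return disk
```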

Generated-By: claude-code (Claude Opus 4.5)
Closes-Bug: #2127196
Change-Id: Idf87e936edd97aac719222942c9842a9aca4c270
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-03 22:15:19 +00:00
Zuul 7579dbdf0e Merge "Use *_OR_ADMIN policy defaults for server shares" 2026-01-23 05:00:53 +00:00
Zuul 8fe5d3ce75 Merge "Faults from cell DB missing in GET /servers/detail" 2026-01-23 05:00:40 +00:00
elajkat 76d64b9cb4 blueprint: iothreads-for-instances
Enable one io-thread per qemu instance.

Related-Bug: iothreads-for-instances
Change-Id: I8b22e5bca560d111934fbdf67494a4e288b9e50a
Signed-off-by: lajoskatona <lajos.katona@est.tech>
2026-01-19 16:17:47 +01:00
Zuul 68cec593a7 Merge "Compute manager to use thread pools selectively" 2026-01-16 21:03:28 +00:00
Balazs Gibizer 3c23390cc8 Compute manager to use thread pools selectively
This changes the thread pool usage of the ComputeManager to go through
the concurrency mode aware util functions.

The concurrent live migration pool had a seemingly unlimited option
when configured with value 0, but in reality GreenThreadPool has a
default worker size of 1000. It is almost never right to have more
than one live migration running concurrently, and with native
threading having 1000 workers is just too costly. So we decided to
deprecate the value 0 and changed the implementation of unlimited to
mean 5 threads in native threading mode. We kept the 1000
greenthreads in eventlet mode for backward compatibility.
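
As a rough illustration of the sizing rule above, assuming a hypothetical helper name and signature (only the numbers 0, 5 and 1000 come from the commit message):

```python
def live_migration_pool_size(configured, native_threading):
    """Map the configured value to a worker count (illustrative only)."""
    if configured < 0:
        raise ValueError("pool size must not be negative")
    if configured == 0:
        # Deprecated "unlimited" value: capped at 5 native threads,
        # while eventlet mode keeps the historical 1000 greenthreads.
        return 5 if native_threading else 1000
    return configured
```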

The _sync_power_states periodic task also spawns a task for each instance
to be synced. As it uses a data structure shared between these tasks
and the caller, a lock is needed to avoid race conditions.
Also, the default pool size for these tasks is 1000 in our configuration.
That would use a lot of memory on a busy host in native threading mode,
so we changed the default value from 1000 to 5.
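
The locking pattern mentioned above can be sketched with the standard library. This is not the ComputeManager code; `sync_power_states()` and `query_power_state` are hypothetical names, and the default of 5 workers mirrors the new default from the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sync_power_states(instance_uuids, query_power_state, max_workers=5):
    """Query each instance in a small pool and collect the results."""
    results = {}
    lock = threading.Lock()

    def sync_one(uuid):
        # The slow query runs outside the lock so workers do not block
        # each other; only the shared dict update is guarded.
        state = query_power_state(uuid)
        with lock:
            results[uuid] = state

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the iterator re-raises any worker exception.
        list(pool.map(sync_one, instance_uuids))
    return results
```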

Change-Id: I9567d5fabdf086b5d0493103d9f6bde4f66af387
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-01-16 09:47:42 +01:00
Zuul 80753c5745 Merge "Upgrade note for concurrency mode default change" 2026-01-14 21:23:21 +00:00
Balazs Gibizer f73a23b4d4 Upgrade note for concurrency mode default change
This is a follow-up to the release notes added in commit
35207ee8b5, which changed the default mode
for the scheduler and the API services. At that time we failed to note
the upgrade impact of that change, so this patch extends the reno with
an upgrade note.

Change-Id: I280e7eb9c1da6eeaf50e96e8b19e296961f2651a
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-01-14 13:29:07 +01:00
Zuul 88c538a897 Merge "libvirt: Skip unsupported firmware types" 2026-01-06 12:01:12 +00:00
Ivaylo Mitev fb661ec597 Faults from cell DB missing in GET /servers/detail
The `fault` field is empty in the response of GET /servers/detail if the
instance (and hence its instance_faults DB entry) is in a nova cell DB.
In contrast, GET /servers/:id retrieves the fault correctly no matter
which nova cell the instance belongs to.

Closes-Bug: #1856329
Change-Id: I1726f53cfeac0a67a5dacdddda2af2cc1db0af0f
Signed-off-by: Marius Leustean <marius.leustean@sap.com>
2025-12-17 11:51:38 +02:00
Zuul f268b385dd Merge "Use consistent program name for wsgi scripts and entry points" 2025-12-08 22:18:34 +00:00
Zuul 1712ae48e3 Merge "libvirt: add configuration option for volume AIO mode" 2025-12-05 05:20:30 +00:00
Zuul 5d3d0c870a Merge "ensure correct cleanup of multi-attach volumes" 2025-12-04 07:00:30 +00:00
Takashi Kajinami 253aaec4bb Use consistent program name for wsgi scripts and entry points
Make sure that a consistent program name is always set, so that
the same config sub-directory ( /etc/{project}/{prog}.conf.d ) is used
regardless of how the api service is run.

Closes-Bug: #2098514
Change-Id: Ib5c6d431176b83eefafddc1b35589015db6dfd04
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-12-02 02:57:25 +09:00
Takashi Kajinami d2188b9e6b libvirt: Skip unsupported firmware types
Ignore (1) stateless mode firmware and (2) memory device firmware, which
do not include a few core keys such as nvram-template. This is
a temporary (and backportable) workaround until firmware detection using
libvirt's internal feature is implemented by [1].

[1] https://blueprints.launchpad.net/nova/+spec/libvirt-firmware-auto-selection

Closes-Bug: #2122288
Change-Id: I99bc36fdd5df816c9ae374db71e4734fb7fc467b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-11-30 02:24:32 +09:00
Jay Faulkner 56cb5f52fb [ironic] Ensure unprovision happens for new states
States were added to the Ironic API to enable the node servicing
feature, which can be performed on nodes provisioned with Nova
instances. Current nova, if asked to delete these instances, will only
remove the instance metadata and not tear them down.

This change has two parts:
- I have added the new, relevant states to _UNPROVISION_STATES in
  driver.py, which now allows Nova to know that SERVIC* states and
  DEPLOYHOLD are safe to unprovision from.
- I have added all existing ironic states to ironic_states.py and the
  PROVISION_STATE_LIST constant and check the state against it -- in a
  case where a completely unknown state is returned, we should attempt
  an unprovision.
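
The two-part check above could be sketched like this. The state sets below are abbreviated, illustrative stand-ins for _UNPROVISION_STATES and PROVISION_STATE_LIST, and `should_unprovision()` is a hypothetical helper, not the driver's actual function.

```python
# Illustrative, abbreviated stand-in for PROVISION_STATE_LIST.
KNOWN_PROVISION_STATES = frozenset({
    "available", "active", "deploying", "deploy failed", "deploy hold",
    "servicing", "service wait", "service failed", "service hold",
    "cleaning", "deleting", "error",
})

# Illustrative stand-in for _UNPROVISION_STATES.
UNPROVISION_STATES = frozenset({
    "active", "deploy failed", "deploy hold", "error",
    "servicing", "service wait", "service failed", "service hold",
})


def should_unprovision(provision_state):
    """Decide whether deleting the instance should tear down the node."""
    if provision_state not in KNOWN_PROVISION_STATES:
        # Completely unknown state: attempt an unprovision rather than
        # silently dropping only the instance metadata.
        return True
    return provision_state in UNPROVISION_STATES
```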

This fix needs to be backported as far as possible, as this bug has
existed since Antelope / 2023.1 (DEPLOYHOLD) or Bobcat / 2023.2
(SERVIC*).

Assisted-by: Claude Code
Closes-bug: #2131960
Change-Id: I31c70d35b0e6e9f8d2252bfb2f0bdec477cc6cc7
Signed-off-by: Jay Faulkner <jay@jvf.cc>
2025-11-20 15:23:58 -08:00
René Ribaud f017e23b81 Use *_OR_ADMIN policy defaults for server shares
Update the server shares API policies to use
PROJECT_READER_OR_ADMIN and PROJECT_MEMBER_OR_ADMIN instead of
PROJECT_READER and PROJECT_MEMBER.

This aligns the server shares policies with other compute API
policies and ensures administrators can list, attach, show and
detach shares regardless of project policy overrides.

Signed-off-by: René Ribaud <rene.ribaud@gmail.com>
Change-Id: I2b237d56b08e3080475dc500e204298018af29c7
2025-11-20 15:15:00 +01:00
melanie witt c5c1b93d21 libvirt: add configuration option for volume AIO mode
With the NFS, FC, and iSCSI Cinder volume backends, Nova explicitly
sets AIO mode ``io=native`` in the Libvirt guest XML. Operators may set
this option to True in order to defer AIO mode selection to QEMU if
forcing ``io=native`` is not desired.

Closes-Bug: #2129788

Change-Id: I6e51706b5cb8be5becebbafe9108df1ba9e0f69f
Signed-off-by: melanie witt <melwittt@gmail.com>
2025-11-19 12:04:31 -08:00
Zuul 0c33871c36 Merge "Add managed='no' flag to libvirt XML definition for VIF type TAP" 2025-11-18 14:57:17 +00:00
Sean Mooney 22012360c4 ensure correct cleanup of multi-attach volumes
If a host has multiple instances with the same shared
multi-attach volume and you delete them in parallel,
nova needs to correctly clean up the volume connection on
the host when the last instance is removed.

Currently we do not have a volume-level lock to guard the
critical section that determines whether the current disconnect is
removing the final usage of the volume.

This can lead to leaking the volume or other issues as
noted in bug: #2048837

This change introduces a FairLockGuard to ensure we acquire
and release the locks in a fair and ordered manner.
The FairLockGuard is used to lock the server delete with
one lock per multi-attach volume.

This will ensure that disconnects of different volumes can happen
in parallel, but if we are disconnecting the same volume in multiple
greenthreads concurrently they will be serialised.

Assisted-By: Cursor Auto
Closes-Bug: #2048837
Change-Id: I67e10cace451259127a5d7da8fbdf7739afe3e51
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-11-17 13:26:08 +00:00
Zuul 68a0a69c33 Merge "Allow to perform parallel live migrations" 2025-11-07 22:36:34 +00:00
Dmitriy Rabotyagov 25fbf32f22 Allow to perform parallel live migrations
This patch implements parallel live migrations for the libvirt driver.
This is achieved through the introduction of a new configuration
parameter, `live_migration_parallel_connections`.

This eliminates a bottleneck on live migration speed by
establishing multiple connections for memory transfer, thus
leveraging multi-threaded behavior in QEMU.

Implements-blueprint: libvirt-parallel-migrate
Change-Id: I98ff5f07f94d94f3aa0227591f425d532773adb0
Signed-off-by: Dmitriy Rabotyagov <dmitriy.rabotyagov@cleura.com>
2025-11-07 07:17:54 -08:00
Balazs Gibizer 35207ee8b5 Default native threading for sch, api and metadata
This patch switches the default concurrency mode to native threading
for the services that gained native threading support in Flamingo:
nova-scheduler, nova-api, and nova-metadata.

The OS_NOVA_DISABLE_EVENTLET_PATCHING env variable can still be used to
explicitly switch the concurrency mode to eventlet by setting

  OS_NOVA_DISABLE_EVENTLET_PATCHING=false

We also ensure that the cover, docs, py3xx and functional tox targets
still run with eventlet while py312-threading keeps running
with native threading.

Change-Id: I86c7f31f19ca3345218171f0abfa8ddd4f8fc7ea
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-11-06 19:42:24 +01:00
Nell Jerram 6aba55a23f Add managed='no' flag to libvirt XML definition for VIF type TAP
libvirt 9.5.0 and later by default doesn't allow using a pre-created
TAP device; instead it expects to create and manage the TAP device
itself, which is incompatible with how Nova works.  To restore
compatibility with Nova we need to add the managed="no" flag to the
target device section in the XML domain file.

The libvirt change is here[1].  In particular it breaks Calico for
OpenStack, because the Calico plugin (out of tree[2]) uses VIF type TAP.

1. https://github.com/libvirt/libvirt/commit/a2ae3d299cf
2. https://github.com/projectcalico/calico/blob/master/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L217

Many thanks to Masahito Muroi <masahito.muroi@linecorp.com> for
proposing an earlier version of this fix.

Closes-Bug: #2033681
Change-Id: I4a7b4ecf69cfe04c5291e5ca2a76db8829d6e592
Signed-off-by: Nell Jerram <nell@tigera.io>
2025-11-06 12:29:11 +00:00
Zuul 9e5ad07aee Merge "setup: Remove pbr's wsgi_scripts" 2025-11-05 11:15:52 +00:00
Stephen Finucane 5da2dc2060 setup: Remove pbr's wsgi_scripts
This is a technical dead end and not something we're going to be able to
support long-term in pbr. We need to push users away from this. Doing so
highlights quite a few places where our docs need some work, particularly
in light of the recent removal of the eventlet servers.

Change-Id: I2ffaed710fac2612f5337aca5192af15eab46861
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-11-04 16:11:50 +00:00
Johannes Kulik 710ffbb0c5 api: Pre-query not deleted members in server groups
When retrieving multiple - or all - server groups, the code tries to
find not deleted members for each server group in every cell
individually. This is highly inefficient, which is especially noticeable
when the number of server groups rises.

We change this to query all members of all server-groups we will reply
with (i.e. from the already limited list) in advance and pass this set
of existing uuids into the function formatting the server group. This is
more efficient, because we only do one large query instead of up to 1000
times the number of cells.
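
The batching idea above can be sketched as follows. It is simplified and ignores cells: `format_groups()` and `query_existing` are hypothetical names, and the groups are plain dicts rather than Nova objects.

```python
def format_groups(groups, query_existing):
    """Format server groups using a single membership lookup."""
    # Collect every member uuid across all groups in the reply.
    all_members = {uuid for group in groups for uuid in group["members"]}
    # One query up front instead of one query per group.
    existing = query_existing(all_members)
    return [
        {"id": group["id"],
         "members": [u for u in group["members"] if u in existing]}
        for group in groups
    ]
```

With this shape, the number of lookups no longer grows with the number of server groups in the response.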

Change-Id: I3459ce7a8bec9a9e6f3a3b496a3e441078b86af0
Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>
Partial-Bug: #2122109
2025-11-03 11:46:43 +01:00
Zuul 6d5cf6845e Merge "Fix fill_metadata usage for the ImagePropertiesWeigher" 2025-10-16 23:56:01 +00:00
Sylvain Bauza 98885344bd Fix fill_metadata usage for the ImagePropertiesWeigher
When using the weigher, we need to target the right cell context for the
existing instances on the host.
fill_metadata also had an issue: we need to pass the dict value
from the updated dict, keyed by the instance uuid, not the whole dict of
updated instances.

Change-Id: I18260095ed263da4204f21de27f866568843804e
Closes-Bug: #2125935
Signed-off-by: Sylvain Bauza <sbauza@redhat.com>
2025-10-16 11:09:45 +02:00
Zuul cc742602bc Merge "Run nova-conductor in native threading mode" 2025-10-02 15:55:16 +00:00
Balazs Gibizer ec426532c3 Run nova-conductor in native threading mode
Previous patches removed direct eventlet usage from nova-conductor so
now we can run it with native threading as well. This patch documents
the possibility and switches both nova-conductor processes to native
threading mode in the nova-next job.

Change-Id: If26c0c7199cbda157f24b99a419697ecb6618fa6
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-09-22 10:17:39 +00:00
Julien Le Jeune dc51a4271b nova-conductor puts instance in error state
Nova-conductor puts the instance in error state if an unknown exception is
raised in _build_live_migrate_task during the live migration. [1]
The exception comes from _call_livem_checks_on_host, which raises
exception.MigrationPreCheckError if it hits a
messaging.MessagingTimeout exception, for example. [2]
check_can_live_migrate_destination also runs a check on the source
host via check_can_live_migrate_source [3], and this check can also
raise exceptions like MessagingTimeout. That one is not caught properly
because it arrives as a remote error ("Remote error: MessagingTimeout"):
the dest host tries to contact the source host and the source host does
not reply.

[1] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L523
[2] https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L381
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9090

Closes-Bug: #2044235
Change-Id: Ie1f96fee743c235ab35113a9ad1549a67b975839
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
2025-09-15 16:41:01 +02:00
Zuul 87bf7700b8 Merge "reno: Update master for unmaintained/2023.1" 2025-09-12 10:55:00 +00:00
OpenStack Release Bot 71607ef8a5 Update master for stable/2025.2
Add file to the reno documentation build to show release notes for
stable/2025.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.

Sem-Ver: feature
Change-Id: I7d967c1d5b1ac7fa2e601acfa25c3b5c3880056e
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
2025-09-12 08:54:07 +00:00
Zuul 759e03c35d Merge "Add Flamingo prelude section" 2025-09-11 09:03:15 +00:00
Zuul 36c63f1664 Merge "hypervisors: Optimize uptime retrieval for better performance" 2025-09-10 11:36:59 +00:00
René Ribaud 45ddbc2569 Add Flamingo prelude section
Shamelessly copied from the cycle highlights

Signed-off-by: René Ribaud <rribaud@redhat.com>
Change-Id: Ib9de63fe4ccce24921326ef3bcfc690fd4481687
2025-09-10 10:39:44 +02:00
Sean Mooney 567dbe1867 hypervisors: Optimize uptime retrieval for better performance
The /os-hypervisors/detail API endpoint was experiencing significant
performance issues in environments with many compute nodes when using
microversion 2.88 or higher, as it made sequential RPC calls to gather
uptime information from each compute node.

This change optimizes uptime retrieval by:

* Adding uptime to periodic resource updates sent by nova-compute to the
  database, eliminating synchronous RPC calls during API requests
* Restricting RPC-based uptime retrieval to hypervisor types that support
  it (libvirt and z/VM), avoiding unnecessary calls that would always fail
* Preferring cached database uptime data over RPC calls when available
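
The lookup order described in the list above might be sketched like this. The function name, the dict-based node record and the hypervisor type strings are assumptions for illustration; the real identifiers in Nova may differ.

```python
# Hypothetical identifiers for the hypervisor types that support an
# uptime RPC (libvirt and z/VM per the commit message).
RPC_UPTIME_HYPERVISOR_TYPES = frozenset({"libvirt", "zvm"})


def get_uptime(node, rpc_get_uptime):
    """Return uptime for a compute node record, or None if unavailable."""
    cached = node.get("uptime")
    if cached is not None:
        # Prefer the value stored by the periodic resource update.
        return cached
    if node.get("hypervisor_type") not in RPC_UPTIME_HYPERVISOR_TYPES:
        # An RPC call would always fail for this driver; skip it.
        return None
    # Fall back to a synchronous RPC call only when nothing is cached.
    return rpc_get_uptime(node["host"])
```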

Closes-Bug: #2122036
Assisted-By: Claude <noreply@anthropic.com>
Change-Id: I5723320f578192f7e0beead7d5df5d7e47d54d2b
Co-Authored-By: Sylvain Bauza <sbauza@redhat.com>
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-09-05 19:03:38 +01:00
Zuul 0dd7cb1fb0 Merge "libvirt: Disable VMCoreInfo device for SEV-encrypted instances" 2025-09-05 16:32:24 +00:00
Takashi Kajinami 79846eb0d0 libvirt: Disable VMCoreInfo device for SEV-encrypted instances
When the VMCoreInfo device is enabled, the QEMU fw_cfg device in the guest
OS requires DMA between the host OS and guest OS through the device.
However, DMA is prohibited when guest memory is encrypted using SEV, and
the attempt results in a kernel crash.

Do not add VMCoreInfo when memory encryption is enabled.

Closes-Bug: #2117170
Change-Id: I05c7b1ae46ccd8d9aa42456b493ac6ee7ddd8bae
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-29 21:19:10 +09:00
Zuul dcf90dbb25 Merge "Ask for pre-prod testing for native threading" 2025-08-29 04:35:24 +00:00
Zuul 32d76d08cb Merge "libvirt: Launch instances with SEV-ES memory encryption" 2025-08-28 23:24:30 +00:00
Zuul d5134798de Merge "Detect AMD SEV-ES support" 2025-08-28 20:36:36 +00:00
Takashi Kajinami 4f5a3f3c00 libvirt: Launch instances with SEV-ES memory encryption
This is the last piece to allow users to request AMD SEV-ES for memory
encryption instead of AMD SEV. The CPU feature for memory encryption
can now be requested via the hw:mem_encryption_model flavor extra spec
or via the hw_mem_encryption_model image property.

Implements: blueprint amd-sev-es-libvirt-support
Change-Id: Ifc9b86ad7db887cc22b2cd252fe8adc81fdc29c6
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:47:49 +09:00
Takashi Kajinami 6c0a689d80 Detect AMD SEV-ES support
Detect AMD SEV-ES support by kernel/qemu/libvirt and generate a nested
RP for ASID slots for SEV-ES under the compute node RP.

Deprecate the [libvirt] num_memory_encryption_guests option because
the option is effective only for SEV, and now the maximum numbers for
SEV/SEV-ES guests can be detected by domain capabilities presented by
libvirt.

Note that creating an instance with memory encryption enabled now
requires AMD SEV trait, because these instances can't run with SEV-ES
slots, which are added by this change.

Partially-Implements: blueprint amd-sev-es-libvirt-support
Change-Id: I5968e75325b989225ed1fc6921257751ae227a0b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-28 08:47:45 +09:00
Ghanshyam Maan f914cb185c Add service role in Nova policy
Phase 2 of the RBAC community-wide goal [1] is to add the 'service'
role to the policy rules of the service APIs. This commit
defaults the service APIs to the 'service' role. This way
service APIs will be allowed for service users only.

Tempest tests are also modified to simulate the service-to-service
communication. Tempest tests send the user with the service
role to the nova API.
- https://review.opendev.org/c/openstack/tempest/+/892639

Partially implements blueprint policy-service-role-default

[1] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#phase-2

Change-Id: I1565ea163fa2c8212f71c9ba375654d2aab28330
Signed-off-by: Ghanshyam Maan <gmaan@ghanshyammann.com>
2025-08-27 19:34:04 +00:00
Balazs Gibizer 2a9cbdabce Ask for pre-prod testing for native threading
This patch refines our logging, docs, and release notes about the native
threading mode of the scheduler, api, and metadata services to ask for
pre-prod testing before it is enabled in production.

Change-Id: I04bbb3d7e4664a0cab8b30f4c34ee71774536353
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-08-27 18:46:31 +02:00