
2506 Commits

Author SHA1 Message Date
Zuul c33df9dc4b Merge "Support multiple allocations for vGPUs" 2026-03-11 04:44:04 +00:00
Sylvain Bauza 55a36f8f6a Support multiple allocations for vGPUs
Remove the TODO that only allows one VGPU allocation per instance. Now that
we no longer need to support the very old VGPU usage for the root provider,
this is easy.

Change-Id: I48d2b700049c81071710e37c05579239255c3539
Related-Bug: #1758086
Signed-off-by: Sylvain Bauza <sbauza@redhat.com>
2026-03-10 11:27:49 +01:00
Zuul fa64fa82bf Merge "return error about external network to the user on build failure" 2026-03-07 06:17:54 +00:00
Doug Goldstein 56c4a69ba6 return error about external network to the user on build failure
Instead of masking the error message with "Failure prepping block
device" when the user requests an invalid configuration, return the
actual error message to the user.

Closes-Bug: 2137673
Change-Id: If12555da64ccba2649a19ee6ccbdac0e888e6ad6
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
2026-03-03 08:08:26 -06:00
Zuul 7a6e13454d Merge "conf: Deprecate AggregateImagePropertiesIsolation opts" 2026-03-01 17:53:08 +00:00
Zuul 66dc4174ff Merge "api: Deprecate os-volumes_boot API" 2026-03-01 03:18:06 +00:00
Zuul 685fd25cc1 Merge "api: Restrict additional query string arguments" 2026-02-28 19:22:07 +00:00
Zuul 9e924c888a Merge "api: Remove dead fields from flavors response" 2026-02-28 02:07:05 +00:00
Zuul 47080f7457 Merge "TPM: bump service version to enable live migration" 2026-02-27 18:21:33 +00:00
Stephen Finucane da0482aad8 api: Deprecate os-volumes_boot API
If the microversion is greater than 2.103, return an HTTP 404 (Not Found).
Otherwise, proxy through to the ServersController.

Change-Id: Ic6b487316bb1fbf2cf57de5d8e6aabf06f0cdf52
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 21:39:26 +00:00
Stephen Finucane 9c8d51fa0c api: Restrict additional query string arguments
All APIs except the root version APIs now use strict query string
parsing. A test is added to ensure this.

A couple of tests need to be updated since they were using the wrong
path: while the path is ignored when calling the controllers directly,
the query strings are not.

Change-Id: I6dcb5b8f1f865df8f6b17cd7f0d730c3bdff241e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 21:39:26 +00:00
Stephen Finucane b95a2c5219 api: Remove dead fields from flavors response
Change-Id: I65be4f2e522c9f73a28b8837d7937a371d3e73d3
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 21:39:10 +00:00
Stephen Finucane e73a0bc84b api: Add ability to filter flavors by name
Change-Id: I0d51d29339d1380b93ccb1501e33891082f930ec
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 20:23:15 +00:00
Zuul 889e3d83f6 Merge "Attaching a volume returns HTTP 202" 2026-02-26 20:04:37 +00:00
melanie witt 2919e41560 TPM: bump service version to enable live migration
Live migration of TPM instances is enabled only when the entire cloud
has been upgraded.

Related to blueprint vtpm-live-migration

Change-Id: I718d8ad48b82336562a880467c3c7b12b1fb3512
Signed-off-by: melanie witt <melwittt@gmail.com>
2026-02-26 10:24:18 -08:00
Zuul f5579e9ccc Merge "Prepare resize/cold migration for graceful shutdown" 2026-02-26 17:45:45 +00:00
Zuul 44a7c5c2b0 Merge "Use 2nd RPC server in compute operations" 2026-02-26 17:44:59 +00:00
Zuul fbfc44f73b Merge "Run nova-compute in native threading mode" 2026-02-26 17:44:44 +00:00
Zuul d55f0ce38d Merge "[compute]Use single long task executor" 2026-02-26 16:59:48 +00:00
Zuul 6ae5459351 Merge "Deprecate unlimited compute actions" 2026-02-26 16:59:31 +00:00
Stephen Finucane fcbedce558 conf: Deprecate AggregateImagePropertiesIsolation opts
The 'AggregateImagePropertiesIsolation' scheduler filter allows users to
filter host aggregates by comparing aggregate and image metadata. The
'[filter_scheduler] aggregate_image_properties_isolation_namespace' and
'[filter_scheduler] aggregate_image_properties_isolation_separator'
options purport to allow users to specify a prefix to use for both the
aggregate and image metadata keys, allowing users to do e.g.:

  openstack image set --property customized.os_type=linux $IMAGE
  openstack aggregate set --property customized.os_type=windows $AGG1
  openstack aggregate set --property customized.os_type=linux $AGG2

However, as noted in change If7245a90711bd2ea13095ba26b9bc82ea3e17202,
this is no longer possible since we introduced the 'ImageMetaProps' o.vo
in Liberty and promptly lost the ability to see any non-o.vo image
metadata properties from glance.

There's a possibility, however slight, that some people are using
namespaces that match actual nova namespaces such as 'hw' and a
separator of '_', but those will continue to work just fine. Setting
anything else will result in the scheduler filter failing since the
image property will always appear to be absent. As a result, these could
be outright removed rather than deprecated. We choose to deprecate just
so people can see the warnings.

Change-Id: Ide763d75e42427a9df3673313895ef47b8727802
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 11:09:41 +00:00
Ghanshyam Maan 996c4ff9e8 Prepare resize/cold migration for graceful shutdown
During graceful shutdown, the compute service keeps a 2nd RPC
server active which can be used to finish in-progress
operations. Like live migration, resize and cold migration
also perform RPC calls between the source and destination computes.
For those operations too, we can use the 2nd RPC server to make
sure they are completed during graceful shutdown.

A quick overview of which RPC methods are involved in
resize/cold migration and which of them use the 2nd RPC server:

Resize/cold migration
- prep_resize: No, resize/migration is not started yet.
- resize_instance: Yes, here the resize/migration starts.
- finish_resize: Yes
- cross cell resize case:
  - prep_snapshot_based_resize_at_dest: NO, this is an initial check and
    the migration is not started
  - prep_snapshot_based_resize_at_source: Yes, this starts the migration

Confirm resize: NO
- confirm_resize: NO
- cross cell confirm resize case:
  - confirm_snapshot_based_resize - NO

Revert resize:
- revert_resize - NO
- check_instance_shared_storage: YES. This is called from dest to source
  so we need source to respond to it so that revert can continue.
- finish_revert_resize on source - YES, at this stage, revert resize is
  in progress and abandoning it here can leave the migration in an
  unrecoverable state.
- cross cell revert case:
  - revert_snapshot_based_resize_at_dest: NO
  - finish_revert_snapshot_based_resize_at_source: YES
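
The overview above can be read as a lookup table. The following is only a
sketch: the method names are taken from the list, while the table name and
the helper function are hypothetical and are not part of nova.

```python
# Illustrative only: True means the RPC method is served by the 2nd RPC
# server during graceful shutdown, as listed in the commit message.
# The dict and helper below are hypothetical names, not nova code.
USES_ALT_RPC_SERVER = {
    "prep_resize": False,
    "resize_instance": True,
    "finish_resize": True,
    "prep_snapshot_based_resize_at_dest": False,
    "prep_snapshot_based_resize_at_source": True,
    "confirm_resize": False,
    "confirm_snapshot_based_resize": False,
    "revert_resize": False,
    "check_instance_shared_storage": True,
    "finish_revert_resize": True,
    "revert_snapshot_based_resize_at_dest": False,
    "finish_revert_snapshot_based_resize_at_source": True,
}


def uses_alt_rpc_server(method):
    """Return True if the method should go to the 2nd RPC server.

    Unknown methods default to the first RPC server (an assumption of
    this sketch).
    """
    return USES_ALT_RPC_SERVER.get(method, False)
```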

Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: If08b698d012a75b587144501d829403ec616f685
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:36:07 +00:00
Ghanshyam Maan d5ffb58a8d Use 2nd RPC server in compute operations
For graceful shutdown of the compute service, it will have two RPC servers.
One RPC server is used for new requests and is stopped during
graceful shutdown; the 2nd RPC server (listening on the 'compute-alt' topic)
is used to complete the in-progress operations.

We select the operations (case by case) and their RPC methods to use
the 2nd RPC server so that they are not interrupted when shutdown is
initiated; graceful shutdown keeps the 2nd RPC server active
for graceful_shutdown_timeout. A new method 'prepare_for_alt_rpcserver'
is added which falls back to the first RPC server if it detects an old
compute.

As this has upgrade impact, it bumps the compute/service version and adds
release notes for the same.

The list of operations that should use the 2nd RPC server will grow
eventually; this commit moves the operations below to the 2nd
RPC server:

* Live migration

  - Live migration: It uses the 2nd RPC server and will try to complete
    the operation during shutdown.
  - live_migration_force_complete does not need to use the 2nd RPC server.
    It is a direct RPC request from API to compute and if that is
    rejected during shutdown, it is fine and can be initiated again
    once compute is up.
  - live_migration_abort does not need to use the 2nd RPC server. Ditto,
    it is a direct RPC request from API to compute. It cancels the queued
    live migration, but if the migration has already started, then the driver
    cancels the migration. If it is rejected during shutdown because
    RPC is stopped, it is fine and can be initiated again.

* server external event
* Get server console

As graceful shutdown cannot be tested in tempest, this adds a new job
to test it. Currently it tests the live migration operation, and it can
be extended to other operations that will use the 2nd RPC server.

Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I4de3afbcfaefbed909a29a831ac18060c4a73246
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:32:44 +00:00
Zuul 37a9596eb1 Merge "libvirt: Use firmware auto-selection by libvirt" 2026-02-25 18:13:09 +00:00
Balazs Gibizer 4bce4480b9 Run nova-compute in native threading mode
Previous patches removed direct eventlet usage from nova-compute so
now we can run it with native threading as well. This patch documents
the possibility and switches both nova-compute processes to native
threading mode in the nova-next job.

Change-Id: I7bb29c627326892d1cf628bbf57efbaedda12f1a
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:06 +01:00
Balazs Gibizer 9e678b83eb [compute]Use single long task executor
Move the execution of build_and_run_instance and snapshot_instance to
one common long task executor. Originally snapshot ran
on the RPC pool, build_and_run_instance ran on the default pool.
Also each of these tasks had a separate concurrency limit enforced by a
semaphore.

After this patch each of these tasks uses a common Executor. The size of
that executor and the way we limit the concurrency differ between
eventlet and native threading mode.

In eventlet mode we have one big Executor with "unlimited" size and
individual semaphores are used for each task type to enforce the
configured limits.

In threading mode we ask the admin to configure the 2 limits to the
same number, and we warn if not. We use that limit (or the max of the 2
limits) as the size of the long task Executor. As the limits are the
same we don't enforce individual limits any more. The executor size will
ensure the shared limit is kept. As the limit is shared a single
operation type can consume the whole limit.

Note that while live migration is a long-running task we cannot put it into
the same long_task_executor as build and snapshot as we need:
1. a very small limit of concurrent live migrations compared to
   builds and snapshots
2. a way to cancel live migrations easily that are waiting due to the
   limit
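
The two modes described above can be sketched with the standard library.
This is an illustration only, not nova's implementation: the function names,
the task type keys, and LARGE_POOL_SIZE are hypothetical, and a plain
ThreadPoolExecutor stands in for the eventlet-mode executor.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Arbitrary stand-in for the "unlimited" eventlet-mode pool (assumption).
LARGE_POOL_SIZE = 64


def build_long_task_executor(max_builds, max_snapshots, native_threading):
    """Return (executor, per-task semaphores); limits must be positive."""
    if native_threading:
        # One shared limit: the pool size itself caps concurrency, so
        # no per-task semaphores are needed.
        size = max(max_builds, max_snapshots)
        return ThreadPoolExecutor(max_workers=size), {}
    # Eventlet-mode analogue: one big pool, one semaphore per task type.
    semaphores = {
        "build": threading.Semaphore(max_builds),
        "snapshot": threading.Semaphore(max_snapshots),
    }
    return ThreadPoolExecutor(max_workers=LARGE_POOL_SIZE), semaphores


def submit(executor, semaphores, task_type, fn, *args):
    """Submit fn, holding the task type's semaphore if one exists."""
    semaphore = semaphores.get(task_type)
    if semaphore is None:
        return executor.submit(fn, *args)

    def guarded():
        with semaphore:
            return fn(*args)

    return executor.submit(guarded)
```

In the shared-limit case a single task type can occupy every worker, which
matches the trade-off noted in the message.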

Change-Id: I88a6a593af8a5b518715e1245a76ee54752afe83
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:06 +01:00
Balazs Gibizer efd61188c1 Deprecate unlimited compute actions
We already deprecated the unlimited max_concurrent_live_migrations
config value and now we do the same for max_concurrent_builds and
max_concurrent_snapshots as well. The reason is similar.
* The unlimited meaning was a lie; it was limited by other constructs in
  the code. For these options the limit was the size of the RPC executor,
  which defaulted to 64.
* In native threading mode having unlimited concurrent tasks is
  unfeasible due to the memory cost of native threads for each task.

The deprecation is done in a way that in eventlet mode we keep a similar
behavior as before but in native threading mode we enforce a strict
maximum even if unlimited is requested.

Change-Id: Ibbf76c2c85729820035c9791719bf2c864bce12b
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:00 +01:00
Takashi Kajinami 5841095740 libvirt: Use firmware auto-selection by libvirt
Use the firmware auto-selection feature in libvirt to find the best
UEFI firmware file according to the requested feature.

Firmware files may be reselected when a libvirt domain is created from
scratch, while these are kept during hard-reboot (or live migration
which preserves the loader/nvram elements filled by libvirt).

Closes-Bug: #2122296
Related-Bug: #2122288
Implements: blueprint libvirt-firmware-auto-selection
Change-Id: Ie48b020597a1a2fb3280815eec5ba3565e396f9b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-24 20:26:41 +09:00
Zuul a3e9095f02 Merge "Add VNC console support for the Ironic driver" 2026-02-23 19:04:09 +00:00
Johannes Kulik 2a51df2760 Attaching a volume returns HTTP 202
Instead of returning an HTTP 200 and a `volumeAttachment` object,
attaching a volume to an instance returns HTTP 202 starting from API
version 2.101.

To keep the functionality for older API versions, we move the
`_attach_volume()` method from n-api to n-conductor and either do a call
or a cast depending on whether the API needs to return a value.
n-conductor then handles reserving the block_device_mapping's device by
calling n-compute before it starts the previously-already-async volume
attachment.

We have to move `_check_attach_and_reserve_volume` into compute utils,
because it's getting called in both conductor and compute api (for the
shelved offloaded attach).

The new RPC method in the conductor needs a long timeout when used with
API versions less than the new 2.101, because it waits for the call to
`reserve_block_device_name()` in nova-compute which already needs a long
timeout.

Updating the functional tests' `post_server_volume()` and
`_attach_volume()` to not return the attachment anymore is possible, as
no test uses the returned values.

Change-Id: I4d38c2679f0e88cca30055a9c8c45ba1dd6fb5ef
Implements: blueprint async-volume-attachments
Signed-off-by: Johannes Kulik <johannes.kulik@sap.com>
2026-02-18 15:02:21 +01:00
Nicolai Ruckel 35b1945522 Preserve UEFI NVRAM variable store
Preserve NVRAM variable store during stop/start, hard reboot, live
migration, and volume retype.

This does not affect cold migration or shelve.

For UEFI guests (hw_firmware_type=uefi), every time the instance is
started, the UEFI variable storage for that instance
(/var/lib/libvirt/qemu/nvram/instance-xxxxxxxx_VARS.fd) is deleted
and reinitialized from the default template.

The changes are based on this patch by Jonas Schäfer to preserve the
vTPM state:
https://review.opendev.org/c/openstack/nova/+/955657

Closes-Bug: #1633447
Closes-Bug: #2131730
Change-Id: I444a9285c07a04bf08a73772235f8dd73d75e513
Signed-off-by: Nicolai Ruckel <nicolai.ruckel@cloudandheat.com>
2026-02-13 23:55:41 +01:00
Sean Mooney 264e868d49 Support os-vif TAP pre-creation for OVS/OVN ports
Add support for os-vif TAP device pre-creation when Neutron sets
the 'ovs_create_tap' flag in vif_details. This reduces live
migration downtime by ensuring the network is fully wired before
the VM starts.

Changes:
- Add VIF_DETAILS_OVS_CREATE_TAP constant to model.py
- Propagate create_tap from binding details to os-vif port profile
  in os_vif_util.py
- Set managed='no' in libvirt XML when create_tap is enabled so
  libvirt uses the pre-created TAP device
- Set multiqueue on port profile in _plug_os_vif based on instance
  flavor/image hw:vif_multiqueue_enabled property

When checking oslo.versionedobjects fields for backward compat:
- Use 'field in obj.fields' to check if field exists in schema
- Use 'field in obj' to check if field value is set

Depends-On: https://review.opendev.org/c/openstack/os-vif/+/971231
Generated-By: Cursor claude-opus-4.5
Closes-Bug: #2069718
Change-Id:  I32343658b53e317696d1bd8b984793bfeeccd409
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-05 18:55:06 +00:00
Sean Mooney c8d34ed3dc Fix blockio generation for LUN volumes
QEMU's scsi-block device driver does not support physical_block_size
and logical_block_size properties. When Cinder reports disk geometry
for LUN volumes, Nova was incorrectly including a <blockio> element
in the libvirt XML, causing QEMU to fail with:

    Property 'scsi-block.physical_block_size' not found

This fix adds a check to skip blockio generation when source_device
is 'lun', following the existing pattern used for serial at line 1356.

Generated-By: claude-code (Claude Opus 4.5)
Closes-Bug: #2127196
Change-Id: Idf87e936edd97aac719222942c9842a9aca4c270
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-03 22:15:19 +00:00
Steve Baker 791310ae1e Add VNC console support for the Ironic driver
Ironic is adding support for VNC consoles tracked under the following
spec[1]. This change provides support for the Nova Ironic driver to
access the consoles created by this feature effort.

This supersedes an existing Nova spec[2] to add VNC console support to
the Ironic driver, so this change can be considered to implement this
spec also. This change can be merged independently of the Ironic work,
as the Ironic driver handles the VNC console not being available.

The prerequisites for a graphical console being available for an Ironic
driver node are:

- Ironic is configured to enable graphical consoles
- The node ``console_interface`` is a graphical driver such as
  ``redfish-graphical`` or ``fake-graphical``
- ``nova-novncproxy`` can make network connections to the VNC servers
  which run adjacent to ``ironic-conductor``

The associated depends-on change adds the novnc validation check to the
baremetal basic ops, which is run in job
ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa.

In the support matrix console.vnc support is set to partial for ironic
due to the current lack of vencrypt support on the ironic side.

[1] https://specs.openstack.org/openstack/ironic-specs/specs/approved/graphical-console.html
[2] https://specs.openstack.org/openstack/nova-specs/specs/2023.1/approved/ironic-vnc-console.html

Related-Bug: 2086715
Implements: blueprint ironic-vnc-console
Change-Id: Iec26c67e29f91954eafc6a5a81086e36798d3f26
Signed-off-by: Steve Baker <sbaker@redhat.com>
2026-01-27 10:06:12 +13:00
Zuul 7579dbdf0e Merge "Use *_OR_ADMIN policy defaults for server shares" 2026-01-23 05:00:53 +00:00
Zuul 8fe5d3ce75 Merge "Faults from cell DB missing in GET /servers/detail" 2026-01-23 05:00:40 +00:00
elajkat 76d64b9cb4 blueprint: iothreads-for-instances
Enable one io-thread per qemu instance.

Related-Bug: iothreads-for-instances
Change-Id: I8b22e5bca560d111934fbdf67494a4e288b9e50a
Signed-off-by: lajoskatona <lajos.katona@est.tech>
2026-01-19 16:17:47 +01:00
Zuul 68cec593a7 Merge "Compute manager to use thread pools selectively" 2026-01-16 21:03:28 +00:00
Balazs Gibizer 3c23390cc8 Compute manager to use thread pools selectively
This changes the thread pool usage of the ComputeManager to go through
the concurrency mode aware util functions.

The concurrent live migration pool had a seemingly unlimited option
when configured with value 0, but in reality GreenThreadPool has a
default worker size of 1000. It is almost never right to
have more than one live migration running concurrently. Also, with
native threading, having 1000 workers is just too costly. So we
decided to deprecate the value 0 and changed the implementation of
unlimited to mean 5 threads in native threading mode. We kept the 1000
greenthreads in eventlet mode for backward compatibility.

The _sync_power_states periodic task also spawns tasks for each instance
to be synced. As it uses a shared data structure across these tasks
and the caller, a lock is needed to avoid race conditions.
Also the default pool size is 1000 for these tasks in our configuration.
That would use a lot of memory on a busy host in native threading mode.
So we changed the default value from 1000 to 5.
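
The handling of the deprecated value 0 for the live migration pool, as
described above, could look roughly like this. The function name is
hypothetical; only the numbers 0, 5 and 1000 come from the message.

```python
def live_migration_pool_size(configured, native_threading):
    """Map the configured limit to a worker count.

    0 is the deprecated "unlimited" value: 5 threads in native threading
    mode, 1000 greenthreads in eventlet mode. Any other value is used
    as configured.
    """
    if configured == 0:
        return 5 if native_threading else 1000
    return configured
```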

Change-Id: I9567d5fabdf086b5d0493103d9f6bde4f66af387
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-01-16 09:47:42 +01:00
Zuul 80753c5745 Merge "Upgrade note for concurrency mode default change" 2026-01-14 21:23:21 +00:00
Balazs Gibizer f73a23b4d4 Upgrade note for concurrency mode default change
This is a follow-up for the release notes added in the commit
35207ee8b5 that changed the default mode
for the scheduler and the API services. At that time we failed to note
the upgrade impact of such a change. So this patch extends the reno with
an upgrade note.

Change-Id: I280e7eb9c1da6eeaf50e96e8b19e296961f2651a
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-01-14 13:29:07 +01:00
Zuul 88c538a897 Merge "libvirt: Skip unsupported firmware types" 2026-01-06 12:01:12 +00:00
Ivaylo Mitev fb661ec597 Faults from cell DB missing in GET /servers/detail
The fault field is empty in the response of API GET /servers/detail if the
instance (hence its instance_faults DB entry) is in a nova cell DB.
In contrast, for API /servers/:id the fault is retrieved correctly no matter
which nova cell the instance belongs to.

Closes-Bug: #1856329
Change-Id: I1726f53cfeac0a67a5dacdddda2af2cc1db0af0f
Signed-off-by: Marius Leustean <marius.leustean@sap.com>
2025-12-17 11:51:38 +02:00
Zuul f268b385dd Merge "Use consistent program name for wsgi scripts and entry points" 2025-12-08 22:18:34 +00:00
Zuul 1712ae48e3 Merge "libvirt: add configuration option for volume AIO mode" 2025-12-05 05:20:30 +00:00
Zuul 5d3d0c870a Merge "ensure correct cleanup of multi-attach volumes" 2025-12-04 07:00:30 +00:00
Takashi Kajinami 253aaec4bb Use consistent program name for wsgi scripts and entry points
Make sure that a consistent program name is always set, so that
the same config sub-directory ( /etc/{project}/{prog}.conf.d ) is used
regardless of the way the api service is run.

Closes-Bug: #2098514
Change-Id: Ib5c6d431176b83eefafddc1b35589015db6dfd04
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-12-02 02:57:25 +09:00
Takashi Kajinami d2188b9e6b libvirt: Skip unsupported firmware types
Ignore (1) stateless mode firmware and (2) memory device firmware which
do not include a few core keys such as nvram-template. This is
a temporary (and backportable) workaround until firmware detection using
libvirt's internal feature is implemented by [1]

[1] https://blueprints.launchpad.net/nova/+spec/libvirt-firmware-auto-selection

Closes-Bug: #2122288
Change-Id: I99bc36fdd5df816c9ae374db71e4734fb7fc467b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-11-30 02:24:32 +09:00
Jay Faulkner 56cb5f52fb [ironic] Ensure unprovision happens for new states
States were added to the Ironic API to enable the node servicing
feature, which can be performed on nodes provisioned with Nova
instances. Current nova, if asked to delete these instances, will only
remove the instance metadata and not tear them down.

This change has two parts:
- I have added the new, relevant states to _UNPROVISION_STATES in
  driver.py, which now allows Nova to know that SERVIC* states and
  DEPLOYHOLD are safe to unprovision from.
- I have added all existing ironic states to ironic_states.py and the
  PROVISION_STATE_LIST constant and check the state against it -- in a
  case where a completely unknown state is returned, we should attempt
  an unprovision.
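
The decision described in the two parts above can be sketched as follows.
The state strings and both sets are illustrative placeholders, not the full
lists in nova, and the function name is hypothetical.

```python
# Illustrative subsets only (assumptions of this sketch).
UNPROVISION_STATES_SKETCH = {
    "active", "deploy failed", "error", "deploy hold",
    "servicing", "service wait", "service failed", "service hold",
}
KNOWN_STATES_SKETCH = UNPROVISION_STATES_SKETCH | {"available", "enroll"}


def should_unprovision(provision_state):
    """Unprovision from listed states and from completely unknown ones."""
    if provision_state in UNPROVISION_STATES_SKETCH:
        return True
    # A state missing from the known list is treated as unknown, so an
    # unprovision is attempted rather than only dropping metadata.
    return provision_state not in KNOWN_STATES_SKETCH
```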

This fix needs to be backported as far as possible, as this bug has
existed since Antelope / 2023.1 (DEPLOYHOLD) or Bobcat / 2023.2
(SERVIC*).

Assisted-by: Claude Code
Closes-bug: #2131960
Change-Id: I31c70d35b0e6e9f8d2252bfb2f0bdec477cc6cc7
Signed-off-by: Jay Faulkner <jay@jvf.cc>
2025-11-20 15:23:58 -08:00
René Ribaud f017e23b81 Use *_OR_ADMIN policy defaults for server shares
Update the server shares API policies to use
PROJECT_READER_OR_ADMIN and PROJECT_MEMBER_OR_ADMIN instead of
PROJECT_READER and PROJECT_MEMBER.

This aligns the server shares policies with other compute API
policies and ensures administrators can list, attach, show and
detach shares regardless of project policy overrides.

Signed-off-by: René Ribaud <rene.ribaud@gmail.com>
Change-Id: I2b237d56b08e3080475dc500e204298018af29c7
2025-11-20 15:15:00 +01:00