62065 Commits

Author SHA1 Message Date
Zuul f5579e9ccc Merge "Prepare resize/cold migration for graceful shutdown" 2026-02-26 17:45:45 +00:00
Zuul 44a7c5c2b0 Merge "Use 2nd RPC server in compute operations" 2026-02-26 17:44:59 +00:00
Zuul fbfc44f73b Merge "Run nova-compute in native threading mode" 2026-02-26 17:44:44 +00:00
Zuul d55f0ce38d Merge "[compute]Use single long task executor" 2026-02-26 16:59:48 +00:00
Zuul 6ae5459351 Merge "Deprecate unlimited compute actions" 2026-02-26 16:59:31 +00:00
Zuul 84046d1e3f Merge "api: Simplify API version check for flavor description" 2026-02-26 16:20:54 +00:00
Zuul c1a7c81e5d Merge "tests: Clean up flavors tests" 2026-02-26 15:54:30 +00:00
Zuul ddd6067ad5 Merge "db: Move regex helpers to utils" 2026-02-26 15:54:11 +00:00
Takashi Kajinami 1903028492 Accept an empty key for addresses
The name property of networks is optional in neutron. When a server is
attached to a network without name, the key can be empty.

Closes-Bug: #2142767
Change-Id: I31a82bb1574fab6ac03722571ff96443d7a3a51f
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-27 00:49:12 +09:00
Zuul 4b38430609 Merge "libvirt: Remove unnecessary arg" 2026-02-26 13:49:24 +00:00
Zuul 0e6505d3b3 Merge "api: Add runtime check for general additionalProperties" 2026-02-26 13:26:25 +00:00
Zuul 14ccb04330 Merge "api: Fix issue with instance usage audit log schema" 2026-02-26 13:26:09 +00:00
Stephen Finucane fcbedce558 conf: Deprecate AggregateImagePropertiesIsolation opts
The 'AggregateImagePropertiesIsolation' scheduler filter allows users to
filter host aggregates by comparing aggregate and image metadata. The
'[filter_scheduler] aggregate_image_properties_isolation_namespace' and
'[filter_scheduler] aggregate_image_properties_isolation_separator'
options purport to allow users to specify a prefix to use for both the
aggregate and image metadata keys, allowing users to do e.g.:

  openstack image set --property customized.os_type=linux $IMAGE
  openstack aggregate set --property customized.os_type=windows $AGG1
  openstack aggregate set --property customized.os_type=linux $AGG2

However, as noted in change If7245a90711bd2ea13095ba26b9bc82ea3e17202,
this is no longer possible since we introduced the 'ImageMetaProps' o.vo
in Liberty and promptly lost the ability to see any non-o.vo image
metadata properties from glance.

There's a possibility, however slight, that some people are using
namespaces that match actual nova namespaces such as 'hw' and a
separator of '_', but those will continue to work just fine. Setting
anything else will result in the scheduler filter failing since the
image property will always appear to be absent. As a result, these could
be outright removed rather than deprecated. We choose to deprecate just
so people can see the warnings.

Change-Id: Ide763d75e42427a9df3673313895ef47b8727802
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-26 11:09:41 +00:00
Zuul 18a7dcb6e8 Merge "tests: Invert validation check" 2026-02-26 08:50:37 +00:00
Zuul df75b7c13b Merge "api: Add response body schemas for server shares APIs" 2026-02-26 08:50:24 +00:00
Zuul 42ac2df1f8 Merge "api: Add response body schemas for servers APIs (6/6)" 2026-02-26 03:00:09 +00:00
Ghanshyam Maan a877e0ed15 Add operator document for graceful shutdown
Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I18bdb4b9ca2663b5fa1f88b715d27411827b1c45
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-26 01:33:03 +00:00
Zuul afac8df46d Merge "api: Add response body schemas for servers APIs (5/6)" 2026-02-25 21:00:49 +00:00
Ghanshyam Maan b47d217ca7 Add more test for graceful shutdown
Adding more tests for graceful shutdown:
- shut down the destination compute and see how live and cold migrations
progress
- start building an instance and, once the compute starts building it,
shut down the compute service and see whether the build finishes or not.
- revert resize server

Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I57132fb7b7fa614dfc138508581ff5a67aaed906
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:46:24 +00:00
Ghanshyam Maan 996c4ff9e8 Prepare resize/cold migration for graceful shutdown
During graceful shutdown, the compute service keeps a 2nd RPC
server active which can be used to finish the in-progress
operations. Like live migration, resize and cold migration
also perform RPC calls between the source and destination computes.
For those operations too, we can use the 2nd RPC server and make
sure they are completed during graceful shutdown.

A quick overview of which RPC methods are involved in
resize/cold migration and which of them will use the 2nd RPC server:

Resize/cold migration:
- prep_resize: No, the resize/migration has not started yet.
- resize_instance: Yes, this is where the resize/migration starts.
- finish_resize: Yes
- cross cell resize case:
  - prep_snapshot_based_resize_at_dest: No, this is an initial check and
    the migration has not started
  - prep_snapshot_based_resize_at_source: Yes, this starts the migration

Confirm resize: No
- confirm_resize: No
- cross cell confirm resize case:
  - confirm_snapshot_based_resize: No

Revert resize:
- revert_resize: No
- check_instance_shared_storage: Yes. This is called from dest to source,
  so the source needs to respond to it for the revert to continue.
- finish_revert_resize on source: Yes. At this stage the revert resize is
  in progress, and abandoning it here can leave the migration in an
  unrecoverable state.
- cross cell revert case:
  - revert_snapshot_based_resize_at_dest: No
  - finish_revert_snapshot_based_resize_at_source: Yes

Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: If08b698d012a75b587144501d829403ec616f685
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:36:07 +00:00
Ghanshyam Maan d5ffb58a8d Use 2nd RPC server in compute operations
For graceful shutdown, the compute service will have two RPC servers.
One RPC server is used for new requests and is stopped during
graceful shutdown; the 2nd RPC server (listening on the 'compute-alt'
topic) is used to complete the in-progress operations.

We select the operations (case by case) and their RPC methods to use
the 2nd RPC server so that they are not interrupted when shutdown is
initiated; graceful shutdown keeps the 2nd RPC server active
for graceful_shutdown_timeout. A new method 'prepare_for_alt_rpcserver'
is added, which falls back to the first RPC server if it detects an old
compute.

As this has upgrade impact, it bumps the compute service version and
adds a release note for it.

The list of operations that should use the 2nd RPC server will grow
eventually; this commit moves the operations below to the 2nd
RPC server:

* Live migration

  - Live migration: It uses the 2nd RPC server and will try to complete
    the operation during shutdown.
  - live_migration_force_complete does not need to use the 2nd RPC server.
    It is a direct RPC request from the API to compute, and if it is
    rejected during shutdown, that is fine: it can be initiated again
    once compute is up.
  - live_migration_abort does not need to use the 2nd RPC server. Ditto,
    it is a direct RPC request from the API to compute. It cancels a
    queued live migration, but if the migration has already started, the
    driver cancels it. If it is rejected during shutdown because
    RPC is stopped, that is fine: it can be initiated again.

* server external event
* Get server console

As graceful shutdown cannot be tested in tempest, this adds a new job
to test it. Currently it tests the live migration operation, and it can
be extended to other operations that will use the 2nd RPC server.

Partial implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I4de3afbcfaefbed909a29a831ac18060c4a73246
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:32:44 +00:00
Zuul 37a9596eb1 Merge "libvirt: Use firmware auto-selection by libvirt" 2026-02-25 18:13:09 +00:00
Zuul 4c32ea65e3 Merge "libvirt: Add capability to load smm feature from existing xml" 2026-02-25 17:47:46 +00:00
Stephen Finucane ed83dab5a7 api: Simplify API version check for flavor description
Unlike the check for extra specs, the check for whether to include a
description field or not is driven entirely by API version rather than
API version and policy. We can therefore move the checks inside the
functions that generate the response rather than duplicating them
elsewhere.

Change-Id: I86aa4e1c62a0b0e6fa4d27e559d3197fb73851ba
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 14:16:36 +00:00
Stephen Finucane 61c1ce6c8e tests: Clean up flavors tests
Ahead of adding additional tests, we do the following:

* Move the controller to an instance attribute rather than a class
  attribute
* Modify tests so they all call controller methods directly rather than
  setting up a fake router (this is the cause of the largest changes)
* Remove unnecessary aliasing of exceptions
* Remove unnecessary setUp arguments
* Split a test into multiple tests
* Standardize test class names

Change-Id: I2cac4cc79288f7b3bacc4a63a1d36d4cf12013d7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 14:16:36 +00:00
Stephen Finucane d79132d113 db: Move regex helpers to utils
So that we can use them for API DB methods, which are found in
nova.objects instead of nova.db.

Change-Id: Ifb15ee90ac6a6400b7268ed80f727080e98c4cdf
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 14:16:36 +00:00
Stephen Finucane a5bde25463 api: Add runtime check for general additionalProperties
Change-Id: I959afd6e6fa89f0656c10599e50ecb179c87d354
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 14:12:14 +00:00
Stephen Finucane 0c4c0d8ce8 api: Fix issue with instance usage audit log schema
We need another level of nesting [1].

[1] https://groups.google.com/g/json-schema/c/pK_Y1Gb5waM

Change-Id: I9828e287208a0dff8f909036df848f7539c534d4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 13:48:11 +00:00
Stephen Finucane d4264cd447 api: Remove errant field
A follow-up for Ia178c1314f99c719827e3eb78735d1019852a273 and
I0e42de5074dcf699886b20dfd43306683e381ee2. 'adminPass' is only
(optionally) returned in server create and rebuild responses, not in
server show or update responses.

Change-Id: I2c4ce7a2b1063d71561d6af95a58a36b39356879
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2026-02-25 12:31:46 +00:00
Zuul ebb3175d9c Merge "tests: Fix bound" 2026-02-25 00:17:42 +00:00
Zuul c26c66afb3 Merge "api: Add response body schemas for servers APIs (4/6)" 2026-02-24 23:57:17 +00:00
Zuul 983fbc683b Merge "api: Add response body schemas for servers APIs (3/6)" 2026-02-24 23:57:05 +00:00
Zuul 7dd7a0ffe3 Merge "api: Add response body schemas for servers APIs (2/6)" 2026-02-24 23:45:11 +00:00
Zuul 02d083650b Merge "api: Add response body schemas for servers APIs (1/6)" 2026-02-24 23:44:56 +00:00
Zuul 9855b5c8cc Merge "libvirt: Add capability to load loader and nvram from xml" 2026-02-24 20:12:05 +00:00
Zuul adbea93431 Merge "libvirt: Add basic xml generation for firmware auto selection" 2026-02-24 20:11:42 +00:00
Zuul 04e926a38e Merge "libvirt: Extend functional test coverage of UEFI boot guests" 2026-02-24 19:14:59 +00:00
Zuul 5ed052061a Merge "Fix the negative sleep value in graceful_shutdown()" 2026-02-24 19:14:06 +00:00
Zuul 05a539ace9 Merge "Fix for bug 2140537" 2026-02-24 18:23:50 +00:00
Zuul 12ee1cd80d Merge "Add manager graceful shutdown, timeout, and wait" 2026-02-24 17:58:27 +00:00
Balazs Gibizer 4bce4480b9 Run nova-compute in native threading mode
Previous patches removed direct eventlet usage from nova-compute so
now we can run it with native threading as well. This patch documents
the possibility and switches both nova-compute processes to native
threading mode in the nova-next job.

Change-Id: I7bb29c627326892d1cf628bbf57efbaedda12f1a
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:06 +01:00
Balazs Gibizer 9e678b83eb [compute]Use single long task executor
Move the execution of build_and_run_instance and snapshot_instance to
one common long task executor. Originally snapshot ran
on the RPC pool, build_and_run_instance ran on the default pool.
Also each of these tasks had a separate concurrency limit enforced by a
semaphore.

After this patch each of these tasks uses a common Executor. The size of
that executor and the way we limit the concurrency differ between
eventlet and native threading mode.

In eventlet mode we have one big Executor with "unlimited" size and
individual semaphores are used for each task type to enforce the
configured limits.

In threading mode we request the admin to configure the 2 limits to the
same number, and we warn if not. We use that limit (or the max of the 2
limits) as the size of the long task Executor. As the limits are the
same we don't enforce individual limits any more. The executor size will
ensure the shared limit is kept. As the limit is shared, a single
operation type can consume the whole limit.

Note that while live migration is a long-running task we cannot put it into
the same long_task_executor as build and snapshot as we need:
1. a very small limit of concurrent live migrations compared to
   builds and snapshots
2. a way to cancel live migrations easily that are waiting due to the
   limit

Change-Id: I88a6a593af8a5b518715e1245a76ee54752afe83
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:06 +01:00
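
The two concurrency schemes described in the commit above can be sketched roughly as follows. This is only an illustration, not nova's code: the class `LongTaskRunner`, its parameters, and the stand-in pool size of 1000 for "unlimited" are all hypothetical, and plain threads are used for both modes so the example runs without eventlet.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LongTaskRunner:
    """Hypothetical sketch of one shared executor for builds and snapshots."""

    def __init__(self, max_builds, max_snapshots, native_threading):
        if native_threading:
            # Threading mode: the pool size itself is the shared limit,
            # so no per-task-type semaphores are needed.
            self.pool_size = max(max_builds, max_snapshots)
            self._semaphores = {}
        else:
            # "Eventlet-like" mode: a large pool (assumed stand-in for
            # "unlimited") plus one semaphore per task type.
            self.pool_size = 1000
            self._semaphores = {
                "build": threading.BoundedSemaphore(max_builds),
                "snapshot": threading.BoundedSemaphore(max_snapshots),
            }
        self._executor = ThreadPoolExecutor(max_workers=self.pool_size)

    def submit(self, kind, fn, *args):
        sem = self._semaphores.get(kind)
        if sem is None:
            return self._executor.submit(fn, *args)

        def guarded():
            # Hold the per-type semaphore for the duration of the task.
            with sem:
                return fn(*args)

        return self._executor.submit(guarded)

    def shutdown(self):
        self._executor.shutdown(wait=True)
```

In the threading-mode branch a single task type can occupy every worker, which matches the commit's note that the shared limit can be consumed entirely by one operation type.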
Balazs Gibizer efd61188c1 Deprecate unlimited compute actions
We already deprecated the unlimited max_concurrent_live_migrations
config value and now we do the same for max_concurrent_builds and
max_concurrent_snapshots as well. The reason is similar.
* The unlimited meaning was a lie: it was limited by other constructs in
  the code. For these options the limit was the size of the RPC executor,
  which defaults to 64.
* In native threading mode having unlimited concurrent tasks is
  unfeasible due to the memory cost of native threads for each task.

The deprecation is done in a way that in eventlet mode we keep a similar
behavior as before but in native threading mode we enforce a strict
maximum even if unlimited is requested.

Change-Id: Ibbf76c2c85729820035c9791719bf2c864bce12b
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2026-02-24 16:28:00 +01:00
Takashi Kajinami 5841095740 libvirt: Use firmware auto-selection by libvirt
Use the firmware auto-selection feature in libvirt to find the best
UEFI firmware file according to the requested feature.

Firmware files may be reselected when a libvirt domain is created from
scratch, while these are kept during hard-reboot (or live migration
which preserves the loader/nvram elements filled by libvirt).

Closes-Bug: #2122296
Related-Bug: #2122288
Implements: blueprint libvirt-firmware-auto-selection
Change-Id: Ie48b020597a1a2fb3280815eec5ba3565e396f9b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2026-02-24 20:26:41 +09:00
Zuul a3e9095f02 Merge "Add VNC console support for the Ironic driver" 2026-02-23 19:04:09 +00:00
Ghanshyam Maan 48bdfc8b2f Fix the negative sleep value in graceful_shutdown()
This addresses the following review comment to avoid a
negative sleep value in the manager's graceful_shutdown()

- https://review.opendev.org/c/openstack/nova/+/975586/comment/d5e3a603_0c746704/

Change-Id: I07a994bd05ac1e7f734f2a2144327bd2559c1416
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-23 18:51:43 +00:00
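
For context, Python's `time.sleep()` raises `ValueError` when given a negative number, so a wait computed as "deadline minus now" has to be clamped once the deadline has passed. The snippet below is a generic sketch of that pattern; the helper name `remaining_wait` is hypothetical and is not taken from the nova change.

```python
import time


def remaining_wait(deadline, now=None):
    """Return the seconds left until `deadline`, never less than zero.

    `deadline` and `now` are time.monotonic() style timestamps; `now`
    is a parameter only so the function is easy to test.
    """
    if now is None:
        now = time.monotonic()
    # Clamp at zero: time.sleep() rejects negative values.
    return max(0.0, deadline - now)
```

A caller would then write `time.sleep(remaining_wait(deadline))`, which is safe even if earlier shutdown steps already used up the whole timeout.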
Zuul 64d6ba34c5 Merge "Fix the flasky test test_submit_second_while_delaying_first" 2026-02-19 20:21:47 +00:00
Ghanshyam Maan 2fb9113ed2 Add manager graceful shutdown, timeout, and wait
As per part1 of the graceful shutdown timeouts[1], this commit
adds/modifies the timeouts/waits below that are needed for graceful shutdown:

- Override the default of the graceful_shutdown_timeout to 180.
- Add a new config option for manager shutdown timeout.

It also adds a graceful_shutdown() method on the manager side, which
will be called by the nova/service.py->stop() method before it stops
the 2nd RPC server. In part1, this will wait for the configurable wait
time, but part2 will implement a better solution to track the
in-progress tasks. The idea is to have this single interface from the
service manager (graceful_shutdown()) that will be called during
graceful shutdown and is responsible for finishing the required tasks
and cleanup.

Partial implement blueprint nova-services-graceful-shutdown-part1

[1] https://specs.openstack.org/openstack/nova-specs/specs/2026.1/approved/nova-services-graceful-shutdown-part1.html#graceful-shutdown-timeouts

Change-Id: I7c1934d3ec7854feac3fc8432627c25eba963ddf
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-19 19:44:41 +00:00
Ghanshyam Maan b5c0a97582 Fix the flasky test test_submit_second_while_delaying_first
This test failed a few times and the most recent failure is
- https://review.opendev.org/c/openstack/nova/+/975586 (PS8 run)

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/unit/test_utils.py", line 2185, in test_submit_second_while_delaying_first
    self.assertGreater(task2_runtime, 2.0)
  File "/usr/lib/python3.12/unittest/case.py", line 1269, in assertGreater
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib/python3.12/unittest/case.py", line 715, in fail
    raise self.failureException(msg)
AssertionError: 1.997275639999998 not greater than 2.0

From the error, it seems we capture the start time after we
submit the task to the executor, so the executor starts the task
slightly before the test records the task start time.

Let's capture the task start time before the task is submitted so that
we can compare the times more accurately.

Change-Id: I5a9845813b614c58e0f5a66e07f8a8c732f38eb3
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-19 19:06:07 +00:00
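
The timing fix described in the commit above can be illustrated with a small standalone sketch. It does not reproduce the nova test; `task` and `measure_runtime` are hypothetical names, and the standard library `ThreadPoolExecutor` stands in for whatever executor the real test uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(duration):
    time.sleep(duration)


def measure_runtime(duration):
    """Time a task from just before submission until it completes."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Record the start BEFORE submit(): the worker may begin the
        # task immediately, so a timestamp taken after submit() could
        # miss part of the task's runtime.
        start = time.monotonic()
        future = executor.submit(task, duration)
        future.result()
        return time.monotonic() - start
```

With the start time taken first, the measured runtime covers the whole sleep, so an assertion of the form "runtime is at least the task duration" no longer depends on how quickly the worker picks up the task.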
Zuul 7a303bc1e2 Merge "Add 2nd RPC server for compute service" 2026-02-19 17:36:34 +00:00