Commit Graph

14 Commits

Author SHA1 Message Date
Ghanshyam Maan b47d217ca7 Add more tests for graceful shutdown
Adding more tests for graceful shutdown:
- shut down the destination compute and see how live and cold migration
progress
- start building an instance and, once the compute has started building
it, shut down the compute service and see whether the build finishes.
- revert resize server

Partially implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I57132fb7b7fa614dfc138508581ff5a67aaed906
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:46:24 +00:00
Ghanshyam Maan 996c4ff9e8 Prepare resize/cold migration for graceful shutdown
During graceful shutdown, the compute service keeps a 2nd RPC
server active which can be used to finish in-progress
operations. Like live migration, resize and cold migrations
also perform RPC calls between the source and destination
computes. For those operations too, we can use the 2nd RPC
server and make sure they complete during graceful shutdown.

A quick overview of which RPC methods are involved in
resize/cold migration and which will use the 2nd RPC server:

Resize/cold migration
- prep_resize: No, resize/migration is not started yet.
- resize_instance: Yes, here the resize/migration starts.
- finish_resize: Yes
- cross cell resize case:
  - prep_snapshot_based_resize_at_dest: No, this is an initial check
    and the migration has not started
  - prep_snapshot_based_resize_at_source: Yes, this starts the migration

Confirm resize: No
- confirm_resize: No
- cross cell confirm resize case:
  - confirm_snapshot_based_resize: No

Revert resize:
- revert_resize: No
- check_instance_shared_storage: Yes. This is called from the dest to
  the source, so we need the source to respond so that the revert can
  continue.
- finish_revert_resize on source: Yes. At this stage the revert resize
  is in progress and abandoning it here can leave the migration in an
  unrecoverable state.
- cross cell revert case:
  - revert_snapshot_based_resize_at_dest: No
  - finish_revert_snapshot_based_resize_at_source: Yes
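
The Yes/No breakdown above can be summarized as a simple lookup (an
illustrative sketch only; the dict and helper below are not actual Nova
code, though the method names mirror the ones listed and the
'compute-alt' topic comes from the related commit):

```python
# Which resize/cold-migration RPC methods should target the 2nd RPC
# server during graceful shutdown, per the breakdown above.
USES_ALT_RPCSERVER = {
    "prep_resize": False,                         # migration not started yet
    "resize_instance": True,                      # migration starts here
    "finish_resize": True,
    "prep_snapshot_based_resize_at_dest": False,  # initial check only
    "prep_snapshot_based_resize_at_source": True, # starts the migration
    "confirm_resize": False,
    "confirm_snapshot_based_resize": False,
    "revert_resize": False,
    "check_instance_shared_storage": True,        # dest -> source call
    "finish_revert_resize": True,                 # abandoning is unrecoverable
    "revert_snapshot_based_resize_at_dest": False,
    "finish_revert_snapshot_based_resize_at_source": True,
}

def topic_for(method, primary="compute", alt="compute-alt"):
    """Pick the RPC topic for a method during graceful shutdown."""
    return alt if USES_ALT_RPCSERVER.get(method) else primary
```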

Partially implement blueprint nova-services-graceful-shutdown-part1

Change-Id: If08b698d012a75b587144501d829403ec616f685
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:36:07 +00:00
Ghanshyam Maan d5ffb58a8d Use 2nd RPC server in compute operations
For graceful shutdown, the compute service will have two RPC servers.
One RPC server is used for new requests and will be stopped during
graceful shutdown; the 2nd RPC server (listening on the 'compute-alt'
topic) will be used to complete in-progress operations.

We select the operations (case by case) and their RPC methods to use
the 2nd RPC server so that they will not be interrupted when shutdown
is initiated; during graceful shutdown the 2nd RPC server is kept
active for graceful_shutdown_timeout. A new method
'prepare_for_alt_rpcserver' is added which falls back to the first
RPC server if it detects an old compute.
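
The fallback described above could look roughly like the following
sketch (the method name 'prepare_for_alt_rpcserver' and the
'compute-alt' topic come from this commit; the version threshold
constant and function signature are illustrative assumptions, not the
actual Nova implementation):

```python
# Hypothetical minimum service version at which a compute exposes the
# alternate RPC server; the real value lives in the service-version bump.
MIN_ALT_RPCSERVER_VERSION = 70

def prepare_for_alt_rpcserver(service_version, primary_topic="compute",
                              alt_topic="compute-alt"):
    """Return the RPC topic to target for an in-progress operation.

    New computes run a 2nd RPC server on the 'compute-alt' topic that
    stays up for graceful_shutdown_timeout; old computes do not, so we
    fall back to the primary topic.
    """
    if service_version >= MIN_ALT_RPCSERVER_VERSION:
        return alt_topic
    return primary_topic
```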

As this has upgrade impact, this bumps the compute service version and
adds release notes for the same.

The list of operations that should use the 2nd RPC server will grow
eventually; this commit moves the operations below to the 2nd RPC
server:

* Live migration

  - Live migration: It uses the 2nd RPC server and will try to complete
    the operation during shutdown.
  - live_migration_force_complete does not need to use the 2nd RPC
    server. It is a direct RPC request from the API to the compute; if
    it is rejected during shutdown that is fine, as it can be initiated
    again once the compute is up.
  - live_migration_abort does not need to use the 2nd RPC server.
    Ditto, it is a direct RPC request from the API to the compute. It
    cancels a queued live migration, but if the migration has already
    started, the driver cancels it. If the request is rejected during
    shutdown because RPC is stopped, that is fine; it can be initiated
    again.

* server external event
* Get server console

As graceful shutdown cannot be tested in tempest, this adds a new job
to test it. Currently it tests the live migration operation and can
be extended to other operations that use the 2nd RPC server.

Partially implement blueprint nova-services-graceful-shutdown-part1

Change-Id: I4de3afbcfaefbed909a29a831ac18060c4a73246
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
2026-02-25 20:32:44 +00:00
Sean Mooney 6c9110bb8b Handle missing libvirt services in evacuate hook
On Debian 13 (Trixie), libvirt packaging is modularized and
the libvirt-daemon-lock package (providing virtlockd) is
optional. The evacuate hook previously assumed all libvirt
services were installed and failed when stopping/starting
missing units.

Extract a reusable manage_libvirt_service.yaml task file that
checks if a service exists via systemctl list-unit-files
before managing its units. This prevents failures when
optional libvirt packages are not installed and future-proofs
against further packaging changes.
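
The check described above might look roughly like the following
task-file sketch (illustrative only; the real
manage_libvirt_service.yaml in the tree may differ in variable names
and structure):

```yaml
# Illustrative sketch of a manage_libvirt_service.yaml-style task file.
# Variable names (libvirt_service, libvirt_service_state) are assumptions.
- name: Check whether the service unit file exists
  command: "systemctl list-unit-files {{ libvirt_service }}"
  register: unit_check
  changed_when: false

- name: Manage the service only when its unit file is present
  service:
    name: "{{ libvirt_service }}"
    state: "{{ libvirt_service_state }}"
  when: libvirt_service in unit_check.stdout
```

Gating on `systemctl list-unit-files` rather than the unit's active
state is what makes the hook tolerate units that were never installed,
such as virtlockd on Debian 13.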

Generated-By: claude-code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change-Id: Ie84e2e8ab2d3065b1562ee5e256fa163541955f7
Signed-off-by: Sean Mooney <work@seanmooney.info>
2026-02-17 18:22:22 +00:00
melanie witt 6827b763b4 run-evacuate-hook: Check cinder before creating BFV server
A while back, change I52046e6f7acdfb20eeba67dda59cbb5169e5d17e disabled
cinder in the nova-ovs-hybrid-plug job and added checks for cinder
before attempting to run evacuate BFV tests.

Resource setup for BFV was however not bypassed and the attempt to
setup a BFV server resource fails with:

  keystoneauth1.exceptions.catalog.EndpointNotFound: publicURL endpoint
  for volumev3 service not found

This adds a bypass to avoid attempting to create a BFV server when
cinder is not available.

Change-Id: I52c7e5ce268bb38cee16c18c5523fe0e224970aa
2024-02-06 17:52:30 +00:00
Dan Smith bfdc99ffbb Install lxml before we need it in post-run
Change-Id: Ibf6bfde6c524821fa5dc3c01b2eb57635e587de6
Closes-Bug: #2039463
2023-10-16 08:32:56 -07:00
Amit Uniyal c486cc89dc Make our nova-ovs-hybrid-plug job omit cinder
This modifies the nova-ovs-hybrid-plug job to disable cinder and swift
to ensure we test for this going forward.

Change-Id: I52046e6f7acdfb20eeba67dda59cbb5169e5d17e
2023-09-13 12:23:43 -07:00
melanie witt e96ac439d3 Use OSC in run-evacuate-hook instead of novaclient
Recently a change landed in devstack [1] to install packages into a
global venv by default and the "nova" command was not symlinked for
compat, so jobs using run-evacuate-hook are failing with:

  nova: command not found

We had intended to switch away from using novaclient CLI commands in our
scripts anyway, so we can just use this opportunity to switch to OSC.

[1]: If9bc7ba45522189d03f19b86cb681bb150ee2f25

Change-Id: Ifd969b84a99a9c0460bceb1a28fcee6e51cbb4ae
2023-08-12 01:44:02 +00:00
Lucas Alvares Gomes 20a7c98eff [OVN] Adapt the live-migration job scripts to work with OVN
There's no q-agt service in an OVN deployment.

Change-Id: Ia25c966c70542bcd02f5540b5b94896c17e49888
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
2021-03-15 09:41:03 +00:00
Lee Yarwood 76360e566b nova-live-migration: Disable *all* virt services during negative tests
libvirtd was being restarted on the controller during negative
evacuation tests that rely on the service being stopped to cause an
evacuation failure.

This change adds various virt services to the list of services stopped
and now disabled on the host to ensure these don't cause systemd to
restart libvirtd:

* virtlogd.service
* virtlogd-admin.socket
* virtlogd.socket
* virtlockd.service
* virtlockd-admin.socket
* virtlockd.socket

Closes-Bug: #1903979
Change-Id: Ic83252bbda76c205bcbf0eef184ce0b201e224fc
2020-11-27 13:35:42 +00:00
Lee Yarwood 226250beb6 nova-evacuate: Disable libvirtd service and sockets during negative tests
The recent switch to Focal introduced a change in behaviour for the
libvirtd service that can now be restarted through new systemd socket
services associated with it once stopped. As we need it to remain
stopped during the initial negative evacuation tests on the controller
we now need to also stop these socket services and then later restart
them.

Change-Id: I2333872670e9e6c905efad7461af4d149f8216b6
2020-10-02 17:01:57 +00:00
Lee Yarwood f357d80407 zuul: Introduce nova-evacuate
This change reworks the evacuation parts of the original
nova-live-migration job into a zuulv3 native ansible role and initial
job covering local ephemeral and iSCSI/LVM volume attached instance
evacuation. Future jobs will cover ceph and other storage backends.

Change-Id: I380e9ca1e6a84da2b2ae577fb48781bf5c740e23
2020-09-23 16:47:47 +01:00
Matt Riedemann 7661995b69 Enable cross-cell resize in the nova-multi-cell job
This changes the nova-multi-cell job to essentially
force cross-cell resize and cold migration. By "force"
I mean there is only one compute in each cell and
resize to the same host is disabled, so the scheduler
has no option but to move the server to the other cell.

This adds a new role to write the nova policy.yaml file
to enable cross-cell resize, and a pre-run playbook so
that the policy file is set up before tempest runs.

Part of blueprint cross-cell-resize

Change-Id: Ia4f3671c40e69674afc7a96b5d9b198dabaa4224
2019-12-23 10:10:57 -05:00
Matt Riedemann cee072b962 Convert nova-next to a zuul v3 job
For the most part this should be a pretty straight-forward
port of the run.yaml. The most complicated thing is executing
the post_test_hook.sh script. For that, a new post-run playbook
and role are added.

The relative path to devstack scripts in post_test_hook.sh itself
had to drop the 'new' directory: since we are no longer executing
the script through devstack-gate, the 'new' path does not exist
anymore.

Change-Id: Ie3dc90862c895a8bd9bff4511a16254945f45478
2019-07-23 11:32:35 -04:00