Add more tests for graceful shutdown:
- Shut down the destination compute and check how live and cold
  migrations progress.
- Start building an instance and, once the compute starts building it,
  shut down the compute service and check whether the build finishes.
- Revert resize server.
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: I57132fb7b7fa614dfc138508581ff5a67aaed906
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
During graceful shutdown, the compute service keeps a 2nd RPC
server active, which can be used to finish in-progress
operations. Like live migration, resize and cold migration
also perform RPC calls between the source and destination computes.
For those operations too, we can use the 2nd RPC server and make
sure they are completed during graceful shutdown.
A quick overview of which RPC methods are involved in
resize/cold migration and which of them will use the 2nd RPC server:
Resize/cold migration:
- prep_resize: NO, the resize/migration has not started yet.
- resize_instance: YES, this is where the resize/migration starts.
- finish_resize: YES
- Cross-cell resize case:
  - prep_snapshot_based_resize_at_dest: NO, this is an initial check
    and the migration has not started.
  - prep_snapshot_based_resize_at_source: YES, this starts the
    migration.
Confirm resize: NO
- confirm_resize: NO
- Cross-cell confirm resize case:
  - confirm_snapshot_based_resize: NO
Revert resize:
- revert_resize: NO
- check_instance_shared_storage: YES. This is called from dest to
  source, so the source must respond to it for the revert to continue.
- finish_revert_resize on source: YES. At this stage, the revert resize
  is in progress and abandoning it can leave the migration in an
  unrecoverable state.
- Cross-cell revert case:
  - revert_snapshot_based_resize_at_dest: NO
  - finish_revert_snapshot_based_resize_at_source: YES
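The overview above can be restated as a lookup table. This is only an illustrative sketch, not Nova code: the names USES_ALT_RPC_SERVER and uses_alt_rpc_server are hypothetical, and treating unlisted methods as "NO" is an assumption.

```python
# Illustrative summary of the list above; not Nova code.
# True means the RPC method is routed to the 2nd RPC server.
USES_ALT_RPC_SERVER = {
    # Resize/cold migration
    "prep_resize": False,
    "resize_instance": True,
    "finish_resize": True,
    "prep_snapshot_based_resize_at_dest": False,
    "prep_snapshot_based_resize_at_source": True,
    # Confirm resize
    "confirm_resize": False,
    "confirm_snapshot_based_resize": False,
    # Revert resize
    "revert_resize": False,
    "check_instance_shared_storage": True,
    "finish_revert_resize": True,
    "revert_snapshot_based_resize_at_dest": False,
    "finish_revert_snapshot_based_resize_at_source": True,
}


def uses_alt_rpc_server(method: str) -> bool:
    """Return True if the method is listed as using the 2nd RPC server.

    Assumption: methods missing from the table stay on the first server.
    """
    return USES_ALT_RPC_SERVER.get(method, False)
```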
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: If08b698d012a75b587144501d829403ec616f685
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>
For graceful shutdown, the compute service will have two RPC servers.
The first RPC server handles new requests and is stopped during
graceful shutdown. The 2nd RPC server (listening on the 'compute-alt'
topic) is used to complete the in-progress operations.
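The shutdown order described above can be sketched as follows. This is a simplified model, not the Nova implementation: FakeRPCServer, graceful_shutdown, and the in_progress callback are hypothetical, the 'compute' topic for the first server is assumed, and returning early once no work remains is also an assumption.

```python
import time


class FakeRPCServer:
    """Stand-in for an RPC server; it only tracks whether it is running."""

    def __init__(self, topic):
        self.topic = topic
        self.running = True

    def stop(self):
        self.running = False


def graceful_shutdown(primary, alt, in_progress, timeout, poll=0.01):
    """Stop the first server, then keep the 2nd one up for in-progress work.

    in_progress is a callable returning True while operations are running.
    Returns True if all work finished before the timeout expired.
    """
    primary.stop()  # no new requests are accepted from here on
    deadline = time.monotonic() + timeout
    done = not in_progress()
    while not done and time.monotonic() < deadline:
        time.sleep(poll)
        done = not in_progress()
    alt.stop()  # stop the 2nd server once work is done or time is up
    return done
```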
We select the operations (case by case) and their RPC methods to use
the 2nd RPC server so that they are not interrupted when shutdown is
initiated. Graceful shutdown keeps the 2nd RPC server active
for graceful_shutdown_timeout. A new method, 'prepare_for_alt_rpcserver',
is added, which falls back to the first RPC server if it detects an old
compute.
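The fallback for old computes can be pictured with the sketch below. It does not reproduce 'prepare_for_alt_rpcserver'; select_topic, MIN_ALT_RPC_VERSION and its value are hypothetical, and 'compute' is assumed to be the first server's topic.

```python
PRIMARY_TOPIC = "compute"    # assumed topic of the first RPC server
ALT_TOPIC = "compute-alt"    # topic of the 2nd RPC server

# Hypothetical service version at which a compute starts listening on
# the 2nd RPC server; the real value is not stated in this message.
MIN_ALT_RPC_VERSION = 70


def select_topic(target_service_version: int, use_alt: bool) -> str:
    """Pick the topic for an RPC request to a compute.

    Fall back to the first RPC server when the target compute is too
    old to have a 2nd RPC server.
    """
    if use_alt and target_service_version >= MIN_ALT_RPC_VERSION:
        return ALT_TOPIC
    return PRIMARY_TOPIC
```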
As this has upgrade impact, it bumps the compute service version and
adds a release note for the same.
The list of operations that should use the 2nd RPC server will grow
over time. This commit moves the operations below to the 2nd
RPC server:
* Live migration
  - Live migration: It uses the 2nd RPC server and will try to complete
    the operation during shutdown.
  - live_migration_force_complete does not need to use the 2nd RPC
    server. It is a direct RPC request from the API to the compute; if
    it is rejected during shutdown, that is fine and it can be
    initiated again once the compute is up.
  - live_migration_abort does not need to use the 2nd RPC server. Ditto,
    it is a direct RPC request from the API to the compute. It cancels
    a queued live migration, but if the migration has already started,
    the driver cancels it. If it is rejected during shutdown because
    the RPC server is stopped, that is fine and it can be initiated
    again.
* Server external event
* Get server console
As graceful shutdown cannot be tested in tempest, this adds a new job
to test it. Currently it tests the live migration operation, and it can
be extended to other operations that will use the 2nd RPC server.
Partial implement blueprint nova-services-graceful-shutdown-part1
Change-Id: I4de3afbcfaefbed909a29a831ac18060c4a73246
Signed-off-by: Ghanshyam Maan <gmaan.os14@gmail.com>