Merge "Use 2nd RPC server in compute operations"
@@ -0,0 +1,55 @@
---
features:
  - |
    Nova services now support graceful shutdown on ``SIGTERM``. When a service
    receives ``SIGTERM``, it stops accepting new RPC requests and waits for
    in-progress tasks to reach a safe termination point.

    The compute service creates a second RPC server on the ``compute-alt``
    topic, which remains active during graceful shutdown so that the compute
    service can finish its in-progress tasks.

    Currently, the following operations use the second RPC server:

    * Live migration
    * Server external events
    * Get console output

    Two configuration options control this behavior:

    * ``[DEFAULT]/graceful_shutdown_timeout`` - The overall time the service
      waits before forcefully exiting. It defaults to 180 seconds for every
      Nova service.
    * ``[DEFAULT]/manager_shutdown_timeout`` - The time the service manager
      waits for in-progress tasks to complete during graceful shutdown. It
      defaults to 160 seconds for each service manager and must be less
      than ``graceful_shutdown_timeout``.

    You can increase these timeouts based on traffic and on how long
    long-running tasks (e.g. live migrations) take in your deployment.

    We plan to improve graceful shutdown in future releases by adding task
    tracking and transitioning resources to a recoverable state. Until then,
    this feature is experimental.
upgrade:
  - |
    The default value of ``[DEFAULT]/graceful_shutdown_timeout`` has been
    changed from 60 to 180 seconds for all Nova services. This means that
    when a Nova service receives ``SIGTERM``, it will now wait up to 180
    seconds for a graceful shutdown before being forcefully terminated.
    Operators using an external system (e.g. k8s, systemd) to manage
    Nova services should ensure that their service stop timeouts are set
    to at least ``graceful_shutdown_timeout``, so that a service is not
    forcefully killed before Nova finishes its graceful shutdown. For
    example, the systemd ``TimeoutStopSec`` should be set to at least 180
    seconds for Nova services.
  - |
    A new configuration option ``[DEFAULT]/manager_shutdown_timeout`` has been
    added with a default value of 160 seconds. This controls how long the
    service manager waits for in-progress tasks to finish during graceful
    shutdown. Operators may want to tune this value based on how long their
    typical long-running operations (e.g. live migrations) take to complete.
  - |
    The compute service now creates a second RPC server on the ``compute-alt``
    topic. This means each compute worker will create an additional RabbitMQ
    queue.
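
The timeout constraints stated in this release note can be checked before a rollout. The following is a minimal sketch and not part of Nova: `check_shutdown_timeouts` and its `external_stop_timeout` argument are hypothetical names, the defaults of 180 and 160 seconds are the ones quoted in the note, and the file is parsed with Python's standard `configparser` rather than oslo.config. It only verifies that `manager_shutdown_timeout` is less than `graceful_shutdown_timeout` and that the external stop timeout (for example systemd's `TimeoutStopSec`, passed in by the caller) is at least `graceful_shutdown_timeout`.

```python
import configparser

# Defaults quoted in the release note; assumed here when an option is unset.
DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT = 180
DEFAULT_MANAGER_SHUTDOWN_TIMEOUT = 160


def check_shutdown_timeouts(conf_text, external_stop_timeout):
    """Return a list of problems found in the shutdown timeout settings.

    conf_text: contents of a nova.conf-style INI file.
    external_stop_timeout: stop timeout in seconds enforced by the external
        service manager (hypothetical parameter; this sketch does not read
        it from systemd or Kubernetes itself).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    graceful = parser.getint(
        "DEFAULT", "graceful_shutdown_timeout",
        fallback=DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT)
    manager = parser.getint(
        "DEFAULT", "manager_shutdown_timeout",
        fallback=DEFAULT_MANAGER_SHUTDOWN_TIMEOUT)

    problems = []
    # The manager must give up before the overall service deadline.
    if manager >= graceful:
        problems.append(
            f"manager_shutdown_timeout ({manager}) must be less than "
            f"graceful_shutdown_timeout ({graceful})")
    # The external manager must not kill the service before Nova is done.
    if external_stop_timeout < graceful:
        problems.append(
            f"external stop timeout ({external_stop_timeout}) is shorter "
            f"than graceful_shutdown_timeout ({graceful})")
    return problems
```

Calling `check_shutdown_timeouts` with the text of a configuration file and the stop timeout of the service manager returns an empty list when both constraints hold, and one message per violated constraint otherwise.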