Document native threading mode and tuneables

Change-Id: I003177de3a9f69c71c19eb8eaa7232785e03e669 Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-05-09 16:30:55 +02:00
parent 6c03f9d1da
commit 8701a93743
3 changed files with 135 additions and 4 deletions
@@ -0,0 +1,106 @@
+Nova service concurrency
+========================
+
+For a long time nova services relied almost exclusively on the Eventlet library
+for processing multiple API requests, RPC requests and other tasks that needed
+concurrency. Since Eventlet is not expected to support the next major cPython
+version the OpenStack TC set a `goal`__ to replace Eventlet and therefore Nova
+has started transitioning its concurrency model to native threads. During this
+transition Nova maintains the Eventlet based concurrency mode while building
+up support for the native threading mode.
+
+.. __: https://governance.openstack.org/tc/goals/selected/remove-eventlet.html
+
+.. note::
+
+   The native threading mode is not ready yet. Do not use it in production.
+
+Selecting concurrency mode for a service
+----------------------------------------
+
+Nova still uses Eventlet by default, but allows switching services to native
+threading mode at service startup via setting the environment variable
+``OS_NOVA_DISABLE_EVENTLET_PATCHING=true``.
+
+.. note::
+
+   Since nova 32.0.0 (2025.2 Flamingo) the nova-scheduler can be switched to
+   native threading mode.
+
+
+Tunables for the native threading mode
+--------------------------------------
+As native threads are more expensive resources than greenthreads Nova provides
+a set of configuration options to allow fine tuning the deployment based on
+load and resource constraints. The default values are selected to support a
+basic, small deployment without consuming substantially  more memory resources,
+than the legacy Eventlet mode. Increasing the size of the below thread pools
+means that the given service will consume more memory but will also allow more
+tasks to be executed concurrently.
+
+* :oslo.config:option:`cell_worker_thread_pool_size`: Used to execute tasks
+  across all the cells within the deployment.
+
+  E.g. To generate the result of the ``openstack server list`` CLI command, the
+  nova-api service will use one native thread for each cell to load the nova
+  instances from the related cell database.
+
+  So if the deployment has many cells then the size of this pool probably needs
+  to be increased.
+
+  This option is only relevant for nova-api, nova-metadata, nova-scheduler, and
+  nova-conductor as these are the services doing cross cell operations.
+
+* :oslo.config:option:`executor_thread_pool_size`: Used to handle incoming RPC
+  requests. Services with many more inbound requests will need larger pools.
+  For example, a single conductor serves requests from many computes as well
+  as the scheduler. A compute node only serves requests from the API for
+  lifecycle operations and other computes during migrations.
+
+  This option is only relevant for nova-scheduler, nova-conductor, and
+  nova-compute as these are the services acting as RPC servers.
+
+* :oslo.config:option:`default_thread_pool_size`: Used by various concurrent
+  tasks in the service that are not categorized into the above pools.
+
+  This option is relevant to every nova service using ``nova.utils.spawn()``.
+
+Seeing the usage of the pools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When new work is submitted to any of these pools in both concurrency modes
+Nova logs the statistics of the pool (work executed, threads available,
+work queued, etc).
+This can be useful when fine tuning of the pool size is needed.
+The parameter :oslo.config:option:`thread_pool_statistic_period` defines how
+frequently such logging happens from a specific pool in seconds. A value of
+60 seconds means that stats will be logged from a pool maximum once every
+60 seconds. The value 0 means that logging happens every time work is submitted
+to the pool. The default value is -1 meaning that the stats logging is
+disabled.
+
+Preventing hanging threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Threads from a pool are not cancellable once they are executing a task,
+therefore it is important to ensure external dependencies cannot hold up a
+task execution indefinitely as that will lead to having fewer threads in the
+pool available for incoming work and therefore reduced overall capacity.
+
+Nova's RPC interface already uses proper timeout handling to avoid hanging
+threads. But adding timeout handling to the Nova's database interface is
+database server and database client library dependent.
+
+For mysql-server the `max_execution_time`__ configuration option can be used
+to limit the execution time of a database query on the server side. Similar
+options exist for other database servers.
+
+.. __: https://dev.mysql.com/doc/refman/8.4/en/server-system-variables.html#sysvar_max_execution_time
+
+For the pymysql database client a client side timeout can be implemented by
+adding the `read_timeout`__ connection parameter to the connection string.
+
+.. __: https://pymysql.readthedocs.io/en/latest/modules/connections.html#module-pymysql.connections
+
+We recommend using both in deployments where Nova services are running in
+native threading mode.
@@ -105,6 +105,9 @@ the defaults from the :doc:`install guide </install/index>` will be sufficient.
 * :doc:`Running nova-api on wsgi </user/wsgi>`: Considerations for using a real
  WSGI container instead of the baked-in eventlet web server.

+* :doc:`Nova service concurrency </admin/concurrency>`: Considerations on how
+  to use and tune Nova services in threading mode.
+
 .. toctree::
   :maxdepth: 2

@@ -113,6 +116,7 @@ the defaults from the :doc:`install guide </install/index>` will be sufficient.
   default-ports
   availability-zones
   configuration/index
+   concurrency


 Basic configuration