tests: Use GreenThreadPoolExecutor.shutdown(wait=True)

We still see failures in the gate where greenlets from a previous
test keep running after the next test has started, causing spurious
failures in unit or functional test jobs.

This adds a new fixture that will ensure
GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait
for greenlets in the pool to finish running before moving on.

In local testing, doing this does not appear to adversely affect test
run times, which was my primary concern.

As a baseline, I ran a subset of functional tests in a loop without
the patch until one failed; after 11 hours, a failure reproduced the
bug. With the patch, the same loop over the same subset of functional
tests has been running for 24 hours and has not failed yet.

Based on this, I think it may be worth trying this out to see if it
will help stability of our unit and functional test jobs. And if it
ends up impacting test run times or causes other issues, we can
revert it.

Partial-Bug: #1946339

Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
melanie witt
2023-05-11 19:23:52 +00:00
parent e9a54ff350
commit c095cfe04e
2 changed files with 42 additions and 0 deletions
@@ -317,6 +317,13 @@ class TestCase(base.BaseTestCase):
        # all other tests.
        scheduler_utils.reset_globals()

        # Wait for bare greenlets spawn_n()'ed from a GreenThreadPoolExecutor
        # to finish before moving on from the test. When greenlets from a
        # previous test remain running, they may attempt to access structures
        # (like the database) that have already been torn down and can cause
        # the currently running test to fail.
        self.useFixture(nova_fixtures.GreenThreadPoolShutdownWait())

    def _setup_cells(self):
        """Setup a normal cellsv2 environment.
@@ -1938,3 +1938,38 @@ class ComputeNodeIdFixture(fixtures.Fixture):
            'nova.compute.manager.ComputeManager.'
            '_ensure_existing_node_identity',
            mock.DEFAULT))


class GreenThreadPoolShutdownWait(fixtures.Fixture):
    """Always wait for greenlets in greenpool to finish.

    We use the futurist.GreenThreadPoolExecutor, for example, in compute
    manager to run live migration jobs. It runs those jobs in bare greenlets
    created by eventlet.spawn_n(). Bare greenlets cannot be killed the same
    way as GreenThreads created by eventlet.spawn().

    Because they cannot be killed, in the test environment we must either let
    them run to completion or move on while they are still running (which can
    cause test failures as the leaked greenlets attempt to access structures
    that have already been torn down).

    When a compute service is stopped by Service.stop(), the compute manager's
    cleanup_host() method is called and while cleaning up, the compute manager
    calls the GreenThreadPoolExecutor.shutdown() method with wait=False. This
    means that a test running GreenThreadPoolExecutor jobs will not wait for
    the bare greenlets to finish running -- it will instead move on immediately
    while greenlets are still running.

    This fixture will ensure GreenThreadPoolExecutor.shutdown() is always
    called with wait=True in an effort to reduce the number of leaked bare
    greenlets.

    See https://bugs.launchpad.net/nova/+bug/1946339 for details.
    """

    def setUp(self):
        super().setUp()
        real_shutdown = futurist.GreenThreadPoolExecutor.shutdown
        self.useFixture(fixtures.MockPatch(
            'futurist.GreenThreadPoolExecutor.shutdown',
            lambda self, wait: real_shutdown(self, wait=True)))
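
The effect of forcing wait=True can be illustrated with a small
stdlib-only sketch. It is not the nova fixture: it assumes
concurrent.futures.ThreadPoolExecutor as a stand-in for
futurist.GreenThreadPoolExecutor and unittest.mock in place of
fixtures.MockPatch, and the names run_job_then_shutdown,
always_wait_shutdown and demo are hypothetical.

```python
import concurrent.futures
import threading
import time
from unittest import mock


def run_job_then_shutdown(wait):
    """Submit a slow job, shut the pool down, and report whether the job
    had already finished when shutdown() returned."""
    finished = threading.Event()

    def slow_job():
        time.sleep(0.2)
        finished.set()

    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    executor.submit(slow_job)
    executor.shutdown(wait=wait)
    finished_at_shutdown = finished.is_set()
    # Let the worker finish so this demo does not itself leak a running job.
    finished.wait(timeout=5)
    return finished_at_shutdown


# Keep a reference to the real method before patching, as the fixture does.
real_shutdown = concurrent.futures.ThreadPoolExecutor.shutdown


def always_wait_shutdown(self, wait=True, **kwargs):
    # Ignore the caller's choice and always wait for running jobs.
    return real_shutdown(self, wait=True, **kwargs)


def demo():
    # Unpatched: shutdown(wait=False) returns while the job is still running.
    unpatched = run_job_then_shutdown(wait=False)
    # Patched: the same call now blocks until the job has completed.
    with mock.patch.object(concurrent.futures.ThreadPoolExecutor,
                           'shutdown', always_wait_shutdown):
        patched = run_job_then_shutdown(wait=False)
    return unpatched, patched


if __name__ == '__main__':
    print(demo())
```

In this sketch the replacement gives wait a default value, so callers
that omit the argument keep working; the lambda in the fixture above
requires wait to be passed.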