We recently added a hard failure to nova service startup for the case
where computes were more than one version old (as indicated by their
service record). This helps to prevent starting up new control
services when a very old compute is still running. However, during an
FFU, control services that have skipped multiple versions will be
started and find the older compute records (which could not be updated
yet due to their reliance on the control services being up) and refuse
to start. This creates a cross-dependency which is not resolvable
without hacking the database.
This patch adds a workaround flag to allow turning that hard fail into
a warning to proceed past the issue. This less-than-ideal solution
is simple and backportable, but perhaps a better solution can be
implemented in the future.
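The flag's effect can be sketched as follows; the function and option
names here are hypothetical stand-ins, not nova's actual API:

```python
import logging

LOG = logging.getLogger(__name__)

def check_old_computes(oldest_supported, oldest_found, disable_check=False):
    """Sketch of the startup version check (hypothetical names).

    Fail hard when a compute service record is older than the minimum
    supported version, unless the workaround flag downgrades the
    failure to a warning so an FFU can proceed.
    """
    if oldest_found < oldest_supported:
        msg = ('Compute service version %d is older than the minimum '
               'supported version %d' % (oldest_found, oldest_supported))
        if disable_check:
            LOG.warning(msg)  # proceed past the issue during FFU
        else:
            raise RuntimeError(msg)
```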
Related-Bug: #1958883
Change-Id: Iddbc9b2a13f19cea9a996aeadfe891f4ef3b0264
The commit I168fffac8002f274a905cfd53ac4f6c9abe18803 added a wrapper
around fasteners.ReaderWriterLock to fix an issue with eventlet. But
the wrapper was added to the nova.utils module, which is used not only
by the nova tests but also by the nova production code. This made the
fixtures library a dependency of the nova production code, while the
current ReaderWriterLock usage is limited to the nova test subtree.
The commit I712f88fc1b6053fe6d1f13e708f3bd8874452a8f fixed the issue of
not having fixtures in the nova requirements.txt. However, I think a
better fix is to move the wrapper to the test subtree instead. This
patch does that and restores the previous state of requirements.txt.
Change-Id: I6903ce53b9b91325f7268cf2ebd02e4488579560
Related-Bug: #1958075
The commit 887c445a7a made the nova.utils
module dependent on the fixtures library, but the change missed updating
the requirements, so the fixtures library is not installed automatically.
This change migrates the fixtures library from test-requirements.txt to
requirements.txt so that the library is installed along with the
production code.
Closes-Bug: #1958075
Change-Id: I712f88fc1b6053fe6d1f13e708f3bd8874452a8f
This is a follow up change to I168fffac8002f274a905cfd53ac4f6c9abe18803
which added a hackaround to enable our tests to pass with
fasteners>=0.15, which was upgraded recently as part of an
openstack/requirements update.
The ReaderWriterLock from fasteners (and thus lockutils) cannot work
correctly with eventlet patched code, so this adds a wrapper containing
the aforementioned hackaround along with a hacking check to do our best
to ensure that future use of ReaderWriterLock will be through the
wrapper.
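A minimal sketch of such a hacking check, with a made-up check number
and message (the real check in nova differs):

```python
import re

# Match direct construction of a fasteners/lockutils ReaderWriterLock.
rwlock_re = re.compile(r"(fasteners|lockutils)\.(ReaderWriter|Read|Write)Lock\(")

def check_rwlock_usage(logical_line):
    """Flag direct use of fasteners/lockutils ReaderWriterLock.

    Direct use breaks under eventlet monkey patching; code should go
    through the nova wrapper instead. Yields (offset, message) tuples
    in the flake8/hacking plugin style.
    """
    match = rwlock_re.search(logical_line)
    if match:
        yield (match.start(),
               'N999: use the nova wrapper instead of %s' % match.group(0))
```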
Change-Id: Ia7bcb40a21a804c7bc6b74f501d95ce2a88b09b5
This patch adds a regression test which asserts that if a live migration
is aborted while it is 'queued', the instance's status is never reverted
to ACTIVE and the instance remains in the MIGRATING state.
The idea behind the implemented LiveMigrationQueuedAbortTest is simple:
we start two instances on the same compute and try to migrate
them simultaneously when max_concurrent_live_migrations is set to 1
and nova.tests.fixtures.libvirt.Domain.migrateToURI3 is locked.
As a result, we get two live migrations stuck in the 'migrating' and
'queued' states, and we can issue an API call to abort the second one.
The lock is then removed and the first instance is live migrated after
the second instance's live migration is aborted.
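The steps above can be sketched with stdlib primitives; all names here
are illustrative stand-ins (migrate_lock plays the role of locking
nova.tests.fixtures.libvirt.Domain.migrateToURI3, and the semaphore
models max_concurrent_live_migrations = 1):

```python
import threading
import time

migrate_lock = threading.Lock()      # stands in for the locked migrateToURI3
slots = threading.Semaphore(1)       # max_concurrent_live_migrations = 1
states = {}

def live_migrate(name, abort):
    states[name] = 'queued'
    with slots:                      # only one migration may run at a time
        if abort.is_set():
            states[name] = 'cancelled'   # aborted while still queued
            return
        states[name] = 'migrating'
        with migrate_lock:           # blocked while the fixture lock is held
            states[name] = 'completed'

migrate_lock.acquire()               # hold the 'migrateToURI3' lock
abort1, abort2 = threading.Event(), threading.Event()
t1 = threading.Thread(target=live_migrate, args=('vm1', abort1))
t2 = threading.Thread(target=live_migrate, args=('vm2', abort2))
t1.start()
while states.get('vm1') != 'migrating':  # vm1 is stuck mid-migration
    time.sleep(0.001)
t2.start()
while states.get('vm2') != 'queued':     # vm2 waits behind vm1
    time.sleep(0.001)
abort2.set()                         # abort the queued migration via "API"
migrate_lock.release()               # unblock vm1, which then completes
t1.join()
t2.join()
```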
Co-Authored-By: Alex Stupnikov <aleksey.stupnikov@gmail.com>
Partial-Bug: #1949808
Change-Id: I67d41a8e439b1ff3c5983ee17823616b80698639
This patch adds a workaround that can be enabled to send an
announce_self QEMU monitor command post live-migration, re-sending the
RARP frames that were lost due to port binding or flows not being
installed.
Please note that this marks the domain in libvirt as tainted.
See previous information about this issue in the bug [1].
[1] https://bugs.launchpad.net/nova/+bug/1815989
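For illustration, the QMP payload involved looks roughly like this;
the plumbing that actually delivers it to the guest monitor (via
libvirt's qemu monitor command passthrough) is elided in this sketch:

```python
import json

def build_announce_self():
    """Build the QMP payload for QEMU's 'announce-self' command.

    Sent to the guest monitor after live migration so the guest
    re-announces its MAC addresses to the network.
    """
    return json.dumps({'execute': 'announce-self'})
```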
Change-Id: I7a6a6fe5f5b23e76948b59a85ca9be075a1c2d6d
Related-Bug: 1815989
As of fasteners >= 0.15, the workaround code to use eventlet.getcurrent
if eventlet patching is detected has been removed and
threading.current_thread is being used instead [1]. Although we are
running in a greenlet in our test environment, we are not running in a
greenlet of type GreenThread. A GreenThread is created by calling
eventlet.spawn and spawn is not used to run our tests. At the time of
this writing, the eventlet patched threading.current_thread method
falls back to the original unpatched current_thread method if it is not
called from a GreenThread [2] and that breaks our tests involving this
fixture. We can work around this by patching threading.current_thread
with eventlet.getcurrent during creation of the lock object.
[1] https://github.com/harlowja/fasteners/commit/467ed75ee1e9465ebff8b5edf452770befb93913
[2] https://github.com/eventlet/eventlet/blob/v0.32.0/eventlet/green/threading.py#L128
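A stdlib-only sketch of the patching idea: fake_getcurrent stands in
for eventlet.getcurrent, and PatchedLock is a simplified stand-in for
the wrapper around fasteners.ReaderWriterLock:

```python
import threading
from unittest import mock

def fake_getcurrent():
    # stands in for eventlet.getcurrent in this sketch
    return 'greenlet-0'

class PatchedLock:
    """Simplified stand-in for the ReaderWriterLock wrapper.

    The real wrapper patches threading.current_thread with
    eventlet.getcurrent only while the base class __init__ runs, so
    the lock captures the greenlet identity instead of the (unpatched)
    native thread identity.
    """
    def __init__(self):
        with mock.patch('threading.current_thread', fake_getcurrent):
            # the base class captures its owner via current_thread here
            self.creator = threading.current_thread()

lock = PatchedLock()
```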
Change-Id: I168fffac8002f274a905cfd53ac4f6c9abe18803
You can currently remove a host that has instances scheduled to it from
an aggregate. If the aggregate is configured as part of an availability
zone (AZ), this would in turn remove the host from the AZ, leaving
instances originally scheduled to that AZ stranded on a host that is no
longer a member of the AZ. This is clearly undesirable and should be
blocked at the API level.
You can also add a host to an aggregate where it wasn't in one before.
Because nova provides a default AZ for hosts that don't belong to an
aggregate, adding a host to an aggregate doesn't just assign it to an
AZ, it removes it from the default 'nova' one (or whatever you've
configured via '[DEFAULT] default_availability_zone'). As noted in the
docs [1], people should not rely on scheduling to the default AZ, but if
they had, we'd end up in the same situation as above.
Add tests for both, with a fix coming after.
[1] https://docs.openstack.org/nova/latest/admin/availability-zones.html
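The API-level guard could look roughly like this; the function name,
arguments, and validation shape are hypothetical, not the eventual fix:

```python
def validate_host_removal(aggregate_metadata, host_has_instances):
    """Hedged sketch of an API-level check (hypothetical names).

    Removing a host from an aggregate that defines an availability
    zone would strand instances scheduled to that AZ, so reject the
    removal while the host still has instances.
    """
    az = aggregate_metadata.get('availability_zone')
    if az and host_has_instances:
        raise ValueError(
            'cannot remove host from AZ %r while it has instances' % az)
```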
Change-Id: I21f7f93ee0ec0cd3a290afba59342b31d074cf2f
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1907775
There is a race condition between an incoming resize and an
update_available_resource periodic in the resource tracker. The race
window starts when the resize_instance RPC finishes and ends when the
finish_resize compute RPC finally applies the migration context on the
instance.
In the race window, if the update_available_resource periodic runs on
the destination node, it will see the instance as being tracked on this
host, as instance.node already points to the dest. But
instance.numa_topology still points to the source host topology, as the
migration context is not applied yet. This leads to a CPU pinning error
if the source topology does not fit the dest topology. It also stops
the periodic task and leaves the tracker in an inconsistent state. The
inconsistent state is only cleaned up after the periodic runs outside
of the race window.
This patch applies the migration context temporarily to the specific
instances during the periodic to keep resource accounting correct.
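The temporary application can be sketched as a context manager; the
dict-based instance and the field names here are simplified stand-ins
for the real nova objects:

```python
import contextlib

@contextlib.contextmanager
def applied_migration_context(instance):
    """Hedged sketch: temporarily apply an instance's migration context.

    The periodic accounts resources against the dest-side view (e.g.
    numa_topology from the migration context), then restores the
    source-side view on exit.
    """
    saved = instance.get('numa_topology')
    ctx = instance.get('migration_context')
    if ctx:
        instance['numa_topology'] = ctx['new_numa_topology']
    try:
        yield instance
    finally:
        instance['numa_topology'] = saved
```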
Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
This patch extends the original reproduction
I4be429c56aaa15ee12f448978c38214e741eae63 to cover
bug 1952915 as well, as they have a common root cause.
Change-Id: I57982131768d87e067d1413012b96f1baa68052b
Related-Bug: #1953359
Related-Bug: #1952915
As we already discussed at the PTG, the consensus was to allow
contributors to use this label to ask cores to review some changes.
Documenting it first so a dependent patch would then modify Gerrit once
we agree.
Change-Id: I38e999954e2c91d049e1af5cda6dd0b4f8168a0e
This patch adds a functional test that reproduces a race between an
incoming migration and the update_available_resource periodic.
Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
Related-Bug: #1953359
When suspending a VM in OpenStack, Nova detaches all the mediated
devices from the guest machine, but does not reattach them on the resume
operation. This patch makes Nova reattach the mdevs that were detached
when the guest was suspended.
This behavior is due to libvirt not supporting the hot-unplug of
mediated devices at the time the feature was being developed. The
limitation has been lifted since then, and now we have to amend the
resume function so it will reattach the mediated devices that were
detached on suspension.
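The resume-side behavior can be sketched as follows; FakeGuest and the
helper are illustrative stand-ins for the libvirt guest object, not
Nova's actual driver code:

```python
class FakeGuest:
    """Minimal stand-in for a libvirt guest in this sketch."""
    def __init__(self):
        self.devices = []

    def attach_device(self, dev):
        self.devices.append(dev)

def resume_with_mdevs(guest, detached_mdevs):
    """Reattach the mediated devices recorded at suspend time."""
    for dev in detached_mdevs:
        guest.attach_device(dev)
    return guest
```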
Closes-bug: #1948705
Signed-off-by: Gustavo Santos <gustavofaganello.santos@windriver.com>
Change-Id: I083929f36d9e78bf7713a87cae6d581e0d946867
This change adds a simple [cinder] debug config option to allow
cinderclient and os_brick to be made to log at DEBUG independently of
the rest of Nova.
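The effect of such an option can be sketched with stdlib logging; the
function name is hypothetical and the real option parsing via
oslo.config is elided:

```python
import logging

def set_cinder_debug(enabled):
    """Hedged sketch: raise (or reset) the cinderclient and os_brick
    loggers to DEBUG independently of the root Nova log level."""
    level = logging.DEBUG if enabled else logging.NOTSET
    for name in ('cinderclient', 'os_brick'):
        logging.getLogger(name).setLevel(level)
```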
Change-Id: I84f5b73adddf42831f1d9e129c25bf955e6eda78