As the final patch in the series, this adds release notes for the complete
feature.
Change-Id: I655f5144cbfa834ee089c474c5caa3cf8140354f
Implements: qos-minimum-guaranteed-packet-rate
Since I99a49b107b1872ddf83d1d8497a26a8d728feb07 the nova-next job has been
failing because I missed a dependency between that neutron patch and
https://review.opendev.org/c/openstack/nova/+/802060 . So this patch
disables the testing until the nova adaptation lands.
Change-Id: Ic28ef83f5193e6c1fbac1577ef58fe0d9e45694d
The nova-manage placement heal_allocations CLI is capable of healing
missing placement allocations due to port resource requests. To support
the new extended port resource request this code needs to be adapted
too.
When the heal_allocation command gained port resource request support in
Train, the only way to figure out the missing allocations was to dig into
the placement RP tree directly. Since then nova has gained support for
interface attach with such ports, and to support that placement gained
support for in_tree filtering in allocation candidate queries. So the
healing logic can now be generalized to the following:
For a given instance
1) Find the ports that have a resource request but no allocation key in the
binding profile. These are the ports we need to heal.
2) Gather the RequestGroups from these ports and run an
allocation_candidates query restricted to the current compute of the
instance with in_tree filtering.
3) Extend the existing instance allocation with a returned allocation
candidate and update the instance allocation in placement.
4) Update the binding profile of these ports in neutron
The main change compared to the existing implementation is in step 2);
the rest is mostly the same.
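The generalized flow can be sketched roughly as follows. This is an illustrative sketch only: the names (get_allocation_candidates, host_rp, provider_per_port) are stand-ins for the real placement and neutron calls, not the actual implementation.

```python
def heal_port_allocations(instance, ports, get_allocation_candidates):
    """Illustrative sketch of the generalized healing steps."""
    # 1) ports with a resource request but no allocation key in the
    #    binding profile are the ones that need healing
    to_heal = [p for p in ports
               if p.get('resource_request')
               and 'allocation' not in p.get('binding:profile', {})]
    if not to_heal:
        return []
    # 2) gather the request groups and ask placement for allocation
    #    candidates restricted (in_tree) to the instance's compute node
    groups = [p['resource_request'] for p in to_heal]
    candidate = get_allocation_candidates(groups, in_tree=instance['host_rp'])
    # 3) extend the existing instance allocation with the candidate
    instance['allocations'].update(candidate['allocations'])
    # 4) record the chosen resource provider in each port's binding
    #    profile (to be written back to neutron)
    for port in to_heal:
        profile = port.setdefault('binding:profile', {})
        profile['allocation'] = candidate['provider_per_port'][port['id']]
    return to_heal
```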
Note that support for the old resource request format is kept alongside
the new format until Neutron makes the new format mandatory.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I58869d2a5a4ed988fc786a6f1824be441dd48484
The signature of the lock() function gained a new kwarg in
oslo.concurrency 4.5, which makes some nova unit tests fail. This patch
changes the unit tests to mock at a higher level in the locking
infrastructure to avoid depending on the exact signature.
This will enable
https://review.opendev.org/c/openstack/requirements/+/814889/ to land.
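A minimal illustration of the approach, with made-up names: instead of mocking the external lock() helper and re-stating its exact signature, patch the code's own thin wrapper, so signature changes in the library cannot break the test.

```python
import contextlib
import unittest
from unittest import mock


class Worker:
    """Stand-in for code that serializes work through a lock() helper
    whose signature may change between library releases."""

    def _lock(self, name):
        # thin wrapper over the external lock(); never hit in tests
        raise RuntimeError('real locking not available here')

    def do_work(self):
        with self._lock('work'):
            return 'done'


class TestWorker(unittest.TestCase):
    def test_do_work(self):
        # Patch the wrapper, not the library function, so the test does
        # not depend on lock()'s exact signature.
        with mock.patch.object(
                Worker, '_lock', return_value=contextlib.nullcontext()):
            self.assertEqual(Worker().do_work(), 'done')
```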
Change-Id: Icecb75d9a374b29f2cf70d3aa155dc6c92bf715f
When a compute node goes down and all instances on it are evacuated,
allocation records for these instances are left on the source compute
node until the nova-compute service is started again on that node.
However, if a compute node is completely broken, it is not possible to
start the service again.
In this situation, deleting the nova-compute service for the compute node
doesn't delete its resource provider record, and even if a user tries
to delete the resource provider, the delete request is rejected because
allocations are still left on that node.
This change ensures that remaining allocations left by successful
evacuations are cleared when deleting a nova-compute service, to avoid
leaving a resource provider record behind even if the compute node can't
be recovered. Migration records are still left in 'done' status to trigger
clean-up tasks in case the compute node is recovered later.
Closes-Bug: #1829479
Change-Id: I3ce6f6275bfe09d43718c3a491b3991a804027bd
The patch I03cf285ad83e09d88cdb702a88dfed53c01610f8 fixed most of the
cases where this could happen but missed one. An early enough exception
during _delete() can mean that instance_uuid never gets defined, yet we
still try to use it in the finally block. This patch moves the saving of
the instance_uuid to the top of the try block to avoid the issue.
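The bug pattern is generic Python; here is a self-contained illustration with hypothetical names (destroy, log_delete), not the actual _delete() code:

```python
deleted = []

def destroy(instance):
    if instance is None:
        raise RuntimeError('early failure during _delete()')
    deleted.append(instance['uuid'])

def log_delete(instance_uuid):
    deleted.append(('logged', instance_uuid))

# Broken ordering: an early exception means instance_uuid was never
# bound, so the finally block raises UnboundLocalError and masks the
# original error.
def delete_broken(instance):
    try:
        destroy(instance)                # may raise before the next line
        instance_uuid = instance['uuid']
    finally:
        log_delete(instance_uuid)

# Fixed ordering (the shape of the patch): save instance_uuid at the
# top of the try block so the finally block can always use it.
def delete_fixed(instance):
    try:
        instance_uuid = instance['uuid'] if instance else None
        destroy(instance)
    finally:
        log_delete(instance_uuid)
```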
Change-Id: Ib3073d7f595c8927532b7c49fc7e5ffe80d508b9
Closes-Bug: #1940812
Related-Bug: #1914777
The port.resource_request field is admin only. Nova depends on the
value of this field to do proper scheduling and resource allocation
and deallocation for ports with a resource request, as well as to update
the port.binding:profile.allocation field with the resource providers
the requested resources are fulfilled from. However, in some cases nova
does not use a neutron admin client / elevated context to read the
port. In such cases neutron returns None for the port.resource_request
field and nova assumes the port has no resource request.
This patch fixes all three places where previous testing showed that
context elevation was missing.
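A toy model of the failure mode; the names and dict-based "context" here are illustrative only, the real code reads ports through nova's neutron client with a RequestContext:

```python
def show_port(context, port_db):
    """Toy neutron: admin-only fields read back as None for non-admins."""
    port = dict(port_db)
    if not context['is_admin']:
        port['resource_request'] = None
    return port

def elevated(context):
    """Toy equivalent of elevating a request context to admin."""
    return dict(context, is_admin=True)

port_db = {'id': 'p1', 'resource_request': {'resources': {'CUSTOM_X': 1}}}
user_ctxt = {'is_admin': False}

# Reading with the user context loses the field, so nova would wrongly
# conclude the port has no resource request ...
assert show_port(user_ctxt, port_db)['resource_request'] is None
# ... while reading with an elevated context preserves it.
assert show_port(elevated(user_ctxt), port_db)['resource_request'] is not None
```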
Change-Id: Icb35e20179572fb713a397b4605312cf3294b41b
Closes-Bug: #1945310
We're seeing quite a few timeout failures on the following tests in
'nova.tests.unit.db.main.test_migrations':
- TestModelsLegacySyncMySQL.test_models_sync
- TestMigrationsWalkMySQL.test_walk_versions
- TestModelsSyncMySQL.test_innodb_tables
- TestModelsSyncMySQL.test_models_sync
Evidently MySQL is particularly affected here. Test run times are slow
even on a relatively well-powered machine like my localhost (Lenovo T460s
w/ Intel Core i7-6600U CPU + 20G RAM) and the CI machines are only making
matters worse. Local experiments with alternative MySQL libraries, such
as 'mysqlclient', did not yield any improvements in performance, so we
must simply live with this for now. Do so by setting 'TIMEOUT_SCALING_FACTOR'
for these tests to 4, meaning these tests will now get a whopping 640
seconds (or over 10 minutes) to execute (we set OS_TEST_TIMEOUT to 160
in 'tox.ini'). We set this for both main and API DB migrations, even
though only the former is currently exhibiting issues, to head off
future problems. An alternative to this would be to override the timeout
on a test-by-test basis, as Cinder has done [1], but that seems more
complicated for no good reason. Yet another alternative would be to
reintroduce the serialization of these tests first introduced in change
I6ce930fa86c82da1008089791942b1fff7d04c18, but that is left until later
in the hopes that simply increasing the timeout will resolve the issues.
[1] https://github.com/openstack/cinder/blob/19.0.0/cinder/tests/unit/db/test_migrations.py
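A toy model of how the class-level scaling factor combines with OS_TEST_TIMEOUT; the real plumbing lives in nova's test base classes, so the class names below are illustrative:

```python
import os

class MigrationTestBase:
    TIMEOUT_SCALING_FACTOR = 1

    def effective_timeout(self):
        # tox.ini sets OS_TEST_TIMEOUT=160; the factor multiplies it
        base = int(os.environ.get('OS_TEST_TIMEOUT', 160))
        return base * self.TIMEOUT_SCALING_FACTOR

class TestModelsSyncMySQL(MigrationTestBase):
    # MySQL migration tests are slow, so quadruple the budget: 640s
    TIMEOUT_SCALING_FACTOR = 4
```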
Change-Id: I82b9a064d77251945ff1ae99d7049f367ddde92e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
These fields were never used in the API database. They can be removed
now, some years after originally intended.
Change-Id: I781875022d37d2c0626347f42c87707a29a9ab21
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We have a policy of removing fields from SQLAlchemy models at least one
cycle before we remove the underlying database columns. This can result
in a discrepancy between the state that our newfangled database
migration tool, alembic, sees and what's actually in the database. We
were ignoring these removed fields (and one foreign key constraint) in
two different locations for both databases: as part of the alembic
configuration and as part of the tests we have to ensure our migrations
are in sync with our models (note: the tests actually use the alembic
mechanism to detect the changes [1]). De-duplicate these.
[1] https://github.com/sqlalchemy/alembic/issues/724#issuecomment-672081357
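The de-duplication amounts to defining the ignore list once and sharing the filter between the alembic configuration (via context.configure(include_object=...)) and the tests. A sketch with hypothetical entries:

```python
# (table, column) pairs removed from the models but still present in the
# database; the entries here are hypothetical examples.
REMOVED_COLUMNS = {
    ('instances', 'removed_column'),
}

def include_object(obj, name, type_, reflected, compare_to):
    """Single filter shared by the alembic config and the model-sync
    tests, so removed fields are ignored in exactly one place."""
    if type_ == 'column' and (obj.table.name, name) in REMOVED_COLUMNS:
        return False
    return True
```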
Change-Id: I978b4e44cf7f522a70cc5ca76e6d6f1a985d5469
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Change I18846a5c7557db45bb63b97c7e8be5c4367e4547 enabled auto-generation
of migrations for the main database. Let's now extend this to the API
database using the same formula. While we're here, we also enable
"batch" migrations for SQLite [1] by default, which allow us to work
around SQLite's inability to support the ALTER statement for all but a
limited set of cases. As noted in the documentation [2], this will have
no impact on other backends where "we'd see the usual 'ALTER' statements
done as though there were no batch directive".
[1] https://stackoverflow.com/a/31140916/613428
[2] https://alembic.sqlalchemy.org/en/latest/batch.html
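In alembic's env.py the switch is a single flag; a trimmed config sketch, where connectable and target_metadata come from the surrounding env.py:

```python
def run_migrations_online():
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # Emit batch ("move and copy") statements on SQLite, which
            # cannot ALTER columns in place; other backends still get
            # plain ALTER statements.
            render_as_batch=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```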
Change-Id: I51c3a53286a0eced4bf57ad4fc13ac5f3616f7eb
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
In our functional tests we run nova services as eventlets. Those
services can also spawn their own eventlets for RPC or other parallel
processing. The test case executor only sees and tracks the main
eventlet, where the code of the test case is running. When that
finishes, the test executor considers the test case to be finished
regardless of the other spawned eventlets. This can lead to leaked
eventlets that keep running in parallel with later test cases.
One way this can cause trouble is via the global variables in the
nova.rpc module. Those globals are re-initialized for each test case, so
they do not directly leak information between test cases. However, if
a late eventlet calls nova.rpc.get_versioned_notifier() it gets a
perfectly usable FakeVersionedNotifier object regardless of which test
case the notifier or the eventlet belongs to. This way the late eventlet
can send a notification to the currently running test case and therefore
make it fail.
The current case we saw is the following:
1) The test case
nova.tests.functional.test_servers.ServersTestV219.test_description_errors
creates a server but doesn't wait for it to reach a terminal state
(ACTIVE / ERROR). This test case finishes quickly but leaks running
eventlets in the background waiting for some RPC call to return.
2) As the test case finishes, the cleanup code deletes the test-case-
specific setup, including the DB.
3) The test executor moves forward and starts running another test case
4) 60 seconds later the leaked eventlet times out waiting for the RPC
   call to return and tries to carry on, but fails as the DB is already
   gone. It then tries to report this as an error notification: it calls
   nova.rpc.get_versioned_notifier(), gets a fresh notifier that is
   connected to the currently running test case, and emits the error
   notification there.
5) The currently running test case is also waiting for an error
   notification, one to be triggered by its own test code, but it gets
   the notification from the late eventlet first. As the content of that
   notification does not match the expectations, the currently running
   test case fails. The late eventlet also prints a lot of errors about
   the DB being gone, making troubleshooting pretty hard.
This patch proposes a way to fix this by marking each eventlet at spawn
time with the id of the test case that directly or indirectly started
it.
Then, when the NotificationFixture gets a notification, it compares the
test case id stored in the calling eventlet with the id of the test case
that initialized the NotificationFixture. If the two ids do not match,
the fixture ignores the notification and raises an exception in the
caller eventlet to make it terminate.
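The mechanism can be illustrated with plain threads standing in for eventlets; all names here are illustrative, not the fixture's real API:

```python
import threading

_marker = threading.local()  # analogue of marking each spawned eventlet

def spawn_marked(test_id, fn, *args):
    """Spawn a worker tagged with the test case id that started it."""
    def runner():
        _marker.test_id = test_id
        fn(*args)
    t = threading.Thread(target=runner)
    t.start()
    return t

class NotificationFixture:
    def __init__(self, test_id):
        self.test_id = test_id
        self.notifications = []

    def notify(self, payload):
        # A notification from a worker leaked by another test case is
        # dropped, and the sender is terminated via the raised exception.
        if getattr(_marker, 'test_id', None) != self.test_id:
            raise RuntimeError('late notifier from another test case')
        self.notifications.append(payload)
```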
Change-Id: I012dcf63306bae624dc4f66aae6c6d96a20d4327
Closes-Bug: #1946339
At the moment, oslo.reports is enabled when running nova-api
standalone, but not when using uWSGI.
We now update the uWSGI entry point as well to include the
oslo.reports hook, which is extremely helpful when debugging
deadlocks.
Change-Id: I605f0e40417fe9b0a383cc8b3fefa1325f9690d9
Currently neutron can report ports' MAC addresses in upper case when
they were created like that, while the libvirt configuration file
always stores MACs in lower case, which leads to a KeyError while
trying to retrieve the migrate_vif.
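The fix boils down to normalizing case before using the MAC as a dictionary key; a minimal illustration with a hypothetical helper name:

```python
def get_migrate_vif(migrate_vifs_by_mac, neutron_mac):
    # libvirt stores MACs lower-cased, while neutron may report them
    # upper-cased, so normalize before the lookup to avoid a KeyError.
    return migrate_vifs_by_mac[neutron_mac.lower()]
```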
Closes-Bug: #1945646
Change-Id: Ie3129ee395427337e9abcef2f938012608f643e1
As documented in the api-ref, this is the API end users would poll to
determine the state of a volume operation, so use it in our func tests.
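In a functional test that polling is a simple loop against the volume show API; a sketch with illustrative names (get_volume stands in for whatever client call the test uses):

```python
import time

def wait_for_volume_status(get_volume, volume_id, wanted,
                           timeout=60, interval=1):
    """Poll the volume show API until it reports the wanted status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        volume = get_volume(volume_id)
        if volume['status'] == wanted:
            return volume
        time.sleep(interval)
    raise TimeoutError(
        'volume %s never reached status %s' % (volume_id, wanted))
```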
Change-Id: I88e827603bb1cd6265bf234f4286ba3b204c3ce1
When Cinder orchestrates a volume migration between backends it
initially creates a temporary volume on the destination before calling
Nova to swap to that volume. When this is complete Nova calls back to
Cinder and the temporary volume on the destination is renamed to the
original volume UUID making the migration transparent to end users.
Previously Nova would not account for this within the connection_info
stashed when connecting the new volume and would continue to point to
the UUID of the temporary volume. For most codepaths in Nova this
isn't an issue, but when dealing with multiattach volumes the libvirt
driver has a specific path, nova.virt.libvirt.LibvirtDriver.
_should_disconnect_target, that uses this stored volume_id within the
connection_info when an attempt is made to detach the volume. In this
case this would lead to a failed lookup of the volume in Cinder and an
eventual 500 returned by Nova.
This change corrects this by ensuring any volume_id stashed in the
new connection_info we gather during a swap_volume is overwritten with
the correct id returned by the eventual call to Cinder's
os-migrate_volume_completion API [1].
[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html?expanded=complete-migration-of-a-volume-detail#volumes-volumes
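The shape of the fix can be sketched as follows; the function and variable names are hypothetical, with 'save_volume_id' mirroring the field returned by the os-migrate_volume_completion call:

```python
def stash_new_connection_info(new_cinfo, migrate_completion_result):
    """Overwrite the temporary volume's id with the id Cinder reports
    once the migration completes (names here are illustrative)."""
    volume_id = migrate_completion_result['save_volume_id']
    if 'data' in new_cinfo:
        # _should_disconnect_target later reads data['volume_id'], so it
        # must point at the surviving volume, not the temporary one.
        new_cinfo['data']['volume_id'] = volume_id
    return new_cinfo
```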
Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
Closes-Bug: #1943431
Change-Id: I43612714b343d98320b19b5b38264afc700790e3