Commit Graph

59272 Commits

Author SHA1 Message Date
zhangbailin f024490e95 [Trivial] Fix wrong microversion in TestClass name
Change-Id: I0dce3963c7726cf126ad2c4b3e25f005fdd12790
2021-11-05 10:10:50 +08:00
Zuul ee91d92f91 Merge "Clean up allocations left by evacuation when deleting service" 2021-11-04 14:20:28 +00:00
Zuul e2b1581d8c Merge "fup: Refactor and simplify Cinder fixture GET volume mock" 2021-11-04 13:25:04 +00:00
Zuul 658d544611 Merge "fup: Move _wait_for_volume_{attach,detach} to os-volume_attachments" 2021-11-04 13:24:55 +00:00
Zuul 858b700ba5 Merge "compute: Update volume_id within connection_info during swap_volume" 2021-11-04 13:24:47 +00:00
Zuul 909cfc7636 Merge "api: enable oslo.reports when using uWSGI" 2021-11-03 21:57:17 +00:00
Zuul d293b8ab21 Merge "Prevent leaked eventlets to send notifications" 2021-11-03 20:49:13 +00:00
Zuul e463f43bb0 Merge "db: Remove unused build_requests columns" 2021-11-03 19:52:36 +00:00
Zuul f4e31bc0a0 Merge "Reno for qos-minimum-guaranteed-packet-rate" 2021-11-03 19:51:53 +00:00
Zuul ff4b396abf Merge "Avoid unbound instance_uuid var during delete" 2021-11-03 19:51:32 +00:00
Zuul 9c2d9fd1eb Merge "db: De-duplicate list of removed table columns" 2021-11-03 09:36:48 +00:00
Zuul 2030dd63cd Merge "db: Enable auto-generation of API DB migrations" 2021-11-03 09:34:58 +00:00
Zuul 0088f6aa0f Merge "Revert "Temp disable nova-manage placement heal_allocation testing"" 2021-11-02 12:42:48 +00:00
Zuul 4bed410ce6 Merge "Query ports with admin client to get resource_request" 2021-11-02 12:42:39 +00:00
Zuul 438e0983aa Merge "Fix unit test for oslo.concurrency 4.5" 2021-11-02 12:42:18 +00:00
Balazs Gibizer 9c2cb1fd4f Reno for qos-minimum-guaranteed-packet-rate
As the final patch of the series, this adds release notes for the
complete feature.

Change-Id: I655f5144cbfa834ee089c474c5caa3cf8140354f
Implements: qos-minimum-guaranteed-packet-rate
2021-11-02 10:23:58 +00:00
Balazs Gibizer 5725297e12 Revert "Temp disable nova-manage placement heal_allocation testing"
This reverts commit 45e71fb9ce.

Reason for revert: https://review.opendev.org/c/openstack/nova/+/802060 has landed now, so the heal allocation tests should work again.

Change-Id: I829e7466cec3f768ad13dcf793315fb029126a2e
2021-11-02 07:05:52 +00:00
Zuul 82be4652e2 Merge "[nova-manage] Support extended resource request" 2021-11-01 22:58:03 +00:00
Balazs Gibizer 45e71fb9ce Temp disable nova-manage placement heal_allocation testing
Since I99a49b107b1872ddf83d1d8497a26a8d728feb07 the nova-next job fails
because I missed a dependency between that neutron patch and
https://review.opendev.org/c/openstack/nova/+/802060 . So this patch
disables testing until the nova adaptation lands.

Change-Id: Ic28ef83f5193e6c1fbac1577ef58fe0d9e45694d
2021-11-01 11:26:55 +00:00
Balazs Gibizer 90ed7e574d [nova-manage] Support extended resource request
The nova-manage placement heal_allocations CLI is capable of healing
missing placement allocations due to port resource requests. To support
the new extended port resource request this code needs to be adapted
too.

When the heal_allocations command got the port resource request
support in Train, the only way to figure out the missing allocations was
to dig into the placement RP tree directly. Since then nova has gained
support for interface attach with such ports and, to support that,
placement has gained support for in_tree filtering in allocation
candidate queries. So the healing logic can now be generalized to the
following:

For a given instance:
1) Find the ports that have a resource request but no allocation key in
   the binding profile. These are the ports we need to heal.
2) Gather the RequestGroups from these ports and run an
   allocation_candidates query restricted to the current compute of the
   instance with in_tree filtering.
3) Extend the existing instance allocation with a returned allocation
   candidate and update the instance allocation in placement.
4) Update the binding profile of these ports in neutron.

The main change compared to the existing implementation is in step 2);
the rest is mostly the same.

Note that support for the old resource request format is kept alongside
the new resource request format until Neutron makes the new format
mandatory.
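
As a sketch, the generalized flow looks roughly like this; get_ports,
request_group_from_port, query_allocation_candidates and the two update
helpers are hypothetical stand-ins for the real nova/placement/neutron
calls:

    def heal_port_allocations(instance):
        # 1) ports with a resource request but no allocation key in
        #    their binding profile are the ones that need healing
        ports_to_heal = [
            p for p in get_ports(instance)
            if p.get('resource_request')
            and 'allocation' not in p.get('binding:profile', {})
        ]
        if not ports_to_heal:
            return

        # 2) allocation_candidates query restricted to the instance's
        #    current compute tree via in_tree filtering
        groups = [request_group_from_port(p) for p in ports_to_heal]
        candidate = query_allocation_candidates(
            groups, in_tree=instance.compute_node_rp_uuid)

        # 3) extend the existing allocation and PUT it back to placement
        update_instance_allocation(instance, candidate)

        # 4) record the chosen resource providers in each port's
        #    binding profile in neutron
        for port in ports_to_heal:
            update_binding_profile_allocation(port, candidate)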

blueprint: qos-minimum-guaranteed-packet-rate

Change-Id: I58869d2a5a4ed988fc786a6f1824be441dd48484
2021-11-01 09:20:30 +01:00
Balazs Gibizer be9b022bfc Fix unit test for oslo.concurrency 4.5
The signature of the lock() function gained a new kwarg in
oslo.concurrency 4.5, which makes some nova unit tests fail. This patch
changes the unit tests to mock at a higher level of the locking infra,
avoiding any dependence on the exact signature.
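
A generic, self-contained illustration of the approach (not the actual
nova test): fake the lock at a seam the code under test owns, so the
assertion does not encode the third-party signature.

    from unittest import mock

    # Stand-in for code under test: a thin wrapper owns the lock call.
    def do_locked_work(lock_fn, name):
        with lock_fn(name):
            return 'done'

    # Faking at the wrapper level keeps the test passing even when the
    # underlying library's lock() signature gains a new kwarg.
    fake_lock = mock.MagicMock()
    assert do_locked_work(fake_lock, 'instance-lock') == 'done'
    fake_lock.assert_called_once_with('instance-lock')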

This will enable
https://review.opendev.org/c/openstack/requirements/+/814889/ to land.

Change-Id: Icecb75d9a374b29f2cf70d3aa155dc6c92bf715f
2021-10-29 09:37:18 +02:00
Federico Ressi 171138146a Check Nova project changes with Tobiko scenario test cases
Change-Id: I30fd6563292520865545a10ae9af32c765b314da
2021-10-28 07:58:12 +00:00
Zuul c79d6366d1 Merge "Fix instance's image_ref lost on failed unshelving" 2021-10-27 18:29:57 +00:00
Zuul 2f644a82fe Merge "Ensure MAC addresses characters are in the same case" 2021-10-27 18:03:19 +00:00
Takashi Kajinami e5a34fffdf Clean up allocations left by evacuation when deleting service
When a compute node goes down and all instances on it are evacuated,
allocation records about these instances are still left on the source
compute node until the nova-compute service is started again on that
node. However, if a compute node is completely broken, it is not
possible to start the service again.
In this situation, deleting the nova-compute service for the compute
node doesn't delete its resource provider record, and even if a user
tries to delete the resource provider, the delete request is rejected
because allocations are still left on that node.

This change ensures that remaining allocations left by successful
evacuations are cleared when deleting a nova-compute service, to avoid
leaving any resource provider record behind even if the compute node
can't be recovered. Migration records are kept in 'done' status to
trigger clean-up tasks in case the compute node is recovered later.
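
A hedged sketch of that flow; the helper names below are illustrative
rather than nova's actual internals:

    def delete_compute_service(context, service):
        # find evacuations that moved instances off this (dead) host
        migrations = get_migrations(context,
                                    source_compute=service.host,
                                    migration_type='evacuation',
                                    status='done')
        for migration in migrations:
            # drop the allocation the evacuated instance still holds
            # against the dead node's resource provider
            delete_allocation_for_instance(context,
                                           migration.instance_uuid,
                                           service.compute_node_rp_uuid)
        # the provider can now be deleted since no allocations remain;
        # the migration records themselves are intentionally kept
        service.destroy()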

Closes-Bug: #1829479
Change-Id: I3ce6f6275bfe09d43718c3a491b3991a804027bd
2021-10-26 19:26:19 +09:00
Zuul 00452a403b Merge "Reproducer unit test for bug 1934094" 2021-10-25 16:54:36 +00:00
Balazs Gibizer 14e43f385e Avoid unbound instance_uuid var during delete
The patch I03cf285ad83e09d88cdb702a88dfed53c01610f8 fixed most of the
possible cases for this to happen but missed one. An early enough
exception during _delete() can leave instance_uuid undefined even
though the finally block then tries to use it. This patch moves the
saving of instance_uuid to the top of the try block to avoid the issue.
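
A self-contained sketch of the pattern (names illustrative):

    class FakeInstance:
        uuid = '6f6c9085-6e8c-4f2d-9c41-0c2e3cbd4a6a'

    def _delete(instance):
        try:
            instance_uuid = instance.uuid  # saved at the top of the try
            raise RuntimeError('simulated early failure during delete')
        finally:
            # Before the fix, an exception raised prior to the
            # assignment made this reference fail with UnboundLocalError.
            print('delete attempted for instance %s' % instance_uuid)

    try:
        _delete(FakeInstance())
    except RuntimeError:
        pass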

Change-Id: Ib3073d7f595c8927532b7c49fc7e5ffe80d508b9
Closes-Bug: #1940812
Related-Bug: #1914777
2021-10-20 09:48:07 +00:00
Balazs Gibizer 49b481ec98 Query ports with admin client to get resource_request
The port.resource_request field is admin only. Nova depends on the
value of this field to do proper scheduling and resource allocation
and deallocation for ports with resource requests, as well as to update
the port.binding:profile.allocation field with the resource providers
the requested resources are fulfilled from. However, in some cases nova
does not use a neutron admin client / elevated context to read the
port. In those cases neutron returns None for the port.resource_request
field and nova thinks that the port has no resource request.

This patch fixes all three places where previous testing showed that
context elevation was missing.
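
A hedged sketch of the pattern applied at those call sites (the wrapper
API shape is simplified for illustration):

    def get_port_resource_request(context, neutron, port_id):
        # an elevated (admin) context makes neutron include the
        # admin-only resource_request field instead of returning None
        port = neutron.show_port(context.elevated(), port_id)['port']
        return port.get('resource_request')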

Change-Id: Icb35e20179572fb713a397b4605312cf3294b41b
Closes-Bug: #1945310
2021-10-20 11:39:23 +02:00
Pierre-Samuel Le Stang 518b952bde Fix instance's image_ref lost on failed unshelving
Closes-Bug: #1934094

Signed-off-by: Pierre-Samuel Le Stang <pierre-samuel.le-stang@corp.ovh.com>
Change-Id: Id908a7224ff3378b3b51726bbfa8b5805d38ca59
2021-10-20 09:12:02 +02:00
Pierre-Samuel Le Stang 78e10f5f14 Reproducer unit test for bug 1934094
Related-Bug: #1934094

Signed-off-by: Pierre-Samuel Le Stang <pierre-samuel.le-stang@corp.ovh.com>
Change-Id: I8f7f26e10519c87e06de47bd4d9845151f889129
2021-10-19 16:34:52 +02:00
Stephen Finucane fb083138eb db: Increase timeout for migration tests
We're seeing quite a few timeout failures on the following tests in
'nova.tests.unit.db.main.test_migrations':

- TestModelsLegacySyncMySQL.test_models_sync
- TestMigrationsWalkMySQL.test_walk_versions
- TestModelsSyncMySQL.test_innodb_tables
- TestModelsSyncMySQL.test_models_sync

Evidently MySQL is particularly affected here. Test run times are slow
even on a relatively powerful machine like my localhost (Lenovo T460s w/
Intel Core i7-6600U CPU + 20G RAM), and the CI machines are only making
matters worse. Local experiments with alternative MySQL libraries, such
as 'mysqlclient', did not yield any improvements in performance so we must
simply live with this for now. Do so by setting 'TIMEOUT_SCALING_FACTOR'
for these tests to 4, meaning these tests will now get a whopping 640
seconds (or over 10 minutes) to execute (we set OS_TEST_TIMEOUT to 160
in 'tox.ini'). We set this for both main and API DB migrations, even
though only the former is currently exhibiting issues, to head off
future problems. An alternative to this would be to override the timeout
on a test-by-test basis, as Cinder has done [1], but that seems more
complicated for no good reason. Yet another alternative would be to
reintroduce the serialization of these tests first introduced in change
I6ce930fa86c82da1008089791942b1fff7d04c18, but that is left until later
in the hopes that simply increasing the timeout will resolve the issues.

[1] https://github.com/openstack/cinder/blob/19.0.0/cinder/tests/unit/db/test_migrations.py
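
As a sketch, the change amounts to a class attribute on each affected
test class (the base class name here is assumed); the base test fixture
multiplies OS_TEST_TIMEOUT by this factor:

    class TestMigrationsWalkMySQL(BaseMigrationTestCase):
        # 4 * OS_TEST_TIMEOUT (160s in tox.ini) = 640 seconds per test
        TIMEOUT_SCALING_FACTOR = 4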

Change-Id: I82b9a064d77251945ff1ae99d7049f367ddde92e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-10-19 11:08:41 +01:00
Stephen Finucane 9657297dd6 db: Remove unused build_requests columns
These fields were never used in the API database. They can be removed
now, some years after originally intended.
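
A hedged sketch of what such a removal looks like as an alembic
migration (the column names here are illustrative, not the exact set
removed by this patch):

    from alembic import op

    def upgrade():
        # batch_alter_table keeps the migration portable to SQLite too
        with op.batch_alter_table('build_requests') as batch_op:
            batch_op.drop_column('request_spec_id')
            batch_op.drop_column('user_id')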

Change-Id: I781875022d37d2c0626347f42c87707a29a9ab21
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-10-18 20:26:18 +01:00
Stephen Finucane a4d7f70740 db: De-duplicate list of removed table columns
We have a policy of removing fields from SQLAlchemy models at least one
cycle before we remove the underlying database columns. This can result
in a discrepancy between the state that our newfangled database
migration tool, alembic, sees and what's actually in the database. We
were ignoring these removed fields (and one foreign key constraint) in
two different locations for both databases: as part of the alembic
configuration and as part of the tests we use to ensure our migrations
are in sync with our models (note: the tests actually use the alembic
mechanism to detect the changes [1]). De-duplicate these.

[1] https://github.com/sqlalchemy/alembic/issues/724#issuecomment-672081357
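
A hedged sketch of the de-duplication: a single shared list consulted
from alembic's include_object hook (table and column names are
illustrative):

    # shared between the alembic env and the model-sync tests
    REMOVED_COLUMNS = {
        ('build_requests', 'request_spec_id'),
        ('build_requests', 'user_id'),
    }

    def include_object(object_, name, type_, reflected, compare_to):
        # a column already dropped from the models but still present in
        # the database is not a real discrepancy; skip it
        if (type_ == 'column'
                and (object_.table.name, name) in REMOVED_COLUMNS):
            return False
        return True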

Change-Id: I978b4e44cf7f522a70cc5ca76e6d6f1a985d5469
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-10-18 20:26:18 +01:00
Stephen Finucane 60b977b76d db: Enable auto-generation of API DB migrations
Change I18846a5c7557db45bb63b97c7e8be5c4367e4547 enabled auto-generation
of migrations for the main database. Let's now extend this to the API
database using the same formula. While we're here, we also enable
"batch" migrations for SQLite [1] by default, which allow us to work
around SQLite's inability to support the ALTER statement for all but a
limited set of cases. As noted in the documentation [2], this will have
no impact on other backends where "we'd see the usual 'ALTER' statements
done as though there were no batch directive".

[1] https://stackoverflow.com/a/31140916/613428
[2] https://alembic.sqlalchemy.org/en/latest/batch.html
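
The batch mode boils down to a single flag in the alembic environment;
a minimal sketch (the function shape is assumed):

    from alembic import context

    def run_migrations_online(connectable, target_metadata):
        with connectable.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=target_metadata,
                # emit SQLite's copy-and-move workaround; backends that
                # support ALTER natively are unaffected
                render_as_batch=True,
            )
            with context.begin_transaction():
                context.run_migrations()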

Change-Id: I51c3a53286a0eced4bf57ad4fc13ac5f3616f7eb
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-10-18 20:26:18 +01:00
Zuul e14eef0719 Merge "Fix the wrong exception used to retry detach API calls" 2021-10-18 17:16:44 +00:00
Zuul a54ec71d23 Merge "db: Add migration to resolve shadow table discrepancies" 2021-10-18 16:21:43 +00:00
Balazs Gibizer 61fc81a676 Prevent leaked eventlets to send notifications
In our functional tests we run nova services as eventlets. Those
services can also spawn their own eventlets for RPC or other parallel
processing. The test case executor only sees and tracks the main
eventlet where the code of the test case is running. When that
finishes, the executor considers the test case finished regardless of
the other spawned eventlets. This can lead to leaked eventlets still
running in parallel with later test cases.

One way this can cause trouble is via the global variables in the
nova.rpc module. Those globals are re-initialized for each test case, so
they do not directly leak information between test cases. However, if
a late eventlet calls nova.rpc.get_versioned_notifier() it will get a
totally usable FakeVersionedNotifier object regardless of which test
case the notifier belongs to or which test case the eventlet belongs
to. This way the late eventlet can send a notification to the currently
running test case and make it fail.

The current case we saw is the following:

1) The test case
  nova.tests.functional.test_servers.ServersTestV219.test_description_errors
  creates a server but doesn't wait for it to reach a terminal state
  (ACTIVE / ERROR). This test case finishes quickly but leaks running
  eventlets in the background waiting for some RPC call to return.
2) As the test case finishes, the cleanup code deletes the test case
   specific setup, including the DB.
3) The test executor moves forward and starts running another test case.
4) 60 seconds later the leaked eventlet times out waiting for the RPC
   call to return and tries doing things, but fails as the DB is already
   gone. It then tries to report this as an error notification: it calls
   nova.rpc.get_versioned_notifier(), gets a fresh notifier that is
   connected to the currently running test case, and emits the error
   notification there.
5) The currently running test case also waits for an error notification
   to be triggered by the currently running test code, but it gets the
   notification from the late eventlet first. As the content of that
   notification does not match the expectations, the currently running
   test case fails, while the late eventlet prints a lot of errors about
   the DB being gone, making troubleshooting pretty hard.

This patch proposes a way to fix this by marking each eventlet at spawn
time with the id of the test case that directly or indirectly started
it.

Then, when the NotificationFixture gets a notification, it compares the
test case id stored in the calling eventlet with the id of the test case
that initialized the NotificationFixture. If the two ids do not match,
the fixture ignores the notification and raises an exception in the
calling eventlet to make it terminate.
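
A hedged sketch of the mechanism (names illustrative): stamp each
spawned eventlet with the owning test case id and filter on it in the
fixture.

    import eventlet

    def spawn_for_test(test_case_id, func, *args, **kwargs):
        def _run():
            # mark the eventlet with the test case that started it
            eventlet.getcurrent().test_case_id = test_case_id
            return func(*args, **kwargs)
        return eventlet.spawn(_run)

    class NotificationFixture:
        def __init__(self, test_case_id):
            self.test_case_id = test_case_id
            self.notifications = []
            # the test's own main eventlet is tagged as well
            eventlet.getcurrent().test_case_id = test_case_id

        def notify(self, payload):
            sender = getattr(eventlet.getcurrent(), 'test_case_id', None)
            if sender != self.test_case_id:
                # a leaked eventlet from an earlier test case: drop the
                # notification and terminate the caller
                raise RuntimeError('notification from leaked eventlet '
                                   'of test case %s' % sender)
            self.notifications.append(payload)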

Change-Id: I012dcf63306bae624dc4f66aae6c6d96a20d4327
Closes-Bug: #1946339
2021-10-14 18:27:30 +02:00
Lucian Petrut 46401ef666 api: enable oslo.reports when using uWSGI
At the moment, oslo.reports is enabled when running nova-api
standalone, but not when using uWSGI.

We're now updating the uwsgi entry point as well to include the
oslo.reports hook, which is extremely helpful when debugging
deadlocks.
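
A minimal sketch of wiring the hook into a WSGI entry point (the
surrounding imports and app builder are assumed, mirroring nova's wsgi
bootstrap):

    from oslo_reports import guru_meditation_report as gmr
    from oslo_reports import opts as gmr_opts

    from nova import config  # assumed
    from nova import version

    def init_application():
        gmr_opts.set_defaults(config.CONF)
        # register the triggers that dump a Guru Meditation Report,
        # including green thread stacks -- invaluable for deadlocks
        gmr.TextGuruMeditation.setup_autorun(version, conf=config.CONF)
        return build_wsgi_app()  # hypothetical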

Change-Id: I605f0e40417fe9b0a383cc8b3fefa1325f9690d9
2021-10-14 09:23:08 +03:00
Ghanshyam Mann 7b063e4d05 Define a new functional test tox env for the placement gate
We have a placement-nova-tox-functional-py38 job defined and run on the
placement gate [1] to run the nova functional tests, excluding the api
and notification _sample_tests and the db-related tests. That job
currently skips those tests via tox_extra_args, which is not the right
way to do it, as we are now facing errors when tox_extra_args is
included in the tox siblings task:

- https://opendev.org/zuul/zuul-jobs/commit/c02c28a982da8d5a9e7b4ca38d30967f6cd1531d
- https://zuul.openstack.org/build/a8c186b2c7124856ae32477f10e2b9a4

Let's define a new tox env which excludes the required tests in the
stestr command itself.

[1] https://opendev.org/openstack/placement/src/commit/bd5b19c00e1ab293fc157f4699bc4f4719731c25/.zuul.yaml#L83

Change-Id: I20d6339a5203aed058f432f68e2ec1af57030401
2021-10-12 18:20:35 -05:00
Dmitriy Rabotyagov 6a15169ed9 Ensure MAC addresses characters are in the same case
Currently neutron can report ports as having MAC addresses in upper
case when they are created like that, while the libvirt configuration
file always stores MACs in lower case. This leads to a KeyError when
trying to retrieve migrate_vif.
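
The fix boils down to normalizing case before the lookup; a tiny
self-contained sketch:

    def normalize_mac(mac: str) -> str:
        # libvirt's domain XML stores MACs lower-cased, so compare in
        # lower case no matter how neutron reports the port
        return mac.lower()

    vifs_by_mac = {normalize_mac('FA:16:3E:AA:BB:CC'): {'port_id': 'p1'}}
    assert normalize_mac('fa:16:3e:AA:bb:cc') in vifs_by_mac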

Closes-Bug: #1945646
Change-Id: Ie3129ee395427337e9abcef2f938012608f643e1
2021-10-11 07:52:05 +00:00
Zuul fdfdba2658 Merge "Update min supported service version for Yoga" 2021-10-08 02:36:41 +00:00
Zuul a8d3ab2513 Merge "Reproduce bug 1945310" 2021-10-08 01:25:27 +00:00
Zuul 8f3ee03296 Merge "tests: Silence noise from database tests" 2021-10-08 00:31:52 +00:00
Zuul 210cd8426e Merge "tests: Address some nits with database migration series" 2021-10-08 00:31:22 +00:00
Zuul 5934ae3b53 Merge "tests: Walk database migrations in correct order" 2021-10-08 00:30:55 +00:00
Lee Yarwood cf4e516f55 zuul: Move live migration jobs back to voting
With the resolution of bug #1945983 within devstack we can now move our
live migration jobs back to voting.

Related-Bug: #1945983
Closes-Bug: #1912310
Depends-On: https://review.opendev.org/c/openstack/devstack/+/812391
Depends-On: https://review.opendev.org/c/openstack/devstack/+/812925
Change-Id: I25177554802579952510c73985287fd76681012c
2021-10-07 11:21:51 +01:00
Lee Yarwood 22e9d22369 fup: Refactor and simplify Cinder fixture GET volume mock
Change-Id: I7ae80e204e927c2185260c6962d063cce799a7b8
2021-10-07 10:45:43 +01:00
Lee Yarwood 9a1cac7110 fup: Move _wait_for_volume_{attach,detach} to os-volume_attachments
As documented in the api-ref, this is the API end users would poll to
determine the state of a volume operation, so use it in our func tests.

Change-Id: I88e827603bb1cd6265bf234f4286ba3b204c3ce1
2021-10-07 10:45:43 +01:00
maaoyu 6fd071b904 compute: Update volume_id within connection_info during swap_volume
When Cinder orchestrates a volume migration between backends it
initially creates a temporary volume on the destination before calling
Nova to swap to that volume. When this is complete Nova calls back to
Cinder and the temporary volume on the destination is renamed to the
original volume UUID making the migration transparent to end users.

Previously Nova would not account for this within the connection_info
stashed when connecting the new volume, which would continue to point
to the original UUID of the temporary volume. For most codepaths in
Nova this isn't an issue, but when dealing with multiattach volumes the
libvirt driver has a specific path,
nova.virt.libvirt.LibvirtDriver._should_disconnect_target, that uses
this stored volume_id within the connection_info when an attempt is
made to detach the volume. In this case this would lead to a failed
lookup of the volume in Cinder and an eventual 500 returned by Nova.

This change corrects this by ensuring any volume_id stashed in the
new connection_info we gather during a swap_volume is overwritten with
the correct id returned by the eventual call to Cinder's
os-migrate_volume_completion API [1].

[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html?expanded=complete-migration-of-a-volume-detail#volumes-volumes
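
A hedged sketch of the correction (the connection_info structure is
simplified for illustration):

    def stash_migrated_volume_id(new_connection_info, migrated_volume_id):
        # overwrite the temporary volume's id with the id Cinder returns
        # from os-migrate_volume_completion, so later (multiattach)
        # detaches look up the surviving volume
        new_connection_info['serial'] = migrated_volume_id
        data = new_connection_info.setdefault('data', {})
        if 'volume_id' in data:
            data['volume_id'] = migrated_volume_id
        return new_connection_info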

Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
Closes-Bug: #1943431
Change-Id: I43612714b343d98320b19b5b38264afc700790e3
2021-10-07 10:45:43 +01:00
Zuul 66574018b5 Merge "nova-manage: Ensure mountpoint is passed when updating attachment" 2021-10-05 15:03:49 +00:00