Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens,
which was backported through to Newton, introduced a regression
when listing deleted servers with a marker. The API code assumes
that if BuildRequestList.get_by_filters does not raise
MarkerNotFound then the marker was found among the build requests,
but it does not account for the get_by_filters method
short-circuiting when filtering servers on deleted/cleaned or with
limit=0. The API code then nulls out the marker, which means you
will continue to get the marker instance back in the results even
though you should not, and that can cause an infinite loop in some
client-side tooling like nova's CLI:
  nova list --deleted --limit -1
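A minimal, self-contained sketch of the faulty assumption (all names
here are simplified stand-ins, not the real Nova signatures):

```python
class MarkerNotFound(Exception):
    """Raised when a pagination marker is not in the result set."""


def get_by_filters(build_requests, marker=None, limit=None, deleted=False):
    # Build requests are never deleted, so a deleted filter (or
    # limit == 0) short-circuits to an empty list -- and, crucially,
    # the marker is never looked for, so MarkerNotFound is not raised.
    if limit == 0 or deleted:
        return []
    if marker is not None and marker not in build_requests:
        raise MarkerNotFound()
    return build_requests


def marker_after_build_requests(build_requests, marker, deleted):
    # The buggy API logic: "no MarkerNotFound" was taken to mean
    # "marker found among the build requests", so the marker was
    # nulled and the later cell DB query returned the marker
    # instance again in the results.
    try:
        get_by_filters(build_requests, marker=marker, deleted=deleted)
        return None
    except MarkerNotFound:
        return marker
```

With deleted=True the marker is silently dropped even though it was
never found, which is the case the recreate test exercises.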
This adds a functional recreate test for the regression which will
be updated when the bug is fixed.
Change-Id: I324193129acb6ac739133c7e76920762a8987a84
Related-Bug: #1849409
This reverts commit 0436a95f37.
The reverted change was meant to get us more debug details when
hitting the failure, but the results are not helpful [1], so revert
it along with the fix for the resulting regression [2].
[1] http://paste.openstack.org/show/782116/
[2] I7f9edc9a4b4930f4dce98df271888fa8082a1701
Change-Id: Iab8029f081a654278ea7dbbec79a766aea6764ae
Related-Bug: #1844929
Adds view builders for keypair index, show and create.
We already have a 'view' class for keypairs, so we can move
several pieces of the implementation into that file to make the
controller code more readable and simple.
We follow this pattern for other controllers, too.
Co-Authored-By: Takashi Natsume <natsume.takashi@lab.ntt.co.jp>
Change-Id: I2820143b7b5b6f74a6c3ca67a5c9d0980e3e9a86
Per the referenced bug, it is possible for update_available_resource to
race with a migration such that the migration record exists, but the
instance's migration context doesn't. In such cases we shouldn't try to
track the instance's assigned resources on this host (because there
aren't any yet).
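A hedged sketch of the guard (simplified, illustrative names and
fakes, not the real resource tracker code):

```python
class FakeMigrationContext:
    new_resources = ['assigned-resource-1']


class FakeInstance:
    migration_context = None  # the racing case: not created yet


def assigned_resources_for(instance, migration):
    # If the migration record exists but the instance's migration
    # context does not yet, nothing is assigned on this host, so
    # return nothing instead of blowing up dereferencing the
    # (as-yet-nonexistent) context.
    if migration is not None and instance.migration_context is None:
        return []
    return instance.migration_context.new_resources
```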
Change-Id: I69f99adfa8c91b50086052ca1b15c55e86ed614d
Closes-Bug: #1849165
Add a functional regression test for the referenced bug:
If a migration is initiated, and update_available_resource runs on the
destination between when the migration record is associated with the
destination and when the migration context is added to the instance, it
will raise a TypeError attempting to _populate_assigned_resources for
that instance, because that method attempts to access the
(as-yet-nonexistent) migration context.
Note that this doesn't fail the migration; it just leaves ugly logs. In
real life it probably also leaves other pieces of
update_available_resource unfinished on the destination.
Related-Bug: #1849165
Change-Id: I7e96cd24049c205f76a684a2e7425f85b4376f73
Ieb4ab13cf8ca5683fcd7b18ed669e8a26659bff1 removed the upper-constraints
from the install_command, which meant that only the test-requirements
were installed with the upper-constraints enforced. As a result, when
tox installed nova into the virtual env, it installed the contents of
requirements.txt without enforcing the upper-constraints. Today the
networkx 2.4 package was released to PyPI. The taskflow lib depends on
networkx but does not pin that requirement, while the openstack
upper-constraints pin networkx properly. Nova depends on taskflow, so
when nova is installed by tox without the upper-constraints, the new
networkx 2.4 is installed. This broke the nova unit tests.
This patch makes sure that all the requirements are installed with the
upper-constraints enforced.
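A sketch of the shape of the fix (the constraints URL shown is
illustrative, not necessarily the one used): apply the constraints
file to every requirements install via deps rather than relying on
install_command alone.

```ini
[testenv]
# Enforce upper-constraints on requirements.txt and
# test-requirements.txt alike (URL is illustrative).
deps =
  -chttps://releases.openstack.org/constraints/upper/master
  -r{toxinidir}/requirements.txt
  -r{toxinidir}/test-requirements.txt
```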
Change-Id: Iba797243d2a137b551223165a1af1a8676bcea02
Closes-Bug: #1848499
This adds a functional recreate test for a scenario where
reverting migration-based allocations during resize failure
in the compute service results in leaking allocations for a
deleted server.
Change-Id: Iac4dd9feebb1a405826c95cb6b046b82c61140a2
Related-Bug: #1848343
This adds a live migration functional recreate test
like Ifd156ac8789d3fc84d56d400cf1e160e2cd2fbee is
for cold migrate/resize.
Change-Id: I856db36d63779d521fe26b27ef5a12b7a4d3bd91
Related-Bug: #1848343
This adds a functional test to recreate a bug where the
instance is deleted after conductor has swapped allocations
to the migration consumer but before casting to compute. In
this case, the scheduler fails due to NoValidHost which is
entirely reasonable. The bug is that the conductor task rollback
code re-creates the allocations on the source node for the
now-deleted instance, so those allocations are leaked.
Note that we have similar exposures in the live migration
task and reverting allocations when resize fails in the
compute service. A TODO is left inline to add tests for those
separately.
Change-Id: Ifd156ac8789d3fc84d56d400cf1e160e2cd2fbee
Related-Bug: #1848343
This addresses a few feedback items from earlier in the stack.
Related to blueprint image-precache-support
Change-Id: I622a9180d7b53dd35e60e2335fe185da1d6ac019
This adds a new microversion and support for requesting image pre-caching
on an aggregate.
Related to blueprint image-precache-support
Change-Id: I4ab96095106b38737ed355fcad07e758f8b5a9b0
This change adds a new conductor sub-task which will make a
synchronous RPC call (using long_rpc_timeout) to the new method
"prep_snapshot_based_resize_at_source" on the source compute.
If the instance is not volume-backed, the sub-task will create
an image and pass the image ID to the compute method to upload
the snapshot data.
If the migration fails at this point, any snapshot image created should
be deleted. Recovering the guest on the source host should be as simple
as hard rebooting the server (which is allowed with servers in ERROR
status).
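A hedged sketch of the sub-task flow described above (the helper and
fake names are illustrative, not the real conductor code):

```python
class FakeImageAPI:
    def __init__(self):
        self.deleted = []

    def create(self, instance):
        return 'snapshot-image-id'

    def delete(self, image_id):
        self.deleted.append(image_id)


class FakeSourceComputeRPC:
    def prep_snapshot_based_resize_at_source(self, instance,
                                             snapshot_id=None):
        pass  # would be a synchronous call using long_rpc_timeout


def prep_at_source(instance, compute_rpc, image_api):
    snapshot_id = None
    if not instance['is_volume_backed']:
        # Create the image whose ID the source compute will upload
        # the guest's snapshot data into.
        snapshot_id = image_api.create(instance)
    try:
        compute_rpc.prep_snapshot_based_resize_at_source(
            instance, snapshot_id=snapshot_id)
    except Exception:
        # Clean up any snapshot image we created; the guest itself
        # is recoverable by hard rebooting the ERROR'ed server.
        if snapshot_id is not None:
            image_api.delete(snapshot_id)
        raise
    return snapshot_id
```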
Part of blueprint cross-cell-resize
Change-Id: I5bfcac018c1d1196d4efcb321213eb5a1d4c7a6b
This adds a new compute service method, prep_snapshot_based_resize_at_source,
which will be synchronously called over RPC from (super)conductor on the
source host during a cross-cell resize to:
* Power off the instance
* Upload snapshot data if the instance is not volume-backed
* Delete the old BDM volume attachment records
* Destroy the guest on the hypervisor but retain disks in the
case of a later fault or revert
* Activate the dest host port bindings
Think of this as a hybrid of how shelve_instance and resize_instance
work, with a lot of the flow matching what the migrate_disk_and_power_off
compute driver method does except for transferring disks to the dest host.
Also note that resources are not freed up from the ResourceTracker on
the source host at this point, nor are the instance.host/node values
nulled out, since we are keeping a "placeholder" on the source in case
something fails later or the user reverts the resize.
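The ordering of the source-host steps can be sketched as follows
(everything here is an illustrative stand-in, recorded through a fake
so the sequence is visible, not the real compute manager code):

```python
class CallRecorder:
    """Fake driver/network/volume APIs that record call order."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
        return record


def prep_snapshot_based_resize_at_source(api, instance, snapshot_id=None):
    api.power_off(instance)                  # 1. power off the guest
    if snapshot_id is not None:              # 2. non-volume-backed only
        api.snapshot(instance, snapshot_id)
    api.delete_volume_attachments(instance)  # 3. old BDM attachments
    # 4. destroy the guest but keep disks for a later fault or revert
    api.destroy_guest(instance, destroy_disks=False)
    api.activate_dest_port_bindings(instance)  # 5. dest port bindings
```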
Part of blueprint cross-cell-resize
Change-Id: I1887097ae38014dd19fb0ce333d7f223ad3d2130
This adds the sub-task to prep/verify the target host(s)
for the resize in the target cell. The PrepResizeAtDestTask
sub-task will make a synchronous RPC call (using
long_rpc_timeout) to method prep_snapshot_based_resize_at_dest
on the dest compute service which will claim resources on
the target host. The task also creates (inactive) port
bindings and volume attachments to be used on the target host.
If the prep task on the selected target host fails with a
MigrationPreCheckError, conductor will iterate over alternate
hosts and check them until a suitable target host is found
or we raise MaxRetriesExceeded.
The instance.migration_context is returned from the task so
it can be copied from the target DB to the source DB. This is
necessary for the API to route network-vif-plugged events later
when spawning the guest in the target cell.
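The alternate-host loop can be sketched like this (the exception names
mirror the ones above; the function shape is an assumption, not the
real task code):

```python
class MigrationPreCheckError(Exception):
    pass


class MaxRetriesExceeded(Exception):
    pass


def prep_resize_at_dest(selected_and_alternates, prep_on_host):
    # Try the selected host first, then each alternate, until one
    # accepts the resize claim or we run out of candidates.
    for host in selected_and_alternates:
        try:
            # Synchronous RPC (long_rpc_timeout) that claims resources
            # on the host and returns the instance's MigrationContext.
            return host, prep_on_host(host)
        except MigrationPreCheckError:
            continue
    raise MaxRetriesExceeded()
```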
Part of blueprint cross-cell-resize
Change-Id: I66d8f06f19c5c631e33208580428aa843abb38d2
These were added in Rocky [1] and can now be removed, since we don't
need to support anything from before Train in Ussuri.
[1] I4636a8d270ce01c1831bc951c4497ad472bc9aa8
Change-Id: Ib01ebeff0647f6e27714856f3a36c3896eeab27f
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This adds the bulk of the image pre-caching logic to the conductor
task manager. It takes an aggregate and list of image ids from the
API service and handles the process of calling to the relevant compute
nodes to initiate the image downloads, honoring the (new) config knob
for overall task parallelism.
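The fan-out with bounded parallelism can be sketched as below (the
real code is conductor-side and eventlet-based; this stand-in uses
ThreadPoolExecutor, and max_concurrency stands in for the new knob):

```python
import concurrent.futures


def precache_images(compute_hosts, image_ids, cache_on_host,
                    max_concurrency=2):
    # Call out to each compute host in the aggregate to initiate the
    # image downloads, honoring the overall parallelism limit.
    results = {}
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=max_concurrency) as pool:
        futures = {pool.submit(cache_on_host, host, image_ids): host
                   for host in compute_hosts}
        for fut in concurrent.futures.as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```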
Related to blueprint image-precache-support
Change-Id: Id7c0ab7ae0586d49d88ff2afae149e25e59a3489
In microversion 2.80, the ``GET /os-migrations`` API will have
optional ``user_id`` and ``project_id`` query parameters for
filtering migrations by user and/or project:
* GET /os-migrations?user_id=ef9d34b4-45d0-4530-871b-3fb535988394
* GET /os-migrations?project_id=011ee9f4-8f16-4c38-8633-a254d420fd54
* GET /os-migrations?user_id=ef9d34b4-45d0-4530-871b-3fb535988394&project_id=011ee9f4-8f16-4c38-8633-a254d420fd54
The ``user_id`` and ``project_id`` fields are also exposed in the following APIs:
* GET /os-migrations
* GET /servers/{server_id}/migrations
* GET /servers/{server_id}/migrations/{migration_id}
Co-Authored-By: Qiu Fossen <qiujunting>
Part of blueprint add-user-id-field-to-the-migrations-table
Change-Id: I7313d6cde1a5e1dc7dd6f3c0dff9f30bbf4bee2c
This adds a new method to the compute service which will
be synchronously RPC called from (super)conductor when
preparing for a cross-cell resize. It will perform an
RT.resize_claim() which will claim things like PCI devices
and/or NUMA topology resources which are not otherwise "claimed"
in the placement service during scheduling. The MigrationContext
is created in the target cell DB as part of this claim.
Notifications, fault and instance action event creation should
be consistent with the same-cell "prep_resize" method. One
difference is the reverts_task_state decorator is not used here
since conductor is responsible for trying alternative hosts and
it does not make sense for this method to reset the instance
task_state to None on failure if conductor is going to try
another host. The existing prep_resize method is not reused here
since, for cross-cell resize, conductor orchestrates the call to the
source compute and handles reschedules, which are things prep_resize
does itself for same-cell resize. We could munge the existing method,
but it is cleaner to keep them separate.
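The core of the dest-host method can be sketched as follows (fakes
and names are illustrative; only resize_claim and MigrationContext
come from the description above):

```python
class FakeClaim:
    migration_context = 'migration-context-for-target-cell-db'


class FakeResourceTracker:
    def resize_claim(self, instance, migration):
        # Claims PCI devices / NUMA topology resources and creates
        # the MigrationContext in the target cell DB.
        return FakeClaim()


def prep_snapshot_based_resize_at_dest(resource_tracker, instance,
                                       migration):
    # No reverts_task_state wrapper here: on failure the exception
    # propagates so conductor can move on to an alternate host.
    claim = resource_tracker.resize_claim(instance, migration)
    # Conductor copies this context from the target cell DB back to
    # the source cell DB so the API can route network-vif-plugged
    # events later.
    return claim.migration_context
```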
Part of blueprint cross-cell-resize
Change-Id: I518ae675b7a67da64a5796e57e87860f0c3ef0db