55692 Commits

Author SHA1 Message Date
Zuul 6cb4194902 Merge "Add functional regression test for bug 1849409" 2019-10-23 18:29:50 +00:00
Zuul fd470598dc Merge "Adds view builders for keypairs controller" 2019-10-23 17:29:03 +00:00
Matt Riedemann 45c2752f2c Add functional regression test for bug 1849409
Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens,
which was backported through to Newton, introduced a regression
when listing deleted servers with a marker. It assumes that if
BuildRequestList.get_by_filters does not raise MarkerNotFound, the
marker was found among the build requests, and it does not account
for get_by_filters short-circuiting when filtering servers with
deleted/cleaned/limit=0. The API code
then nulls out the marker which means you'll continue to get the
marker instance back in the results even though you shouldn't,
and that can cause an infinite loop in some client-side tooling like
nova's CLI:

  nova list --deleted --limit -1

This adds a functional recreate test for the regression which will
be updated when the bug is fixed.

Change-Id: I324193129acb6ac739133c7e76920762a8987a84
Related-Bug: #1849409
2019-10-23 09:41:54 -04:00
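
The client-side loop described in commit 45c2752f2c can be illustrated with a small sketch. This is a toy model, not nova or novaclient code: `list_all`, `good_fetch`, `buggy_fetch` and the `SERVERS` list are hypothetical names, and the `max_pages` guard exists only so the example terminates.

```python
SERVERS = ["a", "b", "c"]  # hypothetical deleted servers, in listing order


def list_all(fetch_page, page_size=2, max_pages=100):
    """Follow markers until an empty page comes back (like --limit -1)."""
    results, marker = [], None
    for _ in range(max_pages):
        page = fetch_page(marker, page_size)
        if not page:
            return results
        results.extend(page)
        marker = page[-1]  # the last item becomes the next marker
    raise RuntimeError("pagination did not terminate")


def good_fetch(marker, limit):
    # Expected behaviour: results start strictly after the marker.
    start = SERVERS.index(marker) + 1 if marker else 0
    return SERVERS[start:start + limit]


def buggy_fetch(marker, limit):
    # Toy version of the regression: the marker has been nulled out,
    # so every request starts from the beginning again.
    return SERVERS[:limit]
```

With `good_fetch` the loop ends once the marker reaches the last server. With `buggy_fetch` every page is non-empty, so an unguarded client would loop forever.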
Zuul 2718de6ed7 Merge "Revert "Log CellTimeout traceback in scatter_gather_cells"" 2019-10-23 08:27:52 +00:00
Zuul 5351e5d1d8 Merge "Revert "vif: Resolve a TODO and update another"" 2019-10-23 08:08:50 +00:00
Zuul 98b521034b Merge "Remove compute compat checks for aborting queued live migrations" 2019-10-23 08:08:36 +00:00
Matt Riedemann 9377d00ccf Revert "Log CellTimeout traceback in scatter_gather_cells"
This reverts commit 0436a95f37.

This was meant to get us more debug details when hitting the
failure but the results are not helpful [1] so revert this
and the fix for the resulting regression [2].

[1] http://paste.openstack.org/show/782116/
[2] I7f9edc9a4b4930f4dce98df271888fa8082a1701

Change-Id: Iab8029f081a654278ea7dbbec79a766aea6764ae
Related-Bug: #1844929
2019-10-22 17:12:28 -04:00
Pavel Kholkin f3ae221f60 Adds view builders for keypairs controller
Adds view builders for keypair index, show and create.

We already have a 'view' class for keypairs, so we can move
several pieces of the implementation into that file to make the
controller code simpler and more readable.
We use this pattern for other controllers, too.

Co-Authored-By: Takashi Natsume <natsume.takashi@lab.ntt.co.jp>
Change-Id: I2820143b7b5b6f74a6c3ca67a5c9d0980e3e9a86
2019-10-22 18:02:45 +00:00
Lenny Verkhovsky 23586abc61 Revert "vif: Resolve a TODO and update another"
This reverts commit 3f56e44b84.
Closes-Bug: #1839920

MacVtap CI[1] started to fail after merging commit[2]
[1] https://wiki.openstack.org/wiki/ThirdPartySystems/Mellanox_CI
[2] https://review.opendev.org/#/c/666631/

Related-Bug: #1841067
Change-Id: Ieb901b802f7e3f08e0b5a2443bd8d3f1783260eb
2019-10-22 11:15:28 +03:00
Eric Fried 80385a22ee Don't populate resources for not-yet-migrated inst
Per the referenced bug, it is possible for update_available_resource to
race with a migration such that the migration record exists, but the
instance's migration context doesn't. In such cases we shouldn't try to
track the instance's assigned resources on this host (because there
aren't any yet).

Change-Id: I69f99adfa8c91b50086052ca1b15c55e86ed614d
Closes-Bug: #1849165
2019-10-21 15:13:21 -05:00
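
The guard described in commit 80385a22ee might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual ResourceTracker code: the `Instance` class, the dict-shaped migration records and the `new_resources` key are all hypothetical.

```python
class Instance:
    """Hypothetical stand-in for an instance record."""

    def __init__(self, uuid, migration_context=None):
        self.uuid = uuid
        self.migration_context = migration_context


def populate_assigned_resources(instances, migrations):
    """Collect resources for migrating instances, skipping early ones."""
    by_uuid = {inst.uuid: inst for inst in instances}
    assigned = {}
    for migration in migrations:
        inst = by_uuid[migration["instance_uuid"]]
        if inst.migration_context is None:
            # The migration record exists but the context has not been
            # set yet, so there is nothing to track on this host. In this
            # toy model, subscripting None here would raise TypeError.
            continue
        assigned[inst.uuid] = inst.migration_context["new_resources"]
    return assigned
```

An instance whose migration context is still `None` is simply left out of the result instead of causing an exception.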
Eric Fried 761be5d0cb Func: bug 1849165: mig race with _populate_assigned_resources
Add a functional regression test for the referenced bug:

If a migration is initiated, and update_available_resource runs on the
destination between when the migration record is associated with the
destination and when the migration context is added to the instance, it
will raise a TypeError attempting to _populate_assigned_resources for
that instance, because that method attempts to access the
(as-yet-nonexistent) migration context.

Note that this doesn't fail the migration; it just leaves ugly logs. In
real life it probably also leaves other pieces of
update_available_resource unfinished on the destination.

Related-Bug: #1849165
Change-Id: I7e96cd24049c205f76a684a2e7425f85b4376f73
2019-10-21 15:13:17 -05:00
Zuul 964d7dc879 Merge "Add PrepResizeAtSourceTask" 2019-10-21 11:31:32 +00:00
Zuul be99b2cbc1 Merge "Add prep_snapshot_based_resize_at_source compute method" 2019-10-21 11:19:03 +00:00
Zuul f9b1332c21 Merge "Add PrepResizeAtDestTask" 2019-10-21 11:18:57 +00:00
Zuul e822b2fb1e Merge "Remove Stein compute compat checks for volume type support" 2019-10-21 00:08:09 +00:00
Zuul e2edf7b2d0 Merge "Remove dead reserve_volume compat code in _validate_bdm" 2019-10-20 00:38:25 +00:00
Zuul c323178536 Merge "Add support for cloud-init on LXC instances" 2019-10-20 00:12:22 +00:00
Zuul 3b5a79d8b9 Merge "Add support for 'initenv' elements" 2019-10-18 10:41:41 +00:00
Zuul 578b6ea58c Merge "Fix up some feedback on image precache support" 2019-10-18 05:06:30 +00:00
Zuul e9ac4c5a30 Merge "Add image caching API for aggregates" 2019-10-18 04:41:36 +00:00
Zuul 0304970b7b Merge "Add compute side revert allocation test for bug 1848343" 2019-10-17 19:17:19 +00:00
Zuul d4fcf1d942 Merge "Add live migration recreate test for bug 1848343" 2019-10-17 19:14:12 +00:00
Zuul 64fb3f8d51 Merge "Add functional recreate test for bug 1848343" 2019-10-17 19:13:53 +00:00
Balazs Gibizer be09b73796 Make sure tox installs requirements.txt with upper-constraints
Ieb4ab13cf8ca5683fcd7b18ed669e8a26659bff1 removed the upper-constraints
from the install_command, so only the test-requirements were installed
with the upper-constraints enforced. As a result, when tox installed
nova in the virtual env, it installed the contents of requirements.txt
without enforcing the upper-constraints. Today the networkx 2.4 package
was released to PyPI. The taskflow lib depends on networkx but does not
pin the requirement, whereas the openstack upper-constraints pin
networkx properly. Nova depends on taskflow, so when nova is installed
by tox without the upper-constraints, the new networkx 2.4 is installed.
This broke the nova unit tests.

This patch makes sure that all the requirements are installed with the
upper-constraints enforced.

Change-Id: Iba797243d2a137b551223165a1af1a8676bcea02
Closes-Bug: #1848499
2019-10-17 16:01:33 +02:00
Matt Riedemann 760ccb32db Add compute side revert allocation test for bug 1848343
This adds a functional recreate test for a scenario where
reverting migration-based allocations during resize failure
in the compute service results in leaking allocations for a
deleted server.

Change-Id: Iac4dd9feebb1a405826c95cb6b046b82c61140a2
Related-Bug: #1848343
2019-10-16 15:23:50 -04:00
Matt Riedemann 252ee93086 Add live migration recreate test for bug 1848343
This adds a live migration functional recreate test,
similar to the one Ifd156ac8789d3fc84d56d400cf1e160e2cd2fbee
added for cold migrate/resize.

Change-Id: I856db36d63779d521fe26b27ef5a12b7a4d3bd91
Related-Bug: #1848343
2019-10-16 14:03:59 -04:00
Matt Riedemann 24318f8cd4 Add functional recreate test for bug 1848343
This adds a functional test to recreate a bug where the
instance is deleted after conductor has swapped allocations
to the migration consumer but before casting to compute. In
this case, the scheduler fails due to NoValidHost which is
entirely reasonable. The bug is that the conductor task rollback
code re-creates the allocations on the source node for the
now-deleted instance and as such those allocations get leaked.

Note that we have similar exposures in the live migration
task and reverting allocations when resize fails in the
compute service. A TODO is left inline to add tests for those
separately.

Change-Id: Ifd156ac8789d3fc84d56d400cf1e160e2cd2fbee
Related-Bug: #1848343
2019-10-16 13:08:08 -04:00
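
The sequence of events described in commit 24318f8cd4 can be walked through with a toy model. This is only an illustration of the leak, not placement or conductor code: the dict of allocations keyed by consumer, the consumer names and the function `simulate_leak` are hypothetical.

```python
def simulate_leak():
    """Replay the race and return consumers that no longer exist."""
    # Allocations keyed by consumer; the instance starts on the source.
    allocations = {"instance-1": {"node": "source", "VCPU": 2}}
    live_instances = {"instance-1"}

    # 1. Conductor swaps the allocations to the migration consumer.
    allocations["migration-1"] = allocations.pop("instance-1")

    # 2. The instance is deleted before conductor casts to compute.
    live_instances.discard("instance-1")

    # 3. Scheduling fails and the rollback re-creates the allocations
    #    for the instance without checking that it still exists.
    allocations["instance-1"] = allocations.pop("migration-1")

    return [c for c in allocations if c not in live_instances]
```

The returned list contains the deleted instance, which is the leaked allocation the recreate test is meant to expose.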
Dan Smith fee9503ead Fix up some feedback on image precache support
This addresses a few feedback items from earlier in the stack.

Related to blueprint image-precache-support

Change-Id: I622a9180d7b53dd35e60e2335fe185da1d6ac019
2019-10-16 07:40:23 -07:00
Zuul 1a226aaa9e Merge "Update compute rpc version alias for train" 2019-10-16 06:30:49 +00:00
Dan Smith 3391298706 Add image caching API for aggregates
This adds a new microversion and support for requesting image pre-caching
on an aggregate.

Related to blueprint image-precache-support

Change-Id: I4ab96095106b38737ed355fcad07e758f8b5a9b0
2019-10-15 21:22:31 -04:00
Matt Riedemann 242557333a Add PrepResizeAtSourceTask
This change adds a new conductor sub-task which will make a
synchronous RPC call (using long_rpc_timeout) to the new method
"prep_snapshot_based_resize_at_source" on the source compute.

If the instance is not volume-backed, the sub-task will create
an image and pass the image ID to the compute method to upload
the snapshot data.

If the migration fails at this point, any snapshot image created should
be deleted. Recovering the guest on the source host should be as simple
as hard rebooting the server (which is allowed with servers in ERROR
status).

Part of blueprint cross-cell-resize

Change-Id: I5bfcac018c1d1196d4efcb321213eb5a1d4c7a6b
2019-10-15 19:15:48 -04:00
Matt Riedemann b4fb248ad2 Add prep_snapshot_based_resize_at_source compute method
This adds a new compute service method, prep_snapshot_based_resize_at_source,
which is called synchronously over RPC from (super)conductor on the
source host during a cross-cell resize. It will:

* Power off the instance
* Upload snapshot data if the instance is not volume-backed
* Delete the old BDM volume attachment records
* Destroy the guest on the hypervisor but retain disks in the
  case of a later fault or revert
* Activate the dest host port bindings

Think of this as a hybrid of how shelve_instance and resize_instance
work, with a lot of the flow matching what the migrate_disk_and_power_off
compute driver method does except for transferring disks to the dest host.

Also note that resources are not freed up from the ResourceTracker on
the source host at this point, nor are the instance.host/node values
nulled out since we are keeping a "placeholder" on the source in case
something fails later or the user reverts the resize.

Part of blueprint cross-cell-resize

Change-Id: I1887097ae38014dd19fb0ce333d7f223ad3d2130
2019-10-15 19:15:48 -04:00
Matt Riedemann 6d118e2921 Add PrepResizeAtDestTask
This adds the sub-task to prep/verify the target host(s)
for the resize in the target cell. The PrepResizeAtDestTask
sub-task will make a synchronous RPC call (using
long_rpc_timeout) to method prep_snapshot_based_resize_at_dest
on the dest compute service which will claim resources on
the target host. The task also creates (inactive) port
bindings and volume attachments to be used on the target host.

If the prep task on the selected target host fails with a
MigrationPreCheckError, conductor will iterate over alternate
hosts and check them until a suitable target host is found
or we raise MaxRetriesExceeded.

The instance.migration_context is returned from the task so
it can be copied from the target DB to the source DB. This is
necessary for the API to route network-vif-plugged events later
when spawning the guest in the target cell.

Part of blueprint cross-cell-resize

Change-Id: I66d8f06f19c5c631e33208580428aa843abb38d2
2019-10-15 19:15:48 -04:00
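
The alternate-host handling described in commit 6d118e2921 can be sketched as follows. This is a simplified illustration rather than the conductor task itself: the two exception classes are local stand-ins named after the exceptions in the commit message, and `prep_on_first_suitable_host` and `prep_at_dest` are hypothetical names.

```python
class MigrationPreCheckError(Exception):
    """Local stand-in: the target host failed the prep check."""


class MaxRetriesExceeded(Exception):
    """Local stand-in: no candidate host was suitable."""


def prep_on_first_suitable_host(hosts, prep_at_dest):
    """Try the selected host, then each alternate, in order.

    prep_at_dest(host) is a hypothetical callable standing in for the
    synchronous RPC call; it returns a migration context on success.
    """
    for host in hosts:
        try:
            return host, prep_at_dest(host)
        except MigrationPreCheckError:
            continue  # this host is unsuitable, try the next alternate
    raise MaxRetriesExceeded("exhausted all candidate hosts")
```

Only the pre-check failure moves on to the next host in this sketch; any other exception propagates to the caller.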
Zuul 61c32d6bd9 Merge "Fix legacy issues in filter migrations by user_id/project_id" 2019-10-15 22:56:48 +00:00
Zuul 289fc24fd0 Merge "Add cache_images() to conductor" 2019-10-15 22:56:33 +00:00
Zuul 0238cf431b Merge "Filter migrations by user_id/project_id" 2019-10-15 18:34:32 +00:00
Stephen Finucane 8d5172dadc Remove compute compat checks for aborting queued live migrations
These were added in Rocky [1] and can now be removed, since we don't
need to support anything from before Train in Ussuri.

[1] I4636a8d270ce01c1831bc951c4497ad472bc9aa8

Change-Id: Ib01ebeff0647f6e27714856f3a36c3896eeab27f
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-10-15 15:49:53 +00:00
Zuul 149327a3ab Merge "VMware: Update flavor-related metadata on resize" 2019-10-15 14:53:59 +00:00
Zuul 63fb66e39a Merge "Add prep_snapshot_based_resize_at_dest compute method" 2019-10-14 23:40:13 +00:00
Zuul d4f0bd93dc Merge "Leave brackets on Ceph IP addresses for libguestfs" 2019-10-14 20:56:14 +00:00
Zuul 3387811743 Merge "Deprecate [api]auth_strategy and noauth2" 2019-10-14 20:56:08 +00:00
zhangbailin 355570962b Fix legacy issues in filter migrations by user_id/project_id
Fix some issues in https://review.opendev.org/#/c/679413.

Change-Id: I3a6e55b771d7324fb0809059813e7553562c21c4
2019-10-14 18:49:19 +00:00
Dan Smith 11d909c2cb Add cache_images() to conductor
This adds the bulk of the image pre-caching logic to the conductor
task manager. It takes an aggregate and list of image ids from the
API service and handles the process of calling to the relevant compute
nodes to initiate the image downloads, honoring the (new) config knob
for overall task parallelism.

Related to blueprint image-precache-support

Change-Id: Id7c0ab7ae0586d49d88ff2afae149e25e59a3489
2019-10-14 11:35:11 -07:00
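
The bounded fan-out described in commit 11d909c2cb can be approximated with a short sketch. It uses the standard library thread pool purely for illustration, which is an assumption and may differ from the concurrency machinery nova uses. The function `cache_images`, the `download` callable and the `parallelism` parameter are hypothetical stand-ins for the conductor method, the compute call and the config knob.

```python
from concurrent.futures import ThreadPoolExecutor


def cache_images(hosts, image_ids, download, parallelism=2):
    """Ask each host to cache the images, at most `parallelism` at once.

    download(host, image_ids) is a hypothetical callable standing in
    for the call to a compute node.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {
            host: pool.submit(download, host, image_ids) for host in hosts
        }
        # Collect the per-host results once every download has finished.
        return {host: future.result() for host, future in futures.items()}
```

Capping `max_workers` keeps a large aggregate from starting downloads on every compute node at the same time.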
zhangbailin ac165112b7 Filter migrations by user_id/project_id
In microversion 2.80, the ``GET /os-migrations`` API will have
optional ``user_id`` and ``project_id`` query parameters for
filtering migrations by user and/or project:

* GET /os-migrations?user_id=ef9d34b4-45d0-4530-871b-3fb535988394
* GET /os-migrations?project_id=011ee9f4-8f16-4c38-8633-a254d420fd54
* GET /os-migrations?user_id=ef9d34b4-45d0-4530-871b-3fb535988394&project_id=011ee9f4-8f16-4c38-8633-a254d420fd54

And expose the ``user_id`` and ``project_id`` fields in the following APIs:

* GET /os-migrations
* GET /servers/{server_id}/migrations
* GET /servers/{server_id}/migrations/{migration_id}

Co-Authored-By: Qiu Fossen <qiujunting>
Part of blueprint add-user-id-field-to-the-migrations-table
Change-Id: I7313d6cde1a5e1dc7dd6f3c0dff9f30bbf4bee2c
2019-10-14 11:35:11 -07:00
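
The filtered requests listed in commit ac165112b7 could be built as in the sketch below. The helper name `migrations_request` is hypothetical, and the sketch only builds the path and microversion header without sending anything.

```python
from urllib.parse import urlencode


def migrations_request(user_id=None, project_id=None):
    """Return (path, headers) for a filtered GET /os-migrations call."""
    params = {}
    if user_id:
        params["user_id"] = user_id
    if project_id:
        params["project_id"] = project_id
    path = "/os-migrations"
    if params:
        path += "?" + urlencode(params)
    # The filters require compute API microversion 2.80 or later.
    headers = {"OpenStack-API-Version": "compute 2.80"}
    return path, headers
```

Passing both IDs yields the combined query string from the third example in the commit message.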
Zuul d14ae3a126 Merge "Avoid using image with kernel in BDM large request func test" 2019-10-14 16:07:13 +00:00
Matt Riedemann 4cc1798bd4 Add prep_snapshot_based_resize_at_dest compute method
This adds a new method to the compute service which will
be synchronously RPC called from (super)conductor when
preparing for a cross-cell resize. It will perform an
RT.resize_claim() which will claim things like PCI devices
and/or NUMA topology resources which are not otherwise "claimed"
in the placement service during scheduling. The MigrationContext
is created in the target cell DB as part of this claim.

Notifications, fault and instance action event creation should
be consistent with the same-cell "prep_resize" method. One
difference is the reverts_task_state decorator is not used here
since conductor is responsible for trying alternative hosts and
it does not make sense for this method to reset the instance
task_state to None on failure if conductor is going to try
another host. The existing prep_resize method is not used in
general since for cross-cell-resize conductor handles orchestrating
the call to the source compute and reschedules, which are things
prep_resize does for same-cell resize. We could munge the
existing method but I felt it was cleaner to keep them separate.

Part of blueprint cross-cell-resize

Change-Id: I518ae675b7a67da64a5796e57e87860f0c3ef0db
2019-10-14 11:06:30 -04:00
Dan Smith de373c7007 Update compute rpc version alias for train
This adds a compute rpc version alias for the named release train.

Change-Id: I49196e16819abd1a4ac94619a2909ee523b14215
2019-10-14 07:58:11 -07:00
Zuul 6b60cae019 Merge "setup.cfg: Cleanup" 2019-10-14 11:11:15 +00:00
Zuul c270c48d09 Merge "libvirt: Change _compare_cpu to raise InvalidCPUInfo" 2019-10-12 11:53:09 +00:00
Zuul 32fcc4f459 Merge "Add cache_image() support to the compute/{rpcapi,api,manager}" 2019-10-11 18:41:40 +00:00