Commit Graph

52734 Commits

Author SHA1 Message Date
Zuul a0eacbf7ff Merge "Add functional regression test for bug 1794996" 2018-10-29 11:47:30 +00:00
Zuul 050e63d527 Merge "Add volume-backed evacuate test" 2018-10-29 11:47:20 +00:00
Zuul 4a4082c722 Merge "Add post-test hook for testing evacuate" 2018-10-29 11:47:14 +00:00
Zuul 88db2f9c0a Merge "Add restrictions on updated_at when getting instance action records" 2018-10-29 04:33:15 +00:00
Zuul a279671984 Merge "Fix os-simple-tenant-usage result order" 2018-10-27 08:57:52 +00:00
Zuul bcda32adf9 Merge "Use RequestSpec.user_id in scheduler.utils.claim_resources" 2018-10-27 08:49:21 +00:00
Zuul 9a201632d3 Merge "Add restrictions on updated_at when getting migrations" 2018-10-27 00:42:54 +00:00
Zuul 9ef70d31f3 Merge "Add more documentation for online_data_migrations CLI" 2018-10-26 19:32:36 +00:00
Zuul 5d36176955 Merge "Add a hacking rule for deprecated assertion methods" 2018-10-26 18:31:44 +00:00
Zuul d223b1298b Merge "api-ref: Remove unnecessary minimum microversion" 2018-10-26 18:31:36 +00:00
Zuul b9e786ce01 Merge "Reject forced move with nested source allocation" 2018-10-26 18:31:29 +00:00
Zuul 272fbb1993 Merge "Add API ref guideline for examples" 2018-10-26 16:30:58 +00:00
Zuul cee0dae6ab Merge "Bump os-brick version to 2.6.1" 2018-10-26 16:30:50 +00:00
Lucian Petrut afc3a16ce3 Fix os-simple-tenant-usage result order
nova usage-list can return incorrect results, having resources counted
twice. This only occurs when using the 2.40 microversion or later.

This microversion introduced pagination, which doesn't work properly.
Nova API will sort the instances using the tenant id and instance uuid,
but 'os-simple-tenant-usage' will not preserve the order when returning
the results.

For this reason, subsequent API calls made by the client will use the
wrong marker (which is supposed to be the last instance id), ending
up counting the same instances twice.

Change-Id: I6c7a67b23ec49aa207c33c38580acd834bb27e3c
Closes-Bug: #1796689
2018-10-26 14:47:52 +00:00
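The marker-based pagination failure described in the message above can be sketched as follows. All names here (`paginate`, the row shapes) are illustrative, not Nova's actual code; the point is that the marker scheme only works when the server preserves the sort order it paginates by.

```python
def paginate(rows, limit, marker=None):
    """Return one page of rows; the caller's next marker is the last
    row's uuid. rows are pre-sorted by (tenant_id, uuid), as the Nova
    API sorts instances. If a layer above re-ordered each page before
    returning it, the client's marker would no longer point at the true
    last item and rows would be counted twice -- the bug being fixed."""
    start = 0
    if marker is not None:
        start = next(i for i, r in enumerate(rows)
                     if r["uuid"] == marker) + 1
    return rows[start:start + limit]

# A client walking all pages sees every row exactly once.
rows = [{"tenant_id": "t1", "uuid": "u%d" % i} for i in range(5)]
seen, marker = [], None
while True:
    page = paginate(rows, 2, marker)
    if not page:
        break
    seen.extend(r["uuid"] for r in page)
    marker = page[-1]["uuid"]
assert seen == ["u0", "u1", "u2", "u3", "u4"]
```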
Zuul 6256c60716 Merge "Add functional recreate test for bug 1799727" 2018-10-26 01:02:36 +00:00
Zuul 49382f204d Merge "Make CellDatabases fixture reentrant" 2018-10-25 21:51:37 +00:00
Matt Riedemann d252f81573 Add functional regression test for bug 1794996
The _destroy_evacuated_instances method on compute
startup tries to cleanup guests from the hypervisor
and allocations held against that compute node resource
provider by evacuated instances, but doesn't take into
account that those evacuated instances could have been
deleted in the meantime which leads to a lazy-load
InstanceNotFound error that kills the startup of the
compute service.

This change adds a functional regression test to recreate
the bug. A subsequent change with the fix will update
the test to show the bug is fixed.

Note that assertFlavorMatchesAllocation and
_boot_and_check_allocations are redefined in the test
class because If6aa37d9b6b48791e070799ab026c816fda4441c
refactored those methods which will cause problems with
backports of this test. The redefined methods will be
removed in a follow-up cleanup patch.

Change-Id: I19b0d8baea5440f5d5bc49a6956d9a97bf031a05
Related-Bug: #1794996
2018-10-25 16:15:56 -04:00
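The failure mode this regression test targets can be sketched like so. `InstanceNotFound`, `destroy_evacuated_instances`, and `fake_load` are stand-ins, not Nova's actual code; the eventual fix presumably tolerates the exception rather than letting it kill compute startup.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova.exception.InstanceNotFound."""

def destroy_evacuated_instances(evacuated_uuids, load_instance):
    """Sketch of cleanup that tolerates instances deleted mid-flight.
    Per the bug, a lazy-load on an already-deleted evacuated instance
    raised InstanceNotFound and aborted startup; skipping it lets the
    compute service come up and clean the rest."""
    cleaned = []
    for uuid in evacuated_uuids:
        try:
            cleaned.append(load_instance(uuid))
        except InstanceNotFound:
            continue  # deleted in the meantime; nothing to clean up
    return cleaned

def fake_load(uuid):
    if uuid == "deleted-uuid":
        raise InstanceNotFound(uuid)
    return uuid

assert destroy_evacuated_instances(["a", "deleted-uuid", "b"],
                                   fake_load) == ["a", "b"]
```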
Matt Riedemann 2023f46015 Add volume-backed evacuate test
This adds a volume-backed instance evacuate scenario
to the test_evacuate post-test script.

Change-Id: I37120d9ce02de6dadbd279de195d2f289c891123
2018-10-25 16:15:56 -04:00
Matt Riedemann 8327011f91 Add post-test hook for testing evacuate
This adds a post-test bash script to test evacuate
in a multinode job.

This performs two tests:

1. A negative test where we inject a fault by stopping
   libvirt prior to the evacuation and wait for the
   server to go to ERROR status.

2. A positive test where we restart libvirt, wait for the
   compute service to be enabled and then evacuate
   the server and wait for it to be ACTIVE.

For now we hack this into the nova-live-migration
job, but it should probably live in a different job
long-term.

Change-Id: I9b7c9ad6b0ab167ba4583681efbbce4b18941178
2018-10-25 16:15:56 -04:00
Matt Riedemann e1a982403d Use RequestSpec.user_id in scheduler.utils.claim_resources
We now have the RequestSpec.user_id since change:

  I3e174ae76931f8279540e92328c7c36a7bcaabc0

So we can try to use that in claim_resources() if it's
set, otherwise continue to fallback on the RequestContext.

Change-Id: I2f9752b0bf7e616556035b58853c57f964c2d5e9
2018-10-25 15:41:49 -04:00
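The fallback described above can be sketched with minimal stand-ins for `RequestSpec` and `RequestContext` (these are simplifications, not Nova's actual objects):

```python
class RequestSpec:
    """Minimal stand-in; user_id may be unset on older records."""
    def __init__(self, user_id=None):
        self.user_id = user_id

class RequestContext:
    def __init__(self, user_id):
        self.user_id = user_id

def claim_user_id(spec, ctxt):
    """Prefer RequestSpec.user_id when set; otherwise fall back to the
    RequestContext, mirroring the behavior the message describes."""
    return spec.user_id or ctxt.user_id

assert claim_user_id(RequestSpec("alice"), RequestContext("bob")) == "alice"
assert claim_user_id(RequestSpec(), RequestContext("bob")) == "bob"
```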
Zuul 8ec31bfd42 Merge "api-ref: Add descriptions of error cases" 2018-10-25 18:26:00 +00:00
Zuul a0563e754c Merge "Consider allocations involving child providers during allocation cleanup" 2018-10-25 14:13:44 +00:00
Zuul 238407ad00 Merge "Drop legacy cold migrate allocation compat code" 2018-10-25 13:55:19 +00:00
Balazs Gibizer fd351903a1 Reject forced move with nested source allocation
Both os-migrateLive and evacuate server API actions support a force
flag. If force is set to True in the request then nova does not call the
scheduler but instead tries to blindly copy the source host allocation
to the destination host. If the source host allocation contains
resources from more than the root RP then such a blind copy cannot be
done properly. Therefore this patch detects such a situation and rejects
the forced move operation if the server has complex allocations on the
source host.

There is a separate blueprint,
remove-force-flag-from-live-migrate-and-evacuate, that will remove the
force flag in a new API microversion.

Note that before the force flag was added to these APIs, Nova bypassed
the scheduler when the target host was specified.

Blueprint: use-nested-allocation-candidates
Change-Id: I7cbd5d9fb875ebf72995362e0b6693492ce32051
2018-10-25 15:44:59 +02:00
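The detection the patch describes amounts to checking whether the source allocation touches any provider beyond the root. A minimal sketch, with illustrative provider names and a simplified allocations dict keyed by resource provider uuid:

```python
def has_nested_allocation(allocations, root_rp_uuid):
    """True when the allocation involves any provider other than the
    root RP -- the case where a blind copy to the destination host
    cannot be done properly and the forced move must be rejected."""
    return set(allocations) != {root_rp_uuid}

flat = {"root-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 512}}}
nested = dict(flat, **{"child-rp": {"resources": {"VGPU": 1}}})
assert not has_nested_allocation(flat, "root-rp")
assert has_nested_allocation(nested, "root-rp")
```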
Zuul 8b627fe68c Merge "Remove the CachingScheduler" 2018-10-25 13:22:53 +00:00
Takashi NATSUME 8ad33f35a4 Add API ref guideline for examples
Add guideline for JSON request/response body examples
in the API reference.

Change-Id: I2dcc2fc1a16cc5dcba7879518a2b101df3576304
2018-10-25 07:46:48 +00:00
Takashi NATSUME 4c531a5b94 api-ref: Add descriptions of error cases
In the following APIs, the 'changes-since' query parameter must be
earlier than or equal to the 'changes-before' query parameter;
otherwise the API returns 400.

* GET /servers
* GET /servers/detail

Add the description in each parameter.

Change-Id: Ieb26275deac2ddee3768a0fad7f37dc5795fb5c0
2018-10-25 07:46:30 +00:00
Takashi NATSUME 8a0e2a1085 api-ref: Remove unnecessary minimum microversion
A minimum microversion description that is the same as the
microversion in which the API was added is redundant in parameters,
so remove them.

Change-Id: I3e1ca88cac3a52a8b44e26f051a51a6db77a3231
Closes-Bug: #1799893
2018-10-25 16:37:21 +09:00
Takashi NATSUME 249174943e Add a hacking rule for deprecated assertion methods
Add a hacking rule for the following methods (*), which are
deprecated in Python 3.

* assertRegexpMatches
* assertNotRegexpMatches

[N361] assertRegex/assertNotRegex must be used instead of
       assertRegexpMatches/assertNotRegexpMatches.

*: https://docs.python.org/3.6/library/unittest.html#deprecated-aliases

Change-Id: Icfbaf26a7db6986820e264d1888982b985d613a1
2018-10-25 11:49:10 +09:00
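A hacking rule of this shape is a generator that yields an offset and message for each offending logical line. The version below is an illustrative sketch of the N361 check described above; the real rule lives in Nova's hacking checks and may differ in detail.

```python
import re

# Matches calls to the deprecated unittest assertion aliases.
DEPRECATED_RE = re.compile(r"\.assert(Not)?RegexpMatches\(")

def check_deprecated_assertions(logical_line):
    """Sketch of an N361-style flake8 extension check."""
    if DEPRECATED_RE.search(logical_line):
        yield (0, "N361: assertRegex/assertNotRegex must be used instead "
                  "of assertRegexpMatches/assertNotRegexpMatches.")

assert list(check_deprecated_assertions("self.assertRegexpMatches(x, 'a')"))
assert not list(check_deprecated_assertions("self.assertRegex(x, 'a')"))
```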
Zuul 7a76d0c71d Merge "Add debug logs for when provider inventory changes" 2018-10-24 22:14:34 +00:00
Zuul ec89e487ec Merge "libvirt: fix disk_bus handling for root disk" 2018-10-24 21:22:54 +00:00
Dan Smith 84b7e92934 Make CellDatabases fixture reentrant
The current CellDatabases fixture uses a single lock to protect code
that needs to run with the global database state pointed at a specific
cell. That is a problem if we ever need to recursively target a cell
(which would work in real life but not in tests), as well as when we have
two threads that need to target a cell where other locks are involved
such that they deadlock each other.

This change attempts to convert the fixture to use the existing
writer lock to only protect the code that changes the global database
state, and a reader lock for all threads actually running code that
is targeted. This avoids the deadlock because the schedulable code that
could acquire other locks isn't blocking other threads, except those that
need to change to another cell.

Change-Id: I9cd701dea7356bd3825900e216b163b895299e08
2018-10-24 13:58:10 -07:00
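The reader/writer scheme described above can be sketched with a minimal lock built on `threading.Condition`. This is an illustration of the locking pattern, not the fixture's actual implementation: many targeted threads hold the read side concurrently, while a thread changing the global database state takes the exclusive write side.

```python
import threading

class ReaderWriterLock:
    """Minimal reader/writer lock sketching the fixture's new scheme."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        # Targeted code runs under the shared read side.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        # Changing the global database state waits for all readers.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = ReaderWriterLock()
lock.acquire_read(); lock.acquire_read()   # two concurrent readers
lock.release_read(); lock.release_read()
lock.acquire_write()                        # writer enters once readers drain
lock.release_write()
```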
Matt Riedemann c86f309c56 Add more documentation for online_data_migrations CLI
This is a follow-up to commit c4c6dc736 to clarify some
confusing comments in the code, add more comments in
the actual runtime code, and also provide an example
in the CLI man page docs along with an explanation of
the output, specifically for the case that $found>0
but done=0 and what that means.

Change-Id: I0691caab2c44d3189504c54e51bb263ecdc5d1d2
Related-Bug: #1794364
2018-10-24 16:14:00 -04:00
Matt Riedemann 45f36cebab Add functional recreate test for bug 1799727
This adds a functional test which recreates the
bug where config-driven reserved and allocation ratio
overrides are not being reflected in resource provider
inventory once initially set.

The reserved and allocation_ratio values set in the
FakeDriver.update_provider_tree method, added in change
I69d760aaf931d46f011cfd229b88f400837662e8, are removed
here otherwise they hard-code the values which get sent
to placement and ResourceTracker._normalize_inventory_from_cn_obj
won't update the reserved / ratios based on config. The
fake virt driver shouldn't really need to hard-code these
values since the RT will provide those based on config.

Change-Id: Ie66d6f4c83a7d6fc64a64dbd752e427cee1356d0
Related-Bug: #1799727
2018-10-24 15:10:52 -04:00
Balazs Gibizer 66297f0c4a Consider allocations involving child providers during allocation cleanup
Nova calls remove_provider_from_instance_allocation() in a couple of
cleanup cases. As the name of the function suggests, this does not handle
allocations against RP trees containing child providers, which leads to
leaking allocations on the child providers during cleanup. As the functional
tests in ServerMovingTestsWithNestedResourceRequests previously showed,
mostly evacuation is directly affected by this call path, because evacuation
still uses the instance_uuid as the consumer for both the source host and
the destination host allocation.

This patch replaces remove_provider_from_instance_allocation() with
remove_provider_tree_from_instance_allocation(), which, as the name
suggests, removes the allocations against the whole RP tree. After this
change most of the evacuation functional tests pass without resource
leaks, except force evacuation. That will be handled in a subsequent patch.

Also, this change made the scheduler/utils
remove_allocation_from_compute just a proxy call to
remove_provider_from_instance_allocation, so
remove_allocation_from_compute is deleted.

Change-Id: I2af45a9540e7ccd60ace80d9fcadc79972da7df7
Blueprint: use-nested-allocation-candidates
2018-10-24 17:38:06 +02:00
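The difference between the old and new cleanup can be sketched on a simplified allocations dict keyed by resource provider uuid. The function name mirrors the one in the message, but the body is an illustration of the intent, not Nova's actual code:

```python
def remove_provider_tree_from_allocation(allocations, tree_rp_uuids):
    """Drop the allocations held against every provider in the source
    tree (root and children), so child-provider allocations are not
    leaked during cleanup. Removing only the root -- the old behavior --
    would leave 'child-rp' behind."""
    return {rp: alloc for rp, alloc in allocations.items()
            if rp not in tree_rp_uuids}

allocs = {"root-rp": {"VCPU": 2}, "child-rp": {"VGPU": 1},
          "other-host-rp": {"VCPU": 2}}
remaining = remove_provider_tree_from_allocation(
    allocs, {"root-rp", "child-rp"})
assert remaining == {"other-host-rp": {"VCPU": 2}}
```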
zhangbailin 3d4a7021db Add restrictions on updated_at when getting instance action records
If ``changes-before`` is earlier than ``changes-since`` in a request,
the database query can never return any data, so add validation of
these parameters. If ``changes-before`` < ``changes-since``, a 400 is
returned.

Closes-Bug: #1796009
Change-Id: I44546bc9798708a48a250cc3a21bdbcabe2649e1
2018-10-24 11:14:26 -04:00
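The validation added here (and in the sibling change for migrations) boils down to a comparison of the two parsed timestamps before the DB query. A minimal sketch, with `ValueError` standing in for the HTTP 400 the API actually returns:

```python
from datetime import datetime

def validate_changes_filters(changes_since=None, changes_before=None):
    """Reject a filter window that can never match: raise ValueError
    (mapped to HTTP 400 in the API layer) when changes-before is
    earlier than changes-since."""
    if (changes_since is not None and changes_before is not None
            and changes_before < changes_since):
        raise ValueError("changes-before must not be earlier than "
                         "changes-since")

validate_changes_filters(datetime(2018, 1, 1), datetime(2018, 2, 1))  # valid
try:
    validate_changes_filters(datetime(2018, 2, 1), datetime(2018, 1, 1))
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```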
zhangbailin a3a0068929 Add restrictions on updated_at when getting migrations
If ``changes-before`` is earlier than ``changes-since`` in a request,
the database query can never return any data, so add validation of
these parameters. If ``changes-before`` < ``changes-since``, a 400 is
returned.

Closes-Bug: #1796008
Change-Id: I2a39ac5a9fe969d383099b4e766c46ad05d9f67c
2018-10-24 11:11:49 -04:00
Zuul 835faf3f40 Merge "Fix up compute rpcapi version for pike release" 2018-10-24 04:00:41 +00:00
Zuul 297de7fb9f Merge "Use assertRegex instead of assertRegexpMatches" 2018-10-23 11:11:14 +00:00
Zuul 7701eefb81 Merge "Rename tempest-nova job to follow conventions" 2018-10-23 11:08:25 +00:00
Zuul 55405ef31e Merge "Convert legacy-tempest-dsvm-neutron-src-oslo.versionedobjects job" 2018-10-23 08:29:39 +00:00
Zuul 497576bdde Merge "conductor: Recreate volume attachments during a reschedule" 2018-10-23 03:10:48 +00:00
Zuul f97c1ad1e3 Merge "Remove duplicate legacy-tempest-dsvm-multinode-full job" 2018-10-23 02:24:52 +00:00
Zuul d807ed8899 Merge "Remove the extensions framework from wsgi.py" 2018-10-23 02:12:50 +00:00
Zuul b9dc5cc1a1 Merge "Zuul: Update barbican experimental job" 2018-10-23 01:37:28 +00:00
Zuul d74f9d62c5 Merge "Add regression test for bug#1784353" 2018-10-22 22:53:37 +00:00
Zuul dc9b343054 Merge "fixtures: Track volume attachments within CinderFixtureNewAttachFlow" 2018-10-22 22:53:30 +00:00
Zuul 5a1a72c171 Merge "Ensure attachment cleanup on failure in driver.pre_live_migration" 2018-10-22 19:33:16 +00:00
Zuul 3688eae4fe Merge "Move live_migration.pre.start to the start of the method" 2018-10-22 19:33:10 +00:00
Lee Yarwood 41452a5c6a conductor: Recreate volume attachments during a reschedule
When an instance with attached volumes fails to spawn, cleanup code
within the compute manager (_shutdown_instance called from
_build_resources) will delete the volume attachments referenced by
the bdms in Cinder. As a result we should check and if necessary
recreate these volume attachments when rescheduling an instance.

Note that there are a few different ways to fix this bug by
making changes to the compute manager code, either by not deleting
the volume attachment on failure before rescheduling [1] or by
performing the get/create check during each build after the
reschedule [2].

The problem with *not* cleaning up the attachments is if we don't
reschedule, then we've left orphaned "reserved" volumes in Cinder
(or we have to add special logic to tell compute when to cleanup
attachments).

The problem with checking the existence of the attachment on every
new host we build on is that we'd be needlessly checking that for
initial creates even if we don't ever need to reschedule, unless
again we have special logic against that (like checking to see if
we've rescheduled at all).

Also, either case involves changes to the compute, which means that
older computes might not have the fix.

So ultimately it seems that the best way to handle this is:

1. Only deal with this on reschedules.
2. Let the cell conductor orchestrate it since it's already dealing
   with the reschedule. Then the compute logic doesn't need to change.

[1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
[2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667

Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
Closes-bug: #1784353
2018-10-22 15:29:15 -04:00
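The conductor-side check chosen above can be sketched as follows. All names (`recreate_attachments`, the dict standing in for Cinder's attachment store) are illustrative, not Nova's or Cinder's actual API:

```python
def recreate_attachments(bdms, attachments, create_attachment):
    """For each block device mapping whose volume attachment was
    deleted during cleanup before the reschedule, create a fresh
    attachment and update the bdm to reference it."""
    for bdm in bdms:
        if bdm["attachment_id"] not in attachments:
            bdm["attachment_id"] = create_attachment(bdm["volume_id"])
    return bdms

# Simulate _shutdown_instance having deleted the attachment in Cinder.
store = {}
def create_attachment(volume_id):
    att_id = "att-%d" % len(store)
    store[att_id] = volume_id
    return att_id

bdms = [{"volume_id": "vol-1", "attachment_id": "deleted-att"}]
recreate_attachments(bdms, store, create_attachment)
assert bdms[0]["attachment_id"] == "att-0"
assert store == {"att-0": "vol-1"}
```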