52266 Commits

Author SHA1 Message Date
Balazs Gibizer 669d6499ee Follow up for Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a
This patch removes the host parameter of the
notify_about_instance_delete() call as it is always filled with
CONF.host value.

Change-Id: Iff3b605b9410d5a097b53f532870df65780bc1e4
Implements: bp versioned-notification-transformation-stein
2018-09-27 11:33:55 +02:00
Balazs Gibizer e83dbe1205 Send soft_delete from context manager
This patch moves soft_delete notification sending to the
notify_about_instance_delete context manager to unify the
delete notification sending code practice.

Implements: bp versioned-notification-transformation-stein
Change-Id: Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a
2018-08-29 13:40:05 +02:00
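The context-manager approach described in e83dbe1205 can be illustrated with a minimal sketch. This is not nova's implementation: `send_notification`, `notify_about_delete`, and the `sent` list are hypothetical stand-ins, and sending the end notification from a `finally` clause is an assumption made for the example.

```python
import contextlib

# Hypothetical stand-in for a notification sender; it only records calls.
sent = []


def send_notification(instance, action, phase):
    sent.append((instance, action, phase))


@contextlib.contextmanager
def notify_about_delete(instance, soft=False):
    """Emit start/end notifications around a delete operation."""
    action = "soft_delete" if soft else "delete"
    send_notification(instance, action, "start")
    try:
        yield
    finally:
        # Assumption: the end notification is sent even if the body raises.
        send_notification(instance, action, "end")
```

With this shape, plain delete and soft delete share one code path: the caller wraps the actual delete in `with notify_about_delete(instance, soft=True):` and does not send notifications by hand.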
Balazs Gibizer d3097e52b3 Transform missing delete notifications
The Iddbe50ce0ad3c14562df800bbc09ec5a7e840485 patch only considered
instance delete in the happy case when the instance is scheduled to a
compute successfully and the compute is available when the delete
action is executed. If the instance is never scheduled to a compute or
the compute is not available when the instance is deleted, legacy delete
notifications are emitted from different places: compute.api instead of
compute.manager. The original patch missed these places.

There will be subsequent patch(es) handling the same edge cases
for soft_delete and force_delete.

Change-Id: If0693eab2ed31b5fbfe6cbafa5d67b69c2ed8442
Implements: bp versioned-notification-transformation-stein
2018-08-29 13:39:57 +02:00
Zuul 45fc2326c7 Merge "Document no content on POST /reshaper 204" 2018-08-29 09:22:16 +00:00
Zuul 7116d8daf8 Merge "[placement] Add /reshaper handler for POST" 2018-08-29 09:22:10 +00:00
Zuul 3ff5c9560b Merge "Mention (unused) RP generation in POST /allocs/{c}" 2018-08-29 07:57:21 +00:00
Zuul 4a1146ae0d Merge "Revert "Don't use '_TransactionContextManager._async'"" 2018-08-28 23:34:37 +00:00
Zuul 0b59c0406a Merge "Don't use '_TransactionContextManager._async'" 2018-08-28 23:34:31 +00:00
Eric Fried e1f3a6df39 Mention (unused) RP generation in POST /allocs/{c}
The schema accepts the (resource provider) 'generation' key in the
resource provider section of the allocations dictionary in the POST
/allocations/{consumer_uuid} request payload. As with PUT
/allocations/{consumer_uuid}, it is ignored. This was missing from the
API reference.

Change-Id: I2586875a60aba7a566f6c1b61ded133b8749031c
2018-08-28 17:34:19 -05:00
Zuul 843bffca47 Merge "Optimize global marker re-lookup in multi_cell_list" 2018-08-28 18:23:57 +00:00
Zuul f5849655d3 Merge "Record cell success/failure/timeout in CrossCellLister" 2018-08-28 16:46:23 +00:00
Zuul cce3208cc2 Merge "Make scheduler.utils.setup_instance_group query all cells" 2018-08-28 16:32:04 +00:00
Stephen Finucane 964832d37d Revert "Don't use '_TransactionContextManager._async'"
This reverts commit bd7d991309 and bumps
the minimum version of oslo.db to 4.40.0, as that is the first version
of the library to include the renamed attribute.

Change-Id: Ic9e7864be3af7ef362cad5648dfc7bdecd104465
Related-Bug: #1788833
2018-08-28 17:18:51 +01:00
Stephen Finucane bd7d991309 Don't use '_TransactionContextManager._async'
In commit 2d532963, all instances of 'async' were replaced with 'async_'
in preparation for Python 3.7. However, one of these should not have
been changed as it refers to an oslo.db object attribute. That attribute
has actually been renamed itself but that rename is only present from
oslo.db 4.40.0 [1]. Thankfully, an alias to the older name is provided
so we use that.

[1] https://github.com/openstack/oslo.db/commit/df6bf34

Change-Id: I1afd0ba34a9ebcb63edb91e880ef60580befb32e
Closes-Bug: #1788833
2018-08-28 17:14:19 +01:00
Zuul a674241d0f Merge "Make monkey patch work in uWSGI mode" 2018-08-28 15:38:04 +00:00
Zuul 6bf864df77 Merge "Deprecate Core/Ram/DiskFilter" 2018-08-28 12:47:37 +00:00
Zuul c40bffa530 Merge "Make instance_list perform per-cell batching" 2018-08-28 04:55:17 +00:00
Yikun Jiang 23ba1c6906 Make monkey patch work in uWSGI mode
There is an eventlet.monkey_patch [1] call when we launch a nova
process (like nova-api, nova-compute), but it's invalid under
uWSGI mode.

However, the API service has concurrency requirements: for
example, when listing instances across multiple cells we use
greenthreads, and oslo.db does a time.sleep to allow switching
greenthreads [2].

So, in this patch we add the monkey_patch call to the uWSGI
application setup and refactor the monkey patching to use common
code.

[1] https://github.com/openstack/nova/blob/233ea58/nova/cmd/__init__.py#L26
[2] https://github.com/openstack/oslo.db/blob/9c66959/oslo_db/sqlalchemy/engines.py#L51

Closes-bug: #1787331

Change-Id: Ie7bf5d012e2ccbcd63c262ddaf739782afcdaf56
2018-08-28 09:49:01 +08:00
Zuul 8ac9956dd2 Merge "privsep: Handle ENOENT when checking for direct IO support" 2018-08-27 20:41:17 +00:00
Zuul 6e09c2ec26 Merge "[placement] split gigantor SQL query, add logging" 2018-08-27 18:16:28 +00:00
Stephen Finucane ecfcf86538 privsep: Handle ENOENT when checking for direct IO support
We've seen a recent issue that suggests direct IO support checks can fail
in other valid ways than EINVAL, namely, failures with ENOENT or the
FileNotFoundError exception, which is a Python 3-only exception type,
can occur. While we can't test for this without breaking Python 2.7
support, we can mimic this by checking the errno attribute of the
OSError exception. Do this.

Change-Id: I8aab86bb62cbc8ad538c706af037a30437c7964d
Closes-Bug: #1788922
2018-08-27 17:03:46 +01:00
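The errno-based check described in ecfcf86538 can be sketched roughly as follows. This is an illustration, not the nova privsep code: the function name `supports_direct_io`, the test file name, and the injectable `opener` parameter are assumptions made so the example is self-contained. Inspecting `errno` on `OSError` also covers `FileNotFoundError`, which is an `OSError` subclass on Python 3.

```python
import errno
import os


def supports_direct_io(dirpath, opener=os.open):
    """Return False if opening a file for direct IO fails in an expected way."""
    testfile = os.path.join(dirpath, ".directio.test")
    # O_DIRECT is not defined on every platform, so fall back to 0 there.
    flags = os.O_CREAT | os.O_WRONLY | getattr(os, "O_DIRECT", 0)
    try:
        fd = opener(testfile, flags)
    except OSError as exc:
        # EINVAL and ENOENT are both treated as "direct IO not supported".
        if exc.errno in (errno.EINVAL, errno.ENOENT):
            return False
        # Anything else is unexpected and is re-raised.
        raise
    else:
        os.close(fd)
        os.unlink(testfile)
        return True
```

Calling `supports_direct_io` on a directory that does not exist returns `False` rather than raising, which is the behaviour the commit message asks for.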
Zuul 194312d47e Merge "Fix create_resource_provider docstring" 2018-08-27 16:01:14 +00:00
Zuul 798ddd74fd Merge "List instances from all cells explicitly" 2018-08-27 14:35:46 +00:00
Jay Pipes b5ab9f5ace [placement] split gigantor SQL query, add logging
This patch modifies the code paths for the non-granular request group
allocation candidates processing. It removes the giant multi-join SQL
query and replaces it with multiple calls to
_get_providers_with_resource(), logging the number of matched providers
for each resource class requested and filter (on required traits,
forbidden traits and aggregate membership).

Here are some examples of the debug output:

- A request for three resources with no aggregate or trait filters:

 found 7 providers with available 5 VCPU
 found 9 providers with available 1024 MEMORY_MB
 found 5 providers after filtering by previous result
 found 8 providers with available 1500 DISK_GB
 found 2 providers after filtering by previous result

- The same request, but with a required trait that nobody has, shorts
  out quickly:

 found 0 providers after applying required traits filter (['HW_CPU_X86_AVX2'])

- A request for one resource with aggregates and forbidden (but no
  required) traits:

 found 2 providers after applying aggregates filter ([['3ed8fb2f-4793-46ee-a55b-fdf42cb392ca']])
 found 1 providers after applying forbidden traits filter ([u'CUSTOM_TWO', u'CUSTOM_THREE'])
 found 3 providers with available 4 VCPU
 found 1 providers after applying initial aggregate and trait filters

Co-authored-by: Eric Fried <efried@us.ibm.com>
Closes-Bug: #1786519
Change-Id: If9ddb8a6d2f03392f3cc11136c4a0b026212b95b
2018-08-27 13:50:22 +00:00
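The step-by-step narrowing described in b5ab9f5ace can be sketched as one lookup per resource class, intersected with the previous result. This is a simplified illustration, not the placement code: `providers_for_request` and the `lookup` callable are hypothetical, and stopping early on an empty result is an assumption of this sketch.

```python
import logging

LOG = logging.getLogger(__name__)


def providers_for_request(resources, lookup):
    """Narrow the provider set one resource class at a time.

    resources: dict mapping resource class name to requested amount.
    lookup: hypothetical callable (resource_class, amount) -> set of
        provider ids with enough capacity; stands in for one small query.
    """
    matched = None
    for rc, amount in resources.items():
        found = set(lookup(rc, amount))
        LOG.debug("found %d providers with available %d %s",
                  len(found), amount, rc)
        if matched is None:
            matched = found
        else:
            matched &= found
            LOG.debug("found %d providers after filtering by previous result",
                      len(matched))
        if not matched:
            # Assumption: nothing can match any more, so stop early.
            break
    return matched or set()
```

Splitting the work this way trades one large join for several small queries, and each intermediate count can be logged, which is what makes the debug output shown in the commit message possible.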
Dan Smith 072ea634ae Optimize global marker re-lookup in multi_cell_list
Now that we have cell_uuid stashed on RequestContext, we can avoid
looking up the global marker again in the thread that matches
that cell. This removes one unnecessary lookup-by-value of the marker
value we already have and the associated FIXME.

Change-Id: Id1a0538de4cc3235680579ffa0ecf7f4eb24a1dd
2018-08-27 06:44:33 -07:00
Dan Smith 6a3373bb7b Record cell success/failure/timeout in CrossCellLister
This makes it easier to handle the cells that failed from layers
above, by reporting the cell uuids that reported, failed, and
timed out.

Related to blueprint handling-down-cell
Change-Id: Ied2264982a713bd27351eb690dd47f8531485d4f
2018-08-27 06:44:33 -07:00
Dan Smith c3a77f80b1 Make instance_list perform per-cell batching
This makes the instance_list module support batching across cells
with a couple of different strategies, and with room to add more
in the future.

Before this change, an instance list with limit 1000 to a
deployment with 10 cells would generate a query to each cell
database with the same limit. Thus, that API request could end
up processing up to 10,000 instance records despite only
returning 1000 to the user (because of the limit).

This uses the batch functionality in the base code added in
Iaa4759822e70b39bd735104d03d4deec988d35a1
by providing a couple of strategies by which the batch size
per cell can be determined. These should provide a lot of gain
in the short term, and we can extend them with other strategies
as we identify some with additional benefits.

Closes-Bug: #1787977
Change-Id: Ie3a5f5dc49f8d9a4b96f1e97f8a6ea0b5738b768
2018-08-27 06:44:32 -07:00
Zuul a417db21bb Merge "api: Remove unnecessary default parameter" 2018-08-27 10:55:28 +00:00
Zuul 8ee44744e6 Merge "Batch results per cell when doing cross-cell listing" 2018-08-25 17:53:54 +00:00
Zuul 1fb3272e34 Merge "Clarify which context is used by do_query()" 2018-08-25 03:47:59 +00:00
Zuul 4eb4220008 Merge "Make RecordWrapper record RequestContext and expose cell_uuid" 2018-08-25 03:44:51 +00:00
Zuul 1219f19a08 Merge "doc: Note NUMA topology requirements for numa-aware-vswitches" 2018-08-25 02:11:34 +00:00
Zuul 62503de478 Merge "Merge server usage extension response into server view builder" 2018-08-25 00:23:08 +00:00
melanie witt 14f4c502f9 Make scheduler.utils.setup_instance_group query all cells
To check affinity and anti-affinity policies for scheduling instances,
we use the RequestSpec.instance_group.hosts field to check the hosts
that have group members on them. Access of the 'hosts' field calls
InstanceGroup.get_hosts during a lazy-load and get_hosts does a query
for all instances that are members of the group and returns their hosts
after removing duplicates. The InstanceList query isn't targeting any
cells, so it will return [] in a multi-cell environment in both the
instance create case and the instance move case. In the move case, we
do have a cell-targeted RequestContext when setup_instance_group is
called *but* the RequestSpec.instance_group object is queried early in
compute/api before we're targeted to a cell, so a call of
RequestSpec.instance_group.get_hosts() will result in [] still, even
for move operations.

This makes setup_instance_group query all cells for instances that are
members of the instance group if the RequestContext is untargeted, else
it queries the targeted cell for the instances.

Closes-Bug: #1746863

Change-Id: Ia5f5a0d75953b1154a8de3e1eaa15f8042e32d77
2018-08-24 23:44:58 +00:00
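The targeted versus untargeted behaviour described in 14f4c502f9 can be sketched with plain dictionaries. This is only an illustration under stated assumptions: `group_hosts`, the `cells` mapping, and the `targeted_cell` argument are hypothetical and do not mirror nova's object model.

```python
def group_hosts(member_uuids, cells, targeted_cell=None):
    """Return the hosts that run members of an instance group.

    cells: dict mapping a cell name to {instance uuid: host}.
    targeted_cell: name of the cell the context targets, or None when
        the context is untargeted.
    """
    if targeted_cell is not None:
        # A targeted context only needs to look in its own cell.
        search = [cells[targeted_cell]]
    else:
        # An untargeted context has to look in every cell.
        search = list(cells.values())

    hosts = set()
    for instances in search:
        for uuid in member_uuids:
            host = instances.get(uuid)
            if host:
                hosts.add(host)
    # A set removes duplicates; sorting gives a stable result.
    return sorted(hosts)
```

The bug in the commit message corresponds to querying without looking in any cell at all, which yields an empty host list and so defeats the affinity and anti-affinity checks.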
Matt Riedemann 243ba85130 Deprecate Core/Ram/DiskFilter
The time has come.

These filters haven't been necessary since Ocata [1]
when the filter scheduler started using placement
to filter on VCPU, DISK_GB and MEMORY_MB. The
only reason to use them with any in-tree scheduler
drivers is if using the CachingScheduler which doesn't
use placement, but the CachingScheduler itself has
been deprecated since Pike [2]. Furthermore, as of
change [3] in Stein, the ironic driver no longer
reports vcpu/ram/disk inventory for ironic nodes
which will make these filters filter out ironic nodes
thinking they don't have any inventory. Also, as
noted in [4], the DiskFilter does not account for
volume-backed instances and may incorrectly filter
out a host based on disk inventory when it would
otherwise be OK if the instance is not using local
disk.

The related aggregate filters are left intact for
now, see blueprint placement-aggregate-allocation-ratios.

[1] Ie12acb76ec5affba536c3c45fbb6de35d64aea1b
[2] Ia7ff98ff28b7265058845e46b277317a2bfc96d2
[3] If2b8c1a76d7dbabbac7bb359c9e572cfed510800
[4] I9c2111f7377df65c1fc3c72323f85483b3295989

Change-Id: Id62136d293da55e4bb639635ea5421a33b6c3ea2
Related-Bug: #1787910
2018-08-24 19:32:27 -04:00
Eric Fried e322a22303 Document no content on POST /reshaper 204
For consistency with other 204-returning operations in the placement API
reference, mention that POST /reshaper returns no body content on
success.

FUP from https://review.openstack.org/#/c/576927/35/placement-api-ref/source/reshaper.inc@45

Change-Id: I50dda0b161404d96dae35a3a96225f1ac4ec7309
2018-08-24 16:48:49 -05:00
Dan Smith 7bc6de3f24 List instances from all cells explicitly
This is a minor refactor with no functional change that will make
the next patch easier. If we're not limiting cell scope during an
instance list, we just let context.scatter_gather_all_cells() load
and use the global list of cells. Since we will want to count
those cells and adjust the batch size, we might as well just always
pass in the list of cells, which will either be a subset or all
of them.

Change-Id: Id167ab6d01698878d783999d54bcf77f37eec020
2018-08-24 13:29:25 -07:00
Dan Smith 0a88916911 Batch results per cell when doing cross-cell listing
This extends the multi_cell_list module with batching support to avoid
querying N*$limit total results when listing resources across cells.
Instead, if our total limit is over a given threshold, we should query
smaller batches in the per-cell thread until we reach the total limit
or are stopped because the sort feeder has found enough across all cells
to satisfy the requirements. In many cases, this can drop the total number
of results we load and process from N*$limit to (best case) $limit+$batch
or (usual case) $limit+(N*$batch).

Since we return a generator from our scatter-gather function, this should
mean we basically finish the scatter immediately after the first batch query
to each cell database, keeping the threads alive until they produce all the
results possible from their cell, or are terminated in the generator loop
by the master loop hitting the total_limit condition. As a result, the
checking over results that we do immediately after the scatter finishes
will no longer do anything since we start running the query code for the
first time as heapq.merge() starts hitting the generators. So, this brings
a query_wrapper() specific to the multi_cell_list code which can mimic the
timeout and error handling abilities of scatter_gather_cells, but inline
as we're processing so that we don't interrupt the merge sort for a
failure.

Related-Bug: #1787977
Change-Id: Iaa4759822e70b39bd735104d03d4deec988d35a1
2018-08-24 13:29:23 -07:00
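The effect of per-cell batching described in 0a88916911 can be sketched with generators feeding heapq.merge(). This is a toy model, not the multi_cell_list module: `cell_records` and `list_across_cells` are hypothetical names, each cell is a pre-sorted in-memory list, and slicing by offset stands in for a database query (an assumption made to keep the example short).

```python
import heapq
import itertools


def cell_records(records, batch_size, stats):
    """Yield one cell's sorted records, loading batch_size at a time."""
    offset = 0
    while True:
        # Stands in for one limited query against the cell database.
        batch = records[offset:offset + batch_size]
        if not batch:
            return
        stats["loaded"] += len(batch)
        yield from batch
        offset += len(batch)


def list_across_cells(cells, limit, batch_size):
    """Merge sorted per-cell results and stop once limit is reached."""
    stats = {"loaded": 0}
    feeders = [cell_records(recs, batch_size, stats) for recs in cells]
    # heapq.merge pulls lazily, so a cell only loads another batch when
    # the merge actually needs more of its records.
    merged = heapq.merge(*feeders)
    results = list(itertools.islice(merged, limit))
    return results, stats["loaded"]
```

Because the generators are lazy, no query runs until the merge starts consuming them, which matches the commit's observation that the work happens as heapq.merge() hits the generators. Setting `batch_size` equal to `limit` reproduces the old behaviour of loading up to N*$limit records.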
Zuul ac6e51c10d Merge "Stash the cell uuid on the context when targeting" 2018-08-24 20:09:29 +00:00
Zuul 8f0968d091 Merge "tests: Create functional libvirt test base class" 2018-08-24 18:41:40 +00:00
Zuul a0c2b26e38 Merge "Make CELL_TIMEOUT a constant" 2018-08-24 18:08:51 +00:00
Zuul a421bd2a8c Merge "tests: Move mocking to setUp" 2018-08-24 16:18:30 +00:00
Stephen Finucane eb4e29c613 doc: Note NUMA topology requirements for numa-aware-vswitches
A guest must have a NUMA topology for numa-aware-vswitches to have any
effect. Call this out in the documentation.

Change-Id: Id0a637bcd0cbce29811acd7e56419350695cd3fd
2018-08-24 17:13:49 +01:00
Stephen Finucane bea2ab0379 api: Remove unnecessary default parameter
This was always set by callers and appears to be here purely to avoid
updating tests. Update the tests and ease comprehension of the function.

Change-Id: Ib22b64ca499ffdb1a32d21f52240e80676ae1165
2018-08-24 14:49:06 +01:00
Zuul 1b77604be1 Merge "Remove noisy DEBUG log" 2018-08-24 08:45:24 +00:00
Zuul a34a18b747 Merge "Correct the release notes related to nova-consoleauth" 2018-08-24 07:57:23 +00:00
Zuul f5c8c4f42e Merge "Merge keypair extension response into server view builder" 2018-08-24 04:14:07 +00:00
Zuul c64f8b331b Merge "Add functional test for affinity with multiple cells" 2018-08-23 22:50:28 +00:00
Zuul 7107eff2a2 Merge "[placement] Add functional test to verify presence of policy" 2018-08-23 21:30:55 +00:00
Zuul 6ad5a20d7c Merge "Update contributor guide for Stein" 2018-08-23 21:30:49 +00:00