Change I9269ffa2b80e48db96c622d0dc0817738854f602 in Pike
introduced a race condition where creating multiple
servers concurrently can fail the second instance's quota
check, which happens in conductor after the instance record
is created in the cell database but before its related BDMs
and tags are stored in the cell DB. When deleting the
server from the API, since the BDMs are not in the cell
database with the instance, they are not "seen" and thus
the volume attachments are not deleted and the volume is
orphaned. As for tags, you should be able to see the tags
on the server in ERROR status from the API before deleting
it.
This change adds a functional regression test to show both
the volume attachment and tag issue when we fail the quota
check in conductor.
Change-Id: I21c2189cc1de6b8e4857de77acd9f1ef8b6ea9f6
Related-Bug: #1806064
This reverts commit bbe88786fc.
The new tests are racy and have been causing a modest
number of failures in the gate since the change merged, so
it is probably best to just revert the tests so they can be
robustified.
Change-Id: I18bd68ba6e59aba4c450eb85e6f4450d7044b1e9
Related-Bug: #1806126
If the first argument of assertTrue is the literal True,
the assertion always passes.
Fix such assertions since they are useless.
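The pattern being fixed can be illustrated with a small, hypothetical unittest case (the test names and values below are made up for illustration, they are not from Nova's test suite):

```python
import unittest


class AssertExampleTest(unittest.TestCase):
    """Hypothetical test showing the useless-assertion pattern."""

    def test_useless(self):
        result = 2 + 2
        # Useless: the literal True makes this assertion always pass,
        # regardless of what 'result' actually contains.
        self.assertTrue(True)

    def test_fixed(self):
        result = 2 + 2
        # Fixed: assert on the actual value under test.
        self.assertEqual(4, result)
```

Both tests pass, but only the second one would catch a regression in the computed value.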
Change-Id: Ie954fc770c61956a80d472190e97646a39b7420f
Closes-Bug: #1805800
Previously, there was just a comment about removing usage on the
destination node. This is incorrect: usage is removed on the compute
host specified by the nodename parameter to the method. This patch
corrects this in a proper docstring.
Change-Id: I2f676966136a78bb9600626852584f838cb08c5b
Introduce an I/O semaphore to limit the number of concurrent
disk-IO-intensive operations. This could reduce disk contention from
image operations like image download, image format conversion, snapshot
extraction, etc.
The new config option max_concurrent_disk_ops can be set in nova.conf
per compute host and is virt-driver-agnostic. It defaults to 0,
which means no limit.
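A minimal sketch of such an I/O semaphore, using only the stdlib (the helper names and the unlimited-when-0 behaviour mirror the description above; they are illustrative, not Nova's actual implementation):

```python
import threading

# Stand-in for the max_concurrent_disk_ops config option; 0 means no limit.
MAX_CONCURRENT_DISK_OPS = 2


class UnlimitedSemaphore:
    """No-op context manager used when the limit is 0 (no limit)."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


def disk_ops_semaphore(limit):
    # BoundedSemaphore caps the number of concurrent holders at 'limit'.
    return threading.BoundedSemaphore(limit) if limit > 0 else UnlimitedSemaphore()


SEM = disk_ops_semaphore(MAX_CONCURRENT_DISK_OPS)


def convert_image(path):
    """Example disk-IO-intensive operation guarded by the semaphore."""
    with SEM:
        # Here the real code would run image download, format
        # conversion, snapshot extraction, etc.
        return "converted %s" % path
```

Any virt driver can wrap its disk-heavy operations in the same semaphore, which is what makes the option driver-agnostic.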
blueprint: io-semaphore-for-concurrent-disk-ops
Change-Id: I897999e8a4601694213f068367eae9608cdc7bbb
Signed-off-by: Jack Ding <jack.ding@windriver.com>
This commit adds support for the High Precision Event Timer (HPET) for
x86 guests in the libvirt driver. The timer can be set by image property
'hw_time_hpet'. By default it remains turned off. When it is
turned on, the HPET timer is activated in libvirt.
If the image property 'hw_time_hpet' is incorrectly set to a
non-boolean, the HPET timer remains turned off.
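A sketch of how the property could map to the libvirt guest XML; the helper names are illustrative (not Nova's), but the `<timer name='hpet'/>` element is standard libvirt domain XML, and the forgiving boolean parsing mirrors the "non-boolean stays off" behaviour described above:

```python
import xml.etree.ElementTree as ET


def parse_bool(value):
    """Anything that is not a recognised true-string counts as False,
    so an incorrectly set non-boolean property leaves HPET off."""
    return str(value).strip().lower() in ('1', 't', 'true', 'on', 'y', 'yes')


def clock_xml(image_properties):
    """Build a libvirt <clock> element honouring hw_time_hpet."""
    clock = ET.Element('clock', offset='utc')
    if parse_bool(image_properties.get('hw_time_hpet', False)):
        # HPET stays off unless the property is explicitly true.
        ET.SubElement(clock, 'timer', name='hpet', present='yes')
    return ET.tostring(clock, encoding='unicode')
```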
blueprint: support-hpet-on-guest
Change-Id: I3debf725544cae245fd31a8d97650392965d480a
Signed-off-by: Jack Ding <jack.ding@windriver.com>
There are cases where ``root_provider_id`` of a resource provider is
set to NULL just after it is upgraded to the Rocky release. In such
cases, getting allocation candidates raises a KeyError.
This patch fixes that bug for cases where there are no
sharing or nested providers in play.
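The shape of the fix can be sketched as a defensive lookup: when `root_provider_id` is still NULL after the upgrade (the online data migration has not run yet), fall back to the provider's own id instead of letting a dict lookup raise KeyError. The row layout and helper name below are illustrative, not placement's actual code:

```python
def effective_root_id(provider_row):
    """Treat a NULL root_provider_id as 'this provider is its own root'."""
    root = provider_row.get('root_provider_id')
    return root if root is not None else provider_row['id']


providers = [
    {'id': 1, 'root_provider_id': None},  # upgraded, migration not yet run
    {'id': 2, 'root_provider_id': 2},     # already migrated
]

roots = {p['id']: effective_root_id(p) for p in providers}
```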
Change-Id: I9639d852078c95de506110f24d3f35e7cf5e361e
Closes-Bug: #1799892
The Cinder v1 API was deprecated in Juno and removed completely in
Queens. We no longer support compatibility between Stein Nova and
Queens Cinder, so this check can be removed.
Change-Id: I947f50e921159f66b425f10e31a08a3e0840228e
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
At the Stein summit (and previous discussions) the topic of exposing
cellsv2 out of the API came up again. This patch adds two FAQ entries
reflecting my notes from early design decisions about why we did not
want to do that, along with more recent examples, such as FFU.
These are my feelings on the subject and I was asked to put these into
FAQ form for posterity to make the discussion easier in the future. I
would recommend that we agree on these and then codify them here.
Change-Id: I0499e141456fcca63f95bad25503c4e86c6aa369
Conductor RPC calls the scheduler to get hosts during
server create, which in a multi-create request with a
lot of servers and the default rpc_response_timeout, can
trigger a MessagingTimeout. Due to the old
retry_select_destinations decorator, conductor will retry
the select_destinations RPC call up to max_attempts times,
so thrice by default. This can clobber the scheduler and
placement while the initial scheduler worker is still
trying to process the beefy request and allocate resources
in placement.
This has been recreated in a devstack test patch [1] and
shown to fail with 1000 instances in a single request with
the default rpc_response_timeout of 60 seconds. Changing the
rpc_response_timeout to 300 avoids the MessagingTimeout and
retry loop.
Since Rocky we have the long_rpc_timeout config option which
defaults to 1800 seconds. The RPC client can thus be changed
to heartbeat the scheduler service during the RPC call every
$rpc_response_timeout seconds with a hard timeout of
$long_rpc_timeout. That change is made here.
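The timeout scheme can be sketched as follows. `call_monitor_timeout` and `timeout` are real oslo.messaging RPC client parameters, but the helper function and the plain-dict "conf" below are illustrative stand-ins for Nova's scheduler RPC client code:

```python
# Default values of the two config options involved (seconds).
rpc_response_timeout = 60    # heartbeat interval
long_rpc_timeout = 1800      # hard deadline for the whole call


def prepare_call_kwargs(conf):
    """Build the kwargs an oslo.messaging prepare() call would receive."""
    return {
        # Periodic liveness checks on the scheduler while it is still
        # working on the (possibly beefy) select_destinations request.
        'call_monitor_timeout': conf['rpc_response_timeout'],
        # Overall deadline: only fail the call after this much time.
        'timeout': conf['long_rpc_timeout'],
    }


kwargs = prepare_call_kwargs({'rpc_response_timeout': rpc_response_timeout,
                              'long_rpc_timeout': long_rpc_timeout})
```

With this split, a slow-but-alive scheduler keeps the call open via heartbeats, while a genuinely dead scheduler still fails within `call_monitor_timeout`.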
As a result, the problematic retry_select_destinations
decorator is also no longer necessary and removed here. That
decorator was added in I2b891bf6d0a3d8f45fd98ca54a665ae78eab78b3
and was a hack for scheduler high availability where a
MessagingTimeout was assumed to be a result of the scheduler
service dying so retrying the request was reasonable to hit
another scheduler worker, but is clearly not sufficient
in the large multi-create case, and long_rpc_timeout is a
better fit for that HA type scenario to heartbeat the scheduler
service.
[1] https://review.openstack.org/507918/
Change-Id: I87d89967bbc5fbf59cf44d9a63eb6e9d477ac1f3
Closes-Bug: #1795992
The 'locked' query parameter is not supported
in the "List Servers Detailed" API.
So replace the examples using the 'locked' query parameter
with examples using other supported query parameters.
Change-Id: Ibcea6147dd6716ad544e7ac5fa0df17f8c397a28
Closes-Bug: #1801904
Since we're extracting placement, we shouldn't be referring to artifacts
in placement namespaces anymore. This patch removes such a reference
from the libvirt driver unit tests.
Change-Id: Idc0c2a0c0f885a21dff412bea761bac82a029eb5
Because of a change [1] in the tokenize package in the stdlib of
recent pythons, one of the tests for the hacking checks can fail.
This change skips the test on newer pythons and leaves a TODO
to fix it.
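A version-conditional skip of the kind described above might look like this (the test name and version boundary are illustrative; the skip decorator is the standard unittest mechanism):

```python
import sys
import unittest


class HackingCheckTest(unittest.TestCase):

    # TODO: make the hacking check work with the new tokenize
    # behaviour (see bpo-33899) and drop this skip.
    @unittest.skipIf(sys.version_info >= (3, 7),
                     'tokenize changed in newer pythons; check needs fixing')
    def test_check_comment_tokens(self):
        self.assertIn('noqa', '# noqa')
```

On newer interpreters the test is reported as skipped rather than failed, keeping the gate green until the check itself is fixed.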
[1] see https://bugs.python.org/issue33899 and
https://bugs.python.org/issue35107
Change-Id: I64744a8144fcf630eea609eb2b2d14974f4fd4bb
Related-Bug: #1804062
In [0] the way parameters are passed to the glance client was changed.
Sadly, one required argument was dropped in the process; we need to
insert it again in order to fix e.g. the rbd backend usage.
[0] https://review.openstack.org/614351
Change-Id: I5a4cfb3c9b8125eca4f6c9561d3023537e606a93
Closes-Bug: 1803717
Add the description about custom resource classes and
overriding standard resource classes in the "Flavors" document.
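For illustration, the `resources:*` flavor extra spec convention the doc addition covers can be sketched as a small translation helper. `CUSTOM_FPGA` is a made-up custom resource class name, and the function is illustrative rather than Nova's actual code:

```python
def resources_from_extra_specs(extra_specs):
    """Collect resources:* extra specs into a placement request dict."""
    prefix = 'resources:'
    return {key[len(prefix):]: int(value)
            for key, value in extra_specs.items()
            if key.startswith(prefix)}


extra_specs = {
    'resources:CUSTOM_FPGA': '1',  # request one unit of a custom class
    'resources:VCPU': '0',         # override: don't claim VCPU at all
    'hw:cpu_policy': 'dedicated',  # unrelated spec, ignored here
}

requested = resources_from_extra_specs(extra_specs)
```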
Change-Id: I5b804db70d229696e7b7c5b5db16946cf1f1c49f
Closes-Bug: #1800663
This makes the _instances_cores_ram_count() method only query for
instances in cells where the tenant actually has instances. We do this
by getting the list of cell mappings that have instance mappings owned
by the project and limiting the scatter/gather operation to just those
cells.
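The targeted scatter/gather can be sketched as below; the cell-mapping rows and the `count_in_cell` callback are illustrative stand-ins for Nova's DB helpers:

```python
def instances_cores_ram_count(project_id, cell_mappings, count_in_cell):
    """Sum instance/cores/ram counts, querying only relevant cells."""
    # Only query cells where the project actually has instance mappings.
    cells = [cm for cm in cell_mappings if cm['has_project_instances']]
    totals = {'instances': 0, 'cores': 0, 'ram': 0}
    for cell in cells:
        counts = count_in_cell(cell, project_id)
        for key in totals:
            totals[key] += counts[key]
    return totals


cell_mappings = [
    {'name': 'cell1', 'has_project_instances': True},
    {'name': 'cell2', 'has_project_instances': False},  # skipped entirely
]


def fake_count_in_cell(cell, project_id):
    # Pretend each queried cell holds 2 instances using 4 cores / 2048 MB.
    return {'instances': 2, 'cores': 4, 'ram': 2048}


totals = instances_cores_ram_count('my-project', cell_mappings,
                                   fake_count_in_cell)
```

Cells with no instance mappings for the project never get a DB round trip, which is the whole point of the change.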
Change-Id: I0e2a9b2460145d3aee92f7fddc4f4da16af63ff8
Closes-Bug: #1771810
The current check uses an alignment of 512 bytes and will fail when the
underlying device has sectors of size 4096 bytes, as is common e.g. for
NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
512 bytes and thus will cover both cases.
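A minimal sketch of allocating a 4096-byte-aligned probe buffer; mmap-backed memory is page-aligned (pages are at least 4096 bytes on common platforms), which satisfies O_DIRECT alignment for both 512- and 4096-byte-sector devices. The constant name is illustrative:

```python
import mmap

# 4096 is a multiple of 512, so it covers both sector sizes.
ALIGN = 4096


def aligned_buffer(size=ALIGN):
    """Return a page-aligned, writable buffer of the given size."""
    # Anonymous mmap memory starts on a page boundary.
    return mmap.mmap(-1, size)


buf = aligned_buffer()
buf.write(b'\x00' * ALIGN)  # the probe would write this via O_DIRECT
```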
Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
Closes-Bug: 1801702
Co-Authored-By: Alexandre Arents <alexandre.arents@corp.ovh.com>
Add a description about the sort order
in the "List Migrations" (GET /os-migrations) API.
Change-Id: Iaa8e264ca95b69f3c97a6848918862ee22922de1
Closes-Bug: #1801789