Waiting 30 seconds for an evacuate to complete is not enough
time on some slower CI test nodes. This change uses the
same build timeout configuration from tempest to determine
the overall evacuate timeout in our evacuate tests.
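The fix replaces a hard-coded 30 second wait with tempest's configurable build timeout. A generic polling waiter in that spirit (a sketch with made-up names, not tempest's actual helper) looks like:

```python
import time

def wait_for_server_status(get_status, expected, timeout, interval=1):
    """Poll ``get_status()`` until it returns ``expected`` or ``timeout``
    seconds elapse. Passing the configured build timeout here mirrors
    the change described above; the hard-coded 30s is gone."""
    deadline = time.time() + timeout
    while True:
        if get_status() == expected:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

On a slow CI node the operator-tuned timeout simply makes the deadline later; the polling loop itself is unchanged.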
Change-Id: Ie5935ae54d2cbf1a4272e93815ee5f67d3ffe2eb
Closes-Bug: #1806925
openstack-dev was decommissioned last night in https://review.openstack.org/621258
Update openstack-dev to openstack-discuss
Change-Id: If51f5d5eb710e06216f6d6981a70d70b6b5783cc
Add secret=true to the fixed_key configuration parameter as that
value shouldn't be logged.
Change-Id: Ie6da21e8680b2deb6b1da3add31cd725ba855c1c
Closes-Bug: #1806471
Change I9269ffa2b80e48db96c622d0dc0817738854f602 in Pike
introduced a race condition where creating multiple
servers concurrently can fail the second instance's quota
check which happens in conductor after the instance record
is created in the cell database but its related BDMs and
tags are not stored in the cell DB. When deleting the
server from the API, since the BDMs are not in the cell
database with the instance, they are not "seen" and thus
the volume attachments are not deleted and the volume is
orphaned. As for tags, you should be able to see the tags
on the server in ERROR status from the API before deleting
it.
This change adds a functional regression test to show both
the volume attachment and tag issue when we fail the quota
check in conductor.
Change-Id: I21c2189cc1de6b8e4857de77acd9f1ef8b6ea9f6
Related-Bug: #1806064
Some tests weren't calling init_host, so the semaphore was None.
This caused the smoke to come out of nova's tests in ways that
would be less confusing if they'd failed during the testing of
the implementing patch.
Instead, set the semaphore to being unbounded, and then override
that later if the user has in fact specified a limit. This relies
on init_host being called very early, but that should be true
already.
Change-Id: If144be253f78b14cef60200a46aefc02c0e19ced
Closes-Bug: #1806123
This reverts commit bbe88786fc.
The new tests are racy and causing a modest number of
failures in the gate since the change merged, so it is
probably best to just revert the tests so they can be
robustified.
Change-Id: I18bd68ba6e59aba4c450eb85e6f4450d7044b1e9
Related-Bug: #1806126
gate/post_test_perf_check.sh did some simplistic performance testing of
placement. With the extraction of placement we want it to happen during
openstack/placement CI changes so we remove it here.
The depends-on is to the placement change that turns it on there, using
an independent (and very small) job.
Depends-On: I93875e3ce1f77fdb237e339b7b3e38abe3dad8f7
Change-Id: I30a7bc9a0148fd3ed15ddd997d8dab11e4fb1fe1
An earlier change [1] allowed
[compute]resource_provider_association_refresh to be set to zero to
disable the resource tracker's periodic refresh of its local copy of
provider traits and aggregates. To allow for out-of-band changes to
placement (e.g. via the CLI) to be picked up by the resource tracker in
this configuration (or a configuration where the timer is set to a high
value) this change clears the provider tree cache when SIGHUP is sent to
the compute service. The next periodic will repopulate it afresh from
placement.
[1] Iec33e656491848b26686fbf6fb5db4a4c94b9ea8
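The mechanism can be sketched like this (a hypothetical stand-in class, not nova's real report client): a SIGHUP handler drops the cached provider view so the next periodic rebuilds it from placement.

```python
import signal

class ReportClientSketch:
    """Illustrative stand-in for the compute service's placement
    report client; nova's real cache lives elsewhere."""

    def __init__(self):
        # Cached copy of provider traits and aggregates from placement.
        self.provider_tree = {'traits': set(), 'aggregates': set()}

    def clear_provider_cache(self):
        # Dropping the cache forces the next periodic task to rebuild
        # it from placement, picking up out-of-band changes.
        self.provider_tree = None

client = ReportClientSketch()
# On SIGHUP, invalidate the cached view.
signal.signal(signal.SIGHUP, lambda signum, frame: client.clear_provider_cache())
```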
Change-Id: I65a7ee565ca5b3ec6c33a2fd9e39d461f7d90ed2
Add a separate method, _guest_needs_pcie(), to check for the
prerequisites for adding PCIe root port entries. And simplify the
monster 'if' conditional into multiple readable ones.
While at it, add a TODO note about an assumption (which can become
invalid in the future) we're making about QEMU machine types.
Change-Id: I05c9168569c4c3eeeb83695a053b2fd94240157c
Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
If the first argument of assertTrue is the literal True,
the assertion always passes and verifies nothing.
Fix these useless assertions to check the actual values.
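A minimal illustration (not the actual nova test) of why such an assertion is useless, and what a meaningful one looks like:

```python
import unittest

class TautologyExample(unittest.TestCase):
    def test_assertion(self):
        value = 1 + 1
        # Useless: always passes, no matter what ``value`` is.
        self.assertTrue(True)
        # Meaningful: actually inspects the result under test.
        self.assertEqual(2, value)
```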
Change-Id: Ie954fc770c61956a80d472190e97646a39b7420f
Closes-Bug: #1805800
This patch includes two changes:
1. Change the default values for
CONF.(cpu|ram|disk)_allocation_ratio to ``None``
2. Change the resource tracker to overwrite the compute node's
allocation ratios with the value of the XXX_allocation_ratio
options if the value of these options is NOT ``None`` or ``0.0``.
The "0.0" condition is for upgrade impact, and it will be
removed in the next version (T version).
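The overwrite rule above can be sketched as a small helper (hypothetical, not nova's real resource tracker code):

```python
def effective_allocation_ratio(conf_ratio, node_ratio):
    """Return the ratio the resource tracker should report: the
    configured value wins unless it is None or 0.0, in which case the
    compute node keeps its existing ratio (the 0.0 escape hatch exists
    only for upgrade impact and is slated for removal)."""
    if conf_ratio not in (None, 0.0):
        return conf_ratio
    return node_ratio
```

For example, an operator setting cpu_allocation_ratio = 4.0 overrides the node's value, while leaving the option at its None default leaves the node untouched.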
Change-Id: I6893d63dc5f29bc2eb348fe0aa9fbc8490e6eb40
blueprint: initial-allocation-ratios
For some reason we were only reading deleted instances when loading generic
fields and not things like flavor. That weird behavior isn't very helpful,
so this makes us always read deleted for that case. Some of the fields, like
tags, will short-circuit that and just immediately lazy-load an empty set.
But for anything else, we should allow reading that data if it's still there.
With this change, we are able to remove a specific read_deleted='yes' usage
from ComputeManager._destroy_evacuated_instances() which is handled with
the generic solution. TestEvacuateDeleteServerRestartOriginalCompute asserts
that the evacuate scenario is still fixed.
Related-Bug: #1794996
Related-Bug: #1745977
Change-Id: I8ec3a3a697e55941ee447d0b52d29785717e4bf0
This moves _check_allocation_during_evacuate into the
ProviderUsageBaseTestCase base class and drops the
overridden methods from TestEvacuateDeleteServerRestartOriginalCompute.
Change-Id: I6a084031c1d3ffa72b09d2194c44cdd80cc875fa
The _destroy_evacuated_instances method on compute
startup tries to cleanup guests on the hypervisor and
allocations held against that compute node resource
provider by evacuated instances, but doesn't take into
account that those evacuated instances could have been
deleted in the meantime which leads to a lazy-load
InstanceNotFound error that kills the startup of the
compute service.
This change does two things in the _destroy_evacuated_instances
method:
1. Loads the evacuated instances with a read_deleted='yes'
context when calling _get_instances_on_driver(). This
should be fine since _get_instances_on_driver() is already
returning deleted instances anyway (InstanceList.get_by_filters
defaults to read deleted instances unless the filters tell
it otherwise - which we don't in this case). This is needed
so that things like driver.destroy() don't raise
InstanceNotFound while lazy-loading fields on the instance.
2. Skips the call to remove_allocation_from_compute() if the
evacuated instance is already deleted. If the instance is
already deleted, its allocations should have been cleaned
up by its hosting compute service (or the API).
The functional regression test is updated to show the bug is
now fixed.
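The fixed cleanup logic can be sketched with stand-in names (illustration only, not nova's real method): the guest is always torn down, but allocation removal is skipped for deleted instances since their hosting compute service (or the API) already cleaned those up.

```python
class FakeInstance:
    """Minimal stand-in for an instance record."""
    def __init__(self, uuid, deleted):
        self.uuid = uuid
        self.deleted = deleted

def destroy_evacuated_instances(instances, destroy_guest, remove_allocation):
    for inst in instances:
        # Safe now that instances were loaded with read_deleted='yes',
        # so lazy-loads no longer raise InstanceNotFound.
        destroy_guest(inst)
        # Skip allocation removal for already-deleted instances; their
        # allocations were cleaned up at delete time.
        if inst.deleted:
            continue
        remove_allocation(inst.uuid)
```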
Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7
Closes-Bug: #1794996
Previously, there was just a comment about removing usage on the
destination node. This is incorrect: usage is removed on the compute
host specified by the nodename parameter to the method. This patch
corrects this in a proper docstring.
Change-Id: I2f676966136a78bb9600626852584f838cb08c5b
The note in the cpu/ram/disk allocation ratio config
option help was referring to commit e7840cdf1 from Pike
when the ironic driver reported allocation_ratio=1.0
for VCPU/MEMORY_MB/DISK_GB resource inventory.
That code was removed in commit a985e34cd so we can
remove the related note from the config option help as
it no longer applies.
Change-Id: Ifd9dba0c24fde25d54761077c1374313019af1d8
Add a default bug tag for nova docs in doc/source/conf.py.
The 'doc' tag(*) should be set by default for documentation
bugs in bug reports.
*: https://wiki.openstack.org/wiki/Nova/BugTriage#Tag_Owner_List
TrivialFix
Change-Id: Ib2de207d368d248464770fd0a9452e325f0a0596