The mentality of being able to continuously deliver nova has been
around since the beginning, with the Rackspace public cloud trying
to continuously deploy openstack as close to master as possible.
This has implications for how code series are structured, reviewed
and merged. For the most part this seems to be tribal knowledge and
we don't have anything very obvious in the nova docs about it. Not
all projects in openstack necessarily subscribe to this mentality
anymore, or do so grudgingly, but it's worth documenting in nova
while it still applies here.
Change-Id: Ieff87dbd748318f1b7f879a136ff25081dac321e
The development policies section on code review linked to the
generic openstack review guidelines, but we have nova-specific
guidelines as well, so this changes the policies page to link to
the nova code review guidelines, adds a link to the general
guidelines from the nova page, and also fixes a formatting issue
in the nova code review guidelines page.
Change-Id: I725570d0d737f18fe8b105dc8382c4abcfdef295
If we're booting from an existing volume but the instance is not
being created in a requested availability zone, and
cross_az_attach=False, we'll fail with a 400 since by default the
volume is in the 'nova' AZ and the instance does not have an AZ
set - because one wasn't requested and because the instance is not
in a host aggregate yet.
This refactors that AZ validation during server create in the API
to happen before calling _validate_bdm so we get the pre-existing
volumes early, and if cross_az_attach=False we validate the volume
zone(s) against the instance AZ. If [DEFAULT]/default_schedule_zone
(for instances) is not set and the volume AZ does not match
[DEFAULT]/default_availability_zone, then we put the volume AZ in
the request spec as if the user had requested that AZ when creating
the server.
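As a rough sketch of the new flow (helper and variable names here
are illustrative, not the exact nova code):

    # Hedged, simplified sketch of the AZ check described above; the
    # real logic lives in the compute API and uses the actual config
    # options and exceptions.
    def pick_instance_az(requested_az, volume_az, cross_az_attach,
                         default_schedule_zone,
                         default_availability_zone):
        if cross_az_attach or volume_az is None:
            return requested_az
        if requested_az is not None:
            if volume_az != requested_az:
                # cross_az_attach=False and the zones conflict; the
                # API rejects this with a 400.
                raise ValueError('volume AZ %s does not match instance '
                                 'AZ %s' % (volume_az, requested_az))
            return requested_az
        if default_schedule_zone is None and \
                volume_az != default_availability_zone:
            # Behave as if the user had requested the volume's AZ so
            # it ends up in the request spec and pins the instance to
            # that zone.
            return volume_az
        return requested_az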
Since this is a change in how cross_az_attach is used and how the instance
default AZ works when using BDMs for pre-existing volumes, the docs are
updated and a release note is added.
Note that not all of the API code paths are unit tested because
the functional tests do most of the heavy lifting for coverage.
Given the number of unit tests impacted by this change, it is
pretty obvious that (1) many unit tests are mocking at too low a
level and (2) functional tests are better for validating these
flows.
Closes-Bug: #1694844
Change-Id: Ib31ba2cbff0ebb22503172d8801b6e0c3d2aa68a
This adds a functional test to validate migrate-related behaviors
with multiple cells. We can't migrate across cells yet and this
validates that such an operation is properly blocked.
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Related to blueprint cells-aware-api
Change-Id: I79736624dbebd7085fd8a5fe810a80312ebb367f
There have been two recent issues [1][2] caused by starting multiple
instances of the same service in tests. This can cause races when the
services' (shared) state conflicts.
With this patch, the nexus of nova service starting,
nova.test.TestCase.start_service, is instrumented to keep track of how
many of each service we are running. If we try to run the scheduler
service more than once, we fail.
We could probably do the same thing for conductor, though that's less
important (for now) because conductor is stateless (for now).
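A minimal sketch of that bookkeeping, assuming a simple per-test
counter (the class and exception here are illustrative, not the
literal nova.test code):

    import collections

    class ServiceCountGuard(object):
        # Tracks how many of each nova service a single test started.

        def __init__(self):
            self._counts = collections.defaultdict(int)

        def register(self, binary):
            self._counts[binary] += 1
            if binary == 'scheduler' and self._counts[binary] > 1:
                # Two in-process schedulers share state and can race,
                # so fail loudly here instead of flaking later.
                raise RuntimeError(
                    'Tests should start at most one scheduler service')

    # Usage sketch inside a start_service-like helper:
    #     guard.register(name)  # before actually spawning the service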
[1] https://bugs.launchpad.net/nova/+bug/1844174
[2] https://review.opendev.org/#/c/681059/ (not a nova service, but same
class of problem)
Change-Id: I56d3cb17260dad8b88f03c0a7b9688efb3258d6f
This test uses the ServersTestBase._wait_for_state_change method,
which waits for the status to change *from* what is provided, so
creating a server and then waiting for the status to change from
ACTIVE makes _wait_for_state_change return immediately since the
status starts as BUILD. This can lead to a failure when the test
tries to migrate a server that is still in BUILD status rather
than ACTIVE status.
This fixes the test by using this version of _wait_for_state_change
correctly; it is not to be confused with the method of the same
name in InstanceHelperMixin, which is more accurate (it waits for
the terminal status of the server operation).
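To make the difference concrete, here is a hedged sketch of the two
wait semantics (simplified; the real helpers in nova's functional
test base classes differ in detail):

    import time

    def wait_for_state_change_from(get_status, from_status, timeout=30):
        # ServersTestBase-style: returns as soon as the status is
        # anything other than from_status, so waiting 'from ACTIVE'
        # on a server that is still in BUILD returns immediately.
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = get_status()
            if status != from_status:
                return status
            time.sleep(1)
        raise AssertionError('status never left %s' % from_status)

    def wait_for_state_change_to(get_status, expected_status, timeout=30):
        # InstanceHelperMixin-style: waits until the server reaches
        # the expected (terminal) status for the operation.
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = get_status()
            if status == expected_status:
                return status
            time.sleep(1)
        raise AssertionError('status never reached %s' % expected_status)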
Change-Id: I56ff050194d0eb465b8c41795fdea2a8b0d764d6
Closes-Bug: #1850514
This attempts to log some statistics about a precache operation so that,
barring something more complicated on the operator side, there is some
useful information in the logs about what is going on.
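As a rough illustration of the kind of summary intended (the
counters and message wording here are hypothetical, not necessarily
what nova logs):

    import collections
    import logging

    LOG = logging.getLogger(__name__)

    def log_precache_stats(aggregate_name, results):
        # results: mapping of host -> iterable of per-image outcomes
        # such as 'cached', 'existing' or 'error'.
        totals = collections.Counter()
        for host, outcomes in results.items():
            totals.update(outcomes)
        LOG.info('Image precache for aggregate %s finished: %d cached, '
                 '%d already existed, %d errors across %d hosts',
                 aggregate_name, totals['cached'], totals['existing'],
                 totals['error'], len(results))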
Related to blueprint image-precache-support
Change-Id: I550afc344eca30c366ba0e5342966bcbaac96bfe
We dropped the tempest-full python2 job from our zuul config a
while back with change I93e938277454a1fc203b3d930ec1bc1eceac0a1e.
Since we
track ceph job health by how its failure rate compares with our basic
tempest full job, this updates our config to run the python3 version
of the ceph job by default and drops the python2 version of the job.
Change-Id: I92e01b896751f7f29a0b2b826c33cb2c74b8ced4
This adds the functional notification sample test for the
aggregate.cache_images.start and aggregate.cache_images.end
versioned notifications.
I also added a comment to the docs builder code since it took
me a bit to figure out how to get the notification sample
linked into the docs, and for whatever reason figured that out
by looking through code rather than our nicely detailed docs
that already explain it.
Part of blueprint image-precache-support
Change-Id: I0869979a1b8a0966f0e7b49e5a5984f76d7d67cd
Change Iba797243d2a137b551223165a1af1a8676bcea02 was a bit
overzealous in using {[testenv]deps} and changed the docs tox
target to also install requirements.txt and test-requirements.txt.
That means for docs builds we're installing things like psycopg2,
which we shouldn't be doing if we don't have the correct native
packages installed to build those types of dependencies.
Change-Id: Ib718911596b93ec6ec7e899210300d2f0d9572ed
Closes-Bug: #1849870
The tempest-slow-py3 job is, well, very slow. It sometimes takes
over 2.5 hours to complete. There are a few reasons for this beyond
it just running slow tests: it runs all of the slow tests serially,
and for nova it tests things we don't care about, like network
scenario tests such as test_slaac_from_os. The one benefit we get
from running tempest-slow-py3 is that it's multinode, and there
are certain slow multinode test scenarios that are important for
test coverage.
This change drops the tempest-slow-py3 job from our job list and
changes nova-next to be multinode. The nova-next job runs only a
select set of tempest compute API and scenario tests and runs them
concurrently, which in the gate means 4 workers at a time. The
nova-next job will take a bit longer since we have to set up the
subnode now, but overall it should still be faster than the
tempest-slow-py3 job and we'll need one fewer node from nodepool
to run jobs against nova changes.
The USE_PYTHON3 variable can be dropped from the nova-next
job definition now that it extends tempest-multinode-full-py3.
Depends-On: https://review.opendev.org/690469/
Change-Id: I1b7d71e833bf0743f22d7fa0241c9d1bbcd0faac
Previously, errors while disconnecting volumes from the source host
during post_live_migration within LibvirtDriver would result in the
overall failure of the migration. This also meant that while the
instance would be running on the destination, it would still be
listed as running on the source within the db.
This change simply ignores any exceptions raised while attempting to
disconnect volumes on the source. These errors can be safely ignored as
they will have no impact on the running instance on the destination.
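A simplified sketch of that behavior (method names and arguments
are approximations, not the exact libvirt driver code):

    import logging

    LOG = logging.getLogger(__name__)

    def disconnect_volumes_ignoring_errors(driver, context, bdms,
                                           instance):
        for bdm in bdms:
            try:
                driver._disconnect_volume(
                    context, bdm['connection_info'], instance)
            except Exception:
                # The instance is already running on the destination,
                # so a failure to clean up the source connection should
                # not fail the whole migration; log it and keep going.
                LOG.exception('Ignoring error while disconnecting '
                              'volume on the source host during live '
                              'migration')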
In the future Nova could wire up the force and ignore_errors kwargs when
calling down into the associated os-brick connectors to help avoid this.
Closes-Bug: #1843639
Change-Id: Ieff5243854321ec40f642845e87a0faecaca8721
Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens,
which was backported through to Newton, introduced a regression
when listing deleted servers with a marker because it assumes
that if BuildRequestList.get_by_filters does not raise
MarkerNotFound then the marker was found among the build requests,
and it does not account for the get_by_filters method
short-circuiting when filtering servers with deleted/cleaned/limit=0.
The API code then nulls out the marker, which means you'll continue
to get the marker instance back in the results even though you
shouldn't, and that can cause an infinite loop in some client-side
tooling like nova's CLI:
nova list --deleted --limit -1
This fixes the bug by raising MarkerNotFound from
BuildRequestList.get_by_filters if we have a marker but are
short-circuiting and returning early from the method based on
limit or filters.
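A hedged sketch of the change in get_by_filters (simplified; the
real code raises nova.exception.MarkerNotFound and the normal
filtering path is elided here):

    class MarkerNotFound(Exception):
        pass

    def get_by_filters_sketch(filters, limit=None, marker=None):
        if limit == 0 or 'deleted' in filters or 'cleaned' in filters:
            # Build requests can never match these filters, so return
            # early. With the fix, an unfound marker is reported
            # instead of being silently swallowed, which previously
            # made the API null out the marker.
            if marker is not None:
                raise MarkerNotFound(marker)
            return []
        # The normal build request filtering path is elided in this
        # sketch.
        return []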
Change-Id: Ic2b19c2aa06b3059ab0344b6ac56ffd62b3f755d
Closes-Bug: #1849409
Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens,
which was backported through to Newton, introduced a regression
when listing deleted servers with a marker because it assumes
that if BuildRequestList.get_by_filters does not raise
MarkerNotFound then the marker was found among the build requests,
and it does not account for the get_by_filters method
short-circuiting when filtering servers with deleted/cleaned/limit=0.
The API code then nulls out the marker, which means you'll continue
to get the marker instance back in the results even though you
shouldn't, and that can cause an infinite loop in some client-side
tooling like nova's CLI:
nova list --deleted --limit -1
This adds a functional recreate test for the regression which will
be updated when the bug is fixed.
Change-Id: I324193129acb6ac739133c7e76920762a8987a84
Related-Bug: #1849409