53777 Commits

Author SHA1 Message Date
Matt Riedemann 95e782dfd8 Stop running tempest-multinode-full
The job was added to nova with change:

  05ab017907

As I explained in the mailing list [1], we should
have sufficient test coverage in other voting jobs
that run against nova changes with the use of the
tempest-full-py3, tempest-slow-py3, nova-next, and
nova-(grenade)live-migration jobs. The tempest-full
and nova-next jobs are single-node jobs which run
compute API tests and some scenario tests but no
slow tests and no live migration. The tempest-slow
job is multinode and only runs slow tests. The
live migration jobs run live migration tests over
two nodes (and nova-live-migration also runs
evacuate tests). So tempest-multinode-full is
probably mostly redundant, a waste of time and
resources, especially since it's non-voting.

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004139.html

Change-Id: Id2f748d4b119002292bda45689472a68254079ee
2019-03-21 23:20:30 -04:00
Zuul c993d4fe2f Merge "Require python-ironicclient>=2.7.0" 2019-03-22 02:55:13 +00:00
Zuul c3386126e0 Merge "Move libvirt calculation of machine type to utils.py" 2019-03-21 18:29:02 +00:00
Zuul c9977f2aed Merge "Fix links to neutron QoS minimum bandwidth doc" 2019-03-21 16:41:35 +00:00
Zuul aa1bfb645f Merge "Add a prelude release note for the 19.0.0 Stein GA" 2019-03-21 15:59:33 +00:00
Zuul acbbc3eb8b Merge "Documentation for bandwidth support" 2019-03-21 15:59:23 +00:00
Balazs Gibizer c4295f87ad Fix links to neutron QoS minimum bandwidth doc
The QoS minimum bandwidth feature will have a separate doc from the
generic QoS neutron doc. This patch updates the links in the release
notes and api version history of the 2.72 microversion

blueprint: bandwidth-resource-provider

Depends-On: https://review.openstack.org/#/c/640390
Change-Id: Ic753112cf73cb10a6e377bc24c6ee51a057c69f8
2019-03-21 11:47:19 +01:00
Chris Dent 0dfbcd7464 Don't register placement opts multiple times in a test
The test_local_delete_removes_allocations_after_compute_restart test
was trying to register placement config opts 3 times when only once
is necessary, and if there are CLI opts being registered, only once is
allowed. With change I4cd3d637878eb5bb798b78fd73f5be99e141da9d in
placement, those opts gained some CLI opts, causing this test to
fail.

The depends-on is to a change in the placement-side PlacementFixture
to make it possible to not register opts when calling the fixture,
allowing the safe reuse of the already registered config.

Depends-On: I360a306b5d05ada75274733038b73ec2f2bdc4d4
Change-Id: I042e41ac8c41c0e5f0389904eb548e0e97d54c60
Closes-Bug: #1821092
2019-03-20 22:33:32 +00:00
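
The constraint described in the commit above (options may be registered repeatedly, CLI options only once) can be illustrated with a minimal sketch. Everything here is hypothetical: `ToyConfig`, `placement_fixture`, the `register_opts` flag and the option names are stand-ins for illustration, not the oslo.config or placement fixture API.

```python
class DuplicateOptError(Exception):
    """Raised when a CLI option is registered more than once."""


class ToyConfig:
    """Hypothetical stand-in for a config object (not oslo.config)."""

    def __init__(self):
        self.opts = set()
        self.cli_opts = set()

    def register_opt(self, name):
        # In this toy model, re-registering a plain option is harmless.
        self.opts.add(name)

    def register_cli_opt(self, name):
        # In this toy model, a CLI option may only be registered once.
        if name in self.cli_opts:
            raise DuplicateOptError(name)
        self.cli_opts.add(name)


def placement_fixture(conf, register_opts=True):
    """Hypothetical fixture set-up that can skip option registration."""
    if register_opts:
        conf.register_opt("placement_database")
        conf.register_cli_opt("config-file")
    return conf


conf = ToyConfig()
placement_fixture(conf)                       # first use registers the options
placement_fixture(conf, register_opts=False)  # later uses reuse the registered config
```

Calling `placement_fixture(conf)` a second time with the default flag raises `DuplicateOptError` in this sketch, which mirrors the failure mode the commit message describes.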
Zuul 59f1f187e5 Merge "docs: Misc cleanups" 2019-03-20 18:30:46 +00:00
Zuul 45d66abe6d Merge "Add known issue for minimum bandwidth resource leak" 2019-03-20 17:16:51 +00:00
melanie witt 70989c3eb5 Add known issue for minimum bandwidth resource leak
Nova will leak minimum bandwidth resources in placement if a user
deletes a bound port from Neutron out-of-band. This adds a note about
how users can work around the issue.

Related-Bug: #1820588

Change-Id: I41f42c1a7595d9e6a73d1261bf1ac1d47ddadcdf
2019-03-20 10:59:13 -04:00
melanie witt 028418788b Add a prelude release note for the 19.0.0 Stein GA
Depends-On: https://review.openstack.org/642064
Depends-On: https://review.openstack.org/644293

Change-Id: I9ef2a5e597345e3919cbd553178ba5c978f52984
2019-03-20 09:51:33 -05:00
Zuul 592658aafc Merge "Move slight bonkers IP management to privsep." 2019-03-20 13:21:29 +00:00
Stephen Finucane ebbd84fbe8 docs: Misc cleanups
Some random cleanups:

- Don't add the root or 'doc/source' directories to PYTHONPATH - it's
  unnecessary since we install nova (ruling out the first) and don't
  import anything from the latter
- Fix weird indentation
- Remove 'sphinx.ext.coverage', which is used to measure API doc
  coverage. This is unnecessary since we don't publish API docs, save
  for the versioned notification docs
- Remove unnecessary settings
  - 'exclude_patterns' referred to directories that haven't existed for
    a long time
  - 'source_suffix', 'add_module_names' and 'show_authors' were set to
    the default value
  - 'release', 'version' and 'html_last_updated_fmt' are all set
    automatically by 'openstackdoctheme' now
  - 'modindex_common_prefix' is useless since we don't expose a module
    index

All rolled into one patch for efficiency's sake.

Change-Id: I0f70c6d71299dedc59884f2bb39c8ea3c2ca8eff
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-03-20 09:42:03 +00:00
Zuul 4f5c8d3f83 Merge "Add docs for compute capabilities as traits" 2019-03-20 05:09:12 +00:00
Zuul b2a149d95e Merge "Remove "Fixing the Scheduler DB model" from schedule evolution doc" 2019-03-20 03:56:22 +00:00
Adam Spiers d2f8995103 Move libvirt calculation of machine type to utils.py
The libvirt driver contains some code to calculate the
default machine type given an architecture, by looking it up
in CONF.libvirt.hw_machine_type.

This code will need to be reused when introducing calls to
libvirt's getDomainCapabilities() API, which requires the
machine type as one of the parameters.  However those calls
will need to be made from nova.virt.libvirt.host.Host which
has no access to the driver, so move the machine type
calculation code into nova.virt.libvirt.utils so that it can
be reused by both classes.

Also add some unit tests, and warn when an invalid config
value is used.

blueprint: amd-sev-libvirt-support
Change-Id: I055918ff16766c5b106d794a111ad8af8ff9ab23
2019-03-19 22:28:33 +00:00
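
A minimal sketch of the kind of lookup the commit above describes, written as a standalone helper so it could be shared by several callers. It assumes the config value is a list of "arch=machine_type" strings; the function name `default_machine_type`, the sample values and the warning text are illustrative and not taken from nova.

```python
import logging

LOG = logging.getLogger(__name__)


def default_machine_type(arch, hw_machine_type):
    """Return the machine type configured for arch, or None if unset.

    hw_machine_type is assumed to be a list of "arch=machine_type" strings.
    """
    for mapping in hw_machine_type or []:
        host_arch, sep, machine_type = mapping.partition("=")
        if not sep or not host_arch or not machine_type:
            # Warn about malformed entries and keep looking.
            LOG.warning("Invalid hw_machine_type config value %s", mapping)
            continue
        if host_arch == arch:
            return machine_type
    return None


configured = ["x86_64=q35", "bogus-entry"]
x86_type = default_machine_type("x86_64", configured)
arm_type = default_machine_type("aarch64", configured)
```

Because the helper takes the configured list as a parameter rather than reaching into a driver object, any class that has the config value can call it.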
Zuul 05a01a14b7 Merge "Use assertXmlEqual() helper for all XML comparison tests" 2019-03-19 20:23:04 +00:00
Zuul a295324876 Merge "Clarify policy shortcomings in policy enforcement doc" 2019-03-19 18:14:23 +00:00
Zuul b459c58a5b Merge "Remove additional policy configuration details from policy doc" 2019-03-19 18:14:17 +00:00
Zuul 56947658b0 Merge "Remove unnecessary default provider_tree when getting traits" 2019-03-19 15:59:36 +00:00
Matt Riedemann c16c3062e7 Add docs for compute capabilities as traits
Change I15364d37fb7426f4eec00ca4eaf99bec50e964b6 added the
ability for the compute service to report a subset of driver
capabilities as standard COMPUTE_* traits on the compute node
resource provider.

This adds administrator documentation to the scheduler docs
about the feature and how it could be used with flavors. There
are also some rules and semantic behavior around how these traits
work so that is also documented.

Note that for cases #3 and #4 in the "Rules" section the
update_available_resource periodic task in the compute service
may add the compute-owned traits again automatically but it
depends on the [compute]/resource_provider_association_refresh
configuration option, which if set to 0 will disable that auto
refresh and a restart or SIGHUP is required. To avoid confusion
in these docs, I have opted to omit the mention of that option
and just document the action that will work regardless of
configuration which is to restart or SIGHUP the compute service.

Change-Id: Iaeec92e0b25956b0d95754ce85c68c2d82c4a7f1
2019-03-19 10:09:55 -04:00
Zuul f58f73978e Merge "Remove stale aggregates notes from scheduler evolution doc" 2019-03-19 07:56:25 +00:00
Zuul 0962bb3b2f Merge "qemu: Make disk image conversion dramatically faster" 2019-03-19 00:14:47 +00:00
Lance Bragstad c8b02af65a Clarify policy shortcomings in policy enforcement doc
This commit updates the list of issues with policy enforcement and
describes some of the benefits for operators and developers if we fix
these issues.

Change-Id: Ie5ba2375fd32611aca360765af01c1ba6432b45e
2019-03-18 23:50:01 +00:00
Lance Bragstad 5d38069f66 Remove additional policy configuration details from policy doc
This is removing additional details that were originally reviewed in:

  I263b2f72037a588623958baccacf78fb6a6be05d

The policy and docs in code work that nova completed in Newton.

Change-Id: I66105fa90036db50249b62fc34442b667a5ee1db
2019-03-18 23:49:33 +00:00
Adam Spiers c29f7026ca Remove unnecessary default provider_tree when getting traits
I15364d37fb7426f4eec00ca4eaf99bec50e964b6 introduced a new
_get_traits() method in ResourceTracker for getting traits from a
provider tree and then merging with capability traits from the driver.
However it included a default of None for the provider_tree parameter
which was a remnant of earlier iterations of that change.  Since this
method is always passed a ProviderTree, remove the superfluous default
of None.

Change-Id: I1868485912d9a8a330bde50836808accf04c728d
2019-03-18 20:16:00 +00:00
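
The merge described in the commit above might look roughly like the following sketch. It is a simplification under stated assumptions: the provider tree is modelled as a plain dict of node name to trait set, the driver capabilities as a dict of trait name to bool, and dropping traits the driver reports as unsupported is an assumption of this sketch. The function name and trait values are illustrative only.

```python
def get_traits(nodename, provider_tree, capability_traits):
    """Merge a provider's traits with driver capability traits.

    provider_tree is a required argument (no default of None); here it is
    a dict mapping node name -> set of trait names.
    capability_traits maps trait name -> whether the driver supports it.
    """
    traits = set(provider_tree[nodename])
    for trait, supported in capability_traits.items():
        if supported:
            traits.add(trait)
        else:
            # Assumption: unsupported capability traits are removed.
            traits.discard(trait)
    return sorted(traits)


tree = {"node1": {"CUSTOM_GOLD", "COMPUTE_DEVICE_TAGGING"}}
caps = {"COMPUTE_NET_ATTACH_INTERFACE": True, "COMPUTE_DEVICE_TAGGING": False}
merged = get_traits("node1", tree, caps)
```

Making `provider_tree` a required positional parameter means a caller that forgets it fails immediately with a `TypeError` instead of hitting a `None` later on.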
Kashyap Chamarthy e7b64eaad8 qemu: Make disk image conversion dramatically faster
tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
the target image for `qemu-img convert`.  Two reasons: (a) if the image
conversion completes successfully, then 'writeback' calls fsync() to
safely write data to the physical disk; and (b) 'writeback' makes the
image conversion a _lot_ faster.

Back-of-the-envelope "benchmark" (on an SSD)
--------------------------------------------

(Ran both the tests thrice each; version: qemu-img-2.11.0)

With 'writethrough':

    $> time (qemu-img convert -t writethrough -f qcow2 -O raw \
            Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
    real    1m43.470s
    user    0m8.310s
    sys     0m3.661s

With 'writeback':

    $> time (qemu-img convert -t writeback  -f qcow2 -O raw \
            Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)

    real    0m7.390s
    user    0m5.179s
    sys     0m1.780s

I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
seconds for 'writeback' -- IOW, 'writeback' is nearly _15_ times faster!

Details
-------

Nova commit e6ce9557f8 ("qemu-img do not
use cache=none if no O_DIRECT support") was introduced to make instances
boot on filesystems that don't support 'O_DIRECT' (which bypasses the
host page cache and flushes data directly to the disk), such as 'tmpfs'.
In doing so it introduced the 'writethrough' cache for the target image
for `qemu-img convert`.

This patch proposes to change that to 'writeback'.

Let's address the 'safety' concern:

  "What about data integrity in the event of a host crash (especially
   on shared file systems such as NFS)?"

Answer: If the host crashes mid-way during image conversion, then
neither "data integrity" nor the cache mode in use matters.  But if the
image conversion completes _successfully_, then 'writeback' will safely
write the data to the physical disk, just as 'writethrough' does.

So we are as safe as we can be, but with the extra benefit of image
conversion being _much_ faster.

        * * *

The `qemu-img convert` command defaults to 'cache=writeback' for the
source image.  And 'cache=unsafe' for the target, because if `qemu-img`
"crashes during the conversion, the user will throw away the broken
output file anyway and start over"[1].  And `qemu-img convert`
supports[2] fsync() for the target image since QEMU 1.1 (2012).

[1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
    -- "qemu-img convert: Use cache=unsafe for output image"
[2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=80ccf93b
    -- "qemu-img: let 'qemu-img convert' flush data"

Closes-Bug: #1818847

Change-Id: I574be2b629aaff23556e25f8db0d740105be6f07
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
2019-03-18 14:53:39 -05:00
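
As a rough illustration of the commit above, the sketch below builds the `qemu-img convert` argument list with the target cache mode as a parameter and recomputes the speedup from the quoted "real" timings. The helper names are hypothetical and nothing is executed; only the `-t`, `-f` and `-O` flags shown in the commit message are used. With the unrounded timings the ratio comes out at about 14; the "nearly 15" figure in the message follows from the rounded ~103 s and ~7 s values.

```python
def qemu_img_convert_cmd(source, dest, in_format, out_format,
                         cache_mode="writeback"):
    """Build the argument list for `qemu-img convert` (nothing is run)."""
    return ["qemu-img", "convert", "-t", cache_mode,
            "-f", in_format, "-O", out_format, source, dest]


def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.ZZZs reading to seconds."""
    return minutes * 60 + seconds


cmd = qemu_img_convert_cmd("Fedora-Cloud-Base-29.qcow2",
                           "Fedora-Cloud-Base-29.raw", "qcow2", "raw")

# Wall-clock timings quoted in the commit message.
writethrough_real = to_seconds(1, 43.470)
writeback_real = to_seconds(0, 7.390)
speedup = writethrough_real / writeback_real
```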
Lance Bragstad 0f1b2e3a63 Remove obsolete policy configuration details from docs
The policy-enforcement document was written before any of the
policy-in-code or policy documentation efforts took place. This
commit updates the developer reference for policy to remove these
details since they have already been implemented.

Subsequent patches will update details of this document by taking into
account the recent keystone and oslo changes that help fix the
original issues described in this document.

Change-Id: I263b2f72037a588623958baccacf78fb6a6be05d
2019-03-18 13:53:14 +00:00
Zuul 926e584136 Merge "Pass kwargs to exception to get better format of error message" 2019-03-18 10:57:11 +00:00
Balazs Gibizer 47287f6f94 Documentation for bandwidth support
blueprint: bandwidth-resource-provider
Depends-On: https://review.openstack.org/#/c/640390
Depends-On: https://review.openstack.org/#/c/621494

Change-Id: I166787e092e16857d20f89fba965be2a5509fb4b
2019-03-18 11:24:56 +01:00
Michael Still 4b46c3ba88 Move slight bonkers IP management to privsep.
Change-Id: Ifdbee5c9f84211314d57e31ab84419987fec4737
2019-03-17 23:19:08 +00:00
Zuul be8af28a4f Merge "Trivial typo fix for REST API in policy enforcement docs" 2019-03-15 17:48:01 +00:00
Zuul ca39bdd17e Merge "Add descriptions of numbered resource classes and traits" 2019-03-15 17:47:53 +00:00
Zuul e272b280f1 Merge "Remove resize caveat from conductor docs" 2019-03-15 17:47:44 +00:00
Zuul d1797d8f58 Merge "docs: cleanup driver parity scope section" 2019-03-15 17:47:36 +00:00
Matt Riedemann 0c72e63948 Remove "Fixing the Scheduler DB model" from schedule evolution doc
Blueprint detach-service-from-computenode in Kilo decoupled the
compute node and services concepts so this section is no longer
relevant and can be removed from the doc - it's no longer evolving.

Change-Id: Ibba2aa83b0afe2be05415b69a1ff8ae86866b860
Related-Bug: #1820283
2019-03-15 11:44:06 -04:00
Matt Riedemann 18c40cacc1 Remove stale aggregates notes from scheduler evolution doc
Since I901184cb1a4b6eb0d6fa6363bc6ffbcaa0c9d21d in Kilo the
aggregates information about a HostState object (which is a
wrapper over a ComputeNode) is cached in the scheduler, so the
comments in the scheduler evolution doc about not accessing the
aggregates table in the DB from filters/weighers and such are
extremely out of date and should just be removed.

Change-Id: Ibcbad227813d3b37b4e314eddbf3bae6e85652ea
Related-Bug: #1820283
2019-03-15 11:39:33 -04:00
Zuul d40125ef7f Merge "add python 3.7 unit test job" 2019-03-15 14:15:41 +00:00
Zuul c8f7246343 Merge "Avoid crashing while getting libvirt capabilities with unknown arch names" 2019-03-15 13:13:56 +00:00
Matt Riedemann 0a44d3ae0a Trivial typo fix for REST API in policy enforcement docs
Change-Id: If17a910f8a891ce93491d931c95f65d9fd9529e5
2019-03-15 08:33:12 -04:00
Matt Riedemann 1308d644bb Remove resize caveat from conductor docs
This document was written back in the Liberty release [1]
and says that conductor is not used for orchestrating the
resize/migrate flow, but given the description of how
conductor is used to orchestrate scheduling and reschedules
during a server create, it is unclear why the doc says that
conductor is not used the same way for resize, since it is used for
rescheduling when prep_resize fails on a selected dest compute. This removes
the caveat to reflect reality.

[1] Ieb9134302d21a11fe9b9ee876bb7b0dd32b437e1

Change-Id: I932a7ac6870a3f9d26556c23c9074115963b3c27
2019-03-15 08:02:52 -04:00
Matt Riedemann 5de08c0966 docs: cleanup driver parity scope section
This fixes some grammar issues, links to the interop
page and fixes a misuse of tenant.

Change-Id: I3ce0e130e3691240a625c67dfb6123bafe7f48b8
2019-03-15 08:01:19 -04:00
Zuul 5ca858eaa7 Merge "Add functional test to delete a server while in VERIFY_RESIZE" 2019-03-15 04:13:23 +00:00
zhufl 40cbea18e6 Pass kwargs to exception to get better format of error message
If we do not pass kwargs to exception, the parameter will be deemed
as message and msg_fmt is ignored, so the message will be displayed
directly. This is to pass kwargs to some exceptions, to get better
format of error message.

Change-Id: I66677a90430d9e6699619539cb8f575f57b19433
2019-03-15 10:42:18 +08:00
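
The behaviour described in the commit above can be reproduced with a small sketch. The classes below are hypothetical stand-ins modelled on the description (a positional argument is treated as the final message, keyword arguments are interpolated into `msg_fmt`); they are not nova's exception classes.

```python
class ToyBaseException(Exception):
    """Hypothetical base class with a format-string message."""

    msg_fmt = "An unknown exception occurred."

    def __init__(self, message=None, **kwargs):
        if message is None:
            # No explicit message: build one from msg_fmt and the kwargs.
            message = self.msg_fmt % kwargs
        super().__init__(message)


class ToyInstanceNotFound(ToyBaseException):
    msg_fmt = "Instance %(instance_id)s could not be found."


# Positional argument: taken as the whole message, msg_fmt is ignored.
positional = str(ToyInstanceNotFound("abc123"))

# Keyword argument: interpolated into msg_fmt.
formatted = str(ToyInstanceNotFound(instance_id="abc123"))
```

Here `positional` is just `"abc123"`, while `formatted` is the full sentence from `msg_fmt`, which is the difference the commit is fixing.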
Dan Smith 71df650d0a Avoid crashing while getting libvirt capabilities with unknown arch names
In _get_instance_capabilities() we get a list of host capabilities and then
build a list of arches supported by the virt type of an instance to arrive
at the list of possibilities for the instance. We check each of those
against our enum, but fail to gracefully skip unsupported values should we
encounter one.

This patch makes that graceful, and also introduces an unsupported arch to
the test stub to make sure we always skip it. Note that we do not warn
because this happens once per instance in a periodic task, and since the
situation is caused by a (somewhat permanent) mismatch of libvirt and
nova version support, isn't something that needs to be remedied by an
operator.

Closes-Bug: #1820125
Change-Id: I5d95bd50279a6bf903a5793ad5f3ae9d06f085f4
2019-03-14 14:14:31 -07:00
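
A minimal sketch of the "skip gracefully" pattern from the commit above, assuming the supported architectures are modelled as a Python `enum`. The `Arch` members and the made-up architecture name are illustrative, not nova's actual list.

```python
import enum


class Arch(enum.Enum):
    """Hypothetical subset of supported architecture names."""

    X86_64 = "x86_64"
    I686 = "i686"
    AARCH64 = "aarch64"


def known_arches(reported):
    """Return the reported arch names this enum knows, skipping the rest."""
    result = []
    for name in reported:
        try:
            result.append(Arch(name))
        except ValueError:
            # Unknown to the enum: skip quietly, with no warning, since
            # this would run repeatedly in a periodic task.
            continue
    return result


arches = known_arches(["x86_64", "made_up_arch", "i686"])
```

Including an unknown name in the input, as the commit does in its test stub, keeps the skip path exercised.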
melanie witt 9b2a7f9e7c Re-enable Ceph in live migration testing
Revert I05182d8fd0df5e8f3f9f4fb11feed074990cdb9f and
add a fix to enable proper OS detection.

Closes-Bug: #1819944

Co-Authored-By: Jens Harbott <j.harbott@x-ion.de>

Change-Id: Iea6288fe6d341ee92f87a35e0b0a59fe564ab96c
2019-03-14 18:48:55 +00:00
Matt Riedemann f9a6321c7b Customize irrelevant-files for nova-live-migration job
I noticed change Iea6288fe6d341ee92f87a35e0b0a59fe564ab96c
was not running the nova-live-migration job even though
it was making changes to nova/tests/live_migration/hooks/run_tests.sh.
The reason is the nova-live-migration job irrelevant-files were
excluding changes to nova/tests/*.

This copies the nova-grenade-live-migration irrelevant-files list
to the nova-live-migration job and defines it as a variable so it
can be re-used in the nova-grenade-live-migration job definition.

Change-Id: I753fda1a83b340f4699c049158e6744b099f55d8
2019-03-14 10:05:23 -04:00
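
The effect described in the commit above can be sketched as follows, assuming a job is skipped only when every changed file matches one of its irrelevant-files patterns. The regexes are hypothetical examples, not the lists from the nova job definitions, and the matching here uses Python's `re.match` as a stand-in for the CI system's own matcher.

```python
import re


def job_should_run(changed_files, irrelevant_patterns):
    """Return True if at least one changed file is relevant to the job."""
    regexes = [re.compile(p) for p in irrelevant_patterns]
    for path in changed_files:
        if not any(r.match(path) for r in regexes):
            return True
    return False


changed = ["nova/tests/live_migration/hooks/run_tests.sh"]

# A broad pattern hides the hook script from the job.
broad = [r"^nova/tests/.*$"]
runs_with_broad = job_should_run(changed, broad)

# Narrower (hypothetical) patterns leave the hook script relevant.
narrow = [r"^nova/tests/unit/.*$", r"^doc/.*$"]
runs_with_narrow = job_should_run(changed, narrow)
```

With the broad pattern the job is skipped for a change that only touches the hook script; with the narrower list it runs.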
Zuul 63e5cba88a Merge "Migrate legacy jobs to Ubuntu Bionic" 2019-03-14 09:04:56 +00:00
Zuul 5455277e3f Merge "Update compute rpc version alias for stein" 2019-03-14 08:43:59 +00:00