This is a guide I used to help me throughout the cycle, and I'm
proposing it for the docs in case it might help someone.
Change-Id: I4f7600c908bf90395515f690b8eee0f9e7b0c9b0
Add file to the reno documentation build to show release notes for
stable/stein.
Use the pbr Sem-Ver instruction to increment the minor version number
automatically so that versions on master are higher than the versions
on stable/stein.
Change-Id: Id69aa796d126ab4687e8f2c3c01c3071b36e9288
Sem-Ver: feature
nova/policies/base.py defines a variable 'COMPUTE_API' which
is no longer used.
It was only used in nova/api/openstack/compute/extension_info.py,
which was removed in I61d8063708731133177534888ba7f5f05a6bd901.
Change-Id: I80abee523e58d47da9e55399b3d4ad06b6ccfd69
The QoS minimum bandwidth feature will have a separate doc from the
generic QoS neutron doc. This patch updates the links in the release
notes and the API version history of the 2.72 microversion.
blueprint: bandwidth-resource-provider
Depends-On: https://review.openstack.org/#/c/640390
Change-Id: Ic753112cf73cb10a6e377bc24c6ee51a057c69f8
The test_local_delete_removes_allocations_after_compute_restart test
was trying to register placement config opts three times when only
once is necessary; moreover, if CLI opts are being registered, only
once is allowed. With change I4cd3d637878eb5bb798b78fd73f5be99e141da9d
in placement, those opts gained some CLI opts, causing this test to
fail.
The Depends-On points to a change in the placement-side
PlacementFixture that makes it possible to skip registering opts when
calling the fixture, allowing safe reuse of the already registered
config.
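The register-at-most-once pattern described above can be sketched roughly as follows (the names here are illustrative stand-ins, not the real PlacementFixture API):

```python
# Illustrative sketch only: register config opts at most once, so a test
# fixture can be invoked repeatedly without tripping the rule that CLI
# opts may only be registered a single time (before the command line is
# parsed). Names are hypothetical, not the actual placement fixture API.

_OPTS_REGISTERED = False


def register_placement_opts(conf):
    """Register placement opts on conf, but only the first time.

    Returns True if the opts were registered by this call, False if
    they were already registered and the existing config is reused.
    """
    global _OPTS_REGISTERED
    if _OPTS_REGISTERED:
        return False  # already registered; safely reuse the config
    # ... the real code would call conf.register_cli_opts(...) here ...
    _OPTS_REGISTERED = True
    return True
```

Calling the helper a second time is then a no-op rather than an error, which is the behavior the test relies on.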
Depends-On: I360a306b5d05ada75274733038b73ec2f2bdc4d4
Change-Id: I042e41ac8c41c0e5f0389904eb548e0e97d54c60
Closes-Bug: #1821092
Nova will leak minimum bandwidth resources in placement if a user
deletes a bound port from Neutron out-of-band. This adds a note about
how users can work around the issue.
Related-Bug: #1820588
Change-Id: I41f42c1a7595d9e6a73d1261bf1ac1d47ddadcdf
Some random cleanups:
- Don't add the root or 'doc/source' directories to PYTHONPATH - it's
unnecessary since we install nova (ruling out the former) and don't
import anything from the latter
- Fix weird indentation
- Remove 'sphinx.ext.coverage', which is used to measure API doc
coverage. This is unnecessary since we don't publish API docs, save
for the versioned notification docs
- Remove unnecessary settings
- 'exclude_patterns' referred to directories that haven't existed for
a long time
- 'source_suffix', 'add_module_names' and 'show_authors' were set to
the default value
- 'release', 'version' and 'html_last_updated_fmt' are all set
automatically by 'openstackdoctheme' now
- 'modindex_common_prefix' is useless since we don't expose a module
index
All rolled into one patch for efficiency's sake.
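As a hypothetical before/after sketch of the cleanup above (not the verbatim conf.py; extension and theme names are assumptions):

```python
# Illustrative sketch of a slimmed-down doc/source/conf.py.
#
# Before: settings that merely restated Sphinx defaults or pointed at
# paths that no longer exist, e.g.:
#
#   sys.path.insert(0, os.path.abspath('../..'))  # unneeded: nova is installed
#   extensions = [..., 'sphinx.ext.coverage']     # API doc coverage unpublished
#   source_suffix = '.rst'                        # already the default
#   add_module_names = True                       # already the default
#   show_authors = False                          # already the default
#
# After: only settings that do real work remain; the openstack doc theme
# now derives 'release', 'version' and 'html_last_updated_fmt' itself.
extensions = ['openstackdocstheme']
html_theme = 'openstackdocs'
```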
Change-Id: I0f70c6d71299dedc59884f2bb39c8ea3c2ca8eff
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The TODO at the end of the method was based on some old
nova-compute behavior which was removed with change
I39d93dbf8552605e34b9f146e3613e6af62a1774 in Rocky.
Instead of logging a warning, ConsumerAllocationRetrievalFailed
is now raised since without the instance consumer allocations
on the source node during a forced evacuate we cannot proceed
with making those allocations on the destination host.
The method is refactored a bit for clarity while we're in here and
to reduce the deep nesting.
Remember that this method is *only* called in the case of a
live migration or evacuate operation with a forced target host
and evacuate is the only case where source_allocations are not
provided.
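The new control flow can be sketched roughly like this (the function signature and helper are simplified stand-ins, not the actual nova code):

```python
# Illustrative sketch only: raise instead of logging a warning when the
# source-node allocations for the instance consumer cannot be retrieved
# during a forced evacuate. Names are simplified stand-ins for the real
# nova internals.


class ConsumerAllocationRetrievalFailed(Exception):
    pass


def claim_resources_on_destination(get_source_allocations, instance_uuid,
                                   source_allocations=None):
    if source_allocations is None:
        # Evacuate with a forced target host is the only case where the
        # source allocations are not provided, so fetch them ourselves.
        source_allocations = get_source_allocations(instance_uuid)
        if not source_allocations:
            # Without the source allocations we cannot mirror them on
            # the destination, so fail rather than just warn.
            raise ConsumerAllocationRetrievalFailed(
                'No allocations found for instance %s' % instance_uuid)
    # ... make the same allocations against the destination host ...
    return source_allocations
```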
Change-Id: I988d1bd4d7eb1a01d443e3d93964bd09afcc4929
The libvirt driver contains some code to calculate the
default machine type given an architecture, by looking it up
in CONF.libvirt.hw_machine_type.
This code will need to be reused when introducing calls to
libvirt's getDomainCapabilities() API, which requires the
machine type as one of the parameters. However those calls
will need to be made from nova.virt.libvirt.host.Host which
has no access to the driver, so move the machine type
calculation code into nova.virt.libvirt.utils so that it can
be reused by both classes.
Also add some unit tests, and warn when an invalid config
value is used.
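The lookup being moved can be sketched along these lines (a simplified illustration, not the exact nova.virt.libvirt.utils code; it assumes hw_machine_type is a list of 'arch=machine_type' pairs):

```python
# Rough sketch of a default machine type lookup keyed on architecture,
# driven by a config value shaped like CONF.libvirt.hw_machine_type,
# e.g. ['x86_64=q35', 'armv7l=virt']. Simplified; not the actual
# nova code.
import logging

LOG = logging.getLogger(__name__)


def get_default_machine_type(hw_machine_type, arch):
    if not hw_machine_type:
        return None
    mappings = {}
    for mapping in hw_machine_type:
        host_arch, _, machine_type = mapping.partition('=')
        if not host_arch or not machine_type:
            # Warn on an invalid config value instead of silently
            # ignoring it.
            LOG.warning('Invalid hw_machine_type config value %s', mapping)
            continue
        mappings[host_arch] = machine_type
    return mappings.get(arch)
```

Living in a utils module, a helper like this can be called from both the driver and nova.virt.libvirt.host.Host without either needing a reference to the other.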
blueprint: amd-sev-libvirt-support
Change-Id: I055918ff16766c5b106d794a111ad8af8ff9ab23
Change I15364d37fb7426f4eec00ca4eaf99bec50e964b6 added the
ability for the compute service to report a subset of driver
capabilities as standard COMPUTE_* traits on the compute node
resource provider.
This adds administrator documentation to the scheduler docs
about the feature and how it can be used with flavors. There
are also some rules and semantics around how these traits work,
so those are documented as well.
Note that for cases #3 and #4 in the "Rules" section, the
update_available_resource periodic task in the compute service
may add the compute-owned traits again automatically, but that
depends on the [compute]/resource_provider_association_refresh
configuration option; if it is set to 0, the auto refresh is
disabled and a restart or SIGHUP is required. To avoid confusion
in these docs, I have opted to omit mention of that option
and just document the action that will work regardless of
configuration, which is to restart or SIGHUP the compute service.
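As a hypothetical illustration of the flavor side (the trait name below is an assumption based on the capabilities-as-traits work; the extra spec syntax follows the standard `trait:<TRAIT_NAME>=required` convention):

```python
# Hypothetical illustration: a flavor's extra specs requiring a driver
# capability trait via the standard 'trait:<TRAIT_NAME>=required'
# extra spec convention. The specific trait name is an assumption.
extra_specs = {
    'trait:COMPUTE_NET_ATTACH_INTERFACE': 'required',
}

# The scheduler would then only select compute nodes whose resource
# provider exposes that trait.
```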
Change-Id: Iaeec92e0b25956b0d95754ce85c68c2d82c4a7f1
This removes some old comments about how things worked
prior to Pike where the ResourceTracker would "heal"
allocations on the compute node during the periodic
update_available_resource task. That code was removed
via change I39d93dbf8552605e34b9f146e3613e6af62a1774 (Rocky)
and change If272365e58a583e2831a15a5c2abad2d77921729 (Stein).
There are also old comments in here about not using claim_resources
during forced live migrate and evacuate and instead calling the
scheduler with a hint to skip filters, but that hasn't happened yet,
so the "In Queens" portion is just removed since it's misleading
(and somewhat embarrassing at this point that it's not fixed yet).
Finally, the NOTE above claim_resources() which says it's only called
in two places is updated since it's actually called in three places,
as it's also used in the forced evacuate destination case.
A follow up change will deal with the TODO about erroring out in the
final else case in claim_resources_on_destination().
Change-Id: I1822ac911ed8dbbb482fd4c13591d86b0c518321
This commit updates the list of issues with policy enforcement and
describes some of the benefits for operators and developers if we fix
these issues.
Change-Id: Ie5ba2375fd32611aca360765af01c1ba6432b45e
This removes additional details that were originally reviewed in
I263b2f72037a588623958baccacf78fb6a6be05d, namely the
policy-and-docs-in-code work that nova completed in Newton.
Change-Id: I66105fa90036db50249b62fc34442b667a5ee1db
I15364d37fb7426f4eec00ca4eaf99bec50e964b6 introduced a new
_get_traits() method in ResourceTracker for getting traits from a
provider tree and then merging with capability traits from the driver.
However it included a default of None for the provider_tree parameter
which was a remnant of earlier iterations of that change. Since this
method is always passed a ProviderTree, remove the superfluous default
of None.
Change-Id: I1868485912d9a8a330bde50836808accf04c728d
tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
the target image for `qemu-img convert`. Two reasons: (a) if the image
conversion completes successfully, then 'writeback' calls fsync() to
safely write data to the physical disk; and (b) 'writeback' makes the
image conversion a _lot_ faster.
Back-of-the-envelope "benchmark" (on an SSD)
--------------------------------------------
(Ran both tests three times each; version: qemu-img-2.11.0)
With 'writethrough':
$> time (qemu-img convert -t writethrough -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
real 1m43.470s
user 0m8.310s
sys 0m3.661s
With 'writeback':
$> time (qemu-img convert -t writeback -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)
real 0m7.390s
user 0m5.179s
sys 0m1.780s
I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
seconds for 'writeback' -- IOW, 'writeback' is nearly _15_ times faster!
Details
-------
Nova commit e6ce9557f8 ("qemu-img do not
use cache=none if no O_DIRECT support") was introduced to make instances
boot on filesystems that don't support 'O_DIRECT' (which bypasses the
host page cache and flushes data directly to the disk), such as 'tmpfs'.
In doing so it introduced the 'writethrough' cache for the target image
for `qemu-img convert`.
This patch proposes to change that to 'writeback'.
Let's address the 'safety' concern:
"What about data integrity in the event of a host crash (especially
on shared file systems such as NFS)?"
Answer: If the host crashes mid-way through the image conversion, then
neither "data integrity" nor the cache mode in use matters. But if the
image conversion completes _successfully_, then 'writeback' will safely
write the data to the physical disk, just as 'writethrough' does.
So we are as safe as we can be, but with the extra benefit of image
conversion being _much_ faster.
* * *
The `qemu-img convert` command defaults to 'cache=writeback' for the
source image. And 'cache=unsafe' for the target, because if `qemu-img`
"crashes during the conversion, the user will throw away the broken
output file anyway and start over"[1]. And `qemu-img convert`
supports[2] fsync() for the target image since QEMU 1.1 (2012).
[1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
-- "qemu-img convert: Use cache=unsafe for output image"
[2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=80ccf93b
-- "qemu-img: let 'qemu-img convert' flush data"
Closes-Bug: #1818847
Change-Id: I574be2b629aaff23556e25f8db0d740105be6f07
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@redhat.com>
The policy-enforcement document was written before any of the
policy-in-code or policy documentation efforts took place. This
commit updates the developer reference for policy to remove these
details since they have already been implemented.
Subsequent patches will update details of this document by taking into
account the recent keystone and oslo changes that help fix the
original issues described in this document.
Change-Id: I263b2f72037a588623958baccacf78fb6a6be05d
Blueprint detach-service-from-computenode in Kilo decoupled the
compute node and service concepts, so this section is no longer
relevant and can be removed from the doc - it's no longer evolving.
Change-Id: Ibba2aa83b0afe2be05415b69a1ff8ae86866b860
Related-Bug: #1820283
Since I901184cb1a4b6eb0d6fa6363bc6ffbcaa0c9d21d in Kilo, the
aggregate information for a HostState object (which is a
wrapper over a ComputeNode) has been cached in the scheduler, so the
comments in the scheduler evolution doc about not accessing the
aggregates table in the DB from filters/weighers and such are
extremely out of date and should just be removed.
Change-Id: Ibcbad227813d3b37b4e314eddbf3bae6e85652ea
Related-Bug: #1820283