Added reference documentation and release note to explain how filtering
of hosts by isolate aggregates works.
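A minimal sketch of the idea in Python. It assumes an aggregate is isolated by metadata of the form ``trait:<TRAIT>=required``, and that a request may only use hosts in such an aggregate when it also requires every one of those traits. The function name is hypothetical. The resulting aggregate UUIDs would then be excluded from placement, for example through a forbidden ``member_of=!in:...`` query parameter.

```python
from typing import Dict, List, Set


def forbidden_aggregates(aggregates: Dict[str, Dict[str, str]],
                         requested_traits: Set[str]) -> List[str]:
    """Return UUIDs of aggregates the request is not allowed to land in.

    Hypothetical helper: ``aggregates`` maps aggregate UUID -> metadata.
    """
    forbidden = []
    for agg_uuid, metadata in aggregates.items():
        # Traits this aggregate is reserved for (assumed metadata format).
        agg_traits = {
            key[len("trait:"):]
            for key, value in metadata.items()
            if key.startswith("trait:") and value == "required"
        }
        # Off-limits unless the request asks for all of those traits.
        if agg_traits and not agg_traits <= requested_traits:
            forbidden.append(agg_uuid)
    return sorted(forbidden)
```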
Change-Id: I8d8086973039308f9041a36463a834b5275708e3
Implements: blueprint placement-req-filter-forbidden-aggregates
This is a follow-up to the previous SEV commit which enables booting
SEV guests (I659cb77f12a3), making some minor improvements based on
nits highlighted during review:
- Clarify in the hypervisor-kvm.rst documentation that the
num_memory_encrypted_guests option is optional, by rewording and
moving it to the list of optional steps.
- Make things a bit more concise and avoid duplication of information
between the above page and the documentation for the option
num_memory_encrypted_guests, instead relying on appropriate
hyperlinking between them.
- Clarify that virtio-blk can be used for boot disks in newer kernels.
- Hyperlink to a page explaining vhost-user.
- Remove an unneeded mocking of a LOG object.
- A few other grammar / spelling tweaks.
blueprint: amd-sev-libvirt-support
Change-Id: I75b7ec3a45cac25f6ebf77c6ed013de86c6ac947
Track compute node inventory for the new MEM_ENCRYPTION_CONTEXT
resource class (added in os-resource-classes 0.4.0) which represents
the number of guests a compute node can host concurrently with memory
encrypted at the hardware level.
This serves as a "master switch" for enabling SEV functionality, since
all the code which takes advantage of the presence of this inventory
in order to boot SEV-enabled guests is already in place, but none of
it gets used until the inventory is non-zero.
A discrete inventory is required because on AMD SEV-capable hardware,
the memory controller has a fixed number of slots for holding
encryption keys, one per guest. Typical early hardware only has 15
slots, thereby limiting the number of SEV guests which can be run
concurrently to 15. nova needs to track how many slots are available
and used in order to avoid attempting to exceed that limit in the
hardware.
Work is in progress to allow QEMU and libvirt to expose the number of
slots available on SEV hardware; however until this is finished and
released, it will not be possible for nova to programmatically detect
the correct value with which to populate the MEM_ENCRYPTION_CONTEXT
inventory. So as a stop-gap, populate the inventory using the value
manually provided by the cloud operator in a new configuration option
CONF.libvirt.num_memory_encrypted_guests.
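A rough sketch of how such an inventory could be derived from the config value. The helper name is hypothetical. The ``min_unit``/``max_unit``/``step_size`` values are assumptions based on each guest consuming exactly one key slot. An unset or zero value yields no inventory, which keeps SEV effectively switched off.

```python
from typing import Dict, Optional, Union

Inventory = Dict[str, Dict[str, Union[int, float]]]


def mem_encryption_inventory(num_memory_encrypted_guests: Optional[int]) -> Inventory:
    """Build a hypothetical MEM_ENCRYPTION_CONTEXT inventory record.

    ``num_memory_encrypted_guests`` mirrors the operator-supplied
    CONF.libvirt.num_memory_encrypted_guests value.
    """
    if num_memory_encrypted_guests is None or num_memory_encrypted_guests <= 0:
        # No inventory means no SEV guests can be scheduled to this host.
        return {}
    return {
        "MEM_ENCRYPTION_CONTEXT": {
            "total": num_memory_encrypted_guests,  # e.g. 15 on early hardware
            "reserved": 0,
            "min_unit": 1,
            "max_unit": 1,  # assumption: each guest uses a single key slot
            "step_size": 1,
            "allocation_ratio": 1.0,  # key slots cannot be overcommitted
        }
    }
```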
Since this commit effectively enables SEV, also add all the relevant
documentation as planned in the AMD SEV spec[0]:
- Add operation.boot-encrypted-vm to the KVM hypervisor feature matrix.
- Update the KVM section of the Configuration Guide.
- Update the flavors section of the User Guide.
- Add a release note.
[0] http://specs.openstack.org/openstack/nova-specs/specs/train/approved/amd-sev-libvirt-support.html#documentation-impact
blueprint: amd-sev-libvirt-support
Change-Id: I659cb77f12a38a4d2fb118530ebb9de88d2ed30d
Three months after the quality warning change merged [1],
there has still been no progress in finding a maintainer for
the xenapi driver or someone to get its third party CI running
again; the CI has been off/broken for more than a release.
This change formally deprecates the driver, logging a warning
on startup along with providing a release note and warnings
in the docs and [xenserver] config group help.
Note that this does not mean the driver will absolutely be
removed in the Ussuri release, but it leaves the option open
to do so if the nova team decides that should happen.
This was discussed at the 2019-09-05 nova meeting [2] and
also at the Train PTG.
[1] I7f8eb7d5c5a9b1cb0a8d5e607d719b49a22675d3
[2] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-09-05-14.01.log.html#l-227
Change-Id: Ie7e66ff64185d9fd4be2265e040e1afc45b95174
Rename the existing config option [libvirt]/cpu_model to
[libvirt]/cpu_models, which is an ordered list of CPU models the host
supports. The values in the list are case-insensitive.
Change the logic of the '_get_guest_cpu_model_config' method: if
cpu_mode is custom and cpu_models is set, it will parse the required
traits associated with the CPU flags from the flavor extra_specs and
select the most appropriate CPU model.
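The selection step can be sketched roughly as follows. This is not nova's implementation: the function names are hypothetical, and ``model_flags`` stands in for the per-model feature data that libvirt provides. Two simplifying assumptions are made. First, "most appropriate" is taken to mean the first model in the ordered list that provides every required flag. Second, traits are mapped to flags by stripping an ``HW_CPU_X86_`` prefix and lower-casing, which will not match every libvirt flag name.

```python
from typing import Dict, Iterable, List, Optional, Set

# Assumed naming convention: x86 CPU feature traits look like HW_CPU_X86_AVX2.
TRAIT_PREFIX = "HW_CPU_X86_"


def required_cpu_flags(extra_specs: Dict[str, str]) -> Set[str]:
    """Map required CPU traits in flavor extra specs to lower-case flag names."""
    flags = set()
    for key, value in extra_specs.items():
        if key.startswith("trait:") and value == "required":
            trait = key[len("trait:"):]
            if trait.startswith(TRAIT_PREFIX):
                flags.add(trait[len(TRAIT_PREFIX):].lower())
    return flags


def select_cpu_model(cpu_models: List[str],
                     model_flags: Dict[str, Set[str]],
                     required_flags: Iterable[str]) -> Optional[str]:
    """Return the first configured model that provides every required flag."""
    required = {flag.lower() for flag in required_flags}
    # Model names and flags are compared case-insensitively.
    known = {name.lower(): {f.lower() for f in flags}
             for name, flags in model_flags.items()}
    for model in cpu_models:  # cpu_models is ordered: earlier entries win
        flags = known.get(model.lower())
        if flags is not None and required <= flags:
            return model
    return None  # no configured model satisfies the flavor
```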
Add a new method 'get_cpu_model_names' to host.py. It will return a
list of the CPU models that the CPU arch can support.
Update the docs of hypervisor-kvm.
Change-Id: I06e1f7429c056c4ce8506b10359762e457dbb2a0
Implements: blueprint cpu-model-selection
The url option was deprecated in Queens:
I41724a612a5f3eabd504f3eaa9d2f9d141ca3f69
The same functionality is available in the
endpoint_override option so tests and docs
are updated to use that where they were using
url before.
Note that because the logic in the get_client
method changed, some small changes were made to
the test_withtoken and test_withtoken_context_is_admin
unit tests to differentiate between a non-admin
context with a token and an admin context without
a token, which was previously determined by
asserting the default region name.
Change-Id: I6c068a84c4c0bd88f088f9328d7897bfc1f843f1
Change Ifaf596a8572637f843f47daf5adce394b0365676 added a note
about the behavior change in Ocata where allocation ratios
set on host aggregates were ignored because placement resource
provider allocation ratios were used instead.
Later, change I7d8e822cd40dccaf5244e2cd95fa1af43fa9ed87 added
a lot more detail about allocation ratios in the scheduler docs
including the initial* allocation ratio config options. The note
from the previous change was moved and as a result leads to some
confusion since the doc starts by saying essentially, "you can
use these aggregate filters to manage allocation ratios on a set
of hosts" and then the immediate note says essentially, "oh btw
that doesn't work since ocata, sorry!".
To avoid the confusion, this simply removes the part about how
the aggregate filters can be used to manage allocation ratios.
Change-Id: I62710b0b8c098cca3f67020f4a6da5e684115414
Related-Bug: #1804125
libvirt has split the CPU feature flags file 'cpu_map.xml' into
a bunch of flag files for each CPU model, which are stored under
directory 'src/cpu_map/'.
Update this change accordingly.
Change-Id: Id45587adb6ecd8e0bdef344c90979eaea61e61b8
With the removal of the Core, Ram and Disk filters in change
I8a0d332877fbb9794700081e7954f2501b7e7c09, there's now only a single
caller for the 'estimate_instance_overhead' function. This call results
in the 'memory_mb_used', 'local_gb_used', 'vcpus_used', 'free_ram_mb'
and 'free_disk_gb' fields of a compute node's 'ComputeNode' object being
modified when calculating usage as part of the resource tracker to
include driver-specific overhead. However, these fields are no longer
used for anything except logging and the 'os-hypervisors' API. Since
overhead is not reflected in placement (and therefore the scheduler),
reporting them as part of the various usage values for both logging and
that API is actually a bit of a lie and is likely to cause confusion
among users. Remove the whole thing and make our logs and the
'os-hypervisors' API better match what's stored in placement.
Change-Id: I033e8269194de54432079cbc970431e3dcea7ce5
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
These were deprecated during Stein [1] and can now be removed, lest they
cause hassle with the PCPU work. As noted in [1], the aggregate
equivalents of same are left untouched for now.
[1] https://review.opendev.org/#/c/596502/
Change-Id: I8a0d332877fbb9794700081e7954f2501b7e7c09
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The api documentation is now published on docs.openstack.org instead
of developer.openstack.org. Update all affected links to point to the
new location.
Note that Neutron publishes to api-ref/network, not networking anymore.
Note that redirects will be set up as well but let's point now to the
new location.
For details, see:
http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007828.html
Change-Id: Id2cf3aa252df6db46575b5988e4937ecfc6792bb
This adds a new mandatory placement request pre-filter
which is used to exclude compute node resource providers
with the COMPUTE_STATUS_DISABLED trait. The trait is
managed by the nova-compute service when the service's
disabled status changes.
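In placement terms, the pre-filter amounts to adding a forbidden trait to the allocation candidates request. A minimal sketch, assuming the ``required=!TRAIT`` forbidden-trait syntax of the placement API; the helper name is hypothetical:

```python
from typing import Dict, Set
from urllib.parse import urlencode

DISABLED_TRAIT = "COMPUTE_STATUS_DISABLED"


def allocation_candidates_query(resources: Dict[str, int],
                                required_traits: Set[str],
                                forbidden_traits: Set[str]) -> str:
    """Build a GET /allocation_candidates query string (hypothetical helper)."""
    # The disabled-compute pre-filter is mandatory, so always forbid the trait.
    forbidden = set(forbidden_traits) | {DISABLED_TRAIT}
    traits = sorted(required_traits) + ["!" + t for t in sorted(forbidden)]
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())),
        "required": ",".join(traits),
    }
    # Keep ',', ':' and '!' readable instead of percent-encoding them.
    return urlencode(params, safe=",:!")
```

A disabled compute node carries the trait, so its resource provider simply never appears among the candidates.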
Change I3005b46221ac3c0e559e1072131a7e4846c9867c makes
the compute service sync the trait during the
update_available_resource flow (either on start of the
compute service or during the periodic task run).
Change Ifabbb543aab62b917394eefe48126231df7cd503 makes
the libvirt driver's _set_host_enabled callback reflect
the trait when the hypervisor goes up or down out of band.
Change If32bca070185937ef83f689b7163d965a89ec10a will add
the final piece which is the os-services API calling the
compute service to add/remove the trait when a compute
service is disabled or enabled.
Since this series technically functions without the API
change, the docs and release note are added here.
Part of blueprint pre-filter-disabled-computes
Change-Id: I317cabbe49a337848325f96df79d478fd65811d9
Since blueprint return-alternate-hosts in Queens, the scheduler
returns a primary selected host and some alternate hosts based
on the max_attempts config option. The only reschedules we have
are during server create and resize/cold migrate. The list of
alternate hosts is passed down from conductor through compute
and back to conductor on reschedule, and if conductor gets a list
of alternate hosts on reschedule, it will not call the scheduler
again. This means the RetryFilter is effectively useless now since
it shouldn't ever filter out hosts on the first schedule attempt
and because we're using alternates for reschedules, we shouldn't
go back to the scheduler on a reschedule. As a result this change
deprecates the RetryFilter and removes it from the default list
of enabled filters.
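The conductor-side behaviour described above can be sketched as follows. The names are hypothetical and the selection is simplified to a list of host names. The point is that reschedules consume the precomputed alternates, so a failed host is never offered again and no RetryFilter pass is needed.

```python
from typing import List


class NoAlternatesLeft(Exception):
    """Raised when every host in the selection has been tried."""


def build_selection(weighed_hosts: List[str], max_attempts: int) -> List[str]:
    """Primary host plus up to max_attempts - 1 alternates, best first."""
    return list(weighed_hosts[:max_attempts])


def next_host(selection: List[str]) -> str:
    """Take the next host for a (re)schedule without calling the scheduler."""
    if not selection:
        raise NoAlternatesLeft("all alternate hosts have been tried")
    # Popping removes the host, so it cannot be picked again on reschedule.
    return selection.pop(0)
```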
Change-Id: Ic0a03e89903bf925638fa26cca3dac7db710dca3
This fixes a couple of places in the admin scheduler config
docs that were listing out the enabled_filters default value
incorrectly because the ComputeFilter was missing. Rather than
try to keep the docs mirrored with the actual default value,
this change references the config option in one spot and avoids
listing the defaults in another.
Change-Id: I837aefcd37556a7b66b523529c5ca1f3dee8ac7f
Closes-Bug: #1833120
We're going to remove all the code, but first, remove the docs.
Part of blueprint remove-consoleauth
Change-Id: Ie96e18ea7762b93b4116b35d7ebcfcbe53c55527
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This adds the missing "prefilter" stage to the description of the
scheduling process, and adds information about the image type
filter.
Related to blueprint request-filter-image-types
Change-Id: I07eef048cf2c85a3fdb8adbe38e362878e4e177e
This was put together while working on the mechanism for converting
driver capabilities to traits in I15364d37fb7426f4eec00ca4eaf99bec50e964b6:
https://review.openstack.org/538498
and may help other developers working on this area in the future.
Change-Id: I395e386ee713769d4c105be0dd6e821382945866
There was nothing clearly documented about move operations with
respect to AZ restrictions, i.e. when a server can move between
zones or not, and how forcing a target host during evacuate or
live migration can break tracking of the instance for its originally
intended zone. This adds documentation around that topic.
Change-Id: I7466826780ea8a6b3d3df93f0e85f009a437b743
Closes-Bug: #1823043
This commit adds a paragraph to explain the circumstances in which
disk_available_least will have a negative value, and why this behaviour is
preferred.
Change-Id: Iaa33c35a14a6f0dc8b1d11803a885dce26722e52
People hit problems using the JsonFilter from time to time
and at least I always have to re-learn what it does and am
somewhat horrified to find how flexible it is based on using
HostState attributes as filtering variables, not to mention
we don't do any functional testing with it. The docs are also
misleading in stating it only supports a subset of variables
when it's really anything on the HostState object. A common
case is people filtering on the hypervisor_hostname attribute
to schedule directly to a specific baremetal node with ironic.
This change adds a warning recommending not to use the filter
if possible and to find alternatives, like traits. It also mentions
the HostState object as defining the variables that can be used
along with adding the commonly-used hypervisor_hostname variable
to the list.
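To illustrate how much the filter can reach, here is a small, hypothetical evaluator for JsonFilter-style queries. It is not nova's code. It resolves ``$``-prefixed variables as attributes of a host-state-like object, mirroring how the filter uses HostState attributes, and supports the usual comparison and boolean operators.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
}


def _resolve(arg: Any, host_state: Any) -> Any:
    """Replace a "$attr" variable with the host state's attribute value."""
    if isinstance(arg, str) and arg.startswith("$"):
        # Any attribute is reachable, which is what makes the filter so open.
        return getattr(host_state, arg[1:], None)
    return arg


def evaluate(query: List[Any], host_state: Any) -> bool:
    """Evaluate a JsonFilter-style query against one host (hypothetical)."""
    op, *raw_args = query
    args = [
        evaluate(arg, host_state) if isinstance(arg, list)
        else _resolve(arg, host_state)
        for arg in raw_args
    ]
    if op == "and":
        return all(args)
    if op == "or":
        return any(args)
    if op == "not":
        return not args[0]
    if op == "in":
        return args[0] in args[1:]
    if op in _COMPARISONS:
        if args[0] is None or args[1] is None:
            return False  # unknown variable: treat as "does not match"
        return _COMPARISONS[op](args[0], args[1])
    raise ValueError(f"unsupported operator: {op!r}")


# Example inputs: a query as it might be passed in a scheduler hint, and a
# stand-in for a HostState object.
EXAMPLE_QUERY = json.loads('["and", [">=", "$free_ram_mb", 1024], '
                           '["=", "$hypervisor_hostname", "node-1"]]')
EXAMPLE_HOST = SimpleNamespace(free_ram_mb=2048, hypervisor_hostname="node-1")
```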
Change-Id: Ib2b1395715b6bdb25f53ee7c68df44e2d84b895b
Related-Bug: #1821764
The API reference and part of the scheduler filter docs for
the JsonFilter query hint are using invalid json strings
in the examples.
This fixes both invalid locations using the same json string
used in the openstack server create command example in the
scheduler admin docs.
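A quick way to check such an example before publishing it is to parse it with a strict JSON parser. The query strings in this sketch are illustrative only. JSON requires double-quoted strings, so a Python-style list with single quotes is rejected.

```python
import json

# Illustrative hint values: only the first one is valid JSON.
VALID_QUERY = '[">=", "$free_ram_mb", 1024]'
INVALID_QUERY = "['>=', '$free_ram_mb', 1024]"  # single quotes are not JSON


def is_valid_json(text: str) -> bool:
    """Return True if ``text`` parses as strict JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```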
Change-Id: Iaab8608c7ffa6fbbea40a838dd02d8096f632f7a
Closes-Bug: #1821764
Change I15364d37fb7426f4eec00ca4eaf99bec50e964b6 added the
ability for the compute service to report a subset of driver
capabilities as standard COMPUTE_* traits on the compute node
resource provider.
This adds administrator documentation to the scheduler docs
about the feature and how it could be used with flavors. There
are also some rules and semantic behavior around how these traits
work so that is also documented.
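As an illustration of how these traits pair with flavors, the sketch below does two things. It splits ``trait:<NAME>=required|forbidden`` flavor extra specs into two sets, and it checks those sets against a compute node provider's traits. The helper names are hypothetical. The traits in the usage are standard os-traits names, chosen only as examples.

```python
from typing import Dict, Set, Tuple


def traits_from_extra_specs(extra_specs: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split ``trait:<NAME>`` extra specs into (required, forbidden) sets."""
    required: Set[str] = set()
    forbidden: Set[str] = set()
    for key, value in extra_specs.items():
        if not key.startswith("trait:"):
            continue  # unrelated extra spec, e.g. hw:cpu_policy
        trait = key[len("trait:"):]
        if value == "required":
            required.add(trait)
        elif value == "forbidden":
            forbidden.add(trait)
    return required, forbidden


def provider_matches(provider_traits: Set[str], required: Set[str],
                     forbidden: Set[str]) -> bool:
    """True if the provider has all required traits and no forbidden ones."""
    return required <= provider_traits and not (forbidden & provider_traits)
```

For example, a flavor with ``trait:COMPUTE_VOLUME_MULTI_ATTACH=required`` would only match compute nodes whose driver reports that capability.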
Note that for cases #3 and #4 in the "Rules" section the
update_available_resource periodic task in the compute service
may add the compute-owned traits again automatically but it
depends on the [compute]/resource_provider_association_refresh
configuration option, which if set to 0 will disable that auto
refresh and a restart or SIGHUP is required. To avoid confusion
in these docs, I have opted to omit the mention of that option
and just document the action that will work regardless of
configuration which is to restart or SIGHUP the compute service.
Change-Id: Iaeec92e0b25956b0d95754ce85c68c2d82c4a7f1
Remove wrong description for auto resize confirm
in the API guide.
Move a description of a configuration option
'resize_confirm_window' from the API guide
to the admin configuration guide.
Add a description of automatic resize confirm
in the user guide.
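The automatic confirmation rule can be sketched as below, assuming ``resize_confirm_window`` is a number of seconds where 0 disables automatic confirmation. The function is hypothetical. In practice the check runs from a periodic task, so confirmation may happen somewhat after the window expires.

```python
from datetime import datetime, timedelta


def should_auto_confirm(resized_at: datetime, now: datetime,
                        resize_confirm_window: int) -> bool:
    """Decide whether a finished resize is old enough to confirm automatically."""
    if resize_confirm_window <= 0:
        return False  # 0 disables automatic confirmation
    return now - resized_at >= timedelta(seconds=resize_confirm_window)
```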
Change-Id: If739877422d5743e221c57be53ed877475db0647
Closes-Bug: #1816859
As discussed in the mailing list [1] since cells v1
has been deprecated since Pike and the biggest user
of it (CERN as far as we know) moved to cells v2
in Queens, we can start rolling back the cells v1
specific documentation to avoid confusing people
new to nova about what cells are, or leading them
to believe there is an optional v1.
There are still a few mentions of cells v1 left in
here for things like adding a new cell which need
to be re-written and for that I've left a todo.
Users can still get at cells v1 specific docs from
published stable branches and/or rebuilding the
docs from before this change.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-February/002569.html
Change-Id: Idaa04a88b6883254cad9a8c6665e1c63a67e88d3
Convert ``option`` to the shiny :oslo.config:option:`section.option`
format in admin/configuration/hypervisor-kvm.
Recognizing this could be done to a lot more files; I just happened to
be looking at this one today.
Change-Id: If1b02ce99152ffd00d4f461dc4539606db1bb13b
This spec proposes adding the ability for users to use
``Aggregate``'s ``metadata`` to override the global config options
for weights to achieve more fine-grained control over resource
weights.
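A sketch of how a per-aggregate override could be resolved for one host. Two assumptions are made and should be checked against the nova weigher docs. First, the metadata key matches the config option name (for example ``cpu_weight_multiplier``). Second, when a host belongs to several aggregates that set the key, this sketch takes the minimum value.

```python
from typing import Dict, Iterable, List


def weight_multiplier(aggregate_metadata: Iterable[Dict[str, str]],
                      key: str, config_default: float) -> float:
    """Resolve a weight multiplier for one host (hypothetical helper)."""
    values: List[float] = []
    for metadata in aggregate_metadata:
        if key not in metadata:
            continue
        try:
            values.append(float(metadata[key]))
        except ValueError:
            continue  # ignore malformed values in this sketch
    # Assumption: with several aggregates, the smallest value wins;
    # with none, the global config option applies.
    return min(values) if values else config_default
```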
blueprint: per-aggregate-scheduling-weight
Change-Id: I6e15c6507d037ffe263a460441858ed454b02504
This resolves the TODO from Ocata change:
I8871b628f0ab892830ceeede68db16948cb293c8
by adding a min=0.0 value to the soft affinity
weight multiplier configuration options.
It also removes the deprecated [DEFAULT] group
alias from Ocata change:
I3f48e52815e80c99612bcd10cb53331a8c995fc3
Change-Id: I79e191010adbc0ec0ed02c9d589106debbf90ea8
This adds a new section to the admin scheduler configuration
docs devoted to allocation ratios to call out the differences
between the override config options and the initial ratio
options, and how they interplay with the resource provider
inventory allocation ratio override that can be performed
via the placement REST API directly.
This moves the note about bug 1804125 into the new section
and also links to the docs from the initial allocation ratio
config option help text.
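The interplay can be summarised in a small sketch. Placement computes usable capacity as ``(total - reserved) * allocation_ratio``. The precedence function is a simplified, hypothetical model of the behaviour described above. An explicitly set override option wins. Otherwise an existing inventory value, possibly set via the placement API, is kept. The initial option only applies when the inventory is first created.

```python
from typing import Optional


def effective_allocation_ratio(override: Optional[float], initial: float,
                               existing: Optional[float] = None) -> float:
    """Pick the allocation ratio to report (simplified, hypothetical model)."""
    if override:  # None (or 0.0) means the override option is unset
        return override
    if existing is not None:
        return existing  # keep what is already in placement
    return initial  # first time the inventory is created


def capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Usable capacity as placement computes it for one resource class."""
    return (total - reserved) * allocation_ratio
```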
Part of blueprint initial-allocation-ratios
Related-Bug: #1804125
Change-Id: I7d8e822cd40dccaf5244e2cd95fa1af43fa9ed87
This borrows from the release note in change
I01f20f275bbd5451ace5c1e6f41ab38d488dae4e to document the
regression, introduced in Ocata, where allocation ratio settings
in the aggregate core/ram/disk filters are not honored because
of placement being used by the FilterScheduler.
While there is related work going on around this in
blueprint initial-allocation-ratios and
blueprint placement-aggregate-allocation-ratios, it is still
a limitation in the current code base and needs to be called
out in the docs.
Change-Id: Ifaf596a8572637f843f47daf5adce394b0365676
Related-Bug: #1804125
This change does two things to the admin scheduler configuration
docs:
1. Notes the limitation from bug 1802111 for the older
AggregateMultiTenancyIsolation filter and mentions that
starting in Rocky, using tenant isolation with placement
is better.
2. Notes that when isolating tenants via placement, the metadata
key "filter_tenant_id" can be suffixed to overcome the limitation
in bug 1802111.
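A minimal sketch of the matching step for placement-based isolation. It assumes each ``filter_tenant_id``-prefixed key (for example ``filter_tenant_id``, ``filter_tenant_id2``) holds exactly one project ID that is matched exactly. Under that assumption, adding tenants means adding keys rather than growing one metadata value. The function name is hypothetical.

```python
from typing import Dict, List

TENANT_KEY_PREFIX = "filter_tenant_id"


def aggregates_for_tenant(aggregates: Dict[str, Dict[str, str]],
                          project_id: str) -> List[str]:
    """Return UUIDs of aggregates isolated to ``project_id`` (hypothetical)."""
    matches = []
    for agg_uuid, metadata in aggregates.items():
        for key, value in metadata.items():
            # Suffixed keys are matched by prefix; each holds one project ID.
            if key.startswith(TENANT_KEY_PREFIX) and value == project_id:
                matches.append(agg_uuid)
                break
    return sorted(matches)
```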
Change-Id: I792c5df01b7cbc46c8363e261bc7422b09180e56
Closes-Bug: #1802111