If there is more than one numbered request group in an allocation
candidate query then the group_policy parameter is mandatory. Numbered
request groups can come from flavor extra specs as well as from
neutron ports having resource requests. This patch adds a warning if
group_policy is missing, to help troubleshooting. Nova cannot default
the policy in this case as both 'isolate' and 'none' could be valid
policies for SRIOV ports: 'isolate' would mean PF anti-affinity for
the SRIOV ports while 'none' would allow colocation of the VFs.
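A minimal sketch of the shape of the added warning (the names and the
exact log text here are illustrative, not the final implementation):

    import logging

    LOG = logging.getLogger(__name__)

    def _warn_if_group_policy_missing(request_groups, extra_specs):
        # More than one numbered group without a group_policy means
        # placement cannot interpret the query; warn so operators can
        # spot the missing flavor extra spec quickly.
        numbered = [g for g in request_groups if g.use_same_provider]
        if len(numbered) > 1 and 'group_policy' not in extra_specs:
            LOG.warning(
                "More than one numbered request group exists in the "
                "allocation candidate query but no group_policy is "
                "set. Placement will return no candidates. Specify "
                "group_policy in the flavor extra specs.")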
blueprint: bandwidth-resource-provider
Change-Id: I32d9704fe19bc85e06a613b6dffb99f00003315e
In change I9360fe29908, we added the 'numa_policy' field to the
'InstancePCIRequest' object. Unfortunately we did not update the
(de)serialization logic for the 'InstancePCIRequests' object, meaning
this field was never saved to the database. As a result, claiming will
always fail [1].
The resolution is simple - add the (de)serialization logic and tests to
prevent regression.
[1] https://github.com/openstack/nova/blob/18.0.0/nova/compute/resource_tracker.py#L214-L215
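A sketch of the kind of round-trip that was missing (the helper names
below are illustrative; the real change lives in the
'InstancePCIRequests' (de)serialization code):

    from nova import objects

    def _pci_request_to_primitive(request):
        # 'numa_policy' must be included here, otherwise the field is
        # silently dropped when the requests are saved to the database.
        return {
            'count': request.count,
            'spec': request.spec,
            'alias_name': request.alias_name,
            'request_id': request.request_id,
            'numa_policy': request.numa_policy,
        }

    def _pci_request_from_primitive(primitive):
        return objects.InstancePCIRequest(
            count=primitive['count'],
            spec=primitive['spec'],
            alias_name=primitive['alias_name'],
            request_id=primitive['request_id'],
            # .get() tolerates rows written before the field existed
            numa_policy=primitive.get('numa_policy'))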
Change-Id: Id4d8ecb8fee46b21590ebcc62a2850030cef6508
Closes-Bug: #1805891
Add an early check to validate any PCI aliases in the requested flavor
during a resize. This should ensure the user gets a useful error
message.
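A rough sketch of where such a check fits (the wrapper function is
illustrative; get_pci_requests_from_flavor() and PciInvalidAlias are
existing nova names):

    from nova.pci import request as pci_request

    def _check_new_flavor_pci_aliases(new_flavor):
        # Parsing the flavor's PCI requests raises PciInvalidAlias when
        # an alias in the extra-specs is malformed or undefined; doing
        # this at the start of the resize turns a late, opaque failure
        # into an immediate, actionable error for the user.
        pci_request.get_pci_requests_from_flavor(new_flavor)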
blueprint: flavor-extra-spec-image-property-validation
Change-Id: I25454cd408e08589e5cfd6107dcbadd15bbb405f
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Some of this code is pretty terrible, but it's pre-existing
terribleness and should be cleaned up outside of the privsep
transition.
Change-Id: I3ace688d26a340dc44e34c7c5369463b9f98a230
Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
refactored some code in the FilterScheduler and accidentally
changed how the list of weighed hosts is logged, causing the
wrapped HostState objects to be logged rather than the
WeighedHost objects, which contain the actual "weight" attribute
that is useful for debugging weigher configuration and
scheduling decisions.
This fixes the regression by logging the weighed hosts before
stripping off the WeighedHost wrapper and adds a simple wrinkle
to an existing test to assert we are logging the correct object.
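In essence the fix reorders two steps, roughly (variable names are
illustrative):

    # Log while the WeighedHost wrappers are intact, since they carry
    # the 'weight' attribute that makes the debug output useful.
    LOG.debug("Weighed %(hosts)s", {'hosts': weighed_hosts})

    # Only then strip the wrappers down to the bare HostState objects
    # for the rest of the scheduling flow.
    weighed_hosts = [h.obj for h in weighed_hosts]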
Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
Closes-Bug: #1816360
Use virt.images.convert_image instead of invoking 'qemu-img convert'
directly so that the cache option can be set properly, i.e. cache=none
when O_DIRECT is supported by the filesystem.
The big advantage of this is that it will bypass the host page cache
so that converting large images won't evict guest data from the cache.
Refactor images.convert_image so that the compression option can be
controlled by the caller.
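A sketch of the resulting call pattern (the paths are placeholders;
the compress keyword follows this commit's refactor):

    from nova.virt import images

    # convert_image passes cache=none to qemu-img when the filesystem
    # supports O_DIRECT, keeping large conversions from churning the
    # host page cache.
    images.convert_image('/tmp/src.qcow2', '/tmp/dst.raw',
                         'qcow2', 'raw', compress=False)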
Change-Id: I4b7be98b5832ca8c580339fcfb7b9203264b5ff8
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Validate the combination of the flavor extra-specs and image
properties as early as possible once they're both known (since
mutually-incompatible settings can be specified in the two places). If
validation fails then synchronously return an error to the user. We
need to do this anywhere the flavor or image changes, so basically
instance creation, rebuild, and resize.
- Rename _check_requested_image() to _validate_flavor_image() and add
a call from the resize code path. (It's already called for create
and rebuild.)
- In _validate_flavor_image() add new checks to validate NUMA-related
  options from the flavor and image, including CPU policy, CPU thread
  policy, CPU topology, memory topology, hugepages, CPU pinning,
  serial ports, realtime mask, etc.
- Add new 400 exceptions in the server API corresponding to the added
  validations.
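As an example of the shape of these checks, a simplified CPU policy
validation (the helper is paraphrased; hw:cpu_policy, hw_cpu_policy
and ImageCPUPinningForbidden are existing nova names):

    from nova import exception

    def _validate_cpu_policy(flavor, image_meta):
        # The flavor extra-spec and the image property can each request
        # a CPU policy; an image must not demand pinning that the
        # flavor explicitly forbids.
        flavor_policy = flavor.extra_specs.get('hw:cpu_policy')
        image_policy = image_meta.properties.get('hw_cpu_policy')
        if flavor_policy == 'shared' and image_policy == 'dedicated':
            raise exception.ImageCPUPinningForbidden()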
blueprint: flavor-extra-spec-image-property-validation
Change-Id: I06fad233006c7bab14749a51ffa226c3801f951b
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
In the instance creation code path it's possible for the PciInvalidAlias
exception to be raised if the flavor extra-specs have an invalid PCI
alias. This should be converted to HTTPBadRequest along with the other
exceptions stemming from invalid extra-specs.
Without this, it gets reported as an HTTP 500 error.
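The conversion itself is a small addition to the existing exception
handling in the create path, roughly:

    from webob import exc

    from nova import exception

    try:
        pass  # ... build the instance (elided) ...
    except exception.PciInvalidAlias as err:
        # Invalid flavor extra-specs are the client's mistake: report
        # 400 Bad Request instead of an unhandled 500.
        raise exc.HTTPBadRequest(explanation=err.format_message())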
Change-Id: Ia6921b5cd9253f65ff6904bdbce942759633de95
Closes-Bug: #1818701
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
A new API microversion, 2.72, is added that enables support for
Neutron ports having a resource request during server create.
Note that server delete and port detach operations already handle such
ports and will clean up the allocation properly.
Change-Id: I7555914473e16782d8ba4513a09ce45679962c14
blueprint: bandwidth-resource-provider
A neutron port can be created with the direct vnic type and it can
also have a bandwidth resource request at the same time. In this case
placement will offer allocation candidates that could fulfill the
bandwidth request, and then the nova scheduler's PciPassthroughFilter
checks whether a PCI device, a VF, is available for such a request.
This check is based on the physnet of the neutron port and the
physical_network tag in the pci/passthrough_whitelist config. It does
not consider the actual PF providing the bandwidth.
The currently unsupported case is when a single compute node has
whitelisted VFs from more than one PF connected to the same physnet.
These PFs can have totally different bandwidth inventories in
placement. For example PF2 can have plenty of bandwidth available
while PF3 has no bandwidth configured at all. In this case the
PciPassthroughFilter might accept the host simply because PF3 still
has available VFs, even if the bandwidth for the port is fulfilled
from PF2, which in turn might not have any available VFs.
Moreover the PCI claim has the same logic as the filter so it will
claim the VF from PF3 while the bandwidth was allocated from PF2 in
placement. This patch does not try to solve the issue in the
PciPassthroughFilter but it does solve the issue in the PCI claim.
This means that after successful scheduling the PCI claim can still
fail if bandwidth is allocated from one PF but a VF is no longer
available from that specific PF. This will lead to a re-schedule.
Making the PciPassthroughFilter smart enough is complicated because:
* The filters do not know about placement allocation candidates at
  all
* The filters work per compute host, not per allocation candidate. If
  there are two allocation candidates for the same host then nova
  will only try to filter for the first one. [1][2]
This patch applies the following logic:
The compute manager checks the InstancePCIRequest OVOs in a given
boot request and maps each of them to the neutron port that requested
the PCI device. Then it maps the neutron port to the physical device
RP in the placement allocation made for this server. Then the spec in
the InstancePCIRequest is extended with the interface name of the PF
from which the bandwidth was allocated, based on the name of the
device RP. The PCI claim will then enforce that the PF interface name
in the request matches the interface name of the PF from which the VF
is selected. The PCI claim code knows the PF interface name of each
available VF from the virt driver reporting the 'parent_ifname' key
as part of the return value of the get_available_resource() driver
call.
The PCI claim process itself is not changed as it already enforces
that every field in the request matches the fields of the selected
device pool.
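A heavily simplified sketch of that mapping (data shapes and the
helper name are illustrative; it assumes, for illustration, device RP
names of the form 'host:agent:ifname'):

    def _extend_pci_request_with_pf_ifname(pci_request, port,
                                           name_by_rp_uuid):
        # InstancePCIRequest.requester_id is the UUID of the neutron
        # port that triggered the request.
        assert pci_request.requester_id == port['id']
        # The port's binding:profile points at the device RP that the
        # bandwidth was allocated from.
        rp_uuid = port['binding:profile']['allocation']
        # With RP names of the form 'host:agent:ifname', the last
        # segment is the PF's interface name.
        pf_ifname = name_by_rp_uuid[rp_uuid].split(':')[-1]
        # The PCI claim matches every spec field, so adding
        # 'parent_ifname' restricts it to VFs of this PF.
        pci_request.spec[0]['parent_ifname'] = pf_ifname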
The current patch extends the libvirt driver to provide the PF
interface name information. Besides the libvirt driver, the xenapi
driver also supports SRIOV VF handling, but this patch does not
extend the xenapi driver, so for xenapi the above-described
configuration remains unsupported.
I know that this feels complicated but it is necessary because VFs
are not yet counted as resources in placement.
[1] https://github.com/openstack/nova/blob/f6996903d2ef0fdb40135b506c83ed6517b28e19/nova/scheduler/filter_scheduler.py#L239
[2] https://github.com/openstack/nova/blob/f6996903d2ef0fdb40135b506c83ed6517b28e19/nova/scheduler/filter_scheduler.py#L426
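For illustration, one VF entry in the data the libvirt driver now
reports (the field values are made up; 'parent_ifname' is the new
key):

    # Part of the 'pci_passthrough_devices' payload returned by the
    # libvirt driver's get_available_resource():
    vf = {
        'address': '0000:81:00.2',      # the VF itself
        'dev_type': 'type-VF',
        'parent_addr': '0000:81:00.0',  # PCI address of the parent PF
        'parent_ifname': 'ens2f0',      # netdev name of the parent PF
    }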
blueprint: bandwidth-resource-provider
Change-Id: I038867c4094d79ae4a20615ab9c9f9e38fcc2e0a
Extend the associated LibvirtConfigCapsGuest class so that it can
parse and store <emulator> elements in a way which can be reused in
subsequent commits by callers of virConnectGetDomainCapabilities().
Ongoing / future work such as SEV[0] and the move of the default machine
type to q35[1] will need to invoke virConnectGetDomainCapabilities()
in order to check whether SEV or q35 respectively are supported.
However this API requires 5 parameters[2]:
1) the path to the emulator binary (e.g. qemu-system-x86_64)
2) the domain architecture
3) the machine type
4) the virtualization type
5) flags (not used yet)
4) is determined by CONF.libvirt.virt_type. The caller of
virConnectGetDomainCapabilities() can decide which combinations of 2)
and 3) to pass[3], but still needs to know 1). Once 2) and 3) are
known, 1) can be determined from the <emulator> elements returned by
libvirt's virConnectGetCapabilities() API[4]. nova already calls this
and parses the response into a LibvirtConfigCaps object, but
currently the parser ignores the <emulator> elements. This patch
fixes that.
[0] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/amd-sev-libvirt-support.html
[1] https://blueprints.launchpad.net/nova/+spec/handle-default-machine-type-as-q35
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetDomainCapabilities
[3] Calling virConnectGetDomainCapabilities for every machine type is
most likely overkill:
https://bugzilla.redhat.com/show_bug.cgi?id=1683471#c7
[4] https://libvirt.org/formatcaps.html#elementGuest
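A standalone illustration of the information now being captured (the
XML is abbreviated from libvirt's capabilities output; plain lxml is
used here rather than nova's config classes):

    from lxml import etree

    CAPS_GUEST_XML = """
    <guest>
      <os_type>hvm</os_type>
      <arch name='x86_64'>
        <emulator>/usr/bin/qemu-system-x86_64</emulator>
      </arch>
    </guest>
    """

    guest = etree.fromstring(CAPS_GUEST_XML)
    arch = guest.find('arch')
    # The <emulator> element supplies parameter 1) above: the binary
    # path to hand to virConnectGetDomainCapabilities().
    print(arch.get('name'), arch.findtext('emulator'))
    # -> x86_64 /usr/bin/qemu-system-x86_64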
blueprint: amd-sev-libvirt-support
blueprint: handle-default-machine-type-as-q35
Change-Id: Ibdc88bc5e0a214bff36a8351a1b76980f7015d16