53640 Commits

Author SHA1 Message Date
Zuul eaa29f71ef Merge "Test live migration with config drive" 2019-03-06 20:06:16 +00:00
Zuul 7480eaf61b Merge "Warn if group_policy is missing from flavor" 2019-03-06 18:46:00 +00:00
Zuul 64b4f41b24 Merge "objects: Store InstancePCIRequest.numa_policy in DB" 2019-03-06 16:59:42 +00:00
Zuul 38c96993fd Merge "Support server create with ports having resource request" 2019-03-06 16:59:35 +00:00
Zuul dfaa513fe1 Merge "Further de-dupe os-vif VIF tests" 2019-03-06 13:45:20 +00:00
Zuul 4fa8e44905 Merge "Validate bandwidth configuration for other VIF types" 2019-03-06 13:45:11 +00:00
Balazs Gibizer 9e31f769fc Warn if group_policy is missing from flavor
If there is more than one numbered request group in an allocation
candidate query then the group_policy parameter is mandatory. Numbered
request groups can come from the flavor extra_spec as well as from
neutron ports having a resource request. This patch adds a warning if the
group_policy is missing to help troubleshooting. Nova cannot default
the policy in this case as both isolate and None could be a valid policy
for SRIOV ports. Isolate would mean PF anti-affinity for SRIOV ports
while None would allow colocation of the VFs.

blueprint bandwidth-resource-provider

Change-Id: I32d9704fe19bc85e06a613b6dffb99f00003315e
2019-03-06 13:15:50 +00:00
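The check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not nova's code: the function name `warn_if_group_policy_missing` and the list-based representation of request groups are assumptions made for the example.

```python
import logging

LOG = logging.getLogger(__name__)


def warn_if_group_policy_missing(flavor_groups, port_groups, group_policy):
    """Warn when several numbered groups exist but no group_policy is set.

    flavor_groups and port_groups are hypothetical lists of numbered
    request groups coming from the flavor extra_spec and from neutron
    ports respectively. Returns True if a warning was emitted.
    """
    total = len(flavor_groups) + len(port_groups)
    if total > 1 and group_policy is None:
        # No default is applied: both 'isolate' and None can be valid.
        LOG.warning(
            "There are %d numbered request groups but the flavor does "
            "not specify group_policy.", total)
        return True
    return False
```

A single numbered group never triggers the warning, since group_policy only matters when two or more groups have to be placed relative to each other.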
Zuul a6a70a8ddc Merge "Fix WeighedHost logging regression" 2019-03-06 12:25:50 +00:00
Stephen Finucane 59d9463351 objects: Store InstancePCIRequest.numa_policy in DB
In change I9360fe29908, we added the 'numa_policy' field to the
'InstancePCIRequest' object. Unfortunately we did not update the
(de)serialization logic for the 'InstancePCIRequests' object, meaning
this field was never saved to the database. As a result, claiming will
always fail [1].

The resolution is simple - add the (de)serialization logic and tests to
prevent regression.

[1] https://github.com/openstack/nova/blob/18.0.0/nova/compute/resource_tracker.py#L214-L215

Change-Id: Id4d8ecb8fee46b21590ebcc62a2850030cef6508
Closes-Bug: #1805891
2019-03-06 11:02:02 +00:00
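The class of bug fixed here, a field present on the object but missing from the (de)serialization logic, can be illustrated with a small round trip. The `PCIRequest` dataclass and the `to_json`/`from_json` helpers below are hypothetical stand-ins, not nova's 'InstancePCIRequests' implementation.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PCIRequest:
    # Hypothetical, simplified stand-in for a PCI request object.
    count: int
    alias_name: str
    numa_policy: Optional[str] = None


def to_json(requests: List[PCIRequest]) -> str:
    # Every field that must survive a DB round trip is listed explicitly;
    # leaving out 'numa_policy' here would silently drop it.
    return json.dumps([
        {"count": r.count,
         "alias_name": r.alias_name,
         "numa_policy": r.numa_policy}
        for r in requests
    ])


def from_json(blob: str) -> List[PCIRequest]:
    # .get() keeps older rows, saved without 'numa_policy', loadable.
    return [
        PCIRequest(count=d["count"],
                   alias_name=d["alias_name"],
                   numa_policy=d.get("numa_policy"))
        for d in json.loads(blob)
    ]
```

A round-trip test of the form `from_json(to_json(x)) == x` is the kind of regression test that catches a forgotten field.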
Zuul dc1ef24fac Merge "Move additional IP address management to privsep." 2019-03-06 10:21:21 +00:00
Zuul 6304bcf781 Merge "Validate PCI aliases early in resize" 2019-03-06 08:42:39 +00:00
Zuul 14c4c8040c Merge "Flavor extra spec and image properties validation from API" 2019-03-06 06:23:49 +00:00
Zuul 590b8b5512 Merge "Move route management to privsep." 2019-03-06 05:00:31 +00:00
Zuul 7db40a86fe Merge "Convert additional IP management calls to privsep." 2019-03-06 05:00:23 +00:00
Zuul 9550f4a7aa Merge "Move DHCP releasing to privsep." 2019-03-06 05:00:15 +00:00
Zuul 5c0b10102b Merge "Move set_vf_interface_vlan to be with its only caller." 2019-03-06 04:59:41 +00:00
Zuul 0b98660473 Merge "Check hosts have no instances for AZ rename" 2019-03-06 04:25:49 +00:00
Zuul 931c75a454 Merge "Improve libvirt image and snapshot handling" 2019-03-06 01:44:18 +00:00
Zuul d2dc7549ce Merge "Handle missing exception in instance creation code" 2019-03-06 00:08:22 +00:00
Zuul 6c2bcdea4d Merge "api-ref: typo service.disable_reason" 2019-03-06 00:08:11 +00:00
Zuul 25a4b1dd04 Merge "Ensure that bandwidth and VF are from the same PF" 2019-03-06 00:08:02 +00:00
Zuul 83195c3909 Merge "Fix resetting non-persistent fields when saving obj" 2019-03-05 23:57:47 +00:00
Zuul b3488bfbf7 Merge "Correct instance port binding for rebuilds" 2019-03-05 23:57:31 +00:00
Zuul a0e0faea35 Merge "Parse <emulator> elements from virConnectGetCapabilities()" 2019-03-05 23:56:25 +00:00
Chris Friesen fb908e154d Validate PCI aliases early in resize
Add an early check to validate any PCI aliases in the requested flavor
during a resize.  This should ensure the user gets a useful error
message.

blueprint: flavor-extra-spec-image-property-validation
Change-Id: I25454cd408e08589e5cfd6107dcbadd15bbb405f
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-03-05 22:39:24 +00:00
Michael Still 99ad674c26 Move additional IP address management to privsep.
Change-Id: I8c47fe14773fc97b53141ce4145e7d60545caee3
2019-03-05 22:18:09 +00:00
Michael Still 144205fe1b Move route management to privsep.
Some of this code is pretty terrible, but it's pre-existing terrible
and should be cleaned up outside of the privsep transition.

Change-Id: I3ace688d26a340dc44e34c7c5369463b9f98a230
2019-03-05 22:18:08 +00:00
Michael Still 41ada6be35 Convert additional IP management calls to privsep.
Change-Id: I46d610cf2b6c328a3b6430a51477dcb4e33e6134
2019-03-05 22:18:08 +00:00
Michael Still 6b72eb204b Move DHCP releasing to privsep.
Change-Id: Id0403f102b7af445bffb7182b748bdb958429256
2019-03-05 22:18:08 +00:00
Michael Still 4e26f70934 Move set_vf_interface_vlan to be with its only caller.
And then remove the now empty linux_utils module from nova.network.

Change-Id: I4a64c212f1e37586db06972e5811a0fab87bab9d
2019-03-05 22:18:08 +00:00
Matt Riedemann 84533f5eb3 Fix WeighedHost logging regression
Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
refactored some code in the FilterScheduler which accidentally
changed how the list of weighed hosts is logged, which caused
the wrapped HostState objects to be logged rather than the
WeighedHost objects, which contain the actual "weight" attribute
which is useful for debugging weigher configuration and
scheduling decisions.

This fixes the regression by logging the weighed hosts before
stripping off the WeighedHost wrapper and adds a simple wrinkle
to an existing test to assert we are logging the correct object.

Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
Closes-Bug: #1816360
2019-03-05 17:16:23 -05:00
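The ordering issue behind this regression can be sketched as below. The `HostState` and `WeighedHost` classes here are simplified, hypothetical stand-ins for the scheduler objects; the point is only that logging must happen before the wrapper is stripped.

```python
import logging

LOG = logging.getLogger(__name__)


class HostState:
    # Simplified stand-in: carries no weight information.
    def __init__(self, host):
        self.host = host

    def __repr__(self):
        return "HostState(%s)" % self.host


class WeighedHost:
    # Simplified stand-in: wraps a host state together with its weight.
    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight

    def __repr__(self):
        return "WeighedHost [host: %r, weight: %s]" % (self.obj, self.weight)


def select_hosts(weighed_hosts, log=None):
    log = log or LOG.debug
    # Log while the hosts are still wrapped so the weight is visible.
    log("Weighed %s", weighed_hosts)
    # Only afterwards strip off the wrapper.
    return [w.obj for w in weighed_hosts]
```

Logging after the list comprehension instead would print only the unwrapped host states, which is the behaviour the commit describes as a regression.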
Zuul 8cdb8cc7c5 Merge "Fix an error when generating a host ID" 2019-03-05 20:42:40 +00:00
Zuul 0b3809fec0 Merge "FUP: docs nit" 2019-03-05 18:28:10 +00:00
Zuul 548d435b88 Merge "Add functional test for libvirt vgpu reshape" 2019-03-05 18:28:01 +00:00
Zuul 56d4d12ac1 Merge "libvirt: implement reshaper for vgpu" 2019-03-05 18:27:53 +00:00
Jack Ding 56541244fd Improve libvirt image and snapshot handling
Use virt.images.convert_image instead of invoking 'qemu-img convert'
directly so that the cache option can be set properly, i.e. cache=none
when O_DIRECT is supported by the filesystem.

The big advantage of this is that it will bypass the host page cache
so that converting large images won't evict guest data from the cache.

Refactor images.convert_image so that the compression option can be
controlled by the caller.

Change-Id: I4b7be98b5832ca8c580339fcfb7b9203264b5ff8
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-03-05 12:18:34 -06:00
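A rough sketch of how such a command line could be assembled is shown below. The helper name `build_convert_cmd`, its parameters, and the 'writeback' fallback are assumptions for illustration; only the `qemu-img convert` options `-t` (destination cache mode), `-O`, `-f` and `-c` are taken from qemu-img itself.

```python
def build_convert_cmd(source, dest, in_format, out_format,
                      direct_io_supported, compress=False):
    """Build a 'qemu-img convert' argument list (hypothetical helper).

    direct_io_supported is assumed to be the result of an earlier
    O_DIRECT probe on the target filesystem.
    """
    # cache=none bypasses the host page cache when O_DIRECT works;
    # the 'writeback' fallback is an assumption of this sketch.
    cache_mode = "none" if direct_io_supported else "writeback"
    cmd = ["qemu-img", "convert", "-t", cache_mode, "-O", out_format]
    if in_format is not None:
        cmd += ["-f", in_format]
    if compress:
        # Compression is left to the caller, as the commit describes.
        cmd.append("-c")
    cmd += [source, dest]
    return cmd
```

Returning a list rather than a shell string keeps the arguments safe to pass to a process-execution helper without quoting.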
Chris Friesen 5e7b840e48 Flavor extra spec and image properties validation from API
Validate the combination of the flavor extra-specs and image properties
as early as possible once they're both known (since you can specify
mutually-incompatible changes in the two places). If validation fails,
synchronously return an error to the user. We need to do this anywhere
the flavor or image changes, so basically instance creation, rebuild,
and resize.

- Rename _check_requested_image() to _validate_flavor_image() and add
  a call from the resize code path.  (It's already called for create
  and rebuild.)
- In _validate_flavor_image() add new checks to validate NUMA-related
  options from flavor and image including CPU policy, CPU thread
  policy, CPU topology, memory topology, hugepages, CPU pinning,
  serial ports, realtime mask, etc.
- Add new 400 exceptions in the Server API corresponding to the added
  validations.

blueprint: flavor-extra-spec-image-property-validation
Change-Id: I06fad233006c7bab14749a51ffa226c3801f951b
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-03-05 12:05:06 -06:00
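The idea of validating the flavor and image together and failing synchronously can be sketched as follows. The conflict rule, the `FlavorImageConflict` exception and the `handle_request` function are illustrative assumptions, not nova's actual checks; the keys `hw:cpu_policy` and `hw_cpu_policy` are the flavor extra spec and image property for CPU policy.

```python
class FlavorImageConflict(ValueError):
    """Hypothetical exception for mutually-incompatible settings."""


def validate_flavor_image(extra_specs, image_props):
    flavor_policy = extra_specs.get("hw:cpu_policy")
    image_policy = image_props.get("hw_cpu_policy")
    # Illustrative rule only: the image may not ask for dedicated CPUs
    # when the flavor explicitly asks for shared ones.
    if flavor_policy == "shared" and image_policy == "dedicated":
        raise FlavorImageConflict(
            "image requests dedicated CPUs but flavor is shared")


def handle_request(extra_specs, image_props):
    # Run the check in the API layer so the caller gets an immediate
    # 400 instead of a late failure on the compute host.
    try:
        validate_flavor_image(extra_specs, image_props)
    except FlavorImageConflict as exc:
        return 400, str(exc)
    return 202, "accepted"
```

The same `validate_flavor_image` call would be reused on every path where the flavor or image can change, which is why the commit adds it to resize as well as create and rebuild.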
Zuul 8ce34fb958 Merge "Stub out port binding create/delete in NeutronFixture" 2019-03-05 17:50:15 +00:00
Chris Friesen cb5ad6d3c1 Handle missing exception in instance creation code
In the instance creation code path it's possible for the PciInvalidAlias
exception to be raised if the flavor extra-specs have an invalid PCI
alias.  This should be converted to HTTPBadRequest along with the other
exceptions stemming from invalid extra-specs.

Without this, it gets reported as an HTTP 500 error.

Change-Id: Ia6921b5cd9253f65ff6904bdbce942759633de95
Closes-Bug: #1818701
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2019-03-05 11:04:06 -06:00
Balazs Gibizer 3225fb61f9 Support server create with ports having resource request
A new API microversion, 2.72, is added that enables support for Neutron
ports having a resource request during server create.

Note that server delete and port detach operations already handle such
ports and will clean up the allocation properly.

Change-Id: I7555914473e16782d8ba4513a09ce45679962c14
blueprint: bandwidth-resource-provider
2019-03-05 17:48:29 +01:00
Balazs Gibizer c02e213d50 Ensure that bandwidth and VF are from the same PF
A neutron port can be created with the direct vnic type and can also have
a bandwidth resource request at the same time. In this case placement will
offer allocation candidates that could fulfill the bandwidth request, then
the nova scheduler's PciPassthroughFilter checks if a PCI device, a VF,
is available for such a request. This check is based on the physnet
of the neutron port and the physical_network tag in the
pci/passthrough_whitelist config. It does not consider the actual PF
providing the bandwidth.

The currently unsupported case is when a single compute node has
whitelisted VFs from more than one PF which are connected to the same
physnet. These PFs can have totally different bandwidth inventories
in placement. For example PF2 can have plenty of bandwidth available
and PF3 has no bandwidth configured at all.

In this case the PciPassthroughFilter might accept the host simply
because PF3 still has available VFs even if the bandwidth from the port
is fulfilled from PF2 which in turn might not have available VFs any
more.

Moreover the PCI claim has the same logic as the filter so it will claim
the VF from PF3 while the bandwidth was allocated from PF2 in placement.

This patch does not try to solve the issue in the PciPassthroughFilter but
it does solve the issue in the pci claim. This means that after
successful scheduling the pci claim can still fail if bandwidth is
allocated from one PF but a VF is not available from that specific PF
any more. This will lead to a re-schedule.

Making the PciPassthroughFilter smart enough is complicated because:
* The filters do not know about placement allocation candidates at
  all
* The filters work per compute host, not per allocation
  candidate. If there are two allocation candidates for the same host
  then nova will only try to filter for the first one. [1][2]

This patch applies the following logic:

The compute manager checks the InstancePCIRequest ovos in a given
boot request and maps each of them to the neutron port that requested
the PCI device. Then it maps the neutron port to the physical device
RP in the placement allocation made for this server. Then the spec in
the InstancePCIRequest is extended with the interface name of the PF
from which the bandwidth was allocated, based on the name of the
device RP. Then the PCI claim will enforce that the PF interface
name in the request matches the interface name of the PF from which
the VF is selected. The PCI claim code knows about the PF
interface name of each available VF from the virt driver reporting the
'parent_ifname' key as part of the return value of the
get_available_resource() driver call.

The PCI claim process is not changed as it already enforces that
every field from the request matches the fields of the selected
device pool.

The current patch extends the libvirt driver to provide PF interface
name information. Besides the libvirt driver the xenapi driver also
supports SRIOV VF handling but this patch does not extend the xenapi
driver. So for xenapi the configuration described above currently
remains unsupported.

I know that this feels complicated but it is necessary because VFs have
not been counted as resources in placement yet.

[1] https://github.com/openstack/nova/blob/f6996903d2ef0fdb40135b506c83ed6517b28e19/nova/scheduler/filter_scheduler.py#L239
[2] https://github.com/openstack/nova/blob/f6996903d2ef0fdb40135b506c83ed6517b28e19/nova/scheduler/filter_scheduler.py#L426

blueprint: bandwidth-resource-provider

Change-Id: I038867c4094d79ae4a20615ab9c9f9e38fcc2e0a
2019-03-05 17:48:29 +01:00
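The matching logic described in this commit message can be sketched in a few lines. This is an illustration under stated assumptions: the dict-based specs and pools, the helper names, and the idea that the device RP name ends in ':<interface name>' are all simplifications made for the example, not nova's data model.

```python
def extend_spec_with_pf(spec, device_rp_name):
    """Add the PF interface name to every entry of a request spec.

    Assumes, for this sketch only, that the device RP name has the
    interface name as its last ':'-separated component.
    """
    parent_ifname = device_rp_name.rsplit(":", 1)[-1]
    return [dict(entry, parent_ifname=parent_ifname) for entry in spec]


def matching_pools(pools, spec):
    """Return pools where every field of some spec entry matches."""
    return [
        pool for pool in pools
        if any(all(pool.get(key) == value for key, value in entry.items())
               for entry in spec)
    ]


# Two PFs on the same physnet; bandwidth was allocated from 'ens2'.
pools = [
    {"physical_network": "physnet2", "parent_ifname": "ens2", "count": 1},
    {"physical_network": "physnet2", "parent_ifname": "ens3", "count": 4},
]
spec = [{"physical_network": "physnet2"}]
extended = extend_spec_with_pf(spec, "compute1:NIC Switch agent:ens2")
```

With the original spec both pools match, so a VF could be claimed from the wrong PF. With the extended spec only the 'ens2' pool matches, and the existing "every field must match" rule does the enforcement without any change to the claim code.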
Zuul c43c1d3fb9 Merge "libvirt: Omit needless check on 'CONF.serial_console'" 2019-03-05 14:30:56 +00:00
Zuul 047f8c71c2 Merge "Fix wrong consumer type in logging" 2019-03-05 11:34:18 +00:00
Zuul 7f6c6e8446 Merge "Make move_allocations handle empty source allocations" 2019-03-05 10:20:34 +00:00
Adam Spiers 48d6753d37 Parse <emulator> elements from virConnectGetCapabilities()
Extend the associated LibvirtConfigCapsGuest class so that it can
parse and store <emulator> elements in a way which can be reused in
subsequent commits by callers of virConnectGetDomainCapabilities().

Ongoing / future work such as SEV[0] and the move of the default machine
type to q35[1] will need to invoke virConnectGetDomainCapabilities()
in order to check whether SEV or q35 respectively are supported.
However this API requires 5 parameters[2]:

   1) the path to the emulator binary (e.g. qemu-system-x86_64)
   2) the domain architecture
   3) the machine type
   4) the virtualization type
   5) flags (not used yet)

4) is determined by CONF.libvirt.virt_type.  The caller of
virConnectGetDomainCapabilities() can decide which combinations of 2)
and 3) type to pass[3], but still needs to know 1).  Once 2) and 3)
are known, this can be determined from the <emulator> elements
returned by libvirt's virConnectGetCapabilities() API[4].  nova
already calls this and parses the response into a LibvirtConfigCaps
object.  However currently the parser ignores the <emulator> elements.
This patch fixes that.

[0] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/amd-sev-libvirt-support.html
[1] https://blueprints.launchpad.net/nova/+spec/handle-default-machine-type-as-q35
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetDomainCapabilities
[3] Calling virConnectGetDomainCapabilities for every machine type is
    most likely overkill:
    https://bugzilla.redhat.com/show_bug.cgi?id=1683471#c7
[4] https://libvirt.org/formatcaps.html#elementGuest

blueprint: amd-sev-libvirt-support
blueprint: handle-default-machine-type-as-q35
Change-Id: Ibdc88bc5e0a214bff36a8351a1b76980f7015d16
2019-03-05 10:07:21 +00:00
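Extracting the <emulator> elements from a capabilities document can be sketched with the standard library XML parser. The sample XML is a trimmed, hand-written fragment in the shape of libvirt's capabilities format, and `emulators_by_arch` is a hypothetical helper, not the LibvirtConfigCapsGuest parser.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of virConnectGetCapabilities() output.
CAPS_XML = """
<capabilities>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <machine>pc</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/bin/qemu-kvm</emulator>
      </domain>
    </arch>
  </guest>
</capabilities>
"""


def emulators_by_arch(caps_xml):
    """Map each guest arch to its default and per-domain emulators."""
    root = ET.fromstring(caps_xml)
    result = {}
    for arch in root.findall("./guest/arch"):
        # The arch-level <emulator> is the default for that arch.
        default = arch.findtext("emulator")
        domains = {}
        for domain in arch.findall("domain"):
            # A <domain> may override the emulator; else use the default.
            domains[domain.get("type")] = domain.findtext("emulator") or default
        result[arch.get("name")] = {"default": default, "domains": domains}
    return result
```

Given the architecture and the virtualization type, a caller can then look up the emulator path it needs to pass to virConnectGetDomainCapabilities().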
Zuul 9692f16b35 Merge "Convert driver supported capabilities to compute node provider traits" 2019-03-05 09:50:12 +00:00
licanwei 14a6767634 api-ref: typo service.disable_reason
'disable_reason' should be 'disabled_reason'

Change-Id: Ie320d7a7eb675bfdba2c907fd44b99c02974d343
2019-03-05 16:58:13 +08:00
Zuul 787a9f27b0 Merge "Improve existing flavor and image metadata validation" 2019-03-05 08:57:41 +00:00
Zuul 6343f48026 Merge "Use a placement conf when testing report client" 2019-03-05 02:27:13 +00:00
Zuul d528e11711 Merge "Add nits from Id2beaa7c4e5780199298f8e58fb6c7005e420a69" 2019-03-04 21:58:23 +00:00