Allow instances to be created with VNIC_TYPE_REMOTE_MANAGED ports.
Such ports are assumed to require remote-managed PCI devices, which
operators must tag as "remote_managed" in the PCI whitelist where
applicable, since there is no meta information or standard means of
querying whether a device is remote-managed (an illustrative whitelist
entry is sketched after the list below).
The following changes are introduced:
* Handling for VNIC_TYPE_REMOTE_MANAGED ports during allocation of
resources for instance creation (remote_managed == true in
InstancePciRequests);
* Usage of the noop os-vif plugin for VNIC_TYPE_REMOTE_MANAGED ports
in order to avoid the invocation of the local representor plugging
logic since a networking backend is responsible for that in this
case;
* Expectation of bind-time events for ports of VNIC_TYPE_REMOTE_MANAGED.
Events for those ports arrive early from Neutron after a port update
(before Nova begins to wait in the virt driver code), therefore Nova is
set to avoid waiting for plug events for VNIC_TYPE_REMOTE_MANAGED ports;
* Making sure the service version is high enough on all compute services
before creating instances with ports that have VNIC type
VNIC_TYPE_REMOTE_MANAGED. Network requests are examined for the presence
of port ids to determine the VNIC type via Neutron API. If
remote-managed ports are requested, a compute service version check
is performed across all cells.
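As an illustration only, a remote-managed VF whitelist entry might look
roughly like the following, shown here as the Python dict equivalent of
the JSON value used in the [pci] whitelist configuration (the address
and vendor/product IDs are made up; only the "remote_managed" tag is
the point):

    # Illustrative sketch only: a VF tagged so that it can back a
    # VNIC_TYPE_REMOTE_MANAGED port. Address and IDs are invented.
    whitelist_entry = {
        "address": "0000:82:00.2",
        "vendor_id": "15b3",
        "product_id": "101e",
        "remote_managed": "true",
    }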
Change-Id: Ica09376951d49bc60ce6e33147477e4fa38b9482
Implements: blueprint integration-with-off-path-network-backends
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports: hosts that lack either the relevant compute driver capability
COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools with "remote_managed"
devices are filtered out early (a rough sketch of the idea follows).
The presence of devices actually available for allocation is checked at
a later point by the PciPassthroughFilter.
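A rough sketch of the filtering idea; the function and attribute names
are illustrative, not the actual scheduler code:

    def host_passes(host_state, requires_remote_managed_ports):
        # Keep hosts that both advertise the driver capability and expose
        # at least one PCI device pool tagged as remote-managed; whether
        # concrete devices are still free is left to the
        # PciPassthroughFilter later on.
        if not requires_remote_managed_ports:
            return True
        has_capability = ('COMPUTE_REMOTE_MANAGED_PORTS'
                          in host_state.traits)
        has_pool = any(pool.get('remote_managed') == 'true'
                       for pool in host_state.pci_device_pools)
        return has_capability and has_pool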
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
In order to support remote-managed ports the following is needed:
* The Nova compute driver needs to support this feature;
* For the Libvirt compute driver, a given host needs a Libvirt version
that supports PCI VPD (7.9.0 or newer, see
https://libvirt.org/news.html#v7-9-0-2021-11-01).
Therefore, this change introduces a new capability to track driver
support for remote-managed ports.
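For illustration only (these names are assumptions, not the driver's
actual internals), the capability could be derived from the host
libvirt version roughly as follows:

    # Hypothetical sketch: advertise support for remote-managed ports
    # only when the host libvirt can expose PCI VPD (added in 7.9.0).
    MIN_LIBVIRT_VPD = (7, 9, 0)

    def supports_remote_managed_ports(host_libvirt_version):
        return host_libvirt_version >= MIN_LIBVIRT_VPD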
Change-Id: I7ea96fd85d2607e0af0f6918b0b45c58e8bec058
The new version contains changes needed by the multi-architecture
support and off-path SmartNIC DPU support code.
Needed-By: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Needed-By: Ia070a29186c6123cf51e1b17373c2dc69676ae7c
Change-Id: Ic1179f3e5e2c1aeb069972f21edffe5b003eb525
PCI devices may be managed remotely from the perspective of a hypervisor
host (e.g. by a SmartNIC DPU), which means that the VF control plane is
not available to the hypervisor. Depending on the presence of a
remote_managed device attribute in the InstancePCIRequest spec and
available device types in a pool, additional processing needs to be
done:
* Filtering of devices marked as `remote_managed: "true"` in the
whitelist configuration so that they are not used in legacy SR-IOV
and hardware offload requests;
* Early error reporting if PFs marked as remote_managed="true" are
present in the whitelist configuration. This is explicitly not
supported since allocating such a PF would remove the associated VFs
from the pool, and an instance with that PF and its VFs would not have
access to the control plane required for representor interface plugging
on the SmartNIC DPU side. This configuration is invalid, which is
enforced in the PciDeviceStats code.
* Checking for the presence of a card serial number in the PCI VPD
capability of a device that was marked as `remote_managed: "true"` in
the whitelist. The card serial number is mandatory because it is used
to identify a host in the networking backend that will handle the
configuration of a given PCI device on the remote side (i.e.
representor plugging, flow programming).
For compatibility, all devices not explicitly marked as remote_managed
in the whitelist are assumed to have the remote_managed attribute set
to False.
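A minimal sketch of the matching rule described above (not the exact
PciDeviceStats code; names are illustrative):

    def pool_matches_request(pool_tags, request_spec):
        # A device pool only satisfies a PCI request when its
        # remote_managed tag agrees with the request; an absent tag
        # counts as "false" for compatibility with existing entries.
        requested = request_spec.get('remote_managed', 'false')
        tagged = pool_tags.get('remote_managed', 'false')
        return requested == tagged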
Implements: blueprint integration-with-off-path-network-backends
Change-Id: Ic44d5e206326827d00a751da3cea67afe3929a08
Retrieve the PF MAC address and the VF logical number at runtime for a
given VF PCI address and include them in port updates to Neutron.
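For illustration, this extra information could end up in the port's
binding profile roughly as follows; the values are made up and the key
names are an assumption, not necessarily the exact ones used:

    # Hypothetical example of the data added to a port update for a
    # remote-managed VF.
    binding_profile = {
        'pci_slot': '0000:82:00.2',
        'pf_mac_address': '52:54:00:1e:59:c6',
        'vf_num': 1,
    }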
Implements: blueprint integration-with-off-path-network-backends
Change-Id: I83a128a260acdd8bf78fede566af6881b8b82a9c
If a failed resize also fails the cleanup process performed by
_cleanup_remote_migration(), retrying the resize will fail because the
current instance directory cannot be renamed to _resize.
This renames _cleanup_failed_migration(), which already does the logic
we want, to _cleanup_failed_instance_base() and uses it for both
migration and resize directory cleanup.
It then simply calls _cleanup_failed_instance_base() with the resize
directory path before trying a resize.
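The intent, very roughly (the helper name mirrors the commit message;
the body is an assumption rather than the driver's exact code):

    import os
    import shutil

    def cleanup_failed_instance_base(inst_base):
        # Illustrative only: drop a leftover directory from a previously
        # failed attempt so that a later rename to this path can succeed.
        if os.path.exists(inst_base):
            shutil.rmtree(inst_base, ignore_errors=True)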
Closes-Bug: 1960230
Change-Id: I7412b16be310632da59a6139df9f0913281b5d77
This updates the announce self workaround config opt description to
include info about the instance being marked as tainted by libvirt.
Change-Id: I8140c8fe592dd54fc09a9510723892806db49a56
This change adds
tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
to the tempest exclude regex.
Over the past few weeks we have noticed this test failing intermittently
and it has now started to become a gate blocker. This test is executed
in other jobs that use the PC machine type and is only failing in the
nova-next job, which uses q35. As such, while we work out how to address
this properly, we skip it in the nova-next job.
Change-Id: I845ca5989a8ad84d7c04971316fd892cd29cfe1f
Related-Bug: #1959899
Based on review feedback on [1] and [2].
[1] If39db50fd8b109a5a13dec70f8030f3663555065
[2] I518bb5d586b159b4796fb6139351ba423bc19639
Change-Id: I44920f20213462a3abe743ccd38b356d6490a7b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This change comes as a part of the "Off-path Networking Backends
Support" spec implementation.
https://review.opendev.org/c/openstack/nova-specs/+/787458
* Add VPD capability parsing support
* The XML data from libvirt is parsed and formatted into a PCI device
JSON dict that is sent to the Nova API and stored in the extra_info
column of a PciDevice.
The code gracefully handles the absence of the capability since it is
optional or libvirt may not support it in a particular release (the
VPD capability was added in 7.9.0, see
https://libvirt.org/news.html#v7-9-0-2021-11-01). A sketch of the
parsing idea is included after the list below.
* Pass the serial number to Neutron in port updates
If a card serial number is present based on the information from PCI
VPD, pass it to Neutron along with other PCI-related information.
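A minimal sketch of the parsing idea, assuming the libvirt node device
XML layout for the VPD capability; error handling and Nova's actual
data model are omitted:

    import xml.etree.ElementTree as etree

    def parse_card_serial(nodedev_xml):
        # The VPD capability is optional: return None when it is absent
        # so callers can cope with hosts or devices that lack it.
        root = etree.fromstring(nodedev_xml)
        serial = root.find(
            ".//capability[@type='vpd']/fields[@access='readonly']"
            "/serial_number")
        return serial.text if serial is not None else None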
Change-Id: I6445433142286728a8c7efadcf80d07082d60bc3
Implements: blueprint integration-with-off-path-network-backends
Specifying a duplicate port ID is currently "allowed" but results in an
integrity error when nova attempts to create a duplicate
'VirtualInterface' entry. Start rejecting these requests by checking for
duplicate IDs and rejecting offending requests. This is arguably an API
change because there isn't an HTTP 5xx error (server create is an async
operation); however, users shouldn't have to opt in to non-broken
behavior, and the underlying instance was never actually created
previously, meaning automation that relied on this "feature" was always
going to fail at a later step. We're also silently failing to do what
the user asked (per the flow chart at [1]).
[1] https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion
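A minimal sketch of the check, assuming a list of requested networks
with an optional port_id attribute (names are illustrative, not the
exact API-layer code):

    import collections

    def find_duplicate_port_ids(requested_networks):
        # Collect explicitly requested port IDs and return any that
        # appear more than once so the request can be rejected before
        # instance creation starts.
        port_ids = [r.port_id for r in requested_networks if r.port_id]
        counts = collections.Counter(port_ids)
        return sorted(pid for pid, count in counts.items() if count > 1)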
Change-Id: Ie90fb83662dd06e7188f042fc6340596f93c5ef9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1821088
This reverts commit 7a7a223602.
That commit was added because - tl;dr - upon revert resize, Neutron
with the OVS backend and the iptables security group driver would send
us the network-vif-plugged event as soon as we updated the port
binding.
That behaviour has changed with commit 66c7f00e1d. With that commit,
we started unplugging the vifs on the source compute host when doing a
resize. When reverting the resize, the vifs had to be re-plugged again,
regardless of the networking backend in use. This renders commit
7a7a223602 pointless, and it can be reverted.
Conflicts - most have to do with context around this commit's code:
nova/compute/manager.py
a2984b647a added provider_mappings to
_finish_revert_resize_network_migrate_finish()'s signature
750aef54b1 started using
_finish_revert_resize_network_migrate_finish() in
_finish_revert_snapshot_based_resize_at_source()
nova/network/model.py
8b33ac0644 added get_live_migration_plug_time_events() and
has_live_migration_plug_time_event()
7da94440db added has_port_with_allocation()
nova/objects/migration.py
f203da3838 added is_resize() and is_live_migration()
nova/tests/unit/compute/test_compute.py
a0e60feb3e added request_spec to the test
nova/tests/unit/compute/test_compute_mgr.py
be278006a5 added unit tests below ours
nova/tests/unit/network/test_network_info.py
7da94440db (again) added tests for has_port_with_allocation()
nova/tests/unit/virt/libvirt/test_driver.py and
nova/virt/libvirt/driver.py are different in that attempting to
identify individual conflicts is a pointless exercise, as so much has
changed (mdev, vtpm, the recent wait for events during hard reboot
workaround config option, etc). They can be treated as
manual removal of any code that had to do with the bind-time events
logic (though guided by the conflict markers in git).
TODO(artom) There was a follow up commit,
78a08d44ea, that added the migration
parameter to finish_revert_migration(). This is no longer needed, as
the migration was only used to obtain plug-time events. We'll have to
undo that as well.
Closes-bug: 1952003
Change-Id: I3cb39a9ec2c260f422b3c48122b9db512cdd799b
We have a gap in our testing of the external events interaction between
Nova and Neutron. The nova-next job tests with the OVS network
backend, and Neutron has jobs that test the OVN network backend, but
nothing tests OVS + the iptables security group firewall driver, aka
"hybrid plug". Add a job to test that.
Related-bug: 1952003
Change-Id: Ie42eaa2a39ef097b0eb69b8863bb342bae007fff
We no longer have a Xen driver. This is an unnecessary dependency.
Change-Id: Ic298fa9ac4a8935ce4e0dc17d8842d399d4eb808
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We recently added a hard failure to nova service startup for the case
where computes were more than one version old (as indicated by their
service record). This helps to prevent starting up new control
services when a very old compute is still running. However, during an
FFU, control services that have skipped multiple versions will be
started and find the older compute records (which could not be updated
yet due to their reliance on the control services being up) and refuse
to start. This creates a cross-dependency which is not resolvable
without hacking the database.
This patch adds a workaround flag to allow turning that hard fail into
a warning to proceed past the issue. This less-than-ideal solution
is simple and backportable, but perhaps a better solution can be
implemented in the future.
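A rough sketch of the intended behaviour (the option and error names
here are assumptions that mirror the description above, not necessarily
the exact ones added):

    def check_old_computes(oldest_version, min_supported, conf, log):
        # With the workaround enabled, demote the hard startup failure
        # to a warning so control services can start during an FFU.
        if oldest_version >= min_supported:
            return
        if conf.workarounds.disable_compute_service_check_for_ffu:
            log.warning('Ignoring too-old compute service record(s) so '
                        'that the fast-forward upgrade can proceed.')
        else:
            raise RuntimeError('Existing compute services are too old; '
                               'refusing to start.')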
Related-Bug: #1958883
Change-Id: Iddbc9b2a13f19cea9a996aeadfe891f4ef3b0264