This commit introduces the LightOS driver for nova. LightOS is a
software-defined disaggregated clustered storage solution running on
commodity servers with commodity SSDs. It is developed and actively
maintained by Lightbits Labs (https://www.lightbitslabs.com).
LightOS is proprietary, but the openstack drivers are licensed
under Apache v2.0.
The Cinder driver for LightOS currently supports the following
functionality:
Create volume
Delete volume
Attach volume
Detach volume
Create image from volume
Create volume from image
Live migration
Volume replication
Thin provisioning
Multi-attach
Extend volume
Create snapshot
Delete snapshot
Create volume from snapshot
Create volume from volume (clone)
This driver has been developed and in production use for a couple of
years by Lightbits and our clients. We have tested it extensively
internally with multiple openstack versions, including Queens, Rocky,
Stein, and Train. We have also tested it with master (19.1 xena) and
are working to extend testing to cover additional openstack releases.
We are glad to join the openstack community and hope to get your
feedback and comments on this driver, and if it is acceptable, to see
it merged into the tree.
Note: the patch depends on os-brick 5.2.0. That version also increased
the lower constraints of several dependencies, so nova needs to
increase those as well in requirements.txt, lower-constraints.txt and
setup.cfg.
Depends-On: I2e86fa84049053b7c75421d33ad1a1af459ef4e0
Signed-off-by: Yuval Brave <yuval@lightbitslabs.com>
Change-Id: Ic314b26695d9681d31a18adcec0794c2ff41fe71
Check the feature list we get from the firmware descriptor file
to see whether SMM is required (requires-smm) and, if so, enable
it, since we aren't using libvirt's built-in mechanism to enable
it when selecting the right firmware.
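The descriptor check can be sketched as follows (the JSON layout and
the needs_smm helper are illustrative assumptions, not nova's actual
code):

```python
import json

def needs_smm(descriptor_json: str) -> bool:
    """Return True if the firmware descriptor lists the
    'requires-smm' feature (illustrative descriptor layout)."""
    desc = json.loads(descriptor_json)
    return "requires-smm" in desc.get("features", [])

# Hypothetical descriptor for a SecureBoot-capable firmware build.
descriptor = '{"features": ["secure-boot", "requires-smm"]}'
# Enable SMM explicitly in the guest config, since we are not
# relying on libvirt's built-in firmware auto-selection to do it.
smm_enabled = needs_smm(descriptor)
```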
Closes-Bug: 1958636
Change-Id: I890b3021a29fa546d9e36b21b1111e8537cd0020
Signed-off-by: Imran Hussain <ih@imranh.co.uk>
Currently, all ports attached to an instance must have a fixed IP
address already associated with them ('immediate' IP allocation policy)
or must get one during instance creation ('deferred' IP allocation
policy). However, there are situations where it can be helpful to create
a port without an IP address, for example, when there is an IP address
but it is not managed by neutron (this is unfortunately quite common for
certain NFV applications). The 'vm-without-l3-address' neutron blueprint
[1] added support for these kinds of ports, but until now, nova still
insisted on either a pre-existing IP assignment or deferred IP
assignment. Close the gap and allow nova to use these ports.
Thanks to I438cbab43b45b5f7afc820b77fcf5a0e823d0eff we no longer need to
check after binding to ensure we're on a backend that has
'connectivity' of 'l2'.
[1] https://specs.openstack.org/openstack/neutron-specs/specs/newton/unaddressed-port.html
Change-Id: I3c49f151ff1391e0a72c073d0d9c24e986c08938
Implements-blueprint: vm-boot-with-unaddressed-port
While most of the SR-IOV related documentation resides in the Neutron
repository which is going to have a separate section on the topic of
supporting remote-managed ports and off-path networking backends, there
are still some things specific to Nova which are worth documenting in
Nova docs.
https://docs.openstack.org/neutron/latest/admin/config-sriov.html
Implements: blueprint integration-with-off-path-network-backends
Change-Id: I3c5fe8ec0539e10d07b1b4888e9833bc7ede1d04
This was eventually added in Yoga, not Xena.
Change-Id: I8afe755732c95d023b7c4bd99964507f54d324f1
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Mostly copy-paste from the spec, but at least this is in-tree and
updatable.
Change-Id: I4cad2111065fbc1840d44fc9f4bf6ac585e18db6
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
When trying to attach a volume to an already running instance, the nova-api
requests that the nova-compute service create a BlockDeviceMapping. If the
nova-api does not receive a response within `rpc_response_timeout`, it will
treat the request as failed and raise an exception.
There are multiple cases where nova-compute actually already processed the
request and just the reply did not reach the nova-api in time (see bug report).
After the failed request the database will contain a BlockDeviceMapping entry
for the volume + instance combination that will never be cleaned up again.
This entry also causes the nova-api to reject all future attachments of this
volume to this instance (as it assumes it is already attached).
To work around this we check if a BlockDeviceMapping has already been created
when we see a messaging timeout. If this is the case we can safely delete it
as the compute node has already finished processing and we will no longer pick
it up.
This allows users to try the request again.
A previous fix ([1]) was abandoned without a clear reason.
[1]: https://review.opendev.org/c/openstack/nova/+/731804
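The workaround can be sketched like this (FakeComputeAPI, its methods,
and attach_volume are illustrative stand-ins for the real nova
RPC/database plumbing, not the actual implementation):

```python
class MessagingTimeout(Exception):
    """Stand-in for oslo.messaging's MessagingTimeout."""

class FakeComputeAPI:
    """Minimal stand-in: the RPC times out even though the compute
    side already created the BlockDeviceMapping entry."""
    def __init__(self):
        self.bdms = {("inst-1", "vol-1"): {"attached": True}}

    def rpc_attach(self, instance_id, volume_id):
        raise MessagingTimeout()

    def get_bdm(self, instance_id, volume_id):
        return self.bdms.get((instance_id, volume_id))

    def delete_bdm(self, instance_id, volume_id):
        self.bdms.pop((instance_id, volume_id), None)

def attach_volume(api, instance_id, volume_id):
    """On an RPC timeout, delete the leftover BDM so a retry of
    the attach is not rejected as 'already attached'."""
    try:
        api.rpc_attach(instance_id, volume_id)
    except MessagingTimeout:
        # The compute node already finished processing; the record
        # will never be picked up again, so it is safe to remove.
        if api.get_bdm(instance_id, volume_id) is not None:
            api.delete_bdm(instance_id, volume_id)
        raise
```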
Closes-Bug: 1960401
Change-Id: I17f4d7d2cb129c4ec1479cc4e5d723da75d3a527
Allow instances to be created with VNIC_TYPE_REMOTE_MANAGED ports.
Those ports are assumed to require remote-managed PCI devices which
means that operators need to tag those as "remote_managed" in the PCI
whitelist if this is the case (there is no meta information or standard
means of querying this information).
The following changes are introduced:
* Handling for VNIC_TYPE_REMOTE_MANAGED ports during allocation of
resources for instance creation (remote_managed == true in
InstancePciRequests);
* Usage of the noop os-vif plugin for VNIC_TYPE_REMOTE_MANAGED ports
in order to avoid the invocation of the local representor plugging
logic since a networking backend is responsible for that in this
case;
* Expectation of bind-time events for ports of VNIC_TYPE_REMOTE_MANAGED.
Events for those arrive early from Neutron after a port update (before
Nova begins to wait in the virt driver code); therefore, Nova is set
to avoid waiting for plug events for VNIC_TYPE_REMOTE_MANAGED ports;
* Making sure the service version is high enough on all compute services
before creating instances with ports that have VNIC type
VNIC_TYPE_REMOTE_MANAGED. Network requests are examined for the presence
of port ids to determine the VNIC type via Neutron API. If
remote-managed ports are requested, a compute service version check
is performed across all cells.
Change-Id: Ica09376951d49bc60ce6e33147477e4fa38b9482
Implements: blueprint integration-with-off-path-network-backends
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports in them: hosts that do not have either the relevant compute
driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools
with "remote_managed" devices are filtered out early. Presence of
devices actually available for allocation is checked at a later
point by the PciPassthroughFilter.
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
In order to support remote-managed ports the following is needed:
* Nova compute driver needs to support this feature;
* For the Libvirt compute driver, a given host needs to have the right
version of Libvirt - the one which supports PCI VPD (7.9.0
https://libvirt.org/news.html#v7-9-0-2021-11-01).
Therefore, this change introduces a new capability to track driver
support for remote-managed ports.
Change-Id: I7ea96fd85d2607e0af0f6918b0b45c58e8bec058
The new version contains changes needed by the multi-architecture
support and off-path SmartNIC DPU support code.
Needed-By: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Needed-By: Ia070a29186c6123cf51e1b17373c2dc69676ae7c
Change-Id: Ic1179f3e5e2c1aeb069972f21edffe5b003eb525
PCI devices may be managed remotely from the perspective of a hypervisor
host (e.g. by a SmartNIC DPU) which means that the VF control plane is
not available to the hypervisor. Depending on the presence of a
remote_managed device attribute in the InstancePCIRequest spec and
available device types in a pool, additional processing needs to be
done:
* Filtering of devices marked as `remote_managed: "true"` in the
whitelist configuration so that they are not used in legacy SR-IOV
and hardware offload requests;
* Early error reporting if PFs marked as remote_managed="true" are
present in the whitelist configuration. This is not supported
explicitly since allocating such PFs would remove the associated
VFs from the pool and an instance with such PF and its VFs will
not have access to the control plane required for representor
interface plugging at the SmartNIC DPU side. This configuration
is not valid, which is enforced in the PciDeviceStats code.
* Checking of the presence of a card serial number in the PCI VPD
capability of a device if it was marked as `remote_managed: "true"`
in the whitelist. The card serial number presence is mandatory
because it is used for identification of a host in the networking
backend that will handle the configuration of a given PCI device at
the remote host side (i.e. representor plugging, flow programming).
For compatibility, all devices not explicitly marked as remote_managed
in the whitelist are assumed to have remote_managed attribute set to
False.
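The whitelist handling described above can be sketched roughly as
follows (partition_whitelist and the spec dict layout are illustrative
assumptions, not nova's actual pci code):

```python
def _is_remote_managed(dev_spec):
    # For compatibility, an absent attribute defaults to False.
    return str(dev_spec.get("remote_managed", "false")).lower() == "true"

def partition_whitelist(dev_specs):
    """Split whitelist entries into remote-managed and local pools,
    rejecting remote-managed PFs outright."""
    remote, local = [], []
    for spec in dev_specs:
        if _is_remote_managed(spec):
            if spec.get("dev_type") == "type-PF":
                # Allocating such a PF would strip its VFs from the
                # pool while the instance has no control plane for
                # representor plugging on the DPU side.
                raise ValueError("remote_managed PFs are not supported")
            remote.append(spec)
        else:
            local.append(spec)
    return remote, local
```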
Implements: blueprint integration-with-off-path-network-backends
Change-Id: Ic44d5e206326827d00a751da3cea67afe3929a08
For some reason, we have two lineages of quota-related exceptions in
Nova. We have QuotaError (which sounds like an actual error), from
which all of our case-specific "over quota" exceptions inherit, such
as KeypairLimitExceeded, etc. In contrast, we have OverQuota which
lives outside that hierarchy and is unrelated. In a number of places,
we raise one and translate to the other, or raise the generic
QuotaError to signal an overquota situation, instead of OverQuota.
This leads to places where we have to catch both, signaling the same
over quota situation, but looking like there could be two different
causes (i.e. an error and being over quota).
This joins the two cases, by putting OverQuota at the top of the
hierarchy of specific exceptions and removing QuotaError. The latter
was only used in a few situations, so this isn't actually much change.
Cleaning this up will help with the unified limits work, reducing the
number of potential exceptions that mean the same thing.
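The resulting hierarchy looks roughly like this (a minimal sketch;
only the class relationship is taken from the text above, the helper
function is hypothetical):

```python
class OverQuota(Exception):
    """Single root for all over-quota exceptions after this change."""

class KeypairLimitExceeded(OverQuota):
    """Case-specific exception; previously inherited QuotaError."""

def create_keypair(count, limit):
    # Hypothetical caller that signals an over-quota situation.
    if count >= limit:
        raise KeypairLimitExceeded("Maximum number of key pairs exceeded")

# Callers now need only one except clause for any over-quota case:
try:
    create_keypair(10, 10)
except OverQuota as exc:
    caught = type(exc).__name__
```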
Related to blueprint bp/unified-limits-nova
Change-Id: I17a3e20b8be98f9fb1a04b91fcf1237d67165871
Note: 'cells' below means NUMA cells.
By default, an instance's first cell is placed on the host's cell
with id 0, so that cell is exhausted first. Then the host's cell
with id 1 is used and exhausted, and so on. This leads to an error
when placing an instance whose NUMA topology has as many cells as
the host if some single-cell instances were placed on cell 0
beforehand. The fix performs several sorts, based on PCI device,
memory and CPU usage, to put the least-used host cells at the
beginning of the host_cells list when
packing_host_numa_cells_allocation_strategy is set to False (the
'spread' strategy); or it tries to place all of the VM's cells on
the same host cell until it is completely exhausted, and only then
starts using the next available host cell (the 'pack' strategy),
when packing_host_numa_cells_allocation_strategy is set to True.
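The spread strategy amounts to ordering host cells by usage; a
simplified sketch (field names are illustrative assumptions, and the
several stable sorts of the actual fix are approximated here by a
single tuple-key sort):

```python
def sort_host_cells(host_cells, pack=False):
    """Order host NUMA cells for instance placement.

    spread (pack=False): least-used cells first, considering PCI
    device, memory and CPU usage, so instances spread across cells.
    pack (pack=True): most-used cells first, so each cell is
    exhausted before the next one is touched.
    """
    def usage(cell):
        return (cell["pci_used"], cell["mem_used"], cell["cpu_used"])
    return sorted(host_cells, key=usage, reverse=pack)

cells = [
    {"id": 0, "pci_used": 2, "mem_used": 4096, "cpu_used": 6},
    {"id": 1, "pci_used": 0, "mem_used": 1024, "cpu_used": 2},
]
spread_order = [c["id"] for c in sort_host_cells(cells)]
pack_order = [c["id"] for c in sort_host_cells(cells, pack=True)]
```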
Partial-Bug: #1940668
Change-Id: I03c4db3c36a780aac19841b750ff59acd3572ec6
Retrieve the PF MAC address and the VF logical number at runtime for
a given VF PCI address and include them in port updates to Neutron.
Implements: blueprint integration-with-off-path-network-backends
Change-Id: I83a128a260acdd8bf78fede566af6881b8b82a9c
If there is a failed resize that also failed the cleanup
process performed by _cleanup_remote_migration(), a retry
of the resize will fail because it cannot rename the current
instance directory to _resize.
This renames _cleanup_failed_migration(), which performs the
same logic we want, to _cleanup_failed_instance_base() and
uses it for both migration and resize cleanup of the directory.
It then simply calls _cleanup_failed_instance_base() with
the resize dir path before attempting a resize.
Closes-Bug: 1960230
Change-Id: I7412b16be310632da59a6139df9f0913281b5d77
This updates the announce self workaround config opt
description to include info about instance being set
as tainted by libvirt.
Change-Id: I8140c8fe592dd54fc09a9510723892806db49a56
This change adds
tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
to the tempest exclude regex.
Over the past few weeks we have noticed this test failing intermittently,
and it has now started to become a gate blocker. This test is executed in
other jobs that use the PC machine type and is only failing in the nova-next
job, which uses q35. As such, while we work out how to address this properly,
we skip it in the nova-next job.
Change-Id: I845ca5989a8ad84d7c04971316fd892cd29cfe1f
Related-Bug: #1959899
Based on review feedback on [1] and [2].
[1] If39db50fd8b109a5a13dec70f8030f3663555065
[2] I518bb5d586b159b4796fb6139351ba423bc19639
Change-Id: I44920f20213462a3abe743ccd38b356d6490a7b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Virtually all of the code for parsing 'hw:'-prefixed extra specs and
'hw_'-prefix image metadata properties lives in the 'nova.virt.hardware'
module. It makes sense for these to be included there. Do that.
Change-Id: I1fabdf1827af597f9e5fdb40d5aef244024dd015
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>