Previously, if _ensure_resource_provider encountered any error from the
placement REST API, it would (sometimes log a message and) return None.
Furthermore, a name conflict while creating the provider was treated the
same as a UUID conflict, which would actually result in None being
returned.
With this change set, the error paths that previously returned None now
raise one of the new ResourceProviderRetrievalFailed or
ResourceProviderCreationFailed exceptions; and the name conflict path is
detected and treated as an error condition.
Note: This change set only touches the SchedulerReportClient side of
these error conditions - it makes no attempt to add error handling to
its callers. Case in point, the API samples tests needed fixing because
they were previously running into the name conflict error condition, but
not noticing. As currently implemented, the new exceptions will
percolate up to ComputeManager.update_available_resource_for_node like
any others coming from SchedulerReportClient, where they will be logged
and ignored.
Change-Id: I0c4ca6a81f213277fe7219cb905a805712f81e36
Closes-Bug: #1735430
Remove log translation macros from nova/scheduler/client/report.py.
Also correct a couple of docstring typos.
Change-Id: I8d6ab08781fd132afc4db9da4c4c179617d6659f
This function enables users to specify a target host
when cold migrating a VM instance.
This patch modifies the migration API.
APIImpact
Add an optional parameter 'host' in cold migration action.
Change-Id: Iee356c4dd097c846b6ca8617ead6a061300c83f8
Implements: blueprint cold-migration-with-target-queens
Old pci_devices records might not have a uuid value set
and when we load those out of the database, the
PciDevice._from_db_object code was blindly trying to set
the PciDevice.uuid field to None, which fails because the
PciDevice.uuid field is not nullable.
This change fixes the problem by skipping the 'uuid' field
if it's not set in the db record so that we can auto-generate
a uuid later and update the object with it, which also performs
our online data migration.
This is similar to how we handle the uuid online migration for
other objects like compute nodes, services and migrations.
Change-Id: I5de0979e280004c1ce0acc99d69cc96089a704f8
Closes-Bug: #1735188
When a user calls the volume-update API, we swap_volume in the libvirt
driver from the old volume attachment to the new volume attachment.
Currently, we're saving the domain XML with the old configuration prior
to updating the volume and upon a soft-reboot request, it results in an
error:
Instance soft reboot failed: Cannot access storage file <old path>
and falls back to a hard reboot, which is like pulling the power cord,
possibly resulting in file system inconsistencies.
This changes to saving the new, updated domain XML after the volume
swap.
Closes-Bug: #1713857
Change-Id: I166cde5ad8b00699e4ec02609f0d7b69236d855d
Adds initial support for storing the relationship between parent and
child resource providers. Nested resource providers are essential for
expressing certain types of resources -- in particular SR-IOV physical
functions and certain SR-IOV fully-programmable gate arrays. The
resources that these providers expose are of resource class
SRIOV_NET_VF and we will need a way of indicating that the physical
function providing these virtual function resources is tagged with
certain traits (representing vendor_id, product_id or the physical
network the PF is attached to).
The compute host is a resource provider which has an SR-IOV-enabled
physical function (NIC) as a child resource provider. The physical
function has an inventory containing some total amount of SRIOV_NET_VF
resources. These SRIOV_NET_VF resources are allocated to zero or more
consumers (instances) on the compute host.
compute host (parent resource provider)
|
|
SR-IOV PF (child resource provider)
:
/ \
/ \
VF1 VF2 (inventory of child provider)
The resource provider model gets two new fields:
- root_provider_uuid: The "top" or "root" of the tree of nested
providers
- parent_provider_uuid: The immediate parent of the provider, or None
if the provider is a root provider.
The database schema adds two new columns to the resource_providers
table that contain the internal integer IDs that correspond to the
user-facing UUID values:
- root_provider_id
- parent_provider_id
The root_provider_uuid field is non-nullable in the ResourceProvider
object definition, and this code includes an online data migration to
automatically populate the root_provider_id field with the value of the
resource_providers.id field for any resource providers already in the
DB.
The root_provider_id field value is populated automatically when a
provider is created. If the parent provider UUID is set, then the
root_provider_id is set to the root_provider_id value of the parent. If
parent is unset, root_provider_id is set to the value of the id
attribute of the provider being created. The corresponding UUID values
for root and parent provider are fetched in the queries that retrieves
resource provider data using two self-referential joins.
The root_provider_id column allows us to do extremely quick lookups of
an entire tree of providers without needing to perform any recursive
database queries.
Logic in this patch ensures that no resource provider can be deleted if
any of its children have any allocations active on them. We also check
to ensure that when created or updated, a resource provider's parent
provider UUID actually points to an existing provider.
It's important to point out that qualitative trait information is only
associated with a resource provider entity, not the resources that
resource provider has in its inventory. This is the reason why nested
resource providers are necessary. In the case of things like NUMA nodes
or SRIOV physical functions, if a compute host had multiple SRIOV
physical functions, each associated with a different network trait,
there would be no way to differentiate between the SRIOV_NET_VF
resources that those multiple SRIOV physical functions provided if the
containing compute host had a single inventory item containing the
total number of VFs exposed by both PFs.
Change-Id: I2d8df57f77a03cde898d9ec792c5d59b75f61204
blueprint: nested-resource-providers
Co-Authored-By: Moshe Levi <moshele@mellanox.com>
The TrustedFilter and the related trusted_computing config options
were deprecated in Pike:
If6e53feeb97e6050c1eb7962110ed89504c952fc
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Change-Id: I0a7ab3a4fb2cfad567a8644bed4de574393ee11a
In the review of I49f5680c15413bce27f2abba68b699f3ea95dcdc, a few
non-blocking nits were identified. This change addresses some of
those nits, fixing some typos, clarifying method names and what
microversion is in use at particular times.
Change-Id: Iff15340502ce43eba3b98db26aa0652b1da24504
This provides microversion 1.13 of the placement API, giving the
ability to POST to /allocations to set (or clear) allocations for
more than one consumer uuid.
It builds on the recent work to support a dict-based JSON format
when doing a PUT to /allocations/{consumer_uuid}.
Being able to set allocations for multiple consumers in one request
helps to address race conditions when cleaning up allocations during
move operations in nova.
Clearing allocations is done by setting the 'allocations' key for a
specific consumer to an empty dict.
Updates to placement-api-ref, rest version history and a reno are
included.
Change-Id: I239f33841bb9fcd92b406f979674ae8c5f8d57e3
Implements: bp post-allocations
We currently don't record lock/unlock instance
actions. This is useful for auditing and debugging.
This patch adds instance lock/unlock actions.
Change-Id: I09fadf79aac1a74465af48015ef97d9e9d4ac580
partial-implements: blueprint fill-the-gap-for-instance-action-records
We currently don't record volume attach/detach/swap instance
actions. This is useful for auditing and debugging.
This patch adds volume attach/detach/swap actions.
Change-Id: I0a3d15f3e3d0d8d920a79b519e17e3228e99f293
partial-implements: blueprint fill-the-gap-for-instance-action-records
Commit 984dd8ad6a makes a rebuild
with a new image go through the scheduler again to validate the
image against the instance.host (we rebuild to the same host that
the instance already lives on).
The problem is that change introduced a regression where the
FilterScheduler is going to think it's doing a resize to the same
host and double the allocations for the instance (and usage for the
compute node provider) in Placement, which is wrong since the
flavor is the same.
This adds a regression test to show the bug.
Change-Id: Ie0949b4e6101f0b29ec4542146d523a07a683991
Related-Bug: #1664931
This aims to fix the issue described in bug 1664931 where a rebuild
fails to validate the existing host with the scheduler when a new
image is provided. The previous attempt to do this could cause rebuilds
to fail unnecessarily because we ran _all_ of the filters during a
rebuild, which could cause usage/resource filters to prevent an otherwise
valid rebuild from succeeding.
This aims to classify filters as useful for rebuild or not, and only apply
the former during a rebuild scheduler check. We do this by using an internal
scheduler hint, indicating our intent. This should (a) filter out
all hosts other than the one we're running on and (b) be detectable by
the filtering infrastructure as an internally-generated scheduling request
in order to trigger the correct filtering behavior.
Closes-Bug: #1664931
Change-Id: I1a46ef1503be2febcd20f4594f44344d05525446