Commit Graph

59884 Commits

Author SHA1 Message Date
Zuul a28afc29db Merge "Ignore PCI devs with physical_network tag" 2022-08-26 04:51:12 +00:00
Zuul 8d202c13a0 Merge "Reject mixed VF rc and trait config" 2022-08-26 04:46:36 +00:00
Zuul e55e3657ea Merge "Reject PCI dependent device config" 2022-08-26 04:42:58 +00:00
Zuul baef27f504 Merge "Extend device_spec with resource_class and traits" 2022-08-25 22:54:02 +00:00
Zuul ff9041e962 Merge "Basics for PCI Placement reporting" 2022-08-25 22:33:22 +00:00
Zuul ccc06ac808 Merge "Trigger reschedule if PCI consumption fail on compute" 2022-08-25 18:06:28 +00:00
Zuul 9316a156d4 Merge "Reproduce bug 1986838" 2022-08-25 12:23:06 +00:00
Zuul 76d95d5efc Merge "Keep legacy admin behaviour in new RBAC" 2022-08-25 09:12:16 +00:00
Balazs Gibizer 10ba714125 Ignore PCI devs with physical_network tag
The first version of the PCI tracking in placement feature will not
handle Neutron based SRIOV devices. So those are now ignored during
placement inventory reporting.

blueprint: pci-device-tracking-in-placement
Change-Id: Ie24969d60c84379673c5450863f4cf58cf09207c
2022-08-25 10:00:10 +02:00
Balazs Gibizer 07f2bf8035 Reject mixed VF rc and trait config
If two VFs from the same PF are configured by two separate
[pci]device_spec entries then it is possible to define contradicting
resource classes or traits. This patch detects and rejects such
configuration.

blueprint: pci-device-tracking-in-placement
Change-Id: I623ab24940169991a400eba854c9619a11662a91
2022-08-25 10:00:10 +02:00
Balazs Gibizer 0d526d1f4b Reject PCI dependent device config
The PCI tracking in placement does not support the configuration where
both a PF and its children VFs are configured for nova usage. This patch
adds logic to detect and reject such configuration. To be able to kill
the service if it is started with such a config, special exception
handling is added to the update_available_resource code path, similar
to how a failed reshape is handled.

blueprint: pci-device-tracking-in-placement
Change-Id: I708724465d2afaa37a65c231c64da88fc8b458eb
2022-08-25 10:00:10 +02:00
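
The PF/VF check described in commit 0d526d1f4b can be pictured with a minimal sketch. The function name and the dictionary layout (an "address" key plus an optional "parent_addr" key) are assumptions made for illustration, not Nova's actual data model.

```python
def find_dependent_configs(devices):
    """Return addresses of VFs whose parent PF is also configured.

    Hypothetical helper: each device is a dict with an "address" and,
    for VFs, a "parent_addr" pointing at the PF.
    """
    configured = {dev["address"] for dev in devices}
    return sorted(
        dev["address"]
        for dev in devices
        if dev.get("parent_addr") in configured
    )


devices = [
    {"address": "0000:81:00.0"},                                 # PF
    {"address": "0000:81:00.1", "parent_addr": "0000:81:00.0"},  # its VF
    {"address": "0000:82:00.1", "parent_addr": "0000:82:00.0"},  # VF only
]
conflicts = find_dependent_configs(devices)
```

A non-empty result would correspond to the configuration the patch rejects.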
Balazs Gibizer 2722038946 Extend device_spec with resource_class and traits
Each [pci]device_spec entry can specify the two new resource_class and
traits tags.

If the resource_class is specified then it will be used as the RC in the
placement inventory of the PCI devices matching the spec. If not
specified then the RC defaults to CUSTOM_PCI_<vendor_id>_<product_id>.

The traits tag is a comma separated list of trait names. Nova will
report these traits to the RP representing the matching PCI devices.

blueprint: pci-device-tracking-in-placement
Change-Id: I71b7a2fb8b03a3679733a98958b2f6d447ed5004
2022-08-25 10:00:10 +02:00
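
As a rough illustration of the defaulting rule in commit 2722038946, the sketch below derives a resource class and a trait list from a device_spec-like dictionary. The helper name `placement_tags_for_spec` and the dictionary layout are hypothetical, and upper-casing the IDs is an assumption; Nova's real normalization may differ.

```python
def placement_tags_for_spec(spec):
    """Return (resource_class, traits) for a device_spec-like dict.

    Hypothetical helper for illustration only, not Nova code.
    """
    rc = spec.get("resource_class")
    if not rc:
        # Default described in the commit message:
        # CUSTOM_PCI_<vendor_id>_<product_id>
        rc = "CUSTOM_PCI_{}_{}".format(
            spec["vendor_id"].upper(), spec["product_id"].upper()
        )
    # The traits tag is a comma separated list of trait names.
    raw = spec.get("traits", "")
    traits = [name.strip() for name in raw.split(",") if name.strip()]
    return rc, traits
```

With neither tag set, a spec for vendor 8086 and product 1563 would map to CUSTOM_PCI_8086_1563 and an empty trait list.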
Balazs Gibizer 953f1eef19 Basics for PCI Placement reporting
A new PCI resource handler is added to the update_available_resource
code path to update the ProviderTree with PCI device RPs, inventories
and traits.

It is a bit different from the other Placement inventory reporters. It
does not run in the virt driver level as PCI is tracked in a generic way
in the PCI tracker in the resource tracker. So the virt specific
information is already parsed and abstracted by the resource tracker.

Another difference is that, to support rolling upgrade, the PCI handler
code needs to be prepared for situations where the scheduler does not
create PCI allocations even after some of the computes have already
started reporting inventories and healing PCI allocations. So the code
is not prepared to do a single, one shot, reshape at startup, but
instead to do a continuous healing of the allocations. We can remove
this continuous healing once the PCI prefilter is made mandatory in a
future release.

The whole PCI placement reporting behavior is disabled by default while
it is incomplete. When it is functionally complete a new
[pci]report_in_placement config option will be added to allow enabling
the feature. This config is intentionally not added by this patch as we
don't want to allow enabling this logic yet.

blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
2022-08-25 10:00:10 +02:00
Balazs Gibizer 2b447b7236 Trigger reschedule if PCI consumption fail on compute
The PciPassthroughFilter logic checks each InstancePCIRequest
individually against the available PCI pools of a given host and given
boot request. So it is possible that the scheduler accepts a host that
has a single PCI device available even if two devices are requested for
a single instance via two separate PCI aliases. Then the PCI claim on
the compute detects this but does not stop the boot; it just logs an
ERROR. This results in the instance being booted without any PCI device.

This patch does two things:
1) changes the PCI claim to fail with an exception and trigger a
   re-schedule instead of just logging an ERROR.
2) changes PciDeviceStats.support_requests, which is called during
   scheduling, to not just filter pools for individual requests but also
   consume the request from the pool within the scope of a single boot
   request.

The fix in #2) would not be enough alone as two parallel scheduling
requests could race for a single device on the same host. #1) is the
ultimate place where we consume devices under a compute global lock so
we need the fix there too.

Closes-Bug: #1986838
Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
2022-08-25 10:00:10 +02:00
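
The scheduling problem fixed in commit 2b447b7236 can be reproduced with a toy model. This is a simplified sketch, not the real PciDeviceStats logic: a host is reduced to a count of free devices and each request to the number of devices it asks for. Both function names are hypothetical.

```python
def supports_individually(free_devices, requests):
    """Old behaviour: check each request alone against the same pool."""
    return all(count <= free_devices for count in requests)


def supports_with_consumption(free_devices, requests):
    """New behaviour: consume devices as each request is matched."""
    remaining = free_devices
    for count in requests:
        if count > remaining:
            return False
        remaining -= count
    return True


# One free device, two aliases requesting one device each.
print(supports_individually(1, [1, 1]))      # True
print(supports_with_consumption(1, [1, 1]))  # False
```

The first check accepts the host because each request fits on its own. The second rejects it because the first request already used the only device.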
Balazs Gibizer 2aeb0a96b7 Reproduce bug 1986838
Related-Bug: #1986838
Change-Id: I374b21fafff1a2f359d3cf887a9c271449f83635
2022-08-25 10:00:10 +02:00
Zuul 3862cfc649 Merge "Add VDPA support for suspend and livemigrate" 2022-08-24 21:22:53 +00:00
Ghanshyam Mann 909b0b0247 Keep legacy admin behaviour in new RBAC
While discussing the new RBAC (scope_type and project admin vs
system admin things) with operators at the Berlin ops meetup,
via emails, and in policy popup meetings, we got the feedback that
we need to keep the legacy admin behaviour the same as it is;
otherwise it is going to be a big breaking change for many of the
operators. We got the same feedback for scope_type.

- https://etherpad.opendev.org/p/BER-2022-OPS-SRBAC
- https://etherpad.opendev.org/p/rbac-operator-feedback

Considering the feedback, we decided to postpone the
system scope implementation, release the project reader
role, and not change the legacy admin behaviour.

To keep the legacy admin behaviour unchanged, we need to
modify our new policy defaults so that legacy admins continue
to have access to the APIs they are able to access in the
old RBAC. Basically the below changes:

- PROJECT_ADMIN -> ADMIN (legacy admin who can do things in all projects)
- PROJECT_MEMBER -> PROJECT_MEMBER_OR_ADMIN (give access to legacy admin too)
- PROJECT_READER -> PROJECT_READER_OR_ADMIN (give access to legacy admin too)

The complete direction on RBAC is updated in the community-wide goal:
- https://review.opendev.org/c/openstack/governance/+/847418/13

Change-Id: I37e706f75a36fb27da1bdd5fba671cb1bcadc745
2022-08-24 16:33:27 +00:00
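
The effect of the new defaults in commit 909b0b0247 can be sketched in plain Python. This is only an approximation of the intent stated in the message. It is not oslo.policy check-string syntax, and the function names and arguments are hypothetical.

```python
def project_member_or_admin(roles, token_project_id, target_project_id):
    """Sketch of PROJECT_MEMBER_OR_ADMIN: legacy admin passes everywhere."""
    if "admin" in roles:
        return True
    return "member" in roles and token_project_id == target_project_id


def project_reader_or_admin(roles, token_project_id, target_project_id):
    """Sketch of PROJECT_READER_OR_ADMIN: reader is limited to its project."""
    if "admin" in roles:
        return True
    return "reader" in roles and token_project_id == target_project_id
```

Under this model a legacy admin token scoped to project A can still act on resources in project B, while a member or reader token cannot.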
Dan Smith 066e1e69d1 Remove system scope from all APIs
In line with the recent RBAC working group discussion and operator
feedback, this converts all our APIs back to project-only. It leaves
the actual scope_types in place, with them all set to project. This
allows an operator to turn on scope checking to *ensure* that only
project-scoped tokens are used, in case system scope is in use
elsewhere in the deployment (i.e. for keystone or ironic). Without
this, system scoped tokens will fail some operations in strange
(read: 500 and "database error") ways.

Change-Id: I951a11affa1d1e42863967cdc713618ff0a74814
2022-08-24 13:12:16 +00:00
Zuul 9a82a90993 Merge "Fix suspend for non hostdev sriov ports" 2022-08-24 12:20:52 +00:00
Zuul 5875777196 Merge "Add source dev parsing for vdpa interfaces" 2022-08-24 12:20:45 +00:00
Zuul 94065763d3 Merge "nova-live-migration tests not needed for Ironic" 2022-08-23 18:47:50 +00:00
Zuul 3c7d6e4c9c Merge "Revert "Test attached volume extend actions in the nova-next job"" 2022-08-23 18:08:09 +00:00
Zuul cbc9b516fb Merge "Alphabetizes objects" 2022-08-23 10:43:41 +00:00
Balazs Gibizer 0f82a6465a Revert "Test attached volume extend actions in the nova-next job"
This reverts commit a669f9150a.

Reason for revert: The test is unstable. This patch actually failed the check queue one 877f0e19083e47968b3fc12cae82eba3 but was rechecked and merged. The enabled testcase is now failing in the gate frequently: https://paste.opendev.org/show/bDiRRVFsYa5MAND6Bhjm/

Change-Id: Ieb1ca2ea647c25e64e5a9f72402195469349695a
2022-08-23 10:34:18 +00:00
Sean Mooney 0aad338b1c Add VDPA support for suspend and livemigrate
This change appends vnic-type vdpa to the list
of passthrough vnic types and removes the API blocks.

This should enable the existing suspend and live migrate
code to properly manage vdpa interfaces enabling
"hot plug" live migrations similar to direct sr-iov.

Implements: blueprint vdpa-suspend-detach-and-live-migrate
Change-Id: I878a9609ce0d84f7e3c2fef99e369b34d627a0df
2022-08-23 09:32:00 +01:00
Sean Mooney 51a970af37 Fix suspend for non hostdev sriov ports
Change I3a45b1fb41e8e446d1f25d7a1d77991c8bf2a1ed
tried to fix bug #1563874 by using _detach_pci_device
to remove hostdev PCI devices. However, that breaks
other use cases, so we attempt to fix that by only
calling _detach_pci_device for devices it can
handle and using detach_interface for the rest.

Related-bug: #1563874
Related-bug: #1970467
Change-Id: I351d58d6922ca169b641500c12ffd6f91829df90
2022-08-22 14:57:21 +01:00
Sean Mooney 6f1c7ab2e7 Add source dev parsing for vdpa interfaces
This change extends the guest xml parsing such that
the source device path can be extracted from interface
elements of type vdpa.

This is required to identify the interface to remove when
detaching a vdpa port from a domain.

This change fixes a latent bug in the libvirt fixture
related to the domain xml generation for vdpa interfaces.

Change-Id: I5f41170e7038f4b872066de4b1ad509113034960
2022-08-22 14:57:21 +01:00
René Ribaud 49605f8829 Alphabetizes objects
Alphabetizes objects as suggested in the code comment.
This change was part of the manila virtiofs series.
But as indicated by gibi, this part is now an individual patch to avoid
mixing topics and to make the initial patch easier to read.

Change-Id: I765ff142b2b4b6beffa35d5fc2a9985907307640
2022-08-22 15:39:36 +02:00
Takashi Natsume 18d9c85aa4 Fix a deprecation warning about threading.Thread
Fix the following deprecation warning.

* DeprecationWarning: setDaemon() is deprecated,
  set the daemon attribute instead

Change-Id: I208bd1bef002ce91e57798f3475bbe64d3b81329
Closes-Bug: 1987191
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2022-08-21 15:05:11 +09:00
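
The replacement that the warning in commit 18d9c85aa4 asks for looks roughly like this. The `worker` function is a placeholder, not code from the patch.

```python
import threading
import time


def worker():
    # Placeholder workload for the example.
    time.sleep(0.01)


t = threading.Thread(target=worker)
# Deprecated since Python 3.10: t.setDaemon(True)
t.daemon = True  # must be set before start()
t.start()
t.join()
```

Passing `daemon=True` to the `threading.Thread` constructor is an equivalent alternative.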
Zuul ddcc286ee1 Merge "enable blocked VDPA move operations" 2022-08-20 15:37:54 +00:00
Takashi Natsume 07022c7791 doc: Update a PTL guide
It is no longer necessary to add database migration placeholders
now that alembic is used.
So remove the description in the PTL guide.

Change-Id: If958dd78ff82e2239be1af3835a51a1a3551c5d9
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2022-08-20 06:18:58 +00:00
Zuul 85c9544444 Merge "block_device: Add encryption attributes to image and ephemeral disks" 2022-08-19 16:23:36 +00:00
Zuul 13e4dd76f6 Merge "block_device: Add DriverImageBlockDevice to block_device_info" 2022-08-19 15:34:31 +00:00
Zuul 5f2331c2d1 Merge "Add reno for fixing bug 1941005" 2022-08-19 13:38:35 +00:00
Zuul a52f35f40e Merge "Test attached volume extend actions in the nova-next job" 2022-08-19 08:02:36 +00:00
Zuul 8860495e6a Merge "scheduler: Add an ephemeral encryption pre filter" 2022-08-19 03:47:53 +00:00
Zuul 3c4a473909 Merge "Avoid n-cond startup abort for keystone failures" 2022-08-19 02:48:42 +00:00
Zuul 4d130cb9c5 Merge "Unify placement client singleton implementations" 2022-08-19 02:48:33 +00:00
Zuul 378d178cee Merge "virt: Add ephemeral encryption flag" 2022-08-18 20:29:58 +00:00
Dan Smith 232684b440 Avoid n-cond startup abort for keystone failures
Conductor creates a placement client for the potential case where
it needs to make a call for certain operations. A transient network
or keystone failure will currently cause it to abort startup, which
means it is not available for other unrelated activities, such as
DB proxying for compute.

This makes conductor test the placement client on startup, but only
abort startup on errors that are highly likely to be permanent
configuration errors, and only warn about things like being unable
to contact keystone/placement during initialization. If a non-fatal
error is encountered at startup, later operations needing the
placement client will retry initialization.

Closes-Bug: #1846820
Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
2022-08-18 07:37:42 -07:00
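
The startup policy described in commit 232684b440 can be sketched as follows. The two exception classes and the function name are stand-ins invented for this example; the commit message does not say which concrete errors Nova treats as fatal.

```python
import logging

LOG = logging.getLogger(__name__)


class ConfigError(Exception):
    """Stand-in for an error that is very likely a permanent misconfiguration."""


class TransientError(Exception):
    """Stand-in for a network or keystone outage that may clear up later."""


def check_placement_at_startup(make_client):
    """Return a client, or None if initialization should be retried later."""
    try:
        return make_client()
    except ConfigError:
        # Likely permanent: let the error abort startup.
        raise
    except TransientError as exc:
        # Likely temporary: warn and let later operations retry.
        LOG.warning("Placement client not ready, will retry later: %s", exc)
        return None
```

A caller that gets None back would simply attempt initialization again the next time the client is needed.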
Dan Smith c178d93606 Unify placement client singleton implementations
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but may cause us
to initialize it fewer times and also allows for emitting a common
set of error messages about expected failures for better
troubleshooting.

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
2022-08-18 07:22:37 -07:00
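
A single shared implementation of the singleton behaviour mentioned in commit c178d93606 might look like the sketch below. The module-level names and the `factory` argument are assumptions for illustration and do not reflect Nova's actual API.

```python
import threading

_lock = threading.Lock()
_client = None


def get_placement_client(factory):
    """Return the shared client, creating it on first successful use."""
    global _client
    with _lock:
        if _client is None:
            # If factory() raises, _client stays None and the next
            # caller retries initialization.
            _client = factory()
        return _client
```

Keeping the construction in one place means every caller gets the same error handling and the client is built at most once after a successful initialization.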
Zuul 3af84811c8 Merge "compute: Update bdms with ephemeral encryption details when requested" 2022-08-18 12:31:45 +00:00
Zuul 625d62acd4 Merge "BlockDeviceMapping: Add is_local property" 2022-08-18 12:31:37 +00:00
Zuul aba410196b Merge "BlockDeviceMapping: Add encryption fields" 2022-08-18 12:31:29 +00:00
Zuul c952c52cc9 Merge "image_meta: Add ephemeral encryption properties" 2022-08-18 12:31:22 +00:00
Zuul 64995bbe9f Merge "imagebackend: default by_name image_type to config correctly" 2022-08-18 12:31:14 +00:00
Zuul 76528c36df Merge "libvirt: Remove defunct comment" 2022-08-18 06:57:44 +00:00
Zuul e8705a3c1e Merge "libvirt: Improve creating images INFO log" 2022-08-18 06:52:51 +00:00
Zuul 4c7d81ebca Merge "block_device_info: Add swap to inline" 2022-08-18 03:10:54 +00:00
Jay Faulkner c7b865c79b nova-live-migration tests not needed for Ironic
Ironic does not support live migration, so we will skip these tests
if the only changed files are in the Ironic virt driver to ensure we
don't waste resources or time trying to run unneeded tests.

Change-Id: Ieb5ac3bb93af6a950acff4d76d0276096a6a24dd
2022-08-17 11:31:39 -07:00