Currently, when an existing direct port is provided, nova-compute
overwrites the port's binding-profile with pci_vendor_info
and pci_slot. The binding-profile is used to request
NIC capabilities for SR-IOV ports [1] and also makes it possible to
distinguish which neutron mechanism driver will bind the port [2].
This patch changes the behaviour so that updating a port updates,
rather than overwrites, the binding-profile with pci_vendor_info
and pci_slot, and unbinding a port removes only pci_vendor_info
and pci_slot from the binding-profile rather than unsetting the
entire field.
[1] https://review.openstack.org/#/c/435954/
[2] https://review.openstack.org/#/c/499203/
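The intended merge/strip semantics can be sketched with plain dict
operations (a hypothetical illustration; the function names are not
nova's actual code):

```python
PCI_KEYS = ('pci_vendor_info', 'pci_slot')

def update_binding_profile(profile, pci_vendor_info, pci_slot):
    """Merge PCI details into an existing binding-profile dict."""
    profile = dict(profile or {})
    profile['pci_vendor_info'] = pci_vendor_info
    profile['pci_slot'] = pci_slot
    return profile

def unbind_binding_profile(profile):
    """Strip only the PCI keys, preserving other profile entries."""
    return {k: v for k, v in (profile or {}).items() if k not in PCI_KEYS}

# Capability info already present in the profile survives both paths:
profile = update_binding_profile({'capabilities': ['switchdev']},
                                 '15b3:1004', '0000:81:00.2')
# → {'capabilities': ['switchdev'], 'pci_vendor_info': '15b3:1004',
#    'pci_slot': '0000:81:00.2'}
profile = unbind_binding_profile(profile)
# → {'capabilities': ['switchdev']}
```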
Closes-Bug: #1719327
Change-Id: I80106707a037d567d0f690570f2cf9cfcd30d594
This change set pulls AllocationCandidatesTestCase out of
test_resource_provider and into its own test_allocation_candidates
module.
There is no change to the code. This is just a refactor. We're going
to add a bunch more test cases for allocation candidates, and the
test_resource_provider module was already getting out of hand.
Change-Id: Iedfb712d4668a2d34112449aa6ef0263d02e24a4
Co-Authored-By: Gábor Antal <antal@inf.u-szeged.hu>
Change-Id: I27608f60dd5f8458e476286c6991c47dba7852b1
Implements: bp versioned-notification-transformation-queens
When we notice that an instance was deleted after scheduling, we punt on
instance creation. When that happens, the scheduler will have created
allocations already so we need to delete those to avoid leaking resources.
Related-Bug: #1679750
Change-Id: I54806fe43257528fbec7d44c841ee4abb14c9dff
The resource tracker's _remove_deleted_instances_allocations() assumes that
InstanceNotFound means that an instance was deleted. That's not quite accurate,
as we would also see that in the window between creating allocations and actually
creating the instance in the cell database. So the existing code could kill
allocations for instances that had not yet been created.
This change makes us look up the instance with read_deleted=yes, and if we find
it with deleted=True, we do the allocation removal. This does mean that
someone running a full DB archive at the instant an instance is deleted, in some
way that did not also result in allocation removal, could leak those
allocations. However, we can log that (unlikely) situation.
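The decision logic can be sketched as follows (hypothetical names; the
lookup models an Instance query with read_deleted='yes' that returns
None when the row has been archived away entirely):

```python
def should_remove_allocations(lookup_instance, uuid):
    """Return True only when the instance is confirmed soft-deleted.

    lookup_instance(uuid) returns an object with a .deleted flag, or
    None when the instance row was archived out of the database.
    """
    instance = lookup_instance(uuid)
    if instance is None:
        # Archived mid-delete: allocations may leak; log the rare case.
        print('WARNING: instance %s not found; allocations may leak' % uuid)
        return False
    return instance.deleted
```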
Closes-Bug: #1729371
Change-Id: I4482ac2ecf8e07c197fd24c520b7f11fd5a10945
This change set bases the extra_specs processing in
resources_from_flavor and resources_from_request_spec on the new
ResourceRequest class and its from_extra_specs factory method.
resources_from_flavor is used by consumers who assume the compute node
is the one and only resource provider, so here we fold all the resources
together, even if they're coming from multiple groupings in the
extra_specs. This will have to be fixed as soon as we have anything
using nested or shared RPs.
resources_from_request_spec now produces a ResourceRequest object which
is passed directly into the scheduler's get_allocation_candidates
method. However, get_allocation_candidates currently just extracts the
(unnumbered) 'resources' dict out, resulting in precisely the
pre-existing behavior. Later in the series, when GET
/allocation_candidates is prepped to accept it, this will be changed to
process the whole dict into the query string for GET
/allocation_candidates.
_process_extra_specs is no longer used, and is removed.
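The folding described above can be sketched roughly like this
(illustrative only; real resource classes are strings such as 'VCPU'
and 'MEMORY_MB', and the real code works on ResourceRequest groups):

```python
from collections import defaultdict

def fold_resources(grouped):
    """Merge per-group resource dicts, summing amounts per class."""
    folded = defaultdict(int)
    for resources in grouped.values():
        for rclass, amount in resources.items():
            folded[rclass] += amount
    return dict(folded)

fold_resources({'1': {'VCPU': 2}, '2': {'VCPU': 1, 'MEMORY_MB': 512}})
# → {'VCPU': 3, 'MEMORY_MB': 512}
```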
blueprint: granular-resource-requests
Change-Id: Ic940faabd32c2c7e6673fccab6694c5eedab272f
This change introduces nova.scheduler.utils.ResourceRequest and its
factory method, from_extra_specs, which parses and validates granular
resource and trait specifications out of an extra_specs dictionary per
the spec [1]. The ResourceRequest object will be passed to the
scheduler later in this series.
[1] https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html#syntax-in-flavors
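A much-simplified sketch of the parsing, assuming the spec's key syntax
(resources:VCPU=2, resourcesN:..., traitN:...=required); the real
from_extra_specs also validates values and builds proper group objects:

```python
import re
from collections import defaultdict

_KEY_RE = re.compile(r'^(resources|trait)(\d*):(.+)$')

def parse_extra_specs(extra_specs):
    """Group granular resource/trait keys by their numeric suffix."""
    groups = defaultdict(lambda: {'resources': {}, 'traits': set()})
    for key, value in extra_specs.items():
        match = _KEY_RE.match(key)
        if not match:
            continue  # unrelated extra_specs entries are ignored here
        prefix, suffix, name = match.groups()
        group = groups[suffix]  # '' is the unnumbered group
        if prefix == 'resources':
            group['resources'][name] = int(value)
        elif value == 'required':
            group['traits'].add(name)
    return dict(groups)

parse_extra_specs({'resources:VCPU': '2',
                   'resources1:CUSTOM_MAGIC': '1',
                   'trait1:HW_CPU_X86_AVX2': 'required'})
```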
blueprint: granular-resource-requests
Change-Id: I30296fc82f7a29c713137aedc95502c8379af7e1
In anticipation of accepting numbered groupings in placement's GET
/allocation_candidates API, this change adds a util method to parse a
single traits list query parameter value; and another to parse an entire
querystring into a list of RequestGroup instances.
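The querystring side can be sketched like this (a simplified stand-in
using plain dicts instead of RequestGroup instances; the real util also
validates its input):

```python
from urllib.parse import parse_qs

def parse_querystring(querystring):
    """Split numbered/unnumbered params into per-suffix groups."""
    groups = {}
    for key, values in parse_qs(querystring).items():
        for prefix in ('resources', 'required'):
            if not key.startswith(prefix):
                continue
            suffix = key[len(prefix):]  # '' for the unnumbered group
            group = groups.setdefault(
                suffix, {'resources': {}, 'required': []})
            if prefix == 'resources':
                for part in values[0].split(','):
                    rclass, amount = part.split(':')
                    group['resources'][rclass] = int(amount)
            else:
                group['required'].extend(values[0].split(','))
            break
    return groups
```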
Change-Id: I720630155e9aa53e049e932b9722aaa4ff4247db
blueprint: granular-resource-requests
This change introduces a new module, nova.api.openstack.placement.lib,
with a class RequestGroup, which is to be used both by the placement API
internally (when parsing the querystring to GET /allocation_candidates)
and nova (when parsing extra_specs and producing said querystring for
said API).
The decision to put this into a placement-specific package is
deliberate: when placement gets split out, this should go with it, so it
can be imported by any placement API consumer. In the meantime, we
accept that it will feel a little weird for nova code to import from a
placement API package; but this should be the only module for which that
happens - i.e. any other "shared" classes/utils should go in this same
module.
Change-Id: I8a70347d06d032788d27d1af8967a65d530c15fe
blueprint: granular-resource-requests
If we specify block migration, but there are no disks which actually
require block migration we call libvirt's migrateToURI3() with
VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
params. Libvirt interprets this to be the default block migration
behaviour of "block migrate all writeable disks". However,
migrate_disks may only be empty because we filtered attached volumes
out of it, in which case libvirt will block migrate attached volumes.
This is a data corruptor.
This change addresses the issue at the point we call migrateToURI3().
As we never want the default block migration behaviour, we can safely
remove the flag if the list of disks to migrate is empty.
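The fix amounts to masking the flag out before the migrateToURI3()
call; a minimal sketch (the constant mirrors libvirt's
VIR_MIGRATE_NON_SHARED_INC, but treat the details as illustrative):

```python
VIR_MIGRATE_NON_SHARED_INC = 128  # 1 << 7, as in libvirt

def adjust_migration_flags(flags, migrate_disks):
    """Never let libvirt fall back to 'block migrate all disks'.

    With an empty migrate_disks list, the flag would tell libvirt to
    block migrate every writeable disk, including attached volumes.
    """
    if not migrate_disks:
        flags &= ~VIR_MIGRATE_NON_SHARED_INC
    return flags
```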
Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
Resolves-bug: #1719362
The FilterScheduler._schedule method should return a list of
lists of selected hosts. When include_alternatives is False in
_legacy_find_hosts, it was returning only a flat list of hosts,
which would result in an IndexError when select_destinations()
tries to take the first entry from each item in the list.
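A toy illustration (hypothetical host names) of the shape
select_destinations() expects:

```python
def pick_destinations(selections):
    """Take the first (selected) host from each per-instance list."""
    return [hosts_for_instance[0] for hosts_for_instance in selections]

pick_destinations([['host1', 'host2'], ['host3']])
# → ['host1', 'host3']
# A flat list of host objects instead of a list of lists breaks this
# indexing, which is the failure the bug describes.
```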
Change-Id: Ia6c87900605d3604beb74b942b0e30575b814112
Closes-Bug: #1729445
When trying to recreate hundreds of instance action
events for scale testing with the FakeDriver, a nice
simple way to do that is by stopping those instances
and starting them again.
However, since power_off/on aren't implemented, once
you "stop" them the sync_instance_power_state periodic
task in the compute manager thinks they are still running
on the "hypervisor" and will stop them again via the API,
which records yet another instance action and set of
events.
This just toggles the power state bit on the fake instance
in the FakeDriver to make the periodic task do the right
thing.
As a result, we also have more realistic API and
notification samples.
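The fix boils down to tracking the power state per fake instance; a
rough sketch (the constants mirror nova.compute.power_state, but class
and method shapes here are illustrative, not the real driver):

```python
RUNNING, SHUTDOWN = 1, 4  # values mirror nova.compute.power_state

class FakeInstance:
    """Minimal stand-in for the driver's per-instance bookkeeping."""
    def __init__(self):
        self.state = RUNNING

class FakeDriver:
    def __init__(self):
        self.instances = {}  # uuid -> FakeInstance

    def power_off(self, uuid):
        # Toggle the bit so the periodic sync task sees the stop.
        self.instances[uuid].state = SHUTDOWN

    def power_on(self, uuid):
        self.instances[uuid].state = RUNNING

    def get_info(self, uuid):
        return self.instances[uuid].state
```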
Change-Id: Ie621686053ad774c4ae4f22bb2a455f98900b611
The same pattern as the others, except with a generated command line.
Change-Id: Icfbe3566d8cb82e6878ab4097ed747b18fd5e28a
blueprint: hurrah-for-privsep
The same as the mellanox example, but for midonet.
Disturbingly midonet appears to not have unit tests either, but
again I feel that correcting that is outside the scope of the privsep
blueprint.
Change-Id: I672534691a94a0ac294410ea12dd4ba2c327c0e0
blueprint: hurrah-for-privsep
This code isn't well labelled, but I am pretty sure it is for
Mellanox InfiniBand VIFs. Same pattern as the others.
As best as I can see these methods had no test coverage, but I think
that's outside the scope of the current privsep work to fix.
Change-Id: I323399643c9978a115fdc1213876da2d85dcd8db
blueprint: hurrah-for-privsep
There is no support today for backporting a versioned notification to
an older version. Therefore implementing obj_make_compatible on payload
objects is unnecessary and effectively dead code. This patch removes
the only obj_make_compatible implementation on the payloads.
Change-Id: I27d85d67536867b05758456e9be11ef50eef6aaa
When confirming a resize, the libvirt driver on the source host checks
to see if the instance base directory (which contains the domain xml
files, etc) exists and if the root disk image does not, it removes the
instance base directory.
However, the root disk image won't exist on local storage for a
volume-backed instance and if the instance base directory is on shared
storage, e.g. NFS or Ceph, between the source and destination host, the
instance base directory is incorrectly deleted.
This adds a check to see if the instance is volume-backed when checking
to see if the instance base directory should be removed from the source
host when confirming a resize.
Change-Id: I29fac80d08baf64bf69e54cf673e55123174de2a
Closes-Bug: #1728603
This removes the custom retry-on-conflict code from claim_resources()
and makes it use the retries decorator for consistency.
Change-Id: I6426482a104976dfe14d590a5e23b31b301affb3
This adds a retries decorator to the scheduler report client
and modifies put_allocations() so that it will detect a concurrent
update, raising the Retry exception to trigger the decorator.
This should be usable by other methods in the client easily, but
this patch only modifies put_allocations() to fix the bug.
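The decorator pattern can be sketched as follows (a hypothetical
version with a fixed three attempts; the real decorator's attempt
count and behaviour on exhaustion may differ):

```python
import functools

class Retry(Exception):
    """Raised by a wrapped method to request another attempt."""

def retries(func):
    """Retry the wrapped function up to three times on Retry."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(3):
            try:
                return func(*args, **kwargs)
            except Retry:
                if attempt == 2:
                    raise  # attempts exhausted; propagate
    return wrapper
```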
Change-Id: Ic32a54678dd413668f02e77d5e6c4195664ac24c
Closes-Bug: #1728722
If a gabbi file sets a default microversion by setting a header
'OpenStack-API-Version' with a value like 'placement latest', and then
later overrides that in an individual test with a header of
'openstack-api-version', the difference in case can lead to failure.
In the best case the failure is consistent.
In the worst case it can sometimes work, because the header shows up
twice in the request and the last header wins, with the order of the
headers and the resulting list dependent on the vagaries of Python
ordering.
The solution is to always use the same case, so this change updates
all uses to be lowercase, establishing a precedent that future people
will be able to use as an example.
Note that gabbi is case sensitive here in part because of the
implementation but also because it provides the control and possibility
to test exactly this problem.
Change-Id: I1e89e231cf0d46d211d360cda091b33520f85027
Closes-Bug: #1728934