59678 Commits

Author SHA1 Message Date
Zuul b8cc570455 Merge "Simulate bug 1969496" 2022-04-29 16:46:56 +00:00
Zuul 32ca7dec0d Merge "Remove unavailable but not reported PCI devices at startup" 2022-04-29 16:46:48 +00:00
Zuul 0d637e240c Merge "Isolate PCI tracker unit tests" 2022-04-29 16:34:01 +00:00
Zuul 3ac9948f98 Merge "VMware: StableMoRefProxy for moref recovery" 2022-04-29 16:33:54 +00:00
Zuul a58c32a56c Merge "Fix LM rollback w/o multi port bindings extension" 2022-04-29 13:17:21 +00:00
Zuul 17755cc438 Merge "Store pf_mac_address and vf_num in extra_info" 2022-04-29 12:25:42 +00:00
Zuul 1b1e60f61f Merge "Reproduce live migration rollback w/o multi port bindings error" 2022-04-29 12:08:35 +00:00
Zuul e3a2104009 Merge "List auth plugin parameters for [keystone] section" 2022-04-29 12:08:27 +00:00
Zuul 833022a74b Merge "Add nova-emulation to the experimental queue too" 2022-04-29 11:55:48 +00:00
Fabian Wiesel 56055ede03 VMware: StableMoRefProxy for moref recovery
The vmwareapi driver uses managed-object references (morefs) throughout
the code on the assumption that they are stable. A moref is, however, a
database id, which may change during the runtime of the compute node.
For example, if an instance is unregistered and re-registered in
vCenter, its moref will change. By wrapping a moref in a proxy object
with an additional method that resolves the OpenStack object to a moref,
we can hide those changes from the caller.

MoRef implementation with closure - should ease the transition to stable
mo-refs. One simply has to pass the search function as a closure to the
MoRef instance, and the very same function will be called when an
exception is raised for the stored reference.

Stable Volume refs - The connection_info['data'] contains the
managed-object reference (moref) as well as the uuid of the volume.
When the moref becomes invalid for some reason, we can recover it by
searching for the volume-uuid as the `config.instanceUuid` attribute
of the shadow-vm.

Stable VM Ref - By encapsulating all the parameters for searching for
the vm-ref again, we can move the retry logic to the session object,
where we can try to recover the vm-ref should it result in a
ManagedObjectNotFound exception.
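
The proxy-plus-closure idea described above can be sketched roughly as
follows. This is a minimal illustration, not the code from the patch: the
class layout, `call_with_recovery`, the in-memory `inventory` dict and the
local `ManagedObjectNotFound` stand-in are all assumptions made for the
example.

```python
class ManagedObjectNotFound(Exception):
    """Local stand-in for the error raised when a moref is stale."""


class StableMoRefProxy:
    """Holds a moref plus a closure that can look the object up again."""

    def __init__(self, moref, search):
        self.moref = moref
        self._search = search  # hypothetical: callable returning a fresh moref

    def fetch_moref(self):
        """Re-resolve the reference by calling the stored closure."""
        self.moref = self._search()
        return self.moref


def call_with_recovery(proxy, operation):
    """Run operation(moref); on a stale reference, re-resolve once and retry."""
    try:
        return operation(proxy.moref)
    except ManagedObjectNotFound:
        return operation(proxy.fetch_moref())


# Toy "vCenter": the instance was re-registered, so "vm-1" no longer exists.
inventory = {"vm-2": "instance-uuid-1"}


def power_state(moref):
    if moref not in inventory:
        raise ManagedObjectNotFound(moref)
    return "poweredOn"


def search_by_uuid():
    return next(ref for ref, uuid in inventory.items()
                if uuid == "instance-uuid-1")


proxy = StableMoRefProxy("vm-1", search_by_uuid)
state = call_with_recovery(proxy, power_state)
```

The caller keeps using the same proxy object; only the stored reference
changes after the retry.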

Use refs as index for fakedb - The fakedb previously used the object id
to look up an object, meaning that you couldn't pass a newly created
managed-object reference as you could over the VMware API. Now the
lookup happens over the ref-id string, and in turn some functions
were refactored to take that into account.

Partial-Bug: #1962771

Change-Id: I2a3ddf95b7fe07630855b06e732f8764efb13e91
2022-04-29 08:14:39 +00:00
Zuul c625698823 Merge "db: Resolve additional SAWarning warnings" 2022-04-29 05:25:46 +00:00
Elod Illes 494e8d7db6 [CI] Install dependencies for docs target
When the tox 'docs' target is called, it first installs the dependencies
(listed in 'deps') in the 'installdeps' phase, then installs nova (with
its requirements) in the 'develop-inst' phase. The latter phase does not
use 'deps', so the constraints defined there are not applied.
This could lead to failures on stable branches when new packages are
released that break the build. To avoid this, the simplest solution is
to pre-install requirements, i.e. add requirements.txt to the 'docs' tox
target.

Change-Id: I4471d4488d336d5af0c23028724c4ce79d6a2031
2022-04-28 17:17:47 +02:00
Balazs Gibizer 9ee5d2c662 Simulate bug 1969496
As If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 stated, there is a way for
the pci_devices table to become inconsistent. The parent PF can be in
'available' state while its children VFs are still in 'unavailable' state.
In this situation the PF is schedulable but the PCI claim will fail
when it tries to mark the dependent VFs unavailable.

This patch adds a test case that shows the error.

Related-Bug: #1969496

Change-Id: I7b432d7a32aeb1ab765d1f731691c7841a8f1440
2022-04-28 16:01:53 +02:00
Balazs Gibizer 284ea72e96 Remove unavailable but not reported PCI devices at startup
We saw in the field that the pci_devices table can end up in an
inconsistent state after a compute node HW failure and re-deployment.
There could be dependent devices where the parent PF is in available
state while the children VFs are in unavailable state. (Before the HW
fault the PF was allocated, hence the VFs were marked unavailable.)

In this state the PF is still schedulable, but during the
PCI claim the handling of dependent devices in the PCI tracker will fail
with the error: "Attempt to consume PCI device XXX from empty pool".

The reason for the failure is that when the PF is claimed, all the
children VFs are marked unavailable. But if a VF is already
unavailable, that step fails.

One way the deployer might try to recover from this state is to remove
the VFs from the hypervisor and restart the compute agent. The compute
startup already has logic to delete PCI devices that are unused and
not reported by the hypervisor. However, this logic only removed devices
in 'available' state and ignored devices in 'unavailable' state.

If a device is unused and the hypervisor is not reporting the device any
more, then it is safe to delete that device from the PCI tracker. So this
patch extends the logic to allow deleting 'unavailable' devices. There
is a small window when a dependent PCI device is in 'unclaimable' state.
From a cleanup perspective this is an analogous state, so it is also
added to the cleanup logic.
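
As a rough sketch of the extended cleanup rule, assuming devices are plain
dicts with `address` and `status` keys (the function name, the dict layout
and the `REMOVABLE_STATES` constant are hypothetical and only illustrate the
rule, they are not the tracker's real data model):

```python
# States treated as "unused" for cleanup. Before the change described above,
# only 'available' would have been in this set.
REMOVABLE_STATES = {"available", "unavailable", "unclaimable"}


def devices_to_remove(tracked_devs, reported_addresses):
    """Return tracked devices that the hypervisor no longer reports
    and that are in one of the unused states."""
    return [
        dev for dev in tracked_devs
        if dev["address"] not in reported_addresses
        and dev["status"] in REMOVABLE_STATES
    ]


tracked = [
    {"address": "0000:81:00.0", "status": "available"},    # PF, still reported
    {"address": "0000:81:00.1", "status": "unavailable"},  # VF, gone
    {"address": "0000:81:00.2", "status": "allocated"},    # VF, gone but in use
]
stale = devices_to_remove(tracked, reported_addresses={"0000:81:00.0"})
```

The allocated VF is kept even though it is no longer reported, because it is
still in use.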

Related-Bug: #1969496
Change-Id: If9ab424cc7375a1f0d41b03f01c4a823216b3eb8
2022-04-28 16:01:38 +02:00
Balazs Gibizer c58376db75 Isolate PCI tracker unit tests
While testing If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 we noticed
that the unit test cases of PciTracker._set_hvdevs change and
leak global state, leading to unstable tests.

To reproduce on master, duplicate the
test_set_hvdev_remove_tree_maintained_with_allocations test case and run
PciDevTrackerTestCase serially. The duplicated test case will fail with:

  File "/nova/nova/objects/pci_device.py", line 238, in _from_db_object
  setattr(pci_device, key, db_dev[key])
  KeyError: 'id'

This is caused by the fact that the test data is defined at module
level, both _create_tracker and _set_hvdevs modify the devices
passed to them, and some tests pass db dicts to _set_hvdevs, which
expects pci dicts from the hypervisor.

This patch fixes multiple related issues:
* always deepcopy what _create_tracker takes as that list is later
  returned to the PciTracker via a mock and the tracker might modify
  what it got

* ensure that _create_tracker takes db dicts (with id field) while
  _set_hvdevs takes pci dicts in the hypervisor format (without id
  field)

* always deepcopy what is passed to _set_hvdevs as the PciTracker modifies
  what it gets

* normalize when the deepcopy happens to give a safe pattern for future
  test cases
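
The deepcopy pattern from the list above might look like this in a test
helper. The `FakeTracker` class and the helper names are illustrative
assumptions; only the idea of copying module-level fixtures before handing
them to code that mutates its input comes from the text.

```python
import copy

# Module-level test data shared by every test case.
FAKE_DB_DEVS = [{"id": 1, "address": "0000:00:00.1", "status": "available"}]
FAKE_HV_DEVS = [{"address": "0000:00:00.1", "dev_type": "type-PCI"}]


class FakeTracker:
    """Hypothetical tracker that mutates whatever it is given."""

    def __init__(self, db_devs):
        self.db_devs = db_devs

    def set_hvdevs(self, hv_devs):
        for dev in hv_devs:
            dev["status"] = "available"  # mutates the caller's dicts
        reported = {dev["address"] for dev in hv_devs}
        for dev in self.db_devs:
            if dev["address"] not in reported:
                dev["status"] = "removed"


def create_tracker(db_devs):
    # Takes db dicts (with an 'id' field); copy so fixtures stay pristine.
    return FakeTracker(copy.deepcopy(db_devs))


def set_hvdevs(tracker, hv_devs):
    # Takes hypervisor-format dicts (no 'id' field); copy for the same reason.
    tracker.set_hvdevs(copy.deepcopy(hv_devs))


tracker = create_tracker(FAKE_DB_DEVS)
set_hvdevs(tracker, FAKE_HV_DEVS)
set_hvdevs(tracker, [])  # hypervisor stops reporting the device
```

Without the two `deepcopy` calls, the second test to use these fixtures would
see the state left behind by the first.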

Change-Id: I20fb4ea96d5dfabfc4be3b5ecec0e4e6c5b3a318
2022-04-28 16:01:28 +02:00
Artom Lifshitz aa1b0a7ccb Fix LM rollback w/o multi port bindings extension
Previously, the libvirt driver's live migration rollback code would
unconditionally refer to migrate_data.vifs. This field would only be
set if the Neutron multiple port bindings extension was in use. When
it is not in use, the reference would fail with a NotImplementedError.
This patch wraps the migrate_data.vifs reference in a conditional that
checks if the vifs field is actually set. This is the only way to do
it, as in the libvirt driver we do not have access to the network
API's has_port_binding_extension() helper.
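
A minimal sketch of such a guard, using a stand-in object rather than the
real migrate data class. The stand-in mimics the `obj_attr_is_set` helper
from oslo.versionedobjects; whether the actual patch uses that exact call is
an assumption, and `vifs_to_roll_back` is a hypothetical name.

```python
class FakeMigrateData:
    """Stand-in for a versioned object: unset fields raise on access."""

    def __init__(self, **fields):
        self._fields_set = dict(fields)

    def obj_attr_is_set(self, name):
        return name in self._fields_set

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        fields = self.__dict__.get("_fields_set", {})
        if name in fields:
            return fields[name]
        raise NotImplementedError(
            "Cannot load '%s' in the base class" % name)


def vifs_to_roll_back(migrate_data):
    """Return the VIFs to clean up, or nothing if the field was never set."""
    if migrate_data.obj_attr_is_set("vifs"):
        return list(migrate_data.vifs)
    return []


with_extension = FakeMigrateData(vifs=["vif-1", "vif-2"])
without_extension = FakeMigrateData()
```

Here `vifs_to_roll_back(without_extension)` returns an empty list instead of
raising.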

Closes-bug: 1969980
Change-Id: I48ca6a77de38e3afaa44630e6ae1fd41d2031ba9
2022-04-27 14:53:05 -04:00
Artom Lifshitz 5181bae923 Reproduce live migration rollback w/o multi port bindings error
When the libvirt driver does live migration rollback of an instance
with network interfaces, it unconditionally refers to
migrate_data.vifs. These will only be set when Neutron has the
multiple port bindings extension. We don't handle the case of the
extension not being present, and currently the rollback will fail with
a "NotImplementedError: Cannot load 'vifs' in the base class" error.

Related-bug: 1969980
Change-Id: Ieef773453ed9f3ced564c1a352fbefbcc6a653ec
2022-04-27 14:52:52 -04:00
Ghanshyam Mann 5f5551448d Move centos stream testing to centos-9-stream
In the Zed cycle testing runtime, we are targeting the centos 9 stream
- https://governance.openstack.org/tc/reference/runtimes/zed.html

With python 3.6 support dropped, projects started requiring python 3.8
as the minimum, for example nova:
- https://github.com/openstack/nova/blob/56b5aed08c6a3ed81b78dc216f0165ebfe3c3350/setup.cfg#L13

With that, the centos 8 stream job is failing 100% of the time
- https://zuul.openstack.org/build/970d029dc96742c3aa0f6932a35e97cf
- https://zuul.openstack.org/builds?job_name=tempest-integrated-compute-centos-8-stream&skip=0

QA is dropping the centos-8-stream testing and support
- https://review.opendev.org/q/topic:drop-c8s-testing

This commit replaces the tempest-integrated-compute-centos-8-stream job
with tempest-integrated-compute-centos-9-stream.

Depends-On: https://review.opendev.org/c/openstack/tempest/+/839274
Change-Id: I516b9d732ccea6e12904a1612530bce273d06587
2022-04-25 18:11:12 -05:00
Zuul 56b5aed08c Merge "Deprecate [api] use_forwarded_for" 2022-04-25 19:39:53 +00:00
Takashi Kajinami cf906cdcc2 Deprecate [api] use_forwarded_for
... because the functionality of this parameter effectively duplicates
the HTTPProxyToWSGI middleware in the oslo.middleware library.

Closes-Bug: #1967686
Change-Id: Ifebcfb6b5c1594c075bb9c152a06aa7af7c61bc8
2022-04-23 16:15:15 +00:00
Fabian Wiesel 03fd208c56 VMware: Split out VMwareAPISession
The VMwareAPISession object is used not only by the driver but by
practically all modules of vmwareapi. Splitting it out also reduces the
scope of the driver module itself a bit.

Partial-Bug: #1962771

Change-Id: I4094b6031872bd3b5c871b9a82c7e01280a3352d
2022-04-23 12:54:56 +00:00
Zuul 1ff89a09d4 Merge "db: Close connection on early return" 2022-04-23 02:50:13 +00:00
Zuul 6ccbf6ac33 Merge "db: Don't rely on autocommit behavior" 2022-04-22 17:29:12 +00:00
Zuul 2fce46443e Merge "db: Replace use of Column.copy() method" 2022-04-22 17:29:04 +00:00
Zuul 246c7b2a7d Merge "db: Remove implicit coercion of SELECTs" 2022-04-22 17:23:15 +00:00
Zuul f33f5cdad1 Merge "VMware: Early fail spawn if memory is not multiple of 4." 2022-04-22 16:29:54 +00:00
Stephen Finucane 78e3a6e610 db: Close connection on early return
During review of change Ic43c21038ee682f9733fbde42c6d24f8088815fc, we
noticed that we were leaking connections if we had an early return from
'_archive_deleted_rows_for_table'. Correct this.
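
One common way to avoid this kind of leak is to scope the connection with a
context manager so that every return path closes it. The sketch below uses a
hypothetical `FakeConnection` and a simplified `archive_rows` function; it
illustrates the pattern only and is not the code from the change.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def archive_rows(conn, rows):
    """Return the number of rows archived; the connection is closed on
    every path, including the early return."""
    with conn:
        if not rows:
            return 0  # early return: __exit__ still closes the connection
        return len(rows)


early = FakeConnection()
archived_none = archive_rows(early, [])
normal = FakeConnection()
archived_two = archive_rows(normal, ["row-1", "row-2"])
```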

Change-Id: I748d962b6c7012e9bc2b8c91519da99d2d4bd240
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-22 10:24:34 +01:00
Zuul 0873b7c417 Merge "enable locking test fixture" 2022-04-22 08:58:04 +00:00
Sean Mooney 8dafea25e3 enable locking test fixture
os-brick 5.1 and later use file system locks
by default, which were introduced by
I6f7f7d19540361204d4ae3ead2bd6dcddb8fcd68

As such we should enable the locking fixture
in the compute manager test class and block device
unit tests.

Change-Id: I184ed3ad3d578780524fbaa3a0392607d1a50cdc
Related-Bug: #1947370
2022-04-22 07:36:01 +00:00
Ghanshyam Mann 7c8b800867 Update python testing as per zed cycle testing runtime
In the Zed cycle, we have dropped python 3.6/3.7[1] testing
and support. Remove the py36 centos8 job and update the
python classifiers to reflect the same.

[1] https://governance.openstack.org/tc/reference/runtimes/zed.html

Change-Id: Iba5074ea6f981a7527e86cfc98edd1ed7dd3086f
2022-04-21 22:06:22 +00:00
Zuul dc30a7e6d1 Merge "Sync rootwrap.conf from oslo.rootwrap" 2022-04-21 11:34:04 +00:00
Zuul 840c48fea7 Merge "Follow up for nova-manage image property commands" 2022-04-21 09:43:55 +00:00
Zuul da4addc2e6 Merge "db: Replace use of Connection.connect() method" 2022-04-20 16:21:16 +00:00
Zuul 4a38993282 Merge "db: Remove use of empty 'and_()'" 2022-04-20 16:21:08 +00:00
Zuul 83f88cc6a6 Merge "db: Replace use of strings in join, defer operations" 2022-04-20 16:21:00 +00:00
Zuul d0d4d67bc5 Merge "db: Trivial rewrapping of warning filters" 2022-04-20 10:55:41 +00:00
Zuul 94e12b6917 Merge "db: Narrow down deprecation warning filter" 2022-04-20 10:55:33 +00:00
Kiran Pawar 08e8bdf271 VMware: Early fail spawn if memory is not multiple of 4.
If instance memory is not a multiple of 4, creating the instance on ESXi
fails with the error "['GenericVmConfigFault'] VimFaultException: Memory
(RAM) size is invalid.". However, this happens only after the instance is
built and a launch is attempted on ESXi. Add a check in prepare_for_spawn
to trigger the failure early and avoid the further steps, i.e. build as
well as launch.
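
The early check could be as small as the following sketch. The exception
class, the helper name and the simplified `prepare_for_spawn` signature are
assumptions for illustration; the commit message does not state the real
ones.

```python
class InstanceDeployFailure(Exception):
    """Stand-in exception; the real exception type is not stated above."""


def check_memory_is_multiple_of_4(memory_mb):
    """Reject memory sizes that ESXi would refuse at launch time."""
    if memory_mb % 4 != 0:
        raise InstanceDeployFailure(
            "Memory size %d MB is not a multiple of 4" % memory_mb)


def prepare_for_spawn(memory_mb):
    # Fail before any build or launch work is started.
    check_memory_is_multiple_of_4(memory_mb)
    return "ready"
```

With this in place, `prepare_for_spawn(2050)` raises immediately, while
`prepare_for_spawn(2048)` proceeds.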

Closes-Bug: #1966987
Change-Id: I7ed8ac986283cd455e54e3f18ab955f43b3248d0
2022-04-19 15:47:35 +00:00
Dmitrii Shcherbakov 2234b179b5 Store pf_mac_address and vf_num in extra_info
Remote-managed port support relies on placing additional data into
the "binding:profile" attribute of a port: a mac address of a PF
associated with a VF and the VF number. This data is currently
retrieved via sysfs at the time when it needs to be placed into
binding:profile initially or when it needs to be updated during
migration processes.

To avoid having extra sysfs dependencies in the manager and neutron
modules, those attributes are now stored in the extra-info of a given
VF's PciDevice and persisted in the DB.

The PF mac is stored for each VF since PFs are not guaranteed to be
present in the whitelist and so may not be present in the database in
the first place. A by-product of that is that it is easier to access
this data by just looking at a given VF's extra-info dict.

Change-Id: I2ed738f87fed952f60849cc22bde7291ec52d286
2022-04-14 15:44:40 +03:00
Zuul a1f006d799 Merge "refactor: remove duplicated logic" 2022-04-14 12:10:37 +00:00
Zuul 83e66870cd Merge "Fix the PCI device capability dict creation" 2022-04-14 12:10:30 +00:00
Stephen Finucane 8142b9dc47 db: Resolve additional SAWarning warnings
Resolving the following SAWarning warnings:

  Coercing Subquery object into a select() for use in IN(); please pass
  a select() construct explicitly

  SELECT statement has a cartesian product between FROM element(s)
  "foo" and FROM element "bar". Apply join condition(s) between each
  element to resolve.

While the first of these was a trivial fix, the second one is a little
more involved. It was caused by attempting to build a query across
tables that had no relationship as part of our archive logic. For
example, consider the following queries, generated early in
'_get_fk_stmts':

  SELECT instances.uuid
  FROM instances, security_group_instance_association
  WHERE security_group_instance_association.instance_uuid = instances.uuid
    AND instances.id IN (__[POSTCOMPILE_id_1])

  SELECT security_groups.id
  FROM security_groups, security_group_instance_association, instances
  WHERE security_group_instance_association.security_group_id = security_groups.id
    AND instances.id IN (__[POSTCOMPILE_id_1])

While the first of these is fine, the second is clearly wrong: why are
we filtering on a field that is of no relevance to our join? These were
generated because we were attempting to archive one or more instances
(in this case, the instance with id=1) and needed to find related tables
to archive at the same time. A related table is any table that
references our "source" table - 'instances' here - by way of a foreign
key. For each of *these* tables, we then look up each foreign key and
join back to the source table, filtering by matching entries in the
source table. The issue here is that we're looking up every foreign key.
What we actually want to do is look up only the foreign keys that point
back to our source table. This flaw is why we were generating the second
SELECT above: the 'security_group_instance_association' table has two
foreign keys, one pointing to our 'instances' table but also another
pointing to the 'security_groups' table. We want the first but not the
second.

Resolve this by checking if the table that each foreign key points to is
actually the source table and simply skip if not. With this issue
resolved, we can enable errors on SAWarning warnings in general without
any filters.
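
The skip rule can be illustrated without SQLAlchemy by modelling foreign keys
as plain records. Everything in this sketch (the `ForeignKey` dataclass, the
`TABLES` mapping and `fks_pointing_at`) is a hypothetical simplification of
the archive logic, not the code from '_get_fk_stmts'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    column: str         # column on the referencing table
    target_table: str   # table the key points to
    target_column: str


# Simplified schema: referencing table -> its foreign keys.
TABLES = {
    "security_group_instance_association": [
        ForeignKey("instance_uuid", "instances", "uuid"),
        ForeignKey("security_group_id", "security_groups", "id"),
    ],
    "instance_info_caches": [
        ForeignKey("instance_uuid", "instances", "uuid"),
    ],
}


def fks_pointing_at(source_table):
    """Return (table, column) pairs for foreign keys that reference
    source_table, skipping keys that point anywhere else."""
    result = []
    for table, fks in TABLES.items():
        for fk in fks:
            if fk.target_table != source_table:
                continue  # e.g. the key into 'security_groups'
            result.append((table, fk.column))
    return result


related = fks_pointing_at("instances")
```

Only the `instance_uuid` keys survive, so no query is built that joins
'security_groups' against an unrelated 'instances' filter.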

Change-Id: I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:57:10 +01:00
Stephen Finucane 612b83ee5d db: Don't rely on autocommit behavior
Resolve the following RemovedIn20Warning warning:

  The current statement is being autocommitted using implicit
  autocommit, which will be removed in SQLAlchemy 2.0. Use the .begin()
  method of Engine or Connection in order to use an explicit transaction
  for DML and DDL statements.

I genuinely expected this one to be more difficult to resolve, but we
weren't using this as much as expected (thank you, non-legacy
enginefacade).

With this change, we appear to be SQLAlchemy 2.0 ready.

Change-Id: Ic43c21038ee682f9733fbde42c6d24f8088815fc
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:57:06 +01:00
Stephen Finucane b02166c91f db: Replace use of Column.copy() method
Resolve the following RemovedIn20Warning warning:

  The Column.copy() method is deprecated and will be removed in a future
  release.

The recommended solution here (by zzzeek himself) is to use the private
method. This method isn't perfect (hence why the public version was
deprecated) but it's more than okay for what we want. The alternative is
to effectively vendor a variant of the 'Column.copy()' code, which means
we'll lose out on any future bug fixes.

Change-Id: Ia663251dfa7cf8f7d33f19902a92bcc586ae9f43
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:57:03 +01:00
Stephen Finucane 287ef8d689 db: Remove implicit coercion of SELECTs
Resolve the following RemovedIn20Warning warning:

  Implicit coercion of SELECT and textual SELECT constructs into FROM
  clauses is deprecated; please call .subquery() on any Core select or
  ORM Query object in order to produce a subquery object.

This one was easy.

Change-Id: Ifeab2aa8cef7ad151d5d5f92937e90ab34b96e8a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:59 +01:00
Stephen Finucane 440fa6ab00 db: Replace use of Connection.connect() method
Resolve the following RemovedIn20Warning warning:

  The Connection.connect() method is considered legacy as of the 1.x
  series of SQLAlchemy and will be removed in 2.0.

Once again, we actually just need to remove the warning filter since
this is already fixed elsewhere.

Change-Id: Id395ef0778b9a4e956ef9564e301a8b855ca7f5d
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:56 +01:00
Stephen Finucane 982e15980c db: Remove use of empty 'and_()'
Resolve the following RemovedIn20Warning warning:

  Invoking and_() without arguments is deprecated, and will be
  disallowed in a future release. For an empty and_() construct, use
  and_(True, *args)

I say resolve, but we apparently already did this and I just need to
remove the warning filter. You won't see me complaining...

Change-Id: I46218c1366af383d27fe500232a6815923441c46
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:50 +01:00
Stephen Finucane 0939b3c4d1 db: Replace use of strings in join, defer operations
Resolve the following RemovedIn20Warning warnings:

  Using strings to indicate column or relationship paths in loader
  options is deprecated and will be removed in SQLAlchemy 2.0. Please
  use the class-bound attribute directly.

  Using strings to indicate relationship names in Query.join() is
  deprecated and will be removed in SQLAlchemy 2.0. Please use the
  class-bound attribute directly.

This is rather tricky to resolve. In most cases, we can simply make use
of getattr to fetch the class-bound attribute; however, there are a
number of places where we were doing "nested" joins, e.g.
'instances.info_cache' on the 'SecurityGroup' model. These need a little
more thought.

Change-Id: I1355ac92202cb504a7814afaa1338a4a511f9b54
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:47 +01:00
Stephen Finucane 523297bdfa db: Trivial rewrapping of warning filters
This was annoying me. I don't "fix" the warnings that I'm going to
remove in a future change.

Change-Id: Ia1da21577d859885838de10110dd473f72af285d
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:45 +01:00
Stephen Finucane f7a1be8ddd db: Narrow down deprecation warning filter
One of our 'SADeprecationWarning' warning filters is a bit of an odd
duck: unlike all the other filters, this one is applied to all modules
and not just nova. We can't fix issues caused by code that isn't nova
(at least, not in the nova tree) so this is a silly approach. Remove it.

Change-Id: I803d31117d0536df2e436a2f64144e4029c9073c
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2022-04-08 15:56:43 +01:00