The vmwareapi driver uses managed-object references throughout the code
with the assumption that they are stable. A moref is, however, a
database id, which may change during the runtime of the compute node,
e.g. if an instance is unregistered and re-registered in the vCenter,
the moref will change. By wrapping a moref in a proxy object with an
additional method to resolve the openstack object to a moref, we can
hide those changes from the caller.
MoRef implementation with closure - this should ease the transition to
stable mo-refs. One simply passes the search function as a closure to
the MoRef instance, and the very same method will be called again when
an exception is raised for the stored reference.
Stable Volume refs - connection_info['data'] contains the
managed-object reference (moref) as well as the uuid of the volume.
When the moref becomes invalid for some reason, we can recover it by
searching for the volume uuid in the `config.instanceUuid` attribute
of the shadow-vm.
Stable VM refs - by encapsulating all the parameters for searching for
the vm-ref again, we can move the retry logic to the session object,
where we can try to recover the vm-ref should a call result in a
ManagedObjectNotFound exception.
Use refs as index for fakedb - the fake db previously used the
object-id to look up an object, meaning that you couldn't pass a newly
created managed-object reference like you can over the vmware-api. Now
the lookup happens over the ref-id string, and some functions were
refactored to take that into account.
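The stable-moref idea above can be sketched as follows. This is a minimal illustration of the closure approach, not the actual vmwareapi implementation; the names StableMoRef, call_with_retry and the local ManagedObjectNotFound stand-in are assumptions for the sketch.

```python
class ManagedObjectNotFound(Exception):
    """Stand-in for the vSphere fault raised for a stale reference."""


class StableMoRef:
    """Wraps a moref together with a closure that can re-resolve it."""

    def __init__(self, ref, search_fn):
        self._ref = ref
        # The very same search that produced the ref, captured as a closure,
        # so it can be re-run when the stored reference goes stale.
        self._search_fn = search_fn

    @property
    def ref(self):
        return self._ref

    def resolve(self):
        """Re-run the original search and replace the stale reference."""
        self._ref = self._search_fn()
        return self._ref


def call_with_retry(session_call, moref):
    """Session-level retry: on ManagedObjectNotFound, re-resolve once."""
    try:
        return session_call(moref.ref)
    except ManagedObjectNotFound:
        return session_call(moref.resolve())
```

A caller only ever sees the proxy; if the instance was re-registered and got a new moref, the retry in the session hides the change.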
Partial-Bug: #1962771
Change-Id: I2a3ddf95b7fe07630855b06e732f8764efb13e91
When the tox 'docs' target is called, it first installs the
dependencies (listed in 'deps') in the 'installdeps' phase, then
installs nova (with its requirements) in the 'develop-inst' phase. In
the latter phase 'deps' is not used, so the constraints defined in
'deps' are not applied.
This can lead to failures on stable branches when newly released
packages break the build. To avoid this, the simplest solution is to
pre-install the requirements, i.e. add requirements.txt to the 'docs'
tox target.
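An illustrative tox.ini fragment of the resulting target (the exact file names, constraints line and sphinx command vary per branch; this is a sketch, not nova's actual configuration):

```ini
[testenv:docs]
deps =
  -r{toxinidir}/requirements.txt
  -r{toxinidir}/doc/requirements.txt
commands =
  sphinx-build -W -b html doc/source doc/build/html
```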
Change-Id: I4471d4488d336d5af0c23028724c4ce79d6a2031
As If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 stated, there is a way for
the pci_devices table to become inconsistent: a parent PF can be in
'available' state while its children VFs are still in 'unavailable'
state. In this situation the PF is schedulable, but the PCI claim will
fail when it tries to mark the dependent VFs unavailable.
This patch adds a test case that shows the error.
Related-Bug: #1969496
Change-Id: I7b432d7a32aeb1ab765d1f731691c7841a8f1440
We saw in the field that the pci_devices table can end up in an
inconsistent state after a compute node HW failure and re-deployment.
There can be dependent devices where the parent PF is in 'available'
state while the children VFs are in 'unavailable' state. (Before the HW
fault the PF was allocated, hence the VFs were marked unavailable.)
In this state the PF is still schedulable, but during the
PCI claim the handling of dependent devices in the PCI tracker will
fail with the error: "Attempt to consume PCI device XXX from empty
pool".
The reason for the failure is that when the PF is claimed, all the
children VFs are marked unavailable; but if a VF is already
unavailable, that step fails.
One way the deployer might try to recover from this state is to remove
the VFs from the hypervisor and restart the compute agent. The compute
startup already has logic to delete PCI devices that are unused and
not reported by the hypervisor. However, this logic only removed
devices in 'available' state and ignored devices in 'unavailable'
state.
If a device is unused and the hypervisor no longer reports it, then it
is safe to delete that device from the PCI tracker, so this patch
extends the logic to allow deleting 'unavailable' devices. There is
also a small window when a dependent PCI device is in 'unclaimable'
state; from a cleanup perspective this is an analogous state, so it is
added to the cleanup logic as well.
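The extended cleanup rule described above can be sketched as a small predicate. The state names mirror the commit text; the helper itself is illustrative and is not nova's actual PciDevTracker code.

```python
# States in which an unused, no-longer-reported device may be dropped:
# 'available' (the pre-existing behaviour), plus 'unavailable' and
# 'unclaimable' (added by this patch).
DELETABLE_STATES = ('available', 'unavailable', 'unclaimable')


def can_delete_stale_device(status, reported_by_hypervisor, allocated):
    """Return True if a tracked PCI device may be removed from the tracker."""
    if reported_by_hypervisor or allocated:
        # Device is still present on the host or still in use: keep it.
        return False
    return status in DELETABLE_STATES
```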
Related-Bug: #1969496
Change-Id: If9ab424cc7375a1f0d41b03f01c4a823216b3eb8
During the testing of If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 we
noticed that the unit test cases of PciTracker._set_hvdevs change and
leak global state, leading to unstable tests.
To reproduce on master, duplicate the
test_set_hvdev_remove_tree_maintained_with_allocations test case and run
PciDevTrackerTestCase serially. The duplicated test case will fail with
File "/nova/nova/objects/pci_device.py", line 238, in _from_db_object
setattr(pci_device, key, db_dev[key])
KeyError: 'id'
This is caused by the facts that the test data is defined at module
level, that both _create_tracker and _set_hvdevs modify the devices
passed to them, and that some tests mix passing db dicts to
_set_hvdevs, which expects pci dicts from the hypervisor.
This patch fixes multiple related issues:
* always deepcopy what _create_tracker takes, as that list is later
returned to the PciTracker via a mock and the tracker might modify
what it got
* ensure that _create_tracker takes db dicts (with an id field) while
_set_hvdevs takes pci dicts in the hypervisor format (without an id
field)
* always deepcopy what is passed to _set_hvdevs, as the PciTracker
modifies what it gets
* normalize when the deepcopy happens, to give a safe pattern for
future test cases
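The "always deepcopy at the boundary" pattern can be sketched like this. The fixture and helper names are illustrative stand-ins for the pci tracker tests, not the actual test code.

```python
import copy

# Module-level test data, shared by every test case in the module.
DEVICE_FIXTURE = [{'address': '0000:00:01.0', 'status': 'available'}]


def set_hvdevs(tracker_state, devices):
    """Consume hypervisor-format dicts; mutate only a private copy.

    Deepcopying at the entry point protects the caller's (module-level)
    fixture even though this function mutates the dicts it works on.
    """
    devices = copy.deepcopy(devices)
    for dev in devices:
        dev['status'] = 'tracked'
        tracker_state[dev['address']] = dev
```

Without the deepcopy, the first test to call set_hvdevs would silently rewrite DEVICE_FIXTURE for every test that runs after it.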
Change-Id: I20fb4ea96d5dfabfc4be3b5ecec0e4e6c5b3a318
Previously, the libvirt driver's live migration rollback code would
unconditionally refer to migrate_data.vifs. This field would only be
set if the Neutron multiple port bindings extension was in use. When
it is not in use, the reference would fail with a NotImplementedError.
This patch wraps the migrate_data.vifs reference in a conditional that
checks if the vifs field is actually set. This is the only way to do
it, as in the libvirt driver we do not have access to the network
API's has_port_binding_extension() helper.
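The guard can be sketched with a minimal stand-in object. LiveMigrateData below only mimics the versioned-object behaviour of raising when an unset field is loaded; it is not nova's real class, and rollback_unplug_vifs is an illustrative helper.

```python
class LiveMigrateData:
    """Minimal stand-in for a versioned object with an optional 'vifs' field."""

    def __init__(self, vifs=None):
        self._fields = {}
        if vifs is not None:
            self._fields['vifs'] = vifs

    def obj_attr_is_set(self, name):
        return name in self._fields

    def __getattr__(self, name):
        # Mimic the "Cannot load ... in the base class" failure for
        # fields that were never set.
        try:
            return self._fields[name]
        except KeyError:
            raise NotImplementedError(
                "Cannot load %r in the base class" % name)


def rollback_unplug_vifs(migrate_data):
    """Touch migrate_data.vifs only when the field was actually set."""
    if migrate_data.obj_attr_is_set('vifs'):
        return ['unplugged %s' % v for v in migrate_data.vifs]
    # No multiple-port-bindings extension: nothing to roll back.
    return []
```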
Closes-bug: 1969980
Change-Id: I48ca6a77de38e3afaa44630e6ae1fd41d2031ba9
When the libvirt driver does live migration rollback of an instance
with network interfaces, it unconditionally refers to
migrate_data.vifs. These will only be set when Neutron has the
multiple port bindings extension. We don't handle the case of the
extension not being present, and currently the rollback will fail with
a "NotImplementedError: Cannot load 'vifs' in the base class" error.
Related-bug: 1969980
Change-Id: Ieef773453ed9f3ced564c1a352fbefbcc6a653ec
... because the functionality of this parameter effectively duplicates
the HTTPProxyToWSGI middleware in the oslo.middleware library.
Closes-Bug: #1967686
Change-Id: Ifebcfb6b5c1594c075bb9c152a06aa7af7c61bc8
The VMwareAPISession object is not only used by the driver, but in
practically all modules of vmwareapi. Moving it out reduces the scope
of the driver module itself a bit.
Partial-Bug: #1962771
Change-Id: I4094b6031872bd3b5c871b9a82c7e01280a3352d
During review of change Ic43c21038ee682f9733fbde42c6d24f8088815fc, we
noticed that we were leaking connections if we had an early return from
'_archive_deleted_rows_for_table'. Correct this.
Change-Id: I748d962b6c7012e9bc2b8c91519da99d2d4bd240
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
os-brick 5.1 and later now uses file system locks by default, which
were introduced by I6f7f7d19540361204d4ae3ead2bd6dcddb8fcd68.
As such we should enable the locking fixture in the compute manager
test class and the block device unit tests.
Change-Id: I184ed3ad3d578780524fbaa3a0392607d1a50cdc
Related-Bug: #1947370
In the Zed cycle, we dropped python 3.6/3.7 testing and support[1].
Remove the py36 centos8 job and update the python classifier to
reflect the same.
[1] https://governance.openstack.org/tc/reference/runtimes/zed.html
Change-Id: Iba5074ea6f981a7527e86cfc98edd1ed7dd3086f
If the instance memory is not a multiple of 4, creating an instance on
ESXi fails with the error "['GenericVmConfigFault'] VimFaultException:
Memory (RAM) size is invalid.". However, this only happens after the
instance is built and an attempt is made to launch it on ESXi. Add a
check in prepare_for_spawn to trigger the failure early and avoid the
further steps, i.e. both build and launch.
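A minimal sketch of the early validation, raising before any ESXi work starts. The exception type and helper name are illustrative, not the actual driver code.

```python
class InvalidMemoryError(Exception):
    """Raised when the requested RAM size cannot be used on ESXi."""


def validate_esxi_memory_mb(memory_mb):
    """ESXi requires the RAM size (in MB) to be a positive multiple of 4."""
    if memory_mb <= 0 or memory_mb % 4 != 0:
        raise InvalidMemoryError(
            "Memory (RAM) size %d MB is invalid: must be a positive "
            "multiple of 4" % memory_mb)
```

Calling this from prepare_for_spawn fails the boot request immediately instead of after the build step.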
Closes-Bug: #1966987
Change-Id: I7ed8ac986283cd455e54e3f18ab955f43b3248d0
Remote-managed port support relies on placing additional data into
the "binding:profile" attribute of a port: the MAC address of the PF
associated with a VF, and the VF number. This data is currently
retrieved via sysfs at the time when it needs to be placed into
binding:profile initially, or when it needs to be updated during
migration processes.
To avoid having extra sysfs dependencies in the manager and neutron
modules, those attributes are now stored in the extra-info of a given
VF's PciDevice and persisted in the DB.
The PF MAC is stored for each VF since PFs are not guaranteed to be
present in the whitelist and so may not be present in the database in
the first place. A by-product is that it is easier to access this data
by just looking at a given VF's extra-info dict.
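The idea can be sketched as below. The extra-info key names and helper functions are illustrative; they are not necessarily the exact keys nova persists.

```python
def store_remote_managed_info(vf_extra_info, pf_mac, vf_num):
    """Persist the PF MAC and VF number in a VF's extra-info dict.

    Done once, at device-discovery time, so later consumers never need
    to touch sysfs.
    """
    vf_extra_info['parent_mac_address'] = pf_mac
    # extra-info values are stored as strings in the DB.
    vf_extra_info['vf_num'] = str(vf_num)
    return vf_extra_info


def build_binding_profile(vf_extra_info):
    """Build binding:profile data from the persisted extra-info."""
    return {
        'pf_mac_address': vf_extra_info['parent_mac_address'],
        'vf_num': int(vf_extra_info['vf_num']),
    }
```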
Change-Id: I2ed738f87fed952f60849cc22bde7291ec52d286
Resolving the following SAWarning warnings:
Coercing Subquery object into a select() for use in IN(); please pass
a select() construct explicitly
SELECT statement has a cartesian product between FROM element(s)
"foo" and FROM element "bar". Apply join condition(s) between each
element to resolve.
While the first of these was a trivial fix, the second one is a little
more involved. It was caused by attempting to build a query across
tables that had no relationship as part of our archive logic. For
example, consider the following queries, generated early in
'_get_fk_stmts':
SELECT instances.uuid
FROM instances, security_group_instance_association
WHERE security_group_instance_association.instance_uuid = instances.uuid
AND instances.id IN (__[POSTCOMPILE_id_1])
SELECT security_groups.id
FROM security_groups, security_group_instance_association, instances
WHERE security_group_instance_association.security_group_id = security_groups.id
AND instances.id IN (__[POSTCOMPILE_id_1])
While the first of these is fine, the second is clearly wrong: why are
we filtering on a field that is of no relevance to our join? These were
generated because we were attempting to archive one or more instances
(in this case, the instance with id=1) and needed to find related tables
to archive at the same time. A related table is any table that
references our "source" table - 'instances' here - by way of a foreign
key. For each of *these* tables, we then lookup each foreign key and
join back to the source table, filtering by matching entries in the
source table. The issue here is that we're looking up every foreign key.
What we actually want to do is lookup only the foreign keys that point
back to our source table. This flaw is why we were generating the second
SELECT above: the 'security_group_instance_association' has two foreign
keys, one pointing to our 'instances' table but also another pointing to
the 'security_groups' table. We want the first but not the second.
Resolve this by checking if the table that each foreign key points to is
actually the source table and simply skip if not. With this issue
resolved, we can enable errors on SAWarning warnings in general without
any filters.
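The fix boils down to a filter over foreign keys, which can be sketched in pure Python. The tuple shape below is a simplified stand-in for the SQLAlchemy metadata walked by '_get_fk_stmts'.

```python
def fks_to_source(foreign_keys, source_table):
    """Keep only the foreign keys that reference the source table.

    foreign_keys: list of (child_column, referenced_table,
    referenced_column) tuples. FKs pointing at unrelated tables are
    skipped, which prevents the cartesian-product SELECT shown above.
    """
    return [fk for fk in foreign_keys if fk[1] == source_table]


# The 'security_group_instance_association' table from the example has
# two FKs; only the first references 'instances'.
sgia_fks = [
    ('instance_uuid', 'instances', 'uuid'),
    ('security_group_id', 'security_groups', 'id'),
]
```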
Change-Id: I63208c7bd5f9f4c3d5e4a40bd0f6253d0f042a37
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warning:
The current statement is being autocommitted using implicit
autocommit, which will be removed in SQLAlchemy 2.0. Use the .begin()
method of Engine or Connection in order to use an explicit transaction
for DML and DDL statements.
I genuinely expected this one to be more difficult to resolve, but we
weren't using this as much as expected (thank you, non-legacy
enginefacade).
With this change, we appear to be SQLAlchemy 2.0 ready.
Change-Id: Ic43c21038ee682f9733fbde42c6d24f8088815fc
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warning:
The Column.copy() method is deprecated and will be removed in a future
release.
The recommended solution here (by zzzeek himself) is to use the private
method. This method isn't perfect (hence why the public version was
deprecated) but it's more than okay for what we want. The alternative is
to effectively vendor a variant of the 'Column.copy()' code, which means
we'll lose out on any future bug fixes.
Change-Id: Ia663251dfa7cf8f7d33f19902a92bcc586ae9f43
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warning:
Implicit coercion of SELECT and textual SELECT constructs into FROM
clauses is deprecated; please call .subquery() on any Core select or
ORM Query object in order to produce a subquery object.
This one was easy.
Change-Id: Ifeab2aa8cef7ad151d5d5f92937e90ab34b96e8a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warning:
The Connection.connect() method is considered legacy as of the 1.x
series of SQLAlchemy and will be removed in 2.0.
Once again, we actually just need to remove the warning filter since
this is already fixed elsewhere.
Change-Id: Id395ef0778b9a4e956ef9564e301a8b855ca7f5d
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warning:
Invoking and_() without arguments is deprecated, and will be
disallowed in a future release. For an empty and_() construct, use
and_(True, *args)
I say resolve, but we apparently already did this and I just need to
remove the warning filter. You won't see me complaining...
Change-Id: I46218c1366af383d27fe500232a6815923441c46
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following RemovedIn20Warning warnings:
Using strings to indicate column or relationship paths in loader
options is deprecated and will be removed in SQLAlchemy 2.0. Please
use the class-bound attribute directly.
Using strings to indicate relationship names in Query.join() is
deprecated and will be removed in SQLAlchemy 2.0. Please use the
class-bound attribute directly.
This is rather tricky to resolve. In most cases, we can simply make use
of getattr to fetch the class-bound attribute; however, there are a
number of places where we were doing "nested" joins, e.g.
'instances.info_cache' on the 'SecurityGroup' model. These need a
little more thought.
Change-Id: I1355ac92202cb504a7814afaa1338a4a511f9b54
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This was annoying me. I don't "fix" the warnings that I'm going to
remove in a future change.
Change-Id: Ia1da21577d859885838de10110dd473f72af285d
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
One of our 'SADeprecationWarning' warning filters is a bit of an odd
duck: unlike all the other filters, this one is applied to all modules
and not just nova. We can't fix issues caused by code that isn't nova
(at least, not in the nova tree) so this is a silly approach. Remove it.
Change-Id: I803d31117d0536df2e436a2f64144e4029c9073c
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>