Inheritance of image properties from the image an instance was booted
from to an image created from that instance is governed by the
non_inheritable_image_properties configuration option. However, there
are some image properties (for example, those used for image signature
validation or to reference a cinder encryption key id) which it makes
no sense to inherit under any circumstances. Additionally,
misconfiguration of the non-inheritable properties can lead to data
loss under the circumstances described in Bug #1852106. It would
therefore be better if these properties were not subject to
configuration at all.
The initial set of absolutely non-inheritable image properties
consists of those associated with cinder encryption keys and image
signature validation.
Change-Id: I4332b9c343b6c2b50226baa8f78396c2012dabd1
Closes-bug: #1852106
It was discovered that default= on a Column definition in a schema migration
will attempt to update the table with the provided value, instead of just
translating on read, which is often the assumption. The Instance.hidden=False
change introduced in Train[1] used such a default on the new column, which caused
at least one real-world deployment to time out rewriting the instances table
due to size. Apparently SQLAlchemy-migrate also does not consider such a timeout
to be a failure and proceeds anyway. The end result is that some existing instances
in the database have hidden=NULL values, and the DB model layer does not convert
those to hidden=False when we read/query them, causing those instances to be
excluded from the API list view.
This change alters the 399 schema migration to remove the default=False
specification. This does not actually change the schema, but /will/ prevent
users who have not yet upgraded to Train from rewriting the table.
This change also makes the instance_get_all_by_filters() code handle hidden
specially, including both False and NULL in a query for non-hidden instances.
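A minimal sketch of the filtering semantics just described, with plain Python standing in for the actual SQLAlchemy query (the helper name and row shape are illustrative, not Nova's real code):

```python
# Illustrative only: when filtering for non-hidden instances, NULL
# (None) must be treated the same as False, because some rows may
# have been left with hidden=NULL by the timed-out migration.
def filter_instances(instances, hidden=False):
    """Mimic the SQL predicate `hidden = false OR hidden IS NULL`
    that the fixed query applies for non-hidden instances."""
    if hidden:
        # Hidden instances must carry an explicit True value.
        return [i for i in instances if i.get("hidden") is True]
    # Non-hidden: include both False and NULL (None) values.
    return [i for i in instances if not i.get("hidden")]

rows = [
    {"id": 1, "hidden": False},
    {"id": 2, "hidden": None},   # left NULL by the bad migration
    {"id": 3, "hidden": True},
]
visible = filter_instances(rows)
```

Without the `IS NULL` half of the predicate, instance 2 would silently disappear from the API list view, which is exactly the reported bug.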
A future change should add a developer trap test to ensure that future migrations
do not add default= values to new columns, to avoid repeating this situation.
[1] Iaffb27bd8c562ba120047c04bb62619c0864f594
Change-Id: Iace3f653b42c20887b40ee0105c8e9a4edeff1f7
Closes-Bug: #1862205
This method only checks whether a specific path is shared between two hosts
and has been renamed accordingly to avoid confusion.
Additionally the shared_storage variable used to store the returned
value from this method within migrate_disk_and_power_off is renamed to
shared_instance_path.
Change-Id: I426de20864321d664d3fe0e08a14e1af509c8a2b
Floating IPs don't have to have an associated port, so there's no reason
to error out when this is the case.
Change-Id: I50c79843bf819b731c597dbfe72090cdf02c7641
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Closes-bug: #1861876
This adds a bit more verbosity to the cell update command, so that it
is more obvious to an operator which values are being used as the
transport URL or database connection URL for a cell.
Change-Id: Ie567ae7da4508a4b6f797d4bc77347c84702a74e
In the worst case scenario, we could list N floating IPs, each of which
has a different network. This would result in N additional calls to
neutron - one for each of the networks. Avoid this by calling neutron
once for all networks associated with the floating IPs.
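The batching idea above can be sketched as follows (the function and backend names are made up for illustration and are not the real neutronclient API):

```python
# Illustrative sketch: gather every network ID referenced by the
# floating IPs first, then resolve all of them with a single backend
# lookup instead of one call per floating IP.
def resolve_networks(floating_ips, list_networks):
    """Return {network_id: network} using one backend call."""
    network_ids = {fip["floating_network_id"] for fip in floating_ips}
    networks = list_networks(network_ids)   # one call for all IDs
    return {net["id"]: net for net in networks}

# Fake backend standing in for neutron; records how often it is hit.
calls = []
def fake_list_networks(ids):
    calls.append(ids)
    return [{"id": i, "name": "net-%s" % i} for i in sorted(ids)]

fips = [{"floating_network_id": "a"},
        {"floating_network_id": "b"},
        {"floating_network_id": "a"}]
nets = resolve_networks(fips, fake_list_networks)
```

With N floating IPs spread across N networks this turns N lookups into one, which is the whole point of the change.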
Change-Id: If067a730b0fcbe3f59f4472b00c690cc43be4b3b
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Previous patches in the blueprint implemented support for live
migration with qos ports and added functional test coverage for the
various live migration scenarios. This patch therefore removes the API
check that rejected such operations and documents the new feature.
Change-Id: Ib9ef18fff28c463c9ffe3607d93428b689dc89fb
blueprint: support-move-ops-with-qos-ports-ussuri
I1222fc21bde4158df1db70370c7f3bd319ec081f added a common helper for
server creation. This patch updates the existing qos tests to use that
helper.
Change-Id: I017163c6cdf8727be9913a6870cd91fec5f4d568
blueprint: support-move-ops-with-qos-ports-ussuri
During creating or moving of an instance with qos SRIOV port the PCI
device claim on the destination compute needs to be restricted to select
PCI VFs from the same PF where the bandwidth for the qos port is
allocated from. This is achieved by updating the spec part of the
InstancePCIRequest with the device name of the PF by calling
update_pci_request_spec_with_allocated_interface_name(). Until now,
this update to the instance object was persisted directly by that
call. During code review it came up that the instance.save() in the
util is not appropriate, as the caller has a lot more context to
decide when to persist the changes.
The original eager instance.save was introduced when support was added
to the server create flow. It turns out that the need for such a save
was due to a mistake in the original ResourceTracker.instance_claim()
call, which loads the InstancePCIRequest from the DB instead of using
the requests through the passed-in instance object. By removing the
extra DB call, the need for eagerly persisting the PCI spec update is
also removed. Both the server create code path and every server move
code path eventually persist the instance object, either at the end of
the claim process or, in the case of live migration, in the
post_live_migration_at_destination compute manager call. This means
that the code can now be simplified, especially the live migration cases.
In the live migrate abort case we don't need to roll back the eagerly
persisted PCI change, as such changes are now only persisted at the
end of the migration; but we still need to refresh the pci_requests
field of the instance object during the rollback, as that field might
be stale, containing dest host related PCI information.
Also, in the case of rescheduling during live migration, if the
rescheduling failed, the PCI change had to be rolled back to the
source host by specific code. But now those changes are never
persisted until the migration finishes, so this rollback code can be
removed too.
Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
This uses the COMPUTE_SAME_HOST_COLD_MIGRATE trait in the API during a
cold migration to filter out hosts that cannot support same-host cold
migration, which is all of them except for the hosts using the vCenter
driver.
For any nodes that do not report the trait, we cannot know whether
that is because they do not support it or because they are not new
enough to report it, so the API has a service version check and will
fall back to the old config-based behavior if the node is old. That
compat code can be removed in the next release.
As a result of this the FakeDriver capabilities are updated so the
FakeDriver no longer supports same-host cold migration and a new fake
driver is added to support that scenario for any tests that need it.
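The selection logic described above can be sketched roughly as follows (the helper name and the version threshold parameter are hypothetical; only the trait name comes from the change itself):

```python
# Illustrative sketch: prefer the COMPUTE_SAME_HOST_COLD_MIGRATE
# trait, but fall back to configuration for nodes too old to report
# any such trait.
def allows_same_host_cold_migrate(node_traits, service_version,
                                  min_trait_version, config_allow):
    if service_version >= min_trait_version:
        # New enough to report traits: trust presence/absence.
        return "COMPUTE_SAME_HOST_COLD_MIGRATE" in node_traits
    # Old node: the trait's absence is meaningless, use the config.
    return config_allow
```

The key asymmetry is that a missing trait is only meaningful once the node is known to be new enough to have reported it.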
Change-Id: I7a4b951f3ab324c666ab924e6003d24cc8e539f5
Closes-Bug: #1748697
Related-Bug: #1811235
The metadata service uses the provider ID to identify the requesting
instance.
However, when the provider query doesn't find any networks, the
request should fail.
The same goes for the case where multiple ports are found.
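The lookup rule stated above amounts to "exactly one match, or fail"; a minimal sketch (the helper name and exception type are illustrative, not Nova's actual code):

```python
# Illustrative only: the provider query must match exactly one
# network, otherwise the metadata request is rejected.
def pick_network(networks):
    if not networks:
        raise LookupError("no network matched the provider query")
    if len(networks) > 1:
        raise LookupError("ambiguous: multiple networks matched")
    return networks[0]
```

The same exactly-one rule applies to the port lookup mentioned above.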
Closes-Bug: #1841933
Change-Id: I8ce3703ca86a3a0769edd42a790d82796d1071d7
tl;dr: We're adding the default VirtIO-RNG device to ensure guests are
not starved of entropy (and thus not hang) during boot time.
Background
----------
From Nova Git history, commit b94550f419 ("libvirt: configuration
element for a random number generator device") _did_ add a default RNG
device (but with its entropy source to the undesirable '/dev/random').
However, the default RNG device was immediately removed in another
commit (605677c -- "libvirt: remove explicit /dev/random rng default"),
with this rationale:
libvirt (or rather qemu) will default to /dev/random if no rng device
path is specified [...]
It's preferable for us to not duplicate this default to allow for a
future where libvirt or the hypervisor needs to make more intelligent
decisions about the default device to use.
The above reasoning doesn't hold up, because:
(a) libvirt does not make "policy" decisions, such as choosing an
entropy source (or any other such). Therefore Nova, as a management
application, should make the decision here.
(b) More importantly, when QEMU exposes a VirtIO-RNG device to the
guest, that device needs a source of entropy; and QEMU by default
uses the legacy and problematic `/dev/random` as the source —
instead of the preferred `/dev/urandom`. So QEMU's default for
VirtIO-RNG devices is not sufficient, and Nova should not rely on
it. (Discussion[+] on 'qemu-devel' list to consider changing QEMU's
default.)
* * *
In this patch:
- Make Nova configure a VirtIO-RNG device by default for guests.
(Which will be using `/dev/urandom` as the default entropy source.)
This will also work for Windows guests, when using VirtIO-Win
drivers[*] on the Linux host.
- The 'hw_rng_model' image metadata property is now rendered
(temporarily) useless -- as it's not used anywhere outside the
_add_rng_device() method. But we don't want to deprecate it yet, as
we may extend it (see code comment for details); document that.
[*] https://docs.pagure.org/docs-fedora/create-windows-vms-using-virtio.html
[+] https://lists.nongnu.org/archive/html/qemu-devel/2018-09/msg02724.html
-- "[RFC] Virtio RNG: Consider changing the default entropy source to
/dev/urandom?"
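For illustration, a minimal rendering of the libvirt device XML a guest would end up with under this change (the real code builds this via Nova's libvirt config objects, not string formatting; the optional rate attributes here are an assumption):

```python
# Illustrative only: the default VirtIO-RNG device, with the host's
# /dev/urandom as the entropy source instead of /dev/random.
def default_rng_device_xml(rate_bytes=None, rate_period=None):
    rate = ""
    if rate_bytes and rate_period:
        # Optional throttling of how much entropy the guest may pull.
        rate = ('  <rate bytes="%d" period="%d"/>\n'
                % (rate_bytes, rate_period))
    return ('<rng model="virtio">\n'
            '%s'
            '  <backend model="random">/dev/urandom</backend>\n'
            '</rng>' % rate)
```

The backend path is the substantive part: it overrides QEMU's problematic `/dev/random` default discussed above.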
Closes-Bug: #1789868
Change-Id: I28e66c9640c38d23b8c0dbd0b05f5260bfcf6d30
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
As highlighted in I77b1cfeab3c00c6c3d7743ba51e12414806b71d2, filtering
either floating IPs or floating IP pools by floating IP name will
actually fall back to filtering by ID. Update the API ref to reflect
this.
Change-Id: I00443ae111cbd1e1ec4d2c2ae1828ddaa095fd1a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The failed case covers the situation where the InstancePCIRequest
cannot be updated with the PF device names on the target host due to
incorrect naming of the device RPs in placement.
The abort case covers when the API user cancels the running live
migration.
Change-Id: I1222fc21bde4158df1db70370c7f3bd319ec081f
blueprint: support-move-ops-with-qos-ports-ussuri
In change I475ea0fa5f2d5b197118f0ced5a0ff6907411972, we changed how we
generated flavor names and IDs to stop basing them on existing flavors.
However, the call to 'randint(10000)' that we used instead has proven
problematic since there is a high chance of collisions that will only
increase as the number of tests increases. We could switch back to the
previous scheme but that's unnecessary since there's actually no reason
we need to set 'Flavor.id' in the first place...so don't.
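The collision claim is easy to check with the birthday bound; a quick back-of-the-envelope sketch (not part of the change itself):

```python
# With random IDs drawn uniformly from 10000 values, the probability
# of at least one collision passes 50% after only ~118 draws.
def collision_probability(n_draws, space=10000):
    p_no_collision = 1.0
    for k in range(n_draws):
        p_no_collision *= (space - k) / space
    return 1.0 - p_no_collision
```

So well before the test count approaches the ID space, collisions become the expected outcome rather than a fluke, which is why dropping the explicit 'Flavor.id' is the cleaner fix.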
A now-unused remnant of the "old way" is also removed, since it was
spotted while fixing this.
Change-Id: Iab6245bc5ed8f95dae9c384b96e6bef0add7dca6
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Closes-bug: #1860417
This was missed in change Ie01ab1c3a1219f1d123f0ecedc66a00dfb2eb2c1.
Change-Id: I7c56d5ecc1131c20324f41c83dc97f41e4628d4d
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This is a *very* common pattern. Centralize it. Only a few users are
added for now, but we can migrate more later (it's rather tedious work).
Change-Id: I84c58de90dad6d86271767363aef90ddac0f1730
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>