Commit Graph

56696 Commits

Author SHA1 Message Date
Lee Yarwood b097959c1c nova-live-migration: Ensure subnode is fenced during evacuation testing
As stated in the forced-down API documentation [1]:

> Setting a service forced down without completely fencing it will
> likely result in the corruption of VMs on that host.

Previously only the libvirtd service was stopped on the subnode prior to
calling this API, allowing n-cpu, q-agt and the underlying guest domains
to continue running on the host.

This change now ensures that all devstack services are stopped on the
subnode and that all active domains are destroyed.

It is hoped that this will resolve bug #1813789 where evacuations have
timed out due to VIF plugging issues on the new destination host.

[1] https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-detail#update-forced-down

Related-Bug: #1813789
Change-Id: I8af2ad741ca08c3d88efb9aa817c4d1470491a23
2020-03-19 11:34:13 +00:00
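
A minimal sketch of the fencing step described above, assuming
subprocess-driven systemctl and virsh calls; the unit glob and command
names are illustrative, not the exact job code:

    # Hypothetical fencing helper; the real job performs this in shell.
    import subprocess

    def fence_subnode():
        # Stop every devstack service, not just libvirtd, so n-cpu and
        # q-agt cannot keep mutating state on the "failed" host.
        subprocess.run(['sudo', 'systemctl', 'stop', 'devstack@*'],
                       check=True)
        # Destroy any guest domains still active on the hypervisor.
        out = subprocess.run(['sudo', 'virsh', 'list', '--name'],
                             capture_output=True, text=True, check=True)
        for domain in out.stdout.split():
            subprocess.run(['sudo', 'virsh', 'destroy', domain],
                           check=True)
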
Zuul d83bc6fed0 Merge "nova-live-migration: Wait for n-cpu services to come up after configuring Ceph" 2020-03-19 10:30:40 +00:00
Zuul a1aacd972f Merge "Refine and introduce correct parameters for test_get_guest_config_numa_host_instance_topo_cpu_pinning" 2020-03-19 01:05:17 +00:00
Zuul b65cb21ab8 Merge "Lowercase ironic driver hash ring and ignore case in cache" 2020-03-18 20:50:55 +00:00
Zuul c9f5b583b6 Merge "libvirt: Check whether the guest supports UEFI" 2020-03-18 18:05:01 +00:00
Zuul dc83da79d6 Merge "db: Remove unused ec2 DB APIs" 2020-03-18 17:53:31 +00:00
Zuul 6c27f4e9cf Merge "VMware VMDK detach: get adapter type from instance VM" 2020-03-18 12:09:00 +00:00
Zuul 25e6b3259b Merge "bug-fix: Reject live migration with vpmem" 2020-03-18 11:55:14 +00:00
Zuul ca4226cb87 Merge "FUP: Remove noqa and tone down an exception" 2020-03-18 01:28:57 +00:00
Zuul 96f6622316 Merge "libvirt: Report storage bus traits" 2020-03-17 18:20:59 +00:00
Zuul e31a59179f Merge "images: Move qemu-img info calls into privsep" 2020-03-17 17:23:48 +00:00
Zuul 31aa4a6d7f Merge "Clarify fitting hugepages log message" 2020-03-17 14:56:21 +00:00
LuyaoZhong db93b704ce bug-fix: Reject live migration with vpmem
Reject live migration if there are virtual persistent memory resources.
Otherwise, if the destination host has a vpmem backend file with the
same name as the one used by the instance, the live migration will
succeed but those files will be used without being tracked by Nova; if
the destination host has no such vpmems, the migration will trigger an
error.

Change-Id: I900f74d482fc87da5b1b5ec9db2ad5aefcfcfe7a
Closes-bug: #1863605
Implements: blueprint support-live-migration-with-virtual-persistent-memory
2020-03-17 07:18:13 +00:00
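
A rough sketch of the pre-check this implies, assuming vpmem resources
are visible on the instance; the attribute and resource-class names
here are assumptions, not the merged code:

    # Illustrative guard only; attribute names are assumptions.
    from nova import exception

    def _reject_live_migration_with_vpmem(instance):
        resources = instance.resources or []
        if any(r.resource_class.startswith('CUSTOM_PMEM_NAMESPACE')
               for r in resources):
            raise exception.MigrationPreCheckError(
                reason='Live migration with virtual persistent memory '
                       'is not supported.')
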
Zuul 0f81adfaa3 Merge "Ensures that COMPUTE_RESOURCE_SEMAPHORE usage is fair" 2020-03-17 02:25:39 +00:00
Zuul 7b254fb7a8 Merge "Use fair locks in resource tracker" 2020-03-17 02:25:34 +00:00
Wang Huaqiang 1c0479db09 Refine and introduce correct parameters for test_get_guest_config_numa_host_instance_topo_cpu_pinning
'sockets_per_cell', 'cores_per_socket' and 'threads_per_core'
are not valid initialization arguments for the 'NUMATopology'
object.

In test_get_guest_config_numa_host_instance_topo_cpu_pinning
the code path does not depend on the correctness of the host
NUMA topology, since the instance uses the 'DEDICATED' CPU
policy, but we should still refine the test and create a
correct NUMA topology to avoid misreading of the code.

Change-Id: Ic1c9cf16c482939d0761d0cdab66c8eac07cad7b
2020-03-17 07:55:33 +08:00
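
For reference, a minimal sketch of building a host NUMA topology from
per-cell objects rather than the non-existent sockets/cores/threads
keyword arguments; the NUMACell field values are illustrative:

    from nova import objects

    # One cell with four host CPUs arranged as two sibling pairs.
    host_topology = objects.NUMATopology(cells=[
        objects.NUMACell(id=0,
                         cpuset=set([0, 1, 2, 3]),
                         memory=4096,
                         cpu_usage=0,
                         memory_usage=0,
                         mempages=[],
                         siblings=[set([0, 1]), set([2, 3])],
                         pinned_cpus=set()),
    ])
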
Balazs Gibizer eefe3ec2ee Ensures that COMPUTE_RESOURCE_SEMAPHORE usage is fair
This patch poisons the synchronized decorator in the unit tests
to prevent adding new synchronized methods without the fair=True
flag.

Change-Id: I739025dacbcaa0f7adbe612c064f979bf6390880
Related-Bug: #1864122
2020-03-16 20:10:17 +01:00
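
A sketch of what such poisoning can look like, assuming the decorator
is monkeypatched in the test base class; the patched path is
illustrative:

    from oslo_concurrency import lockutils

    _original_synchronized = lockutils.synchronized

    def _poisoned_synchronized(*args, **kwargs):
        # Fail loudly if a new synchronized method omits fair=True.
        if not kwargs.get('fair'):
            raise AssertionError(
                'synchronized() must be called with fair=True')
        return _original_synchronized(*args, **kwargs)

    # Applied in test setup, e.g. with a MonkeyPatch fixture:
    # self.useFixture(fixtures.MonkeyPatch(
    #     'nova.utils.synchronized', _poisoned_synchronized))
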
Zuul 73d1be3fe5 Merge "Fix intermittently failing regression case" 2020-03-16 17:23:29 +00:00
Zuul 5b13eb5954 Merge "Cleanup test for system reader and reader_or_owner rules" 2020-03-16 14:40:44 +00:00
Balazs Gibizer 1dfb72e048 Fix intermittently failing regression case
The test_unshelve_offloaded_fails_due_to_neutron test could fail due
to a race condition. The test case only waits for the first
instance.save() call at [1], but the allocation delete happens after
it. As a result, the test case can still see the allocation of the
offloaded server in placement.

The fix makes sure that the test waits for the second instance.save()
by checking the host of the instance.

[1] https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L5274-L5288

Related-Bug: #1862633

Change-Id: Ic1c3d35749fbdc7f5b6f6ec1e16b8fcf37c10de8
2020-03-16 15:30:32 +01:00
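
A sketch of the waiting pattern, assuming a functional-test API helper
that returns the server with its OS-EXT-SRV-ATTR:host attribute; the
helper names are hypothetical:

    import time

    def wait_for_instance_host(api, server_id, expected_host, timeout=10):
        # Poll until the second instance.save() has landed, i.e. until
        # the instance record reports the expected host.
        deadline = time.time() + timeout
        while time.time() < deadline:
            server = api.get_server(server_id)
            if server.get('OS-EXT-SRV-ATTR:host') == expected_host:
                return server
            time.sleep(0.1)
        raise AssertionError('instance never reported host %s'
                             % expected_host)
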
Lee Yarwood e23c3c2c8d nova-live-migration: Wait for n-cpu services to come up after configuring Ceph
Previously the ceph.sh script used during the nova-live-migration job
would only grep for a `compute` process when checking if the services
had been restarted. This check was bogus and would always succeed, as
the grep command would always match its own process. For example:

2020-03-13 21:06:47.682073 | primary | 2020-03-13 21:06:47.681 | root
29529  0.0  0.0   4500   736 pts/0    S+   21:06   0:00 /bin/sh -c ps
       aux | grep compute
2020-03-13 21:06:47.683964 | primary | 2020-03-13 21:06:47.683 | root
29531  0.0  0.0  14616   944 pts/0    S+   21:06   0:00 grep compute

Failures of this job were seen on the stable/pike branch, where slower
CI nodes appeared to struggle to let Libvirt report to n-cpu in time
before Tempest was started. This in turn caused instance build failures
and the overall failure of the job.

This change resolves the issue by switching to pgrep and ensuring
n-cpu services are reported as fully up after a cold restart before
starting the Tempest test run.

Closes-Bug: #1867380
Change-Id: Icd7ab2ca4ddbed92c7e883a63a23245920d961e7
2020-03-16 12:37:45 +00:00
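
A sketch of the corrected check, assuming pgrep is available; the
process pattern is illustrative. Unlike `ps aux | grep compute`, pgrep
never matches its own process:

    import subprocess
    import time

    def wait_for_n_cpu(timeout=60):
        deadline = time.time() + timeout
        while time.time() < deadline:
            # pgrep exits 0 only if a matching process actually exists.
            result = subprocess.run(['pgrep', '-f', 'nova-compute'])
            if result.returncode == 0:
                return True
            time.sleep(1)
        return False
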
Lee Yarwood 03d6eb500f images: Move qemu-img info calls into privsep
This is mostly code motion from the nova.virt.images module into privsep
to allow for both privileged and unprivileged calls to be made.

A privileged_qemu_img_info function is introduced allowing QEMU to
access devices requiring root privileges, such as host block devices.

Change-Id: I5ac03f923d9d181d22d44d8ec8fbc31eb0c3999e
2020-03-16 09:45:31 +00:00
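
A hedged sketch of the privsep entrypoint pattern this describes; the
context name follows nova's privsep conventions, but the code is not
verbatim from the merged change:

    from oslo_concurrency import processutils

    from nova import privsep

    @privsep.sys_admin_pctxt.entrypoint
    def privileged_qemu_img_info(path, format=None):
        # Runs in the privileged helper so qemu-img can inspect
        # root-only sources such as host block devices.
        cmd = ['qemu-img', 'info', '--output=json', path]
        if format is not None:
            cmd += ['-f', format]
        out, _err = processutils.execute(*cmd)
        return out
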
Zuul e483ca1cd9 Merge "Catch exception when use invalid architecture of image" 2020-03-13 04:09:26 +00:00
Zuul 140adda60f Merge "[Trivial] Fix code comment of admin password tests" 2020-03-12 19:00:21 +00:00
Zuul 2c673f4375 Merge "Run sdk functional tests on nova changes" 2020-03-12 19:00:16 +00:00
Zuul 2f88e09609 Merge "Deprecate the vmwareapi driver" 2020-03-12 16:27:59 +00:00
Zuul e20e731630 Merge "nova-net: Remove unused nova-network objects" 2020-03-11 19:03:29 +00:00
Zuul fc159ac91b Merge "nova-net: Remove unnecessary exception handling, mocks" 2020-03-11 19:03:20 +00:00
Zuul 0dfb69eb37 Merge "Handle unset 'connection_info'" 2020-03-11 19:03:13 +00:00
Ghanshyam Mann 24b6fb1591 Cleanup test for system reader and reader_or_owner rules
While introducing PROJECT_READER_OR_SYSTEM_READER in policy
(https://review.opendev.org/#/c/706672) I added those rules to the
system_reader list in test_policy.

Let's maintain them in a separate list for easier reading. There
are going to be more policies in this list.

Partial implement blueprint policy-defaults-refresh

Change-Id: Ice3d79f0803efad60236e55b55bebd681056564c
2020-03-11 12:18:13 -05:00
Zuul 57459c3429 Merge "Don't overwrite greenthread-local context in host manager" 2020-03-11 00:28:21 +00:00
Monty Taylor 2092c3e714 Run sdk functional tests on nova changes
The SDK API tests provide another set of verification of nova
behavior. In a perfect world, as we add new microversions to
nova, we'd first add support for the microversion to the SDK,
along with a test that works even when the microversion isn't
there; once the microversion lands, the test should still work.
This would allow us to make sure the SDK is always up to date
with the very latest and greatest nova API.

Change-Id: I2406bd6d9e69e33e57b715ff0812c5770b1b53d8
2020-03-10 16:11:00 -05:00
Dan Smith d98d728285 Deprecate the vmwareapi driver
As of now, questions on the mailing list are going unanswered, and the
Nova team does not have a clear representative owner for the driver
to which bugs and other reports can be directed. There does not appear
to be a CI system running tests for the driver anymore, and the latest
indication from the community[1] points to it being potentially broken with
devstack.

This patch starts the deprecation timer for the driver and/or serves
as a flare to gauge interest (or lack thereof) in continuing to maintain
the driver.

1: http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013066.html

Change-Id: Ie39e9605dc8cebff3795a29ea91dc08ee64a21eb
2020-03-10 08:09:28 -07:00
Zuul 7d30ad26ae Merge "libvirt: don't log error if guest gone during interface detach" 2020-03-09 18:44:56 +00:00
Zuul 4ef99ac453 Merge "nit: Fix NOTE error of fatal=False" 2020-03-09 18:17:53 +00:00
Zuul fa16a330f5 Merge "Validate id as integer for os-aggregates" 2020-03-09 18:17:44 +00:00
Jason Anderson 1ed9f9dac5 Use fair locks in resource tracker
When the resource tracker has to lock a compute host for updates or
inspection, it uses a single semaphore. In most cases this is fine, as
a compute process typically tracks only one hypervisor. However, with
Ironic it's possible for one compute process to track many hypervisors.
In this case, wait queues for instance claims can get "stuck" briefly
behind longer processing loops such as the update_resources periodic
job. This is possible because oslo.concurrency's lockutils synchronized
decorator does not use fair locks by default. When a lock is released,
one of the threads waiting for the lock is randomly allowed to take it
next. A fair lock instead ensures that waiting threads acquire the
lock in the order they requested it.

This should ensure that instance claim requests no longer risk losing
the lock contest, so instance build requests do not queue unnecessarily
behind long-running tasks.

This includes bumping the oslo.concurrency dependency; fair locks were
added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).

Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
2020-03-09 11:03:17 -05:00
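
A minimal sketch of the resulting decorator usage, reusing nova's
COMPUTE_RESOURCE_SEMAPHORE name; the method body is elided:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
    def instance_claim(context, instance, nodename, limits=None):
        # With fair=True, a short claim queued behind a long
        # update_available_resource pass acquires the lock in FIFO
        # order once the long task releases it.
        ...
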
Zuul abd1f05a0b Merge "trivial: Use 'from foo import bar'" 2020-03-09 15:41:45 +00:00
Matthew Booth 8defe34e28 trivial: Use 'from foo import bar'
In some tests, we were doing an import with a full module path. This has
the side effect of importing every submodule on that path, which led to
some confusing side effects. Use 'from foo import bar' syntax instead
and clean up the damage.

Change-Id: I91a289630f31674dec1d785d67b5acda173b7d7e
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-03-09 14:56:02 +00:00
Zuul 6800aa0339 Merge "Add new default roles in os-attach-interfaces policies" 2020-03-09 12:25:25 +00:00
Zuul f3abaf4bec Merge "trivial: Rename directory for os-keypairs samples" 2020-03-09 12:25:18 +00:00
Zuul d5b75845a6 Merge "Fix os-keypairs pagination links" 2020-03-09 12:25:11 +00:00
Zuul 618abecd28 Merge "hyper-v: update support matrix" 2020-03-09 12:25:05 +00:00
Zuul 8e7b3839d0 Merge "Add new default roles in os-deferred_delete policies" 2020-03-09 11:13:10 +00:00
Zuul c7fe3b4bcb Merge "Introduce scope_types in os-deferred_delete" 2020-03-09 10:45:54 +00:00
Matt Riedemann 6c3e8bc48e libvirt: don't log error if guest gone during interface detach
Similar to change I8ae352ff3eeb760c97d1a6fa9d7a59e881d7aea1, if
we're processing a network-vif-deleted event while an instance
is being deleted, the asynchronous interface detach could fail
because the guest is gone from the hypervisor. The existing code
for handling this case was using a stale guest object, so this
change refreshes the guest from the hypervisor; if the guest is
gone, the Host.get_guest() method should raise an
InstanceNotFound exception, which we simply trap, log and return.

Change-Id: Ic4c870cc5078d3f7ac6b2f96f8904c2a47de418e
Closes-Bug: #1797966
2020-03-09 09:57:47 +00:00
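
A sketch of the refresh-and-trap pattern, following the structure of
nova's libvirt driver; the method is abbreviated:

    from oslo_log import log as logging

    from nova import exception

    LOG = logging.getLogger(__name__)

    def detach_interface(self, context, instance, vif):
        try:
            # Re-fetch the guest instead of trusting a stale object.
            guest = self._host.get_guest(instance)
        except exception.InstanceNotFound:
            LOG.info('Instance disappeared while detaching interface; '
                     'nothing to do.', instance=instance)
            return
        # ... proceed with detaching the interface from the guest ...
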
Ghanshyam Mann 88485d1f69 [Trivial] Fix code comment of admin password tests
Change-Id: I4e3fc5c1449f483cad5076a553fad65129abc701
2020-03-09 09:50:22 +00:00
Zuul bb53370197 Merge "Fix hypervisors paginated collection_name." 2020-03-08 17:58:35 +00:00
zhangbailin 1ad2f558c5 nit: Fix NOTE error of fatal=False
Partial implement blueprint policy-defaults-refresh

Change-Id: I2ab6f42150afb9351bd4548b270c6a3b19909a32
2020-03-07 10:18:50 +08:00
melanie witt 7145100ee4 Lowercase ironic driver hash ring and ignore case in cache
Recently we had a customer case where attempts to add new ironic nodes
to an existing undercloud resulted in half of the nodes failing to be
detected and added to nova. Ironic API returned all of the newly added
nodes when called by the driver, but half of the nodes were not
returned to the compute manager by the driver.

There was only one nova-compute service managing all of the ironic
nodes in this typical all-in-one undercloud deployment.

After days of investigation and examination of a database dump from the
customer, we noticed that at some point the customer had changed the
hostname of the machine from something containing uppercase letters to
the same name but all lowercase. The nova-compute service record had
the mixed case name and the CONF.host (socket.gethostname()) had the
lowercase name.

The hash ring logic adds all of the nova-compute service hostnames
plus CONF.host to the hash ring; the ironic driver then reports only
the nodes it owns, retrieving a service hostname from the ring based
on a hash of each ironic node UUID.

Because of the machine hostname change, the hash ring contained, for
example: {'MachineHostName', 'machinehostname'} when it should have
contained only one hostname. And because the hash ring contained two
hostnames, the driver was able to retrieve only half of the nodes as
nodes that it owned. So half of the new nodes were excluded and not
added as new compute nodes.

This adds lowercasing of hosts that are added to the hash ring and
ignores case when comparing CONF.host to the hash ring members, to
avoid unnecessary pain and confusion for users who make hostname
changes that are otherwise functionally harmless.

This also adds logging of the set of hash ring members at level DEBUG
to help enable easier debugging of hash ring related situations.

Closes-Bug: #1866380

Change-Id: I617fd59de327de05a198f12b75a381f21945afb0
2020-03-06 23:25:53 +00:00
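
A sketch of the case-insensitive membership this describes, assuming
tooz's HashRing; the helper functions are hypothetical:

    from tooz import hashring

    def build_hash_ring(service_hosts, conf_host):
        # Lowercase every member so 'MachineHostName' and
        # 'machinehostname' collapse to a single ring entry.
        members = {host.lower() for host in service_hosts}
        members.add(conf_host.lower())
        return hashring.HashRing(members)

    def node_is_mine(ring, conf_host, node_uuid):
        # Compare case-insensitively when deciding node ownership.
        hosts = ring.get_nodes(node_uuid.encode('utf-8'), replicas=1)
        return conf_host.lower() in hosts
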