61230 Commits

Author SHA1 Message Date
Takashi Kajinami 9b2b82396c Drop direct dependency on iso8601
iso8601.iso8601.UTC has been equivalent to datetime.timezone.utc in
Python 3. Because Python 2 is no longer supported, we can directly use
the built-in implementation.

Also replace iso8601.parse_date with the common function in oslo.utils.

Change-Id: I933cc5dd1fa76e320cd96cd8e9b9a7963ff70375
2025-03-01 16:57:52 +09:00
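A minimal sketch of the equivalence the commit above relies on, using only the Python standard library. `datetime.fromisoformat` stands in here for the oslo.utils parsing helper, which the commit message does not name; `parse_timestamp` is a hypothetical name, and the sample timestamp is this commit's own date.

```python
import datetime

# datetime.timezone.utc is the built-in tzinfo that replaces iso8601.iso8601.UTC.
UTC = datetime.timezone.utc


def parse_timestamp(value):
    """Parse an ISO 8601 string; naive values are assumed to be UTC."""
    parsed = datetime.datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=UTC)
    return parsed


# Parse a timestamp with a +09:00 offset and normalise it to UTC.
stamp = parse_timestamp("2025-03-01T16:57:52+09:00")
stamp_utc = stamp.astimezone(UTC)
```

Normalising to UTC right after parsing keeps later comparisons independent of the offset the caller supplied.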
Zuul d9f72b2432 Merge "libvirt: allow direct SPICE connections to qemu" 2025-02-25 03:23:01 +00:00
Zuul 120495ca56 Merge "libvirt: direct SPICE console database changes" 2025-02-25 01:26:13 +00:00
Zuul a5f43db9a9 Merge "libvirt: direct SPICE console object changes" 2025-02-25 00:35:04 +00:00
Zuul 4611973c99 Merge "libvirt: Fix regression of listDevices() return type" 2025-02-24 23:25:54 +00:00
Zuul d25de45a4b Merge "Reproducer for bug 2098892" 2025-02-24 21:25:48 +00:00
Michael Still cbc263f6bc libvirt: allow direct SPICE connections to qemu
This patch adds a new console type, "spice-direct", which provides
the connection information required to talk the native SPICE protocol
directly to qemu on the hypervisor. This is intended to be fronted
by a proxy which will handle authentication separately.

A new microversion is introduced which adds the type "spice-direct"
to the existing "spice" protocol.

An example request:

POST /servers/<uuid>/remote-consoles
{
  "remote_console": {
    "protocol": "spice",
    "type": "spice-direct"
  }
}

An example response:

{
  "remote_console": {
    "protocol": "spice",
    "type": "spice-direct",
    "url": "http://localhost:13200/nova?token=XXX"
  }
}

This token can then be used to look up connection details for the
console using a request like this:

GET /os-console-auth-tokens/<consoletoken>

Which returns something like this:

{
  "console": {
    "instance_uuid": <uuid>,
    "host": <hypervisor>,
    "port": <a TCP port number>,
    "tls_port": <another TCP port number>,
    "internal_access_path": null
  }
}

APIImpact

Change-Id: I1e701cbabc0e2c435685e31465159eec09e3b1a0
2025-02-22 08:25:38 +11:00
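The request and response flow in the commit above can be sketched from a client's point of view as follows. This is an illustration only: the helper names are hypothetical, no HTTP call is made, and the response body is the example from the commit message rather than real output.

```python
import json
from urllib.parse import parse_qs, urlsplit


def build_console_request():
    """Body for POST /servers/<uuid>/remote-consoles, as in the commit message."""
    return {"remote_console": {"protocol": "spice", "type": "spice-direct"}}


def extract_token(response_body):
    """Pull the console token out of the 'url' field of the response."""
    url = json.loads(response_body)["remote_console"]["url"]
    query = parse_qs(urlsplit(url).query)
    return query["token"][0]


# Example response copied from the commit message (token value is a placeholder).
example_response = json.dumps({
    "remote_console": {
        "protocol": "spice",
        "type": "spice-direct",
        "url": "http://localhost:13200/nova?token=XXX",
    }
})

token = extract_token(example_response)
# Path a proxy would then query to resolve the token into host and port details.
lookup_path = f"/os-console-auth-tokens/{token}"
```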
Michael Still d8e95078cd libvirt: direct SPICE console database changes
This patch makes just the schema changes required for the
implementation of SPICE direct consoles, as requested during
review. See change I1e701cbabc0e2c435685e31465159eec09e3b1a0
for the related feature implementation.

APIImpact

Change-Id: I838ad1a8a74f47544226d3da0e7f1a2f5585b7cb
2025-02-22 08:25:37 +11:00
Michael Still 253dfc76eb libvirt: direct SPICE console object changes
This patch changes just the objects required for the implementation
of SPICE direct consoles, as requested during review. See change
I1e701cbabc0e2c435685e31465159eec09e3b1a0 for the related feature
implementation.

APIImpact

Change-Id: I8ec513fd818362d2f041c679a84e900952af61a2
2025-02-22 08:25:37 +11:00
melanie witt 2c07aa0645 libvirt: Fix regression of listDevices() return type
This is a partial revert of change
I60d6f04d374e9ede5895a43b7a75e955b0fea3c5 which added tpool.Proxy
wrapping to the listDevices() and listAllDevices() methods.

The regression was caught during downstream testing with vGPUs and the
update_available_resource() periodic task was failing with:

  TypeError: virNodeDeviceLookupByName() argument 2 must be str or
    None, not Proxy

It turns out that while the listAllDevices() method returns a list of
virNodeDevice objects [1], the listDevices() method returns a list of
string names [2] and is generated from the corresponding function in C
[3].

The error was not caught by unit or functional testing because those
test environments intentionally do not import the libvirt Python
module -- so mocked code in the LibvirtFixture runs instead. Also, the
update_available_resource() method has an 'except Exception:' at the end
which logs an error but does not re-raise. So it would not cause a
functional test to fail.

This reverts the change that caused the regression, updates potentially
confusing docstrings, adds type annotations to the methods that use
listDevices(), and moves the nodeDeviceLookupByName type checking into
the LibvirtFixture.

Closes-Bug: #2098892

[1] https://github.com/libvirt/libvirt-python/blob/408815a/libvirt-override-virConnect.py#L520-L524
[2] https://github.com/libvirt/libvirt-python/blob/408815a/libvirt-override-api.xml#L448-L453
[3] https://libvirt.org/html/libvirt-libvirt-nodedev.html#virNodeListDevices

Change-Id: Ib5befdd3c13367daa208ff969f66cba693ae2c76
2025-02-20 22:42:16 +00:00
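The return-type difference described in the commit above can be illustrated with a small self-contained sketch. Every class and function here is a hypothetical stand-in: `Proxy` only mimics the idea of eventlet's tpool.Proxy wrapper, and `lookup_by_name` only mimics the type check that raised the TypeError. None of it is the libvirt or nova code.

```python
class Proxy:
    """Hypothetical stand-in for a tpool.Proxy-style wrapper."""

    def __init__(self, wrapped):
        self._wrapped = wrapped


class FakeNodeDevice:
    """Hypothetical stand-in for a virNodeDevice object."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


def wrap_all_devices(devices):
    """listAllDevices()-style result: objects, so each one is wrapped."""
    return [Proxy(dev) for dev in devices]


def passthrough_device_names(names: list[str]) -> list[str]:
    """listDevices()-style result: plain strings, returned unwrapped."""
    return list(names)


def lookup_by_name(name):
    """Reject anything that is not a str, like the failing lookup did."""
    if not isinstance(name, str):
        raise TypeError(
            f"argument must be str or None, not {type(name).__name__}")
    return FakeNodeDevice(name)


names = passthrough_device_names(["pci_0000_00_01_0"])
device = lookup_by_name(names[0])
```

Passing a wrapped name, for example `lookup_by_name(Proxy("pci_0000_00_01_0"))`, raises the TypeError, which is the failure mode the commit reverts.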
melanie witt 3cf6667c50 Reproducer for bug 2098892
Change I60d6f04d374e9ede5895a43b7a75e955b0fea3c5 added tpool.Proxy
wrapping to the listDevices() and listAllDevices() methods but
introduced a regression for listDevices() that led to an error in
update_available_resource():

  TypeError: virNodeDeviceLookupByName() argument 2 must be str or
    None, not Proxy

The error was not caught by unit or functional testing because those
test environments intentionally do not import the libvirt Python
module -- so mocked code in the LibvirtFixture runs instead. Also, the
update_available_resource() method has an 'except Exception:' at the end
which logs an error but does not re-raise. So it would not cause a
functional test to fail.

This adds a functional test to reproduce the bug and adds a keyword arg
to the test _run_periodics() method to specify whether it should raise
an exception if an error is logged.

Related-Bug: #2098892

Change-Id: I3a3dda57f2181b24bd6589ac7bb8160014ab2396
2025-02-20 22:31:21 +00:00
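A rough sketch of why a swallowed exception hides a bug from tests, and how a "raise if an error was logged" switch exposes it. The names `ErrorCollector`, `run_periodics` and `raise_on_error` are assumptions made for illustration; the commit message does not give the real keyword argument name.

```python
import logging

LOG = logging.getLogger("periodic_sketch")


class ErrorCollector(logging.Handler):
    """Collects ERROR-level records so a test can inspect them."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def update_available_resource():
    """Mimics a periodic task that logs errors but never re-raises."""
    try:
        raise TypeError("argument 2 must be str or None, not Proxy")
    except Exception:
        LOG.exception("Error updating resources")


def run_periodics(raise_on_error=False):
    """Run the periodic; optionally fail if anything was logged as an error."""
    collector = ErrorCollector()
    LOG.addHandler(collector)
    try:
        update_available_resource()
    finally:
        LOG.removeHandler(collector)
    if raise_on_error and collector.records:
        raise AssertionError(collector.records[0].getMessage())
```

With the default `raise_on_error=False` the call returns normally even though the task failed, which is how the regression went unnoticed.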
Zuul 375d95565d Merge "FUP for reno issues" 2025-02-20 20:16:45 +00:00
Zuul d00a4d4f0f Merge "move nova-ovs-hybrid-plug to deploy with spice and fix qxl default" 2025-02-20 20:16:38 +00:00
Zuul 43eaed3016 Merge "Add a new ImagePropertiesWeigher" 2025-02-20 18:02:27 +00:00
Sean Mooney 08cbf0f4b1 FUP for reno issues
This change addresses nits in the
make-virtio-the-default-spice-video release note.

Change-Id: I1d8782cf91375b88c1c119ef4de8a9868b7a60f1
2025-02-20 17:43:13 +00:00
Zuul f97505f9d2 Merge "Add fill_metadata() to InstanceList" 2025-02-20 17:35:03 +00:00
Sylvain Bauza acd6c733c6 Add a new ImagePropertiesWeigher
This weigher will check how many instances in the host have the image
properties that are requested and will prefer by default to pack
instances with the same properties.

Implements blueprint: image-metadata-props-weigher

Change-Id: I3bfed44bd089c6b226d13c3ac4a0003411737cbd
2025-02-20 02:49:08 +00:00
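One way to picture the packing behaviour described in the commit above is the sketch below. The scoring rule (one point per matching property per instance) and all names are assumptions for illustration; the commit message does not describe how the real weigher computes or normalises its weights.

```python
def image_properties_weight(requested_props, host_instances_props):
    """Count property matches across the instances already on a host."""
    weight = 0
    for props in host_instances_props:
        for key, value in requested_props.items():
            if props.get(key) == value:
                weight += 1
    return weight


def pick_host(requested_props, hosts):
    """Prefer the host whose instances share the most requested properties."""
    return max(
        hosts,
        key=lambda name: image_properties_weight(requested_props, hosts[name]),
    )


# Hypothetical hosts, each with the image properties of its current instances.
hosts = {
    "host-a": [{"os_distro": "ubuntu"}],
    "host-b": [{"os_distro": "centos"}, {"os_distro": "centos"}],
}
chosen = pick_host({"os_distro": "centos"}, hosts)
```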
Sean Mooney d4f40976d0 move nova-ovs-hybrid-plug to deploy with spice and fix qxl default
In CentOS/RHEL 9, qemu support for the qxl model was removed
along with spice support.
In Ubuntu 24.04, qemu support for qxl and spice
has now also been removed.
Debian 12 (bookworm) still supports spice in its qemu package.

When we updated the default video model to virtio for x86, we
left a config-driven special case for spice to default to qxl.
Since that no longer works on CentOS- or Ubuntu-based distros, that
default is not helpful, so this change removes the special case,
making virtio the default for x86 regardless of the console used.

This change also updates the nova-ovs-hybrid-plug job to test with spice
so that we have at least one job using it. To enable that, the job is
moved to Debian.

Closes-Bug: #2097529
Change-Id: I265ad2ced3729bed41bf53c58dcebadb775ce1f7
2025-02-19 17:09:09 +00:00
Zuul 88a36a5a00 Merge "Respect supplied arguments in novncproxy_base_url" 2025-02-19 16:00:27 +00:00
Dan Smith 420050cf33 Add fill_metadata() to InstanceList
This adds a non-remotable method to InstanceList which will batch-fill
system_metadata for all the instances in the list that are
missing it, as efficiently as possible. This does not
require an object bump because no remotable methods or fields are
changed.

Related to blueprint image-metadata-props-weigher

Change-Id: Icc47de2b677b3d212a7f6faa61a85ea9bff9f412
2025-02-19 07:45:20 -08:00
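The batching idea in the commit above can be sketched as follows: collect the instances that lack system_metadata, fetch for all of them in one call, then assign the results. Instances are plain dicts here and `fetch_metadata_for` is a hypothetical callback; this is not the InstanceList implementation.

```python
def fill_metadata(instances, fetch_metadata_for):
    """Fill missing system_metadata with a single batched lookup.

    Returns the number of instances that needed filling.
    """
    missing = [
        inst["uuid"] for inst in instances
        if inst.get("system_metadata") is None
    ]
    if not missing:
        return 0
    # One lookup for every missing instance instead of one query each.
    fetched = fetch_metadata_for(missing)
    for inst in instances:
        if inst.get("system_metadata") is None:
            inst["system_metadata"] = fetched.get(inst["uuid"], {})
    return len(missing)


calls = []


def fake_fetch(uuids):
    """Hypothetical backend that records how often it is called."""
    calls.append(list(uuids))
    return {uuid: {"image_os_distro": "centos"} for uuid in uuids}


instances = [
    {"uuid": "a", "system_metadata": {"image_os_distro": "ubuntu"}},
    {"uuid": "b", "system_metadata": None},
    {"uuid": "c"},
]
filled = fill_metadata(instances, fake_fetch)
```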
Zuul ae87118f98 Merge "Add unit test coverage of get_machine_ips" 2025-02-18 10:48:39 +00:00
Zuul b6ceea8e7c Merge "allow discover host to be enabled in multiple schedulers" 2025-02-18 09:38:48 +00:00
Zuul 12bc65f942 Merge "Disable the heal instance info cache periodic task" 2025-02-17 20:46:07 +00:00
Zuul 707967b4ae Merge "Add support for showing image properties in server show response" 2025-02-17 20:45:59 +00:00
Zuul 5771284c67 Merge "Update InstanceNUMACell version in more cases" 2025-02-17 20:45:49 +00:00
Zuul 25d224aa42 Merge "Update InstanceNUMACell version after data migration" 2025-02-17 20:07:47 +00:00
Zuul 466bac1608 Merge "Reproduce bug/2097359" 2025-02-17 20:07:40 +00:00
Zuul 2a5fac7c61 Merge "Drop dependency on netifaces" 2025-02-17 20:07:30 +00:00
Takashi Kajinami 9b9534c2f2 Add unit test coverage of get_machine_ips
This is a follow-up to change If9268cab8c2b3098d757571c6cab07d13d34a2c2
and adds the missing test coverage of the utility method.

Change-Id: I5e30e351d87475a0b56c3facbbe3c9c858e2783a
2025-02-14 10:48:03 +00:00
Takashi Kajinami 46435daf5c Drop dependency on netifaces
The netifaces library was abandoned and retired. Replace it with psutil,
which is already part of the dependencies.

Note that localhost_supports_ipv6 from openstacksdk should be mocked
now because it uses psutil in recent versions.

Closes-Bug: #2071596
Change-Id: If9268cab8c2b3098d757571c6cab07d13d34a2c2
2025-02-14 19:46:32 +09:00
Zuul 3ddfa2c79f Merge "Bump os-traits to 3.3.0 in requirements" 2025-02-13 14:16:02 +00:00
Balazs Gibizer 9507b7b92f Update InstanceNUMACell version in more cases
The data migration of InstanceNUMACell 1.4 to 1.5 only moved the data to
the new pcpuset field but did not update the ovo version string of the
object in the DB. The previous patch added the missing version update
logic. However, it only fixes the issue if the data is not already "half"
migrated to the new structure. So this patch adds logic to also do the
right thing if the wrong data migration already happened.

In the end, the solution needs to consider multiple scenarios:
* data was never migrated to the new schema, so the new code needs
  to migrate it and update the version string to match the new schema
  (done by the previous patch).
* data is half migrated by the buggy code and the new code needs to
  finish the migration by stamping the version in the DB.
* data is half migrated and then further modified to use the new 1.6
  feature, cpu_policy mixed.
* data version in the DB is older than we can meaningfully upgrade.
Closes-Bug: #2097359
Change-Id: I10ecfa7841b15637dea3e4736e90faa5f33ddff3
2025-02-12 13:55:47 +01:00
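The scenarios listed in the commit above can be sketched on a plain dict that mimics a serialized object. This is an assumption-heavy illustration, not the nova code: the dict layout, the choice to stamp "1.6" (taken from the related reproducer commit, which calls 1.6 the latest version), and leaving too-old data untouched are all guesses made for the example.

```python
import copy

# Assumed stamp target; the reproducer commit calls 1.6 the latest version.
LATEST_VERSION = "1.6"


def _as_tuple(version):
    return tuple(int(part) for part in version.split("."))


def repair_cell(primitive):
    """Return a copy whose data and version string agree with each other."""
    fixed = copy.deepcopy(primitive)
    version = _as_tuple(fixed["nova_object.version"])
    data = fixed["nova_object.data"]
    if version < (1, 4):
        # Older than this sketch can meaningfully upgrade: leave untouched.
        return fixed
    if version > (1, 4):
        # Already stamped with a newer version: nothing to repair.
        return fixed
    if "pcpuset" not in data:
        # Never migrated: move dedicated CPUs to the new field first.
        if data.get("cpu_policy") == "dedicated":
            data["pcpuset"] = data.get("cpuset", [])
            data["cpuset"] = []
        else:
            data["pcpuset"] = []
    # Never migrated or half migrated: stamp the version so it matches the data.
    fixed["nova_object.version"] = LATEST_VERSION
    return fixed


never_migrated = {
    "nova_object.version": "1.4",
    "nova_object.data": {"cpuset": [0, 1], "cpu_policy": "dedicated"},
}
half_migrated = {
    "nova_object.version": "1.4",
    "nova_object.data": {"cpuset": [], "pcpuset": [2, 3],
                         "cpu_policy": "dedicated"},
}
too_old = {"nova_object.version": "1.2", "nova_object.data": {"cpuset": [0]}}
```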
Rajesh Tailor 05c6b6cdbb Add support for showing image properties in server show response
This change adds a new api microversion to add support for
including image properties in ``server show`` and ``server list
--long`` responses as well as in response for ``server rebuild``
instance action.

Implements: blueprint image-properties-in-server-show
Change-Id: Ic135389954e43e6478288c0cdcffd780915cdb40
2025-02-12 10:57:03 +05:30
Zuul 32326d4894 Merge "trivial: Remove legacy API artifact" 2025-02-10 16:46:39 +00:00
Zuul e36ec1b36b Merge "api: Allow min/max_version arguments to response" 2025-02-10 16:46:32 +00:00
René Ribaud c161914a4a Bump os-traits to 3.3.0 in requirements
Change-Id: Icf4aef5175c745b1d6feda309338e86277d79549
2025-02-07 15:56:20 +01:00
Balazs Gibizer 643a6a8a57 Update InstanceNUMACell version after data migration
The data migration of InstanceNUMACell 1.4 to 1.5 only moved the data to
the new pcpuset field but did not update the ovo version string of the
object in the DB. This resulted in 1.6 data with a 1.4 version
string in the DB, which subsequently causes an old compute running the
1.4 ovo version to think it got an old ovo even though the data is
already in the new format. This leads to instance lifecycle errors and,
if the nova-compute saves the instance, to permanent
data loss.

So this change modifies the data migration to also update the ovo version
in the DB so that the version string in the DB matches the schema the
data uses in the DB.

Related-Bug: #2097359
Change-Id: I9a99f10526f8e466ac04b035121b24be70a23dae
2025-02-07 11:41:55 +01:00
Sean Mooney e98393c5c2 allow discover host to be enabled in multiple schedulers
This change allows the nova-scheduler `discover hosts` periodic task
to be enabled on multiple scheduler instances without racing.

This is done by modifying the periodic to do a very simple form
of leader election.

implements: blueprint distributed-host-discovery
Change-Id: I96d74981e1cff8cc1fce4a74c8db3f7d58e20f33
2025-02-06 13:53:21 +00:00
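The commit above does not say which election rule it uses, so the following is only one possible "very simple" scheme, shown as an assumption: every scheduler sorts the list of live scheduler hosts and only the first one runs discovery. All names are hypothetical.

```python
def is_leader(my_host, up_scheduler_hosts):
    """The lexicographically first live scheduler host is the leader."""
    candidates = sorted(up_scheduler_hosts)
    return bool(candidates) and candidates[0] == my_host


def discover_hosts_periodic(my_host, up_scheduler_hosts, discover):
    """Run discovery only on the leader; return whether it ran."""
    if not is_leader(my_host, up_scheduler_hosts):
        return False
    discover()
    return True


runs = []
live = ["sched-2", "sched-1", "sched-3"]
# Every scheduler evaluates the same rule, but only one of them acts.
results = {
    host: discover_hosts_periodic(host, live, lambda: runs.append("ran"))
    for host in live
}
```

Because each scheduler derives the same answer from the same list, no lock is needed as long as they agree on which hosts are up.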
Zuul e27bbe72e0 Merge "Cleanup RBAC jobs in check/gate pipeline" 2025-02-05 20:25:50 +00:00
Sean Mooney b3f8815720 Disable the heal instance info cache periodic task
The _heal_instance_info_cache periodic task predates
the introduction of the server external events API
which is now the canonical way to refresh the cache.

This change updates the default value of
``[compute]heal_instance_info_cache_interval``
to -1, disabling it by default.

The nova-ovs-hybrid-plug job is extended to test the
legacy configuration value, and the config override is removed
from nova-next.

Closes-Bug: #1996094
Related-Bug: #2089225
Change-Id: I33ac91bb4f3ead51af2f7005002d5eb5078540d9
2025-02-05 19:49:01 +00:00
Ghanshyam Mann bb8ee15106 Cleanup RBAC jobs in check/gate pipeline
This commit makes the following changes in gate testing:

- Test tempest-integrated-compute-rbac-old-defaults (Test the
  RBAC old defaults which are deprecated but still supported)
  in periodic weekly pipeline instead of check/gate pipeline.

  Reason: Old defaults are deprecated and it will be rare to
  have any changes in those. These should be ok to run weekly
  to know if anything is broken for old defaults.

- Remove tempest-integrated-compute-enforce-scope-new-defaults
  This tests the new defaults, which are enabled by default and
  tested in every job.

  Reason: We kept this job for cinder because their new defaults
  are not enabled by default. To test the combination of nova's new
  defaults with cinder's new defaults, we have a Tempest job running in
  the tempest gate (tempest-full-enforce-scope-new-defaults).

Change-Id: I3a1634ff71b39c722401dea8e77092228f9cc064
2025-02-04 19:37:17 +00:00
Ghanshyam Mann b8b11c69bd [Trivial] Fix the typo error
Change-Id: I8234bf6ee7973ef246d57c6da7fd95dbf35a2c0f
2025-02-04 11:33:48 -08:00
Balazs Gibizer ae2f9bd573 Reproduce bug/2097359
In Victoria, the InstanceNUMACell ovo got the new pcpuset field and a
related in-place data migration. If the cpu_policy is dedicated, the
content of the cpuset is moved to the new pcpuset field during the
load of the data from the DB and the ovo is persisted back to the DB.
However, during this data migration the version of the ovo is not bumped
to the latest version, 1.6. Therefore the DB contains an inconsistent
object, as it has the new pcpuset field from 1.6 but the
nova_object.version is still set to 1.4. It turned out that, as a result,
an old compute node running ovo 1.4 code will not request a
backlevelling of the ovo even if it is already data migrated to 1.6,
causing data loss from the compute perspective. Also, if the compute
saves the object back to the DB, the data loss becomes persistent.

Related-Bug: #2097359
Change-Id: I76ee9d59abc39e29c54be7217491e911b88a0621
2025-02-04 17:55:23 +01:00
Zuul d267ede98f Merge "api: Allow min/max_version arguments to expected_errors" 2025-01-31 00:43:59 +00:00
Zuul 00c8012acf Merge "ironic: Fix ConflictException when deleting server" 2025-01-30 00:31:34 +00:00
Zuul 03ddf02dba Merge "Add ServersViewBuilderTestV296 unit test class" 2025-01-29 11:29:22 +00:00
Mark Goddard 6ebd8a56d1 ironic: Fix ConflictException when deleting server
When unprovision works via Ironic, all operations in _cleanup_deploy
have already been completed. Prior to this patch, we continued
attempting all the cleanup steps, which eventually errored out with
BadRequest, or similar, and we completed the delete.

However, if cleaning has started, we hit a conflict exception,
so we don't hit the expected error above.

Prior to the move to the SDK, which landed in Caracal,
we would retry on conflict errors. You could tweak the
config to keep retrying for the length of time cleaning
usually takes in your environment.

After this patch:
Ieda5636a5e80ea4af25db2e90be241869759e30c

We now hard fail with this error:

openstack.exceptions.ConflictException:
Failed to detach VIF xxx from bare metal node yyy
...
Node yyy is locked by host zzz,
please retry after the current operation is completed.

This change simply skips calling the operations that
will always error out, avoiding the need to wait for
cleaning to complete before getting the expected
error message.

Closes-Bug: #2019977
Related-Bug: #1815799
Change-Id: I60971b58cf1f24bdb5d40f668e380ebfee2ac495
2025-01-29 09:59:58 +00:00
Zuul 8c953d4d25 Merge "zuul: Add missing context comments for nova-next" 2025-01-28 22:32:50 +00:00
Zuul a49c146504 Merge "Fix typo in release note" 2025-01-28 20:11:38 +00:00
Stephen Finucane 41a8a6ff82 trivial: Remove legacy API artifact
_api_version was always being set to '2.1' so logic based on this wasn't
doing anything. Removing this also highlights a few other places where
we have useless variable assignment and mocking happening.

Change-Id: I4171191624e10513cbf094a3bebb4b1fcba3cc6c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-01-24 10:07:10 +00:00