61477 Commits

Author SHA1 Message Date
Michael Still 183896a79b libvirt: Add objects and notifications for sound model.
This patch adds just the objects and notifications required to
support an extra spec to configure a sound device inside
the guest. This is useful for SPICE consoles using the native
protocol.

Change-Id: I2faeda0fd0fb9c8894d69558a1ccaab8da9f6a1b
Signed-off-by: Michael Still <mikal@stillhq.com>
2025-07-07 14:44:57 +10:00
Zuul 1c03429337 Merge "Replace utils.spawn_n with spawn" 2025-07-04 18:00:25 +00:00
Zuul eff31bbb2d Merge "db: Resolve alembic deprecation warning" 2025-07-04 13:16:23 +00:00
Zuul 7bfcf82846 Merge "Use futurist for _get_default_green_pool()" 2025-07-03 17:11:12 +00:00
Balazs Gibizer 81a03ab824 Replace utils.spawn_n with spawn
As [1] switched over the implementation of spawn and spawn_n to the same
futurist Executor.submit we can now replace all spawn_n usage with spawn
and drop spawn_n from nova.utils.

[1]I3494660e1aaa1db46f9f08494cb5817ec7020cc5

Change-Id: I0027f119c0fbe8d5298307324eaf30c5e9e152d3
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-07-02 15:47:29 +02:00
Balazs Gibizer d90e7726c0 Use futurist for _get_default_green_pool()
Nova uses nova.utils.spawn* to create new threads. This so far relied on
a GreenPool to provide GreenThreads. We change this pool to a
futurist.GreenThreadPoolExecutor to have an interface where the
implementation can be swapped out to futurist.ThreadPoolExecutor to get
native threads instead.

This is an interface change on utils.spawn as it will return
futurist.Future instead of GreenThread. So a couple of fixes are needed
across nova to use:
* .result() instead of .wait()
* .add_done_callback() instead of .link(). Here we needed to change the
  usage as the new callback does not forward args, so we rely on
  closures instead.
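
A minimal sketch of the two call-site changes listed above. It uses the standard library's concurrent.futures.ThreadPoolExecutor as a stand-in, on the assumption that futurist executors follow the same submit()/Future interface. The names spawn, work and make_callback are hypothetical and are not nova's code.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in executor (assumption); nova's real pool is a futurist executor.
_executor = ThreadPoolExecutor(max_workers=2)


def spawn(func, *args, **kwargs):
    """Hypothetical spawn helper that returns a Future."""
    return _executor.submit(func, *args, **kwargs)


def work(x):
    return x * 2


results = []


def make_callback(instance_id):
    # add_done_callback() passes only the future to the callback, so the
    # extra context (instance_id) is captured by a closure instead of
    # being forwarded as arguments.
    def on_done(future):
        results.append((instance_id, future.result()))
    return on_done


future = spawn(work, 21)
future.add_done_callback(make_callback("hypothetical-instance"))
value = future.result()  # takes the place of GreenThread.wait()
_executor.shutdown(wait=True)  # worker has finished, callback has run
```

The callback runs either in the worker thread when the work completes or immediately in the caller if the future is already done, so it should not assume a particular thread.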

This is also an interface and a behavior change for utils.spawn_n as it
now calls utils.spawn internally. This means that on top of the interface
change detailed above there is a behavior change for spawn_n.

spawn creates a GreenThread, a wrapper around greenlet, while spawn_n
created only the underlying greenlet. The greenlet cannot be managed
the same way as a more intelligent GreenThread, including the return
value but not limited to it, e.g. the whole cancellation mechanism
is missing from greenlet too. After this patch spawn_n will also use
GreenThread instead of naked greenlet. We consider the resulting small
performance change negligible.

Also the way we implement SpawnIsSynchronousFixture in our tests is
adapted along with other test fixture adaptation to call / mock the
right functions.

Change-Id: I3494660e1aaa1db46f9f08494cb5817ec7020cc5
Signed-off-by: Balazs Gibizer <gibi@redhat.com>
2025-07-02 15:47:29 +02:00
Zuul 2c19c07d5e Merge "doc: Adding link for RabbitMQ installation during nova deployment on controller node." 2025-07-01 22:33:40 +00:00
Zuul 43d57ae63d Merge "api: Address issues with server diagnostics APIs" 2025-06-27 20:29:16 +00:00
Zuul bb71b953c7 Merge "api: Address issues with remote consoles APIs" 2025-06-27 17:32:18 +00:00
Zuul 7e26a95d2d Merge "Note on RPC error decorators around build_and_run_instance" 2025-06-27 16:42:15 +00:00
Zuul 39793dde08 Merge "Fix neutron client dict grabbing" 2025-06-27 16:41:59 +00:00
Zuul 76ad55da2e Merge "api: Add response body schemas for server diagnostics API" 2025-06-27 15:15:54 +00:00
Stephen Finucane c4f81a54d5 api: Address issues with remote consoles APIs
* Add a note explaining presence of xvpvnc console type
* Make 'url' mandatory in create response
* Remove unnecessary description fields: we will populate these later
* De-duplicate request body schemas
* Re-add references to the rdp console to the api-ref

Change-Id: I5555b8cf7a83fad689e98522850b5550b49566ed
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-06-26 18:49:59 +01:00
Zuul 0d586ccca8 Merge "api: Add response body schemas for server topology API" 2025-06-26 13:31:51 +00:00
Zuul 3cc03f8a91 Merge "api: Add response body schemas for image metadata APIs" 2025-06-26 11:35:06 +00:00
Zuul 448578bf82 Merge "Translate scatter-gather to futurist" 2025-06-26 11:34:20 +00:00
Zuul b3d64d5f8f Merge "api: Add response body schemas for keypairs APIs" 2025-06-26 07:19:56 +00:00
Zuul d748e78486 Merge "api: Add response body schemas for remote consoles" 2025-06-26 02:23:22 +00:00
Balazs Gibizer 1cd1c472bd Note on RPC error decorators around build_and_run_instance
The RPC handler build_and_run_instance has a normal set of error handler
decorators but that call immediately spawns a separate thread and does not
wait for its result so it cannot get any exceptions. Therefore the error
handling should happen inside the thread. And it does. Mostly.
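
A small sketch of why decorators on the RPC handler cannot see errors from the spawned work. It assumes a standard library ThreadPoolExecutor in place of nova's spawn, and record_errors, handler and failing_build are hypothetical stand-ins for the real decorators and functions.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

seen_by_decorator = []


def record_errors(func):
    """Hypothetical stand-in for an error-handling decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            seen_by_decorator.append(exc)
            raise
    return wrapper


def failing_build():
    raise RuntimeError("build failed")


@record_errors
def handler(executor):
    # Returns as soon as the work is submitted. The exception raised by
    # failing_build() is stored on the future and is never raised here.
    return executor.submit(failing_build)


with ThreadPoolExecutor(max_workers=1) as executor:
    future = handler(executor)

# The error is visible only to code that inspects the future.
error_in_thread = future.exception()
```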

The _locked_do_build_and_run_instance is the function run by the thread
and it calls _do_build_and_run_instance. The _do_build_and_run_instance
has `except Exception` handling dropping the active exception and returning
an error value instead. So most of the exceptions are handled by
_do_build_and_run_instance without going through any of the error
decorators.

Whatever exceptions still bubble up from _do_build_and_run_instance
go through the error handling decorators on top of
_do_build_and_run_instance.

The only source of exception outside of _do_build_and_run_instance but
within _locked_do_build_and_run_instance is
delete_allocation_for_instance. An exception from that call does not
go through any error handling decorators.

These error decorators have multiple tasks:
* wrap_exception calls _emit_legacy_exception_notification and
  _emit_versioned_exception_notification so sending error
  notifications
* reverts_task_state sets instance.task_state to None
* wrap_instance_fault records InstanceFault object to the DB

So there are two cases when an error happens but the decorators are not
called:
* _do_build_and_run_instance catches the exception, does its own error
  handling and cleanup and then returns an error value. The task state
  revert and the InstanceFault creation are replicated in the individual
  cleanup sections here.
  Note that notification sending is not missing either as that is
  handled by the _build_and_run_instance called from
  _do_build_and_run_instance.

* delete_allocation_for_instance raises when called from
  _locked_do_build_and_run_instance. We reach this call if
  _do_build_and_run_instance failed already and did its own cleanup. So
  again the task state is already reset, the InstanceFault is already
  recorded for the original fault, and the notification is already sent.

So the only change here is to add a note to build_and_run_instance saying
that the real error handling is deeper in the stack, so that readers will
not think error handling happens at the top of the call stack.

There were two unit test cases that partially relied on the wrong
assumption that exceptions propagate to build_and_run_instance.
They could make that wrong assumption because the
SpawnIsSynchronousFixture simulates the GreenThread interface wrongly.
The goal of that fixture is to make any spawn (and spawn_n) call
synchronous meaning the function passed in is executed right away before
spawn returns. That is OK. But the fixture forgot that if the function
raises then such exception is now raised by spawn. Such a thing never
happens with a normal spawn. The exception (and the function return
value) is only propagated when GreenThread.wait() is called.

So this patch fixes the fixture and corrects the two unit tests as well.
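
A sketch of the behavior described above for a synchronous spawn: the function runs right away, but an exception is held back until wait() is called. SyncResult and sync_spawn are hypothetical names; this is not the actual SpawnIsSynchronousFixture implementation.

```python
class SyncResult:
    """Hypothetical future-like object returned by a synchronous spawn."""

    def __init__(self, func, *args, **kwargs):
        self._value = None
        self._error = None
        try:
            # Run the function immediately, before spawn returns.
            self._value = func(*args, **kwargs)
        except Exception as exc:
            # Keep the exception instead of letting it escape spawn.
            self._error = exc

    def wait(self):
        # Only here does the return value or the exception reach the caller.
        if self._error is not None:
            raise self._error
        return self._value


def sync_spawn(func, *args, **kwargs):
    return SyncResult(func, *args, **kwargs)


def boom():
    raise ValueError("boom")


handle = sync_spawn(boom)  # does not raise
try:
    handle.wait()
    raised_on_wait = False
except ValueError:
    raised_on_wait = True
```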

Change-Id: I440fff6663d0663fb1630dd096f352403982aa37
2025-06-25 17:59:02 +02:00
Balazs Gibizer 7d946c4535 Fix neutron client dict grabbing
Due to a bug in python3.13 [1] the following code leads to a dict being
emptied by the GC even though we hold a reference to the dict.

import gc

class A:

    def __init__(self, client):
        self.__dict__ = client.__dict__
        self.client = client

class B:
    def __init__(self):
        self.test_attr = "foo"

a = A(B())
print(a.__dict__)
print(a.client.__dict__)
gc.collect()
print("##  After gc.collect()")
print(a.__dict__)
print(a.client.__dict__)

 # Output with Python 3.13
{'test_attr': 'foo', 'client': <__main__.B object at 0x73ea355a8590>}
{'test_attr': 'foo', 'client': <__main__.B object at 0x73ea355a8590>}
 ##  After gc.collect()
{'test_attr': 'foo', 'client': <__main__.B object at 0x73ea355a8590>}
{}

 # Output with Python 3.12
{'test_attr': 'foo', 'client': <__main__.B object at 0x79c86f355400>}
{'test_attr': 'foo', 'client': <__main__.B object at 0x79c86f355400>}
 ##  After gc.collect()
{'test_attr': 'foo', 'client': <__main__.B object at 0x79c86f355400>}
{'test_attr': 'foo', 'client': <__main__.B object at 0x79c86f355400>}

Our neutron client has this kind of code and is therefore failing in
python3.13. This patch adds __getattr__ instead of trying to hold a
direct reference to the __dict__. This seems to work around the
problem.
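
A minimal sketch of the __getattr__ approach, assuming a wrapper that simply forwards unknown attributes to the wrapped object. Wrapper and Client are hypothetical names and not nova's neutron client classes.

```python
import gc


class Client:
    """Hypothetical wrapped client."""

    def __init__(self):
        self.test_attr = "foo"


class Wrapper:
    def __init__(self, client):
        # Keep an ordinary reference instead of aliasing client.__dict__.
        self.client = client

    def __getattr__(self, name):
        # __getattr__ is called only when normal lookup on the wrapper
        # fails, so attributes set on the wrapper itself win.
        if name == "client":
            # Avoid infinite recursion if self.client is not set yet.
            raise AttributeError(name)
        return getattr(self.client, name)


wrapper = Wrapper(Client())
gc.collect()
attr_after_gc = wrapper.test_attr
client_dict_after_gc = dict(wrapper.client.__dict__)
```

Since the two objects no longer share a __dict__, attributes set on the wrapper are not mirrored onto the client, which is a behavior difference from the original aliasing.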

Co-Authored-By: Johannes Kulik <johannes.kulik@sap.com>

[1] https://github.com/python/cpython/issues/130327
Closes-Bug: #2103413

Change-Id: I87c9fbb9331135674232c6e77d700966a938b0ac
2025-06-25 10:37:03 +02:00
Zuul 5582ec2e69 Merge "api: Add response body schemas for server IPs APIs" 2025-06-24 16:56:16 +00:00
Zuul 31b9c8ed58 Merge "libvirt: Enable autodeflate and freePageReporting for memballoon" 2025-06-23 15:54:32 +00:00
Zuul 41773f8c65 Merge "Fix disable memballoon device" 2025-06-19 22:51:15 +00:00
Zuul 64ca204c9c Merge "api: Address issues with instance actions API" 2025-06-16 15:05:21 +00:00
Pierre Riteau 1efbbc8d5f Remove Unicode characters
Change-Id: I1b01b42efceb5430b3eda54531eb843c0c73ab68
2025-06-13 23:53:55 +02:00
Balazs Gibizer 2275b8545e Translate scatter-gather to futurist
This rewrites the core logic of scatter-gather to use
the futurist lib that provides a consistent API to use either Green
or native ThreadPoolExecutor. This way the scatter-gather code
can be made independent from the actual concurrency backend.

In this change we also create a separate executor for the scatter-gather
calls to better control it.
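
A rough sketch of a scatter-gather helper written against an executor interface. It uses concurrent.futures from the standard library as a stand-in, assuming futurist executors accept the same submit() calls. The function names, the cell names and the choice to return exceptions as values are illustrative assumptions, not nova's implementation.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def scatter_gather(executor, cells, func, timeout=5):
    """Run func(cell) for every cell and collect one result per cell."""
    futures = {executor.submit(func, cell): cell for cell in cells}
    done, not_done = wait(futures, timeout=timeout)
    results = {}
    for future in done:
        exc = future.exception()
        # Store the exception itself so one failing cell does not hide
        # the results of the others.
        results[futures[future]] = exc if exc is not None else future.result()
    for future in not_done:
        results[futures[future]] = TimeoutError("cell did not respond")
    return results


def count_instances(cell):
    # Hypothetical per-cell query.
    if cell == "cell2":
        raise RuntimeError("cell2 is down")
    return len(cell)


with ThreadPoolExecutor(max_workers=4) as pool:
    results = scatter_gather(
        pool, ["cell0", "cell1", "cell2"], count_instances)
```

Because the helper only needs submit(), the same code could run on a green or a native thread pool, which is the independence the commit message describes.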

A new config option [default]cell_worker_thread_pool_size is added
to define the number of threads in this ThreadPoolExecutor. For the
GreenThreadPoolExecutor case this config is ignored as only real
threads are expensive, green threads are not.

As oslo.service uses os.fork(), and fork copies the parent process state
to the children processes we need to make sure that the children
processes get a proper, fresh executor state, so we destroy the
executor just before the fork.

The fixtures.SynchronousThreadPoolExecutorFixture is deleted. It was
globally mocking futurist.GreenThreadPoolExecutor which would now
mock more than we want. There was only one usage of it in the
ComputeManager testing where it is replaced with a direct change of the
manager internal executor in the test setup to get the same, but
localized, result.

Note that testing the threading mode is done in nova-next job
in I36c68740fae3e3a9bd3286a1b66d86fd3341aff5, pool statistics
reporting will be added by Id4244f5ae0fd49c99af2898789cdd510859e150d,
and documentation about our threading tunables is in
I003177de3a9f69c71c19eb8eaa7232785e03e669.

Change-Id: Ibff6c73ad9af911a42204e53fee31ed5537c829d
2025-06-13 12:15:56 +02:00
Zuul d75507e679 Merge "api: Address issues with hypervisors APIs" 2025-06-12 14:16:30 +00:00
Zuul 07e573ab4f Merge "api: Add response body schemas for hypervisors APIs (3/3)" 2025-06-12 13:57:11 +00:00
Zuul 20ef126cdb Merge "Cache [pci]alias parsing" 2025-06-12 03:31:23 +00:00
Zuul 856d2ddca3 Merge "Validate [pci]alias at service startup" 2025-06-12 03:31:11 +00:00
Zuul 3e11280522 Merge "Validated that PCI alias has proper ids" 2025-06-12 03:28:54 +00:00
Zuul 45623879d9 Merge "Multiple spec per PCI alias limitation" 2025-06-12 03:28:17 +00:00
Zuul 03a828720f Merge "Return HTTP400 for multi spec pci alias if PCI in Placement" 2025-06-12 03:28:05 +00:00
Zuul c11f807ae7 Merge "reorder and extend pre-commit hooks" 2025-06-12 03:13:34 +00:00
Zuul d66a0e2220 Merge "Allow autopep8 to fix more things" 2025-06-12 03:13:22 +00:00
Zuul c127d87f02 Merge "Remove unused config options" 2025-06-11 22:43:02 +00:00
Zuul dc03416d7c Merge "Add functional reproducer for bug 2102038" 2025-06-11 14:32:34 +00:00
Balazs Gibizer 0065bb6cd4 Cache [pci]alias parsing
For each lifecycle operation nova reloads, parses, and validates the
[pci]alias config option. This is wasteful. So this patch adds
functools.cache decorator on the function used to do this work.
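
A minimal illustration of caching a parse step with functools.cache. The parse_aliases function and its tuple-of-strings argument are assumptions made for the example; the real nova function and its signature may differ.

```python
import functools
import json

parse_calls = 0


@functools.cache
def parse_aliases(raw_aliases):
    """Parse a tuple of JSON alias strings; the tuple is the cache key."""
    global parse_calls
    parse_calls += 1
    return tuple(json.loads(raw)["name"] for raw in raw_aliases)


raw = ('{"name": "vf1", "vendor_id": "8086", "product_id": "10ca"}',)
first = parse_aliases(raw)
second = parse_aliases(raw)  # served from the cache, no second parse
```

functools.cache needs hashable arguments, hence the tuple, and callers share the cached return value, so it is safest to return something immutable.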

Change-Id: If2ffb25430749a22c923c0938221833e7b883873
2025-06-11 07:23:01 -07:00
Balazs Gibizer ae064caf16 Validate [pci]alias at service startup
Both nova-api and nova-compute depend on the [pci]alias configuration.
These services loaded and validated the config lazily when it was
needed. This can cause late and hard-to-troubleshoot failures during instance
lifecycle operations due to simple config errors.

So this patch adds an early load of this config to nova-api and
nova-compute.

Related-Bug: #2102038
Related-Bug: #2111440
Change-Id: I5d5dc912ca24979067984c7cb53ceaded7daf236
2025-06-11 07:23:01 -07:00
Balazs Gibizer acc6221660 Validated that PCI alias has proper ids
Either the vendor_id and product_id need to be set or the
resource_class needs to be set in each alias. This is now validated when
the alias is parsed to avoid late failure during placement
allocation_candidates query.
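
The rule above can be sketched as a small check on a parsed alias dict. validate_alias is a hypothetical helper, and the resource class value is made up for the example; nova's actual validation code and error type may differ.

```python
def validate_alias(alias):
    """Require vendor_id plus product_id, or resource_class."""
    has_ids = "vendor_id" in alias and "product_id" in alias
    has_resource_class = "resource_class" in alias
    if not (has_ids or has_resource_class):
        raise ValueError(
            "alias %r needs vendor_id and product_id, or resource_class"
            % alias.get("name")
        )
    return alias


ok_by_ids = validate_alias(
    {"name": "vf1", "vendor_id": "8086", "product_id": "10ca"})
ok_by_class = validate_alias(
    {"name": "gpu", "resource_class": "CUSTOM_GPU"})

try:
    # vendor_id alone is not enough.
    validate_alias({"name": "broken", "vendor_id": "8086"})
    rejected = False
except ValueError:
    rejected = True
```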

Closes-Bug: #2111440
Change-Id: I7fd43b3d6faac8c4098b0983e8adc596414823a1
2025-06-11 07:23:01 -07:00
Balazs Gibizer c3f392dd8e Multiple spec per PCI alias limitation
Document the limitation of the PCI in Placement feature that it
does not support [pci]alias configuration where the name of the alias is
repeated. E.g.

[pci]
alias = { "name": "vf1", "product_id":"10ca", "vendor_id":"8086", "device_type":"type-VF"}
alias = { "name": "vf1", "product_id":"f000", "vendor_id":"8086", "device_type":"type-VF"}

This would mean the alias vf1 can be fulfilled from devices with product
id 10ca OR f000. However this OR relationship cannot be encoded to a
single Placement allocation query as Placement does not support
requesting alternative resource classes for a request[2].

This limitation was encoded in the original PCI in Placement
implementation[1] but we failed to mention it in the doc.
This is now fixed.

[1]https://github.com/openstack/nova/blob/0d484ce37d86e989c8abdf57aec5e334f68206ef/nova/objects/request_spec.py#L504-L528
[2]https://docs.openstack.org/api-ref/placement/#list-allocation-candidates

Related-Bug: #2102038
Change-Id: I9dd78b1498f870a4e4c3f26c23d42d105aec0350
2025-06-11 07:23:00 -07:00
Balazs Gibizer 0bfac5c7fe Return HTTP400 for multi spec pci alias if PCI in Placement
PCI in Placement never supported PCI aliases with multiple specs, i.e.
when an alias name is used in multiple alias definitions. The code
raised ValueError late and without a proper error message. Now
PciInvalidAlias with a descriptive message is raised instead.
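
A sketch of detecting an alias name that appears in more than one definition and failing with a descriptive message. InvalidAliasError is a hypothetical stand-in for nova's PciInvalidAlias, and reject_multi_spec_aliases is an illustrative helper, not the real code path.

```python
import collections
import json


class InvalidAliasError(Exception):
    """Hypothetical stand-in for nova's PciInvalidAlias."""


def reject_multi_spec_aliases(raw_aliases):
    """Raise if any alias name is defined more than once."""
    counts = collections.Counter(
        json.loads(raw)["name"] for raw in raw_aliases)
    repeated = sorted(name for name, n in counts.items() if n > 1)
    if repeated:
        raise InvalidAliasError(
            "PCI in Placement does not support multiple specs per alias; "
            "repeated alias names: %s" % ", ".join(repeated)
        )


aliases = [
    '{"name": "vf1", "product_id": "10ca", "vendor_id": "8086"}',
    '{"name": "vf1", "product_id": "f000", "vendor_id": "8086"}',
]

try:
    reject_multi_spec_aliases(aliases)
    message = None
except InvalidAliasError as exc:
    message = str(exc)
```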

Closes-Bug: #2102038
Change-Id: Id1407a37dc5ddad22d8dbf7d589ed998ffc2804e
2025-06-11 07:22:05 -07:00
Stephen Finucane a0af4648b5 api: Address issues with hypervisors APIs
* Address an off-by-one error: the cpu_info field was modified in v2.28,
  not v2.27.
* Correct the api-ref to indicate that the 'servers' field is not
  actually required and will be missing if '?with_servers=false', while
  the 'name' and 'uuid' fields of servers entries *are* required.
* Clarify a comment about the above in the schemas.
* Uncouple the '_hypervisor_response' and '_hypervisor_detail_response'
  helper schemas. The minor increase in lines of code is worth it for
  the decrease in complexity.
* Add the 'host_ip', 'hypervisor_type', and 'hypervisor_version' fields
  to the list of required fields for "detail"-style responses (show and
  detailed list).
* Make the 'current_workload', 'disk_available_least', 'free_disk_gb',
  'free_ram_mb', 'host_ip' and 'running_vms' fields of the hypervisor
  "detail"-style responses nullable, and the 'current_workload',
  'disk_available_least', 'free_disk_gb', 'free_ram_mb' and
  'running_vms' fields of the deprecated statistics API nullable.

Change-Id: Ibe55b44e65fe17141c63cceae8a003816ffe4f23
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2025-06-11 13:07:54 +01:00
Sean Mooney cd401c5c1b libvirt: Enable autodeflate and freePageReporting for memballoon
The libvirt driver now automatically enables autodeflate and
freePageReporting attributes for virtio memory balloon devices.
The autodeflate feature allows the QEMU virtio memory balloon
to release memory before the Out of Memory killer activates.
The freePageReporting feature enables returning unused pages
back to the hypervisor for use by other guests or processes,
improving overall memory efficiency on compute hosts.

These features are always enabled when a memballoon device is
configured, requiring no additional configuration from operators.
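
For orientation, the resulting guest device element might look roughly like the one built below. This is a sketch that assumes the libvirt domain XML attribute names match the feature names used in the message above; it uses xml.etree.ElementTree and not nova's own config classes.

```python
import xml.etree.ElementTree as ET

# Build a virtio memballoon element with both features switched on.
memballoon = ET.Element(
    "memballoon",
    {"model": "virtio", "autodeflate": "on", "freePageReporting": "on"},
)
xml_text = ET.tostring(memballoon, encoding="unicode")
```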

implements: blueprint automatic-memballoon-freeing
Generated-By: claude-code
Change-Id: If47a6d38cd311b08b78acffb307a99a7a2a080a1
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-06-11 11:13:11 +00:00
Stephen Finucane 3b08d60dc4 api: Address issues with server diagnostics APIs
Add some missing additionalProperties=False entries.

Change-Id: I4477dcb590392c189a2bd586ecd9ba4ccd35d89e
2025-06-11 10:17:10 +01:00
Zuul 2270c4ac94 Merge "api: Add response body schemas for hypervisors APIs (2/3)" 2025-06-11 08:17:05 +00:00
Zuul 3999899f75 Merge "api: Add response body schemas for hypervisors APIs (1/3)" 2025-06-11 03:36:38 +00:00
Zuul 03edd00bfe Merge "api: Add response body schemas for instance actions" 2025-06-10 22:47:33 +00:00
Zuul c2e5f8c4c7 Merge "api: Add response body schemas for hosts APIs" 2025-06-10 22:39:21 +00:00
Zuul 517415b6cb Merge "update pre-commit version pins" 2025-06-10 16:23:41 +00:00