The minimum required version of QEMU is now 5.2.0, and of libvirt is
7.0.0.
The following version constants are removed:
- MIN_LIBVIRT_VIOMMU_AW_BITS
- MIN_LIBVIRT_VDPA
- MIN_QEMU_VDPA
- MIN_LIBVIRT_AARCH64_CPU_COMPARE
Adjust the related unit tests accordingly.
Change-Id: Ie805eb7fa59f9f7728da27fddbd6e968e971a2e4
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Despite having a NumInstancesFilter, we lack a weigher that would rank
hosts based on their instance usage.
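A minimal sketch of what such a weigher could look like, assuming
nova's usual BaseHostWeigher interface (the multiplier option name is
an illustration, not necessarily the final one):

    from nova import conf
    from nova.scheduler import weights

    CONF = conf.CONF

    class NumInstancesWeigher(weights.BaseHostWeigher):
        def weight_multiplier(self, host_state):
            # Assumed option name; a negative multiplier turns "more
            # instances" into a spreading behaviour.
            return CONF.filter_scheduler.num_instances_weight_multiplier

        def _weigh_object(self, host_state, weight_properties):
            # Raw weight: the number of instances on the host.
            return host_state.num_instances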
Change-Id: Id232c2caf29d3443c61c0329d573a34a7481fd57
Implements-Blueprint: bp/num-instances-weigher
This change removes the AZ filter and always enables the placement
pre-filter. As part of this removal, the config option that controlled
enabling the pre-filter is removed, as the pre-filter is now mandatory.
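For reference, deployments that previously opted in explicitly no
longer need to; the pre-filter used to be toggled with something like
this (option name as we recall it existed before this change):

    [scheduler]
    query_placement_for_availability_zone = True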
The AZ filter docs and tests are also removed and an upgrade
release note is added.
Depends-On: https://review.opendev.org/c/openstack/devstack/+/886972
Change-Id: Icc8580835beb2b4d40341f81c25eb1f024e70ade
QEMU >= 5.0.0 bumped the default tb-cache size from 32MiB to 1GiB,
which made it difficult to run multiple guest VMs on systems with
lower memory. With libvirt >= 8.0.0 it is possible to configure a
lower tb-cache size.
The config option below is introduced to allow configuring the TB
cache size per the environment's needs; it only applies to
'virt_type=qemu':
[libvirt]tb_cache_size
Also enable this flag in nova-next job.
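A hedged example for a memory-constrained host (the value is
illustrative; the option takes a size in MiB as we understand it):

    [libvirt]
    virt_type = qemu
    tb_cache_size = 128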
[1] https://github.com/qemu/qemu/commit/600e17b26
[2] https://gitlab.com/libvirt/libvirt/-/commit/58bf03f85
Closes-Bug: #1949606
Implements: blueprint libvirt-tb-cache-size
Change-Id: I49d2276ff3d3cc5d560a1bd96f13408e798b256a
We add a new, specific policy for when a host value is provided for
cold-migrate; by default it is an admin-only rule, in order not to
change the behaviour.
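Operators who do want to open it up can override the rule in
policy.yaml; a hedged example, assuming the new rule follows the
existing os-migrate-server naming:

    "os_compute_api:os-migrate-server:migrate:host": "rule:admin_api"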
Change-Id: I128242d5f689fdd08d74b1dcba861177174753ff
Implements: blueprint cold-migrate-to-host-policy
Previously, we archived deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limited how high a value of max_rows could be specified by the caller
because of the size of the database transaction it could generate.
For example, in a large scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size or timeout due to a database deadlock, forcing the operator
to use a much lower max_rows value like 100 or 50.
And when the operator has e.g. 500,000 deleted instance rows (and
millions of deleted rows in total) to archive, being forced to use a
max_rows value several orders of magnitude lower than the number of
rows they need to archive made for a poor user experience.
This changes the logic to archive one parent row and its foreign-key
related child rows at a time, each in its own database transaction,
stopping per table once the number of archived rows reaches
max_rows. Doing this will allow operators to choose more predictable
values for max_rows and get more progress per invocation of
archive_deleted_rows.
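A self-contained toy of the new loop (illustrative only; nova's real
code lives in its DB API layer and differs in detail):

    from collections import defaultdict

    def archive_deleted_rows(tables, children_of, max_rows):
        # tables: {name: [row dicts]}; children_of: {name: [(child, fk)]}
        archived = defaultdict(int)
        for table, rows in tables.items():
            while archived[table] < max_rows:
                parent = next((r for r in rows if r.get("deleted")), None)
                if parent is None:
                    break
                # One parent + its FK children per transaction keeps
                # every transaction small, however large max_rows is.
                for child, fk in children_of.get(table, []):
                    kids = [r for r in tables[child] if r[fk] == parent["id"]]
                    for k in kids:
                        tables[child].remove(k)
                    archived[child] += len(kids)
                rows.remove(parent)
                archived[table] += 1
        return dict(archived)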
Closes-Bug: #2024258
Change-Id: I2209bf1b3320901cf603ec39163cf923b25b0359
In Icb913ed9be8d508de35e755a9c650ba25e45aca2 we forgot to add a privsep
decorator for the set_offline() method.
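The fix is essentially this shape (a sketch only: sys_admin_pctxt is
nova's usual privsep context for such writes, but the method body here
is illustrative):

    import nova.privsep

    @nova.privsep.sys_admin_pctxt.entrypoint
    def set_offline(core_id):
        # Writing to /sys/devices/system/cpu/cpuN/online requires
        # root, hence the privsep entrypoint.
        with open(f'/sys/devices/system/cpu/cpu{core_id}/online', 'w') as f:
            f.write('0')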
Change-Id: I769d35907ab9466fe65b942295fd7567a757087a
Closes-Bug: #2022955
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.
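That is, usage counting from placement is active with either of these
settings (names as above):

    [quota]
    count_usage_from_placement = True
    # or, alternatively:
    driver = nova.quota.UnifiedLimitsDriver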
This means, however, that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.
This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
Closes-Bug: #2003991
Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
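In call-pattern terms the change amounts to roughly this (os-brick's
disconnect_volume() accepts a force kwarg; the surrounding names are
illustrative):

    # Instance is being deleted: ensure the device goes away even if
    # the flush fails; os-brick still attempts a graceful flush first.
    connector.disconnect_volume(connection_info['data'], device_info,
                                force=True)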
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
Add a file to the reno documentation build to show release notes for
stable/2023.1.
Use the pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.
Sem-Ver: feature
Change-Id: I96511e7f86a9f7be9f65e0133c8b38dade57801a
This is the first time we use the correct release naming [1].
The bullet points are shamelessly copied from the cycle highlights [2].
[1] https://governance.openstack.org/tc/reference/release-naming.html
[2] I02ed58bb5a4ecdc8171d9aa4a150be1bca214528
Change-Id: Id4e2e672e3a1a5aba7e664ba2d2f701b9be988e0
With this patch, we now automatically power cores down or up when an
instance is stopped or started.
Also, by default, we now powersave or offline dedicated cores when
starting the compute service.
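A hedged example of opting in, assuming the option names introduced by
this blueprint's series:

    [libvirt]
    cpu_power_management = True
    # 'cpu_state' offlines idle dedicated cores; 'governor' switches
    # them to a powersave governor instead.
    cpu_power_management_strategy = cpu_state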
Implements: blueprint libvirt-cpu-state-mgmt
Change-Id: Id645fd1ba909683af903f3b8f11c7f06db3401cb
sqlalchemy-migrate does not (and will not) support sqlalchemy 2.0. We
need to drop these migrations to ensure we can upgrade our sqlalchemy
version.
Change-Id: I7756e393b78296fb8dbf3ca69c759d75b816376d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Starting with microversion 2.95, any evacuated instance will be
stopped at the destination.
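Illustratively, the request itself is unchanged; only the resulting
state differs (host value is an example):

    POST /servers/{server_id}/action  (microversion >= 2.95)
    {"evacuate": {"host": "dest-compute"}}
    # the instance ends up stopped on the destination, not ACTIVE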
Implements: bp/allowing-target-state-for-evacuate
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I141b6f057cc4eb9c541c2bc6eddae27270ede08d
Except on ironic, if we are able to determine that our locally-
configured node has a hostname other than what we expect, we should
abort startup. Since we currently depend on the loose name-based
association of nodes, services, and instances, we need to make sure
we do not start up in an inconsistent configuration.
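Conceptually, the startup guard is shaped like this (a sketch; the
real check lives in the compute manager and uses nova's own exception
types):

    def assert_consistent_node(node_name, expected, is_ironic):
        # Hypothetical helper: refuse to start if the detected node
        # name no longer matches what this service expects.
        if not is_ironic and node_name != expected:
            raise RuntimeError(
                f'node {node_name!r} != expected {expected!r}; '
                'refusing to start in an inconsistent configuration')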
Related to blueprint stable-compute-uuid
Change-Id: I595b27a57516cffe1f172cf2fb736e1b11373a1d
In some cases on Arista VXLAN fabrics, VMs are inaccessible via network
after live migration, despite garps being observed on the fabric itself.
This patch builds on the
``[workarounds]/enable_qemu_monitor_announce_self`` feature reported
in `bug 1815989 <https://bugs.launchpad.net/nova/+bug/1815989>`_.
This patch adds the ability to configure the number of times the QEMU
announce_self monitor command is called, and adds a new configuration
option to specify a delay between the repeated announce_self calls,
since in some cases multiple announce_self monitor commands are
required before the fabric honors the garp packets and the VM becomes
accessible via the network after live migration.
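A hedged example of the resulting knobs (the two new option names are
assumptions for illustration, modelled on the existing one):

    [workarounds]
    enable_qemu_monitor_announce_self = True
    qemu_monitor_announce_self_count = 3
    qemu_monitor_announce_self_interval = 1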
Closes-Bug: #1996995
Change-Id: I2f5bf7c9de621bb1dc7fae5b3374629a4fcc1f46
The older libvirt API compareCPU() does not take into account the
capabilities of the "host hypervisor" (KVM, QEMU and the details libvirt
knows about the host) when comparing CPUs. This is causing unwanted
failures during CPU compatibility checks. To fix this and other related
problems, libvirt has introduced (in v4.4.0) a newer API,
compareHypervisorCPU(), which _does_ take into account the host
hypervisor's capabilities before comparing CPUs. This will help fix a
range of problems, such as [1][2].
So let's switch to the newer API, which is largely a drop-in
replacement.
In this patch:
- Introduce a wrapper method, compare_hypervisor_cpu() for libvirt's
compareHypervisorCPU() API.
- Update the _compare_cpu() method to use the wrapper method,
compare_hypervisor_cpu().
- Update the unit tests to use the newer API, compareHypervisorCPU().
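The wrapper is roughly this shape (a sketch; libvirt-python exposes
compareHypervisorCPU() on the connection object since libvirt 4.4.0,
and the surrounding nova plumbing here is illustrative):

    def compare_hypervisor_cpu(self, xml_desc, flags=0):
        # Unlike compareCPU(), this asks libvirt to also consider what
        # the host hypervisor (QEMU/KVM) can actually do.
        return self.get_connection().compareHypervisorCPU(
            None, None, None, None, xml_desc, flags)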
[1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2138381
Change-Id: Ib523753f52993cfe72e35e0309e429ca879c125c
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Extend microversion 2.90 to allow FQDNs in the hostname parameter.
Multi-create with --hostname continues to be refused, returning error
400 to the user. This simplifies the code by not needing to handle any
sort of suffix or prefix mangling of the FQDN to handle multiple
instances. No other changes are made - not Neutron integration,
metadata API changes (the FQDN will appear as-is in places where the
hostname currently appears), etc.
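Illustratively (body abridged, values are examples):

    POST /servers  (microversion >= 2.90)
    {"server": {"name": "vm1", "hostname": "vm1.example.com", ...}}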
Change-Id: I47e397dc6da8263762479cc8ae4f8777a6d9d811
Implements: bp/fqdn-in-hostname
This patch adds the following SPICE-related options to the 'spice'
configuration group of a Nova configuration:
- image_compression
- jpeg_compression
- zlib_compression
- playback_compression
- streaming_mode
These configuration options can be used to enable and set the SPICE
compression settings for libvirt (QEMU/KVM) provisioned instances.
Each configuration option is optional and can be set explicitly to
configure the associated SPICE compression setting for libvirt. If
none of the configuration options are set, then none of the SPICE
compression settings will be configured for libvirt, which corresponds
to the behavior before this change. In this case, the built-in
defaults from the libvirt backend (e.g. QEMU) are used.
Note that these options are only taken into account if SPICE support
is enabled (and VNC support is disabled).
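A hedged example setting all of them (values are examples of accepted
SPICE settings, not recommendations):

    [spice]
    enabled = True
    image_compression = auto_glz
    jpeg_compression = auto
    zlib_compression = auto
    playback_compression = True
    streaming_mode = filter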
Implements: blueprint nova-support-spice-compression-algorithm
Change-Id: Ia7efeb1b1a04504721e1a5bdd1b5fa7a87cdb810
As discussed in PTG, we need to test the new RBAC in the
integrated gate and accordingly enable the new defaults
and scope check by default. A new integrated testing job
has been added and results show that the new defaults and
scope checks are working fine. During testing, we found a
few bugs in neutron policies but all are fixed now.
enforce_scope and enforce_new_defaults are oslo.policy config options,
but they are per-service and their default values can be overridden.
oslo.policy 3.11.0 allows overriding the default value for these
config options [1], so the oslo.policy version in requirements.txt is
upgraded accordingly.
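Concretely, the defaults being flipped are the standard oslo.policy
knobs:

    [oslo_policy]
    enforce_new_defaults = True
    enforce_scope = True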
Depends-On: https://review.opendev.org/c/openstack/devstack/+/869781
Depends-On: https://review.opendev.org/c/openstack/placement/+/869525
[1] https://github.com/openstack/oslo.policy/blob/3.11.0/oslo_policy/opts.py#L125
Change-Id: I977b2daedf880229c8d364ca011f2ea965b86e3a
A new configuration option [filter_scheduler]pci_in_placement is added
that allows enabling the scheduler logic for PCI device handling in
Placement for flavor based PCI requests.
blueprint: pci-device-tracking-in-placement
Change-Id: I5ddf6d3cdc7e05cc4914b9b1e762fa02a5c7c550
Unlike uwsgi, apache mod_wsgi does not support passing
commandline arguments to the python wsgi script it invokes.
As a result, while you can pass --config-file when hosting the
api and metadata wsgi applications with uwsgi, there is no
way to use multiple config files with mod_wsgi.
This change mirrors how this is supported in keystone today
by introducing a new OS_NOVA_CONFIG_FILES env var to allow
operators to optionally pass a ';'-delimited list of config
files to load.
This change also adds docs for this env var and the existing
undocumented OS_NOVA_CONFIG_DIR.
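For example (file names illustrative; assuming the keystone-style
behaviour mirrored here, entries resolve relative to
OS_NOVA_CONFIG_DIR):

    export OS_NOVA_CONFIG_FILES="nova.conf;nova-api.conf"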
Closes-Bug: 1994056
Change-Id: I8e3ccd75cbb7f2e132b403cb38022787c2c0a37b
When offloading a shelved instance, the compute needs to remove the
binding so the port will appear as "unbound" in neutron.
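In neutron API terms this is the port-bindings removal, roughly:

    DELETE /v2.0/ports/{port_id}/bindings/{host_id}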
Closes-Bug: 1983471
Change-Id: Ia49271b126870c7936c84527a4c39ab96b6c5ea7
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
For networks whose subnets have the DHCP service enabled, don't
provide an mtu value in the metadata. That way cloud-init will not
configure it "statically" in e.g. netplan's config file, and the guest
OS will use the MTU value provided by the DHCP service.
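In network_data.json terms, the link entry simply omits the key
(abridged, illustrative values):

    {"links": [{"id": "tap1a2b3c", "type": "ovs",
                "ethernet_mac_address": "fa:16:3e:aa:bb:cc"}]}
    # no "mtu" key when the subnet has DHCP enabled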
Closes-Bug: #1899487
Change-Id: Ib775c2210349b72b3dc033554ac6d8b35b8d2d79
Currently, when you delete an ironic instance, we trigger
an undeploy in ironic and we release our allocation in placement.
We do this well before the ironic node is actually available.
We have attempted to fix this by marking unavailable nodes
as reserved in placement. This works great until you try
to re-image lots of nodes.
It turns out that ironic nodes waiting for their automatic
clean to finish are returned as valid allocation candidates
for quite some time. Eventually we mark them as reserved.
This patch takes a strange approach: if we mark all nodes as
reserved as soon as the instance lands, we close the race.
That is, when the allocation is removed the node is still
unavailable until the next update of placement is done and
notices that the node has become available. That may or may
not have been after automatic cleaning. The trade-off is
that when you don't have automatic cleaning, we wait a bit
longer to notice the node is available again.
Note that this is also useful when a broken Ironic node is
marked as in maintenance while it is in use by a nova
instance. In a similar way, we mark the node as reserved
immediately, rather than first waiting for the instance to be
deleted before reserving the resources in Placement.
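In placement terms, "reserved" means the node's single unit of
bare-metal inventory is marked reserved, e.g. (illustrative; real
updates also carry the resource provider generation):

    PUT /resource_providers/{node_uuid}/inventories/{resource_class}
    {"total": 1, "reserved": 1}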
Closes-Bug: #1974070
Change-Id: Iab92124b5776a799c7f90d07281d28fcf191c8fe