Commit Graph

2508 Commits

Author SHA1 Message Date
Zuul 767252ca97 Merge "Add upgrade check for compute-object-ids linkage" 2023-08-16 20:39:59 +00:00
Dan Smith 27f384b7ac Add upgrade check for compute-object-ids linkage
Related to blueprint compute-object-ids

Change-Id: I6e837b9086fe20a9135712bca2d711843d39739a
2023-07-26 10:23:50 -07:00
Kashyap Chamarthy b6cf8e6128 Bump MIN_{LIBVIRT,QEMU} for "Bobcat"
The minimum required version of QEMU is now 5.2.0, and of libvirt is
7.0.0.

The below version constants get removed:

  - MIN_LIBVIRT_VIOMMU_AW_BITS
  - MIN_LIBVIRT_VDPA
  - MIN_QEMU_VDPA
  - MIN_LIBVIRT_AARCH64_CPU_COMPARE

Adjust the related unit tests accordingly.

Change-Id: Ie805eb7fa59f9f7728da27fddbd6e968e971a2e4
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
2023-07-24 13:39:42 +02:00
Zuul 7e25b672ef Merge "Add a new policy for cold-migrate with host" 2023-07-21 16:52:51 +00:00
Sylvain Bauza ca3fbb4d15 Add a new NumInstancesWeigher
Although we already have a NumInstancesFilter, we lack a weigher that would
classify hosts based on their instance usage.

Change-Id: Id232c2caf29d3443c61c0329d573a34a7481fd57
Implements-Blueprint: bp/num-instances-weigher
2023-07-21 15:50:57 +02:00
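As an illustration only (not Nova's actual implementation), a weigher of this kind might normalize each host's instance count and scale it by a configurable multiplier, where a negative multiplier spreads instances across hosts and a positive one packs them:

```python
def weigh_by_num_instances(hosts, multiplier=1.0):
    """Normalize each host's instance count to [0, 1] and scale it.

    A positive multiplier packs (prefers hosts with more instances);
    a negative one spreads. Names here are illustrative, not Nova's.
    """
    counts = [h["num_instances"] for h in hosts]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return [multiplier * (c - lo) / span for c in counts]

hosts = [{"name": "cmp1", "num_instances": 0},
         {"name": "cmp2", "num_instances": 4}]
# A negative multiplier gives the emptier host the higher weight.
print(weigh_by_num_instances(hosts, multiplier=-1.0))  # [0.0, -1.0]
```

Nova's real weighers normalize against the other weighers in the pipeline; this sketch only shows the per-weigher scoring idea.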
Sean Mooney 5edd805fe2 Remove deprecated AZ filter.
This change removes the AZ filter and always enables
the placement pre-filter. As part of this removal,
the config option controlling whether the pre-filter
is enabled is removed, as the pre-filter is now mandatory.

The AZ filter docs and tests are also removed and an upgrade
release note is added.

Depends-On: https://review.opendev.org/c/openstack/devstack/+/886972
Change-Id: Icc8580835beb2b4d40341f81c25eb1f024e70ade
2023-07-17 12:22:22 +01:00
yatinkarel 3f7cc63d94 Add config option to configure TB cache size
QEMU >= 5.0.0 bumped the default tb-cache size from 32 MiB to 1 GiB,
which made it difficult to run multiple guest VMs on systems
with lower memory. With libvirt >= 8.0.0 it is possible to
configure a lower tb-cache size.

The config option below is introduced to allow configuring the
TB cache size to suit the environment's needs; it only
applies to 'virt_type=qemu':

[libvirt]tb_cache_size

Also enable this flag in nova-next job.

[1] https://github.com/qemu/qemu/commit/600e17b26
[2] https://gitlab.com/libvirt/libvirt/-/commit/58bf03f85

Closes-Bug: #1949606
Implements: blueprint libvirt-tb-cache-size
Change-Id: I49d2276ff3d3cc5d560a1bd96f13408e798b256a
2023-07-13 19:35:52 +05:30
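As a sketch, the resulting nova.conf fragment might look like this (the 32 MiB value is illustrative only, restoring the pre-QEMU-5.0 default; the option takes effect only with virt_type=qemu and libvirt >= 8.0.0):

```ini
[libvirt]
virt_type = qemu
# Illustrative value in MiB; pick a size that fits the host's memory.
tb_cache_size = 32
```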
Sylvain Bauza 2d320f9b00 Add a new policy for cold-migrate with host
We add a new specific policy that applies when a host value is provided for
cold-migrate; by default it is an admin-only rule, in order not to change
the behaviour.

Change-Id: I128242d5f689fdd08d74b1dcba861177174753ff
Implements: blueprint cold-migrate-to-host-policy
2023-06-26 11:34:12 +02:00
melanie witt 697fa3c000 database: Archive parent and child rows "trees" one at a time
Previously, we archived deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limited how high a value of max_rows could be specified by the caller
because of the size of the database transaction it could generate.

For example, in a large scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size or timeout due to a database deadlock, forcing the operator
to use a much lower max_rows value like 100 or 50.

And when the operator has e.g. 500,000 deleted instances rows (and
millions of deleted rows total) they are trying to archive, being
forced to use a max_rows value several orders of magnitude lower than
the number of rows they need to archive was a poor user experience.

This changes the logic to archive one parent row and its foreign-key
related child rows at a time, each in a single database transaction,
stopping per table as soon as the total number of archived rows
reaches max_rows. Doing this will allow operators to choose more
predictable values for max_rows and get more progress per invocation
of archive_deleted_rows.

Closes-Bug: #2024258

Change-Id: I2209bf1b3320901cf603ec39163cf923b25b0359
2023-06-20 20:04:46 +00:00
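The per-tree loop described above can be sketched as follows (illustrative names only, not Nova's actual code; `archive_one_tree` stands in for moving one parent row plus its child rows in its own small transaction):

```python
def archive_until_max(tables, max_rows, archive_one_tree):
    """Archive one parent-plus-children "tree" per transaction, per
    table, until a table's running total reaches max_rows. The
    archive_one_tree callable returns the number of rows it moved."""
    archived = {}
    for table in tables:
        total = 0
        while total < max_rows:
            moved = archive_one_tree(table)  # one parent + its children
            if moved == 0:
                break  # no deleted rows left in this table
            total += moved
        archived[table] = total
    return archived
```

Because a tree is archived whole, a table's total may slightly exceed max_rows, but each individual transaction stays small and predictable.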
Sylvain Bauza 2c4421568e cpu: make governors to be optional
Change-Id: Ifb7d001cfdb95b1b0aa29f45c0ef71c0673e1760
Closes-Bug: #2023018
2023-06-07 11:54:57 +02:00
Sylvain Bauza 3fab43786b cpu: fix the privsep issue when offlining the cpu
In Icb913ed9be8d508de35e755a9c650ba25e45aca2 we forgot to add a privsep
decorator for the set_offline() method.

Change-Id: I769d35907ab9466fe65b942295fd7567a757087a
Closes-Bug: #2022955
2023-06-06 16:26:05 +02:00
melanie witt 6f79d6321e Enforce quota usage from placement when unshelving
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.

This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.

This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.

Closes-Bug: #2003991

Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
2023-05-23 01:02:05 +00:00
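For reference, either of the settings named above switches quota counting to Placement (fragment is illustrative; only one of the two needs to be set):

```ini
[quota]
# Count cores/ram usage from Placement allocations, so
# SHELVED_OFFLOADED instances consume no cores/ram quota.
count_usage_from_placement = true
```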
Zuul 2dde4538bc Merge "vmwareapi: Mark driver as experimental" 2023-05-18 15:08:04 +00:00
melanie witt db455548a1 Use force=True for os-brick disconnect during delete
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.

We should use force=True when we are going to delete an instance, to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map volumes to an instance that
should not have access to those volumes.

We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.

Closes-Bug: #2004555

Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
2023-05-10 07:09:05 -07:00
Zuul ad3b3681b6 Merge "add hypervisor version weigher" 2023-05-04 01:29:06 +00:00
Sean Mooney e38d6a356b add hypervisor version weigher
implements: blueprint weigh-host-by-hypervisor-version
Change-Id: I36b16a388383c26bdf432030bc9e28b2fd75d120
2023-04-20 18:33:55 +00:00
Zuul d6aa812099 Merge "hyperv: Mark driver as experimental" 2023-04-19 13:26:47 +00:00
Zuul 01ffb6df85 Merge "db: Remove legacy migrations" 2023-04-17 01:08:26 +00:00
Zuul 3886f078de Merge "Unbind port when offloading a shelved instance" 2023-03-13 18:04:20 +00:00
Zuul 8de4377fa0 Merge "Update master for stable/2023.1" 2023-03-09 13:06:43 +00:00
OpenStack Proposal Bot 4df62f7015 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I9442fd04f852986e903768f06453f2a0d9cc4dbb
2023-03-09 03:13:12 +00:00
OpenStack Release Bot e0fc974b97 Update master for stable/2023.1
Add file to the reno documentation build to show release notes for
stable/2023.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.

Sem-Ver: feature
Change-Id: I96511e7f86a9f7be9f65e0133c8b38dade57801a
2023-03-06 09:23:27 +00:00
Sylvain Bauza f587685f60 Add the 2023.1 Antelope prelude section
This is the first time we use the correct release naming [1].

Shamelessly copied the bullet points from the cycle highlights [2].

[1] https://governance.openstack.org/tc/reference/release-naming.html
[2] I02ed58bb5a4ecdc8171d9aa4a150be1bca214528

Change-Id: Id4e2e672e3a1a5aba7e664ba2d2f701b9be988e0
2023-03-01 10:26:53 +01:00
Zuul 6ec6f14629 Merge "Enable cpus when an instance is spawning" 2023-02-18 12:34:17 +00:00
Zuul 5c32d5efe1 Merge "libvirt: Add configuration options to set SPICE compression settings" 2023-02-17 01:45:41 +00:00
Sylvain Bauza 0807b7ae9a Enable cpus when an instance is spawning
With this patch, we now automatically power cores down or up
when an instance is either stopped or started.

Also, by default, we now powersave or offline dedicated cores when
starting the compute service.

Implements: blueprint libvirt-cpu-state-mgmt
Change-Id: Id645fd1ba909683af903f3b8f11c7f06db3401cb
2023-02-10 13:03:39 +01:00
Stephen Finucane fd39e4b4be db: Remove legacy migrations
sqlalchemy-migrate does not (and will not) support sqlalchemy 2.0. We
need to drop these migrations to ensure we can upgrade our sqlalchemy
version.

Change-Id: I7756e393b78296fb8dbf3ca69c759d75b816376d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2023-02-01 17:00:03 +00:00
Zuul f01a90ccb8 Merge "api: extend evacuate instance to support target state" 2023-02-01 11:49:46 +00:00
Zuul e9d716f555 Merge "Detect host renames and abort startup" 2023-02-01 07:12:48 +00:00
Zuul c993d8d311 Merge "Add further workaround features for qemu_monitor_announce_self" 2023-01-31 23:07:43 +00:00
Sahid Orentino Ferdjaoui d732ee38a1 api: extend evacuate instance to support target state
Starting with v2.95, any evacuated instance will be stopped at the destination.

Implements: bp/allowing-target-state-for-evacuate
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I141b6f057cc4eb9c541c2bc6eddae27270ede08d
2023-01-31 18:16:00 +01:00
Dan Smith e258164f5a Detect host renames and abort startup
Except on ironic, if we are able to determine that our locally-
configured node has a hostname other than what we expect, we should
abort startup. Since we currently depend on the loose name-based
association of nodes, services, and instances, we need to make sure
we do not startup in an inconsistent configuration.

Related to blueprint stable-compute-uuid
Change-Id: I595b27a57516cffe1f172cf2fb736e1b11373a1d
2023-01-30 12:46:36 -08:00
as0 fba851bf3a Add further workaround features for qemu_monitor_announce_self
In some cases on Arista VXLAN fabrics, VMs are inaccessible via network
after live migration, despite garps being observed on the fabric itself.

This patch builds on the
``[workarounds]/enable_qemu_monitor_announce_self`` feature, as reported
in `bug 1815989 <https://bugs.launchpad.net/nova/+bug/1815989>`.

This patch adds the ability to configure the number of times the QEMU
announce_self monitor command is called, and adds a new configuration
option to specify a delay between the repeated announce_self calls,
as in some cases multiple announce_self monitor commands are required
for the fabric to honor the GARP packets and for the VM to become
accessible via the network after live migration.

Closes-Bug: #1996995
Change-Id: I2f5bf7c9de621bb1dc7fae5b3374629a4fcc1f46
2023-01-30 15:44:44 +00:00
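A possible nova.conf fragment for this workaround might look like the following; note that the count/interval option names below are assumptions based on the description above, since the commit message does not spell them out:

```ini
[workarounds]
enable_qemu_monitor_announce_self = true
# Hypothetical names for the new knobs: repeat announce_self three
# times with a one-second delay between calls.
qemu_announce_self_count = 3
qemu_announce_self_interval = 1
```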
Zuul e3b2910023 Merge "Fix rescue volume-based instance" 2023-01-30 11:16:59 +00:00
Zuul ffccf7022e Merge "libvirt: Replace usage of compareCPU() with compareHypervisorCPU()" 2023-01-26 17:50:28 +00:00
Kashyap Chamarthy 468b03e0ee libvirt: Replace usage of compareCPU() with compareHypervisorCPU()
The older libvirt API compareCPU() does not take into account the
capabilities of the "host hypervisor" (KVM, QEMU and the details libvirt
knows about the host) when comparing CPUs.  This is causing unwanted
failures during CPU compatibility checks.  To fix this and other related
problems, libvirt has introduced (in v4.4.0) a newer API,
compareHypervisorCPU(), which _does_ take into account the host
hypervisor's capabilities before comparing CPUs.  This will help fix a
range of problems, such as[1][2].

So let's switch to the newer API, which is largely a drop-in
replacement.

In this patch:

 - Introduce a wrapper method, compare_hypervisor_cpu() for libvirt's
   compareHypervisorCPU() API.

 - Update the _compare_cpu() method to use the wrapper method,
   compare_hypervisor_cpu().

 - Update the unit tests to use the newer API, compareHypervisorCPU().

[1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2138381

Change-Id: Ib523753f52993cfe72e35e0309e429ca879c125c
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
2023-01-24 11:28:43 +01:00
Zuul d8b4b7bebd Merge "Microversion 2.94: FQDN in hostname" 2023-01-21 05:39:20 +00:00
Artom Lifshitz 9980b9ad52 Microversion 2.94: FQDN in hostname
Extend microversion 2.90 to allow FQDNs in the hostname parameter.
Multi-create with --hostname continues to be refused, returning error
400 to the user. This simplifies the code by not needing to handle any
sort of suffix or prefix mangling of the FQDN to handle multiple
instances. No other changes are made: no Neutron integration, no
metadata API changes (the FQDN will appear as-is wherever the
hostname currently appears), etc.

Change-Id: I47e397dc6da8263762479cc8ae4f8777a6d9d811
Implements: bp/fqdn-in-hostname
2023-01-17 10:59:37 -05:00
Zuul 07d1da2fa9 Merge "Enable new defaults and scope checks by default" 2023-01-16 23:49:43 +00:00
Zuul 8e3ffb851b Merge "Allow enabling PCI scheduling in Placement" 2023-01-12 23:08:03 +00:00
Manuel Bentele b5e0ed248f libvirt: Add configuration options to set SPICE compression settings
This patch adds the following SPICE-related options to the 'spice'
configuration group of a Nova configuration:

  - image_compression
  - jpeg_compression
  - zlib_compression
  - playback_compression
  - streaming_mode

These configuration options can be used to enable and set the SPICE
compression settings for libvirt (QEMU/KVM) provisioned instances.
Each configuration option is optional and can be set explicitly to
configure the associated SPICE compression setting for libvirt. If
none of the configuration options are set, no SPICE compression
settings will be configured for libvirt, which corresponds to the
behavior before this change. In that case, the built-in defaults from
the libvirt backend (e.g. QEMU) are used.

Note that these options are only taken into account if SPICE support is
enabled (and VNC support is disabled).

Implements: blueprint nova-support-spice-compression-algorithm
Change-Id: Ia7efeb1b1a04504721e1a5bdd1b5fa7a87cdb810
2023-01-11 11:48:17 +00:00
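An illustrative nova.conf fragment setting all five options (the values shown are assumptions drawn from the usual libvirt/SPICE value sets, not mandated by the commit; any subset of the options may be set):

```ini
[spice]
enabled = true
# Illustrative values; each option is optional and, when unset,
# the libvirt/QEMU built-in default applies.
image_compression = auto_glz
jpeg_compression = auto
zlib_compression = auto
playback_compression = true
streaming_mode = filter
```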
Ghanshyam Mann d97af33c06 Enable new defaults and scope checks by default
As discussed in PTG, we need to test the new RBAC in the
integrated gate and accordingly enable the new defaults
and scope check by default. A new integrated testing job
has been added and results show that the new defaults and
scope checks are working fine. During testing, we found a
few bugs in neutron policies but all are fixed now.

enforce_scope and enforce_new_defaults are oslo.policy config
options, but they are per-service and their default values
can be overridden. oslo.policy 3.11.0 allows overriding
the default value for these config options [1], so the
oslo.policy version in requirements.txt is upgraded.

Depends-On: https://review.opendev.org/c/openstack/devstack/+/869781
Depends-On: https://review.opendev.org/c/openstack/placement/+/869525

[1] https://github.com/openstack/oslo.policy/blob/3.11.0/oslo_policy/opts.py#L125

Change-Id: I977b2daedf880229c8d364ca011f2ea965b86e3a
2023-01-10 23:37:13 -06:00
Balazs Gibizer 2cb1eedeaf Allow enabling PCI scheduling in Placement
A new configuration option [filter_scheduler]pci_in_placement is added
that allows enabling the scheduler logic for PCI device handling in
Placement for flavor based PCI requests.

blueprint: pci-device-tracking-in-placement
Change-Id: I5ddf6d3cdc7e05cc4914b9b1e762fa02a5c7c550
2023-01-05 17:25:27 +01:00
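Enabling the new scheduler behaviour is a one-line nova.conf change (fragment is illustrative; it assumes the rest of the PCI-in-Placement inventory reporting from the same blueprint is also enabled):

```ini
[filter_scheduler]
# Let the scheduler resolve flavor-based PCI requests via Placement.
pci_in_placement = true
```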
Zuul d7de0c121a Merge "Ironic nodes with instance reserved in placement" 2022-12-14 14:11:40 +00:00
Zuul e40ac0c798 Merge "Support multiple config file with mod_wsgi" 2022-12-12 15:16:14 +00:00
Sean Mooney 73fe84fa0e Support multiple config file with mod_wsgi
Unlike uwsgi, apache mod_wsgi does not support passing
command-line arguments to the python wsgi script it invokes.

As a result, while you can pass --config-file when hosting the
api and metadata wsgi applications with uwsgi, there is no
way to use multiple config files with mod_wsgi.

This change mirrors how this is supported in keystone today
by introducing a new OS_NOVA_CONFIG_FILES env var to allow
operators to optionally pass a ';'-delimited list of config
files to load.

This change also adds docs for this env var and the existing
undocumented OS_NOVA_CONFIG_DIR.

Closes-Bug: 1994056
Change-Id: I8e3ccd75cbb7f2e132b403cb38022787c2c0a37b
2022-12-07 12:36:32 +01:00
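The ';'-delimited parsing described above can be sketched like this (a simplified illustration, not Nova's actual implementation; the function name and default path are assumptions):

```python
import os

def nova_config_files(default="/etc/nova/nova.conf"):
    """Read the ';'-delimited OS_NOVA_CONFIG_FILES variable into a
    list of config-file paths, falling back to a single default."""
    raw = os.environ.get("OS_NOVA_CONFIG_FILES", default)
    return [path for path in raw.split(";") if path]

os.environ["OS_NOVA_CONFIG_FILES"] = "/etc/nova/nova.conf;/etc/nova/api.conf"
print(nova_config_files())  # ['/etc/nova/nova.conf', '/etc/nova/api.conf']
```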
Zuul 0ac487fd6c Merge "Don't provide MTU value in metadata service if DHCP is enabled" 2022-11-29 22:02:04 +00:00
Arnaud Morin 4eef0fe635 Unbind port when offloading a shelved instance
When offloading a shelved instance, the compute needs to remove the
binding so the port will appear as "unbound" in neutron.

Closes-Bug: 1983471

Change-Id: Ia49271b126870c7936c84527a4c39ab96b6c5ea7
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
2022-11-29 17:06:46 +01:00
Slawek Kaplonski 6bdc79af30 Don't provide MTU value in metadata service if DHCP is enabled
For networks whose subnets have the DHCP service enabled, don't provide
the MTU value in the metadata. That way cloud-init will not configure it
"statically" in e.g. netplan's config file, and the guest OS will use the
MTU value provided by the DHCP service.

Closes-Bug: #1899487
Change-Id: Ib775c2210349b72b3dc033554ac6d8b35b8d2d79
2022-11-29 15:12:24 +00:00
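The rule above amounts to a small predicate; a sketch (field names are assumptions, not Nova's actual code):

```python
def metadata_mtu(network):
    """Return the MTU to expose in the metadata, or None when any
    subnet has DHCP enabled, so the DHCP service advertises the MTU
    instead of cloud-init pinning it statically."""
    if any(subnet.get("enable_dhcp") for subnet in network["subnets"]):
        return None
    return network.get("mtu")
```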
John Garbutt 3c022e9683 Ironic nodes with instance reserved in placement
Currently, when you delete an ironic instance, we trigger
an undeploy in ironic and we release our allocation in placement.
We do this well before the ironic node is actually available again.

We have attempted to fix this by marking unavailable nodes
as reserved in placement. This works great until you try
to re-image lots of nodes.

It turns out that ironic nodes which are waiting for their automatic
clean to finish are returned as valid allocation candidates
for quite some time. Eventually we mark them as reserved.

This patch takes a strange approach: if we mark all nodes as
reserved as soon as an instance lands, we close the race.
That is, when the allocation is removed, the node stays
unavailable until the next update of placement notices
that the node has become available. That may or may
not be after automatic cleaning. The trade-off is
that when you don't have automatic cleaning, we wait a bit
longer to notice the node is available again.

Note that this is also useful when a broken ironic node is
marked as in maintenance while it is in use by a nova
instance. In a similar way, we mark the node as reserved
immediately, rather than first waiting for the instance to be
deleted before reserving the resources in placement.

Closes-Bug: #1974070
Change-Id: Iab92124b5776a799c7f90d07281d28fcf191c8fe
2022-11-17 14:09:08 +00:00