Two traits, COMPUTE_ADDRESS_SPACE_PASSTHROUGH and
COMPUTE_ADDRESS_SPACE_EMULATED, are controlled based on the Libvirt and
QEMU versions. Since both are supported from the same Libvirt and QEMU
versions, Nova handles them in the same way.
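As a rough illustration (the version numbers and helper below are
assumptions, not the driver's actual minimums), both traits can be
derived from a single support check:

    def address_space_traits(libvirt_version, qemu_version,
                             min_libvirt=(8, 0, 0), min_qemu=(6, 2, 0)):
        # Both traits share the same minimum versions, so one check
        # enables or disables them together.
        supported = (libvirt_version >= min_libvirt and
                     qemu_version >= min_qemu)
        return {
            'COMPUTE_ADDRESS_SPACE_PASSTHROUGH': supported,
            'COMPUTE_ADDRESS_SPACE_EMULATED': supported,
        }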
Blueprint: libvirt-maxphysaddr-support
Depends-On: https://review.opendev.org/c/openstack/os-traits/+/871226
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Change-Id: If6c7169b7b8f43ad15a8992831824fb546e85aab
QEMU >= 5.0.0 bumped the default tb-cache size to 1 GiB (from 32 MiB),
which made it difficult to run multiple guest VMs on systems with less
memory. With Libvirt >= 8.0.0 it is possible to configure a lower
tb-cache size.
The config option below is introduced to allow configuring the TB cache
size as the environment needs; it only applies to 'virt_type=qemu':
[libvirt]tb_cache_size
Also enable this option in the nova-next job.
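For example, an operator might set something like the following in
nova.conf (the value is only illustrative and assumed to be in MiB):

    [libvirt]
    virt_type = qemu
    tb_cache_size = 128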
[1] https://github.com/qemu/qemu/commit/600e17b26
[2] https://gitlab.com/libvirt/libvirt/-/commit/58bf03f85
Closes-Bug: #1949606
Implements: blueprint libvirt-tb-cache-size
Change-Id: I49d2276ff3d3cc5d560a1bd96f13408e798b256a
As of I8ca059a4702471d4d30ea5a06079859eba3f5a81 validations
are now required for test_rebuild_volume_backed_server.
Validations are also required for any volume attach/detach based test
in general due to known QEMU issues.
This patch just turns them back on to unblock the gate.
Closes-Bug: #2025813
Change-Id: Ia198f712e2ad277743aed08e27e480208f463ac7
We add a new, specific policy that applies when a host value is provided
for cold-migrate, but by default it is an admin-only rule in order not to
change the behaviour.
Change-Id: I128242d5f689fdd08d74b1dcba861177174753ff
Implements: blueprint cold-migrate-to-host-policy
We are about to drop Fedora support as the latest image upstream has
been transitioned to EOL. CentOS 9 Stream has evolved as the replacement
platform for new features. The patch which removes the Fedora jobs and
nodeset from devstack:
https://review.opendev.org/c/openstack/devstack/+/885467
Change-Id: Ib7d3dd93602c94fd801f8fe5daa26353b04f589b
Previously, we archived deleted rows in batches of max_rows parents +
their child rows in a single database transaction. Doing it that way
limited how high a value of max_rows could be specified by the caller
because of the size of the database transaction it could generate.
For example, in a large scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a
value of max_rows=1000 might exceed the database's configured maximum
packet size or timeout due to a database deadlock, forcing the operator
to use a much lower max_rows value like 100 or 50.
And when the operator has e.g. 500,000 deleted instances rows (and
millions of deleted rows total) they are trying to archive, being
forced to use a max_rows value several orders of magnitude lower than
the number of rows they need to archive was a poor user experience.
This changes the logic to archive one parent row and its foreign key
related child rows at a time, each in a single database transaction,
stopping per table as soon as the total number of archived rows reaches
max_rows. Doing this will allow operators to choose more predictable
values for max_rows and get more progress per invocation of
archive_deleted_rows.
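A minimal sketch of the batching idea (purely illustrative, with made-up
data structures rather than Nova's real code):

    def archive_in_small_batches(deleted_parents, children_of, max_rows):
        # One parent row plus its child rows would be archived in its own
        # small database transaction, so the transaction size stays bounded
        # no matter how large max_rows is.
        archived = 0
        for parent in deleted_parents:
            if archived >= max_rows:
                break
            batch = children_of.get(parent, []) + [parent]
            archived += len(batch)
        return archived

With this shape, max_rows bounds the total progress per call rather than
the size of any single transaction.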
Closes-Bug: #2024258
Change-Id: I2209bf1b3320901cf603ec39163cf923b25b0359
The regexes in test_archive_deleted_rows for multiple cells were
incorrect in that they were not isolating the search pattern and could
therefore also match other rows in the result table, resulting in false
positives.
This fixes the regexes and also adds one more server to the test
scenario in order to make sure archive_deleted_rows iterates at least
once to expose bugs that may be present in its internal iteration.
This patch is in preparation for a future patch that will change the
logic in archive_deleted_rows. Making this test more robust will more
thoroughly test for regression.
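A hypothetical example of the kind of false positive an unanchored
pattern can produce (the rows shown are made up, not the test's real
output):

    import re

    output = ("| shadow_instances | 5 |\n"
              "| instances        | 0 |")
    # Unanchored: accidentally matches the shadow_instances row.
    print(bool(re.search(r"instances\s*\|\s*5", output)))            # True
    # Isolated to the start of its own row: no false positive.
    print(bool(re.search(r"^\| instances\s*\|\s*5", output, re.M)))  # False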
Change-Id: If39f6afb6359c67aa38cf315ec90ffa386d5c142
In Icb913ed9be8d508de35e755a9c650ba25e45aca2 we forgot to add a privsep
decorator for the set_offline() method.
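For context, a privsep entrypoint looks roughly like the sketch below
(the context definition and sysfs path are illustrative, not the exact
Nova code):

    from oslo_privsep import capabilities, priv_context

    sys_admin_pctxt = priv_context.PrivContext(
        'example',
        cfg_section='example_privsep',
        pypath=__name__ + '.sys_admin_pctxt',
        capabilities=[capabilities.CAP_SYS_ADMIN])

    @sys_admin_pctxt.entrypoint
    def set_offline(device_name):
        # Without the decorator this write runs unprivileged and fails.
        with open('/sys/block/%s/device/state' % device_name, 'w') as f:
            f.write('offline')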
Change-Id: I769d35907ab9466fe65b942295fd7567a757087a
Closes-Bug: #2022955
The late anti-affinity check runs in the compute manager to prevent
parallel scheduling requests from violating the anti-affinity server
group policy. When the check fails the instance is re-scheduled.
However, this failure is counted as a real instance boot failure of the
compute host and can lead to de-prioritization of the compute host
in the scheduler via the BuildFailureWeigher. As the late anti-affinity
check does not indicate any fault of the compute host itself, it
should not be counted towards the build failure counter.
This patch adds new build results to handle this case.
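A simplified sketch of the intent (the result names here are
placeholders, not necessarily the exact values added):

    FAILED = 'failed'
    RESCHEDULED = 'rescheduled'
    # New, placeholder name: a re-schedule caused by the late policy check.
    RESCHEDULED_BY_POLICY = 'rescheduled_by_policy'

    def update_failed_builds(result, failed_builds):
        # Only a genuine build failure bumps the host's failure counter.
        if result == FAILED:
            return failed_builds + 1
        return failed_builds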
Closes-Bug: #1996732
Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718
Signed-off-by: Yusuke Okada <okada.yusuke@fujitsu.com>
The ComputeNode object already has a service_id field that we stopped
using a while ago. This moves us back to the point where we set it when
creating new ComputeNode records, and also migrates existing records
when they are loaded.
The resource tracker is created before we may have created the
service record, but is updated afterwards in the pre_start_hook().
So this adds a way for us to pass the service_ref to the resource
tracker during that hook so that it is present before the first time
we update all of our ComputeNode records. It also makes sure to pass
the Service through from the actual Service manager instead of looking
it up again to make sure we maintain the tight relationship and avoid
any name-based ambiguity.
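Conceptually the flow looks something like this sketch (attribute and
method names are assumptions for illustration only):

    class ResourceTracker(object):
        def __init__(self):
            # Not known yet when the tracker is constructed.
            self.service_ref = None

        def update_compute_node(self, node):
            # By now the hook has run, so new ComputeNode records can
            # carry service_id from the start.
            node.service_id = self.service_ref.id

    class ComputeManager(object):
        def pre_start_hook(self, service_ref):
            # Pass the Service object through instead of looking it up
            # again by name.
            self.rt.service_ref = service_ref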
Related to blueprint compute-object-ids
Change-Id: I5e060d674b6145c9797c2251a2822106fc6d4a71
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.
This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.
This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
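For reference, counting usage from placement is enabled with a setting
like the following in nova.conf:

    [quota]
    count_usage_from_placement = true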
Closes-Bug: #2003991
Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
This adds test coverage for:
* Shelve/unshelve offloaded with legacy quota usage
* Shelve/unshelve offloaded with quota usage from placement
* Shelve/unshelve offloaded with unified limits
* Shelve/unshelve with legacy quota usage
* Shelve/unshelve with quota usage from placement
* Shelve/unshelve with unified limits
Related-Bug: #2003991
Change-Id: Icc9b6366aebba2f8468e2127da7b7e099098513a
This logging would be helpful in debugging issues when
OrphanedObjectError is raised by an instance. Currently, there is
not a way to identify which instance is attempting to lazy-load a
field while orphaned. Being able to locate the instance in the
database could also help with recovery/cleanup when a problematic
record is disrupting operation of a deployment.
Change-Id: I093de2839c1bb7c949a0812e07b63de4cc5ed167
We are still having some issues in the gate where greenlets from
previous tests continue to run while the next test starts, causing
false negative failures in unit or functional test jobs.
This adds a new fixture that will ensure
GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait
for greenlets in the pool to finish running before moving on.
In local testing, doing this does not appear to adversely affect test
run times, which was my primary concern.
As a baseline, I ran a subset of functional tests in a loop
until failure without the patch and after 11 hours, I got a failure
reproducing the bug. With the patch, running the same subset of
functional tests in a loop has been running for 24 hours and has not
failed yet.
Based on this, I think it may be worth trying this out to see if it
will help stability of our unit and functional test jobs. And if it
ends up impacting test run times or causes other issues, we can
revert it.
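The core of the fixture is roughly the following (a simplified sketch,
not the exact implementation):

    import fixtures
    import futurist

    class GreenThreadPoolShutdownWait(fixtures.Fixture):
        """Make executor shutdown wait for its greenthreads to finish."""

        def _setUp(self):
            real_shutdown = futurist.GreenThreadPoolExecutor.shutdown

            def wait_shutdown(executor, wait=True):
                # Always wait so greenlets from one test cannot keep
                # running while the next test starts.
                return real_shutdown(executor, wait=True)

            self.useFixture(fixtures.MonkeyPatch(
                'futurist.GreenThreadPoolExecutor.shutdown', wait_shutdown))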
Partial-Bug: #1946339
Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
The validate-backport job started to fail as only the old stable branch
naming was accepted. This patch extends the script to also allow numbers
and dots in the branch names (like stable/2023.1).
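For illustration, a pattern along these lines accepts both the old and
the new naming (the exact expression used by the script may differ):

    import re

    branch_re = re.compile(r'^stable/[a-z0-9.]+$')
    assert branch_re.match('stable/wallaby')
    assert branch_re.match('stable/2023.1')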
Change-Id: Icbdcd5d124717e195d55d9e42530611ed812fadd
The recent change(s) to enable a lot more SSHABLE checks puts the
runtime of the ceph job really close to the 2h timeout even when
things are working. Sometimes it times out before it finishes even
though things are progressing. Bump the timeout to avoid that.
Also bump us to 8G swap to match what is set on the parent ceph job
when we upgraded to jammy. We could just unset this, but better to
pin it high in case that job (defined elsewhere) changes. Our job
is the largest ceph job, so it makes sense that it keeps its own
swap level high.
Change-Id: I6cefd87671614d87d92e4675fbc989fc9453c8b9
When the [service_user] section is configured in nova.conf, nova will
have the ability to send a service user token alongside the user's
token. The service user token is sent when nova calls other services'
REST APIs to authenticate as a service, and service calls can sometimes
have elevated privileges.
However, nova does not currently have the ability to send a service user
token with an admin context. This means that when nova makes REST API
calls to other services with an anonymous admin RequestContext (such as
in nova-manage or periodic tasks), it will not be authenticated as a
service.
This adds a keyword argument to service_auth.get_auth_plugin() to
enable callers to provide a user_auth object instead of attempting to
extract the user_auth from the RequestContext.
The cinder and neutron client modules are also adjusted to make use of
the new user_auth keyword argument so that nova calls made with
anonymous admin request contexts can authenticate as a service when
configured.
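Callers can then do something along these lines (a usage sketch, with the
config group and helper name chosen for illustration):

    from keystoneauth1 import loading as ks_loading
    from oslo_config import cfg

    from nova import service_auth

    CONF = cfg.CONF

    def get_admin_service_auth_plugin(context):
        # With an anonymous admin RequestContext there is no user token to
        # wrap, so load a user auth explicitly (here from the [cinder]
        # options) and hand it to get_auth_plugin().
        user_auth = ks_loading.load_auth_from_conf_options(CONF, 'cinder')
        return service_auth.get_auth_plugin(context, user_auth=user_auth)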
Related-Bug: #2004555
Change-Id: I14df2d55f4b2f0be58f1a6ad3f19e48f7a6bfcb4
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
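A rough sketch of the call site shape (the connector setup and arguments
are illustrative only, not Nova's actual volume driver code):

    from os_brick.initiator import connector

    def disconnect_volume_from_host(connection_info, device_info):
        conn = connector.InitiatorConnector.factory('ISCSI', root_helper=None)
        # force=True tells os-brick to ignore flushing errors and make sure
        # the devices are removed from the host; it still attempts a
        # graceful flush and disconnect first.
        conn.disconnect_volume(connection_info, device_info, force=True)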
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
Make the host class look under '/sys/fs/cgroup/cgroup.controllers' for
support of the cpu controller. The host will try searching through
cgroups v1 first, just like up until now, and if that fails it will then
try cgroups v2. The host will not support the feature if both checks
fail.
This new check needs to be mocked by all tests that focus on this piece
of code, as it touches a system file that requires privileges. For that,
the CGroupsFixture is defined to easily add such mocking to all test
cases that require it.
I also removed the old mocking in test_driver.py in favor of the fixture
above.
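The cgroups v2 part of the check amounts to something like this sketch
(simplified, not the exact helper):

    def has_cgroupsv2_cpu_controller(path='/sys/fs/cgroup/cgroup.controllers'):
        # On a cgroups v2 host this file lists the available controllers,
        # e.g. "cpuset cpu io memory pids".
        try:
            with open(path) as f:
                return 'cpu' in f.read().split()
        except FileNotFoundError:
            return False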
Partial-Bug: #2008102
Change-Id: I99b57c27c8a4425389bec2b7f05af660bab85610
Unfortunately, when we merged Ie166f3b51fddeaf916cda7c5ac34bbcdda0fd17a
we forgot that subnets can have no segment_id field.
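The fix boils down to guarding the attribute access, roughly like this
(shown on plain dicts for illustration):

    def get_segment_ids(subnets):
        # Subnets are not guaranteed to carry a segment_id key, so use
        # .get() instead of assuming the field is present.
        return {subnet.get('segment_id') for subnet in subnets
                if subnet.get('segment_id') is not None}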
Change-Id: Idb35b7e3c69fe8efe498abe4ebcc6cad8918c4ed
Closes-Bug: #2018375