Fix up some sphinx-isms in nova-manage.rst:
- Replace ``[group]/option`` with :oslo.config:option:`group.option`
- Replace ``[group]`` with :oslo.config:group:`group`
- Replace `default role used for italics` with *explicit italics*
- Fix up a mention of [placement_database] -- for which we can't use
:oslo.config:* because that section doesn't exist anymore -- to
pertain only to Rocky & Stein, where it did exist.
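For example (illustrative group/option pair), the first replacement turns
a literal like ``[api_database]/connection`` into
:oslo.config:option:`api_database.connection`.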
Change-Id: I9cfb9d0b20d4dfc18236a2d528a8f65534c9a263
CONF.api_database.connection is a variable in code, not something
an operator needs to know the meaning of, so this changes that
mention in the docs and in the error message for the nova-manage db
archive_deleted_rows command.
Change-Id: If27814e0006a6c33ae6270dff626586c41eafcad
Closes-Bug: #1839391
If the instances per host are not cached in the HostManager,
we look up the HostMapping for each candidate compute node during
each scheduling request to get the CellMapping, so we can target
that cell database to pull the instance uuids on the given host.
For example, if placement returned 20 compute node allocation
candidates and we don't have the instances cached for any of those,
we'll do 20 queries to the API DB to get host mappings.
We can improve this by caching the host-to-cell-uuid mapping after the
first lookup for a given host and thereafter getting the CellMapping
from the cells cache (a dict, keyed by cell uuid, mapping to the
CellMapping for that cell).
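Roughly, the cached lookup works like this (a simplified sketch, not the
actual HostManager code; names like host_to_cell_uuid are illustrative):

  # Simplified sketch only; not the real HostManager implementation.
  from nova import objects

  host_to_cell_uuid = {}  # host name -> cell uuid, filled on first lookup

  def get_cell_mapping_for_host(context, host, cells_cache):
      # cells_cache: dict keyed by cell uuid -> CellMapping
      cell_uuid = host_to_cell_uuid.get(host)
      if cell_uuid in cells_cache:
          return cells_cache[cell_uuid]
      # Cache miss: query the API DB once and remember the cell uuid.
      hm = objects.HostMapping.get_by_host(context, host)
      host_to_cell_uuid[host] = hm.cell_mapping.uuid
      return hm.cell_mapping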
Change-Id: Ic6b1edfad2e384eb32c6942edc522ee301123cbc
Related-Bug: #1737465
The "Request Spec Migration" upgrade status check was added
in Rocky [1] and the related online data migration was removed
in Stein [2]. Now that we're in Train we should be able to
remove the upgrade status check.
The verify install docs are updated to remove the now-removed
check along with "Console Auths" which were removed in Train [3].
[1] I1fb63765f0b0e8f35d6a66dccf9d12cc20e9c661
[2] Ib0de49b3c0d6b3d807c4eb321976848773c1a9e8
[3] I5c7e54259857d9959f5a2dfb99102602a0cf9bb7
Change-Id: I6dfa0b226ab0b99864fc27209ebb7b73e3f73512
Before I97f06d0ec34cbd75c182caaa686b8de5c777a576 it was possible to
create servers with neutron ports which had a resource_request (e.g. a
port with a QoS minimum bandwidth policy rule) without allocating the
requested resources in placement. So there could be servers for which
the allocation needs to be healed in placement.
This patch extends the nova-manage heal_allocations CLI to create the
missing port allocations in placement and update the port in neutron
with the resource provider uuid that is used for the allocation.
There are known limitations of this patch. It does not try to reimplement
Placement's allocation candidate functionality. Therefore it cannot
handle the situation where there is more than one RP in the compute
tree which provides the required traits for a port. Deciding which RP
to use in that situation would require 1) the in_tree allocation
candidate support from placement, which is not available yet, and
2) information about which PCI PF the VF of an SRIOV port is allocated
from and which RP represents that PCI device in placement. This
information is only available on the compute hosts.
For the unsupported cases the command will fail gracefully. As soon as
migration support for such servers is implemented in the blueprint
support-move-ops-with-qos-ports, the admin can heal the allocation of
such servers by migrating them.
During healing both placement and neutron need to be updated. If either
of those updates fails, the code tries to roll back the previous updates
for the instance to make sure that the healing can be re-run later without
issue. However, if the rollback fails then the script will terminate with
an error message pointing to documentation that describes how to
recover from such a partially healed situation manually.
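The per-instance flow is conceptually something like this (illustrative
sketch only; the client objects and call names are made up, not the real
heal_allocations code):

  # Illustrative only; not the actual nova-manage code.
  placement.put_allocations(instance_uuid, new_allocations)
  try:
      neutron.update_port(port_id, {'port': {'binding:profile': new_profile}})
  except Exception:
      try:
          # Roll back placement so the command can simply be re-run later.
          placement.put_allocations(instance_uuid, old_allocations)
      except Exception:
          # Rollback failed: stop and point at the recovery documentation.
          raise SystemExit('Manual cleanup needed; see the nova-manage docs.')
      raise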
Closes-Bug: #1819923
Change-Id: I4b2b1688822eb2f0174df0c8c6c16d554781af85
Since cells v2 was introduced, nova operators must run two commands to
migrate the database schemas of nova's databases - nova-manage api_db
sync and nova-manage db sync. It is necessary to run them in this order,
since the db sync may depend on schema changes made to the api database
in the api_db sync. Executing the db sync first may fail, for example
with the following seen in a Queens to Rocky upgrade:
nova-manage db sync
ERROR: Could not access cell0.
Has the nova_api database been created?
Has the nova_cell0 database been created?
Has "nova-manage api_db sync" been run?
Has "nova-manage cell_v2 map_cell0" been run?
Is [api_database]/connection set in nova.conf?
Is the cell0 database connection URL correct?
Error: (pymysql.err.InternalError) (1054, u"Unknown column
'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
cell_mappings.created_at AS cell_mappings_created_at,
cell_mappings.updated_at AS cell_mappings_updated_at,
cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
cell_mappings.transport_url AS cell_mappings_transport_url,
cell_mappings.database_connection AS
cell_mappings_database_connection, cell_mappings.disabled AS
cell_mappings_disabled \nFROM cell_mappings \nWHERE
cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
{u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
(Background on this error at: http://sqlalche.me/e/2j85)
Despite this error, the command actually exits zero, so deployment tools
are likely to continue with the upgrade, leading to issues down the
line.
This change modifies the command to exit 1 if the cell0 sync fails.
This change also clarifies this ordering in the upgrade and nova-manage
documentation, and adds information on exit codes for the command.
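For example, deployment tooling should do something like (illustrative):

  nova-manage api_db sync   # API DB schema changes first
  nova-manage db sync       # then the cell DBs; now exits 1 if the cell0
                            # sync fails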
Change-Id: Iff2a23e09f2c5330b8fc0e9456860b65bd6ac149
Closes-Bug: #1832860
The --before option to nova-manage db purge and archive_deleted_rows
accepts a string to be parsed by dateutil.parser.parse() with
fuzzy=True. This is fairly forgiving, but doesn't handle e.g. "now - 1
day". This commit adds some clarification to the help strings, and some
examples to the docs.
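For example, dateutil happily parses absolute timestamps (values here are
illustrative):

  >>> from dateutil import parser
  >>> parser.parse('2015-03-13', fuzzy=True)
  datetime.datetime(2015, 3, 13, 0, 0)
  >>> parser.parse('13 March 2015 at 10:23', fuzzy=True)
  datetime.datetime(2015, 3, 13, 10, 23)

but a relative expression such as "now - 1 day" will not do what you
want, so compute the absolute date yourself before passing --before.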
Change-Id: Ib218b971784573fce16b6be4b79e0bf948371954
We're going to remove all the code, but first, remove the docs.
Part of blueprint remove-consoleauth
Change-Id: Ie96e18ea7762b93b4116b35d7ebcfcbe53c55527
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
When the 'nova-manage cell_v2 discover_hosts' command is run in parallel
during a deployment, it results in simultaneous attempts to map the
same compute or service hosts at the same time, resulting in
tracebacks:
"DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u\"Duplicate
entry 'compute-0.localdomain' for key 'uniq_host_mappings0host'\")
[SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id,
host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s,
%(host)s)'] [parameters: {'host': u'compute-0.localdomain',
'cell_id': 5, 'created_at': datetime.datetime(2019, 4, 10, 15, 20,
50, 527925), 'updated_at': None}]
This adds more information to the command help and adds a warning
message when duplicate host mappings are detected with guidance about
how to run the command. The command will return 2 if a duplicate host
mapping is encountered and the documentation is updated to explain
this.
This also adds a warning to the scheduler periodic task to recommend
enabling the periodic task on only one scheduler to prevent collisions.
We choose to warn and stop instead of ignoring DBDuplicateEntry because
there could potentially be a large number of parallel tasks competing
to insert duplicate records where only one can succeed. If we ignore
and continue to the next record, the large number of tasks will
repeatedly collide in a tight loop until all get through the entire
list of compute hosts that are being mapped. So we instead stop the
colliding task and emit a message.
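Conceptually the handling looks like this (simplified sketch, not the
exact code in nova/cmd/manage.py; the message text is illustrative):

  # Simplified sketch only.
  try:
      host_mapping.create()
  except db_exc.DBDuplicateEntry:
      print('A host mapping for %s already exists. Is another '
            'discover_hosts running in parallel? Aborting.' % host)
      return 2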
Closes-Bug: #1824445
Change-Id: Ia7718ce099294e94309103feb9cc2397ff8f5188
Add a parameter to limit the archival of deleted rows by date. That is,
only rows related to instances deleted before the provided date will be
archived.
This option works together with --max_rows; if both are specified, both
will take effect.
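For example (values are illustrative):

  nova-manage db archive_deleted_rows --before "2018-01-01" --max_rows 1000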
Closes-Bug: #1751192
Change-Id: I408c22d8eada0518ec5d685213f250e8e3dae76e
Implements: blueprint nova-archive-before
The compute API has required cinder API >= 3.44 since Queens [1] for
working with the volume attachments API as part of the wider
volume multi-attach support.
In order to start removing the compatibility code in the compute API
this change adds an upgrade check for the minimum required cinder API
version (3.44).
[1] Ifc01dbf98545104c998ab96f65ff8623a6db0f28
Change-Id: Ic9d1fb364e06e08250c7c5d7d4bdb956cb60e678
This patch adds the translation of `RequestGroup.in_tree` to the
actual placement query and bumps the microversion to enable it.
A release note for this change is added.
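Roughly, a request group with in_tree set ends up as an extra query
parameter on the allocation candidates request, along the lines of
(illustrative resource classes and uuid):

  GET /allocation_candidates?resources=VCPU:1,MEMORY_MB:512&in_tree=<rp uuid>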
Change-Id: I8ec95d576417c32a57aa0298789dac6afb0cca02
Blueprint: use-placement-in-tree
Related-Bug: #1777591
These are no longer necessary with the removal of cells v1. A check for
cells v1 in 'nova-manage cell_v2 simple_cell_setup' is also removed,
meaning this can no longer return the '2' exit code.
Part of blueprint remove-cells-v1
Change-Id: I8c2bfb31224300bc639d5089c4dfb62143d04b7f
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This resolves one of the TODOs in the heal_allocations CLI
by adding an --instance option to the command which, when
specified, will process just the single instance given.
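For example:

  nova-manage placement heal_allocations --instance <instance uuid>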
Change-Id: Icf57f217f03ac52b1443addc34aa5128661a8554
This resolves one of the TODOs in the heal_allocations CLI
by adding a --dry-run option which will still print the output
as we process instances but will not commit any allocation changes
to placement; it just prints out that they would happen.
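For example:

  nova-manage placement heal_allocations --dry-run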
Change-Id: Ide31957306602c1f306ebfa48d6e95f48b1e8ead
This migration was added in Ocata and backported to Newton in change
I8a05ee01ec7f6a6f88b896f78414fb5487e0071e to deal with Mitaka-era
build_requests records that would not have an instance_uuid value
and thus raise a ValueError in BuildRequest._from_db_object (because
BuildRequest.instance_uuid is not nullable).
This is essentially a revert of that change now since operators
have had long enough to run the migration. If anyone were to do a
skip-level upgrade from Mitaka to Train (which we don't support; we
require you to roll through), and hit an issue with this, they could
simply execute this on their nova_api DB:
DELETE FROM build_requests WHERE instance_uuid IS NULL;
Change-Id: Ie9593657544b7aef1fd7a5c8f01e30e09e3fcce6
This was added in Newton:
I97b72ae3e7e8ea3d6b596870d8da3aaa689fd6b5
And was meant to migrate keypairs from the cell
(nova) DB to the API DB. Before that though, the
keypairs per instance would be migrated to the
instance_extra table in the cell DB. The migration
to instance_extra was dropped in Queens with change:
Ie83e7bd807c2c79e5cbe1337292c2d1989d4ac03
As the commit message on ^ mentions, the 345 cell
DB schema migration required that the cell DB keypairs
table be empty before you could upgrade to Ocata.
The migrate_keypairs_to_api_db routine only migrates
keypairs to the API DB if there are entries in the
keypairs table in the cell DB, but because of that blocker
migration in Ocata that cannot be the case anymore.
So really migrate_keypairs_to_api_db is just wasting time
querying the database during the online_data_migrations
routine without actually migrating anything, and we
should just remove it.
Change-Id: Ie56bc411880c6d1c04599cf9521e12e8b4878e1e
Closes-Bug: #1822613
Since API tables do not have the concept of soft-delete, we purge
the instance_mappings, request_specs and instance_group_member records
of deleted instances while they are archived. The ``nova-manage db
archive_deleted_rows`` command offers a ``max-rows`` parameter which
actually means the batch size of the iteration for moving the
soft-deleted records from tables to their shadow tables. So this patch
clarifies that the batch size does not include the API table records
that are purged, so that users are not confused when the ``--verbose``
output of the command reports more rows than specified.
Change-Id: I652854c7192b996a33ed343a51a0fd8c7620e876
Closes-Bug: #1794994
As discussed on the mailing list [1], cells v1
has been deprecated since Pike and the biggest user
of it (CERN, as far as we know) moved to cells v2
in Queens, so we can start rolling back the cells v1
specific documentation to avoid confusing people
new to nova about what cells is and making them
think they need to understand an optional v1.
There are still a few mentions of cells v1 left in
here, for things like adding a new cell, which need
to be re-written, and for that I've left a TODO.
Users can still get at cells v1 specific docs from
published stable branches and/or rebuilding the
docs from before this change.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-February/002569.html
Change-Id: Idaa04a88b6883254cad9a8c6665e1c63a67e88d3
Adds configuration option ``[api]/local_metadata_per_cell``
to allow users to run the Nova metadata API service per cell.
Doing this avoids querying the API DB for instance information
each time an instance queries for its metadata.
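i.e. in the nova.conf used by the metadata API service running in a cell:

  [api]
  local_metadata_per_cell = True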
Implements blueprint run-meta-api-per-cell
Change-Id: I2e6ebb551e782e8aa0ac90169f4d4b8895311b3c
With change I11083aa3c78bd8b6201558561457f3d65007a177
the code for the API Service Version upgrade check no
longer exists and therefore the upgrade check itself
is meaningless now.
Change-Id: I68b13002bc745c2c9ca7209b806f08c30272550e
With placement being extracted from nova, the
"Resource Providers" nova-status upgrade check no
longer works as intended since the placement data
will no longer be in the nova_api database. As a
result the check can fail on otherwise properly
deployed setups with placement extracted.
This check was originally intended to ease the upgrade
to Ocata when placement was required for nova to work,
as can be seen from the newton/ocata/pike references
in the code.
Note that one could argue the check itself as a concept
is still useful for fresh installs to make sure everything
is deployed correctly and nova-compute is properly
reporting into placement, but for it to be maintained
we would have to change it to no longer rely on the
nova_api database and instead use the placement REST API,
which, while possible, might not be worth the effort or
maintenance cost.
For simplicity and expediency, the check is simply removed
in this change.
Related mailing list discussion can be found here [1].
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000454.html
Change-Id: I630a518d449a64160c81410245a22ed3d985fb01
Closes-Bug: #1808956
The installation of the nova-consoleauth service was prematurely
removed from the docs. The nova-consoleauth service
is still being used in Rocky, with the removal being possible in
Stein.
This should have been fixed as part of change
Ibbdc7c50c312da2acc59dfe64de95a519f87f123 but was missed.
This is also related to the release note update in Rocky
under change Ie637b4871df8b870193b5bc07eece15c03860c06.
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Closes-Bug: #1793255
Related-Bug: #1798188
Change-Id: Ied268da9e70bd2807c2dfe7a479181fbec52979d
The placement documentation has been published since
I667387ec262680af899a628520c107fa0d4eec24,
so link to the placement docs at
https://docs.openstack.org/placement/latest/
from the nova docs.
Change-Id: I218a6d11fea934e8991e41b4b36203c6ba3e3dbf
This checks whether a deployment is currently using consoles and warns
the operator to set [workarounds]enable_consoleauth = True on their
console proxy host if they are performing a rolling upgrade which is
not yet complete.
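i.e.:

  [workarounds]
  enable_consoleauth = True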
Partial-Bug: #1798188
Change-Id: Idd6079ce4038d6f19966e98bcc61422b61b3636b
This is a follow-up to commit c4c6dc736 to clarify some
confusing comments in the code, add more comments in
the actual runtime code, and also provide an example
in the CLI man page docs along with an explanation of
the output, specifically for the case that $found>0
but done=0 and what that means.
Change-Id: I0691caab2c44d3189504c54e51bb263ecdc5d1d2
Related-Bug: #1794364
When online_data_migrations raise exceptions, nova/cinder-manage catches
the exceptions, prints fairly useless "something didn't work" messages,
and moves on. Two issues:
1) The user(/admin) has no way to see what actually failed (exception
detail is not logged)
2) The command returns exit status 0, as if all possible migrations have
been completed successfully - this can cause failures to get missed,
especially if automated
This change adds logging of the exceptions, and introduces a new exit
status of 2, which indicates that no updates took effect in the last
batch attempt, but some are (still) failing, which requires intervention.
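The exit-status logic is roughly the following (simplified sketch, not
the exact nova/cinder-manage code; variable names are illustrative):

  # Simplified sketch of the new exit-status handling.
  migrations_failed = 0
  migrations_done = 0
  for name, migration in migrations:
      try:
          found, done = migration(ctxt, batch_size)
      except Exception:
          LOG.exception('Error attempting to run %s', name)
          migrations_failed += 1
          continue
      migrations_done += done
  if migrations_failed and not migrations_done:
      # Nothing took effect in this batch but something is still failing.
      return 2
  return 0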
Change-Id: Ib684091af0b19e62396f6becc78c656c49a60504
Closes-Bug: #1796192
This is a relic that has long since been replaced by the noVNC proxy
service. Start preparing for its removal.
Change-Id: Icb225dec3ad291b751e475bd3703ce0eb30b44db
Add a thin wrapper to invoke the POST /reshaper placement API with
appropriate error checking. This bumps the placement minimum to the
reshaper microversion, 1.30.
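Something along these lines (illustrative sketch only; the method and
exception names are made up rather than the real report client code):

  # Illustrative sketch, not the actual SchedulerReportClient method.
  RESHAPER_VERSION = '1.30'

  def reshape(self, context, inventories, allocations):
      resp = self.post('/reshaper',
                       {'inventories': inventories,
                        'allocations': allocations},
                       version=RESHAPER_VERSION,
                       global_request_id=context.global_id)
      if not resp:
          # Surface the placement error to the caller.
          raise ReshapeFailed(error=resp.text)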
Change-Id: Idf8997d5efdfdfca6967899a0882ffb9ecf96915
blueprint: reshape-provider-tree
This adds the "nova-manage placement sync_aggregates"
command which will compare nova host aggregates to
placement resource provider aggregates and add any
missing resource provider aggregates based on the nova
host aggregates.
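i.e.:

  nova-manage placement sync_aggregates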
At this time, it's only additive in that the command
does not remove resource provider aggregates if those
matching nodes are not found in nova host aggregates.
That likely needs to happen in a change that provides
an opt-in option for that behavior since it could be
destructive for externally-managed provider aggregates
for things like ironic nodes or shared storage pools.
Part of blueprint placement-mirror-host-aggregates
Change-Id: Iac67b6bf7e46fbac02b9d3cb59efc3c59b9e56c8