Some things that were missed in previous patches and are thrown together
here:
- Add alembic as an explicit dependency (we were getting it transitively
from oslo.db). We also bump the sqlalchemy dependency to a 1.4.x
release, which is the minimum supported by our chosen version of
alembic (more on this below)
- Remove tooling related to the old migrations
- Fix the tox whitelisting of the flaky MySQL tests
On the SQLAlchemy front, we opt for 1.4.13. Technically, alembic should
support anything from 1.4.0; however, with SQLAlchemy >= 1.4.0, < 1.4.13
we see errors like the following in some tests:
    sqlalchemy.exc.InvalidRequestError: Entity namespace for
    "count(instance_mappings.id)" has no property "queued_for_delete"
There's nothing specific about this in the release notes for 1.4.13 [1]
but it definitely fixes things.
[1] https://docs.sqlalchemy.org/en/14/changelog/changelog_14.html#change-1.4.13
Change-Id: I4c8eb13f11aa7471c26a5ba326319aef245c9836
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Alembic does lots of new things. Provide docs for how to use this. We
also improve upgrade docs slightly, removing references to ancient
reviews that are no longer really helpful as well as calling out our N
-> N+1 constraint.
Change-Id: I3760b82ce3bd71aa0a760d7137d69dfa3f29dc1d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Let's take advantage of the features that alembic provides [1]. We use
this opportunity to clean up how we do our base model creation and
remove the downgrade section from the migration script template, since
we don't support downgrades.
[1] https://alembic.sqlalchemy.org/en/latest/autogenerate.html
Change-Id: I18846a5c7557db45bb63b97c7e8be5c4367e4547
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This looks more complicated than it is, but it's really quite simple.
Essentially we have to deal with two possible configurations:
- For existing deployments, the DB sync operation should apply any
outstanding sqlalchemy-migrate-based migrations, dummy apply the
initial alembic migration, and then apply any additional alembic-based
migrations requested (or any available, if no version is specified).
- For new deployments, the DB sync operation should apply the initial
alembic migration and any additional alembic-based migrations
requested (or any available, if no version is specified). No
sqlalchemy-migrate-based migrations will ever be applied.
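The two paths above can be sketched roughly as follows. This is a
minimal illustration with made-up helper names, not the real nova-manage
code, which differs in detail:

```python
# A minimal sketch of the two sync paths above, with made-up helper
# names; the real nova-manage logic differs in detail.
def db_sync(has_legacy_schema, apply_legacy, stamp_initial,
            alembic_upgrade, version=None):
    """Bridge sqlalchemy-migrate and alembic during a DB sync.

    has_legacy_schema is True for an existing sqlalchemy-migrate-managed
    database; the three callables stand in for the real migration steps.
    """
    if has_legacy_schema:
        # Existing deployment: finish any outstanding legacy migrations,
        # then mark ("stamp") the initial alembic migration as applied
        # without running it, since the schema already matches.
        apply_legacy()
        stamp_initial()
    # Both paths end the same way: upgrade to the requested alembic
    # revision, or to 'head' if no version was specified.
    alembic_upgrade(version or 'head')
```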
While we continue to allow users to request a specific database
migration version to upgrade to, we *do not* allow them to request a
sqlalchemy-migrate-based migration version. There's no good reason to do
this - the deployment won't run with an out-of-date DB schema (something
that's also true of the alembic migration, fwiw) - and we want to get
people off of sqlalchemy-migrate as fast as possible. A change in a
future release can remove the sqlalchemy-migrate-based migrations once
we're sure that all deployments will have upgraded through a release
that includes all of the sqlalchemy-migrate-based migrations (so
Wallaby).
Tests are modified to validate the sanity of these operations. They're
mostly trivial changes, but we do need to do some funky things to ensure
that (a) we don't use logger configuration from 'alembic.ini' that will
mess with our existing logger configuration and (b) we re-use connection
objects as necessary to allow us to run tests against in-memory
databases, where a different connection would actually mean a different
database. We also can't rely on 'WalkVersionsMixin' from oslo.db since
that only supports sqlalchemy-migrate [1]. We instead must re-invent the
wheel here somewhat.
[1] https://github.com/openstack/oslo.db/blob/10.0.0/oslo_db/sqlalchemy/test_migrations.py#L42-L44
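Point (b) above is easy to demonstrate with the stdlib sqlite3 module
alone; this is just an illustration of the behavior, not the test code:

```python
import sqlite3

# Each new connection to an SQLite ':memory:' URL gets its own, empty
# database, which is why the tests must re-use one connection object
# rather than letting a fresh one be opened per operation.
conn_a = sqlite3.connect(':memory:')
conn_a.execute('CREATE TABLE t (id INTEGER)')

conn_b = sqlite3.connect(':memory:')
# conn_b sees none of conn_a's tables: it is a different database.
tables_b = conn_b.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
```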
Change-Id: I850af601f81bd5d2ecc029682ae10d3a07c936ce
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We're going to be extensively reworking these to handle the alembic
switchover in a future change. Ahead of that rework, restructure things
somewhat to simplify them and remove unnecessary noise.
Change-Id: Ie9d00e25b7e99104155466060a2b6f8a884a5e0a
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Rewrap some code just to make it a little less fugly to read.
Change-Id: If78bbd578bbba73fc85446ad55d34d3addd6c4af
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The pattern here is the same as the one used for the main database in
change I1fa2feaee78213ad81f1889ce54888696f58d98c. Once again, we're
mostly copy-pasting from the existing schema. Reviewers can easily diff
'nova/db/api/legacy_migrations/versions/067_train.py' against
'nova/db/api/migrations/versions/d67eeaabee36_initial_version.py' in
order to sanity check the new schema files.
Change-Id: Ic65ac5cec46ace7bd49243d052deb2a434bb1099
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This wasn't as complicated as feared. We're mostly able to copy-paste
the existing sqlalchemy-migrate migration and simply change the pattern
of calls from monkey-patched 'create' methods such as:
    agent_builds = Table('agent_builds', meta,
        sa.Column('created_at', sa.DateTime),
        ...
        mysql_charset='utf8'
    )
    agent_builds.create()
to explicit alembic APIs like:
    op.create_table(
        'agent_builds',
        sa.Column('created_at', sa.DateTime),
        ...
        mysql_charset='utf8'
    )
Reviewers are encouraged to diff the old migration file,
'nova/db/main/legacy_migrations/versions/402_train.py' against the new
one, 'nova/db/main/migrations/versions/8f2f1571d55b_initial_version.py',
to ease their job. The only significant divergences are the removal of
a single reference to the 'migrate_version' table created by
sqlalchemy-migrate, which obviously won't exist in an alembic-only
world, along with some reordering of tables. The latter step is
necessary since Alembic is not smart enough to correctly order the
creation of tables so that tables that reference (via a foreign key) one
or more other tables are created after the table(s) they reference.
Since we now have to create tables using the 'create_table' API as noted
above, rather than by creating a 'Table' instance and calling the
sqlalchemy-migrate-provided 'create' API as we could previously, we must
reorder our calls. The alternative would be to leave the creation of any
'ForeignKeyConstraint' until later but that seems no better and would
arguably be harder to read.
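The required reordering amounts to a topological sort over the
foreign-key graph. A toy illustration (not nova code) of the constraint:

```python
# Toy illustration of the reordering constraint (not nova code): a table
# must be created after every table it references via a foreign key,
# since op.create_table() emits the FK constraints inline.
def order_tables(tables):
    """tables: dict mapping a table name to the set of names it references."""
    ordered, done = [], set()

    def visit(name):
        if name in done:
            return
        done.add(name)
        # Create all referenced tables first, then this one.
        for ref in sorted(tables.get(name, ())):
            visit(ref)
        ordered.append(name)

    for name in sorted(tables):
        visit(name)
    return ordered
```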
Note that this isn't yet wired up to anything: users can run it manually
but the nova-manage commands we provide are still connected to the
sqlalchemy-migrate commands. This will change shortly when we add shims
to handle the conversion.
Change-Id: I1fa2feaee78213ad81f1889ce54888696f58d98c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We have a class of bugs in our current database schema that were
introduced by now-squashed migrations, whereby we applied a new
constraint to a column but forgot to apply the same change to the column
in the shadow table. Currently, we preserve the behavior of these bugs
through use of the 'ALTER' statement. This results in more complex SQL
for the initial migration. More importantly, the ALTER statement is not
actually supported by SQLite, so the underlying migration engine must
instead mimic the behavior by creating a new table with the updated
schema and a temporary name, copying all data across from the old table
to the new one, deleting the old table, and renaming the new table.
While sqlalchemy-migrate handled this transparently for us, therefore
allowing us to mostly ignore the issue, alembic makes this lack of
support in SQLite explicit by insisting on the 'batch_alter_table'
context manager [1] wrapping any ALTER statements. This context manager
is "a little tricky" (zzzeek's words, not mine) and is therefore
best avoided. Fortunately, this is easily done through the addition of a
couple of checks within the '_create_shadow_tables' helper that simply
modify the affected columns before creating the table.
[1] https://alembic.sqlalchemy.org/en/latest/ops.html#alembic.operations.Operations.batch_alter_table
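The approach can be sketched as below. This is illustrative only: the
real '_create_shadow_tables' helper works on SQLAlchemy Column objects,
and the specific column shown is a made-up example, not one of the
actual divergences.

```python
# Illustrative sketch only; the real '_create_shadow_tables' helper and
# the affected columns differ. The idea: patch known divergent column
# definitions *before* creating the shadow table, so no ALTER statement
# (and thus no SQLite batch mode) is needed afterwards.
def create_shadow_table(table_name, columns, create_table):
    # Hypothetical example of a squashed-migration bug: a constraint was
    # changed on the real table but never on its shadow twin.
    fixups = {
        ('shadow_instances', 'uuid'): {'nullable': True},
    }
    fixed = []
    for col in columns:
        overrides = fixups.get((table_name, col['name']), {})
        # Apply the fix while building the column list, instead of
        # issuing an ALTER once the table exists.
        fixed.append({**col, **overrides})
    create_table(table_name, fixed)
```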
Change-Id: Ib3c8ed876791cda96486d43384f7aec0c487fd2f
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Introduce a new 'nova.db.api.api' module to hold API database-specific
helpers, plus a generic 'nova.db.utils' module to hold code suitable for
both main and API databases. This highlights a level of complexity
around connection management that is present for the main database but
not for the API database. This is because we need to handle the
complexity of cells for the former but not the latter.
Change-Id: Ia5304c552ce552ae3c5223a2bfb3a9cd543ec57c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The two remaining modules, 'api_models' and 'api_migrations', are
moved to the new 'nova.db.api' module.
Change-Id: I138670fe36b07546db5518f78c657197780c5040
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Build on changes I714dce969088e17caaa1daddc881ebca6834e0f1, which
modified how we registered options for both the main and API databases,
and Iad2e4da4546b80a016e477577d23accb2606a6e4, which removed support for
experimental database reconnects, and start ignoring the
'use_db_reconnect' option. This means the option will no longer appear
in our documentation and won't be recognised by any of the validation
tooling if present in a nova.conf file.
No tests are present because I'm not entirely sure how to test this.
You can validate it manually, though, by attempting to set the option in
a test and watching things blow up like so:
    oslo_config.cfg.NoSuchOptError: no such option use_db_reconnect in
    group [database]
A second unnecessary 'deepcopy()' is removed. This was missed in change
I714dce969088e17caaa1daddc881ebca6834e0f1.
Change-Id: Ie7e4ccaf32f7222b8e305a38b48de7980744a98f
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Merge these, removing an unnecessary layer of abstraction, and place
them in the new 'nova.db.main' directory. The resulting change is huge,
but it's mainly the result of 's/sqlalchemy import api/main import api/'
and 's/nova.db.api/nova.db.main.api/' with some necessary cleanup. We
also need to rework how we do the blocking of API calls since we no
longer have a 'DBAPI' object that we can monkey patch as we were doing
before. This is now done via a global variable that is set by the 'main'
function of 'nova.cmd.compute'.
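The mechanism can be sketched roughly as follows; the names here are
hypothetical, not the actual nova identifiers:

```python
# Rough sketch of the mechanism (hypothetical names): with no 'DBAPI'
# object left to monkey-patch, a module-level flag is flipped by the
# compute service's entry point and checked by the DB API functions.
DISABLE_DB_ACCESS = False


class DBNotAllowed(Exception):
    pass


def set_db_access_blocked():
    # Called from the 'main' function of the compute service entry point.
    global DISABLE_DB_ACCESS
    DISABLE_DB_ACCESS = True


def instance_get(instance_uuid):
    if DISABLE_DB_ACCESS:
        raise DBNotAllowed('instance_get')
    return {'uuid': instance_uuid}  # stand-in for the real DB query
```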
The main impact of this change is that it's no longer possible to set
'[database] use_db_reconnect' and have all APIs automatically wrapped in
a DB retry. Seeing as this behavior is experimental, isn't applied to
any of the API DB methods (which don't use oslo.db's 'DBAPI' helper),
and is used explicitly in what would appear to be the critical cases
(via the explicit 'oslo_db.api.wrap_db_retry' decorator), this doesn't
seem like a huge loss.
Change-Id: Iad2e4da4546b80a016e477577d23accb2606a6e4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Deep copy the database options, allowing us to reuse them exactly. This
was actually suggested in I7b74c4bb4a3edc335523c62ec63e57fdd5c51f39,
which added the current code but was rejected at the time because it
"would make our lives harder when we need to deprecate those cloned API
DB opts." I don't know why we'd ever want to deprecate the cloned
options so I see no reason not to take this approach now. Doing so
results in less complicated/weird code and also sets us up nicely for
future changes that'll remove some of the now unnecessary abstraction
layers provided by oslo.db.
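A simplified illustration of the approach (the real code clones
oslo.db's option definitions, not plain dicts):

```python
import copy

# Deep-copying the main database options gives the API database an
# identical but fully independent option list.
main_db_opts = [{'name': 'connection', 'default': None, 'secret': True}]
api_db_opts = copy.deepcopy(main_db_opts)

# Mutating one copy no longer affects the other.
api_db_opts[0]['default'] = 'sqlite:///api.db'
```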
Change-Id: I714dce969088e17caaa1daddc881ebca6834e0f1
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This was previously set really low, to 1 second, which was leading to
more involved flows such as test_delete_while_in_verify_resize_status
timing out when the target calls the conductor to confirm the resize on
the source.
This change simply increases the timeout in the test, but we might want
to think about moving this call over to rpc_long_timeout, as this could
be an issue in real-world deployments.
Closes-Bug: #1938021
Change-Id: Ibba2d1506a0b026d35d7bf35384ec6439f438b01
Bug #1912310 details an underlying QEMU live migration failure seen
during both volume backend and block migration based live migration
tests in the nova-live-migration job. The failure rate is getting high
enough now that we should mark these jobs as non-voting until it is
resolved.
Change-Id: Ifc211fadb3ca9c7c0a74d7dcd225bdb3f3008d26
Related-Bug: #1912310
As discussed in I332d4f33ea6b9506cc24ac12e5c0994f208a3107 this is useful
enough to be copied into InstanceHelperMixin.
Most of this change is migrating the original functional test that
introduced _wait_for_volume_attach over to use _IntegratedTestBase.
Change-Id: I345dce93c6e2593c1ff1b863425f9a854c49ab9d
When attaching a volume to a running instance the nova-api validates
that the volume is not already attached to the instance. However
nova-compute is responsible for actually creating the BDM entry in the
database. If sending attach requests fast enough it can be possible
that the same "attach_volume" request can be sent to nova-compute for
the same volume/instance combination.
To work around this we add a check in nova-compute to validate that
the volume has not been attached in the mean time.
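A hypothetical sketch of the compute-side re-check (names made up, not
the real nova code): the API-side validation can race, so the attach
path validates again against the BDM records that exist by the time the
request is actually processed.

```python
# Hypothetical sketch of the compute-side re-check (names made up).
class VolumeAlreadyAttached(Exception):
    pass


def attach_volume(instance_uuid, volume_id, existing_bdms):
    """existing_bdms: (instance_uuid, volume_id) pairs already recorded."""
    if (instance_uuid, volume_id) in set(existing_bdms):
        # A duplicate request slipped past the API check; refuse to
        # create a second BDM for the same volume/instance combination.
        raise VolumeAlreadyAttached(volume_id)
    # ... proceed to create the BDM and perform the attach ...
    return (instance_uuid, volume_id)
```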
Closes-Bug: #1937375
Change-Id: I92f35514efddcb071c7094370b79d91d34c5bc72
As seen in bug #1849425 os-brick can fail to extend an underlying volume
device on the compute silently, returning a new_size of None to n-cpu.
While this should ultimately be addressed in os-brick, n-cpu can also
handle this before we eventually run into a TypeError when attempting
floor division later in the volume extend flow.
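A sketch of the defensive check, with hypothetical names rather than
the actual nova/os-brick call signatures:

```python
# Sketch of the defensive check (hypothetical names): if os-brick fails
# to report a new size, raise a clear error instead of hitting a
# TypeError later when the size is floor-divided down to GiB.
def extend_volume(connector_extend, volume_id):
    new_size = connector_extend(volume_id)
    if new_size is None:
        raise ValueError(
            'os-brick reported no new size for volume %s' % volume_id)
    # Safe now: new_size is an integer number of bytes.
    return new_size // (1024 ** 3)
```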
Change-Id: Ic8091537274a5ad27fb5af8939f81ed154b7ad7c
Closes-Bug: #1849425
As seen before in bug #1913451 the nova-lvm job creates large RAW
snapshots during each run due to the nature of using LVM storage for the
ephemeral instance disks.
Id425aa546f1a5973bae8be9c017782d18f0b4a47 recently landed, capping the
per-tenant image size total at 1000 MiB, which is far too low for this
job given the RAW snapshots it creates. This in turn resulted in 504
failures when n-cpu attempted to upload snapshots to g-api.
This change simply increases the per-tenant limit to 10 GiB, which
should be more than enough for all of the tests covered by the nova-lvm
job.
Closes-Bug: #1938765
Change-Id: I824655387a10ac9b813c1b7b2399e25ed95f7fc3
Change I9b16a3a849937aba5b90ed1ab9a80b7f0103f673 attempted to fix a
non-deterministic failure in the test_archive_task_logs functional test
by adding a time override. That fix, however, missed that the
osloutils_fixture.TimeFixture clears the time override when it exits
and does not restore any previous time override.
This removes usage of the osloutils_fixture.TimeFixture and instead
sets time overrides and restores previous time overrides. Restoration
of the original time overrides also made an adjustment to the --before
option value necessary in order to pick up the first task_log record.
Closes-Bug: #1934519
Change-Id: Ic28d2e53d5f50e89635e19df08699f2b8c5cea84
Some associated logic that checks which underlying libvirt hypervisor
is being used is also reworked to read more easily.
Change-Id: I9ad5c2c74ecfa7a2a07e2888b628d2ecd4534523
Consider the following situation:
- Using the Ironic virt driver
- Replacing (so removing and re-adding) all baremetal nodes
associated with a single nova-compute service
The update resources periodic will have destroyed the compute node
records because they're no longer being reported by the virt driver.
If we then attempt to manually delete the compute service record, the
database layer will raise an exception, as there are no longer any
compute node records for the host. Previously, this exception would
get bubbled up as an error 500 in the API. This patch catches it and
allows service deletion to complete successfully.
Closes-Bug: #1860312
Change-Id: I2f9ad3df25306e070c8c3538bfed1212d6d8682f
Consider the following situation:
- Using the Ironic virt driver
- Replacing (so removing and re-adding) all baremetal nodes
associated with a single nova-compute service
The update resources periodic will have destroyed the compute node
records because they're no longer being reported by the virt driver.
If we then attempt to manually delete the compute service record, the
database layer will raise an exception, as there are no longer any
compute node records for the host. This exception gets bubbled up as
an error 500 in the API. This patch adds a unit test to demonstrate
this.
Related-Bug: #1860312
Change-Id: I03eec634b25582ec9643cacf3e5868c101176983