A few observations from my simple tests with alembic.
Building current schema
Alembic supports different strategies for building the latest version of the schema:
- start with an empty schema and apply all migration steps in sequence; this is the conventional approach for projects that use alembic as their main schema definition tool
- use a different tool to generate the current/latest schema and tell alembic the revision of that schema
As we build our schema in daf_butler, the latter method is what we need if we are to use alembic. In that case the schema revision will still have to be defined in alembic (as a regular migration from the preceding revision). When a completely new database is created with butler tools, its revision is not known to alembic; one needs to run the alembic stamp <revision> command line tool to store the revision number in the database, or alternatively use the alembic Python API to do the same (see the sketch below).
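For the Python API route, here is a minimal sketch equivalent to the command line above (the alembic.ini location and the revision name are assumptions for illustration):
from alembic import command
from alembic.config import Config

# load the alembic configuration; "alembic.ini" is an assumed location
config = Config("alembic.ini")

# record the given revision in the alembic_version table without running
# any migrations, same as `alembic stamp w.2020.22` on the command line
command.stamp(config, "w.2020.22")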
Revision "numbers"
By default alembic generates a random string for the revision when it creates a new migration, but it is also possible to specify a user-defined revision name instead. Potentially we could use the daf_butler schema version number (whatever it is) as the alembic revision if that simplifies things. The database's current alembic revision(s) is/are stored in the alembic_version table in the database (it can contain more than one record in case of branches/forests):
CREATE TABLE alembic_version (
    version_num VARCHAR(32) NOT NULL,
    CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);
Revisions also appear in the migration scripts: each script contains its own revision name and its parent revision name(s). Alembic needs to look at those revisions to build the full migration path, as the sketch below shows.
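A minimal sketch of inspecting that migration path programmatically, assuming a standard alembic.ini that points at the versions/ directory:
from alembic.config import Config
from alembic.script import ScriptDirectory

# the revision graph is reconstructed from the migration scripts themselves
config = Config("alembic.ini")
scripts = ScriptDirectory.from_config(config)

# walk from the head(s) down to the base, printing each parent link
for script in scripts.walk_revisions():
    print(script.revision, "<-", script.down_revision)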
Auto-generated migrations
For simple cases alembic can generate a migration script based on the new schema (a sqlalchemy.MetaData instance) and the current schema in the database file. As a trivial exercise I built a ci_hsc_gen3 database from w.2020.19 and ran the auto-generation tool on it. To get the current (w.2020.22) schema I had to add a few lines of code to the alembic/env.py script:
import os

from lsst.daf.butler import Butler


def _make_metadata():
    """Build sqlalchemy metadata for current registry schema.
    """
    config = os.path.expandvars("$CI_HSC_GEN3_DIR/DATA/butler.yaml")
    butler = Butler(config)
    return butler.registry._db._metadata


target_metadata = _make_metadata()
And then executed:
$ alembic revision --rev-id w.2020.22 -m 'migration from w.2020.19 to w.2020.22' --autogenerate
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.autogenerate.compare] Detected removed index 'dataset_composition_fkidx_component_dataset_id' on 'dataset_composition'
INFO [alembic.autogenerate.compare] Detected removed index 'dataset_composition_fkidx_parent_dataset_id' on 'dataset_composition'
INFO [alembic.autogenerate.compare] Detected removed table 'dataset_composition'
  Generating /project/salnikov/gen3-middleware/daf_butler/alembic/versions/w.2020.22_migration_from_w_2020_19_to_w_2020_22.py ... done
The generated migration script alembic/versions/w.2020.22_migration_from_w_2020_19_to_w_2020_22.py looks like:
"""migration from w.2020.19 to w.2020.22
|
|
Revision ID: w.2020.22
|
Revises:
|
Create Date: 2020-06-02 17:05:13.335899
|
|
"""
|
from alembic import op
|
import sqlalchemy as sa
|
|
|
# revision identifiers, used by Alembic.
|
revision = 'w.2020.22'
|
down_revision = None
|
branch_labels = None
|
depends_on = None
|
|
|
def upgrade():
|
# ### commands auto generated by Alembic - please adjust! ###
|
op.drop_index('dataset_composition_fkidx_component_dataset_id', table_name='dataset_composition')
|
op.drop_index('dataset_composition_fkidx_parent_dataset_id', table_name='dataset_composition')
|
op.drop_table('dataset_composition')
|
# ### end Alembic commands ###
|
|
|
def downgrade():
|
# ### commands auto generated by Alembic - please adjust! ###
|
op.create_table('dataset_composition',
|
sa.Column('component_name', sa.VARCHAR(length=32), nullable=False),
|
sa.Column('simple', sa.BOOLEAN(), nullable=True),
|
sa.Column('parent_dataset_id', sa.BIGINT(), nullable=False),
|
sa.Column('component_dataset_id', sa.BIGINT(), nullable=True),
|
sa.CheckConstraint('length(component_name)<=32', name='dataset_composition_len_component_name'),
|
sa.CheckConstraint('simple IN (0, 1)'),
|
sa.ForeignKeyConstraint(['component_dataset_id'], ['dataset.id'], name='fkey_dataset_composition_dataset_id_component_dataset_id', ondelete='CASCADE'),
|
sa.ForeignKeyConstraint(['parent_dataset_id'], ['dataset.id'], name='fkey_dataset_composition_dataset_id_parent_dataset_id', ondelete='CASCADE'),
|
sa.PrimaryKeyConstraint('component_name', 'parent_dataset_id')
|
)
|
op.create_index('dataset_composition_fkidx_parent_dataset_id', 'dataset_composition', ['parent_dataset_id'], unique=False)
|
op.create_index('dataset_composition_fkidx_component_dataset_id', 'dataset_composition', ['component_dataset_id'], unique=False)
|
# ### end Alembic commands ###
|
This one looks simple, but for more complicated cases the script certainly needs review and modification (auto-generation cannot handle all possible changes). Data migration also has to be written by hand, of course (see the sketch below).
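A minimal sketch of what a hand-written data migration step could look like, mixed with schema operations; the "origin" column on "dataset" is a made-up example here, not part of the real registry schema:
import sqlalchemy as sa
from alembic import op


def upgrade():
    # schema change of the kind autogenerate can produce; the "origin"
    # column is a hypothetical example
    op.add_column('dataset', sa.Column('origin', sa.BIGINT(), nullable=True))
    # the data migration itself must be written manually; autogenerate
    # never emits data-level statements
    op.execute("UPDATE dataset SET origin = 0")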
A general comment: alembic is a migration tool, not a schema definition tool. It does not know what the complete schema definition is for any revision, and it cannot verify that the schema in the database corresponds to a particular revision; it blindly relies on the consistency of the revision stored in the alembic_version table with the revisions in the migration scripts. Obviously this needs a lot of attention, e.g. alembic stamp should be executed only after checking everything three times, and then once more.
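One cheap safeguard before stamping is to read back the revision that the database currently claims; a minimal sketch using the alembic runtime API (the database URL is an assumption):
import sqlalchemy as sa
from alembic.migration import MigrationContext

# open the database and ask alembic which revision is recorded in the
# alembic_version table; returns None if the table is absent
engine = sa.create_engine("sqlite:///DATA/gen3.sqlite3")  # assumed path
with engine.connect() as connection:
    context = MigrationContext.configure(connection)
    print(context.get_current_revision())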
I have done some reading on Alembic (the docs are here: https://alembic.sqlalchemy.org/en/latest/index.html), and the notes above are a quick summary of the things that I thought important. In general I think this is a reasonably powerful tool that we could use in our environment if we need this sort of tool. It is closely tied to SQLAlchemy, and its documentation is in about the same state of mess as SQLAlchemy's.