# Can't rerun ap_verify on same repository in Gen 3

XMLWordPrintable

#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
4
• Sprint:
AP F20-3 (August)
• Team:
• Urgent?:
No

#### Description

The Gen 3 version of ap_verify cannot be run on the same workspace more than once, despite being designed for reuse like the Gen 2 version. The problem I've encountered happens at Butler initialization; there may be others downstream:

  File "python/lsst/ap/verify/ingestion.py", line 602, in ingestDatasetGen3  ingester = Gen3DatasetIngestTask(dataset, workspace, config=_getConfig(Gen3DatasetIngestTask, dataset))  File "python/lsst/ap/verify/ingestion.py", line 462, in __init__  self.dataset.makeCompatibleRepoGen3(self.workspace.repo)  File "python/lsst/ap/verify/dataset.py", line 266, in makeCompatibleRepoGen3  repoConfig = dafButler.Butler.makeRepo(repoDir, overwrite=True)  File "python/lsst/daf/butler/_butler.py", line 371, in makeRepo  Registry.fromConfig(config, create=createRegistry, butlerRoot=root)  File "python/lsst/daf/butler/registry/_registry.py", line 147, in fromConfig  versions=versions, writeable=writeable, create=create)  File "python/lsst/daf/butler/registry/_registry.py", line 161, in __init__  with self._db.declareStaticTables(create=create) as context:  File "python/lsst/daf/butler/registry/interfaces/_database.py", line 410, in declareStaticTables  raise SchemaAlreadyDefinedError(f"Cannot create tables in non-empty database {self}.") 

Ensure that both ingest_datasets.py and ap_verify.py work when rerun on old workspaces.

#### Activity

Hide
Tim Jenness added a comment - - edited

Are you mixing registry versions? It should only be creating tables if the tables don't previously exist. How are you deciding to call makeRepo? In butler convert we have:

  try:  butlerConfig = lsst.daf.butler.Butler.makeRepo(repo)  except FileExistsError:  # Use the existing butler configuration  butlerConfig = repo 

but you aren't getting FileExistsError indicating that there is no pre-existing butler.yaml but there is a pre-existing registry.

Show
Tim Jenness added a comment - - edited Are you mixing registry versions? It should only be creating tables if the tables don't previously exist. How are you deciding to call makeRepo? In butler convert we have: try : butlerConfig = lsst.daf.butler.Butler.makeRepo(repo) except FileExistsError: # Use the existing butler configuration butlerConfig = repo but you aren't getting FileExistsError indicating that there is no pre-existing butler.yaml but there is a pre-existing registry.
Hide
Krzysztof Findeisen added a comment - - edited

Tim Jenness, can you elaborate on "It should only be creating tables if the tables don't previously exist"? I'm tracing through the Butler code and the spec for Registry.fromConfig implies that either the registry is assumed to be empty, or nothing is done. Same with Database.declareStaticTables.

The behavior you describe is what I intended to happen when I passed overwrite=True, but it doesn't seem to be supported.

Show
Krzysztof Findeisen added a comment - - edited Tim Jenness , can you elaborate on "It should only be creating tables if the tables don't previously exist"? I'm tracing through the Butler code and the spec for Registry.fromConfig implies that either the registry is assumed to be empty, or nothing is done. Same with Database.declareStaticTables . The behavior you describe is what I intended to happen when I passed overwrite=True , but it doesn't seem to be supported.
Hide
Tim Jenness added a comment -

The overwrite flag is mostly there to allow people to ensure that makeRepo doesn't try to do anything if a butler exists there (using butler.yaml as a proxy). It's what is used by butler convert to let it try to create the repo if it's not there but otherwise ignore it. I'm not entirely sure I remember why I made overwrite an option since, as you are discovering, it overwrites the butler configuration and then breaks.

The API explicitly talks about config files with overwrite. I don't think anyone has thought about the ramifications in terms of whether overwrite should then proceed to delete all the registry tables and recreate them afresh. Andy Salnikov how plausible is a "drop everything and start from scratch" option in registry? Seems a bit dangerous.

Show
Tim Jenness added a comment - The overwrite flag is mostly there to allow people to ensure that makeRepo doesn't try to do anything if a butler exists there (using butler.yaml as a proxy). It's what is used by butler convert to let it try to create the repo if it's not there but otherwise ignore it. I'm not entirely sure I remember why I made overwrite an option since, as you are discovering, it overwrites the butler configuration and then breaks. The API explicitly talks about config files with overwrite. I don't think anyone has thought about the ramifications in terms of whether overwrite should then proceed to delete all the registry tables and recreate them afresh. Andy Salnikov how plausible is a "drop everything and start from scratch" option in registry? Seems a bit dangerous.
Hide
Krzysztof Findeisen added a comment -

"Drop everything and start from scratch" is actually not the behavior I'm looking for in this case; it sounds like I'll have to use the idiom you mentioned regardless.

Show
Krzysztof Findeisen added a comment - "Drop everything and start from scratch" is actually not the behavior I'm looking for in this case; it sounds like I'll have to use the idiom you mentioned regardless.
Hide
Tim Jenness added a comment -

So you want to write a new config file but then you don't want to change registry if it previously exists?

The problem with that is that config can contain registry information and so it's possible to end up with a config that is inconsistent with the registry that exists there (which we could deal with if necessary by looking at versions and deciding if the versions match we can return immediately – but that's not quite in place yet). Once we move dimensions.yaml out of the butler config file the behavior of overwrite will become much more obvious.

Show
Tim Jenness added a comment - So you want to write a new config file but then you don't want to change registry if it previously exists? The problem with that is that config can contain registry information and so it's possible to end up with a config that is inconsistent with the registry that exists there (which we could deal with if necessary by looking at versions and deciding if the versions match we can return immediately – but that's not quite in place yet). Once we move dimensions.yaml out of the butler config file the behavior of overwrite will become much more obvious.
Hide
Andy Salnikov added a comment -

Dropping registry data indeed feels dangerous, probably can be done in a separate administration API, I would not like to expose that as a common option.

Show
Andy Salnikov added a comment - Dropping registry data indeed feels dangerous, probably can be done in a separate administration API, I would not like to expose that as a common option.
Hide
Tim Jenness added a comment -

I think then that the overwrite=False is what you really want to be using in the short term.

After this discussion I'm not entirely sure that overwrite=True is ever a good idea. At best we want it to change the configuration of the repo without touching the database itself. This can sort of work if the registry schema definition is no longer part of the config file (DM-26407) or if we do a basic schema version check and do nothing if the schemas match else complain. It may be that all anyone wants is an option that returns without doing anything if a butler is already defined there (therefore ignoring any configuration you might have specified explicitly).

Show
Tim Jenness added a comment - I think then that the overwrite=False is what you really want to be using in the short term. After this discussion I'm not entirely sure that overwrite=True is ever a good idea. At best we want it to change the configuration of the repo without touching the database itself. This can sort of work if the registry schema definition is no longer part of the config file ( DM-26407 ) or if we do a basic schema version check and do nothing if the schemas match else complain. It may be that all anyone wants is an option that returns without doing anything if a butler is already defined there (therefore ignoring any configuration you might have specified explicitly).
Hide
Tim Jenness added a comment -

Looks good to me. A minor comment on documentation.

Show
Tim Jenness added a comment - Looks good to me. A minor comment on documentation.

#### People

Assignee:
Krzysztof Findeisen
Reporter:
Krzysztof Findeisen
Reviewers:
Tim Jenness
Watchers:
Andy Salnikov, Chris Morrison [X] (Inactive), Krzysztof Findeisen, Meredith Rawls, Tim Jenness