Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-26237

Can't rerun ap_verify on same repository in Gen 3

    XMLWordPrintable

Details

    • Bug
    • Status: Done
    • Resolution: Done
    • None
    • ap_verify
    • None

    Description

      The Gen 3 version of ap_verify cannot be run on the same workspace more than once, despite being designed for reuse like the Gen 2 version. The problem I've encountered happens at Butler initialization; there may be others downstream:

        File "python/lsst/ap/verify/ingestion.py", line 602, in ingestDatasetGen3
          ingester = Gen3DatasetIngestTask(dataset, workspace, config=_getConfig(Gen3DatasetIngestTask, dataset))
        File "python/lsst/ap/verify/ingestion.py", line 462, in __init__
          self.dataset.makeCompatibleRepoGen3(self.workspace.repo)
        File "python/lsst/ap/verify/dataset.py", line 266, in makeCompatibleRepoGen3
          repoConfig = dafButler.Butler.makeRepo(repoDir, overwrite=True)
        File "python/lsst/daf/butler/_butler.py", line 371, in makeRepo
          Registry.fromConfig(config, create=createRegistry, butlerRoot=root)
        File "python/lsst/daf/butler/registry/_registry.py", line 147, in fromConfig
          versions=versions, writeable=writeable, create=create)
        File "python/lsst/daf/butler/registry/_registry.py", line 161, in __init__
          with self._db.declareStaticTables(create=create) as context:
        File "python/lsst/daf/butler/registry/interfaces/_database.py", line 410, in declareStaticTables
          raise SchemaAlreadyDefinedError(f"Cannot create tables in non-empty database {self}.")
      

      Ensure that both ingest_datasets.py and ap_verify.py work when rerun on old workspaces.

      Attachments

        Issue Links

          Activity

            tjenness Tim Jenness added a comment - - edited

            Are you mixing registry versions? It should only be creating tables if the tables don't previously exist. How are you deciding to call makeRepo? In butler convert we have:

                try:
                    butlerConfig = lsst.daf.butler.Butler.makeRepo(repo)
                except FileExistsError:
                    # Use the existing butler configuration
                    butlerConfig = repo
            

            but you aren't getting FileExistsError indicating that there is no pre-existing butler.yaml but there is a pre-existing registry.

            tjenness Tim Jenness added a comment - - edited Are you mixing registry versions? It should only be creating tables if the tables don't previously exist. How are you deciding to call makeRepo? In butler convert we have: try : butlerConfig = lsst.daf.butler.Butler.makeRepo(repo) except FileExistsError: # Use the existing butler configuration butlerConfig = repo but you aren't getting FileExistsError indicating that there is no pre-existing butler.yaml but there is a pre-existing registry.
            krzys Krzysztof Findeisen added a comment - - edited

            tjenness, can you elaborate on "It should only be creating tables if the tables don't previously exist"? I'm tracing through the Butler code and the spec for Registry.fromConfig implies that either the registry is assumed to be empty, or nothing is done. Same with Database.declareStaticTables.

            The behavior you describe is what I intended to happen when I passed overwrite=True, but it doesn't seem to be supported.

            krzys Krzysztof Findeisen added a comment - - edited tjenness , can you elaborate on "It should only be creating tables if the tables don't previously exist"? I'm tracing through the Butler code and the spec for Registry.fromConfig implies that either the registry is assumed to be empty, or nothing is done. Same with Database.declareStaticTables . The behavior you describe is what I intended to happen when I passed overwrite=True , but it doesn't seem to be supported.
            tjenness Tim Jenness added a comment -

            The overwrite flag is mostly there to allow people to ensure that makeRepo doesn't try to do anything if a butler exists there (using butler.yaml as a proxy). It's what is used by butler convert to let it try to create the repo if it's not there but otherwise ignore it. I'm not entirely sure I remember why I made overwrite an option since, as you are discovering, it overwrites the butler configuration and then breaks.

            The API explicitly talks about config files with overwrite. I don't think anyone has thought about the ramifications in terms of whether overwrite should then proceed to delete all the registry tables and recreate them afresh. salnikov how plausible is a "drop everything and start from scratch" option in registry? Seems a bit dangerous.

            tjenness Tim Jenness added a comment - The overwrite flag is mostly there to allow people to ensure that makeRepo doesn't try to do anything if a butler exists there (using butler.yaml as a proxy). It's what is used by butler convert to let it try to create the repo if it's not there but otherwise ignore it. I'm not entirely sure I remember why I made overwrite an option since, as you are discovering, it overwrites the butler configuration and then breaks. The API explicitly talks about config files with overwrite. I don't think anyone has thought about the ramifications in terms of whether overwrite should then proceed to delete all the registry tables and recreate them afresh. salnikov how plausible is a "drop everything and start from scratch" option in registry? Seems a bit dangerous.

            "Drop everything and start from scratch" is actually not the behavior I'm looking for in this case; it sounds like I'll have to use the idiom you mentioned regardless.

            krzys Krzysztof Findeisen added a comment - "Drop everything and start from scratch" is actually not the behavior I'm looking for in this case; it sounds like I'll have to use the idiom you mentioned regardless.
            tjenness Tim Jenness added a comment -

            So you want to write a new config file but then you don't want to change registry if it previously exists?

            The problem with that is that config can contain registry information and so it's possible to end up with a config that is inconsistent with the registry that exists there (which we could deal with if necessary by looking at versions and deciding if the versions match we can return immediately – but that's not quite in place yet). Once we move dimensions.yaml out of the butler config file the behavior of overwrite will become much more obvious.

            tjenness Tim Jenness added a comment - So you want to write a new config file but then you don't want to change registry if it previously exists? The problem with that is that config can contain registry information and so it's possible to end up with a config that is inconsistent with the registry that exists there (which we could deal with if necessary by looking at versions and deciding if the versions match we can return immediately – but that's not quite in place yet). Once we move dimensions.yaml out of the butler config file the behavior of overwrite will become much more obvious.

            Dropping registry data indeed feels dangerous, probably can be done in a separate administration API, I would not like to expose that as a common option.

             

            salnikov Andy Salnikov added a comment - Dropping registry data indeed feels dangerous, probably can be done in a separate administration API, I would not like to expose that as a common option.  
            tjenness Tim Jenness added a comment -

            I think then that the overwrite=False is what you really want to be using in the short term.

            After this discussion I'm not entirely sure that overwrite=True is ever a good idea. At best we want it to change the configuration of the repo without touching the database itself. This can sort of work if the registry schema definition is no longer part of the config file (DM-26407) or if we do a basic schema version check and do nothing if the schemas match else complain. It may be that all anyone wants is an option that returns without doing anything if a butler is already defined there (therefore ignoring any configuration you might have specified explicitly).

            tjenness Tim Jenness added a comment - I think then that the overwrite=False is what you really want to be using in the short term. After this discussion I'm not entirely sure that overwrite=True is ever a good idea. At best we want it to change the configuration of the repo without touching the database itself. This can sort of work if the registry schema definition is no longer part of the config file ( DM-26407 ) or if we do a basic schema version check and do nothing if the schemas match else complain. It may be that all anyone wants is an option that returns without doing anything if a butler is already defined there (therefore ignoring any configuration you might have specified explicitly).
            tjenness Tim Jenness added a comment -

            Looks good to me. A minor comment on documentation.

            tjenness Tim Jenness added a comment - Looks good to me. A minor comment on documentation.

            People

              krzys Krzysztof Findeisen
              krzys Krzysztof Findeisen
              Tim Jenness
              Andy Salnikov, Chris Morrison [X] (Inactive), Krzysztof Findeisen, Meredith Rawls, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Jenkins

                  No builds found.