Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-24414

Implement --prune-replaced option in ctrl_mpexec

    XMLWordPrintable

    Details

    • Story Points:
      2
    • Sprint:
      DB_F20_09
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      This option was added to the argument parser on DM-21849 but not implemented due to problems with dataset deletion.

        Attachments

          Issue Links

            Activity

            No builds found.
            jbosch Jim Bosch created issue -
            jbosch Jim Bosch made changes -
            Field Original Value New Value
            Link This issue is blocked by DM-23671 [ DM-23671 ]
            swinbank John Swinbank made changes -
            Team Architecture [ 10304 ]
            tjenness Tim Jenness made changes -
            Watchers Jim Bosch [ Jim Bosch ] Andy Salnikov, Jim Bosch, Nate Pease [ Andy Salnikov, Jim Bosch, Nate Pease ]
            tjenness Tim Jenness made changes -
            Link This issue is triggered by DM-21849 [ DM-21849 ]
            Hide
            tjenness Tim Jenness added a comment -

            What does the --prune-replaced option do?

            Show
            tjenness Tim Jenness added a comment - What does the --prune-replaced option do?
            Hide
            jbosch Jim Bosch added a comment -

            It's supposed to delete the previous run in a chain (and the datasets within it) instead of just shunting them off to the side by taking the run out of the chain. Or if we change that UI (since everyone is confused by it), we can consider this ticket to be about providing some option to just blow away the last run when running again.

            Show
            jbosch Jim Bosch added a comment - It's supposed to delete the previous run in a chain (and the datasets within it) instead of just shunting them off to the side by taking the run out of the chain. Or if we change that UI (since everyone is confused by it), we can consider this ticket to be about providing some option to just blow away the last run when running again.
            gruendl Robert Gruendl [X] (Inactive) made changes -
            Remote Link This issue links to "Page (Confluence)" [ 25799 ]
            Hide
            tjenness Tim Jenness added a comment -

            Jim Bosch can you put a story point estimate on this ticket? It seems like it's a trivial thing that Andy Salnikov could do in an afternoon but I might be wrong (especially if "revamp the entire command line" is part of the plan).

            Show
            tjenness Tim Jenness added a comment - Jim Bosch can you put a story point estimate on this ticket? It seems like it's a trivial thing that Andy Salnikov could do in an afternoon but I might be wrong (especially if "revamp the entire command line" is part of the plan).
            jbosch Jim Bosch made changes -
            Story Points 1
            Hide
            jbosch Jim Bosch added a comment -

            Let's make it the trivial thing. I think we're already planning to tell users this interface needs feedback and improvement at Gen2 deprecation time, and that overhaul shouldn't block this ticket.

            Show
            jbosch Jim Bosch added a comment - Let's make it the trivial thing. I think we're already planning to tell users this interface needs feedback and improvement at Gen2 deprecation time, and that overhaul shouldn't block this ticket.
            tjenness Tim Jenness made changes -
            Assignee Andy Salnikov [ salnikov ]
            tjenness Tim Jenness made changes -
            Team Architecture [ 10304 ] Data Access and Database [ 10204 ]
            salnikov Andy Salnikov made changes -
            Epic Link DM-25244 [ 435560 ]
            salnikov Andy Salnikov made changes -
            Sprint DB_F20_09 [ 1048 ]
            salnikov Andy Salnikov made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            Hide
            salnikov Andy Salnikov added a comment -

            Reading docstring for Butler.pruneDatasets I see this:

                    disassociate : bool`, optional
                        Disassociate pruned datasets from ``self.tags`` (or the collections
                        given via the ``tags`` argument).  Ignored if ``refs`` is ``...``.
            

            but I think that refs is not supposed to be ... in this method? Jim Bosch, does the docstring need a fix?

            Show
            salnikov Andy Salnikov added a comment - Reading docstring for Butler.pruneDatasets I see this : disassociate : bool`, optional Disassociate pruned datasets from ``self.tags`` (or the collections given via the ``tags`` argument). Ignored if ``refs`` is ``...``. but I think that refs is not supposed to be ... in this method? Jim Bosch , does the docstring need a fix?
            Hide
            jbosch Jim Bosch added a comment -

            Definitely needs a fix, but I'm not sure what I meant to say with the last sentence. Probably best to just remove the last sentence.

            Show
            jbosch Jim Bosch added a comment - Definitely needs a fix, but I'm not sure what I meant to say with the last sentence. Probably best to just remove the last sentence.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            This "trivial" fix gives me lots of troubles, mostly in that I had to do non-trivial things to test it. Now I think my tests are OK, but I'm getting an exception with my implementation of "purge" option. I though I can implement it with a simple call:

                butler.pruneCollection(replaced, purge=True, unstore=True) 

            (where "replaced" is a collection name and it is RUN collection). This crashes with FOREIGN KEY violation:

            Traceback (most recent call last):
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
                cursor, statement, parameters, context
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
                cursor.execute(statement, parameters)
            sqlite3.IntegrityError: FOREIGN KEY constraint failedThe above exception was the direct cause of the following exception:Traceback (most recent call last):
              File "tests/test_cmdLineFwk.py", line 573, in testSimpleQGraphReplaceRun
                fwk.runPipeline(copy.deepcopy(qgraph), taskFactory, args)
              File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 669, in runPipeline
                butler = _ButlerFactory.makeWriteButler(args)
              File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 363, in makeWriteButler
                butler.pruneCollection(replaced, purge=True, unstore=True)
              File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/_butler.py", line 959, in pruneCollection
                self.registry.removeCollection(name)
              File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/core/utils.py", line 253, in inner
                return func(self, *args, **kwargs)
              File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 398, in removeCollection
                self._collections.remove(name)
              File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/collections/_base.py", line 410, in remove
                {self._collectionIdName: record.key})
              File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/interfaces/_database.py", line 1299, in delete
                return self._connection.execute(sql, *rows).rowcount
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
                return meth(self, multiparams, params)
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
                return connection._execute_clauseelement(self, multiparams, params)
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
                distilled_params,
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
                e, statement, parameters, cursor, context
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
                sqlalchemy_exception, with_traceback=exc_info[2], from_=e
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
                raise exception
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
                cursor, statement, parameters, context
              File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
                cursor.execute(statement, parameters)
            sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) FOREIGN KEY constraint failed
            [SQL: DELETE FROM collection WHERE collection.collection_id = ?]
            [parameters: (5,)]
            (Background on this error at: http://sqlalche.me/e/gkpj) 

            Jim Bosch, are my assumptions totally wrong, do I need to do something else before (or instead of) pruneCollection?

            Show
            salnikov Andy Salnikov added a comment - - edited This "trivial" fix gives me lots of troubles, mostly in that I had to do non-trivial things to test it. Now I think my tests are OK, but I'm getting an exception with my implementation of "purge" option. I though I can implement it with a simple call: butler.pruneCollection(replaced, purge=True, unstore=True) (where "replaced" is a collection name and it is RUN collection). This crashes with FOREIGN KEY violation: Traceback (most recent call last): File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context cursor, statement, parameters, context File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlite3.IntegrityError: FOREIGN KEY constraint failedThe above exception was the direct cause of the following exception:Traceback (most recent call last): File "tests/test_cmdLineFwk.py", line 573, in testSimpleQGraphReplaceRun fwk.runPipeline(copy.deepcopy(qgraph), taskFactory, args) File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 669, in runPipeline butler = _ButlerFactory.makeWriteButler(args) File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 363, in makeWriteButler butler.pruneCollection(replaced, purge=True, unstore=True) File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/_butler.py", line 959, in pruneCollection self.registry.removeCollection(name) File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/core/utils.py", line 253, in inner return func(self, *args, **kwargs) File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 398, in removeCollection self._collections.remove(name) File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/collections/_base.py", line 410, in remove {self._collectionIdName: record.key}) File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/interfaces/_database.py", line 1299, in delete return self._connection.execute(sql, *rows).rowcount File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute return meth(self, multiparams, params) File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement distilled_params, File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context e, statement, parameters, cursor, context File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception sqlalchemy_exception, with_traceback=exc_info[2], from_=e File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_ raise exception File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context cursor, statement, parameters, context File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) FOREIGN KEY constraint failed [SQL: DELETE FROM collection WHERE collection.collection_id = ?] [parameters: (5,)] (Background on this error at: http://sqlalche.me/e/gkpj) Jim Bosch , are my assumptions totally wrong, do I need to do something else before (or instead of) pruneCollection ?
            Hide
            jbosch Jim Bosch added a comment -

            I thought what you tried world work. AFK now, but will look at exception when back at it.

            Show
            jbosch Jim Bosch added a comment - I thought what you tried world work. AFK now, but will look at exception when back at it.
            Hide
            salnikov Andy Salnikov added a comment -

            I think a collection has to be removed from chain collection before it can be removed itself. The schema:

            CREATE TABLE collection_chain (
                    parent BIGINT,
                    position SMALLINT,
                    child BIGINT NOT NULL,
                    dataset_type_name VARCHAR(128),
                    PRIMARY KEY (parent, position),
                    CONSTRAINT collection_chain_len_dataset_type_name CHECK (length(dataset_type_name)<=128 AND length(dataset_type_name)>=1),
                    CONSTRAINT fkey_collection_chain_collection_collection_id_parent FOREIGN KEY(parent) REFERENCES collection (collection_id) ON DELETE CASCADE,
                    CONSTRAINT fkey_collection_chain_collection_collection_id_child FOREIGN KEY(child) REFERENCES collection (collection_id)
            ); 

            does not have ON DELETE CASCADE for child FK, was it intentional?

            Show
            salnikov Andy Salnikov added a comment - I think a collection has to be removed from chain collection before it can be removed itself. The schema: CREATE TABLE collection_chain ( parent BIGINT, position SMALLINT, child BIGINT NOT NULL, dataset_type_name VARCHAR(128), PRIMARY KEY (parent, position), CONSTRAINT collection_chain_len_dataset_type_name CHECK (length(dataset_type_name)<=128 AND length(dataset_type_name)>=1), CONSTRAINT fkey_collection_chain_collection_collection_id_parent FOREIGN KEY(parent) REFERENCES collection (collection_id) ON DELETE CASCADE, CONSTRAINT fkey_collection_chain_collection_collection_id_child FOREIGN KEY(child) REFERENCES collection (collection_id) ); does not have ON DELETE CASCADE for child FK, was it intentional?
            Hide
            jbosch Jim Bosch added a comment -

            That was intentional, because deleting a RUN child collection also implies deleting datasets, and that requires datastore involvement and hence can't be done exclusively through ON DELETE.

            But if the problem is that it is trying to delete a CHAINED collection, that explains it - I was imagining that the replaced variable in your code was the name of a RUN collection, because --prune-replaced should be popping the first RUN off of the CHAINED collection stack, and then deleting that RUN, not the CHAINED collection. Or is there a scenario I had not considered where the first collection on that stack is also a CHAINED collection?

            Show
            jbosch Jim Bosch added a comment - That was intentional, because deleting a RUN child collection also implies deleting datasets, and that requires datastore involvement and hence can't be done exclusively through ON DELETE. But if the problem is that it is trying to delete a CHAINED collection, that explains it - I was imagining that the replaced variable in your code was the name of a RUN collection, because --prune-replaced should be popping the first RUN off of the CHAINED collection stack, and then deleting that RUN , not the CHAINED collection. Or is there a scenario I had not considered where the first collection on that stack is also a CHAINED collection?
            Hide
            salnikov Andy Salnikov added a comment -

            replaced is indeed a RUN collection, I'm not trying to delete CHAINED collection. But deleting RUN collection also need to "unlink" it from a CHAINED collection, and that step has to be done explicitly if there is no ON DELETE CASCADE on the child key in the collection_chain table. I don't think this has to do something with datastore, dropping datasets is done explicitly by pruneCollection.

            Show
            salnikov Andy Salnikov added a comment - replaced is indeed a RUN collection, I'm not trying to delete CHAINED collection. But deleting RUN collection also need to "unlink" it from a CHAINED collection, and that step has to be done explicitly if there is no ON DELETE CASCADE on the child key in the collection_chain table. I don't think this has to do something with datastore, dropping datasets is done explicitly by pruneCollection .
            Hide
            jbosch Jim Bosch added a comment -

            Oh, sorry, I misunderstood your last post, and your analysis was exactly right. But I'm not sure if we want to add ON DELETE CASCADE to fix this; if a user tries to do --prune-replaced on a RUN that is referenced by some other CHAINED collection than the one they are modifying, it might be a sign (especially in a multi-user environment) that the RUN is important in some other context they were not aware of, and they should think twice (and then maybe use a butler command to delete it if they are sure; that doesn't exist yet, but it's DM-26684). Of course, we would need to make sure we remove it from the CHAINED collection that the user is trying to modify before deleting it, then, and I'm not sure if that's easy with the way the code is structured right now.

            Show
            jbosch Jim Bosch added a comment - Oh, sorry, I misunderstood your last post, and your analysis was exactly right. But I'm not sure if we want to add ON DELETE CASCADE to fix this; if a user tries to do --prune-replaced on a RUN that is referenced by some other CHAINED collection than the one they are modifying, it might be a sign (especially in a multi-user environment) that the RUN is important in some other context they were not aware of, and they should think twice (and then maybe use a butler command to delete it if they are sure; that doesn't exist yet, but it's DM-26684 ). Of course, we would need to make sure we remove it from the CHAINED collection that the user is trying to modify before deleting it, then, and I'm not sure if that's easy with the way the code is structured right now.
            Hide
            salnikov Andy Salnikov added a comment -

            Ah, OK, it makes sense. I think workaround should be simple, I can call butler.registry.setCollectionChain with whatever remains in the chain after popping first element, let me try that.

            Show
            salnikov Andy Salnikov added a comment - Ah, OK, it makes sense. I think workaround should be simple, I can call butler.registry.setCollectionChain with whatever remains in the chain after popping first element, let me try that.
            salnikov Andy Salnikov made changes -
            Story Points 1 2
            Hide
            salnikov Andy Salnikov added a comment -

            Should be ready for review. I think my understanding of what the options should do is right but please check that it really should work that way.

            ctrl_mpexec: https://github.com/lsst/ctrl_mpexec/pull/72

            daf_butler: https://github.com/lsst/daf_butler/pull/372

             

            Show
            salnikov Andy Salnikov added a comment - Should be ready for review. I think my understanding of what the options should do is right but please check that it really should work that way. ctrl_mpexec: https://github.com/lsst/ctrl_mpexec/pull/72 daf_butler: https://github.com/lsst/daf_butler/pull/372  
            salnikov Andy Salnikov made changes -
            Reviewers Jim Bosch [ jbosch ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            jbosch Jim Bosch made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            Hide
            tjenness Tim Jenness added a comment -

            Andy Salnikov am I correct in thinking that this ticket can be closed?

            Show
            tjenness Tim Jenness added a comment - Andy Salnikov am I correct in thinking that this ticket can be closed?
            Hide
            salnikov Andy Salnikov added a comment -

            Indeed, I thought I closed it already.

            Show
            salnikov Andy Salnikov added a comment - Indeed, I thought I closed it already.
            salnikov Andy Salnikov made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Jim Bosch, Nate Pease [X] (Inactive), Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.