Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-33635

Remove ON DELETE CASCADE from collection tag rows

    XMLWordPrintable

Details

    • Architecture
    • No

    Description

      I believe we're implementing a plan in which raws are protected from expiration in OODS by adding them to TAGGED collections, but this may be based on a faulty recollection of how this would work.

      At a low level, the database rows that store dataset membership in TAGGED collections have a foreign key to the dataset table marked with ON DELETE CASCADE, which means, "if the dataset is deleted just delete this membership row, too" - so there is no foreign key protection against TAGGED dataset deletion.

      There may well be higher-level guards; some of our deletion APIs do a lot of querying in advance to figure out whether it's safe to delete something, and these may perform such a check. But if we want a rigorous guard against TAGGED dataset deletion, we should change this foreign key definition. I think we do want it: even aside from the OODS use case, this has long been an uncomfortable asymmetry with how CHAINED collection membership works, where the philosophy is that you shouldn't be able to blindly delete something you own without at least being told when someone else might be depending on it.

      I'm not sure this ticket will be easy-ish (it may just be a two-line code change, but is definitely also a minor schema migration, and that's always a pain) or much harder; the hard case would enter because the TAGGED member table is also the RUN membership table, and we may be relying on ON DELETE CASCADE to remove datasets from RUN collections, and if we are, I don't know how hard it would be to change that.

      Attachments

        Issue Links

          Activity

            No builds found.
            jbosch Jim Bosch created issue -

            I told this Jim in a meeting just now, but I thought it was worth adding here:  On cleanup: the OODS queries which datasets are deletable, and which files are TAGGED and does a set difference between the two.  From that, it knows which files it can delete.

            spietrowicz Steve Pietrowicz added a comment - I told this Jim in a meeting just now, but I thought it was worth adding here:  On cleanup: the OODS queries which datasets are deletable, and which files are TAGGED and does a set difference between the two.  From that, it knows which files it can delete.
            tjenness Tim Jenness made changes -
            Field Original Value New Value
            Team Architecture [ 10304 ]
            jbosch Jim Bosch made changes -
            Labels gen3-middleware gen3-middleware gen3-registry-incompatibility
            jbosch Jim Bosch added a comment -

            tjenness, I think we'll want to include this in the upcoming schema migration batch.

            jbosch Jim Bosch added a comment - tjenness , I think we'll want to include this in the upcoming schema migration batch.
            tjenness Tim Jenness made changes -
            Link This issue relates to DM-33942 [ DM-33942 ]
            tjenness Tim Jenness added a comment -

            I'm trying to work out what is actually needed for this ticket.

            Butler.pruneDatasets() currently has a disassociate parameter which explicitly states that when True this dataset will be removed from listed tagged collections. Furthermore disassociate must be True when purge is True and the latter says that it ignores the list of tagged collections. I'm trying to understand how we want the API to look in the OODS case because it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection. What I'm trying to do is construct a test that fails now but will work after the fix on this ticket and I can't work out how to reproduce.

            tjenness Tim Jenness added a comment - I'm trying to work out what is actually needed for this ticket. Butler.pruneDatasets() currently has a disassociate parameter which explicitly states that when True this dataset will be removed from listed tagged collections. Furthermore disassociate must be True when purge is True and the latter says that it ignores the list of tagged collections. I'm trying to understand how we want the API to look in the OODS case because it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection. What I'm trying to do is construct a test that fails now but will work after the fix on this ticket and I can't work out how to reproduce.
            jbosch Jim Bosch added a comment -

            it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection

            I think that's right and my expectation is that if we allow that argument combination, and then use that argument combination when trying to delete datasets from a RUN collection when some are also in a TAGGED collection, it will do so while also removing them from the TAGGED collection. The desired behavior is that an exception (probably sqlalchemy.exc.IntegrityError, unless we re-raise) be raised instead, and nothing is deleted.

            But now I'm a bit worried that we'll trash all of those datasets anyway even if we do later get an exception, so it'll all be still be a moot point from OODS' perspective (but maybe a step in the right direction).

            jbosch Jim Bosch added a comment - it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection I think that's right and my expectation is that if we allow that argument combination, and then use that argument combination when trying to delete datasets from a RUN collection when some are also in a TAGGED collection, it will do so while also removing them from the TAGGED collection. The desired behavior is that an exception (probably sqlalchemy.exc.IntegrityError, unless we re-raise) be raised instead, and nothing is deleted. But now I'm a bit worried that we'll trash all of those datasets anyway even if we do later get an exception, so it'll all be still be a moot point from OODS' perspective (but maybe a step in the right direction).
            tjenness Tim Jenness made changes -
            Link This issue relates to DM-33942 [ DM-33942 ]
            jbosch Jim Bosch made changes -
            Link This issue is blocked by DM-34951 [ DM-34951 ]

            People

              Unassigned Unassigned
              jbosch Jim Bosch
              Jim Bosch, Robert Lupton, Steve Pietrowicz, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Jenkins

                  No builds found.