Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-33635

Remove ON DELETE CASCADE from collection tag rows

    XMLWordPrintable

    Details

    • Team:
      Architecture
    • Urgent?:
      No

      Description

      I believe we're implementing a plan in which raws are protected from expiration in OODS by adding them to TAGGED collections, but this may be based on a faulty recollection of how this would work.

      At a low level, the database rows that store dataset membership in TAGGED collections have a foreign key to the dataset table marked with ON DELETE CASCADE, which means, "if the dataset is deleted just delete this membership row, too" - so there is no foreign key protection against TAGGED dataset deletion.

      There may well be higher-level guards; some of our deletion APIs do a lot of querying in advance to figure out whether it's safe to delete something, and these may perform such a check. But if we want a rigorous guard against TAGGED dataset deletion, we should change this foreign key definition. I think we do want it: even aside from the OODS use case, this has long been an uncomfortable asymmetry with how CHAINED collection membership works, where the philosophy is that you shouldn't be able to blindly delete something you own without at least being told when someone else might be depending on it.

      I'm not sure this ticket will be easy-ish (it may just be a two-line code change, but is definitely also a minor schema migration, and that's always a pain) or much harder; the hard case would enter because the TAGGED member table is also the RUN membership table, and we may be relying on ON DELETE CASCADE to remove datasets from RUN collections, and if we are, I don't know how hard it would be to change that.

        Attachments

          Issue Links

            Activity

            Hide
            spietrowicz Steve Pietrowicz added a comment -

            I told this Jim in a meeting just now, but I thought it was worth adding here:  On cleanup: the OODS queries which datasets are deletable, and which files are TAGGED and does a set difference between the two.  From that, it knows which files it can delete.

            Show
            spietrowicz Steve Pietrowicz added a comment - I told this Jim in a meeting just now, but I thought it was worth adding here:  On cleanup: the OODS queries which datasets are deletable, and which files are TAGGED and does a set difference between the two.  From that, it knows which files it can delete.
            Hide
            jbosch Jim Bosch added a comment -

            Tim Jenness, I think we'll want to include this in the upcoming schema migration batch.

            Show
            jbosch Jim Bosch added a comment - Tim Jenness , I think we'll want to include this in the upcoming schema migration batch.
            Hide
            tjenness Tim Jenness added a comment -

            I'm trying to work out what is actually needed for this ticket.

            Butler.pruneDatasets() currently has a disassociate parameter which explicitly states that when True this dataset will be removed from listed tagged collections. Furthermore disassociate must be True when purge is True and the latter says that it ignores the list of tagged collections. I'm trying to understand how we want the API to look in the OODS case because it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection. What I'm trying to do is construct a test that fails now but will work after the fix on this ticket and I can't work out how to reproduce.

            Show
            tjenness Tim Jenness added a comment - I'm trying to work out what is actually needed for this ticket. Butler.pruneDatasets() currently has a disassociate parameter which explicitly states that when True this dataset will be removed from listed tagged collections. Furthermore disassociate must be True when purge is True and the latter says that it ignores the list of tagged collections. I'm trying to understand how we want the API to look in the OODS case because it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection. What I'm trying to do is construct a test that fails now but will work after the fix on this ticket and I can't work out how to reproduce.
            Hide
            jbosch Jim Bosch added a comment -

            it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection

            I think that's right and my expectation is that if we allow that argument combination, and then use that argument combination when trying to delete datasets from a RUN collection when some are also in a TAGGED collection, it will do so while also removing them from the TAGGED collection. The desired behavior is that an exception (probably sqlalchemy.exc.IntegrityError, unless we re-raise) be raised instead, and nothing is deleted.

            But now I'm a bit worried that we'll trash all of those datasets anyway even if we do later get an exception, so it'll all be still be a moot point from OODS' perspective (but maybe a step in the right direction).

            Show
            jbosch Jim Bosch added a comment - it seems like OODS wants the equivalent of purge=True, disassociate=False and then complain only if the dataset is in any tagged collection I think that's right and my expectation is that if we allow that argument combination, and then use that argument combination when trying to delete datasets from a RUN collection when some are also in a TAGGED collection, it will do so while also removing them from the TAGGED collection. The desired behavior is that an exception (probably sqlalchemy.exc.IntegrityError, unless we re-raise) be raised instead, and nothing is deleted. But now I'm a bit worried that we'll trash all of those datasets anyway even if we do later get an exception, so it'll all be still be a moot point from OODS' perspective (but maybe a step in the right direction).

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Jim Bosch, Robert Lupton, Steve Pietrowicz, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:

                  Jenkins

                  No builds found.