Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-16227

Implement collection integrity constraint inside the registry database

    Details

    • Story Points:
      6
    • Sprint:
      BG3_F18_11, BG3_S19_01
    • Team:
      Data Release Production

      Description

      Butler design requires that there be at most one Dataset in each Collection with a particular DatasetType and data ID.  This is currently implemented in a very fragile, concurrency-unsafe way in the Python Registry classes.  To make it robust, we need to implement it in the Registry database itself, which is tricky for several reasons:

      • We don't have a single field that represents "the data ID" of a Dataset (but this is DM-14821).
      • This might need to be a multi-table constraint (the alternative is denormalizing dataset_type_name and the DM-14821 packed data ID integer into the DatasetCollections join table).
      • We've thus far assumed in various places that we can detect violations of this constraint before actually making any changes to the database or even triggering a rollback of an ongoing transaction.  It's pretty clear we'll need to just not make the latter assumption.  For the former assumption, we may just have to sometimes "orphan" Datasets that are successfully inserted into the Dataset table but not successfully associated with a Collection.

      There's also a very relevant discussion of this on DM-15686.

      Andy Salnikov, I've assigned this to you since you'd be much better than I would be at figuring out what kind of in-database constraint we want and how best to implement that in SQLAlchemy.  But I've added a blocker on DM-14821, which I'll need to do first, and I'm expecting us to work together (possibly via new blocker tickets assigned to me) on making sure the assumptions we make in the rest of the Butler are consistent with what we can actually implement robustly as a database constraint.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment - - edited

            I approved last commit but please look at my comments, I think more work is needed to support non-sqlite backends.
            I'm going to mark this ticket as reviewed if other reviewers do not mind (Christopher Stephens, Michelle Gower)?

            Show
            salnikov Andy Salnikov added a comment - - edited I approved last commit but please look at my comments, I think more work is needed to support non-sqlite backends. I'm going to mark this ticket as reviewed if other reviewers do not mind ( Christopher Stephens , Michelle Gower )?
            Hide
            mgower Michelle Gower added a comment -

            I think we're assuming other non-sqlite backend work (i.e., Oracle)  is going to be taken care of later/elsewhere.   So I have no objection if someone marks this ticket as reviewed for just the sqlite3 part.

            Show
            mgower Michelle Gower added a comment - I think we're assuming other non-sqlite backend work (i.e., Oracle)  is going to be taken care of later/elsewhere.   So I have no objection if someone marks this ticket as reviewed for just the sqlite3 part.
            Hide
            jbosch Jim Bosch added a comment -

            Ok.  After seeing Andy Salnikov's comments on the PR, I had considered moving the implementation of associate to SqliteRegistry to indicate that the Oracle implementation had to implement its own version, but I think merging it as-is will at least leave it maybe-sometimes-working for Oracle instead of definitely broken, and that's better for now.

            Show
            jbosch Jim Bosch added a comment - Ok.  After seeing Andy Salnikov 's comments on the PR, I had considered moving the implementation of associate to SqliteRegistry to indicate that the Oracle implementation had to implement its own version, but I think merging it as-is will at least leave it maybe-sometimes-working for Oracle instead of definitely broken, and that's better for now.
            Hide
            salnikov Andy Salnikov added a comment -

            With Oracle it's always "maybe-sometimes-working" Joking. I'm OK with keeping it in base class.

            Show
            salnikov Andy Salnikov added a comment - With Oracle it's always "maybe-sometimes-working" Joking. I'm OK with keeping it in base class.
            Hide
            salnikov Andy Salnikov added a comment -

            Mark as reviewed, Oracle work should continue on a separate ticket.

            Show
            salnikov Andy Salnikov added a comment - Mark as reviewed, Oracle work should continue on a separate ticket.

              People

              • Assignee:
                jbosch Jim Bosch
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Andy Salnikov, Christopher Stephens, Michelle Gower
                Watchers:
                Andy Salnikov, Christopher Stephens, Jim Bosch, Michelle Gower, Vaikunth Thukral
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel