Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 6
- Epic Link:
- Sprint: BG3_F18_11, BG3_S19_01
- Team: Data Release Production
Description
Butler design requires that there be at most one Dataset in each Collection with a particular DatasetType and data ID. This is currently implemented in a very fragile, concurrency-unsafe way in the Python Registry classes. To make it robust, we need to implement it in the Registry database itself, which is tricky for several reasons:
- We don't have a single field that represents "the data ID" of a Dataset (but this is DM-14821).
- This might need to be a multi-table constraint (the alternative is denormalizing dataset_type_name and the DM-14821 packed data ID integer into the DatasetCollections join table).
- We've thus far assumed in various places that we can detect violations of this constraint before actually making any changes to the database or even triggering a rollback of an ongoing transaction. It's pretty clear we'll need to just not make the latter assumption. For the former assumption, we may just have to sometimes "orphan" Datasets that are successfully inserted into the Dataset table but not successfully associated with a Collection.
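As a rough illustration of the denormalization alternative in the second bullet, here is a minimal sketch using Python's standard-library sqlite3 module rather than SQLAlchemy. The table and column names (dataset, dataset_collection, data_id_packed) and the helper functions are hypothetical, chosen only for this example, and are not the actual Registry schema. It assumes the packed data ID integer from DM-14821 is available, copies it and the dataset type name into the join table, and lets a UNIQUE constraint there reject a second Dataset with the same DatasetType and data ID in one Collection. It also shows the "orphan" outcome from the third bullet: the rejected Dataset stays in the dataset table without a Collection.

```python
import sqlite3

# Hypothetical schema, for illustration only; not the real Registry tables.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    dataset_type_name TEXT NOT NULL,
    data_id_packed INTEGER NOT NULL
);
CREATE TABLE dataset_collection (
    dataset_id INTEGER NOT NULL REFERENCES dataset (dataset_id),
    collection TEXT NOT NULL,
    -- Denormalized copies of the dataset columns, so the constraint
    -- below can live on a single table.
    dataset_type_name TEXT NOT NULL,
    data_id_packed INTEGER NOT NULL,
    PRIMARY KEY (dataset_id, collection),
    UNIQUE (collection, dataset_type_name, data_id_packed)
);
"""


def add_dataset(conn, dataset_type_name, data_id_packed):
    """Insert a Dataset row and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO dataset (dataset_type_name, data_id_packed) VALUES (?, ?)",
        (dataset_type_name, data_id_packed),
    )
    return cur.lastrowid


def associate(conn, dataset_id, collection):
    """Try to associate a Dataset with a Collection.

    Returns True on success and False if the database constraint
    rejects the association.
    """
    try:
        # Copy the denormalized values straight from the dataset row.
        conn.execute(
            "INSERT INTO dataset_collection "
            "(dataset_id, collection, dataset_type_name, data_id_packed) "
            "SELECT dataset_id, ?, dataset_type_name, data_id_packed "
            "FROM dataset WHERE dataset_id = ?",
            (collection, dataset_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

first = add_dataset(conn, "calexp", 42)
second = add_dataset(conn, "calexp", 42)  # same DatasetType and data ID

results = [associate(conn, first, "run1"), associate(conn, second, "run1")]

# Datasets present in the dataset table but not in any Collection.
orphans = [
    row[0]
    for row in conn.execute(
        "SELECT dataset_id FROM dataset "
        "WHERE dataset_id NOT IN (SELECT dataset_id FROM dataset_collection)"
    )
]
print(results, orphans)
```

In this sketch the violation is detected only when the insert is attempted, not beforehand, which matches the expectation above that callers cannot rely on detecting conflicts before touching the database. How a failed statement interacts with an enclosing transaction differs between backends, so that part would need separate care outside SQLite.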
There's also a very relevant discussion of this on DM-15686.
Andy Salnikov, I've assigned this to you since you'd be much better than I would be at figuring out what kind of in-database constraint we want and how best to implement that in SQLAlchemy. But I've added a blocker on DM-14821, which I'll need to do first, and I'm expecting us to work together (possibly via new blocker tickets assigned to me) on making sure the assumptions we make in the rest of the Butler are consistent with what we can actually implement robustly as a database constraint.
Issue Links
- duplicates: DM-16221 Avoid full-job transactions in Gen3 ingest (Won't Fix)
- is blocked by: DM-14821 Provide packed integer versions of Gen3 data IDs (Done)
- is duplicated by: DM-15497 Revisit Registry transactions to prevent non-mutating failures from triggering global rollbacks (Invalid)
- relates to: DM-17419 Resolve upsert order in SQLite Registry (Invalid)
I approved the last commit, but please look at my comments; I think more work is needed to support non-SQLite backends.
I'm going to mark this ticket as reviewed, if the other reviewers (Christopher Stephens [X], Michelle Gower) do not mind.