Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-31976

Change handling of collections and dataset types in dataset transfer

    XMLWordPrintable

Details

    • 3
    • Architecture
    • No

    Description

      In DM-31956 deadlocks are reported when the end of workflow dataset transfer happens simultaneously. This is seemingly caused by the use of register collection inside a long-lived transaction.

      The simplest fix is to move the transaction inside the per run/datasetType loop and have it include solely the dataset registry import and the datastore registration.

      Upsides:

      • The transaction will be smaller.
      • Failures for some run/dataset type combinations could still allow other datasets to be transferred (this can also be a downside). Should it raise an exception after it has done what it could?

      Downsides:

      • If dataset import fails the collection will remain and be empty. We already are in that situation with dataset types. Should an empty run collection be removed on failure?
      • A transfer failure could still have transferred some data but not others.

      A middle ground would be to create any collections up front and then leave the transaction where it is. If the import fails for any reason, catch it, remove the run collections that were created (and also dataset types?) and reraise the exception. This would still have a long exception but would leave registry and datastore in a consistent state (although when the time comes that we are actually transferring data between buckets a rollback is going to be really really slow since it would have to delete hundred of thousands of datasets it had spent ages copying in).

      Attachments

        Issue Links

          Activity

            jbosch Jim Bosch added a comment -

            I've left some follow-up comments on the daf_butler PR, but I think any remaining changes at this point are quite straightforward.

            jbosch Jim Bosch added a comment - I've left some follow-up comments on the daf_butler PR, but I think any remaining changes at this point are quite straightforward.
            jbosch Jim Bosch added a comment -

            I assume it's not possible for `Registry.registerCollection` to return a boolean indicating whether it made the collection or not?

            Actually, this should be pretty easy - lots of changes to make to forward the value through various interface layers, but fundamentally this boils down to a Database.sync call, and that already returns the bool we need.

            jbosch Jim Bosch added a comment - I assume it's not possible for `Registry.registerCollection` to return a boolean indicating whether it made the collection or not? Actually, this should be pretty easy - lots of changes to make to forward the value through various interface layers, but fundamentally this boils down to a  Database.sync call, and that already returns the bool we need.
            tjenness Tim Jenness added a comment -

            If I'm reading the registerCollection code correctly, it seems that CollectionManager.register returns a CollectionRecord. This seems to suggest that the register method would have to do something like return a tuple of the CollectionRecord and the boolean. That looks to be trivial but I'm not sure if you are happy with doing that to the register method.

            tjenness Tim Jenness added a comment - If I'm reading the registerCollection code correctly, it seems that CollectionManager.register returns a CollectionRecord. This seems to suggest that the register method would have to do something like return a tuple of the CollectionRecord and the boolean. That looks to be trivial but I'm not sure if you are happy with doing that to the register method.
            jbosch Jim Bosch added a comment -

            No objection to returning a tuple there.

            jbosch Jim Bosch added a comment - No objection to returning a tuple there.
            tjenness Tim Jenness added a comment -

            I made registerRun and registerCollection return booleans and that simplified the collection reporting code.

            tjenness Tim Jenness added a comment - I made registerRun and registerCollection return booleans and that simplified the collection reporting code.

            People

              tjenness Tim Jenness
              tjenness Tim Jenness
              Jim Bosch
              Jim Bosch, Kian-Tat Lim, Michelle Gower, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Jenkins

                  No builds found.