Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-15887

Make Pipeline work with non-standard storage classes

    XMLWordPrintable

    Details

      Description

      Pipeline builder needs to instantiate DatasetTypes which depend on StorageClass (and they both are configured via task config). We want to keep pipeline builder independent of butler but that means that initialization of StorageClassFactory becomes an issue. DM-15850 adds support for loading all standard StorageClasses (which come from a standard YAML config) but any non-standard configuration will become an issue in this approach.

      I want to see how we can solve this problem by either pre-loading non-standard config for the factory or avoiding its use entirely. 

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Could we make attachment of a StorageClass instance to a DatasetType instance something only a Butler/Registry/Datastore can do, in the same sense that a Registry is essentially necessary to add a dataset_id to a DatasetRef?

            I think in most user-facing contexts only requiring a string is clearly better, but I gather not having the StorageClass attached to the DatasetType would be a pretty big annoyance in the internals (which I think is confirmed by Tim Jenness's comments above).  We could consider having a variant/subclass of DatasetType that adds that information, and in more a strongly-typed or rigorously OO language I think that'd be the way to go.  In Python it's less clear.

            Show
            jbosch Jim Bosch added a comment - Could we make attachment of a StorageClass instance to a DatasetType instance something only a Butler/Registry/Datastore can do, in the same sense that a Registry is essentially necessary to add a dataset_id to a DatasetRef? I think in most user-facing contexts only requiring a string is clearly better, but I gather not having the StorageClass attached to the DatasetType would be a pretty big annoyance in the internals (which I think is confirmed by Tim Jenness 's comments above).  We could consider having a variant/subclass of DatasetType that adds that information, and in more a strongly-typed or rigorously OO language I think that'd be the way to go.  In Python it's less clear.
            Hide
            salnikov Andy Salnikov added a comment -

            I'm not really sure how to do that. We could make a class which "wraps" string-only-DatasetType with DatasetType-with-storageclass when DatasetType instances cross from a user land into a butler land, but that looks potentially fragile and high-maintenance. What Tim proposed (dynamic storageClass property) is probably better if you accept that it can behave unpredictably before you initialize Butler (which probably only matters for unit tests).

             

            Show
            salnikov Andy Salnikov added a comment - I'm not really sure how to do that. We could make a class which "wraps" string-only-DatasetType with DatasetType-with-storageclass when DatasetType instances cross from a user land into a butler land, but that looks potentially fragile and high-maintenance. What Tim proposed (dynamic storageClass property) is probably better if you accept that it can behave unpredictably before you initialize Butler (which probably only matters for unit tests).  
            Hide
            salnikov Andy Salnikov added a comment -

            Ready for review, the only non-trivial thing is that I had to re-implement _eq and hash_ methods for DatasetType. Jenkins has built is successfully for centos6/7, macos is still building.

            Show
            salnikov Andy Salnikov added a comment - Ready for review, the only non-trivial thing is that I had to re-implement _ eq and hash _ methods for DatasetType. Jenkins has built is successfully for centos6/7, macos is still building.
            Hide
            tjenness Tim Jenness added a comment -

            Looks good to me. Sorry for delay in reviewing.

            Show
            tjenness Tim Jenness added a comment - Looks good to me. Sorry for delay in reviewing.
            Hide
            salnikov Andy Salnikov added a comment -

            Thanks for review! Merged all three packages.

            Show
            salnikov Andy Salnikov added a comment - Thanks for review! Merged all three packages.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Tim Jenness
              Watchers:
              Andy Salnikov, Jim Bosch, Tim Jenness, Vaikunth Thukral
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.