Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-21448

Clean up DatasetRef comparisons and immutability

    Details

      Description

      (Previous title was "Refactor DatasetRef into multiple better-defined classes", but now we're changing DatasetRef).

      The current DatasetRef has too many fields (provenance stuff we never use), too many states (whether or not it has a dataset_id completely changes what it represents), and ill-defined comparisons.  It's almost-but-not-quite immutable, which makes it tempting but a bit dangerous to use as the key type in a set or dict.

       

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment - - edited

            I am returning to work on this now, but plan to lower ambitions significantly:

            • DatasetRef will be retained, and there will be no new classes in this area.
            • DatasetRef equality will be defined as a comparison of the dataset type, data ID, and dataset_id; this will make "resolved" DatasetRefs (which know their dataset_id) compare more precisely to each other, and always compare as unequal to unresolved DatasetRefs.
            • Add resolved() and unresolved() methods to add and remove ID, run, and component information (returning new objects).
            • Remove the reference to quantum from DatasetRef (we don't use this at all right now, and I don't plan to use it in the future).
            • Make DatasetRef truly immutable (or at least fixing the attributes used in comparisons; maybe not components), instead of letting some classes (i.e. Registry) modify it in-place.
            • Make DatasetRef hashable.

             

            Show
            jbosch Jim Bosch added a comment - - edited I am returning to work on this now, but plan to lower ambitions significantly: DatasetRef will be retained, and there will be no new classes in this area. DatasetRef equality will be defined as a comparison of the dataset type, data ID, and dataset_id; this will make "resolved" DatasetRefs (which know their dataset_id) compare more precisely to each other, and always compare as unequal to unresolved DatasetRefs. Add resolved() and unresolved() methods to add and remove ID, run, and component information (returning new objects). Remove the reference to quantum from DatasetRef (we don't use this at all right now, and I don't plan to use it in the future). Make DatasetRef truly immutable (or at least fixing the attributes used in comparisons; maybe not components), instead of letting some classes (i.e. Registry) modify it in-place. Make DatasetRef hashable.  
            Hide
            jbosch Jim Bosch added a comment -

            After much experimentation, I've decided not to change very much here; having different classes to distinguish between different states doesn't help very much when all of the typing is dynamic anyway.  Tim Jenness, mind reviewing?

            PR is https://github.com/lsst/daf_butler/pull/210

             

            Show
            jbosch Jim Bosch added a comment - After much experimentation, I've decided not to change very much here; having different classes to distinguish between different states doesn't help very much when all of the typing is dynamic anyway.  Tim Jenness , mind reviewing? PR is https://github.com/lsst/daf_butler/pull/210  
            Hide
            tjenness Tim Jenness added a comment -

            Looks good to me. I have one fairly substantive comment on the PR concerning components.

            Show
            tjenness Tim Jenness added a comment - Looks good to me. I have one fairly substantive comment on the PR concerning components.
            Hide
            jbosch Jim Bosch added a comment -

            I've added a PR for ctrl_mpexec, where setting components to None for unresolved DatasetRefs broke some code (noticed in a ci_hsc_gen3 failure).

            PR is here: https://github.com/lsst/ctrl_mpexec/pull/37

            The change is trivial (once you understand the context, at least) so I'll merge once Jenkins passes, but I'd love to get a sign-off on that PR, too.

            Show
            jbosch Jim Bosch added a comment - I've added a PR for ctrl_mpexec, where setting components to None for unresolved DatasetRefs broke some code (noticed in a ci_hsc_gen3 failure). PR is here: https://github.com/lsst/ctrl_mpexec/pull/37 The change is trivial (once you understand the context, at least) so I'll merge once Jenkins passes, but I'd love to get a sign-off on that PR, too.

              People

              • Assignee:
                jbosch Jim Bosch
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Tim Jenness
                Watchers:
                Jim Bosch, Tim Jenness
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel