  1. Data Management
  2. DM-35803

Add DataFrameDelegate for using DataFrames with InMemoryDatasetHandle

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: daf_butler
    • Labels:
      None

      Description

      The new InMemoryDatasetHandle needs a delegate to be able to read columns from a DataFrame.

      This is straightforward to add for simple use cases (specifically, selecting columns that are specified as a list of strings). The full Parquet formatter also supports various index tuples. I am leaving that support for the future, if it becomes necessary; in the meantime, such requests will raise a NotImplementedError.
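
As an illustration of the behaviour described above, here is a minimal sketch of column selection by a list of strings, with anything else rejected. It is not the daf_butler implementation: `ToyFrame`, `ToyDataFrameDelegate`, `handle_parameters`, and the `"columns"` parameter key are hypothetical names chosen for this example, and `ToyFrame` only stands in for a real DataFrame so that the sketch has no third-party dependencies.

```python
from typing import Any, Mapping, Optional


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: named columns of values."""

    def __init__(self, data: Mapping[str, list]):
        self._data = dict(data)

    @property
    def columns(self) -> list:
        return list(self._data)

    def __getitem__(self, names: list) -> "ToyFrame":
        # Selecting with a list of names returns a new frame with those columns.
        return ToyFrame({name: self._data[name] for name in names})


class ToyDataFrameDelegate:
    """Illustrative delegate (assumed interface, not the real one)."""

    def handle_parameters(
        self, frame: ToyFrame, parameters: Optional[Mapping[str, Any]] = None
    ) -> ToyFrame:
        # No parameters, or no column request: return the frame unchanged.
        if not parameters or "columns" not in parameters:
            return frame

        columns = parameters["columns"]
        # Only the simple case is supported: a list of string column names.
        # Index tuples and other selectors are deliberately not handled.
        if not isinstance(columns, list) or not all(
            isinstance(c, str) for c in columns
        ):
            raise NotImplementedError(
                "Only a list of string column names is supported."
            )
        return frame[columns]


frame = ToyFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
subset = ToyDataFrameDelegate().handle_parameters(frame, {"columns": ["a", "c"]})
print(subset.columns)
```

Raising NotImplementedError for unsupported selectors makes the gap explicit to callers instead of silently returning the wrong columns.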

      When this is done, it can be used in tests such as https://github.com/lsst/pipe_tasks/blob/main/tests/test_isolatedStarAssociation.py

          Activity

          tjenness Tim Jenness added a comment -

          These changes look fine to me although I don't like all the code duplication between formatter tests and delegate tests.

          I think you might get the same test coverage in a simpler way if your delegate test case is a class that inherits from the formatter one but has a different setUp that configures an in-memory datastore rather than a file datastore. This should trigger all the delegate code (which can be checked in the code coverage). You can then have additional test code for the error conditions and the storage class finding.
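
The inheritance pattern suggested above could look roughly like the following sketch. Everything here is a toy: `FileLikeStore`, `InMemoryStore`, and `select` are hypothetical stand-ins for the file datastore, the in-memory datastore, and column selection, and the test names are invented for the example. The point is only that the subclass overrides `setUp`, inherits every shared test, and adds its own error-condition tests.

```python
import json
import unittest


def select(table, columns=None):
    """Return a copy of the table restricted to the requested columns."""
    if columns is None:
        return dict(table)
    return {name: table[name] for name in columns}


class FileLikeStore:
    """Hypothetical file-backed store: serializes on put, parses on get."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, table):
        self._blobs[name] = json.dumps(table)

    def get(self, name, columns=None):
        return select(json.loads(self._blobs[name]), columns)


class InMemoryStore:
    """Hypothetical in-memory store: keeps the object and subsets on get."""

    def __init__(self):
        self._objects = {}

    def put(self, name, table):
        self._objects[name] = table

    def get(self, name, columns=None):
        return select(self._objects[name], columns)


class FormatterTestCase(unittest.TestCase):
    """Shared tests, run against the file-like store."""

    def setUp(self):
        self.store = FileLikeStore()

    def test_round_trip(self):
        table = {"a": [1, 2], "b": [3, 4]}
        self.store.put("t", table)
        self.assertEqual(self.store.get("t"), table)

    def test_column_subset(self):
        self.store.put("t", {"a": [1, 2], "b": [3, 4]})
        self.assertEqual(self.store.get("t", columns=["b"]), {"b": [3, 4]})


class DelegateTestCase(FormatterTestCase):
    """Inherits every shared test; only the store configuration differs."""

    def setUp(self):
        self.store = InMemoryStore()

    def test_unknown_column_raises(self):
        # Extra error-condition test that exists only for the in-memory path.
        self.store.put("t", {"a": [1]})
        with self.assertRaises(KeyError):
            self.store.get("t", columns=["missing"])
```

Saved as a test module and run with `python -m unittest`, this executes the two shared tests once per store plus the one additional test, with no duplicated test bodies.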

          Regarding the findStorageClass testing: the code short-circuits if the DataFrame storage class Python type has already been loaded (which will have happened in the other tests), so the test is likely not testing what you think it is, because compare_types has no effect if the type is already known. You may need to change the test to first get the DataFrame storage class from the factory and then force its pytype to be None.
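
The short-circuit described above can be reproduced with a deliberately simplified model. This is a sketch based only on the comment's description, not the daf_butler code: `ToyStorageClass`, its `_pytype` cache, `is_type`, and the `import_count` counter are assumptions made for illustration, and `collections.OrderedDict` stands in for the DataFrame type.

```python
import importlib
from collections import OrderedDict


class ToyStorageClass:
    """Simplified model of a storage class with a lazily imported type."""

    def __init__(self, name, pytype_name):
        self.name = name
        self._pytype_name = pytype_name
        self._pytype = None  # cache, filled on first access to .pytype
        self.import_count = 0  # counts how often the lazy import ran

    @property
    def pytype(self):
        if self._pytype is None:
            module_name, _, attr = self._pytype_name.rpartition(".")
            self._pytype = getattr(importlib.import_module(module_name), attr)
            self.import_count += 1
        return self._pytype

    def is_type(self, other, compare_types=False):
        if self._pytype is not None:
            # Short circuit: the type is already known, so compare_types
            # makes no difference to the result or to the code path taken.
            return self._pytype is other
        if compare_types:
            # Only reached when the type has not been loaded yet.
            return self.pytype is other
        return False


storage_class = ToyStorageClass("DataFrame", "collections.OrderedDict")

# Earlier tests have already touched the type, so the cache is populated.
storage_class.pytype
storage_class.is_type(OrderedDict, compare_types=True)
print(storage_class.import_count)  # still 1: the compare_types branch never ran

# Forcing the cached type back to None makes the test exercise that branch.
storage_class._pytype = None
storage_class.is_type(OrderedDict, compare_types=True)
print(storage_class.import_count)  # now 2
```

In this model, a test that passes `compare_types=True` while the cache is populated succeeds without ever reaching the branch it is meant to cover, which is why resetting the cached type first matters.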

            People

            Assignee:
            erykoff Eli Rykoff
            Reporter:
            erykoff Eli Rykoff
            Reviewers:
            Tim Jenness
            Watchers:
            Eli Rykoff, Tim Jenness