Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-27008

Update Functor implementation to use gen3 butler/parquet access

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      8
    • Epic Link:
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Gen3 parquet access is fundamentally different from gen2.  In gen2, a butler.get  call on a parquet dataset call returned a lazy ParquetTable object that doesn't load any data until requested by .getDataFrame at which point specific columns can be requested.  However, in gen3, a butler.get call returns data directly, and specific columns are requested at this time.  

      The Functor implementation currently in pipe_tasks is based on the gen2 ParquetTable object, as it takes one of those as the input to its _call_ method.  Updating for gen3 butler will need to change this.  

      A related concern is that one of the reasons for the gen2 ParquetTable implementation to be the way it was was to not have to read parquet metadata with every data load.  Part of this ticket should be to explore the degree to which this metadata loading affects practical use cases, and if anything needs to be changed in the gen3 parquet-interaction layer.

        Attachments

          Issue Links

            Activity

            Hide
            tmorton Tim Morton [X] (Inactive) added a comment -

            Yes, good idea. I've added a test that catches the issue (and, incidentally, another bug that also got caught by the same test!) in the latest PR (jenkins running again at https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/33302/pipeline).

            Show
            tmorton Tim Morton [X] (Inactive) added a comment - Yes, good idea. I've added a test that catches the issue (and, incidentally, another bug that also got caught by the same test!) in the latest PR (jenkins running again at https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/33302/pipeline ).
            Hide
            yusra Yusra AlSayyad added a comment -

            That's great you added a unit test.

            To Lauren MacArthur's point, there is test coverage for commandline|pipelineTasks: ci_hsc. We are expected to run ci_hsc in Jenkins (esp for tickets that affect DRP) before merging.

            Show
            yusra Yusra AlSayyad added a comment - That's great you added a unit test. To Lauren MacArthur 's point, there is test coverage for commandline|pipelineTasks: ci_hsc . We are expected to run ci_hsc in Jenkins (esp for tickets that affect DRP) before merging.
            Hide
            tmorton Tim Morton [X] (Inactive) added a comment -

            Thanks, Yusra AlSayyad. I'll remember to add ci_hsc to Jenkins in the future. I don't know if Lauren MacArthur is available today or not, but if either of you is able to mark this reviewed, then I can merge today.

            Show
            tmorton Tim Morton [X] (Inactive) added a comment - Thanks, Yusra AlSayyad . I'll remember to add ci_hsc to Jenkins in the future. I don't know if Lauren MacArthur is available today or not, but if either of you is able to mark this reviewed, then I can merge today.
            Hide
            lauren Lauren MacArthur added a comment -

            I don't have time in the next few days to actually look over the code, so any review today would simply consist of me confirming that I saw the successful jenkins + ci_hsc.  I'm not entirely sure that's adequate, but might be given the previous review?

            Show
            lauren Lauren MacArthur added a comment - I don't have time in the next few days to actually look over the code, so any review today would simply consist of me confirming that I saw the successful jenkins + ci_hsc .  I'm not entirely sure that's adequate, but might be given the previous review?
            Hide
            tmorton Tim Morton [X] (Inactive) added a comment -

            I would guess so? Since Sophie Reed already reviewed the first time, and the re-review was just because ci_hsc was failing, I would guess that's ok?

            Show
            tmorton Tim Morton [X] (Inactive) added a comment - I would guess so? Since Sophie Reed already reviewed the first time, and the re-review was just because ci_hsc was failing, I would guess that's ok?

              People

              Assignee:
              tmorton Tim Morton [X] (Inactive)
              Reporter:
              tmorton Tim Morton [X] (Inactive)
              Reviewers:
              Lauren MacArthur
              Watchers:
              Jim Bosch, Kian-Tat Lim, Lauren MacArthur, Sophie Reed, Tim Jenness, Tim Morton [X] (Inactive), Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.