Data Management / DM-28527

Bad results (and unexpected slowness) from query-datasets

    Details

    • Story Points:
      1
    • Epic Link:
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      This query incorrectly returns no results:

      $ butler query-datasets /project/hsc/gen3repo/rc2w02_ssw03 bfKernel --collections='*' 

      even though this query succeeds:

      $ butler query-datasets /project/hsc/gen3repo/rc2w02_ssw03 'bfKernel' --collections=HSC/calib/unbounded
      

      The latter query also felt much slower than it should be. We should at least profile it.

      Something seems to be going wrong with the logic that attempts to query all RUN collections (and only RUN collections) when the collections are unconstrained.
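
      For illustration only, the intended behavior described above (search every RUN collection, and nothing else, when the collections are unconstrained) might be sketched as follows. `CollectionType`, `Collection`, and `collections_to_search` are hypothetical stand-ins written for this example, not daf_butler's API, and the collection names and types in the usage note are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from fnmatch import fnmatchcase


class CollectionType(Enum):
    """Hypothetical stand-in for the kinds of collection a repo may hold."""

    RUN = auto()
    TAGGED = auto()
    CHAINED = auto()
    CALIBRATION = auto()


@dataclass(frozen=True)
class Collection:
    """Hypothetical record pairing a collection name with its type."""

    name: str
    type: CollectionType


def collections_to_search(pattern, known):
    """Return the names of the collections a query should search.

    Assumption: a pattern of "*" means "unconstrained", in which case
    only RUN collections are searched. Any other pattern is matched
    against collection names regardless of type.
    """
    unconstrained = pattern == "*"
    selected = []
    for collection in known:
        if unconstrained:
            # Unconstrained: keep RUN collections and skip everything else.
            if collection.type is CollectionType.RUN:
                selected.append(collection.name)
        elif fnmatchcase(collection.name, pattern):
            # Constrained: keep whatever the user's pattern names.
            selected.append(collection.name)
    return selected
```

      With an assumed repo containing a RUN collection `HSC/calib/unbounded` and a CHAINED collection `HSC/defaults`, both `collections_to_search("*", known)` and `collections_to_search("HSC/calib/unbounded", known)` should include `HSC/calib/unbounded`, which is the consistency the two commands above fail to show.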

          Activity

          jbosch Jim Bosch added a comment -

          Nate Pease [X], sorry about hitting you up for back-to-back reviews, but this one is also small and even more in "your" part of daf_butler, so I'd like to make sure this isn't going in what you'd consider the wrong direction.

          See the (only) commit message re what the problem was and why I fixed it this way (also the PR description).

          As for the performance aspect of the ticket description, I did some profiling, and the runtime is totally dominated by butler startup costs (Python imports and aggressive fetching from the DB in particular). So while that's not great, and something for us to look out for, it's not easily fixed and hence not something I'm going to address on this ticket.
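
          A minimal sketch of how a profile like this could be gathered with only the Python standard library. `profile_call` and `startup_like_work` are hypothetical names for this example; the latter is a stand-in for real startup work, and this is not necessarily how the profiling described above was done.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time, which
    makes expensive imports and setup calls easy to spot.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def startup_like_work():
    """Hypothetical stand-in for startup: an import plus some computation."""
    import json  # imports executed inside the call are profiled too

    return json.dumps({"n": sum(range(1000))})


if __name__ == "__main__":
    _, report = profile_call(startup_like_work)
    print(report)
```

          For import costs specifically, CPython's `-X importtime` option (Python 3.7 and later) prints a per-module breakdown of import time to stderr, which complements the function-level view above.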

          npease Nate Pease [X] (Inactive) added a comment -

          no problem re. reviews.

          the logic seems fine. There's a small change you can make to simplify the code, noted in the PR.


            People

            Assignee:
            jbosch Jim Bosch
            Reporter:
            jbosch Jim Bosch
            Reviewers:
            Nate Pease [X] (Inactive)
            Watchers:
            Jim Bosch, Nate Pease [X] (Inactive)
