Data Management / DM-34556

Use DP0.2 to test loading large numbers of metrics

    Details

    • Type: Story
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: faro
    • Labels:
      None
    • Team:
      DM Science
    • Urgent?:
      No

      Description

      Use DP0.2 as a testbed to study the performance of querying many thousands of metric values persisted as lsst.verify.Measurement objects, and start to think about the workflows needed to compile summary statistics and correlate them with metadata.

       

            Activity

            tjenness Tim Jenness added a comment -

            I've just been told that people are waiting on me to change butler to use JSON rather than YAML. Is that correct? I did not think I was working on DM-31617. The questions I had on that ticket relate to the verify package which is not middleware.

            lguy Leanne Guy added a comment -

            I added this as a suggested topic to the DMLT VF2F next week to get a status update on it. I was not necessarily thinking it was you doing it.

            jsick Jonathan Sick added a comment -

            I saw this referenced from the DMLT f2f and I was curious about how lsst.verify.Measurement is related to YAML storage. Traditionally metric and specification definitions were stored in human-editable YAML (https://github.com/lsst/verify_metrics) and measurements were serialized out to JSON. Are measurements being written out to YAML now?

            Another thing, if the slow-down is happening while creating measurements, I just want to highlight that lsst.verify.Measurement doesn’t need a loaded Metric instance in order to create a measurement. Instead, we envisioned that measurement code would pass the metric’s name as a string and only later analysis code would load a Metric instance from the YAML repos. See https://pipelines.lsst.io/py-api/lsst.verify.Measurement.html#lsst.verify.Measurement

            In other words, a serialized Measurement probably shouldn’t (or at least, shouldn’t have to) have the full Metric in its data.

            Of course, I’m not up to date on how lsst.verify is being used now, but I just wanted to highlight this in case we’re missing a basic optimization.
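To make the point about keeping the full Metric out of the serialized data concrete, here is a minimal standard-library sketch. It is not the lsst.verify API: `MeasurementRecord`, `load_metric_definition`, the metric name, the numeric value, and the definitions dictionary are all hypothetical stand-ins chosen for illustration. The record carries only the metric's name as a string, and the definition is looked up later, only when analysis code asks for it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MeasurementRecord:
    """Hypothetical stand-in for a measurement; NOT lsst.verify.Measurement."""

    metric_name: str  # the metric's name only, never the full definition
    value: float
    unit: str

    def to_json(self) -> str:
        # Serialize just the lightweight fields.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MeasurementRecord":
        return cls(**json.loads(text))


def load_metric_definition(name: str, definitions: dict) -> dict:
    """Resolve the full definition lazily, at analysis time (hypothetical helper)."""
    return definitions[name]


# Illustrative values only; nothing here comes from a real metrics repository.
definitions = {
    "example_package.ExampleMetric": {"description": "An example metric.", "unit": "mmag"}
}

record = MeasurementRecord("example_package.ExampleMetric", 12.5, "mmag")
payload = record.to_json()
restored = MeasurementRecord.from_json(payload)
definition = load_metric_definition(restored.metric_name, definitions)
```

With this layout the serialized payload stays small regardless of how verbose the metric definition is, because the definition lives in one place and is joined by name only when needed.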

            tjenness Tim Jenness added a comment -

            I have some notes written up on DMTN-203 – DM-31599

            Writing metrics to YAML was a mistake caused by it being the easiest possible approach to get a quick butler test going. It wasn't meant to be the end game or even end up in production that way. The problem is that Measurement does not follow any of the conventions for serialization and reconstruction supported by the JSON formatter (which supports some different approaches, including pydantic) and so could not be used directly. This is discussed in DM-31617. We either need a specialist MeasurementFormatter or need to tweak the Measurement API to match something like pydantic.

            tjenness Tim Jenness added a comment -

            I've also just realized that the timing test at the start of this ticket is not optimal. If you have queryDatasets results you should not then call butler.get because butler will take the ref and expand it and do another query to check that it's consistent. You need to use butler.getDirect to bypass the registry.
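The cost difference described above can be sketched with a toy model. This is not butler code: `ToyRef`, `ToyRegistry`, `ToyButler`, and `count_queries` are hypothetical classes and helpers that only count how often a registry is consulted. In the toy, `get` re-checks each ref against the registry before reading, standing in for what butler.get is described as doing, while `get_direct` goes straight to the stored value, standing in for butler.getDirect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical resolved dataset reference."""

    dataset_id: int
    metric_name: str


class ToyRegistry:
    """Hypothetical registry that counts every query it serves."""

    def __init__(self, refs):
        self._refs = {ref.dataset_id: ref for ref in refs}
        self.query_count = 0

    def query_datasets(self):
        self.query_count += 1
        return list(self._refs.values())

    def verify(self, ref):
        # A consistency check costs one more registry query per ref.
        self.query_count += 1
        return self._refs.get(ref.dataset_id) == ref


class ToyButler:
    """Hypothetical butler with a checked and an unchecked read path."""

    def __init__(self, registry, datastore):
        self.registry = registry
        self.datastore = datastore

    def get(self, ref):
        if not self.registry.verify(ref):
            raise LookupError(f"inconsistent ref: {ref}")
        return self.datastore[ref.dataset_id]

    def get_direct(self, ref):
        # The ref is already resolved, so skip the registry entirely.
        return self.datastore[ref.dataset_id]


def count_queries(use_direct, n=1000):
    """Return (registry queries, values read) for n query results."""
    refs = [ToyRef(i, f"metric_{i}") for i in range(n)]
    registry = ToyRegistry(refs)
    datastore = {ref.dataset_id: float(ref.dataset_id) for ref in refs}
    butler = ToyButler(registry, datastore)
    fetch = butler.get_direct if use_direct else butler.get
    values = [fetch(ref) for ref in registry.query_datasets()]
    return registry.query_count, len(values)
```

In this model the checked path costs one registry query per dataset on top of the initial query, so the overhead grows linearly with the number of metric values, whereas the direct path issues only the initial query.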


              People

              Assignee:
              pferguson3 Peter Ferguson
              Reporter:
              pferguson3 Peter Ferguson
              Watchers:
              Colin Slater, Jonathan Sick, Keith Bechtol, Lauren MacArthur, Leanne Guy, Peter Ferguson, Tim Jenness