Data Management / DM-34896

Collate timing metrics from run on 4-patch >200 visit DC2 dataset

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Story Points: 2
    • Sprint: AP F22-1 (June)
    • Team: Alert Production
    • Urgent?: No

      Description

      Collate the timing output of a run on the large (4 patches with >200 visits) DC2 dataset, and compute some statistics on it (mean, median, stddev, min/max, top/bottom 5%).
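A minimal sketch of the statistics listed above, using only the Python standard library. The function name `summarize_timings` and the sample values are hypothetical; how the timing values are read out of the run is not shown here, and the top/bottom 5% are taken as the values at or beyond the 95th/5th percentile, which is one possible reading of the description.

```python
import statistics


def summarize_timings(seconds):
    """Summarize a collection of timing values (in seconds) for one metric."""
    values = sorted(float(s) for s in seconds)
    if len(values) < 2:
        raise ValueError("need at least two timing values")
    # quantiles(n=20) returns 19 cut points; the first and last are the
    # 5th and 95th percentiles.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    p05, p95 = cuts[0], cuts[-1]
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": values[0],
        "max": values[-1],
        "p05": p05,
        "p95": p95,
        "bottom_5pct": [v for v in values if v <= p05],
        "top_5pct": [v for v in values if v >= p95],
    }


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    for key, value in summarize_timings(sample).items():
        print(f"{key}: {value}")
```

Calling this once per timing metric yields one row per metric for a summary table.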

      In addition, we should look at how long `timing_diaPipe_associator` takes as a function of each visit's position in the processing sequence (which may not match the visit number if we cannot ensure an ordering). We can use the DiaObject records in the APDB to compute a per-visit nDiaSources value (mean, median, or max?) as a proxy for processing order, i.e., plot association time vs. number of objects.
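One way the per-visit proxy could be computed, as a sketch under stated assumptions: the input is taken to be `(visit, nDiaSources)` pairs already extracted from the DiaObject records, plus a mapping from visit to association time in seconds. How records are tied to a visit is an assumption here, and both function names are hypothetical. The reducer is left as a parameter because the choice between mean, median, and max is still open.

```python
import statistics
from collections import defaultdict


def ndiasources_per_visit(records, reducer=statistics.median):
    """Reduce (visit, nDiaSources) pairs to one value per visit."""
    grouped = defaultdict(list)
    for visit, n_sources in records:
        grouped[visit].append(n_sources)
    return {visit: reducer(counts) for visit, counts in grouped.items()}


def association_time_vs_nobjects(records, assoc_seconds, reducer=statistics.median):
    """Return (nDiaSources summary, association time) pairs sorted by the summary.

    Visits without a recorded association time are skipped.
    """
    summary = ndiasources_per_visit(records, reducer)
    pairs = [
        (summary[visit], assoc_seconds[visit])
        for visit in summary
        if visit in assoc_seconds
    ]
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up values for illustration only.
    records = [(101, 1), (101, 3), (102, 5), (102, 7), (102, 9)]
    assoc_seconds = {101: 0.5, 102: 1.5}
    for n_sources, seconds in association_time_vs_nobjects(records, assoc_seconds):
        print(n_sources, seconds)
```

The sorted pairs can be passed directly to a scatter plot of association time against number of objects; swapping `reducer` for `max` or `statistics.fmean` changes the proxy without touching the rest.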

            Activity

            mrawls Meredith Rawls added a comment -

            It is straightforward to collate timing metrics for all tasks, except for association at the level of detail desired. I'd like to pair-code with Eric Bellm to figure out how to pull this info out of the DiaObject table.

            mrawls Meredith Rawls added a comment -

            This and DM-34898 were both done in the same notebook, which is in the ap_pipe-notebooks repository on the tickets/DM-34896 branch.

            ebellm Eric Bellm added a comment -

            Hi Meredith Rawls, looks good. A couple of small-ish suggestions on the notebook:

            • I think it would be helpful to see the DIApipe vs. MJD processing time colored by detector-visit
            • I would also make the same plot for all of the existing metricvalue_ap_association_* timing metrics so we can (perhaps) isolate more clearly where the scaling is coming from--I'm hopeful it is just in the catalog loading step?
            • Can we estimate how many historical DIASources there are at each timestep?
            • Go ahead and add a few explanatory text cells throughout
            • Delete extraneous debugging cells that print long dictionaries
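For the question about historical DIASources at each timestep, a rough estimate could be a running total of sources from all earlier visits, ordered by MJD. This is a sketch only: the `(mjd, visit, n_new_sources)` tuples and the function name are hypothetical, and it assumes every earlier DIASource counts as history regardless of sky position.

```python
from itertools import accumulate


def historical_source_counts(visits):
    """Return (mjd, visit, n_prior_sources) for each visit, ordered by MJD.

    `visits` is an iterable of (mjd, visit, n_new_sources) tuples.
    n_prior_sources is the total from all earlier visits, so the first
    visit always has zero history.
    """
    ordered = sorted(visits)
    # Running total shifted by one so each visit sees only earlier sources.
    running = [0] + list(accumulate(n for _, _, n in ordered))
    return [
        (mjd, visit, prior)
        for (mjd, visit, _), prior in zip(ordered, running)
    ]


if __name__ == "__main__":
    # Made-up values for illustration only.
    visits = [(59001.2, "b", 30), (59000.1, "a", 10), (59002.5, "c", 5)]
    for row in historical_source_counts(visits):
        print(row)
```

Plotting the association timing metrics against `n_prior_sources` instead of MJD would show whether the scaling tracks the size of the loaded history.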
            mrawls Meredith Rawls added a comment -

            I made the requested notebook edits, and merged to main already (minor whoops). If you want additional edits, LMK, otherwise this should be all done. The sprint confluence page is also updated.

            ebellm Eric Bellm added a comment -

            I think that covers everything!


              People

              Assignee: Meredith Rawls (mrawls)
              Reporter: John Parejko (Parejkoj)
              Reviewers: Eric Bellm
              Watchers: Eric Bellm, John Parejko, Meredith Rawls
