Data Management / DM-32826

Investigate metric variations in verify_drp_metrics after mid-November updates


Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Component: faro

    Description

      After we updated the rc2_subset nightly CI processing to inherit directly from obs_subaru's DRP.yaml pipeline, the values of the metrics on our dashboard started fluctuating roughly nightly. Figure out why they change so frequently (the pipelines themselves do not change nightly).

      Attachments

      Issue Links

      Activity

            jcarlin Jeffrey Carlin added a comment:

            Looking in /project/jenkins/prod/agent-ldfc-ws-?/ws/sqre/verify_drp_metrics/datasets/rc2_subset/SMALL_HSC/ for "jenkins/step3" repos (for example), I only see one from today's processing (Dec. 6). It doesn't seem that the artifacts from verify_drp_metrics processing have been retained for any other daily processing runs, so there are not two separate repositories to compare.

            Instead, I will try running the rc2_subset processing twice, placing the outputs in separate repos, and see if the processing itself is non-deterministic.

            jcarlin Jeffrey Carlin added a comment (edited):

            I ran the rc2_subset processing twice, using identical configurations. Indeed, the metric values come out slightly different (PA1: 6.985124378559791 mmag vs. PA1: 6.774156219385405 mmag). The attached image shows a comparison of the deepCoadd_calexp images for a randomly selected patch – the two separate processing runs are the left and middle panels, and the right panel shows the difference between those images. There are clear differences between the coadd images.

            The same is true of the deepCoadd images, implying that the relative calibrations are the same, and that the coadded images are intrinsically different.

            I compared two randomly selected visit images (calexps) of the same dataId, and they are identical. Thus the differences between runs must have occurred after single-frame processing.
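The image comparisons above can be sketched with plain NumPy. This is only an illustration: the arrays below stand in for pixel data that would really be read from the two butler repositories, and their shapes and values are made up.

```python
import numpy as np

# Stand-ins for the pixel data of the same dataId from two processing runs;
# in practice these would be fetched from the two separate butler repos.
rng = np.random.default_rng(0)
calexp_run1 = rng.normal(size=(50, 50))
calexp_run2 = calexp_run1.copy()  # single-frame products came out identical

# The coadds differed slightly between runs; model that with a tiny perturbation.
coadd_run1 = rng.normal(size=(50, 50))
coadd_run2 = coadd_run1 + rng.normal(scale=1e-8, size=(50, 50))

print(np.array_equal(calexp_run1, calexp_run2))  # True: calexps match exactly
diff = coadd_run2 - coadd_run1                   # the "difference image" panel
print(np.array_equal(coadd_run1, coadd_run2))    # False: coadds differ
```

The point of using `np.array_equal` rather than `np.allclose` is that deterministic reprocessing should reproduce pixels bitwise, so any nonzero difference image is a real discrepancy.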

            Another suggestion was that the random number seed for FGCM may not have been fixed, so that FGCM would return different results for consecutive runs. I confirmed that randomSeed in both fgcmFitCycle configs is the same (89234, as in obs_subaru/DRP.yaml).

            I checked the FGCM calibrations for the two runs (for the same dataId), and they differ slightly, so my guess is that FGCM is deriving slightly different solutions based on the ordering of inputs.
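That guess is consistent with basic floating-point behavior: addition is not associative, so an iterative fit that accumulates identical inputs in a different order can converge to a slightly different solution. A minimal, library-free illustration (the seed below reuses the value quoted above purely for flavor; it has nothing to do with the actual FGCM config):

```python
# Floating-point addition is not associative, so reordering identical
# inputs can change an accumulated result in the least-significant bits.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False: the two groupings round differently

# The same effect at scale: summing the same values in a different order.
import random

random.seed(89234)  # the seed value quoted above, reused only for illustration
vals = [random.gauss(0.0, 1000.0) for _ in range(100_000)]
shuffled = vals[:]
random.shuffle(shuffled)
s1, s2 = sum(vals), sum(shuffled)
print(abs(s1 - s2))  # typically a tiny nonzero difference from identical inputs
```

So even with the random seed pinned, unordered inputs are enough to perturb the fit at this level.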


            jcarlin Jeffrey Carlin added a comment:

            Here's an illustration of the level of fluctuation we're seeing in the metrics:


            jcarlin Jeffrey Carlin added a comment:

            In this Slack thread, we determined that the differences in the FGCM calibs arise because the entries in sourceTable_visit are not sorted, and thus are not identical from one run to the next (with all configs the same). (The measurements in the tables are identical; they are just sorted differently.) DM-33158 has been created to address this.
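What the fix amounts to can be sketched with pandas. The column names here are invented for illustration, not the actual sourceTable_visit schema: the idea is simply to sort on a stable key so consecutive runs hand downstream tasks identical row order.

```python
import pandas as pd

# Two runs of the same processing: identical measurements, different row order.
run1 = pd.DataFrame({"sourceId": [3, 1, 2], "psfFlux": [9.0, 7.0, 8.0]})
run2 = pd.DataFrame({"sourceId": [1, 2, 3], "psfFlux": [7.0, 8.0, 9.0]})

# Compared row-for-row, the unsorted tables are not equal...
print(run1.equals(run2))  # False

# ...but sorting on a stable key makes them identical, so any order-sensitive
# downstream consumer (such as a fit) sees the same inputs every run.
canon1 = run1.sort_values("sourceId", ignore_index=True)
canon2 = run2.sort_values("sourceId", ignore_index=True)
print(canon1.equals(canon2))  # True
```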


            jcarlin Jeffrey Carlin added a comment:

            After DM-33158, I tested by running the same pipeline+configs twice in a row. The results from FGCM (and coaddition, faro, etc.) are identical, so I think this can be considered "fixed."

            Once Jenkins is back on its feet, we can confirm that the verify_drp_metrics dashboard gives consistent results from day to day.
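The determinism check itself is simple: once processing is deterministic, back-to-back runs should agree bitwise, so exact equality (not a floating-point tolerance) is the right test. A minimal sketch, where the only values taken from this ticket are the two pre-fix PA1 measurements:

```python
def compare_metrics(run_a, run_b):
    """Return the metric names whose values differ bitwise between two runs."""
    keys = set(run_a) | set(run_b)
    return sorted(k for k in keys if run_a.get(k) != run_b.get(k))

# Pre-fix behavior: the two PA1 values quoted earlier in this ticket.
before = compare_metrics({"PA1": 6.985124378559791}, {"PA1": 6.774156219385405})
print(before)  # ['PA1']

# Post-fix behavior: identical runs, no differences.
after = compare_metrics({"PA1": 6.985124378559791}, {"PA1": 6.985124378559791})
print(after)  # []
```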

            jcarlin Jeffrey Carlin added a comment (edited):

            We now have 3 days in a row of metrics coming out the same (see attached image). I think we can call this a success!

            lguy Leanne Guy added a comment:

            Excellent


            People

              Assignee: Jeffrey Carlin
              Reporter: Jeffrey Carlin
              Reviewers: Leanne Guy
              Watchers (2): Jeffrey Carlin, Leanne Guy


                Jenkins

                  No builds found.