Fix Version/s: None
After we updated the rc2_subset nightly CI processing to inherit directly from obs_subaru's DRP.yaml pipeline, the values of the metrics on our dashboard started fluctuating roughly nightly. Figure out why they change so frequently (the pipelines themselves do not change nightly).
I ran the rc2_subset processing twice with identical configurations. Indeed, the metric values come out slightly different (PA1: 6.985124378559791 mmag vs. 6.774156219385405 mmag). The attached image compares the deepCoadd_calexp images for a randomly selected patch: the two processing runs are shown in the left and middle panels, and the right panel shows the difference between them. There are clear differences between the coadd images.
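The pixel-level comparison behind the difference panel can be sketched as below. The arrays here are synthetic stand-ins for the two runs' deepCoadd_calexp pixels (in practice the pixel data would be fetched from the two output repos), and compare_images is a hypothetical helper, not a stack function:

```python
import numpy as np

def compare_images(im1, im2, atol=0.0):
    """Return (max |difference|, whether the images agree to within atol)."""
    diff = np.abs(np.asarray(im1) - np.asarray(im2))
    return diff.max(), bool(diff.max() <= atol)

# Synthetic stand-ins for the two runs' coadd pixels.
rng = np.random.default_rng(42)
run_a = rng.normal(size=(10, 10))
run_b = run_a + rng.normal(scale=1e-3, size=(10, 10))  # small run-to-run noise

max_diff, identical = compare_images(run_a, run_a)
print(identical)  # True: a run compared with itself
max_diff, identical = compare_images(run_a, run_b)
print(identical)  # False: the two "runs" differ at the 1e-3 level
```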
The same is true of the deepCoadd images, implying that the relative calibrations are the same, and that the coadded images are intrinsically different.
I compared two randomly selected visit images (calexps) with the same dataId, and they are identical. Thus the differences between runs must arise after single-frame processing.
Another suggestion was that the random number seed for FGCM may not have been fixed, in which case FGCM would return different results on consecutive runs. I confirmed that randomSeed in the fgcmFitCycle configs of both runs is the same (89234, as set in obs_subaru/DRP.yaml).
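For reference, pinning the seed in a pipeline definition looks roughly like this; the task label and class path are illustrative and may differ between stack versions, but the randomSeed value matches what both runs used:

```yaml
tasks:
  fgcmFitCycle:
    # Class path is illustrative; check your stack version.
    class: lsst.fgcmcal.FgcmFitCycleTask
    config:
      randomSeed: 89234
```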
I checked the FGCM calibrations for the two runs (for the same dataId), and they differ slightly, so my guess is that FGCM derives slightly different solutions depending on the ordering of its inputs.
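Ordering sensitivity is plausible because floating-point arithmetic is not associative: any reduction inside the fit (sums, least-squares accumulations) can change at the rounding level when its inputs arrive in a different order. A minimal, self-contained illustration:

```python
# The same three numbers summed in two different orders.
print(sum([1e16, 1.0, -1e16]))  # 0.0: the 1.0 is absorbed into 1e16 and lost
print(sum([1e16, -1e16, 1.0]))  # 1.0: cancellation happens first, so 1.0 survives
```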
Here's an illustration (attached) of the level of fluctuation we're seeing in the metrics:
In this Slack thread, we determined that the differences in the FGCM calibrations arise because the entries in sourceTable_visit are not sorted, and thus are not identical from one run to the next (even with all configs the same). (The measurements in the tables are identical; they are just ordered differently.)
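The diagnosis can be demonstrated with toy tables standing in for the two runs' sourceTable_visit (the column names here are hypothetical): a row-by-row comparison fails, but comparing after sorting on a stable key succeeds, confirming that the measurements themselves agree:

```python
import pandas as pd

# Toy stand-ins for two runs' sourceTable_visit: same rows, different order.
run_a = pd.DataFrame({"sourceId": [3, 1, 2], "flux": [30.0, 10.0, 20.0]})
run_b = pd.DataFrame({"sourceId": [1, 2, 3], "flux": [10.0, 20.0, 30.0]})

print(run_a.equals(run_b))  # False: row order differs

def canon(df):
    """Canonical ordering: sort on a stable key, discard the old index."""
    return df.sort_values("sourceId").reset_index(drop=True)

print(canon(run_a).equals(canon(run_b)))  # True: identical once sorted
```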
DM-33158 has been created to address this.
On DM-33158, I tested this by running the same pipeline and configs twice in a row. The results from FGCM (and coaddition, faro, etc.) are identical, so I think this can be considered "fixed."
Once Jenkins is back on its feet, we can confirm that the verify_drp_metrics dashboard gives consistent results from day to day.
Looking in /project/jenkins/prod/agent-ldfc-ws-?/ws/sqre/verify_drp_metrics/datasets/rc2_subset/SMALL_HSC/ for "jenkins/step3" repos (for example), I only see the one from today's processing (Dec. 6). The artifacts from the verify_drp_metrics processing do not seem to have been retained for any other daily run, so there are not two separate repositories to compare.
Instead, I will try running the rc2_subset processing twice, placing the outputs in separate repos, to see whether the processing itself is non-deterministic.