DM-29820

Compare Gen2 vs. Gen3 fgcm photoCalibs up to w_2022_04 HSC-RC2 run


    Details

    • Story Points:
      8
    • Epic Link:
    • Sprint:
      DRP S21a (Dec Jan), DRP S21b, DRP S22A
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      As part of our journey towards deprecation of the "gen2" middleware in favor of "gen3", compare the fgcm external photoCalib objects from our first full RC2 run on both platforms (performed with the w_2021_14 stack).

        Attachments

          Issue Links

            Activity

            lauren Lauren MacArthur created issue -
            lauren Lauren MacArthur made changes -
            Field Original Value New Value
            Link This issue relates to DM-29819 [ DM-29819 ]
            lauren Lauren MacArthur made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            lauren Lauren MacArthur made changes -
            Watchers Eli Rykoff, Jim Bosch, Lauren MacArthur [ Eli Rykoff, Jim Bosch, Lauren MacArthur ] Eli Rykoff, Jim Bosch, Lauren MacArthur, Yusra AlSayyad [ Eli Rykoff, Jim Bosch, Lauren MacArthur, Yusra AlSayyad ]
            lauren Lauren MacArthur made changes -
            Attachment w14_RC2_gen2_vs_gen3_fgcm.txt [ 49113 ]
            lauren Lauren MacArthur added a comment - - edited

            To facilitate this comparison, I have "hacked-up" the compareVisitAnalysis.py script in pipe_analysis to be able to read in datasets from both gen2 and gen3 output repositories. As a first check on the fgcm photoCalib objects persisted by the two middlewares for a single visit (1228 COSMOS HSC-I), the following are plots directly comparing the gen2 vs. gen3 src catalogs having applied the associated fgcm results for each (having confirmed that the non-corrected ones compare identically and all non-photometric quantities compare identically):


            As can be seen, there are differences, albeit at a very low level. Are they too low to care about? The answer may hinge on how different the coadds look, so stay tuned for that!

            The attached file also includes direct and percent difference between the values returned by the getCalibrationMean(), getCalibrationErr(), and getInstFluxAtZeroMagnitude() functions on the respective photoCalib objects for each ccd in the visit.
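
The per-ccd comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code that produced the attached file: the function name `compare_values` and the sample numbers are hypothetical, and it assumes the quantity of interest (e.g. the result of getCalibrationMean()) has already been read from each photoCalib object into a plain dictionary keyed by ccd. Taking the percent difference relative to the gen2 value is also an assumption.

```python
def compare_values(gen2, gen3):
    """Return {ccd: (direct_diff, percent_diff)} for ccds present in both inputs.

    gen2 and gen3 map ccd id -> value (e.g. a calibration mean).
    The percent difference is taken relative to the gen2 value (an assumption).
    """
    result = {}
    for ccd in sorted(set(gen2) & set(gen3)):
        direct = gen3[ccd] - gen2[ccd]
        percent = 100.0 * direct / gen2[ccd] if gen2[ccd] != 0 else float("nan")
        result[ccd] = (direct, percent)
    return result


# Hypothetical values, for illustration only.
gen2_mean = {0: 2.0, 1: 4.0}
gen3_mean = {0: 2.5, 1: 4.0}
for ccd, (direct, percent) in compare_values(gen2_mean, gen3_mean).items():
    print(f"ccd {ccd}: diff={direct:.4g}, percent={percent:.4g}")
```

Restricting the loop to ccds present in both inputs keeps a ccd that failed in one middleware from raising an error; such ccds would need to be reported separately.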

            yusra Yusra AlSayyad made changes -
            Sprint DRP S21a (Dec Jan) [ 1071 ] DRP S21a (Dec Jan), DRP S21b [ 1071, 1094 ]
            lauren Lauren MacArthur made changes -
            Remote Link This issue links to "Page (Confluence)" [ 28352 ]
            lauren Lauren MacArthur made changes -
            Summary Compare the fgcm photoCalibs of the gen2 vs. gen3 w_2021_14 RC2 Compare the fgcm photoCalibs of the gen2 vs. gen3 w_2021_22 RC2
            lauren Lauren MacArthur added a comment -

            Given the pre-fgcm issues discovered in DM-29818 (and resolved in DM-29881 & DM-30030), I have updated the subject of this comparison to be based on the more recent w_2021_22 RC2 runs (rather than those of w_2021_14).

            lauren Lauren MacArthur made changes -
            Link This issue relates to DM-30647 [ DM-30647 ]
            lauren Lauren MacArthur added a comment - - edited

            We have now essentially reached parity between gen2 & gen3 SFM, with the small caveat that the gen3 w_2021_22 run had 17 failed quanta (see DM-30647 for full details).  As such, it's time to move on to the post-SFM comparisons...this being one of them!  Unfortunately, despite effective SFM parity, there is still a difference between the gen2 & gen3 fgcm results.  The same visit above shows similar levels of difference, but with a different pattern in the w_2021_22 runs (a WIDE visit comparison shows similar levels, as do visits in other bands):

            Seeing as I'm totally unfamiliar with the inner workings of fgcm, I fear I must reach out to the expert (singular!) on this one as to whether they have any insights into possible causes for (very small!) differences in the gen2 vs. gen3 fgcm solutions. So, here's looking at you Eli Rykoff! You've mentioned in the past that the fgcm is inherently non-deterministic, so parity in this case may not be achievable (although I thought there was a config setting turned on to make things deterministic for this parity testing). If such is the case (and it has been deemed an "acceptable" situation), then we may want to consider making coadds in both gen2 and gen3 but leaving out the fgcm calibrations so that we won't be second-guessing any downstream differences as "possibly" being rooted in residual fgcm effects.

            yusra Yusra AlSayyad made changes -
            Epic Link DM-29153 [ 458510 ] DM-30472 [ 509197 ]
            erykoff Eli Rykoff added a comment -

            Okay, a very quick comment here. To be clear, the (slightly) non-deterministic part of fgcm is that there is a randomized selection of reserved stars that are not used in the fits. We are forcing a random seed which will give repeatable results given identical inputs.

            In this case, I just checked and due to either processing failures or missing raws (the common refrain...) there were 432 visits in the Gen2 calibration and 415 visits in the Gen3 calibration. Therefore, even with the same random seed it is not reasonable to expect that the two fgcmcal runs will arrive at the same solution.
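
A mismatch in input visits like this can be checked with a simple set comparison. The sketch below is illustrative only: `compare_visit_lists` is a hypothetical helper, the visit numbers are made up, and it assumes the list of visits that went into each calibration has already been extracted from the two runs.

```python
def compare_visit_lists(gen2_visits, gen3_visits):
    """Return (only_in_gen2, only_in_gen3) as sorted lists."""
    gen2_set, gen3_set = set(gen2_visits), set(gen3_visits)
    return sorted(gen2_set - gen3_set), sorted(gen3_set - gen2_set)


# Hypothetical visit numbers, for illustration only.
only_gen2, only_gen3 = compare_visit_lists([101, 102, 103], [101, 103])
print("only in gen2:", only_gen2)
print("only in gen3:", only_gen3)
```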

            I do want to highlight that when we do get the processing/inputs squared away I am not 100% sure that the inputs will be given to fgcm in the same order, or that the input files will be sorted the same way (especially the parquet tables). Therefore, I'm not 100% sure that using the same random seed will result in the same selection of reserved stars.
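
The input-order concern can be illustrated with a generic seeded selection. This is a minimal sketch using Python's standard `random` module, not fgcm's actual reserve-star code; the function `select_reserved`, the reserve fraction, and the seed value are all hypothetical.

```python
import random


def select_reserved(star_ids, fraction, seed, sort_first=True):
    """Pick a reserved subset of star ids with a seeded generator.

    With sort_first=True the ids are sorted before sampling, so the
    selection does not depend on the order in which they were supplied.
    """
    ids = sorted(star_ids) if sort_first else list(star_ids)
    n_reserve = int(len(ids) * fraction)
    rng = random.Random(seed)
    return set(rng.sample(ids, n_reserve))


ids = list(range(100))
reordered = ids[::-1]  # same stars, different input order

# Sorting first makes the seeded selection independent of input order.
print(select_reserved(ids, 0.1, seed=12345) == select_reserved(reordered, 0.1, seed=12345))

# Without sorting, the same seed picks the same positions, which now
# hold different stars, so the two selections will generally differ.
print(select_reserved(ids, 0.1, seed=12345, sort_first=False)
      == select_reserved(reordered, 0.1, seed=12345, sort_first=False))
```

Sorting on a stable identifier before sampling is one way to make a fixed seed give the same selection regardless of how the inputs were ordered; whether that is practical for the fgcm inputs is a separate question.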

            erykoff Eli Rykoff added a comment -

            Oh, this is HSC, not something to do with missing raws...

            lauren Lauren MacArthur added a comment -

            Nope, but it is all about the abandoned visits due to single ccd failures (the 17 failed quanta noted on DM-30365).  It looks promising that these were all due to the HSM shape bug that has since been fixed, so the w_2021_26 run (happening now) will be free from this issue.  As such, I propose we punt once again until I’ve had a look at those results.

            lauren Lauren MacArthur made changes -
            Link This issue relates to DM-31701 [ DM-31701 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30472 [ 509197 ] DM-30540 [ 511198 ]
            yusra Yusra AlSayyad made changes -
            Sprint DRP S21a (Dec Jan), DRP S21b [ 1071, 1094 ] DRP S21a (Dec Jan), DRP S21b, DRP S22A [ 1071, 1094, 1137 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30540 [ 511198 ] DM-30547 [ 511213 ]
            lauren Lauren MacArthur made changes -
            Summary Compare the fgcm photoCalibs of the gen2 vs. gen3 w_2021_22 RC2 Compare Gen2 vs. Gen3 fgcm photoCalibs up to w_2022_04 HSC-RC2 run
            lauren Lauren MacArthur made changes -
            lauren Lauren MacArthur made changes -
            Attachment image.png [ 56873 ]
            lauren Lauren MacArthur made changes -
            lauren Lauren MacArthur made changes -
            Attachment image.png [ 56873 ]
            lauren Lauren MacArthur made changes -
            Attachment plot-t9697-griPSF-wFit-fit.png [ 56882 ]
            lauren Lauren MacArthur added a comment - - edited

            I have looked at the results of the latest w_2022_04 runs (Gen2: DM-33457, Gen3: DM-33402). The TL;DR is that I think that, while we don't have parity, we have reached the point where the differences are small enough (and expected given known areas that just can't be synced), that we can safely pull the plug on Gen2 for FGCM.

            For a few more details: having again confirmed parity at the visit level before applying the external calibrations, I made direct comparisons of the two runs with the calibrations applied (as above). Our worst case in terms of a mean offset is this one:

            The following was the worst in terms of std:

            But many looked really close indeed:

            And then there are some with slightly funkier patterns...

            I think it is expected that the HSC-G filter seems to be the least well-behaved, and some visits have offsets of order 10 mmag between Gen2 and Gen3 (as in the first example above). If Eli Rykoff is not worried about this, neither am I!

            lauren Lauren MacArthur added a comment - - edited

            As for how this propagates to differences in the coadds, this is our worst case:


            and our best case is tract 9615 HSC-Z with mean=0.04, stdev=0.99!

            Finally, the stellar loci look quite similar:
            Gen2:

            vs. Gen3:

            All-in-all, I say we are good to pull the plug on Gen2.

            lauren Lauren MacArthur added a comment - - edited

            Let me know what you think.  And if there are any particular cases you want looked at, let me know, or poke around yourself at:
            Gen2:
            https://lsst.ncsa.illinois.edu/~lauren/HSC_RC2/w_2022_04/plots/

            Gen3:
            https://lsst.ncsa.illinois.edu/~lauren/HSC_RC2/w_2022_04/vsGen3/plots/

            (it is the latter that has the direct comparison plots).

            lauren Lauren MacArthur made changes -
            Reviewers Eli Rykoff [ erykoff ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            erykoff Eli Rykoff added a comment -

            Oh ugh, I do not like that large offset in 9697/34384 g-band that you posted.

            Can you point me to the specific repos where these were processed?

            lauren Lauren MacArthur added a comment -

            Yup (tickets listed above).
            Gen2: /datasets/hsc/repo/rerun/RC/w_2022_04/DM-33457
            Gen3 collection: HSC/runs/RC2/w_2022_04/DM-33402

            Would you like a list of more of the worst cases?

            lauren Lauren MacArthur added a comment -

            E.g.
            'visit': 34422, 'filter': 'HSC-G', 'tract': 9697, Run Comparison: CircApRad12pix mag diff (mmag):

            {'star': Stats(mean=-10.1403; stdev=0.9072}

            'visit': 34342, 'filter': 'HSC-G', 'tract': 9697, Run Comparison: CircApRad12pix mag diff (mmag):

            {'star': Stats(mean=-9.8882; stdev=1.2969}

            'visit': 34400, 'filter': 'HSC-G', 'tract': 9697, Run Comparison: CircApRad12pix mag diff (mmag):

            {'star': Stats(mean=-9.7092; stdev=0.7417}
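
For reference, a magnitude difference in mmag and its summary statistics can be computed from matched fluxes roughly as follows. This is a sketch under stated assumptions, not the pipe_analysis implementation: the function names and the flux values are hypothetical, the sign convention (first run minus second run) is an assumption, and no sigma clipping is applied.

```python
import math
import statistics


def mag_diff_mmag(flux_a, flux_b):
    """Magnitude difference (a minus b) in mmag for a pair of positive fluxes."""
    return -2.5 * math.log10(flux_a / flux_b) * 1000.0


def summarize(fluxes_a, fluxes_b):
    """Return (mean, stdev) of the per-star magnitude differences in mmag."""
    diffs = [mag_diff_mmag(a, b) for a, b in zip(fluxes_a, fluxes_b)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Hypothetical calibrated fluxes for three matched stars.
run_a = [1000.0, 2000.0, 3000.0]
run_b = [1010.0, 2018.0, 3033.0]
mean, stdev = summarize(run_a, run_b)
print(f"mean={mean:.4f} mmag; stdev={stdev:.4f} mmag")
```

A flux ratio that differs from unity by about 1% corresponds to roughly 10 mmag, which is the scale of the offsets listed above.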
            lauren Lauren MacArthur made changes -
            Link This issue relates to DM-33402 [ DM-33402 ]
            lauren Lauren MacArthur made changes -
            Link This issue relates to DM-33457 [ DM-33457 ]
            lauren Lauren MacArthur added a comment -

            Oh, and FYI, the big offset is "new", as in it was much smaller (almost zero, in fact) in the previous run:
            https://lsst.ncsa.illinois.edu/~lauren/HSC_RC2/w_2021_46/vsGen3/plots/HSC-G/tract-9697/visit-34384/compareVisit-v34384-diff_base_CircularApertureFlux_12_0-psfMagHist.png
            erykoff Eli Rykoff made changes -
            Attachment gen2-3_g_fgcm_04.png [ 56941 ]
            erykoff Eli Rykoff made changes -
            Attachment gen3_g_fgcm_04_50.png [ 56942 ]
            erykoff Eli Rykoff added a comment -

            Bad news and good news.

            I did a comparison between the fgcm outputs for the standard stars on gen2/gen3 on w_2022_04 and it looks like complete garbage:

            But I also did the same comparison between the standard stars on gen3 w_2021_50 and gen3 w_2022_04 and they look just fine:

            So whatever has gone terribly wrong seems to have gone wrong on the gen2 side. I do not think it is worth keeping gen2 around to investigate what has gone wrong on that end. (Hopefully Lauren MacArthur agrees.) However, it is vital that we monitor the statistics from run to run in gen3 in case there is some transient issue that just happened to hit in gen2. That said, I think it's more likely that I inadvertently broke something on the gen2 side (not that there's any smoking gun; it's just that debugging gen2 does not seem like the best use of time).
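
Run-to-run monitoring of this kind could be as simple as flagging any visit whose mean offset changes by more than some threshold between consecutive runs. The sketch below is one hypothetical way to do that: the function `flag_changes`, the 5 mmag default threshold, the keys, and the values are all assumptions for illustration, not an existing tool or real measurements.

```python
def flag_changes(previous, current, threshold_mmag=5.0):
    """Return sorted keys whose mean offset changed by more than the threshold.

    previous and current map a (tract, visit) key to a mean offset in mmag.
    Only keys present in both runs are compared.
    """
    flagged = []
    for key in sorted(set(previous) & set(current)):
        if abs(current[key] - previous[key]) > threshold_mmag:
            flagged.append(key)
    return flagged


# Hypothetical per-visit mean offsets (mmag) from two consecutive runs.
run_prev = {(0, 1): 0.1, (0, 2): 0.3}
run_curr = {(0, 1): 0.2, (0, 2): 9.9}
print(flag_changes(run_prev, run_curr))
```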

            lauren Lauren MacArthur added a comment -

            Lauren MacArthur agrees

            erykoff Eli Rykoff added a comment -

            I guess this is reviewed then?

            erykoff Eli Rykoff made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            lauren Lauren MacArthur made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]
            lauren Lauren MacArthur made changes -
            Story Points 8
            lauren Lauren MacArthur added a comment -

            I agree again!  Thanks, Eli.  I will update the confluence page to note our collective assessment.

            lauren Lauren MacArthur made changes -
            Remote Link This issue links to "Page (Confluence)" [ 32225 ]

              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Eli Rykoff
              Watchers:
              Eli Rykoff, Jim Bosch, Lauren MacArthur, Yusra AlSayyad

