  1. Data Management
  2. DM-29881

Investigate differences in gen2 vs. gen3 SFP products for HSC-Y

    Details

    • Story Points:
      4
    • Epic Link:
    • Sprint:
      DRP S21a (Dec Jan), DRP S21b
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      In the full RC2 check of the gen2 vs. gen3 comparisons of the data products up to the end of Single Frame Processing, it was noted that we are not getting bitwise-identical outputs for HSC-Y datasets.  Investigate the cause and hopefully come up with a solution.

      Seeing as differences have only thus far been seen in HSC-Y, a likely culprit is in the fringe correction ISR stage.

            Activity

            lauren Lauren MacArthur added a comment -

            I have indeed isolated the differences to be in the fringe correction stage of ISR.  Running with w_2021_17, here is the difference between the gen2 vs. gen3 images just after the fringe correction gets applied (noting that they were bitwise identical in image/variance/mask planes just prior):

            lauren Lauren MacArthur added a comment -

            I also noticed the following difference in the logs:

            gen2: 
            processCcd.isr.fringe INFO: Fringe solution: [6176.78828984] RMS: 10.035963 Good: 19809/30000 
             
            gen3: 
            isr.fringe INFO: Fringe solution: [6208.54359819] RMS: 10.129147 Good: 19709/30000
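
For comparing such log lines systematically rather than by eye, a minimal sketch follows. It assumes only the line format quoted above; the `parse_fringe_line` helper and its regular expression are hypothetical illustrations and are not part of `ip_isr`.

```python
import re

# Pattern for the "Fringe solution" log lines quoted above.
# Assumption: the format is exactly as shown in those two lines.
FRINGE_RE = re.compile(
    r"Fringe solution: \[(?P<solution>[^\]]+)\]\s+"
    r"RMS: (?P<rms>[-+.\deE]+)\s+"
    r"Good: (?P<good>\d+)/(?P<total>\d+)"
)


def parse_fringe_line(line):
    """Extract the fringe solution, RMS, and good/total counts from a log line."""
    match = FRINGE_RE.search(line)
    if match is None:
        raise ValueError(f"not a fringe solution line: {line!r}")
    return {
        # The bracketed solution may hold several whitespace-separated values.
        "solution": [float(v) for v in match["solution"].split()],
        "rms": float(match["rms"]),
        "good": int(match["good"]),
        "total": int(match["total"]),
    }


gen2 = parse_fringe_line(
    "processCcd.isr.fringe INFO: Fringe solution: [6176.78828984] "
    "RMS: 10.035963 Good: 19809/30000"
)
gen3 = parse_fringe_line(
    "isr.fringe INFO: Fringe solution: [6208.54359819] "
    "RMS: 10.129147 Good: 19709/30000"
)

# Report, field by field, whether the two runs agree.
for key in ("solution", "rms", "good", "total"):
    status = "match" if gen2[key] == gen3[key] else "DIFFER"
    print(f"{key:>8}: gen2={gen2[key]} gen3={gen3[key]} [{status}]")
```

Only the logger-name prefix (`processCcd.isr.fringe` vs. `isr.fringe`) is expected to differ between the two middleware generations, so the pattern deliberately ignores it.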
            

            lauren Lauren MacArthur added a comment -

            This seems to stem from an inconsistent setting of the random seed that gets fed to the generatePositions() function in FringeTask (whose purpose is to "Generate a random distribution of positions for measuring fringe amplitudes.").  I will leave it to the experts to decide whether this level of difference is acceptable for a simple random seed change, but I have implemented a fix on the ticket branch to ensure the seed is set to the same expId-based value in both gen2 and gen3.  Indeed, running with this branch, the above comparison is now bitwise-identical:

            and the logs match up:

            gen2 POST-FIX: 
            processCcd.isr.fringe INFO: Fringe solution: [6208.54359819] RMS: 10.129147 Good: 19709/30000 
             
            gen3: 
            isr.fringe INFO: Fringe solution: [6208.54359819] RMS: 10.129147 Good: 19709/30000
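
The seed dependence described above can be illustrated with a minimal sketch. This is not the FringeTask implementation: `generate_positions`, `seed_from_exp_id`, the exposure ID, and the image dimensions are all hypothetical stand-ins, and the standard-library `random` module is used in place of whatever generator the task actually uses. The count of 30000 mirrors the denominator in the logs.

```python
import random


def generate_positions(seed, num, width, height):
    """Hypothetical stand-in: draw `num` random (x, y) pixel positions."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(num)]


def seed_from_exp_id(exp_id):
    """Assumed mapping: any deterministic function of the exposure ID will do."""
    return exp_id % 2**32


exp_id = 12345  # hypothetical exposure ID
num, width, height = 30000, 2048, 4176  # image dimensions are placeholders

# Both code paths derive the seed from the same exposure ID,
# so they sample exactly the same positions.
gen2_positions = generate_positions(seed_from_exp_id(exp_id), num, width, height)
gen3_positions = generate_positions(seed_from_exp_id(exp_id), num, width, height)
same_seed_identical = gen2_positions == gen3_positions

# A different seed samples different positions, which would feed
# different pixels into any downstream fit.
other_positions = generate_positions(seed_from_exp_id(exp_id) + 1, num, width, height)
different_seed_differs = gen2_positions != other_positions

print("same seed -> identical positions:", same_seed_identical)
print("different seed -> different positions:", different_seed_differs)
```

Because the fringe amplitude is measured at the sampled positions, two runs that sample different positions can plausibly arrive at slightly different solutions even with identical input pixels, which is consistent with the small pre-fix discrepancy in the logs.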

            lauren Lauren MacArthur added a comment - - edited

            Jenkins + ci_hsc + ci_cpp_gen3 is running (latter may be overkill...but just in case!)

            lauren Lauren MacArthur added a comment -

            Sorry to throw another one your way, but would you mind giving this a look when you get a chance?

            lauren Lauren MacArthur added a comment -

            The initial Jenkins run revealed that test values needed to be updated in obs_decam, which raises the question of whether that's the only place that was checking this task at this level (despite it being the test_crosstalk.py unittest that was affected!).  Let me know if/what you think may need adding to the test_fringes.py unittests in ip_isr (nothing obvious stood out to me as particularly useful, and I suspect you have bigger/better plans that would render such effort moot anyhow!).  A new Jenkins run is in progress (and is already past obs_decam and the other obs packages).

            czw Christopher Waters added a comment -

            I'm a bit surprised that it looks like gen3 was the case that was doing it correctly, but the changes all make sense to me.

            lauren Lauren MacArthur added a comment -

            Thanks Chris.  Indeed, the gen3 path was the one doing the "right thing".  And while fixing anything in gen2 is not really desirable at this stage, when there is an easy and clear path towards parity, it's still "worth it"!

            I had to rebase, so set off another Jenkins just to be sure. I'll merge if/when that clears.


              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Christopher Waters
              Watchers:
              Christopher Waters, Jim Bosch, Lauren MacArthur, Yusra AlSayyad
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:
