Data Management / DM-30812

Compare the data products of the gen2 vs. gen3 w_2021_24 DC2 runs up to Single Frame Processing

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      5
    • Epic Link:
    • Sprint:
      DRP S21b
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Perform a comparison of the w_2021_24 gen2 vs. gen3 middleware processing runs for the DC2/imsim dataset (i.e. those of DM-30674 & DM-30730) analogous to what was done for the HSC RC2 w_2021_22 on DM-30647.

        Attachments

          Issue Links

            Activity

            lauren Lauren MacArthur added a comment -

            Awesome! So setting that config override, along with a super-relaxed value (but smaller than the current default of 10) for astrometry.wcsFitter.maxScatterArcsec, should keep all the good (doing no harm), improve and recover the bad, and leave the junk (mis-simulated) frames out of the coadds.
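
            For illustration only, an override of this kind might look like the sketch below. In a real config override file the `config` object is supplied by the pipeline framework; here a `SimpleNamespace` stands in for it so the snippet runs on its own. The value 2.0 is a placeholder (the comment only says "smaller than the current default of 10"), the helper `passes_scatter_cut` is hypothetical, and the other override referred to above is not named in this comment, so it is left out.

```python
from types import SimpleNamespace

# Stand-in for the task config that the framework would normally hand to an
# override file; only the field discussed in the comment is modelled here.
config = SimpleNamespace(
    astrometry=SimpleNamespace(
        wcsFitter=SimpleNamespace(maxScatterArcsec=10.0)  # default quoted in the comment
    )
)

DEFAULT_MAX_SCATTER = config.astrometry.wcsFitter.maxScatterArcsec

# Hypothetical relaxed threshold: a placeholder, not a value from the ticket.
config.astrometry.wcsFitter.maxScatterArcsec = 2.0


def passes_scatter_cut(scatter_arcsec, cfg=config):
    """Hypothetical helper: True if a fit with this scatter is within the threshold."""
    return scatter_arcsec <= cfg.astrometry.wcsFitter.maxScatterArcsec


print(passes_scatter_cut(0.5), passes_scatter_cut(5.0))
```

            With a threshold like this, a well-behaved frame is unaffected, while a fit with several arcseconds of scatter would be rejected rather than passed on to the coadds.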

            lauren Lauren MacArthur added a comment - - edited

            I have rerun my scripts looking for parity between the gen3 & gen2 SFM outputs using this new run (/datasets/DC2/repoRun2.2i/rerun/w_2021_25/DM-30812). We are SOOOOO CLOSE, but I have finally encountered some examples of incomplete reference catalog loading due to the 0 padding for the visit definition of this repo (see, e.g., DM-30030 and this community post for details). To illustrate, the following shows the full loaded reference sample (silver circles), the selected (i.e. trimmed and passing the reference source selector criteria) reference sample (orange x's), and the sources actually used in the astrometric fit (stars) for a given case (visit 193888, detector=126):

            Gen3:

            and a zoom in:

            So, you can see that this detector lines up pretty closely with an edge of this shard and ends up missing out on some of the reference sources that would (should) be included with the 250 pixel padding to the raw WCS when doing the ref cat trimming. The following is the gen2 version:

            Note that, for gen2, the selected ref sample had 283 objects, whereas gen3 had only 268. Even so, the source matches that got included in the astrometric fit are actually identical in both cases, so the astrometry is only just barely affected here (but my parity testing is sensitive enough to pick this up). Given that I'm seeing 4 cases of this in just the DC2 dataset (and only a very incomplete one at that, as I can only compare the detectors that actually got ingested into the /repo/dc2 repo), this situation is perhaps less rare than we had anticipated/hoped, so updating the visit definition is certainly something to consider (although the partial ingest issues are definitely more urgent...and resolving those will likely result in the visit definition update by default?!)
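
            The effect described above can be sketched with a toy model. Everything in it is an assumption made for illustration: the detector size, the reference object ids and pixel positions, and the idea that the unloaded neighbouring shard begins exactly at the detector's left edge (x = 0). Only the 250 pixel padding comes from the comment. Objects that land on the detector itself are used as a crude proxy for "could be matched in the fit".

```python
PAD = 250                    # pixel padding used when trimming (from the comment)
WIDTH, HEIGHT = 4000, 4000   # hypothetical detector size in pixels


def trim(refs, pad=PAD, width=WIDTH, height=HEIGHT):
    """Return ids of reference objects inside the detector bbox grown by `pad`."""
    return {
        rid
        for rid, (x, y) in refs.items()
        if -pad <= x < width + pad and -pad <= y < height + pad
    }


# Synthetic reference objects: id -> (x, y) in raw-WCS pixel coordinates.
all_refs = {
    1: (100.0, 200.0),     # on the detector
    2: (3900.0, 3950.0),   # on the detector
    3: (-120.0, 1500.0),   # in the padding strip, left of the detector
    4: (-240.0, 3000.0),   # in the padding strip, left of the detector
    5: (-400.0, 1000.0),   # outside even the padded box
    6: (2000.0, 4100.0),   # in the padding strip, above the detector
}

# gen2-like case: every nearby reference object is loaded before trimming.
gen2_selected = trim(all_refs)

# gen3-like case: the shard covering x < 0 was never loaded, so those
# objects are unavailable no matter how much padding the trim applies.
gen3_loaded = {rid: xy for rid, xy in all_refs.items() if xy[0] >= 0}
gen3_selected = trim(gen3_loaded)

missing_in_gen3 = gen2_selected - gen3_selected

# Objects on the detector proper are the same in both cases.
same_on_detector = trim(all_refs, pad=0) == trim(gen3_loaded, pad=0)

print(len(gen2_selected), len(gen3_selected), sorted(missing_in_gen3), same_on_detector)
```

            In this toy case the selected samples differ only in the padding strip, which is consistent with the selected counts differing while the sources used in the fit stay the same.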

            All four cases here only just barely affect the SFM WCS, so I would have comfortably gone on to the coadd parity comparisons for DC2...but this is not feasible in our current situation of very different visit/detector inputs from gen2 & gen3 repos.

            The "good" news is that, as of w_2021_25 and the updated BF kernels for the gen2 repo (DM-30738), and modulo the above and the pesky (but likely insignificant) deblend_peakId offsets, we now seem to be at gen2 vs. gen3 parity for all visit/detector combos of that DC2 dataset that are common to both the gen2 & gen3 repos.
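
            As a rough sketch of what restricting the comparison to shared visit/detector combos involves (this is not the actual parity script, which is not shown in this ticket), one can key a per-detector summary on (visit, detector) and compare only the keys present in both runs. The function name and all entries other than visit 193888, detector 126 with its 283 vs. 268 selected reference objects are made up for illustration.

```python
def compare_runs(gen2, gen3):
    """Compare two {(visit, detector): value} summaries on their shared keys."""
    common = sorted(gen2.keys() & gen3.keys())
    return {
        "common": common,
        "only_gen2": sorted(gen2.keys() - gen3.keys()),
        "only_gen3": sorted(gen3.keys() - gen2.keys()),
        "mismatched": [key for key in common if gen2[key] != gen3[key]],
    }


# Value = number of selected reference objects for that visit/detector.
gen2_summary = {
    (193888, 126): 283,   # counts quoted in the comment above
    (193888, 127): 301,   # hypothetical
    (200000, 5): 250,     # hypothetical; detector absent from the gen3 repo
}
gen3_summary = {
    (193888, 126): 268,   # counts quoted in the comment above
    (193888, 127): 301,   # hypothetical
}

report = compare_runs(gen2_summary, gen3_summary)
print(report)
```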
             

            lauren Lauren MacArthur added a comment -

            Would you mind giving this a look and letting me know if it is ready for sign-off?  I am particularly interested in your thoughts on how to move on to the coadd comparisons given our gen3 repos ingest "issues".

            jbosch Jim Bosch added a comment -

            I think it may just make sense to focus Gen2/3 parity investigation on HSC, and only worry about looking at DC2 (Gen3 especially) in an absolute sense. I think I have set things in motion to address the missing raws, but I don't know when that will actually complete.

            But yes, ready for sign-off - and a reminder that I should go patch the visit padding, now that DM-30866 has landed with the functionality for doing that.

            lauren Lauren MacArthur added a comment -

            Thanks Jim. Yeah...I'm still holding out hope for the raws situation to get sorted in time for the next processing (but no pressure!!) DC2 is our only "natural" path to looking at gen2 vs gen3 coadds without external calibrations (for which we aren't yet at parity...and different/unpredictable input ordering in the gen3 bps vs. gen2 slurm runs may preclude exact parity).
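
            The comment does not say why input ordering would matter. One plausible reason, offered here purely as an assumption, is that floating-point addition is not associative, so accumulating the same inputs in a different order can change the last bits of the result. A minimal sketch with made-up values:

```python
import math


def naive_sum(values):
    """Accumulate left to right in double precision, with no compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


# Same four numbers, two different orderings (values chosen to exaggerate the effect).
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [-1e16, 1.0, 1.0, 1e16]

result_a = naive_sum(order_a)
result_b = naive_sum(order_b)

# math.fsum tracks the rounding error exactly and is order-independent.
exact_a = math.fsum(order_a)
exact_b = math.fsum(order_b)

print(result_a, result_b, exact_a, exact_b)
```

            Real pixel values would differ far less dramatically than this, but any such difference is enough to break bit-for-bit parity even when the results agree scientifically.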


              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Jim Bosch
              Watchers:
              Eli Rykoff, James Chiang, Jim Bosch, Joshua Meyers, Lauren MacArthur, Yusra AlSayyad
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:
