Data Management / DM-30812

Compare the data products of the gen2 vs. gen3 w_2021_24 DC2 runs up to Single Frame Processing


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      5
    • Epic Link:
    • Sprint:
      DRP S21b
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Perform a comparison of the w_2021_24 gen2 vs. gen3 middleware processing runs for the DC2/imsim dataset (i.e. those of DM-30674 & DM-30730) analogous to what was done for the HSC RC2 w_2021_22 on DM-30647.

        Attachments

          Issue Links

            Activity

            Lauren MacArthur added a comment -

            I've run the script attached to DM-30647 on the gen2 vs. gen3 w_2021_24 runs.  Differences are as follows:

            There are huge differences in the number and subset of calexps produced by the two runs, which stem from at least two causes:

            • the known issue in gen3 where many visit/detector combos are simply "missing" (speculation is that it only processes those which lie within the specified tract, but there are also "holes/gaps" in the detector coverage that do not seem to be explained by this; see, e.g., the upper right panels of the plots posted on DM-30747)
            • the shapeHSM issue of DM-30426 is present for both runs, but is particularly bad for gen2 as it brings any singleFrameDriver job to a halt, so there are many, many visit/detectors missing from that rerun.  I will kick off a run on w_2021_25 (i.e. after the fix went in) to make sure no "other" SFM failures occur in gen2 that do not happen in gen3.  (A sketch for enumerating the discrepant combos follows this list.)
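
            Something like the following could be used to enumerate the discrepant combos (a rough sketch; the gen2 rerun path and gen3 collection name are illustrative, not necessarily the exact ones used here):

            import lsst.daf.butler as dafButler        # gen3
            import lsst.daf.persistence as dafPersist  # gen2

            # Gather the (visit, detector) combos with a calexp in the gen2 rerun.
            gen2Butler = dafPersist.Butler("/datasets/DC2/repoRun2.2i/rerun/w_2021_24")  # illustrative path
            gen2 = {(ref.dataId["visit"], ref.dataId["detector"])
                    for ref in gen2Butler.subset("calexp") if ref.datasetExists()}

            # Gather the same from the gen3 repo (collection name illustrative).
            gen3Butler = dafButler.Butler("/repo/dc2")
            gen3 = {(ref.dataId["visit"], ref.dataId["detector"])
                    for ref in gen3Butler.registry.queryDatasets(
                        "calexp", collections="2.2i/runs/w_2021_24")}

            print("in gen2 only:", sorted(gen2 - gen3))
            print("in gen3 only:", sorted(gen3 - gen2))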

            For the visit/detector combos with successful processing in both runs:

            Each and every catalog had differences between gen2 & gen3 of the following type (from the Single Frame Processing log for DC2 174534 i):

            compareSfd WARN: ...ccd 0: id Absolute diff : mean: 87845838520320000.000 min: 87845838520320000.000 max 87845838520320000.000
            compareSfd WARN: ...ccd 0: parent Absolute diff : mean: 17126968516921894.000 min: 0.000 max 87845838520320000.000
            compareSfd WARN: ...ccd 0: deblend_peakId id number offset: {0, 199124} (only printing once per catalog)
            compareSfd WARN: ...ccd 0: Lengths of finite entries differs for gen2 2316 and gen3 2320
            compareSfd WARN: ...ccd 0: ext_shapeHSM_HsmPsfMomentsDebiased_flag Number of differences: 286 of 2662 total
            compareSfd WARN: ...ccd 0: ext_shapeHSM_HsmPsfMomentsDebiased_flag_galsim Number of differences: 298 of 2662 total
            

            And many also had:

            compareSfd WARN: ...ccd 93: ext_shapeHSM_HsmPsfMomentsDebiased_x Absolute diff : mean: 0.324  min: 0.000  max 12.328
            compareSfd WARN: ...ccd 93: ext_shapeHSM_HsmPsfMomentsDebiased_y Absolute diff : mean: 0.325  min: 0.000  max 14.443
            compareSfd WARN: ...ccd 93: ext_shapeHSM_HsmPsfMomentsDebiased_xx Absolute diff : mean: 1.580  min: 0.000  max 93.294
            compareSfd WARN: ...ccd 93: ext_shapeHSM_HsmPsfMomentsDebiased_yy Absolute diff : mean: 1.444  min: 0.000  max 24.491
            compareSfd WARN: ...ccd 93: ext_shapeHSM_HsmPsfMomentsDebiased_xy Absolute diff : mean: 1.033  min: 0.000  max 33.809
            

            (I do realize the absolute-difference metric here is not the most useful...but there was some other column value for which it was, and I failed to adapt the metric per column as I loop through them all.  I'll change this on future runs, but these take days to run, so for now I'll just leave these as revealing "a difference" between the gen2 & gen3 measurements.)
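
            For reference, a minimal sketch of the kind of per-column adaptation meant above (cat2 & cat3 standing in for row-matched gen2/gen3 source catalogs with identical schemas; the tolerance is illustrative): switch to a relative difference so that huge-valued columns like id don't dominate.

            import numpy as np

            for name in cat2.schema.getNames():
                a = np.asarray(cat2[name])
                b = np.asarray(cat3[name])
                if a.ndim != 1:
                    continue  # skip array-valued columns in this sketch
                if a.dtype == bool:  # flag columns: just count mismatches
                    nDiff = np.sum(a != b)
                    if nDiff:
                        print(f"{name} Number of differences: {nDiff} of {len(a)} total")
                    continue
                good = np.isfinite(a.astype(float)) & np.isfinite(b.astype(float))
                diff = np.abs(a[good] - b[good])
                # Relative difference: scale by the larger magnitude (floored at 1.0).
                scale = np.maximum(np.maximum(np.abs(a[good]), np.abs(b[good])), 1.0)
                rel = diff/scale
                if np.any(rel > 1e-8):  # illustrative tolerance
                    print(f"{name} Absolute diff : mean: {diff.mean():.3f} "
                          f"min: {diff.min():.3f} max {diff.max():.3f} "
                          f"(max relative: {rel.max():.3e})")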

            Other than that, everything looks identical.

            So one question is: do we care about the parent and id differences?

            Another is how to go about getting to the bottom of the gen2 vs. gen3 shapeHSM differences (paging Joshua Meyers on this one!)?

            Joshua Meyers added a comment - edited

            I suspect the shapeHSM differences may be due to the id differences.  The random seed used in the debiasing is based on the id: https://github.com/lsst/meas_extensions_shapeHSM/blob/master/src/HsmMoments.cc#L252.
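
            A toy illustration of the mechanism (a Python stand-in, not the actual C++ linked above): with an id-derived seed, offset ids yield entirely different noise realizations, and hence different debiased moments, even on identical pixels.

            import numpy as np

            def debiasNoise(sourceId, n=4):
                # Seed an RNG from the source id (illustrative stand-in for the C++).
                rng = np.random.RandomState(sourceId % 2**32)
                return rng.normal(size=n)

            gen2Id = 199124
            gen3Id = gen2Id + 87845838520320000  # the constant id offset seen above
            print(debiasNoise(gen2Id))  # different stream...
            print(debiasNoise(gen3Id))  # ...so different "debiased" measurements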

            Lauren MacArthur added a comment -

            Ah ha...that’d do it.  Thanks, Josh!  I’ll see if I can get the ids synced up before kicking off another run.

            Lauren MacArthur added a comment - edited

            As noted on DM-30815, the fix there synced up the ids and did indeed prove to be the root cause of the shapeHSM differences. I reran singleFrameDriver on the entire DC2 list of visits with w_2021_25 plus the DM-30815 branch and now have many more calexps succeeding (most visits got the full set of 189, but a few cases fell a bit short...I'll look into those next).
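
            (A sketch of how the short visits could be tallied with the gen2 butler, using the rerun path above:)

            from collections import Counter
            import lsst.daf.persistence as dafPersist

            butler = dafPersist.Butler("/datasets/DC2/repoRun2.2i/rerun/w_2021_25/DM-30812")
            counts = Counter()
            for dataRef in butler.subset("calexp"):
                if dataRef.datasetExists():
                    counts[dataRef.dataId["visit"]] += 1
            # Visits that fell short of the full 189-detector complement.
            print(sorted((visit, n) for visit, n in counts.items() if n < 189))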

            There has been a lot of work on some really bad WCSs coming out of SFM for DC2 data (see DM-30466 for details of the problem and DM-30490 for a configurable option that "fixes" most cases). The bad astrometry uncovered thus far was part of the DP0.1 analyses. Since it is of interest whether we have any of these bad apples in our regularly reprocessed DC2 dataset, I did a grep of the logs to identify any cases (not for the faint of heart!) and provide a list of the 4 worst offenders here (Eli Rykoff: might be of interest to you?). Note that none of these currently show up in the gen3 DC2 runs because of the incomplete transfer/tract-based selection of gen3 discussed, e.g., here.

            dataId = {'visit': 193111, 'run': '193111', 'raftName': 'R34', 'expId': 193111, 'detectorName': 'S02', 'detector': 155}
            Matched and fit WCS in 3 iterations; found 61 matches with scatter = 1.235 +- 1.014 arcsec
             
            dataId = {'visit': 456690, 'run': '456690', 'raftName': 'R41', 'expId': 456690, 'detectorName': 'S20', 'detector': 168}
            Matched and fit WCS in 2 iterations; found 69 matches with scatter = 10.474 +- 9.280 arcsec
            [** so of course this is the only one I can't seem to find in any collection in /repo/dc2!!]
             
            dataId = {'visit': 263501, 'run': '263501', 'raftName': 'R42', 'expId': 263501, 'detectorName': 'S22', 'detector': 179}
            Matched and fit WCS in 3 iterations; found 47 matches with scatter = 2.598 +- 1.650 arcsec
             
            dataId = {'visit': 421725, 'run': '421725', 'raftName': 'R02', 'expId': 421725, 'detectorName': 'S12', 'detector': 14}
            Matched and fit WCS in 3 iterations; found 111 matches with scatter = 7.327 +- 6.184 arcsec
            

            ** e.g.

            $ butler query-datasets /repo/dc2 "raw" --where "instrument='LSSTCam-imSim' AND visit=456690 AND detector in (165..170) AND skymap='DC2'"
            py.warnings WARN: /software/lsstsw/stack_20210520/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/daf_butler/21.0.0-103-g0fc66519+7e5b4c34a6/python/lsst/daf/butler/registry/interfaces/_database.py:1559: SAWarning: SELECT statement has a cartesian product between FROM element(s) "dc2_20210215.skymap" and FROM element "dc2_20210215.exposure".  Apply join condition(s) between each element to resolve.
              return self._connection.execute(sql, *args, **kwds)
             
             
            type     run                       id                  band   instrument  detector physical_filter exposure
            ---- ------------ ------------------------------------ ---- ------------- -------- --------------- --------
             raw 2.2i/raw/all d5315944-a59e-59d8-af91-677113a5ca62    r LSSTCam-imSim      165       r_sim_1.4   456690
             raw 2.2i/raw/all 1dcc42c9-f4b7-5da7-a298-f08c5cf7d041    r LSSTCam-imSim      169       r_sim_1.4   456690
             raw 2.2i/raw/all 06384d23-e99e-5b0c-9f59-b7ebf3ba6cec    r LSSTCam-imSim      170       r_sim_1.4   456690
            

            (so 166 & 167 are also missing...)

             

            The next 8 worst offenders have scatters of (I haven't dug out the ids for these yet...but would be happy to on request!):

            Matched and fit WCS in 3 iterations; found 81 matches with scatter = 0.824 +- 0.397 arcsec
            Matched and fit WCS in 1 iterations; found 81 matches with scatter = 0.558 +- 0.373 arcsec
            Matched and fit WCS in 3 iterations; found 60 matches with scatter = 0.480 +- 0.346 arcsec
            Matched and fit WCS in 3 iterations; found 68 matches with scatter = 0.471 +- 0.434 arcsec
            Matched and fit WCS in 1 iterations; found 72 matches with scatter = 0.337 +- 0.260 arcsec
            Matched and fit WCS in 1 iterations; found 56 matches with scatter = 0.319 +- 0.230 arcsec
            Matched and fit WCS in 1 iterations; found 65 matches with scatter = 0.276 +- 0.138 arcsec
            Matched and fit WCS in 1 iterations; found 68 matches with scatter = 0.190 +- 0.104 arcsec
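
            (For the record, a sketch of the log grep used to rank offenders; the log path is illustrative:)

            import glob
            import re

            pattern = re.compile(r"Matched and fit WCS in \d+ iterations; "
                                 r"found \d+ matches with scatter = "
                                 r"([\d.]+) \+- ([\d.]+) arcsec")
            hits = []
            for logFile in glob.glob("/path/to/sfm/logs/*.log"):  # illustrative
                with open(logFile) as f:
                    for line in f:
                        m = pattern.search(line)
                        if m:
                            hits.append((float(m.group(1)), float(m.group(2)), logFile))
            # Worst offenders first.
            for scatter, err, logFile in sorted(hits, reverse=True)[:12]:
                print(f"{scatter:7.3f} +- {err:6.3f} arcsec  {logFile}")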
            

            Eli Rykoff added a comment -

            Lauren MacArthur I took a look at 3 of the 4 worst offenders (the one criminal mastermind, of course, is not in the gen3 repo, so I can't easily do anything with it). And it's good news! If I turn on doMagnitudeOutlierRejection=True then they look a lot better (nstars is the number of matched stars used to compute the photometric zeropoint):

            Visit/Detector  Original offset (arcsec)  Original nstars  New offset (arcsec)  New nstars
                193111/155           1.235 +/- 1.014               11      0.007 +/- 0.004         130
                263501/179           2.598 +/- 1.650                2      0.007 +/- 0.003         101
                 421725/14           7.327 +/- 6.184                2      0.004 +/- 0.002         121
            Lauren MacArthur added a comment -

            Good news indeed (and wow...only 2 stars for the zp estimate!)  I may have a look at offender #1 in gen2 land...if it turns out to be another case of "hot mess", it may be worth a deeper look.

            Lauren MacArthur added a comment -

            More good news...offender #1 is also fixed by running with calibrate.astrometry.doMagnitudeOutlierRejection=True:

            --id visit=456690 filter="r" detector=168
             
            processCcd.calibrate.astrometry.matcher INFO: Matched 58 sources
            processCcd.calibrate.astrometry INFO: Rough zeropoint from astrometry matches is 32.1714 +/- 0.0087.
            processCcd.calibrate.astrometry INFO: Removed 13 magnitude outliers out of 58 total astrometry matches.
            processCcd.calibrate.astrometry INFO: Fit WCS iter 2 failed; using previous iteration: Unable to match sources
            processCcd.calibrate.astrometry INFO: Matched and fit WCS in 1 iterations; found 45 matches with scatter = 0.006 +- 0.003 arcsec
            

            I'm going to run the full visit with and without the outlier rejection to see if anything else changes. We may have to do some more thorough testing (e.g. I would be happy to launch a full set of the DC2 singleFrameDriver jobs...but not until after the maintenance on Thurs!), but I'm very close to saying this should be a recommended change to the imsim config overrides.
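
            (For concreteness, the override would amount to a single line in a processCcd config override file, e.g.:)

            # In a processCcd config override file (exact location depends on the
            # obs package conventions):
            config.calibrate.astrometry.doMagnitudeOutlierRejection = True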

            Lauren MacArthur added a comment - edited

            So, there are differences, but I'd say all around it looks like we are indeed getting a much cleaner sample of reference matches with calibrate.astrometry.doMagnitudeOutlierRejection=True. For example, the following are plots of the full visit ref-src Delta(RA)*cos(Dec) for the sources used in the astrometric fit:

            Without the config override set (note the missing detector 168 because the bad WCS fit resulted in a failure to find matches in photoCal):

            With the calibrate.astrometry.doMagnitudeOutlierRejection=True override: 

            I've also attached the sky versions of these plots which can be blinked to see where differences in the selections occur.

            James Chiang added a comment - edited

            I was discussing Eli's fix in DM-30490 with him this morning and showed him a comparison of reported astrometric scatter for some DC2 data (5 years of WFD visits covering the DDF region) using w_2021_22 and w_2021_25 with doMagnitudeOutlierRejection=True.  These data include ~20k ccd-visits.  Here are the histograms of the scatter values without (w_2021_22) and with (w_2021_25) applying the fix:

            Two of the three remaining ccd-visits with log10(scatter/arcsec) > -1.5 are images where only lensed galaxies, AGNs, and SNe were rendered.  We had a small bug in our image simulations where the normal galaxies and stars were not included in a very small handful of images for these data.  The fields are almost entirely blank aside from a few hundred objects, yet the astrometry still "solved".  The single remaining ccd-visit at -0.5 is a normal-looking field where the scatter went from 8 arcsec to 0.345 arcsec.  Notably, none of the reported astrometric scatter values worsened by more than 0.001 arcsec after applying the fix.
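
            (A sketch of how such histograms could be made, assuming scatterW22 and scatterW25 are arrays of the per-ccd-visit scatters in arcsec parsed from the two sets of logs:)

            import matplotlib.pyplot as plt
            import numpy as np

            bins = np.linspace(-3.0, 1.5, 46)
            plt.hist(np.log10(scatterW22), bins=bins, histtype="step", label="w_2021_22")
            plt.hist(np.log10(scatterW25), bins=bins, histtype="step",
                     label="w_2021_25 + doMagnitudeOutlierRejection")
            plt.xlabel("log10(scatter / arcsec)")
            plt.ylabel("N ccd-visits")
            plt.legend()
            plt.show()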

            Lauren MacArthur added a comment -

            Awesome! So setting that config override, along with a super relaxed (but smaller than the current default of 10) value for astrometry.wcsFitter.maxScatterArcsec, should keep all the good (doing no harm), improve and recover the bad, and leave the junk (mis-simulated) frames out of the coadds.
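
            (i.e., something along these lines in the imsim config overrides; the maxScatterArcsec value is illustrative:)

            config.calibrate.astrometry.doMagnitudeOutlierRejection = True
            # Relaxed, but tighter than the current default of 10 (value illustrative).
            config.calibrate.astrometry.wcsFitter.maxScatterArcsec = 5.0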

            Lauren MacArthur added a comment - edited

            I have rerun my scripts looking for parity between the gen3 & gen2 SFM outputs using this new run (/datasets/DC2/repoRun2.2i/rerun/w_2021_25/DM-30812).  We are SOOOOO CLOSE, but I have finally encountered some examples of incomplete reference catalog loading due to the 0 padding for the visit definition of this repo (see, e.g., DM-30030 and this community post for details).  To illustrate, the following shows the full loaded reference sample (silver circles), the selected (i.e. trimmed and passing the reference source selector criteria) reference sample (orange x's), and the sources actually used in the astrometric fit (stars) for a given case (visit 193888, detector 126):

            Gen3:

            and a zoom in:

            So, you can see that this detector lines up pretty closely with an edge of this shard and ends up missing out on some of the reference sources that would (should) be included with the 250 pixel padding to the raw WCS when doing the ref cat trimming. The following is the gen2 version:

            Note that, for gen2, the selected ref sample had 283 objects, whereas gen3 had only 268. Even so, the source matches that got included in the astrometric fit are actually identical in both cases, so the astrometry is only just barely affected here (but my parity testing is sensitive enough to pick this up). Given that I'm seeing 4 cases of this in just the DC2 dataset (and only a very incomplete one at that, as I can only compare the detectors that actually got ingested into the /repo/dc2 repo), this situation is perhaps less rare than we had anticipated/hoped, so updating the visit definition is certainly something to consider (although the partial ingest issues are definitely more urgent...and resolving that will likely result in the visit definition update by default?!)
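
            (A rough sketch of the padding in question: grow the detector bbox by 250 pixels before trimming the reference catalog against the raw WCS; detector and wcs assumed in scope:)

            import lsst.geom as geom

            bbox = geom.Box2D(detector.getBBox())
            bbox.grow(250)  # the 250 pixel padding discussed above
            corners = [wcs.pixelToSky(corner) for corner in bbox.getCorners()]
            # ...then select reference sources falling within the padded polygon.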

            All four cases here only just barely affect the SFM WCS, so I would have comfortably gone on to the coadd parity comparisons for DC2...but this is not feasible in our current situation of very different visit/detector inputs from gen2 & gen3 repos.

            The "good" news is that, as of w_2021_25 and the updated BF kernels for the gen2 repo (DM-30738), and modulo the above and the pesky (but likely insignificant) deblend_peakId offsets, we now seem to be at gen2 vs. gen3 parity for all visit/detector combos of that DC2 dataset that have in common in both the gen2 & gen3 repos.
             

            Lauren MacArthur added a comment -

            Would you mind giving this a look and letting me know if it is ready for sign-off?  I am particularly interested in your thoughts on how to move on to the coadd comparisons given our gen3 repo's ingest "issues".

            Jim Bosch added a comment -

            I think it may just make sense to focus the Gen2/3 parity investigation on HSC, and only worry about looking at DC2 (Gen3 especially) in an absolute sense. I think I have set things in motion to address the missing raws, but I don't know when that will actually complete.

            But yes, ready for sign-off - and a reminder that I should go patch the visit padding, now that DM-30866 has landed with the functionality for doing that.

            Lauren MacArthur added a comment -

            Thanks, Jim.  Yeah...I'm still holding out hope for the raws situation to get sorted out in time for the next processing (but no pressure!!)  DC2 is our only "natural" path to looking at gen2 vs. gen3 coadds without external calibrations (for which we aren't yet at parity...and the different/unpredictable input ordering in the gen3 bps vs. gen2 slurm runs may preclude exact parity).


              People

              Assignee:
              Lauren MacArthur
              Reporter:
              Lauren MacArthur
              Reviewers:
              Jim Bosch
              Watchers:
              Eli Rykoff, James Chiang, Jim Bosch, Joshua Meyers, Lauren MacArthur, Yusra AlSayyad
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.