Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels:
- Story Points: 5
- Epic Link:
- Sprint: DRP S21b
- Team: Data Release Production
- Urgent?: No
Description
Attachments
Issue Links
- relates to
  - DM-30815 Update expBits used in gen2 bulter to match value computed for gen3 (Done)
  - DM-30943 Turn on delta mag rejection for astrometry matching in configs for LSSTCam-imSim (Done)
  - DM-31067 DC2 Reprocessing with w_2021_28 (gen2) (Done)
  - DM-30647 Compare the data products of the gen2 vs. gen3 w_2021_22 RC2 runs up to Single Frame Processing (Done)
  - DM-30674 Gen3 DC2 reprocessing with bps and w_2021_24 stack (Done)
  - DM-30730 DC2 Reprocessing with w_2021_24 (gen2) (Done)
Activity
I suspect the shapeHSM differences may be due to the id differences. The random seed used in the debiasing is based on the id: https://github.com/lsst/meas_extensions_shapeHSM/blob/master/src/HsmMoments.cc#L252.
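To illustrate why mismatched ids alone could change the shapeHSM outputs, here is a minimal sketch. It is not the code in HsmMoments.cc; it assumes only that a noise realization is drawn from a generator seeded with the source id. The function name `debias_noise` and the example ids are hypothetical.

```python
import random


def debias_noise(source_id, n=5):
    """Return n Gaussian draws from a generator seeded by the source id."""
    rng = random.Random(source_id)  # the seed depends only on the id
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same id reproduces the same noise realization...
same = debias_noise(1234567) == debias_noise(1234567)
# ...but a different id for the same source (hypothetical value) does not.
different = debias_noise(1234567) != debias_noise(7654321)
print(same, different)
```

Under that assumption, identical pixels measured with different ids would give slightly different debiased moments, which is the kind of gen2 vs. gen3 difference reported here.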
Ah ha...that’d do it. Thanks, Josh! I’ll see if I can get the ids synced up before kicking off another run.
As noted on DM-30815, the fix there synced up the ids, and the id mismatch did indeed prove to be the root cause of the shapeHSM differences. I reran singleFrameDriver on the entire DC2 list of visits with w_2021_25 plus the DM-30815 branch and now have many more calexps passing (most visits got the full set of 189, but there were a few cases that fell a bit short...I'll look into those next).
There has been a lot of work on some really bad WCS solutions coming out of SFM for DC2 data (see DM-30466 for details of the problem and DM-30490 for a configurable option that "fixes" most cases). The bad astrometry cases uncovered thus far were part of DP0.1 analyses. Since it is of interest whether we have any of these bad apples in our regularly reprocessed DC2 dataset, I did a grep of the logs to identify any cases (not for the faint of heart!) and provide a list of the 4 worst offenders here (Eli Rykoff: might be of interest to you?). Note that none of these currently show up as issues in the gen3 DC2 runs because of the incomplete transfer/tract-based selection of gen3 (discussed, e.g., here).
dataId = {'visit': 193111, 'run': '193111', 'raftName': 'R34', 'expId': 193111, 'detectorName': 'S02', 'detector': 155}
Matched and fit WCS in 3 iterations; found 61 matches with scatter = 1.235 +- 1.014 arcsec

dataId = {'visit': 456690, 'run': '456690', 'raftName': 'R41', 'expId': 456690, 'detectorName': 'S20', 'detector': 168}
Matched and fit WCS in 2 iterations; found 69 matches with scatter = 10.474 +- 9.280 arcsec
[** so of course this is the only one I can't seem to find in any collection in /repo/dc2!!]

dataId = {'visit': 263501, 'run': '263501', 'raftName': 'R42', 'expId': 263501, 'detectorName': 'S22', 'detector': 179}
Matched and fit WCS in 3 iterations; found 47 matches with scatter = 2.598 +- 1.650 arcsec

dataId = {'visit': 421725, 'run': '421725', 'raftName': 'R02', 'expId': 421725, 'detectorName': 'S12', 'detector': 14}
Matched and fit WCS in 3 iterations; found 111 matches with scatter = 7.327 +- 6.184 arcsec
** e.g.
$ butler query-datasets /repo/dc2 "raw" --where "instrument='LSSTCam-imSim' AND visit=456690 AND detector in (165..170) AND skymap='DC2'"
py.warnings WARN: /software/lsstsw/stack_20210520/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/daf_butler/21.0.0-103-g0fc66519+7e5b4c34a6/python/lsst/daf/butler/registry/interfaces/_database.py:1559: SAWarning: SELECT statement has a cartesian product between FROM element(s) "dc2_20210215.skymap" and FROM element "dc2_20210215.exposure". Apply join condition(s) between each element to resolve.
  return self._connection.execute(sql, *args, **kwds)

type run          id                                   band instrument    detector physical_filter exposure
---- ------------ ------------------------------------ ---- ------------- -------- --------------- --------
raw  2.2i/raw/all d5315944-a59e-59d8-af91-677113a5ca62 r    LSSTCam-imSim 165      r_sim_1.4       456690
raw  2.2i/raw/all 1dcc42c9-f4b7-5da7-a298-f08c5cf7d041 r    LSSTCam-imSim 169      r_sim_1.4       456690
raw  2.2i/raw/all 06384d23-e99e-5b0c-9f59-b7ebf3ba6cec r    LSSTCam-imSim 170      r_sim_1.4       456690
(so 166 & 167 are also missing...)
The next 8 worst offenders have the following scatters (I haven't dug out the ids of these yet...but would be happy to on request!):
Matched and fit WCS in 3 iterations; found 81 matches with scatter = 0.824 +- 0.397 arcsec
Matched and fit WCS in 1 iterations; found 81 matches with scatter = 0.558 +- 0.373 arcsec
Matched and fit WCS in 3 iterations; found 60 matches with scatter = 0.480 +- 0.346 arcsec
Matched and fit WCS in 3 iterations; found 68 matches with scatter = 0.471 +- 0.434 arcsec
Matched and fit WCS in 1 iterations; found 72 matches with scatter = 0.337 +- 0.260 arcsec
Matched and fit WCS in 1 iterations; found 56 matches with scatter = 0.319 +- 0.230 arcsec
Matched and fit WCS in 1 iterations; found 65 matches with scatter = 0.276 +- 0.138 arcsec
Matched and fit WCS in 1 iterations; found 68 matches with scatter = 0.190 +- 0.104 arcsec
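The log search described above could be sketched roughly as follows. This is not the actual grep that was run; the regular expression is simply modelled on the "Matched and fit WCS" lines quoted in this comment, and `worst_offenders` is a hypothetical helper name.

```python
import re

# Pattern modelled on the astrometry log lines quoted in this ticket.
WCS_LINE = re.compile(
    r"Matched and fit WCS in (\d+) iterations; "
    r"found (\d+) matches with scatter = ([\d.]+) \+- ([\d.]+) arcsec"
)


def worst_offenders(log_lines, n=4):
    """Return the n fits with the largest scatter as
    (scatter, error, matches, iterations) tuples, worst first."""
    fits = []
    for line in log_lines:
        m = WCS_LINE.search(line)
        if m:
            iters, matches, scatter, err = m.groups()
            fits.append((float(scatter), float(err), int(matches), int(iters)))
    return sorted(fits, reverse=True)[:n]


lines = [
    "Matched and fit WCS in 2 iterations; found 69 matches with scatter = 10.474 +- 9.280 arcsec",
    "Matched and fit WCS in 3 iterations; found 111 matches with scatter = 7.327 +- 6.184 arcsec",
    "Matched and fit WCS in 1 iterations; found 68 matches with scatter = 0.190 +- 0.104 arcsec",
    "some unrelated log line",
]
for scatter, err, matches, iters in worst_offenders(lines, n=2):
    print(f"{scatter:.3f} +- {err:.3f} arcsec ({matches} matches, {iters} iterations)")
```

Recovering the dataId for each fit would additionally need the preceding log line to be tracked, which this sketch leaves out.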
Lauren MacArthur, I took a look at 3 of the 4 worst offenders (the one criminal mastermind, of course, is not in the gen3 repo so I can't easily do anything with it). And it's good news! If I turn on doMagnitudeOutlierRejection=True then they look a lot better (nstars is the number of matched stars used to compute the photometric zp):
Visit/Detector | Original offset (arcsec) | Original nstars | New offset (arcsec) | New nstars |
---|---|---|---|---|
193111/155 | 1.235+/-1.014 | 11 | 0.007+/-0.004 | 130 |
263501/179 | 2.598+/-1.650 | 2 | 0.007+/-0.003 | 101 |
421725/14 | 7.327+/-6.184 | 2 | 0.004+/-0.002 | 121 |
Good news indeed (and wow...only 2 stars for the zp est.!) I may have a look at offender #1 in gen2 land...if it turns out to be another case of "hot mess", it may be worth the deeper look.
More good news...offender #1 is also fixed by running with calibrate.astrometry.doMagnitudeOutlierRejection=True:
--id visit=456690 filter="r" detector=168

processCcd.calibrate.astrometry.matcher INFO: Matched 58 sources
processCcd.calibrate.astrometry INFO: Rough zeropoint from astrometry matches is 32.1714 +/- 0.0087.
processCcd.calibrate.astrometry INFO: Removed 13 magnitude outliers out of 58 total astrometry matches.
processCcd.calibrate.astrometry INFO: Fit WCS iter 2 failed; using previous iteration: Unable to match sources
processCcd.calibrate.astrometry INFO: Matched and fit WCS in 1 iterations; found 45 matches with scatter = 0.006 +- 0.003 arcsec
I'm going to run the full visit with and without the outlier rejection to see if anything else changes. We may have to do some more thorough testing (e.g. I would be happy to launch a full set of the DC2 singleFrameDriver jobs...but not until after the maintenance on Thurs!), but I'm very close to saying this should be a recommended change to the imsim config overrides.
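For readers unfamiliar with the idea, the kind of magnitude-outlier rejection reported in the log above (a rough zeropoint, then removal of matches that disagree with it) can be sketched as below. This is not the stack's implementation: the MAD-based robust sigma, the 3-sigma cut, the function name, and the example magnitudes are all assumptions made for illustration.

```python
import statistics


def reject_magnitude_outliers(inst_mags, ref_mags, n_sigma=3.0):
    """Return (kept indices, rough zeropoint) for matched sources.

    A match is kept when its (reference - instrumental) magnitude offset
    lies within n_sigma robust standard deviations of the median offset.
    """
    deltas = [ref - inst for inst, ref in zip(inst_mags, ref_mags)]
    zero_point = statistics.median(deltas)
    # Median absolute deviation, scaled to a Gaussian-equivalent sigma.
    mad = statistics.median([abs(d - zero_point) for d in deltas])
    sigma = 1.4826 * mad
    keep = [i for i, d in enumerate(deltas) if abs(d - zero_point) <= n_sigma * sigma]
    return keep, zero_point


# Five consistent matches and one spurious match (made-up magnitudes).
inst = [-10.00, -9.50, -11.20, -8.70, -10.40, -9.90]
ref = [22.17, 22.68, 20.97, 23.46, 21.78, 17.30]
keep, zp = reject_magnitude_outliers(inst, ref)
print(f"kept {len(keep)} of {len(inst)} matches; rough zeropoint {zp:.2f}")
```

The point of the cut is that a spurious positional match rarely also has a consistent magnitude, so removing it before the WCS fit leaves a cleaner sample.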
So, there are differences, but I'd say all around it looks like we are indeed getting a much cleaner sample of reference matches with calibrate.astrometry.doMagnitudeOutlierRejection=True. For example, the following are plots of the full visit ref-src Delta(RA)*cos(Dec) for the sources used in the astrometric fit:
Without the config override set (note the missing detector 168 because the bad WCS fit resulted in a failure to find matches in photoCal):
With the calibrate.astrometry.doMagnitudeOutlierRejection=True override:
I've also attached the sky versions of these plots which can be blinked to see where differences in the selections occur.
I was discussing Eli's fix in DM-30490 with him this morning and showed him a comparison of reported astrometric scatter for some DC2 data (5 years of WFD visits covering the DDF region) using w_2021_22 and w_2021_25 with doMagnitudeOutlierRejection=True. These data include ~20k ccd-visits. Here are the histograms of the scatter values without (w_2021_22) and with (w_2021_25) applying the fix:
Two of the three remaining ccd-visits with log10(scatter/arcsec) > -1.5 are images where only lensed galaxies, AGNs, and SNe were rendered. We had a small bug in our image simulations where the normal galaxies and stars were not included in a very small handful of images for these data. The fields are almost entirely blank aside from a few hundred objects, yet the astrometry still "solved". The single remaining ccd-visit at -0.5 is a normal-looking field where the scatter went from 8 arcsec to 0.345 arcsec. Notably, none of the reported astrometric scatter values worsened by more than 0.001 arcsec after applying the fix.
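A before/after comparison of this kind can be sketched as follows. This is not the script used to make the histograms; the dict layout and the name `compare_scatter` are assumptions, the first three example entries reuse scatter values quoted elsewhere in this ticket, and the key (999999, 0) is a placeholder because the id of the 8 arcsec case is not given.

```python
def compare_scatter(before, after, log_threshold=-1.5, tolerance=0.001):
    """Compare astrometric scatter (arcsec) keyed by (visit, detector).

    Returns the keys still above 10**log_threshold arcsec after the fix,
    and the keys whose scatter worsened by more than `tolerance` arcsec.
    """
    common = before.keys() & after.keys()
    limit = 10.0 ** log_threshold  # same cut as log10(scatter/arcsec) > threshold
    still_bad = sorted(k for k in common if after[k] > limit)
    worsened = sorted(k for k in common if after[k] - before[k] > tolerance)
    return still_bad, worsened


before = {(193111, 155): 1.235, (263501, 179): 2.598, (421725, 14): 7.327, (999999, 0): 8.0}
after = {(193111, 155): 0.007, (263501, 179): 0.007, (421725, 14): 0.004, (999999, 0): 0.345}
still_bad, worsened = compare_scatter(before, after)
print("still above threshold:", still_bad)
print("worsened:", worsened)
```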
Awesome! So setting that config override, along with a super relaxed value (but smaller than the current default of 10) for astrometry.wcsFitter.maxScatterArcsec, should keep all the good (doing no harm), improve and recover the bad, and leave out the junk (mis-simulated) frames going into the coadds.
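As a rough picture of what those two overrides amount to, here is a sketch that uses a plain `SimpleNamespace` as a stand-in for the `config` object a real override file would receive from the stack. The nesting under `calibrate.astrometry` follows the field names quoted in this thread, the "off" starting state is an assumption, and the value 5.0 is a placeholder only, since no number is settled on here.

```python
from types import SimpleNamespace

# Stand-in for the stack-provided `config` object (assumption: a real
# override file would be handed this object rather than building it).
config = SimpleNamespace(
    calibrate=SimpleNamespace(
        astrometry=SimpleNamespace(
            doMagnitudeOutlierRejection=False,  # assumed "off" before the override
            wcsFitter=SimpleNamespace(maxScatterArcsec=10.0),  # default quoted above
        )
    )
)

# The two overrides under discussion.
config.calibrate.astrometry.doMagnitudeOutlierRejection = True
# Placeholder value: relaxed, but smaller than the default of 10.
config.calibrate.astrometry.wcsFitter.maxScatterArcsec = 5.0

print(config.calibrate.astrometry)
```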
I have rerun my scripts looking for parity between the gen3 & gen2 SFM outputs using this new run (/datasets/DC2/repoRun2.2i/rerun/w_2021_25/DM-30812). We are SOOOOO CLOSE, but I have finally encountered some examples of the case of incomplete reference catalog loading due to the 0 padding for the visit definition of this repo (see, e.g. DM-30030 and this community post for details). To illustrate, the following shows the full loaded reference sample (silver circles), selected (i.e. trimmed and passing the reference source selector criterion) reference sample (orange x's) and sources actually used in the astrometric fit (stars) for a given case (visit 193888, detector=126):
Gen3:
and a zoom in:
So, you can see that this detector lines up pretty closely with an edge of this shard and ends up missing out on some of the reference sources that would (should) be included with the 250 pixel padding to the raw WCS when doing the ref cat trimming. The following is the gen2 version:
Note that, for gen2, the selected ref sample had 283 objects, whereas gen3 had only 268. Even so, the source matches that got included in the astrometric fit are actually identical in both cases, so the astrometry is only just barely affected here (but my parity testing is sensitive enough to pick this up). Given that I'm seeing 4 cases of this in just the DC2 dataset (and only a very incomplete one at that, as I can only compare the detectors that actually got ingested into the /repo/dc2 repo), this situation is perhaps less rare than we had anticipated/hoped, so updating the visit definition is certainly something to consider (although the partial ingest issues are definitely more urgent...and resolving that will likely result in the visit definition update by default?!)
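The effect described above can be mimicked with a toy example. This is not the stack's reference-loading code: `trim_to_padded_bbox` is a hypothetical helper, the 4000 x 4000 pixel detector size and the object positions are made up, and the "shard edge" is modelled as a simple cut in x. Only the 250 pixel padding comes from the discussion.

```python
def trim_to_padded_bbox(ref_xy, width, height, padding=250):
    """Keep reference objects whose pixel position (from the raw WCS) falls
    inside the detector bounding box grown by `padding` pixels on each side."""
    return [
        (x, y)
        for x, y in ref_xy
        if -padding <= x < width + padding and -padding <= y < height + padding
    ]


# Reference objects in detector pixel coordinates (made-up positions).
all_refs = [(100, 100), (-200, 2000), (-300, 2000), (4100, 50)]

# Complete loading: everything near the detector is available to the trim.
complete = trim_to_padded_bbox(all_refs, width=4000, height=4000)

# Incomplete loading: objects beyond a shard edge (here x < -100) were never
# loaded, so the padded trim cannot select them.
loaded = [xy for xy in all_refs if xy[0] >= -100]
incomplete = trim_to_padded_bbox(loaded, width=4000, height=4000)

print(len(complete), len(incomplete))
```

Objects that fall on the detector itself are selected either way, which is consistent with the fitted matches being identical even though the selected reference counts differ.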
All four cases here only just barely affect the SFM WCS, so I would have comfortably gone on to the coadd parity comparisons for DC2...but this is not feasible in our current situation of very different visit/detector inputs from gen2 & gen3 repos.
The "good" news is that, as of w_2021_25 and the updated BF kernels for the gen2 repo (DM-30738), and modulo the above and the pesky (but likely insignificant) deblend_peakId offsets, we now seem to be at gen2 vs. gen3 parity for all visit/detector combos of that DC2 dataset that are in common to both the gen2 & gen3 repos.
Would you mind giving this a look and letting me know if it is ready for sign-off? I am particularly interested in your thoughts on how to move on to the coadd comparisons given our gen3 repos ingest "issues".
I think it may just make sense to focus Gen2/3 parity investigation on HSC, and only worry about looking at DC2 (Gen3 especially) in an absolute sense. I think I have set things in motion to address the missing raws, but I don't know when that will actually complete.
But yes, ready for sign-off - and a reminder that I should go patch the visit padding, now that DM-30866 has landed with the functionality for doing that.
Thanks Jim. Yeah...I'm still holding out hope for the raws situation to get sorted on time for the next processing (but no pressure!!) DC2 is our only "natural" path to looking at gen2 vs gen3 coadds without external calibrations (for which we aren't yet at parity...and different/unpredictable input ordering in the gen3 bps vs. gen2 slurm runs may preclude exact parity).
I've run the script attached to DM-30647 on the gen2 vs. gen3 w_2021_24 runs. Differences are as follows:
There are huge differences in the number and subset of calexps produced by the two runs, which stem from at least two causes:
- DM-30747)
- DM-30426 is present for both runs, but is particularly bad for gen2 as it brings any singleFrameDriver job to a halt, so there are many, many visit/detectors missing from that rerun. I will kick off a run on w_2021_25 (i.e. after the fix went in) to make sure no "other" SFM failures occur in gen2 that do not happen in gen3.
For the visit/detector combos with successful processing in both runs:
Each and every catalog had differences between gen2 & gen3 of the following type:
Single Frame Processing for DC2 174534 i log:
And many also had:
(I do realize the absolute metric here is not the most useful...but there was some other column value for which it was and I failed to adapt based on column as I loop through them all. I'll change this on future runs, but these take days to run, so I'll just leave these as revealing "a difference" between gen2 & gen3 measurements.)
Other than that, everything looks identical.
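On the absolute-vs-relative metric point above, here is one way a per-column comparison could report both numbers at once. This is not the script attached to DM-30647; the catalogs are modelled as plain dicts of float lists in the same row order, and the column names in the example are made up.

```python
import math


def compare_columns(cat_a, cat_b, rel_tol=1e-9, abs_tol=0.0):
    """Return {column: (max abs diff, max rel diff)} for columns that differ.

    Rows where both values are NaN are treated as equal; a NaN on one side
    only is reported as an infinite difference.
    """
    report = {}
    for name in sorted(cat_a.keys() & cat_b.keys()):
        max_abs = max_rel = 0.0
        differs = False
        for a, b in zip(cat_a[name], cat_b[name]):
            if math.isnan(a) and math.isnan(b):
                continue
            if math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                continue
            differs = True
            if math.isnan(a) or math.isnan(b):
                max_abs = max_rel = math.inf
                continue
            diff = abs(a - b)
            max_abs = max(max_abs, diff)
            max_rel = max(max_rel, diff / max(abs(a), abs(b)))
        if differs:
            report[name] = (max_abs, max_rel)
    return report


gen2 = {"id": [1.0, 2.0], "flux": [1000.0, 2000.0], "shape_xx": [2.5, 3.0]}
gen3 = {"id": [1.0, 2.0], "flux": [1000.0, 2000.0], "shape_xx": [2.5, 3.003]}
report = compare_columns(gen2, gen3)
for name, (max_abs, max_rel) in report.items():
    print(f"{name}: max abs diff {max_abs:.3g}, max rel diff {max_rel:.3g}")
```

Reporting both numbers per column avoids having to decide up front which metric suits which column.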
So one question is: do we care about the parent and id differences?
Another is how to go about getting to the bottom of the gen2 vs. gen3 shapeHSM differences (paging Joshua Meyers on this one!)?