# Investigate differences in the coadd background between gen2 and gen3

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels:
• Story Points:
26
• Team:
Data Release Production
• Urgent?:
No

#### Description

While testing the streak masking code, I noticed that there are pretty significant differences between the background in gen2 and gen3 for the coadd in the attached image. The satellite trail visible in the gen2 version has also disappeared in the gen3 coadd.

#### Activity

No builds found.
Clare Saunders created issue -
Field Original Value New Value
Attachment 7409C531-CA73-4746-A00C-CD9E10450B7D.png [ 49741 ]
Attachment 02854A94-AAA6-4DFF-A455-BFB428CF8A28.png [ 49742 ]
Hide
Lauren MacArthur added a comment - - edited

If the gen3 repo is using freshly made global SKY frames (as opposed to converted gen2 old versions), then this is entirely expected since the gen2 versions are way out of date. E.g.

Currently used:

Show
Lauren MacArthur added a comment - - edited If the gen3 repo is using freshly made global SKY frames (as opposed to converted gen2 old versions), then this is entirely expected since the gen2 versions are way out of date. E.g. Currently used: vs. “freshly” made:
 Assignee Lauren MacArthur [ lauren ]
 Status To Do [ 10001 ] In Progress [ 3 ]
Hide
Lauren MacArthur added a comment -

Ok, I think I have just ruled out the above conjecture.  I had a "guess" at the repos associated with the image Clare Saunders attached to the ticket and, having created the two butlers, asked each which SKY frames they were using.  E.g.

 # Get SKY frame URI's from Clare's coadds visit, ccd, filter = 1228, 69, "HSC-I" rootDirGen2 = "/datasets/hsc/repo/rerun/private/csaunder/coadd_noMaskStreaks2" butlerGen2 = dafPersist.Butler(rootDirGen2) rootDirGen3 = "/repo/main" butlerGen3 = Butler(rootDirGen3, collections="u/csaunder/coadd_test", instrument="HSC") fileNameGen2 = butlerGen2.getUri("sky", visit=visit, ccd=ccd) fileNameGen3 = butlerGen3.getURI("sky", visit=visit, detector=ccd) fileNameGen3 = str(fileNameGen3).replace("file://", "") print("fileNameGen2 =", fileNameGen2) print("fileNameGen3 =", fileNameGen3)   fileNameGen2 = /datasets/hsc/repo/CALIB/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits fileNameGen3 = /datasets/hsc/calib/20180117/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits 

These two files are indeed identical, so the SKY frames going into coaddition appear not to be the issue.

Show
Lauren MacArthur added a comment - Ok, I think I have just ruled out the above conjecture.  I had a "guess" at the repos associated with the image Clare Saunders  attached to the ticket and, having created the two butlers, asked each which SKY frames they were using.  E.g. # Get SKY frame URI's from Clare's coadds visit, ccd, filter = 1228 , 69 , "HSC-I" rootDirGen2 = "/datasets/hsc/repo/rerun/private/csaunder/coadd_noMaskStreaks2" butlerGen2 = dafPersist.Butler(rootDirGen2) rootDirGen3 = "/repo/main" butlerGen3 = Butler(rootDirGen3, collections = "u/csaunder/coadd_test" , instrument = "HSC" ) fileNameGen2 = butlerGen2.getUri( "sky" , visit = visit, ccd = ccd) fileNameGen3 = butlerGen3.getURI( "sky" , visit = visit, detector = ccd) fileNameGen3 = str (fileNameGen3).replace( "file://" , "") print ( "fileNameGen2 =" , fileNameGen2) print ( "fileNameGen3 =" , fileNameGen3)   fileNameGen2 = / datasets / hsc / repo / CALIB / SKY / 2014 - 07 - 14 / HSC - I / SKY - 2014 - 07 - 14 - HSC - I - 069.fits fileNameGen3 = / datasets / hsc / calib / 20180117 / SKY / 2014 - 07 - 14 / HSC - I / SKY - 2014 - 07 - 14 - HSC - I - 069.fits These two files are indeed identical, so the SKY frames going into coaddition appear not to be the issue.
 Attachment warp_gen2vsgen3_19660_4.png [ 49753 ]
 Attachment SKY_HSC-I_19660_1.png [ 49754 ]
Hide
Lauren MacArthur added a comment -

I went ahead and tried making gen2 and gen3 warps.  I used the RC2 w_2021_18 reruns as "input" to the task and, indeed, the problem is coming in during this phase.  The following plots the gen2 and gen3 warps for tract=9813 patch=4 (or 4,0), visit =19660 and their diff:

and this is the SKY frame associated with the lower CCD section:

which shows the same pattern as the difference image (rightmost panel above). So it seems there is a difference in the application of the SKY frames. I'll look further into this.

Show
Lauren MacArthur added a comment - I went ahead and tried making gen2 and gen3 warps.  I used the RC2 w_2021_18 reruns as "input" to the task and, indeed, the problem is coming in during this phase.  The following plots the gen2 and gen3 warps for tract=9813 patch=4 (or 4,0), visit =19660 and their diff: and this is the SKY frame associated with the lower CCD section: which shows the same pattern as the difference image (rightmost panel above). So it seems there is a difference in the application of the SKY frames. I'll look further into this.
 Attachment skyCorr_HSC-I_19660_1.png [ 49755 ]
Hide
Lauren MacArthur added a comment -

Ok, so the first difference I'm seeing is actually in the skyCorr images, which Christopher Waters already alluded to on DM-22534.  Here is an example difference image of the skyCorr.getImage() arrays of gen2 vs. gen3 for the w_2021_18 RC2 runs:

These files are of type lsst.afw.math._backgroundList.BackgroundList and have 7 individual background entries (these all get summed when calling a getImage() on the skyCorr dataset).  It looks like the first 4 are identical, but those in 5-7 differ.  The first four come from the calexpBackground dataset and those are looking identical between gen2 & gen3, so this is coming in during skyCorrectionTask. The logs are a bit hard to parse, but it does look like the reported "Sky frame scale"s are indeed different for gen2 vs. gen3. I think I'll have to run skyCorrection manually to glean any more information (a really rough guess is that a difference in the order of the detectors can somehow make a small difference?  Other ideas welcome!)

I also am not sure if this level of difference can compound all the way to the difference seen in the coadd backgrounds in the above attachment --this doesn't seem to be a big enough effect nor have enough structure that is already seen in the warps – but it would be nice to rule it out.

Show
Lauren MacArthur added a comment - Ok, so the first difference I'm seeing is actually in the skyCorr images, which Christopher Waters already alluded to on DM-22534 .  Here is an example difference image of the skyCorr.getImage() arrays of gen2 vs. gen3 for the w_2021_18 RC2 runs: These files are of type lsst.afw.math._backgroundList.BackgroundList and have 7 individual background entries (these all get summed when calling a getImage() on the skyCorr dataset).  It looks like the first 4 are identical, but those in 5-7 differ.  The first four come from the calexpBackground dataset and those are looking identical between gen2 & gen3, so this is coming in during skyCorrectionTask . The logs are a bit hard to parse, but it does look like the reported "Sky frame scale"s are indeed different for gen2 vs. gen3. I think I'll have to run skyCorrection manually to glean any more information (a really rough guess is that a difference in the order of the detectors can somehow make a small difference?  Other ideas welcome!) I also am not sure if this level of difference can compound all the way to the difference seen in the coadd backgrounds in the above attachment --this doesn't seem to be a big enough effect nor have enough structure that is already seen in the warps – but it would be nice to rule it out.
Hide
Clare Saunders added a comment -

Thanks for looking into this Lauren MacArthur. Sorry for leaving the source of the coadds a mystery. They are just from RC2 weeklies:

 butler_gen2 = Gen2Butler('/datasets/hsc/repo/rerun/RC/w_2021_14/DM-29519') butler_gen3 = Butler('/repo/main', collections="HSC/runs/RC2/w_2021_14/DM-29528") gen2_dataId = {'tract':9813, 'patch':"4,8", 'filter':'HSC-I'} gen3_dataId = {'tract': 9813, 'patch': 76, 'band': 'i'} gen2Coadd = butler_gen2.get('deepCoadd', dataId=gen2_dataId) gen3Coadd = butler_gen3.get('deepCoadd', dataId=gen3_dataId) 

However, using these two butlers, I did find the same results as Lauren MacArthur for the SKY frames used for visit=1228, ccd=69:

 fileNameGen2 = /datasets/hsc/repo/CALIB/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits fileNameGen3 = /datasets/hsc/calib/20180117/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits 

Show
Clare Saunders added a comment - Thanks for looking into this Lauren MacArthur . Sorry for leaving the source of the coadds a mystery. They are just from RC2 weeklies: butler_gen2 = Gen2Butler( '/datasets/hsc/repo/rerun/RC/w_2021_14/DM-29519' ) butler_gen3 = Butler( '/repo/main' , collections= "HSC/runs/RC2/w_2021_14/DM-29528" ) gen2_dataId = { 'tract' : 9813 , 'patch' : "4,8" , 'filter' : 'HSC-I' } gen3_dataId = { 'tract' : 9813 , 'patch' : 76 , 'band' : 'i' } gen2Coadd = butler_gen2.get( 'deepCoadd' , dataId=gen2_dataId) gen3Coadd = butler_gen3.get( 'deepCoadd' , dataId=gen3_dataId) However, using these two butlers, I did find the same results as Lauren MacArthur for the SKY frames used for visit=1228, ccd=69: fileNameGen2 = /datasets/hsc/repo/CALIB/SKY/ 2014 - 07 - 14 /HSC-I/SKY- 2014 - 07 - 14 -HSC-I- 069 .fits fileNameGen3 = /datasets/hsc/calib/ 20180117 /SKY/ 2014 - 07 - 14 /HSC-I/SKY- 2014 - 07 - 14 -HSC-I- 069 .fits
 Epic Link DM-29149 [ 458506 ] DM-29153 [ 458510 ]
 Labels drp-parity
 Epic Link DM-29153 [ 458510 ] DM-30472 [ 509197 ]
 Epic Link DM-30472 [ 509197 ] DM-30540 [ 511198 ]
Hide
Lauren MacArthur added a comment - - edited

I'm back to this one, but now looking at DC2 processing which represents a more "vanilla" coaddition, i.e. no external (fgcm/jointcal) global calibrations in play. The coadds are not bitwise identical between gen2 & gen3, so "something" is happening differently. First checks have all proven bitwise identical, including:

• comparison of directWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs
• comparison of psfMatchedWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs

Moving on to looking at the "deepCoadd_calexp_background" dataset...yep, they are slightly different (at the ~4% level):
gen2: -0.00567247 vs gen3: -0.00545275 (diff = 0.00021972)

Looking more closely at the coadd image difference, it is indeed roughly consistent with an offset equivalent to the above background level difference):

imageDiff minmax: -0.00024414062 -0.00012207031 median: -0.00021971297

I've been looking at the configs, but don't see any smoking guns.

Show
Lauren MacArthur added a comment - - edited I'm back to this one, but now looking at DC2 processing which represents a more "vanilla" coaddition, i.e. no external (fgcm/jointcal) global calibrations in play. The coadds are not bitwise identical between gen2 & gen3, so "something" is happening differently. First checks have all proven bitwise identical, including: comparison of directWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs comparison of psfMatchedWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs Moving on to looking at the "deepCoadd_calexp_background" dataset...yep, they are slightly different (at the ~4% level): gen2: -0.00567247 vs gen3: -0.00545275 (diff = 0.00021972) Looking more closely at the coadd image difference, it is indeed roughly consistent with an offset equivalent to the above background level difference): imageDiff minmax: -0.00024414062 -0.00012207031 median: -0.00021971297 I've been looking at the configs, but don't see any smoking guns.
Hide
Lauren MacArthur added a comment - - edited

Ok, I think I have narrowed this down to a random number seed difference between gen2 & gen3.  It is happening in the dynamic detection stage on the coadds, which uses an expId as a seed, and this expId gets computed differently in the two middlewares.  Ideally, the gen2 computation could be changed to match the gen3 one.  The relevant code that would need changing is:

Jim Bosch, if you (or anyone else watching!) happen to know how to adapt these gen2 mapper functions to mimic the gen3 equivalent, some guidance would be really appreciated!

The setting of the expIds happens here:

Show
Lauren MacArthur added a comment - - edited Ok, I think I have narrowed this down to a random number seed difference between gen2 & gen3.  It is happening in the dynamic detection stage on the coadds, which uses an expId as a seed, and this expId gets computed differently in the two middlewares.  Ideally, the gen2 computation could be changed to match the gen3 one.  The relevant code that would need changing is: obs_lsst : https://github.com/lsst/obs_lsst/blob/master/python/lsst/obs/lsst/lsstCamMapper.py#L256-L284 obs_subaru : https://github.com/lsst/obs_subaru/blob/master/python/lsst/obs/hsc/hscMapper.py#L288-L307 Jim Bosch , if you (or anyone else watching!) happen to know how to adapt these gen2 mapper functions to mimic the gen3 equivalent, some guidance would be really appreciated! The setting of the expIds happens here: https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/multiBand.py#L272-L296
Hide
Jim Bosch added a comment -

This is going to be a bit messy; to replicate the Gen3 behavior I think we really need a SkyMap object (at least), and we can't easily get one of those to the Gen2 Mapper methods.

My plan is to put a new utility method on the SkyMap object that will use Gen3 code under the hood, but accept Gen2 coadd data IDs.  We'll then modify the Gen2 tasks that use the Gen2 "coaddExposureId" etc. to read in the SkyMap (if they don't already) and call that method.

The big catch is that we'll need to somehow standardize the heterogeneous Gen2 coadd "filter" data ID values into the "band" value that Gen3 uses.  Lauren MacArthur I'm hoping you might already have some ideas on how to do that from e.g. pipe_analysis needs, and can take care of that problem while I write the SkyMap method.  The solution doesn't have to be pretty, because we can stuff it into the Gen2 tasks and hold our noses until they're gone.

Show
Jim Bosch added a comment - This is going to be a bit messy; to replicate the Gen3 behavior I think we really need a SkyMap object (at least), and we can't easily get one of those to the Gen2 Mapper methods. My plan is to put a new utility method on the SkyMap object that will use Gen3 code under the hood, but accept Gen2 coadd data IDs.  We'll then modify the Gen2 tasks that use the Gen2 "coaddExposureId" etc. to read in the SkyMap (if they don't already) and call that method. The big catch is that we'll need to somehow standardize the heterogeneous Gen2 coadd "filter" data ID values into the "band" value that Gen3 uses.  Lauren MacArthur I'm hoping you might already have some ideas on how to do that from e.g. pipe_analysis needs, and can take care of that problem while I write the SkyMap method.  The solution doesn't have to be pretty, because we can stuff it into the Gen2 tasks and hold our noses until they're gone.
Hide
Jim Bosch added a comment -

Ok, I've added a completely untested pack_data_id method to the tickets/DM-30284 branch of skymap; I'm hoping you can take it from here, but I'm happy to help (especially if it doesn't work).

https://github.com/lsst/skymap/pull/54/files

Show
Jim Bosch added a comment - Ok, I've added a completely untested pack_data_id method to the tickets/ DM-30284 branch of skymap; I'm hoping you can take it from here, but I'm happy to help (especially if it doesn't work). https://github.com/lsst/skymap/pull/54/files
Hide
Lauren MacArthur added a comment -

Awesome, thanks so much, Jim! I’ll take it for a spin and see if I can get it working.

Show
Lauren MacArthur added a comment - Awesome, thanks so much, Jim! I’ll take it for a spin and see if I can get it working.
 Link This issue relates to DM-32071 [ DM-32071 ]
Hide
Lauren MacArthur added a comment -

Ok, so getting all the exposure and source (factory) ids in sync between the two middlewares went a bit further down a rabbit hole than I was expecting (note the 7 branches –so far– attached to this ticket!), but I've now got that all working for LSSTCam-imSim (and HSC is in progress, but very close). I now get gen2/gen3 parity in everything in coaddition up until the mergeDetections stage EXCEPT for small differences in the PSFs attached to the deepCoadd (the pre-_calexp one). Here is an example (note the slightly obscured 1e-17, & 1e-14 scales on the colorbars of the difference panels):

These are tiny differences, and I have no idea where the difference is coming in (ideas welcome!), but this is the only gen2/gen3 difference I have noted at this stage, and I do start to see differences in the downstream tasks, starting with the deblender. Examples of some of rolled-up difference stats I print to the logs when running my script that makes the catalog comparisons (looping through all columns, so any not listed here are identical between the two middlewares – modulo deblend_error and deblend_modelType as these are string columns and can be read in with afwTable...) are:

 compareCoadd INFO: Comparing gen2 vs. gen3 deepCoadd_deblendedFlux catalogs compareCoadd WARN: ...patch 31: deblend_runtime Absolute diff : mean: 18.428 min: 0.000 max 10529.36 compareCoadd WARN: ...patch 31: deblend_iterations Absolute diff : mean: 0.200 min: 0.000 max 156.000 compareCoadd WARN: ...patch 31: deblend_peak_instFlux Absolute diff : mean: 0.004 min: 0.000 max 10.238 compareCoadd WARN: ...patch 31: deblend_scarletFlux Absolute diff : mean: 0.305 min: 0.000 max 667.090 compareCoadd WARN: ...patch 31: deblend_logL Absolute diff : mean: 11.287 min: 0.000 max 99287.750 

Fred Moolekamp, I alluded to this in a drp status report. Does it make any sense to you that such tiny PSF differences could lead to these levels of differences in some of the scarlet outputs (I don't think the runtime one matters, but the others probably do...)?

It took me "a minute" to notice these small PSF differneces as, when one tries to plot the PSF images with the oh-so-handy afwDiplay tools, one encounters the following:

 TypeError: getZScale(): incompatible function arguments. The following argument types are supported:  1. (image: lsst.afw.image.image.ImageU, nsamples: int = 1000, contrast: float = 0.25) -> Tuple[float, float]  2. (image: lsst.afw.image.image.ImageF, nsamples: int = 1000, contrast: float = 0.25) -> Tuple[float, float]   Invoked with: lsst.afw.image.image.ImageD=[[0. 0. 0. ... 0. 0. 0.]  [0. 0. 0. ... 0. 0. 0.]  [0. 0. 0. ... 0. 0. 0.]  ...  [0. 0. 0. ... 0. 0. 0.]  [0. 0. 0. ... 0. 0. 0.]  [0. 0. 0. ... 0. 0. 0.]], bbox=(minimum=(13971, 17971), maximum=(14029, 18029)), 1000, 0.25 

i.e. the PSFs are stored as ImageD and that is not an accepted type in getZScale(). So, rather than messing around with other scalings, I first just casted the PSF images to ImageFs (using getPsf().computeImage(position).convertFloat()), but then they do compare identical!  However, when I ran my bit-by-bit comparisons of "all the things", I noticed the wee PSF differences, so the above plots used some ugly linear+min+max scalings.

Show
Hide
Fred Moolekamp added a comment -

Does it make any sense to you that such tiny PSF differences could lead to these levels of differences in some of the scarlet outputs (I don't think the runtime one matters, but the others probably do...)?

Yes and no. The most informative column to look at is probably the deblend_iterations difference because I think that it explains the others. There is a convergence check after each iteration and your results show that on average most of the models converge at the same iteration with the difference PSFs, with many of them not changing at all. I'd be curious to see a histogram of the difference in iterations, without the absolute value, because as long as the vast majority of blends are converging at the same iteration, or maybe even +/- 1, then I'd say that everything is working as expected. Without that plot I'm just speculating that there are only a few catastrophic cases with blends that barely converge at lower iterations and then resizing, which occurs every 10 iterations, could potentially lead to much larger numbers of iterations. With longer iteration times there will also be slight changes in the models, with slightly better models for more iterations but (if the convergence check is set appropriately) there should rarely be a huge difference in the measured values, but it does happen. Since you just print the difference and not the percentage (difference/value) it's hard for me to know whether or not a substantial number of (or any) blends have measurements that are significantly different.

Anytime we have a new scarlet release we see similar types of issues, where the number of iterations and exact values of measurements may differ on a single blend even though overall (if we accept the changes) the results are improved.

These cases are rare but are also an unfortunate consequence of implementing a convergence test. If anyone has suggestions for how to make this more robust I'm all ears.

Show
Fred Moolekamp added a comment - Does it make any sense to you that such tiny PSF differences could lead to these levels of differences in some of the scarlet outputs (I don't think the runtime one matters, but the others probably do...)? Yes and no. The most informative column to look at is probably the deblend_iterations difference because I think that it explains the others. There is a convergence check after each iteration and your results show that on average most of the models converge at the same iteration with the difference PSFs, with many of them not changing at all. I'd be curious to see a histogram of the difference in iterations, without the absolute value, because as long as the vast majority of blends are converging at the same iteration, or maybe even +/- 1, then I'd say that everything is working as expected. Without that plot I'm just speculating that there are only a few catastrophic cases with blends that barely converge at lower iterations and then resizing, which occurs every 10 iterations, could potentially lead to much larger numbers of iterations. With longer iteration times there will also be slight changes in the models, with slightly better models for more iterations but (if the convergence check is set appropriately) there should rarely be a huge difference in the measured values, but it does happen. Since you just print the difference and not the percentage (difference/value) it's hard for me to know whether or not a substantial number of (or any) blends have measurements that are significantly different. Anytime we have a new scarlet release we see similar types of issues, where the number of iterations and exact values of measurements may differ on a single blend even though overall (if we accept the changes) the results are improved. These cases are rare but are also an unfortunate consequence of implementing a convergence test. If anyone has suggestions for how to make this more robust I'm all ears.
 Link This issue relates to DM-32766 [ DM-32766 ]
 Attachment skyCorr_w42_orig_gen2_vs_gen3.png [ 55134 ]
 Attachment skyCorr_w42_gen2_vs_Gen3new.png [ 55135 ]
Hide
Lauren MacArthur added a comment -

A few updates for this ticket.

First, I believe the scarlet deblender issues stemmed from a mild dependence on the input order of the bands (hence it not showing up in a test I did with the old deblender, which operates on a single band at a time). This is solved/made reproducible by sorting the bands (alphabetically, for no obviously "better" ordering) up front in the runQuantum method.

Second, I believe the PSF differences are also due to a mild dependence on the input order in assembleCoadd. We were already sorting by visit, but an additional sorting of the detectors within a visit was required.

Third, I have just confirmed that, once again, they skyCorr differences were due to a mild input order dependence. I happen to have the plots handy for this one:

Gen2 vs. Gen3 for w_2021_42 runs:

Gen2 vs. new sorted Gen3 skyCorr run (using collection HSC/runs/RC2/w_2021_42/DM-32248 as input):

I put in a print statement while testing and for the Gen3 run, and example initial ordering was:

 lauren@lsst-devl01:~/tickets/DM-30284/submit/u/lauren/DM-30284_skyCorr/20211201T022849Z/jobs/skyCorr $grep detectorOrder 1228/240_skyCorr_1228_.2985264.out  Gen3 detectorOrder = [56, 1, 97, 82, 5, 8, 28, 99, 36, 96, 66, 48, 54, 46, 50, 93, 6, 61, 35, 100, 68, 0, 57, 74, 87, 7, 71, 12, 24, 42, 49, 25, 72, 29, 102, 62, 53, 78, 58, 44, 2, 59, 15, 34, 40, 33, 38, 79, 95, 89, 81, 86, 73, 22, 70, 103, 32, 39, 16, 20, 94, 55, 45, 47, 64, 14, 91, 67, 51, 98, 75, 76, 26, 52, 77, 85, 60, 65, 13, 80, 101, 11, 17, 92, 41, 21, 23, 37, 69, 31, 3, 19, 63, 30, 88, 83, 4, 10, 90, 84, 18, 43, 27] Gen3 detectorOrder sorted = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103]  Things get somewhat muddled going forward on RC2 runs because of the influence of fgcm and jointcal differences, but I am now attempting to run makeWarp with these new skyCorr outputs to see how much difference it makes on the warps. Show Lauren MacArthur added a comment - A few updates for this ticket. First, I believe the scarlet deblender issues stemmed from a mild dependence on the input order of the bands (hence it not showing up in a test I did with the old deblender, which operates on a single band at a time). This is solved/made reproducible by sorting the bands (alphabetically, for no obviously "better" ordering) up front in the runQuantum method. Second, I believe the PSF differences are also due to a mild dependence on the input order in assembleCoadd . We were already sorting by visit, but an additional sorting of the detectors within a visit was required. Third, I have just confirmed that, once again, they skyCorr differences were due to a mild input order dependence. I happen to have the plots handy for this one: Gen2 vs. Gen3 for w_2021_42 runs: Gen2 vs. new sorted Gen3 skyCorr run (using collection HSC/runs/RC2/w_2021_42/ DM-32248 as input): I put in a print statement while testing and for the Gen3 run, and example initial ordering was: lauren@lsst - devl01:~ / tickets / DM - 30284 / submit / u / lauren / DM - 30284_skyCorr / 20211201T022849Z / jobs / skyCorr$ grep detectorOrder 1228 / 240_skyCorr_1228_ . 2985264.out Gen3 detectorOrder = [ 56 , 1 , 97 , 82 , 5 , 8 , 28 , 99 , 36 , 96 , 66 , 48 , 54 , 46 , 50 , 93 , 6 , 61 , 35 , 100 , 68 , 0 , 57 , 74 , 87 , 7 , 71 , 12 , 24 , 42 , 49 , 25 , 72 , 29 , 102 , 62 , 53 , 78 , 58 , 44 , 2 , 59 , 15 , 34 , 40 , 33 , 38 , 79 , 95 , 89 , 81 , 86 , 73 , 22 , 70 , 103 , 32 , 39 , 16 , 20 , 94 , 55 , 45 , 47 , 64 , 14 , 91 , 67 , 51 , 98 , 75 , 76 , 26 , 52 , 77 , 85 , 60 , 65 , 13 , 80 , 101 , 11 , 17 , 92 , 41 , 21 , 23 , 37 , 69 , 31 , 3 , 19 , 63 , 30 , 88 , 83 , 4 , 10 , 90 , 84 , 18 , 43 , 27 ] Gen3 detectorOrder sorted = [ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 , 75 , 76 , 77 , 78 , 79 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 , 90 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 , 99 , 100 , 101 , 102 , 103 ] Things get somewhat muddled going forward on RC2 runs because of the influence of fgcm and jointcal differences, but I am now attempting to run makeWarp with these new skyCorr outputs to see how much difference it makes on the warps.
 Attachment directWarp_Gen3orig_Gen2orig_HSC-I_19660_4_9813.png [ 55198 ] Attachment directWarp_Gen3newer_Gen2orig_HSC-I_19660_4_9813.png [ 55199 ] Attachment directWarp_Gen3_newer_Gen3_orig_HSC-I_19660_4_9813.png [ 55200 ]
 Attachment compareVisit-v19660-diff_base_CircularApertureFlux_12_0-sky-gals.png [ 55201 ] Attachment compareVisit-v19660-diff_raCosDec-sky-gals.png [ 55202 ]
Hide
Lauren MacArthur added a comment - - edited

Had a look at the warps with the above fix and the difference is subtle, as expected.  So I dug further into the application of the skyCorr frames in the makeWarp task and, it seems the sky correction is not actually getting applied to the calexp image array.  I believe the problem originated in this commit: back in March (first tag is 22.0.0 / w_2021_10). While I’m not totally clear on the nuances of when object copy vs. reference is effected, in the above commit the calibrations are being directly applied to calexp.maskedImage, whereas the skyCorr gets applied to mi, which was set to mi = calexp.maskedImage. If mi was just a reference, I would’ve expected the two to be linked and all would be fine, but that appears not to be the case, and the calexp.maskedImage does not get the skyCorr applied.

The following plots demonstrate the difference if I replace mi with calexp.maskedImage where the skyCorr is supposed to be getting applied in the Gen3 code path:

Gen3 w42 vs. Gen3 this ticket branch:

Gen3 w42 vs. Gen2 w42:

Gen3 this ticket branch vs. Gen2 w42:

The remaining difference is likely attributed to the fact that we don't quite have gen2/gen3 parity for fgcm and jointcal, the difference of which look like this for this visit (note the scales are mmag and mas):

Show
Lauren MacArthur added a comment - - edited Had a look at the warps with the above fix and the difference is subtle, as expected.  So I dug further into the application of the skyCorr frames in the makeWarp task and, it seems the sky correction is not actually getting applied to the calexp image array.  I believe the problem originated in this commit: back in March (first tag is 22.0.0 / w_2021_10 ). While I’m not totally clear on the nuances of when object copy vs. reference is effected, in the above commit the calibrations are being directly applied to calexp.maskedImage , whereas the skyCorr gets applied to mi , which was set to mi = calexp.maskedImage . If mi was just a reference, I would’ve expected the two to be linked and all would be fine, but that appears not to be the case, and the calexp.maskedImage does not get the skyCorr applied. The following plots demonstrate the difference if I replace mi with calexp.maskedImage where the skyCorr is supposed to be getting applied in the Gen3 code path: Gen3 w42 vs. Gen3 this ticket branch: Gen3 w42 vs. Gen2 w42: Gen3 this ticket branch vs. Gen2 w42: The remaining difference is likely attributed to the fact that we don't quite have gen2/gen3 parity for fgcm and jointcal , the difference of which look like this for this visit (note the scales are mmag and mas):
Hide
Jim Bosch added a comment - - edited

I think the problem is in this line:

The property setter invokes Exposure.setMaskedImage, which invokes MaskedImage::operator=, which reassigns the data pointer inside the target MaskedImage to point to the same data pointer as the source MaskedImage. That disconnects the Exposure from any existing views of its MaskedImage, which still have the old data pointer. Note that this is exactly what would happen in pure Python code with Numpy array attributes of a simple class:

 class Holder:  def __init__(self):  self.array = np.zeros(4, dtype=int)   h = Holder() a = h.array h.array = np.ones(4, dtype=int) assert np.all(a != h.array) 

But it's also the kind of code where everyone who writes it expects it to do what they want, even if different people want and expect different things.

So, I don't think we want to change MaskedImage::operator=, and I don't think I want to deal with the potential pain of changing the behavior of Exposure.setMaskedImage, but we should add a bool deep=False argument to make it possible to change that behavior. We can then use deep=True in pipe_tasks to fix this problem; there is probably a way to do what we want now, but I haven't been able to quickly spot it by scanning the pybind11 source. I think we probably want to remove the setter for this property here:

https://github.com/lsst/afw/blob/main/python/lsst/afw/image/exposure/_exposure.cc#L96

To force people to at least look at setExposure and see that it has a bool deep option (even if that has a default) in the hopes of heading off similar problems in the future.

Show
 Link This issue is triggering DM-32827 [ DM-32827 ]
 Attachment ci_hsc_gen3_vs_gen2_DM-32084_w51.png [ 55507 ]
Hide
Lauren MacArthur added a comment -

With all the fixes in, I have run the ci_hsc's locally (having turned off the CiLimits in the Gen3 version to get consistent processing with Gen2)...drumroll....difference between the coadd calexps:

I have run the catalog comparisons now and they are all come out identical with a few exceptions.
Minor:

• the coadd source IDs, but we don't care as these don't affect any algorithms (it's just the exposureId's that get used to set rng seeds, and those are synched). Also, I think these come out equal when using the hsc_rings_v1 skymap as this is what the Gen2-hscMapper is tuned to (I will double-check that).
• deblend_runtime & modelfit_CModel_[initial/exp/dev]_time, which also seem to be of no consequence).

• a single entry (out of 9485) has slightly (at the 10e-11 pixel level) different numbers for base_SdssShape_[xx/yy/xy]
• two entries have small (absolute diff at the 10-12 nJ[?] level) for base_CircularApertureFlux_[35/50/70]_0_instFlux

Major:

• the forced_src catalogs have differences in all quantities!@#!

This last one was another rabbit-hole as I started looking for ordering/expId differences. Failing to find any, I started brute-force printing stuff out. Very long story short, the issue is that in Gen2 the input exposure to forcedPhotCoadd is deepCoadd_calexp (see here) whereas in Gen3 it is deepCoadd (see here)!!!

I confirmed this by changing the input exposure in Gen2 to be just deepCoadd (seemed easier!) and, indeed, when the input exposures agree, so do the output catalogs (but with the "slightly-less-minor" differences propagated at the same very low levels).

So...I believe this means that all Gen3 runs to date have been operating on the "wrong" input exposures (I'm assuming the Gen2 version has it right, although I'm sure there are some science cases where these would be preferred  ). My Gen3-repo-foo is still pretty weak, but I'm wondering if this has consequences for current repos that have the "wrong" dataType for that connection defined (Jim Bosch)? I think ci_hsc_gen3 should be ok, though, since it's made fresh each time? I'm checking on that locally and a new Jenkins is running.

Show
Lauren MacArthur added a comment - With all the fixes in, I have run the ci_hsc's locally (having turned off the CiLimits in the Gen3 version to get consistent processing with Gen2)...drumroll....difference between the coadd calexps: I have run the catalog comparisons now and they are all come out identical with a few exceptions. Minor: the coadd source IDs, but we don't care as these don't affect any algorithms (it's just the exposureId's that get used to set rng seeds, and those are synched). Also, I think these come out equal when using the hsc_rings_v1 skymap as this is what the Gen2- hscMapper is tuned to (I will double-check that). deblend_runtime & modelfit_CModel_ [initial/exp/dev] _time , which also seem to be of no consequence). Slightly Less Minor (in deepCoadd_meas ): a single entry (out of 9485) has slightly (at the 10e-11 pixel level) different numbers for base_SdssShape_ [xx/yy/xy] two entries have small (absolute diff at the 10-12 nJ [?] level) for  base_CircularApertureFlux_ [35/50/70] _0_instFlux Major: the forced_src catalogs have differences in all quantities!@#! This last one was another rabbit-hole as I started looking for ordering/expId differences. Failing to find any, I started brute-force printing stuff out. Very long story short, the issue is that in Gen2 the input exposure to forcedPhotCoadd  is deepCoadd_calexp (see here ) whereas in Gen3 it is deepCoadd (see here )!!! I confirmed this by changing the input exposure in Gen2 to be just deepCoadd (seemed easier!) and, indeed, when the input exposures agree, so do the output catalogs (but with the "slightly-less-minor" differences propagated at the same very low levels). So...I believe this means that all Gen3 runs to date have been operating on the "wrong" input exposures (I'm assuming the Gen2 version has it right, although I'm sure there are some science cases where these would be preferred  ). My Gen3-repo-foo is still pretty weak, but I'm wondering if this has consequences for current repos that have the "wrong" dataType for that connection defined ( Jim Bosch )? I think ci_hsc_gen3 should be ok, though, since it's made fresh each time? I'm checking on that locally and a new Jenkins is running .
Hide
Lauren MacArthur added a comment -

And with that, this is ready for review. I'd like you both to have a look and sign off. Feel free to decide for yourselves, but I think the following distribution of PRs seems reasonable:

Yusra AlSayyad – PRs mostly having to do with input ordering:
https://github.com/lsst/pipe_drivers/pull/100
https://github.com/lsst/ci_hsc_gen2/pull/43

Show
Lauren MacArthur added a comment - And with that, this is ready for review. I'd like you both to have a look and sign off. Feel free to decide for yourselves, but I think the following distribution of PRs seems reasonable: Jim Bosch – PRs mostly having to do with the Exposure Id wrangling: https://github.com/lsst/coadd_utils/pull/30 https://github.com/lsst/skymap/pull/54 https://github.com/lsst/obs_base/pull/399 https://github.com/lsst/meas_base/pull/200 https://github.com/lsst/obs_lsst/pull/369 https://github.com/lsst/obs_subaru/pull/400 Yusra AlSayyad – PRs mostly having to do with input ordering: https://github.com/lsst/pipe_drivers/pull/100 https://github.com/lsst/pipe_tasks/pull/599 https://github.com/lsst/ci_hsc_gen2/pull/43
 Reviewers Jim Bosch, Yusra AlSayyad [ jbosch, yusra ] Status In Progress [ 3 ] In Review [ 10004 ]
Hide
Jim Bosch added a comment -

I've taken a look at all of the PRs assigned to me; there are some comments that are worth addressing, but they should all be simple to address so I don't think I need to take another look unless you'd like me to.

The change in meas_base that switches the ForcedPhotCoadd is fine in terms of compatibility with existing repos - both dataset types already exist, and they have the same dimensions and storage class, so changing the name of the input is effectively just a change to the config defaults.

Show
Jim Bosch added a comment - I've taken a look at all of the PRs assigned to me; there are some comments that are worth addressing, but they should all be simple to address so I don't think I need to take another look unless you'd like me to. The change in meas_base that switches the ForcedPhotCoadd is fine in terms of compatibility with existing repos - both dataset types already exist, and they have the same dimensions and storage class, so changing the name of the input is effectively just a change to the config defaults.
 Reviewers Jim Bosch, Yusra AlSayyad [ jbosch, yusra ] Yusra AlSayyad [ yusra ]
Hide
Lauren MacArthur added a comment -

Thanks, Jim.  Another Jenkins with all your suggested changes is happy. Mergeing still awaits Yusra AlSayyads review.

Show
Lauren MacArthur added a comment - Thanks, Jim.  Another Jenkins with all your suggested changes is happy . Mergeing still awaits Yusra AlSayyad s review.
Hide

Primary comment is actually on the meas_base PR: the deepCoadd-> deepCoadd_calexp could go on its own ticket so that it's easier to just backport that.

I got confused that the getGen3ExposureId was for gen2 use only and why sometimes it need to be in a try block and sometimes not.

Also, was that ci_hsc_gen2 situation with detect_isPrimary. (i didn't understand the inline comment) Is that an oddity with the propagateVisitFlags, or is it always that hard to get primary sources these days?

Show
Yusra AlSayyad added a comment - - edited Primary comment is actually on the meas_base PR: the deepCoadd-> deepCoadd_calexp could go on its own ticket so that it's easier to just backport that. I got confused that the getGen3ExposureId was for gen2 use only and why sometimes it need to be in a try block and sometimes not. Also, was that ci_hsc_gen2 situation with detect_isPrimary. (i didn't understand the inline comment) Is that an oddity with the propagateVisitFlags, or is it always that hard to get primary sources these days?
 Status In Review [ 10004 ] Reviewed [ 10101 ]
Hide
Lauren MacArthur added a comment -

I tried to answer all your questions on the PRs. Let me know if I was able to make some semblance of sense of it all!

A new Jenkins is running.

Show
Lauren MacArthur added a comment - Separate ticket for deepCoadd -> deepCoadd_calexp is DM-33195 . I tried to answer all your questions on the PRs. Let me know if I was able to make some semblance of sense of it all! A new Jenkins is running .
Hide
Lauren MacArthur added a comment -

A final Jenkins was green (in addition to a paranoid post-merge Jenkins run on main). Merged and done!

Show
Lauren MacArthur added a comment - A final Jenkins was  green (in addition to a paranoid post-merge Jenkins run on main ). Merged and done!
 Resolution Done [ 10000 ] Status Reviewed [ 10101 ] Done [ 10002 ]
 Story Points 26
 Epic Link DM-30540 [ 511198 ] DM-30547 [ 511213 ]

#### People

Assignee:
Lauren MacArthur
Reporter:
Clare Saunders
Reviewers:
Watchers:
Clare Saunders, Dino Bektesevic, Fred Moolekamp, Jim Bosch, Lauren MacArthur, Yusra AlSayyad