Data Management / DM-30284

Investigate differences in the coadd background between gen2 and gen3


Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Data Release Production

    Description

      While testing the streak masking code, I noticed that there are pretty significant differences between the background in gen2 and gen3 for the coadd in the attached image. The satellite trail visible in the gen2 version has also disappeared in the gen3 coadd.

    Attachments

    Issue Links

    Activity

            lauren Lauren MacArthur added a comment - - edited

            If the gen3 repo is using freshly made global SKY frames (as opposed to converted gen2 old versions), then this is entirely expected since the gen2 versions are way out of date. E.g.

            Currently used:

            vs. “freshly” made:


            Ok, I think I have just ruled out the above conjecture.  I had a "guess" at the repos associated with the image csaunder attached to the ticket and, having created the two butlers, asked each which SKY frames they were using.  E.g.

            # Get SKY frame URIs from Clare's coadds
            import lsst.daf.persistence as dafPersist  # gen2 butler
            from lsst.daf.butler import Butler  # gen3 butler

            visit, ccd, filter = 1228, 69, "HSC-I"
            rootDirGen2 = "/datasets/hsc/repo/rerun/private/csaunder/coadd_noMaskStreaks2"
            butlerGen2 = dafPersist.Butler(rootDirGen2)
            rootDirGen3 = "/repo/main"
            butlerGen3 = Butler(rootDirGen3, collections="u/csaunder/coadd_test", instrument="HSC")
            fileNameGen2 = butlerGen2.getUri("sky", visit=visit, ccd=ccd)
            fileNameGen3 = butlerGen3.getURI("sky", visit=visit, detector=ccd)
            fileNameGen3 = str(fileNameGen3).replace("file://", "")
            print("fileNameGen2 =", fileNameGen2)
            print("fileNameGen3 =", fileNameGen3)
             
            fileNameGen2 = /datasets/hsc/repo/CALIB/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits
            fileNameGen3 = /datasets/hsc/calib/20180117/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits
            

            These two files are indeed identical, so the SKY frames going into coaddition appear not to be the issue.
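A byte-for-byte check like the one above can be done with the standard library alone (a minimal sketch with helper names of my own choosing, not necessarily how the comparison was actually run):

```python
import filecmp
import hashlib

def files_identical(path1, path2):
    """Return True if the two files have identical contents."""
    # shallow=False compares actual bytes, not just os.stat metadata
    return filecmp.cmp(path1, path2, shallow=False)

def sha256(path):
    """Hex digest of a file's contents, handy for logging which version is in play."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```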

            lauren Lauren MacArthur added a comment -

            I went ahead and tried making gen2 and gen3 warps.  I used the RC2 w_2021_18 reruns as "input" to the task and, indeed, the problem is coming in during this phase.  The following plots the gen2 and gen3 warps for tract=9813, patch=4 (or 4,0), visit=19660, and their diff:

            and this is the SKY frame associated with the lower CCD section:

            which shows the same pattern as the difference image (rightmost panel above). So it seems there is a difference in the application of the SKY frames. I'll look further into this.

            lauren Lauren MacArthur added a comment -

            Ok, so the first difference I'm seeing is actually in the skyCorr images, which czw already alluded to on DM-22534.  Here is an example difference image of the skyCorr.getImage() arrays of gen2 vs. gen3 for the w_2021_18 RC2 runs:

            These files are of type lsst.afw.math._backgroundList.BackgroundList and have 7 individual background entries (these all get summed when calling getImage() on the skyCorr dataset).  It looks like the first 4 are identical, but those in 5-7 differ.  The first four come from the calexpBackground dataset, and those look identical between gen2 & gen3, so this is coming in during skyCorrectionTask. The logs are a bit hard to parse, but it does look like the reported "Sky frame scale" values are indeed different for gen2 vs. gen3. I think I'll have to run skyCorrection manually to glean any more information (a really rough guess: could a difference in the order of the detectors somehow make a small difference?  Other ideas welcome!)
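A generic way to localize which entries differ (a plain-numpy sketch; images2/images3 are hypothetical names standing in for the per-entry getImage() arrays pulled out of the two BackgroundLists):

```python
import numpy as np

def first_differing_entries(images2, images3):
    """Given two equal-length lists of background-entry pixel arrays (gen2 vs.
    gen3), return the indices of the entries that are not bitwise identical."""
    assert len(images2) == len(images3)
    return [i for i, (a, b) in enumerate(zip(images2, images3))
            if not np.array_equal(a, b)]
```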

            I'm also not sure whether this level of difference can compound all the way to the difference seen in the coadd backgrounds in the above attachment (it doesn't seem to be a big enough effect, nor to have the structure already seen in the warps), but it would be nice to rule it out.

            lauren Lauren MacArthur added a comment -

            Thanks for looking into this lauren. Sorry for leaving the source of the coadds a mystery. They are just from RC2 weeklies:

            from lsst.daf.persistence import Butler as Gen2Butler
            from lsst.daf.butler import Butler

            butler_gen2 = Gen2Butler('/datasets/hsc/repo/rerun/RC/w_2021_14/DM-29519')
            butler_gen3 = Butler('/repo/main', collections="HSC/runs/RC2/w_2021_14/DM-29528")
            gen2_dataId = {'tract': 9813, 'patch': "4,8", 'filter': 'HSC-I'}
            gen3_dataId = {'tract': 9813, 'patch': 76, 'band': 'i'}
            gen2Coadd = butler_gen2.get('deepCoadd', dataId=gen2_dataId)
            gen3Coadd = butler_gen3.get('deepCoadd', dataId=gen3_dataId)
            

            However, using these two butlers, I did find the same results as lauren for the SKY frames used for visit=1228, ccd=69:

            fileNameGen2 = /datasets/hsc/repo/CALIB/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits
            fileNameGen3 = /datasets/hsc/calib/20180117/SKY/2014-07-14/HSC-I/SKY-2014-07-14-HSC-I-069.fits
            

            csaunder Clare Saunders added a comment -
            lauren Lauren MacArthur added a comment - - edited

            I'm back to this one, but now looking at DC2 processing which represents a more "vanilla" coaddition, i.e. no external (fgcm/jointcal) global calibrations in play. The coadds are not bitwise identical between gen2 & gen3, so "something" is happening differently. First checks have all proven bitwise identical, including:

            • comparison of directWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs
            • comparison of psfMatchedWarps: image, variance, mask planes, photoCalibs, PSFs, WCSs

            Moving on to looking at the "deepCoadd_calexp_background" dataset...yep, they are slightly different (at the ~4% level):
            gen2: -0.00567247 vs gen3: -0.00545275 (diff = 0.00021972)

            Looking more closely at the coadd image difference, it is indeed roughly consistent with an offset equal to the above background level difference:

            imageDiff minmax: -0.00024414062 -0.00012207031 median: -0.00021971297
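For reference, the rolled-up stats above can be recomputed from the pixel difference array with numpy (a sketch; image_diff here just stands for the gen2 minus gen3 pixel array):

```python
import numpy as np

def diff_stats(image_diff):
    """Min/max/median of a difference image, ignoring NaNs (e.g. NO_DATA pixels)."""
    return (float(np.nanmin(image_diff)),
            float(np.nanmax(image_diff)),
            float(np.nanmedian(image_diff)))
```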

            I've been looking at the configs, but don't see any smoking guns.

            lauren Lauren MacArthur added a comment - - edited

            Ok, I think I have narrowed this down to a random number seed difference between gen2 & gen3.  It is happening in the dynamic detection stage on the coadds, which uses an expId as a seed, and this expId gets computed differently in the two middlewares.  Ideally, the gen2 computation could be changed to match the gen3 one.  The relevant code that would need changing is:

            obs_lsst:
            https://github.com/lsst/obs_lsst/blob/master/python/lsst/obs/lsst/lsstCamMapper.py#L256-L284

            obs_subaru:
            https://github.com/lsst/obs_subaru/blob/master/python/lsst/obs/hsc/hscMapper.py#L288-L307

            jbosch, if you (or anyone else watching!) happen to know how to adapt these gen2 mapper functions to mimic the gen3 equivalent, some guidance would be really appreciated!

            The setting of the expIds happens here:
            https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/multiBand.py#L272-L296
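To illustrate why the expId matters even when all pixel inputs are identical: the detection stage seeds its random state with it, so any difference in how the ID is packed changes the random draws (an illustrative numpy sketch, not the task's actual RNG usage):

```python
import numpy as np

def noise_realization(exp_id, shape=(4,)):
    """Draw a reproducible noise field from an integer seed, as a stand-in for
    the seeded random state used in dynamic detection."""
    rng = np.random.RandomState(exp_id)
    return rng.standard_normal(shape)

# Same seed -> bitwise-identical draws; a differently packed expId -> different draws.
```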

            jbosch Jim Bosch added a comment -

            This is going to be a bit messy; to replicate the Gen3 behavior I think we really need a SkyMap object (at least), and we can't easily get one of those to the Gen2 Mapper methods.

            My plan is to put a new utility method on the SkyMap object that will use Gen3 code under the hood, but accept Gen2 coadd data IDs.  We'll then modify the Gen2 tasks that use the Gen2 "coaddExposureId" etc. to read in the SkyMap (if they don't already) and call that method.

            The big catch is that we'll need to somehow standardize the heterogeneous Gen2 coadd "filter" data ID values into the "band" value that Gen3 uses.  lauren I'm hoping you might already have some ideas on how to do that from e.g. pipe_analysis needs, and can take care of that problem while I write the SkyMap method.  The solution doesn't have to be pretty, because we can stuff it into the Gen2 tasks and hold our noses until they're gone.
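For concreteness, the standardization could be as simple as a lookup table (the dict and helper below are hypothetical sketches, not an existing obs-package API, and the fallback behavior is an assumption):

```python
# Hypothetical lookup from heterogeneous Gen2 "filter" data ID values
# to the Gen3 "band" values used in the ID packing.
GEN2_FILTER_TO_BAND = {
    "HSC-G": "g",
    "HSC-R": "r",
    "HSC-I": "i",
    "HSC-Z": "z",
    "HSC-Y": "y",
}

def to_band(gen2_filter):
    """Map a Gen2 filter data ID value to the Gen3 band, falling back to the
    lowercased input when it already looks like a band name."""
    return GEN2_FILTER_TO_BAND.get(gen2_filter, gen2_filter.lower())
```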

            jbosch Jim Bosch added a comment -

            Ok, I've added a completely untested pack_data_id method to the tickets/DM-30284 branch of skymap; I'm hoping you can take it from here, but I'm happy to help (especially if it doesn't work).

            https://github.com/lsst/skymap/pull/54/files


            Awesome, thanks so much, Jim! I’ll take it for a spin and see if I can get it working.

            lauren Lauren MacArthur added a comment -

            Ok, so getting all the exposure and source (factory) IDs in sync between the two middlewares went a bit further down the rabbit hole than I was expecting (note the seven branches, so far, attached to this ticket!), but I've now got it all working for LSSTCam-imSim (HSC is in progress, but very close). I now get gen2/gen3 parity in everything in coaddition up until the mergeDetections stage EXCEPT for small differences in the PSFs attached to the deepCoadd (the pre-_calexp one). Here is an example (note the slightly obscured 1e-17 and 1e-14 scales on the colorbars of the difference panels):

            These are tiny differences, and I have no idea where they are coming from (ideas welcome!), but this is the only gen2/gen3 difference I have noted at this stage, and I do start to see differences in the downstream tasks, starting with the deblender. Examples of some of the rolled-up difference stats I print to the logs when running my catalog-comparison script (it loops through all columns, so any not listed here are identical between the two middlewares, modulo deblend_error and deblend_modelType, as these are string columns and cannot be read in with afwTable...) are:

            compareCoadd INFO: Comparing gen2 vs. gen3 deepCoadd_deblendedFlux catalogs
            compareCoadd WARN: ...patch 31: deblend_runtime Absolute diff : mean: 18.428  min: 0.000  max 10529.36
            compareCoadd WARN: ...patch 31: deblend_iterations Absolute diff : mean: 0.200  min: 0.000  max 156.000
            compareCoadd WARN: ...patch 31: deblend_peak_instFlux Absolute diff : mean: 0.004  min: 0.000  max 10.238
            compareCoadd WARN: ...patch 31: deblend_scarletFlux Absolute diff : mean: 0.305  min: 0.000  max 667.090
            compareCoadd WARN: ...patch 31: deblend_logL Absolute diff : mean: 11.287  min: 0.000  max 99287.750
            

            fred3m, I alluded to this in a drp status report. Does it make any sense to you that such tiny PSF differences could lead to these levels of differences in some of the scarlet outputs (I don't think the runtime one matters, but the others probably do...)?

            It took me "a minute" to notice these small PSF differences because, when one tries to plot the PSF images with the oh-so-handy afwDisplay tools, one encounters the following:

            TypeError: getZScale(): incompatible function arguments. The following argument types are supported:
                1. (image: lsst.afw.image.image.ImageU, nsamples: int = 1000, contrast: float = 0.25) -> Tuple[float, float]
                2. (image: lsst.afw.image.image.ImageF, nsamples: int = 1000, contrast: float = 0.25) -> Tuple[float, float]
             
            Invoked with: lsst.afw.image.image.ImageD=[[0. 0. 0. ... 0. 0. 0.]
             [0. 0. 0. ... 0. 0. 0.]
             [0. 0. 0. ... 0. 0. 0.]
             ...
             [0. 0. 0. ... 0. 0. 0.]
             [0. 0. 0. ... 0. 0. 0.]
             [0. 0. 0. ... 0. 0. 0.]], bbox=(minimum=(13971, 17971), maximum=(14029, 18029)), 1000, 0.25
            

            i.e. the PSFs are stored as ImageD, which is not an accepted type in getZScale(). So, rather than messing around with other scalings, I first just cast the PSF images to ImageF (using getPsf().computeImage(position).convertFloat()), but then they do compare identical!  However, when I ran my bit-by-bit comparisons of "all the things", I noticed the wee PSF differences, so the above plots used some ugly linear+min+max scalings.
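The float-cast comparison coming up clean is just precision: differences at the 1e-17 to 1e-14 level sit far below single-precision resolution for pixel values of order 0.1, so they vanish in the ImageF conversion. A numpy sketch of the same effect:

```python
import numpy as np

a = np.array([0.1], dtype=np.float64)
b = a + 1e-14  # a "gen2 vs. gen3"-sized PSF pixel difference

# Distinguishable in double precision (ImageD)...
assert not np.array_equal(a, b)
# ...but identical after casting to single precision, as with convertFloat().
assert np.array_equal(a.astype(np.float32), b.astype(np.float32))
```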

            lauren Lauren MacArthur added a comment -

            Does it make any sense to you that such tiny PSF differences could lead to these levels of differences in some of the scarlet outputs (I don't think the runtime one matters, but the others probably do...)?

            Yes and no. The most informative column to look at is probably the `deblend_iterations` difference, because I think it explains the others. There is a convergence check after each iteration, and your results show that on average most of the models converge at the same iteration with the different PSFs, with many of them not changing at all. I'd be curious to see a histogram of the difference in iterations, without the absolute value, because as long as the vast majority of blends converge at the same iteration, or maybe even +/- 1, then I'd say that everything is working as expected. Without that plot I'm just speculating that there are only a few catastrophic cases: blends that barely converge at lower iterations and then resize (resizing occurs every 10 iterations), which could potentially lead to much larger numbers of iterations. With more iterations there will also be slight changes in the models, with slightly better models for longer runs, but (if the convergence check is set appropriately) there should rarely be a huge difference in the measured values, though it does happen. Since you print the difference and not the percentage (difference/value), it's hard for me to know whether a substantial number of (or any) blends have measurements that are significantly different.
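The suggested histogram of signed iteration differences is straightforward to produce (a numpy sketch; iters_gen2/iters_gen3 stand for the per-object deblend_iterations columns, and the clipping width is an arbitrary choice):

```python
import numpy as np

def iteration_diff_histogram(iters_gen2, iters_gen3, width=5):
    """Histogram of *signed* per-object differences in deblend_iterations,
    with integer-centered bins so the +/-1 convergence jitter is easy to see.
    Differences beyond +/-width are clipped into the outermost bins."""
    diff = np.asarray(iters_gen3, dtype=int) - np.asarray(iters_gen2, dtype=int)
    bins = np.arange(-width - 0.5, width + 1.5)  # edges at half-integers
    counts, edges = np.histogram(np.clip(diff, -width, width), bins=bins)
    return counts, edges
```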

            Anytime we have a new scarlet release we see similar types of issues, where the number of iterations and exact values of measurements may differ on a single blend even though overall (if we accept the changes) the results are improved.

            These cases are rare but are also an unfortunate consequence of implementing a convergence test. If anyone has suggestions for how to make this more robust I'm all ears.

            fred3m Fred Moolekamp added a comment -

            A few updates for this ticket.

            First, I believe the scarlet deblender issues stemmed from a mild dependence on the input order of the bands (hence it not showing up in a test I did with the old deblender, which operates on a single band at a time). This is solved/made reproducible by sorting the bands (alphabetically, there being no obviously "better" ordering) up front in the runQuantum method.

            Second, I believe the PSF differences are also due to a mild dependence on the input order in assembleCoadd. We were already sorting by visit, but an additional sorting of the detectors within a visit was required.
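Both fixes boil down to imposing a deterministic order on the inputs before processing. A sketch (the dict-style refs and the (visit, detector) sort key are illustrative assumptions, not the actual task code):

```python
def deterministic_order(inputs):
    """Sort coaddition inputs by (visit, detector) so results do not depend on
    the arbitrary order the middleware hands them over in."""
    return sorted(inputs, key=lambda ref: (ref["visit"], ref["detector"]))

def sorted_bands(bands):
    """Alphabetical band ordering (no obviously 'better' ordering exists)."""
    return sorted(bands)
```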

            Third, I have just confirmed that, once again, the skyCorr differences were due to a mild input-order dependence. I happen to have the plots handy for this one:

            Gen2 vs. Gen3 for w_2021_42 runs:

            Gen2 vs. new sorted Gen3 skyCorr run (using collection HSC/runs/RC2/w_2021_42/DM-32248 as input):

            I put in a print statement while testing, and for the Gen3 run an example initial ordering was:

            lauren@lsst-devl01:~/tickets/DM-30284/submit/u/lauren/DM-30284_skyCorr/20211201T022849Z/jobs/skyCorr $ grep detectorOrder 1228/240_skyCorr_1228_.2985264.out 
            Gen3 detectorOrder =  [56, 1, 97, 82, 5, 8, 28, 99, 36, 96, 66, 48, 54, 46, 50, 93, 6, 61, 35, 100, 68, 0, 57, 74, 87, 7, 71, 12, 24, 42, 49, 25, 72, 29, 102, 62, 53, 78, 58, 44, 2, 59, 15, 34, 40, 33, 38, 79, 95, 89, 81, 86, 73, 22, 70, 103, 32, 39, 16, 20, 94, 55, 45, 47, 64, 14, 91, 67, 51, 98, 75, 76, 26, 52, 77, 85, 60, 65, 13, 80, 101, 11, 17, 92, 41, 21, 23, 37, 69, 31, 3, 19, 63, 30, 88, 83, 4, 10, 90, 84, 18, 43, 27]
            Gen3 detectorOrder sorted =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103]
            

            Things get somewhat muddled going forward on RC2 runs because of the influence of fgcm and jointcal differences, but I am now attempting to run makeWarp with these new skyCorr outputs to see how much difference it makes on the warps.

            lauren Lauren MacArthur added a comment -
            lauren Lauren MacArthur added a comment - - edited

            Had a look at the warps with the above fix and the difference is subtle, as expected.  So I dug further into the application of the skyCorr frames in the makeWarp task, and it seems the sky correction is not actually getting applied to the calexp image array.  I believe the problem originated in this commit back in March (first tag is 22.0.0 / w_2021_10). While I’m not totally clear on the nuances of when an object copy vs. a reference is in effect, in the above commit the calibrations are applied directly to calexp.maskedImage, whereas the skyCorr gets applied to mi, which was set via mi = calexp.maskedImage. If mi were just a reference, I would’ve expected the two to be linked and all would be fine, but that appears not to be the case, and calexp.maskedImage does not get the skyCorr applied.
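            The reference-vs-rebinding behavior at issue here can be mimicked in plain numpy (an analogue for illustration, not the actual afw classes): an in-place operation through a second name is visible to the original array, but rebinding that name disconnects it.

            ```python
            import numpy as np

            # numpy analogue of the calexp.maskedImage aliasing issue (a sketch,
            # not the actual afw classes).
            calexp_image = np.full(4, 100.0)   # stands in for calexp.maskedImage
            mi = calexp_image                  # mi = calexp.maskedImage (a reference)

            mi -= 10.0                         # in-place: calexp_image also becomes 90.0

            mi = mi - 10.0                     # creates a new array and rebinds mi;
                                               # calexp_image stays at 90.0, mi is 80.0
            ```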

            The following plots demonstrate the difference if I replace mi with calexp.maskedImage where the skyCorr is supposed to be getting applied in the Gen3 code path:

            Gen3 w42 vs. Gen3 this ticket branch:

            Gen3 w42 vs. Gen2 w42:

            Gen3 this ticket branch vs. Gen2 w42:

            The remaining difference is likely attributable to the fact that we don't quite have gen2/gen3 parity for fgcm and jointcal, the differences of which look like this for this visit (note the scales are mmag and mas):

            jbosch Jim Bosch added a comment - - edited

            I think the problem is in this line:

            https://github.com/lsst/pipe_tasks/commit/5f2c73bc6d2717543350465c40f24c99390cb78d#diff-9f3d28e8a17600f68f8da69da616a3bf78b45600e94e7a988f0292b7fe2b248fR869

            The property setter invokes Exposure.setMaskedImage, which invokes MaskedImage::operator=, which reassigns the data pointer inside the target MaskedImage to point to the same data pointer as the source MaskedImage. That disconnects the Exposure from any existing views of its MaskedImage, which still have the old data pointer. Note that this is exactly what would happen in pure Python code with Numpy array attributes of a simple class:

            import numpy as np

            class Holder:
                def __init__(self):
                    self.array = np.zeros(4, dtype=int)

            h = Holder()
            a = h.array
            h.array = np.ones(4, dtype=int)
            assert np.all(a != h.array)

            But it's also the kind of code where everyone who writes it expects it to do what they want, even if different people want and expect different things.

            So, I don't think we want to change MaskedImage::operator=, and I don't think I want to deal with the potential pain of changing the behavior of Exposure.setMaskedImage, but we should add a bool deep=False argument to make it possible to change that behavior. We can then use deep=True in pipe_tasks to fix this problem; there is probably a way to do what we want now, but I haven't been able to quickly spot it by scanning the pybind11 source. I think we probably want to remove the setter for this property here:

            https://github.com/lsst/afw/blob/main/python/lsst/afw/image/exposure/_exposure.cc#L96

            To force people to at least look at setMaskedImage and see that it has a bool deep option (even if that has a default), in the hopes of heading off similar problems in the future.
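            A pure-Python sketch of the proposed deep option (a hypothetical API for illustration, not the actual afw signature): a deep set copies element-wise into the existing buffer, so views obtained earlier stay connected, while a shallow set rebinds and disconnects them.

            ```python
            import numpy as np

            # Sketch of the proposed deep-copy setter semantics (hypothetical API).
            class Holder:
                def __init__(self):
                    self._array = np.zeros(4)

                @property
                def array(self):
                    return self._array

                def set_array(self, new, deep=False):
                    if deep:
                        self._array[:] = new           # copy in place: views stay linked
                    else:
                        self._array = np.asarray(new)  # rebind: views are disconnected

            h = Holder()
            view = h.array
            h.set_array(np.ones(4), deep=True)   # view now sees the ones
            h.set_array(np.full(4, 2.0))         # shallow: view still sees the ones
            ```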


            With all the fixes in, I have run the ci_hsc's locally (having turned off the CiLimits in the Gen3 version to get consistent processing with Gen2)...drumroll....difference between the coadd calexps:

            I have run the catalog comparisons now and they all come out identical with a few exceptions.
            Minor:

            • the coadd source IDs, but we don't care as these don't affect any algorithms (it's just the exposureId's that get used to set rng seeds, and those are synched). Also, I think these come out equal when using the hsc_rings_v1 skymap as this is what the Gen2-hscMapper is tuned to (I will double-check that).
            • deblend_runtime & modelfit_CModel_[initial/exp/dev]_time, which also seem to be of no consequence.

            Slightly Less Minor (in deepCoadd_meas):

            • a single entry (out of 9485) has slightly different numbers (at the 10e-11 pixel level) for base_SdssShape_[xx/yy/xy]
            • two entries have small differences (absolute diff at the 10^-12 nJ[?] level) in base_CircularApertureFlux_[35/50/70]_0_instFlux

            Major:

            • the forced_src catalogs have differences in all quantities!@#!

            This last one was another rabbit-hole as I started looking for ordering/expId differences. Failing to find any, I started brute-force printing stuff out. Very long story short, the issue is that in Gen2 the input exposure to forcedPhotCoadd is deepCoadd_calexp (see here) whereas in Gen3 it is deepCoadd (see here)!!!

            I confirmed this by changing the input exposure in Gen2 to be just deepCoadd (seemed easier!) and, indeed, when the input exposures agree, so do the output catalogs (but with the "slightly-less-minor" differences propagated at the same very low levels).

            So...I believe this means that all Gen3 runs to date have been operating on the "wrong" input exposures (I'm assuming the Gen2 version has it right, although I'm sure there are some science cases where these would be preferred  ). My Gen3-repo-foo is still pretty weak, but I'm wondering if this has consequences for current repos that have the "wrong" dataType for that connection defined (jbosch)? I think ci_hsc_gen3 should be ok, though, since it's made fresh each time? I'm checking on that locally and a new Jenkins is running.
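            For Gen3 configurations built before the default is corrected, the Gen2 behavior can presumably be restored with a connections override; a sketch (the connection name "exposure" is an assumption about the task's connections class, not verified against meas_base):

            ```python
            # Hypothetical pipeline config override restoring the Gen2 input
            # dataset; the connection name "exposure" is an assumption.
            config.connections.exposure = "deepCoadd_calexp"
            ```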


            And with that, this is ready for review. I'd like you both to have a look and sign off. Feel free to decide for yourselves, but I think the following distribution of PRs seems reasonable:

            jbosch – PRs mostly having to do with the Exposure Id wrangling:
            https://github.com/lsst/coadd_utils/pull/30
            https://github.com/lsst/skymap/pull/54
            https://github.com/lsst/obs_base/pull/399
            https://github.com/lsst/meas_base/pull/200
            https://github.com/lsst/obs_lsst/pull/369
            https://github.com/lsst/obs_subaru/pull/400

            yusra – PRs mostly having to do with input ordering:
            https://github.com/lsst/pipe_drivers/pull/100
            https://github.com/lsst/pipe_tasks/pull/599
            https://github.com/lsst/ci_hsc_gen2/pull/43

            jbosch Jim Bosch added a comment -

            I've taken a look at all of the PRs assigned to me; there are some comments that are worth addressing, but they should all be simple to address so I don't think I need to take another look unless you'd like me to.

            The change in meas_base that switches the ForcedPhotCoadd input is fine in terms of compatibility with existing repos - both dataset types already exist, and they have the same dimensions and storage class, so changing the name of the input is effectively just a change to the config defaults.


            Thanks, Jim.  Another Jenkins with all your suggested changes is happy. Merging still awaits yusra's review.

            yusra Yusra AlSayyad added a comment - - edited

            Primary comment is actually on the meas_base PR: the deepCoadd -> deepCoadd_calexp change could go on its own ticket so that it's easier to backport just that.

            I got confused about whether getGen3ExposureId was for gen2 use only, and why it sometimes needs to be in a try block and sometimes not.

            Also, what was that ci_hsc_gen2 situation with detect_isPrimary? (I didn't understand the inline comment.) Is that an oddity with the propagateVisitFlags, or is it always that hard to get primary sources these days?


            Separate ticket for deepCoadd -> deepCoadd_calexp is DM-33195.

            I tried to answer all your questions on the PRs. Let me know if I was able to make some semblance of sense of it all!

            A new Jenkins is running.


            A final Jenkins was green (in addition to a paranoid post-merge Jenkins run on main). Merged and done!


            People

              lauren Lauren MacArthur
              csaunder Clare Saunders
              Yusra AlSayyad
              Clare Saunders, Dino Bektesevic, Fred Moolekamp, Jim Bosch, Lauren MacArthur, Yusra AlSayyad

