Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-231

Request for Input on Planned changes to Mappers

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      Attempt to make the Coadd and Diffim dataset consistent across mappers.

        Attachments

          Issue Links

            Activity

            pgee Perry Gee created issue -
            Hide
            pgee Perry Gee added a comment - - edited

            RFC – Standardizing Coadd and Diff datasets

            This is part of an initiative to standardize the naming and definition of datasets which are used by the obs_* mappers. We would like to:

            • centralize common datasets in daf_butlerUtils.
            • make the naming of datasets and templates more consistent.
            • remove datasets which are not longer in use.

            Towards that end, we've noticed that there is a lot of inconsistencies in different dataset for Coadds and Difference images.

            In what follows, we use the following tags:
            (l=lsst,S=sdss,d=decam,t=test,m=megacam,h=hsc or suprimecam)

            ***DeepCoadd***

            HscMapper has a large number of deep_coadd and deepCoadd datasets, many of which are shared by the other mappers. However, it does have a few which are peculiar to it:

            datasets.deepCoadd_diffsrc: h
            datasets.deepCoadd_tempExp_diffsrc: h
            exposures.deepCoadd_bg: h
            exposures.deepCoadd_bgRef: h
            exposures.deepCoadd_diff: h

            These are in some of the other mappers but not HSC. They should be examined to understand why they are not used by all the mappers, and removed is they are no longer in use.

            deepCoadd_calexp_detBackground: -still in use or delete?
            deepCoadd_depth Slm -still in use or delete?
            deepCoadd_forced_schema: -same as deep_forced_src_schema?
            deepCoadd_modelfits: lm -still in use or delete?
            deepCoadd_modelfits_schema: lm -still in use or delete?
            deepCoadd_multiModelfits: lm -still in use or delete?
            deepCoadd_multiModelfits_schema:lm -still in use or delete?
            deep_makeSkyMap_config Smdl -add to datasets.yaml?
            deep_processCoadd m -same as deepCoadd?
            .
            :***DIFF***

            Most of the datasets containing diff have something to do with difference imaging. Seems like there could be a single set for all mappers.

            These are in HSC but none of the others:

            datasets:diffsources_schema: h
            datasets:diffsources: h
            datasets:deepCoadd_diffsrc: h
            datasets:deepCoadd_tempExp_diffsrc: h
            datasets:diffpsf: h
            exposures:diff: h
            exposures:deepCoadd_diff: h
            exposures:deepCoadd_tempExp_diff: h

            These are the “Diff” datasets which are in the other mappers, but not in HSC. The “deepDiff” datasets may overlap with the HSC ones above. The “goodSeeing” ones are not in HSC at all. Should they be removed?.

            datasets.crDiffimSrc: lt
            datasets.crDiffimSrc_schema: lt
            datasets.deepDiff_config: dSl
            datasets.deepDiff_diaSrc: dSl
            datasets.deepDiff_diaSrc_schema: dSl
            datasets.deepDiff_differenceExp: dSl
            datasets.deepDiff_kernelSrc: dSl
            datasets.deepDiff_matchedExp: dSl
            datasets.deepDiff_metadata: dSl
            datasets.goodSeeingDiff_config: dSl
            datasets.goodSeeingDiff_diaSrc: dSl
            datasets.goodSeeingDiff_diaSrc_schema: dSl
            datasets.goodSeeingDiff_differenceExp: dSl
            datasets.goodSeeingDiff_kernelSrc: dSl
            datasets.goodSeeingDiff_matchedExp: dSl
            datasets.goodSeeingDiff_metadata: dSl
            exposures.crDiffim: lt

            ***chiSquared and goodSeeing***

            Not sure if these two sets are still in use. HSC only uses the deepCoadd ones, not chiSquared or goodSeeing. If they are still in use, should they be added to datasets.yaml? Is not should they be deleted?

            datasets.chiSquared_assembleCoadd_config: l
            datasets.chiSquared_assembleCoadd_metadata: l
            datasets.chiSquaredCoadd_apCorr: Sl
            datasets.chiSquared_coadd_config: Sl
            datasets.chiSquaredCoadd_icMatch: Sl
            datasets.chiSquaredCoadd_icSrc_schema: Sl
            datasets.chiSquaredCoadd_icSrc: Sl
            datasets.chiSquaredCoaddId_bits: Slt
            datasets.chiSquaredCoaddId: Slt
            datasets.chiSquared_coadd_metadata: Sl
            datasets.chiSquaredCoadd_skyMap: Sl
            datasets.chiSquaredCoadd_src_schema: Sl
            datasets.chiSquaredCoadd_src: Sl
            datasets.chiSquared_makeSkyMap_config: Sl
            datasets.chiSquared_makeSkyMap_metadata: Sl
            datasets.chiSquared_processCoadd_config: Sl
            datasets.chiSquared_processCoadd_metadata: Sl
            exposures.chiSquaredCoadd_calexp: Sl
            exposures.chiSquaredCoadd: Sl

            datasets.goodSeeing_assembleCoadd_config: Sl
            datasets.goodSeeing_assembleCoadd_metadata: Sl
            datasets.goodSeeingCoadd_apCorr: Sl
            datasets.goodSeeingCoadd_calexpBackground: Sl
            datasets.goodSeeing_coadd_config: Sl
            datasets.goodSeeingCoadd_depth: Sl
            datasets.goodSeeingCoadd_icMatch: Sl
            datasets.goodSeeingCoadd_icSrc_schema: Sl
            datasets.goodSeeingCoadd_icSrc: Sl
            datasets.goodSeeingCoaddId_bits: Slt
            datasets.goodSeeingCoaddId: Slt
            datasets.goodSeeing_coadd_metadata: Sl
            datasets.goodSeeingCoadd_skyMap: dSl
            datasets.goodSeeingCoadd_src_schema: l
            datasets.goodSeeingCoadd_src: Sl
            datasets.goodSeeingDiff_config: dSl
            datasets.goodSeeingDiff_diaSrc: dSl
            datasets.goodSeeingDiff_diaSrc_schema: dSl
            datasets.goodSeeingDiff_differenceExp: dSl
            datasets.goodSeeingDiff_kernelSrc: dSl
            datasets.goodSeeingDiff_matchedExp: dSl
            datasets.goodSeeingDiff_metadata: dSl
            datasets.goodSeeing_makeCoaddTempExp_config: Sl
            datasets.goodSeeing_makeCoaddTempExp_metadata: Sl
            datasets.goodSeeing_makeSkyMap_config: dSl
            datasets.goodSeeing_makeSkyMap_metadata: dSl
            datasets.goodSeeing_processCoadd_config: Sl
            datasets.goodSeeing_processCoadd_metadata: Sl
            datasets.goodSeeing_srcMatch: l
            exposures.goodSeeingCoadd_calexp: dSl
            exposures.goodSeeingCoadd: dSl
            exposures.goodSeeingCoadd_tempExp: dSl

            Show
            pgee Perry Gee added a comment - - edited RFC – Standardizing Coadd and Diff datasets This is part of an initiative to standardize the naming and definition of datasets which are used by the obs_* mappers. We would like to: centralize common datasets in daf_butlerUtils. make the naming of datasets and templates more consistent. remove datasets which are not longer in use. Towards that end, we've noticed that there is a lot of inconsistencies in different dataset for Coadds and Difference images. In what follows, we use the following tags: (l=lsst,S=sdss,d=decam,t=test,m=megacam,h=hsc or suprimecam) *** DeepCoadd *** HscMapper has a large number of deep_coadd and deepCoadd datasets, many of which are shared by the other mappers. However, it does have a few which are peculiar to it: datasets.deepCoadd_diffsrc: h datasets.deepCoadd_tempExp_diffsrc: h exposures.deepCoadd_bg: h exposures.deepCoadd_bgRef: h exposures.deepCoadd_diff: h These are in some of the other mappers but not HSC. They should be examined to understand why they are not used by all the mappers, and removed is they are no longer in use. deepCoadd_calexp_detBackground: -still in use or delete? deepCoadd_depth Slm -still in use or delete? deepCoadd_forced_schema: -same as deep_forced_src_schema? deepCoadd_modelfits: lm -still in use or delete? deepCoadd_modelfits_schema: lm -still in use or delete? deepCoadd_multiModelfits: lm -still in use or delete? deepCoadd_multiModelfits_schema:lm -still in use or delete? deep_makeSkyMap_config Smdl -add to datasets.yaml? deep_processCoadd m -same as deepCoadd? . :*** DIFF *** Most of the datasets containing diff have something to do with difference imaging. Seems like there could be a single set for all mappers. These are in HSC but none of the others: datasets:diffsources_schema: h datasets:diffsources: h datasets:deepCoadd_diffsrc: h datasets:deepCoadd_tempExp_diffsrc: h datasets:diffpsf: h exposures:diff: h exposures:deepCoadd_diff: h exposures:deepCoadd_tempExp_diff: h These are the “Diff” datasets which are in the other mappers, but not in HSC. The “deepDiff” datasets may overlap with the HSC ones above. The “goodSeeing” ones are not in HSC at all. Should they be removed?. datasets.crDiffimSrc: lt datasets.crDiffimSrc_schema: lt datasets.deepDiff_config: dSl datasets.deepDiff_diaSrc: dSl datasets.deepDiff_diaSrc_schema: dSl datasets.deepDiff_differenceExp: dSl datasets.deepDiff_kernelSrc: dSl datasets.deepDiff_matchedExp: dSl datasets.deepDiff_metadata: dSl datasets.goodSeeingDiff_config: dSl datasets.goodSeeingDiff_diaSrc: dSl datasets.goodSeeingDiff_diaSrc_schema: dSl datasets.goodSeeingDiff_differenceExp: dSl datasets.goodSeeingDiff_kernelSrc: dSl datasets.goodSeeingDiff_matchedExp: dSl datasets.goodSeeingDiff_metadata: dSl exposures.crDiffim: lt *** chiSquared and goodSeeing *** Not sure if these two sets are still in use. HSC only uses the deepCoadd ones, not chiSquared or goodSeeing. If they are still in use, should they be added to datasets.yaml? Is not should they be deleted? datasets.chiSquared_assembleCoadd_config: l datasets.chiSquared_assembleCoadd_metadata: l datasets.chiSquaredCoadd_apCorr: Sl datasets.chiSquared_coadd_config: Sl datasets.chiSquaredCoadd_icMatch: Sl datasets.chiSquaredCoadd_icSrc_schema: Sl datasets.chiSquaredCoadd_icSrc: Sl datasets.chiSquaredCoaddId_bits: Slt datasets.chiSquaredCoaddId: Slt datasets.chiSquared_coadd_metadata: Sl datasets.chiSquaredCoadd_skyMap: Sl datasets.chiSquaredCoadd_src_schema: Sl datasets.chiSquaredCoadd_src: Sl datasets.chiSquared_makeSkyMap_config: Sl datasets.chiSquared_makeSkyMap_metadata: Sl datasets.chiSquared_processCoadd_config: Sl datasets.chiSquared_processCoadd_metadata: Sl exposures.chiSquaredCoadd_calexp: Sl exposures.chiSquaredCoadd: Sl datasets.goodSeeing_assembleCoadd_config: Sl datasets.goodSeeing_assembleCoadd_metadata: Sl datasets.goodSeeingCoadd_apCorr: Sl datasets.goodSeeingCoadd_calexpBackground: Sl datasets.goodSeeing_coadd_config: Sl datasets.goodSeeingCoadd_depth: Sl datasets.goodSeeingCoadd_icMatch: Sl datasets.goodSeeingCoadd_icSrc_schema: Sl datasets.goodSeeingCoadd_icSrc: Sl datasets.goodSeeingCoaddId_bits: Slt datasets.goodSeeingCoaddId: Slt datasets.goodSeeing_coadd_metadata: Sl datasets.goodSeeingCoadd_skyMap: dSl datasets.goodSeeingCoadd_src_schema: l datasets.goodSeeingCoadd_src: Sl datasets.goodSeeingDiff_config: dSl datasets.goodSeeingDiff_diaSrc: dSl datasets.goodSeeingDiff_diaSrc_schema: dSl datasets.goodSeeingDiff_differenceExp: dSl datasets.goodSeeingDiff_kernelSrc: dSl datasets.goodSeeingDiff_matchedExp: dSl datasets.goodSeeingDiff_metadata: dSl datasets.goodSeeing_makeCoaddTempExp_config: Sl datasets.goodSeeing_makeCoaddTempExp_metadata: Sl datasets.goodSeeing_makeSkyMap_config: dSl datasets.goodSeeing_makeSkyMap_metadata: dSl datasets.goodSeeing_processCoadd_config: Sl datasets.goodSeeing_processCoadd_metadata: Sl datasets.goodSeeing_srcMatch: l exposures.goodSeeingCoadd_calexp: dSl exposures.goodSeeingCoadd: dSl exposures.goodSeeingCoadd_tempExp: dSl
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            In concept I very much support this.

            Thoughts:

            1. I would like to see a wide discussion on this RFC and carefully attention from Architecture and each of the subgroups to make sure there's a coherent plan for what the default data sets should be. The naming of datasets borders on being a schema and thus easily leads to implied logical structure of the processing (e.g., see point 4 below) and we should make sure that the dataset schema does reflect a reasonable suggested use of the pipeline and a reasonable set of defaults covering common use cases.
            2. This is a likely breaking change (it will break the use of modern code on older datasets). Should we coordinate this breaking change with other breaking changes planned for the cycle in which this work will be done?
            3. Ensure that it's easy, clear, and well-documented how to override datasets to (a) adopt to future needs; and (b) ensure backward compatibility for an existing data set.
            4. A specific recommendation: the difference imaging data sets types should include the simple image-image case as well as the image-template case. "goodSeeing" and "deepCoadd" sometimes seem to specific. They are also not really enforced, so perhaps they are misleadingly specific.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - In concept I very much support this. Thoughts: 1. I would like to see a wide discussion on this RFC and carefully attention from Architecture and each of the subgroups to make sure there's a coherent plan for what the default data sets should be. The naming of datasets borders on being a schema and thus easily leads to implied logical structure of the processing (e.g., see point 4 below) and we should make sure that the dataset schema does reflect a reasonable suggested use of the pipeline and a reasonable set of defaults covering common use cases. 2. This is a likely breaking change (it will break the use of modern code on older datasets). Should we coordinate this breaking change with other breaking changes planned for the cycle in which this work will be done? 3. Ensure that it's easy, clear, and well-documented how to override datasets to (a) adopt to future needs; and (b) ensure backward compatibility for an existing data set. 4. A specific recommendation: the difference imaging data sets types should include the simple image-image case as well as the image-template case. "goodSeeing" and "deepCoadd" sometimes seem to specific. They are also not really enforced, so perhaps they are misleadingly specific.
            tjenness Tim Jenness made changes -
            Field Original Value New Value
            Link This issue relates to RFC-233 [ RFC-233 ]
            Hide
            swinbank John Swinbank added a comment -

            This RFC doesn't have any "triggering" tickets, and it's not generated a lot of discussion. What do you regard as the success criteria? When can we mark it as "implemented"?

            Show
            swinbank John Swinbank added a comment - This RFC doesn't have any "triggering" tickets, and it's not generated a lot of discussion. What do you regard as the success criteria? When can we mark it as "implemented"?
            Hide
            pgee Perry Gee added a comment - - edited

            Jim has asked me to put together a proposal in response to this RFC. As it stands, it is just a call for consistency in the coadd and differencing datasets. I will try to put this through a series of responses to this RFC.

            -------------------------------------------------------------------------
            This first comment has to do with the HscMapper, which has more or less been used as a model. Its naming is not that consistent, and perhaps we should fix any problems here before tackling the others.

            First, I withdraw my earlier comment about the following datasets. They are actually consistent with the naming convention of CoaddName + '_' + taskname. where CoaddName = "deep". The dataset names for goodSeeing should all be made consistent with this. We should probably also have a config and metadata consistently defined, even if it isn't used.

            deep_assembleCoadd_config and metadata
            deep_safeClipAssembleCoadd_config and metadata
            deep_makeSkyMap_metadata (there is no config).
            deep_makeDiscreteSkyMap_metadata (no config)

            There are several new datasets defined in multiBand.py which are only defined in HscMapper.py. The include the ones with "det", "ref", and "mergeDet" roots. These appear to be intended to be used with CoaddNames other than "deep". But there is other code in multiBand.py which appears to assume that CoaddName = "deep". Should we and can we add these datasets for "goodSeeing" as well?

            It appears that some of the datasets defined in HscMapper are no longer in use, or at least I can't find them. I propose that these be deleted if someone doesn't vouch for them.

            Coadd_depth
            Coadd_bg
            Coadd_bgRef
            Coadd_forced
            Coadd_forced_src

            Finally, it is not clear what the relationship is between _meas and coaddName + '_Coadd_src', which is used in some of the mappers other than Hsc. Are these the same?

            Show
            pgee Perry Gee added a comment - - edited Jim has asked me to put together a proposal in response to this RFC. As it stands, it is just a call for consistency in the coadd and differencing datasets. I will try to put this through a series of responses to this RFC. ------------------------------------------------------------------------- This first comment has to do with the HscMapper, which has more or less been used as a model. Its naming is not that consistent, and perhaps we should fix any problems here before tackling the others. First, I withdraw my earlier comment about the following datasets. They are actually consistent with the naming convention of CoaddName + '_' + taskname. where CoaddName = "deep". The dataset names for goodSeeing should all be made consistent with this. We should probably also have a config and metadata consistently defined, even if it isn't used. deep_assembleCoadd_config and metadata deep_safeClipAssembleCoadd_config and metadata deep_makeSkyMap_metadata (there is no config). deep_makeDiscreteSkyMap_metadata (no config) There are several new datasets defined in multiBand.py which are only defined in HscMapper.py. The include the ones with "det", "ref", and "mergeDet" roots. These appear to be intended to be used with CoaddNames other than "deep". But there is other code in multiBand.py which appears to assume that CoaddName = "deep". Should we and can we add these datasets for "goodSeeing" as well? It appears that some of the datasets defined in HscMapper are no longer in use, or at least I can't find them. I propose that these be deleted if someone doesn't vouch for them. Coadd_depth Coadd_bg Coadd_bgRef Coadd_forced Coadd_forced_src Finally, it is not clear what the relationship is between _meas and coaddName + '_Coadd_src', which is used in some of the mappers other than Hsc. Are these the same?
            Hide
            jbosch Jim Bosch added a comment -

            deep_assembleCoadd_config and metadata
            deep_safeClipAssembleCoadd_config and metadata

            These are a bit tricky: when you run assembleCoadd.py, by default it actually runs SafeClipAssembleCoaddTask, and so in that default configuration it writes to deep_safeClipAssembleCoadd_config. If you run it with --legacyCoadd, it runs the original AssembleCoaddTask, and writes deep_assembleCoadd_config. We can't merge them because they need to have different Python types, but it certainly true that all mappers should define these the same way and I think the right thing to do is to define them both.

            deep_makeSkyMap_metadata (there is no config).
            deep_makeDiscreteSkyMap_metadata (no config)

            These probably should have config datasets, but MakeDiscreteSkyMapTask will also need to be modified to write metadata and config (by making its _getConfigName and _getMetadataName methods return something other than None.

            The dataset names for goodSeeing should all be made consistent with (deep).

            +1

            The include the ones with "det", "ref", and "mergeDet" roots. These appear to be intended to be used with CoaddNames other than "deep". But there is other code in multiBand.py which appears to assume that CoaddName = "deep". Should we and can we add these datasets for "goodSeeing" as well?

            I think we should either add them or drop all "goodSeeing" datasets; there's no point in halfway supporting "goodSeeing" coadd datasets. I'm ambivalent about which of these we choose.

            <name>Coadd_depth

            I also don't know if this is still in use, and I don't see it in the codebase either. If it is, I hope someone chimes in.

            <name>Coadd_bg
            <name>Coadd_bgRef

            We need a coadd background dataset, but I think that's now <name>Coadd_calexp_background, so there's no need for <name>Coadd_bg? I think the same applies for <name>Coadd_detBackground.

            <name>Coadd_bgRef appears to be related to background matching, but doesn't seem to be used by our code that does background matching.

            I'd like to get Paul Price's and Yusra AlSayyad's opinions on status of these two.

            <name>Coadd_forced
            <name>Coadd_forced_src
            <name>Coadd_forced_schema

            We are definitely using <name>Coadd_forced_src. I believe <name>Coadd_forced can be removed. <name>Coadd_forced_schema is indeed redundant with <name>Coadd_forced_src_schema; we should only have the latter.

            All of the "modelfits" and "multiModelfits" can be removed. These are used only by deprecated code in meas_modelfit that is being kept around only because it's used in hard-to-reimplement test, and that test doesn't make use of any mappers.

            I mostly don't know much about all of the difference imaging datasets, so I haven't tried to comment on them at all yet.

            Show
            jbosch Jim Bosch added a comment - deep_assembleCoadd_config and metadata deep_safeClipAssembleCoadd_config and metadata These are a bit tricky: when you run assembleCoadd.py , by default it actually runs SafeClipAssembleCoaddTask , and so in that default configuration it writes to deep_safeClipAssembleCoadd_config . If you run it with --legacyCoadd , it runs the original AssembleCoaddTask , and writes deep_assembleCoadd_config . We can't merge them because they need to have different Python types, but it certainly true that all mappers should define these the same way and I think the right thing to do is to define them both. deep_makeSkyMap_metadata (there is no config). deep_makeDiscreteSkyMap_metadata (no config) These probably should have config datasets, but MakeDiscreteSkyMapTask will also need to be modified to write metadata and config (by making its _getConfigName and _getMetadataName methods return something other than None . The dataset names for goodSeeing should all be made consistent with (deep). +1 The include the ones with "det", "ref", and "mergeDet" roots. These appear to be intended to be used with CoaddNames other than "deep". But there is other code in multiBand.py which appears to assume that CoaddName = "deep". Should we and can we add these datasets for "goodSeeing" as well? I think we should either add them or drop all "goodSeeing" datasets; there's no point in halfway supporting "goodSeeing" coadd datasets. I'm ambivalent about which of these we choose. <name>Coadd_depth I also don't know if this is still in use, and I don't see it in the codebase either. If it is, I hope someone chimes in. <name>Coadd_bg <name>Coadd_bgRef We need a coadd background dataset, but I think that's now <name>Coadd_calexp_background , so there's no need for <name>Coadd_bg ? I think the same applies for <name>Coadd_detBackground . <name>Coadd_bgRef appears to be related to background matching, but doesn't seem to be used by our code that does background matching. I'd like to get Paul Price 's and Yusra AlSayyad 's opinions on status of these two. <name>Coadd_forced <name>Coadd_forced_src <name>Coadd_forced_schema We are definitely using <name>Coadd_forced_src . I believe <name>Coadd_forced can be removed. <name>Coadd_forced_schema is indeed redundant with <name>Coadd_forced_src_schema ; we should only have the latter. All of the "modelfits" and "multiModelfits" can be removed. These are used only by deprecated code in meas_modelfit that is being kept around only because it's used in hard-to-reimplement test, and that test doesn't make use of any mappers. I mostly don't know much about all of the difference imaging datasets, so I haven't tried to comment on them at all yet.
            Hide
            pgee Perry Gee added a comment -

            My second comment is about the various "diff" datasets. I propose to retain the datasets defined in pipe_tasks imageDifference.py, and to define all of them for coaddName = "deep" and "goodSeeing"

            I do not see any references to the following datasets and I propose that we remove them:

            diffsources (and diffsource_*)
            crDiffimSrc (and crDiffimSrc_*)
            *_diffsrc
            diffpsf
            *Diff_subtractedExp
            *Diff_src

            or the exposures:
            diff
            *_diff
            crDiffim

            I am no expert on this code, so someone who actually runs difference imaging, please reply!

            Show
            pgee Perry Gee added a comment - My second comment is about the various "diff" datasets. I propose to retain the datasets defined in pipe_tasks imageDifference.py, and to define all of them for coaddName = "deep" and "goodSeeing" I do not see any references to the following datasets and I propose that we remove them: diffsources (and diffsource_*) crDiffimSrc (and crDiffimSrc_*) *_diffsrc diffpsf *Diff_subtractedExp *Diff_src or the exposures: diff *_diff crDiffim I am no expert on this code, so someone who actually runs difference imaging, please reply!
            Hide
            price Paul Price added a comment -

            I added <name>Coadd_bgRef and I think <name>Coadd_bg to HSC when I was playing around with background matching reference construction. That code hasn't been merged, and isn't likely to. I suggest dropping them.

            I'm not remembering anything about <name>Coadd_detBackground; maybe it was 'replaced' by <name>Coadd_calexp_background.

            Show
            price Paul Price added a comment - I added <name>Coadd_bgRef and I think <name>Coadd_bg to HSC when I was playing around with background matching reference construction. That code hasn't been merged, and isn't likely to. I suggest dropping them. I'm not remembering anything about <name>Coadd_detBackground ; maybe it was 'replaced' by <name>Coadd_calexp_background .
            Hide
            pgee Perry Gee added a comment -

            BTW, the crDiffim filenames are input directly into ip_diffim diaSourceAnalysis.py as arguments, but I do not see where they are produced, nor do I see any dataset or butler usage in this code.

            I do not have a complete test setup for difference images, so it could be that there a missing scripts which do use the butler. If you know about this, please speak up.

            Show
            pgee Perry Gee added a comment - BTW, the crDiffim filenames are input directly into ip_diffim diaSourceAnalysis.py as arguments, but I do not see where they are produced, nor do I see any dataset or butler usage in this code. I do not have a complete test setup for difference images, so it could be that there a missing scripts which do use the butler. If you know about this, please speak up.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Perry Gee What would be the default dataset name be for a subtraction of just two images?

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Perry Gee What would be the default dataset name be for a subtraction of just two images?
            Hide
            pgee Perry Gee added a comment -

            Which two images do you mean? If you mean an arbitrary pair of images, there isn't one.

            Show
            pgee Perry Gee added a comment - Which two images do you mean? If you mean an arbitrary pair of images, there isn't one.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Yes, I mean an arbitrary pair of images. I don't have a suggestion in mind, but it seemed (in my biased perspective) a common use case.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Yes, I mean an arbitrary pair of images. I don't have a suggestion in mind, but it seemed (in my biased perspective) a common use case.
            Hide
            jbosch Jim Bosch added a comment -

            I think butler-based CmdLineTasks are generically a poor fit to processing arbitrary images. That's essentially the long-standing ProcessFile vs. obs_file problem.

            Show
            jbosch Jim Bosch added a comment - I think butler-based CmdLineTasks are generically a poor fit to processing arbitrary images. That's essentially the long-standing ProcessFile vs. obs_file problem.
            Hide
            pgee Perry Gee added a comment -

            The butler and its datasets are typically for known inputs and outputs, and which require a fixed location in the directory structure. There are places in the stack where people have provided python code or shell scripts which take inputs and outputs as command line arguments (such as the diaSourceAnalysis.py I mentioned above). But there is no organized toolset of this kind.

            Show
            pgee Perry Gee added a comment - The butler and its datasets are typically for known inputs and outputs, and which require a fixed location in the directory structure. There are places in the stack where people have provided python code or shell scripts which take inputs and outputs as command line arguments (such as the diaSourceAnalysis.py I mentioned above). But there is no organized toolset of this kind.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Jim Bosch Thanks for bringing out that the general dataset naming question as being related to the ProcessFile vs. obs_file issue. I hadn't quite thought of it that way.

            I guess I'm thinking of the situation where one has a set of template images that one wants to use (so not, in general, completely arbitrary). There is a way to define these template images, but they may not be coadds. I fear the naming of deepCoadd or goodSeeingCoadd might imply that you can't just use a designated set of individual images.

            In the scenario I'm envisioning there may be a step of declare_templates instead of make_deep_coadd or make_good_seeing_coadd (I made up these names). This would be useful, e.g., in commissioning and in the first 12 where one wants to quickly run subtractions, or flexibly compare several different subtraction templates.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Jim Bosch Thanks for bringing out that the general dataset naming question as being related to the ProcessFile vs. obs_file issue. I hadn't quite thought of it that way. I guess I'm thinking of the situation where one has a set of template images that one wants to use (so not, in general, completely arbitrary). There is a way to define these template images, but they may not be coadds. I fear the naming of deepCoadd or goodSeeingCoadd might imply that you can't just use a designated set of individual images. In the scenario I'm envisioning there may be a step of declare_templates instead of make_deep_coadd or make_good_seeing_coadd (I made up these names). This would be useful, e.g., in commissioning and in the first 12 where one wants to quickly run subtractions, or flexibly compare several different subtraction templates.
            Hide
            jbosch Jim Bosch added a comment -

            If the templates you're imagining are always organized in tracts and patches like coadds are, I think we probably want to just label them as a "templateCoadd" dataset, even if they'll sometimes only be one exposure deep; that would let us use the same code for ad-hoc processing as we do in official processing runs, which would simplify things. If they can sometimes be other visit-level images that haven't been warped to a coadd coordinate system, I think we probably just want to have a separate driver CmdLineTask to do that that delegates to the same lower-level algorithms. It's not just the bookkeeping that will be different there; I think the geometry of the result (and possibly some of the processing steps) will have to be different as well.

            Show
            jbosch Jim Bosch added a comment - If the templates you're imagining are always organized in tracts and patches like coadds are, I think we probably want to just label them as a "templateCoadd" dataset, even if they'll sometimes only be one exposure deep; that would let us use the same code for ad-hoc processing as we do in official processing runs, which would simplify things. If they can sometimes be other visit-level images that haven't been warped to a coadd coordinate system, I think we probably just want to have a separate driver CmdLineTask to do that that delegates to the same lower-level algorithms. It's not just the bookkeeping that will be different there; I think the geometry of the result (and possibly some of the processing steps) will have to be different as well.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Jim Bosch Sounds reasonable. If it maps to tract, patch then it's just templateCoadd. But if it's two arbitrary images that are not already aligned in pixel space then that's a different processing flow.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Jim Bosch Sounds reasonable. If it maps to tract, patch then it's just templateCoadd . But if it's two arbitrary images that are not already aligned in pixel space then that's a different processing flow.
            Hide
            pgee Perry Gee added a comment -

            Ticket will support both first pass at the changes (to help with RFC evaluation) and the final RFC result.

            Show
            pgee Perry Gee added a comment - Ticket will support both first pass at the changes (to help with RFC evaluation) and the final RFC result.
            pgee Perry Gee made changes -
            Link This issue is triggering DM-7884 [ DM-7884 ]
            Hide
            pgee Perry Gee added a comment -

            Paul Price Can I assume that the datasets for multiBand.py will be used by all the cameras eventually?

            Show
            pgee Perry Gee added a comment - Paul Price Can I assume that the datasets for multiBand.py will be used by all the cameras eventually?
            Hide
            price Paul Price added a comment -

            Can I assume that the datasets for multiBand.py will be used by all the cameras eventually?

            Yes.

            Show
            price Paul Price added a comment - Can I assume that the datasets for multiBand.py will be used by all the cameras eventually? Yes.
            Hide
            pgee Perry Gee added a comment -

            This is a copy of a private email discussion between Nate Pease, Perry Gee, and Kian-Tat Lim

            This should have been conducted on this RFC, but it was a very long communication, which I try to summarize here.
            -------------------------------------------------
            From Perry:

            One problem with centralizing is that some of these datasets differ in the template because of the identifiers in the dataIds which Identify individual exposures. For example, the difference image in SdssMapper.paf looks like this:  

             deepDiff_differenceExp:

            {         template:      "deepDiff/%(run)d/%(camcol)d/ ...         python:        "lsst.afw.image.ExposureF"         persistable:   "ExposureF"         storage:       "FitsStorage"         tables:        raw         tables:        raw_skyTile     }

            while in DecamMapper.paf it looks like this:

                deepDiff_differenceExp:

            {         template:      "deepDiff/v%(visit)d/diffexp- ...         python:        "lsst.afw.image.ExposureF"         persistable:   "ExposureF"         storage:       "FitsStorage"         tables:        raw         tables:        raw_skyTile     }

            There is nothing in the logic of the pipeline which makes these two datasets different.  They are requested by code which knows nothing about the dataId entries (run, camcol, field) or (visit, ccdnum).  
            I think it would make more sense if this dataset were defined in obs_base, not obs_decam or obs_sdss, with only the things which are different in the Mapper.paf files.

            Jim thinks that there is a ticket which has been sitting around for a very long time talking about this issue, so he wants to know if you have plans to work on this issue before I propose anything. 

            The current policy code already allows a dataset to be defined in the parent mapper and modified in the child (in obs_*).  Because each item in the dataset can be individually defined at any level, we could put everything except for the "template" in obs_base/policy/datasets.yaml, and then put a template definition in each of the mappers:

            In obs_sdss this template "override" would look like:

            deepDiff_differenceExp:

            {         template:      "deepDiff/%(run)d/%(camcol)d/ ... }

            And in obs_decam it would look like:

            deepDiff_differenceExp:

            {     template:      "deepDiff/v%(visit)d/diffexp- ... }

             Not sure if this is a acceptable solution or not, but it does put the pieces that belong in obs_base there, and the pieces which belong in obs_sdss or obs_decam in the individual mapper definitions.

            From Nate:
            .
            Template override seems like an interesting idea. Did you say that this was possible with already-existing infrastructure? I think it is, but have not tested it using this pattern. I’d like to know what K-T and Jim think about using this policy definition pattern, I’ve added them to the thread.

            From K-T:
                    I was hoping that the python/persistable/storage information
            would move to the "genre"/"prototype" level of configuration, with the
            template still specified (in part) by the per-camera configuration (and
            in part by a per-dataset-type definition), when we moved to dynamic
            dataset type definitions.  This seems to be essentially what Perry is
            asking for.  If we can move partway there already, using existing code,
            that's great.

            From Nate:

            Thanks K-T. Perry, it sounds like it makes sense to go ahead with what you’re saying. How much code is there to change?  - If there is a way to do a minimal set of changes and set up a PR to get any input that might be a good way to proceed?

            From Perry:

            My belief is that no code changes are required.  The existing code seems to understand a partial dataset definition as its own node in the policy tree, so if the dataset is already defined at the parent (obs_base) level.  If the dataset is partially defined in the child, it just adds (or overrides) the ones defined in the child.

            So when I defined everything for "dataset" except for "template" in obs_base, then define "dataset.template" in the child (the mapper.paf file), I get the dataset as defined in obs_base with the child template added to it.

            I'm pretty sure that is the way the policy system was intended to work.

            Show
            pgee Perry Gee added a comment - This is a copy of a private email discussion between Nate Pease , Perry Gee , and Kian-Tat Lim This should have been conducted on this RFC, but it was a very long communication, which I try to summarize here. ------------------------------------------------- From Perry: One problem with centralizing is that some of these datasets differ in the template because of the identifiers in the dataIds which Identify individual exposures. For example, the difference image in SdssMapper.paf looks like this:    deepDiff_differenceExp: {         template:      "deepDiff/%(run)d/%(camcol)d/ ...         python:        "lsst.afw.image.ExposureF"         persistable:   "ExposureF"         storage:       "FitsStorage"         tables:        raw         tables:        raw_skyTile     } while in DecamMapper.paf it looks like this:     deepDiff_differenceExp: {         template:      "deepDiff/v%(visit)d/diffexp- ...         python:        "lsst.afw.image.ExposureF"         persistable:   "ExposureF"         storage:       "FitsStorage"         tables:        raw         tables:        raw_skyTile     } There is nothing in the logic of the pipeline which makes these two datasets different.  They are requested by code which knows nothing about the dataId entries (run, camcol, field) or (visit, ccdnum).   I think it would make more sense if this dataset were defined in obs_base, not obs_decam or obs_sdss, with only the things which are different in the Mapper.paf files. Jim thinks that there is a ticket which has been sitting around for a very long time talking about this issue, so he wants to know if you have plans to work on this issue before I propose anything.  The current policy code already allows a dataset to be defined in the parent mapper and modified in the child (in obs_*).  Because each item in the dataset can be individually defined at any level, we could put everything except for the "template" in obs_base/policy/datasets.yaml, and then put a template definition in each of the mappers: In obs_sdss this template "override" would look like: deepDiff_differenceExp: {         template:      "deepDiff/%(run)d/%(camcol)d/ ... } And in obs_decam it would look like: deepDiff_differenceExp: {     template:      "deepDiff/v%(visit)d/diffexp- ... }  Not sure if this is a acceptable solution or not, but it does put the pieces that belong in obs_base there, and the pieces which belong in obs_sdss or obs_decam in the individual mapper definitions. From Nate: . Template override seems like an interesting idea. Did you say that this was possible with already-existing infrastructure? I think it is, but have not tested it using this pattern. I’d like to know what K-T and Jim think about using this policy definition pattern, I’ve added them to the thread. From K-T:         I was hoping that the python/persistable/storage information would move to the "genre"/"prototype" level of configuration, with the template still specified (in part) by the per-camera configuration (and in part by a per-dataset-type definition), when we moved to dynamic dataset type definitions.  This seems to be essentially what Perry is asking for.  If we can move partway there already, using existing code, that's great. From Nate: Thanks K-T. Perry, it sounds like it makes sense to go ahead with what you’re saying. How much code is there to change?  - If there is a way to do a minimal set of changes and set up a PR to get any input that might be a good way to proceed? From Perry: My belief is that no code changes are required.  The existing code seems to understand a partial dataset definition as its own node in the policy tree, so if the dataset is already defined at the parent (obs_base) level.  If the dataset is partially defined in the child, it just adds (or overrides) the ones defined in the child. So when I defined everything for "dataset" except for "template" in obs_base, then define "dataset.template" in the child (the mapper.paf file), I get the dataset as defined in obs_base with the child template added to it. I'm pretty sure that is the way the policy system was intended to work.
            Hide
            pgee Perry Gee added a comment -

            I propose that we adopt the following as a result of this RFC. This will be tested on DM-7884 to be sure that this does not cause any problems with deepCoadd or imageDifference, at least for Sdss, Decam, and HSC. We do not have test sets for Megacam and lsstSim.

            1. Delete all goodSeeing and chiSquared entries.

            2. Delete the following list of what appear to be obsolete datasets:

            deepCoadd_depth
            deepCoadd_bg deepCoadd_bgRef
            deepCoadd_forced deepCoadd_forced_schema
            modelfits modelfits_schema deepCoadd_modelfits deepCoadd_modelfits_schema
            deepCoadd_multiModelfits deepCoadd_multiModelfits_schema
            diffsources_schema diffsources diffpsf
            diff
            crDiffimSrc crDiffimSrc_schema crDiffim

            3. Move the currently active datasets for deepCoadd and imageDifference to obs_base/policy datasets.yaml and exposures.yaml. Where these differ (usually in the template) from what is already in the Mapper.paf files, keep the differences in the Mapper.paf files. This will insure that none of the of the current definitions of storage type/locations are modified. (see the previous comment for discussion)

            Show
            pgee Perry Gee added a comment - I propose that we adopt the following as a result of this RFC. This will be tested on DM-7884 to be sure that this does not cause any problems with deepCoadd or imageDifference, at least for Sdss, Decam, and HSC. We do not have test sets for Megacam and lsstSim. 1. Delete all goodSeeing and chiSquared entries. 2. Delete the following list of what appear to be obsolete datasets: deepCoadd_depth deepCoadd_bg deepCoadd_bgRef deepCoadd_forced deepCoadd_forced_schema modelfits modelfits_schema deepCoadd_modelfits deepCoadd_modelfits_schema deepCoadd_multiModelfits deepCoadd_multiModelfits_schema diffsources_schema diffsources diffpsf diff crDiffimSrc crDiffimSrc_schema crDiffim 3. Move the currently active datasets for deepCoadd and imageDifference to obs_base/policy datasets.yaml and exposures.yaml. Where these differ (usually in the template) from what is already in the Mapper.paf files, keep the differences in the Mapper.paf files. This will insure that none of the of the current definitions of storage type/locations are modified. (see the previous comment for discussion)
            Hide
            jbosch Jim Bosch added a comment - - edited

            +1 on the proposal in Perry Gee's most recent comment.

            I'm glad to hear that the proposal for shrinking the mapper policy differences is something Kian-Tat Lim and Nate Pease are happy with. Moving the coadd and difference datasets to obs_base with template-only overrides in the per-camera policy files sounds like an excellent way to start implementing that. If all goes according to plan I imagine we could then move most of the datasets involve raw data IDs as well (though that's probably best considered out of scope for this RFC).

            Show
            jbosch Jim Bosch added a comment - - edited +1 on the proposal in Perry Gee 's most recent comment. I'm glad to hear that the proposal for shrinking the mapper policy differences is something Kian-Tat Lim and Nate Pease are happy with. Moving the coadd and difference datasets to obs_base with template-only overrides in the per-camera policy files sounds like an excellent way to start implementing that. If all goes according to plan I imagine we could then move most of the datasets involve raw data IDs as well (though that's probably best considered out of scope for this RFC).
            Hide
            pgee Perry Gee added a comment -

            The last comments in this RFC reflect the decisions which were made during the course of the discussion. The work is being done on DM-7884

            Show
            pgee Perry Gee added a comment - The last comments in this RFC reflect the decisions which were made during the course of the discussion. The work is being done on DM-7884
            pgee Perry Gee made changes -
            Resolution Done [ 10000 ]
            Status Proposed [ 10805 ] Adopted [ 10806 ]
            tjenness Tim Jenness made changes -
            Status Adopted [ 10806 ] Implemented [ 11105 ]

              People

              • Assignee:
                pgee Perry Gee
                Reporter:
                pgee Perry Gee
                Watchers:
                Jim Bosch, John Parejko, John Swinbank, Michael Wood-Vasey, Paul Price, Perry Gee
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Planned End:

                  Summary Panel