# Gen 2->3 conversion of DECam repositories can give duplicate defects

XMLWordPrintable

#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
2
• Team:
Architecture
• Urgent?:
No

#### Description

If I convert a DECam Gen 2 repository containing defects, the conversion succeeds, but running ProcessCcd on the result gives an error:

 lsst.pipe.base.connections.ScalarError: Found multiple datasets {instrument: DECam, calibration_label: defects/2015-01-05T01:15:00/60, detector: 60}, {instrument: DECam, calibration_label: gen2/defects_2015-01-05T01:15:00_060, detector: 60} for scalar connection defects (defects). 

The immediate cause is that one copy of the defects is from the Gen 2 repository, the other copy was ingested during the conversion.

Discussion on #dm-middleware suggests that this is due to conflicting responsibilities:

The above behavior implies that DECam repositories with defects cannot be converted using the default configs, which goes against our general aim to provide working defaults for all Tasks.

Possible solutions:

• Do away with ConvertRepoConfig.curatedCalibrations, as proposed by Tim Jenness. However, this does raise the question of how to handle observatory-specific curated calibrations.
• Have DECam explicitly register defects in curatedCalibrations, as HSC does. May be hard to scale to other instruments.
• Duplicate the dataset types hardcoded into Instrument in the default ConvertRepoConfig.curatedCalibrations. Won't benefit configs that overwrite this field instead of appending to it.

#### Activity

No builds found.
Krzysztof Findeisen created issue -
Field Original Value New Value
Description If I convert a DECam Gen 2 repository containing defects, the conversion succeeds, but running {{ProcessCcd}} on the result gives an error:
{noformat}
lsst.pipe.base.connections.ScalarError: Found multiple datasets {instrument: DECam, calibration_label: defects/2015-01-05T01:15:00/60, detector: 60}, {instrument: DECam, calibration_label: gen2/defects_2015-01-05T01:15:00_060, detector: 60} for scalar connection defects (defects).
{noformat}
The immediate cause is that one copy of the defects is from the Gen 2 repository, the other copy was ingested during the conversion.

Discussion on [#dm-middleware|https://lsstc.slack.com/archives/C2JPT1KB7/p1590719387081900] suggests that this is due to conflicting responsibilities:
* {{obs.base.Instrument}} has some [hard-coded definitions|https://github.com/lsst/obs_base/blob/a51665517dd0ecacd8d72fbcead186475fcf66d3/python/lsst/obs/base/instrument.py#L38-L45] of curated files, which are processed any time {{ConvertRepoConfig.doWriteCuratedCalibrations}} is set. Defects are included.
* {{ConvertRepoConfig}} also has a [{{curatedCalibrations}} field|https://github.com/lsst/obs_base/blob/7c071951e367c4bef83840b75921f3a2f3af29cb/python/lsst/obs/base/gen2to3/convertRepo.py#L252-L263], which presumably allows other datasets to be handled using {{writeCuratedCalibrations}}. This list does not include defects by default.

Both HSC and DECam override the defaults for {{ConvertRepoConfig}}, as follows:
* HSC [adds defects to {{ConvertRepoConfig.curatedCalibrations}}|https://github.com/lsst/obs_subaru/blob/c641360ad764cf8db63ecf1d14af6c93660160ff/config/hsc/convertRepo.py#L35-L36]. The bug described above *does not* occur.
* DECam makes no changes to {{curatedCalibrations}}, and its [config override file|https://github.com/lsst/obs_decam/blob/8567ae41add358da7b9af65fc81c39e39078b975/config/convertRepo.py] makes no mention of defects. The bug *does* occur.

Possible solutions:
* Do away with {{ConvertRepoConfig.curatedCalibrations}}, as proposed by [~tjenness]. However, this does raise the question of how to handle observatory-specific curated calibrations; HSC in particular has a fairly long list of them.
* Have DECam explicitly register defects in {{curatedCalibrations}}, as HSC does.
* Duplicate the dataset types hardcoded into {{Instrument}} in the default {{ConvertRepoConfig}}. May break configs that overwrite this field instead of appending to it.
If I convert a DECam Gen 2 repository containing defects, the conversion succeeds, but running {{ProcessCcd}} on the result gives an error:
{noformat}
lsst.pipe.base.connections.ScalarError: Found multiple datasets {instrument: DECam, calibration_label: defects/2015-01-05T01:15:00/60, detector: 60}, {instrument: DECam, calibration_label: gen2/defects_2015-01-05T01:15:00_060, detector: 60} for scalar connection defects (defects).
{noformat}
The immediate cause is that one copy of the defects is from the Gen 2 repository, the other copy was ingested during the conversion.

Discussion on [#dm-middleware|https://lsstc.slack.com/archives/C2JPT1KB7/p1590719387081900] suggests that this is due to conflicting responsibilities:
* {{obs.base.Instrument}} has some [hard-coded definitions|https://github.com/lsst/obs_base/blob/a51665517dd0ecacd8d72fbcead186475fcf66d3/python/lsst/obs/base/instrument.py#L38-L45] of curated files, which are processed any time {{ConvertRepoConfig.doWriteCuratedCalibrations}} is set. Defects are included.
* {{ConvertRepoConfig}} also has a [{{curatedCalibrations}} field|https://github.com/lsst/obs_base/blob/7c071951e367c4bef83840b75921f3a2f3af29cb/python/lsst/obs/base/gen2to3/convertRepo.py#L252-L263], which presumably allows other datasets to be handled using {{writeCuratedCalibrations}}. This list does not include defects by default.

Both HSC and DECam override the defaults for {{ConvertRepoConfig}}, as follows:
* HSC [adds defects to {{ConvertRepoConfig.curatedCalibrations}}|https://github.com/lsst/obs_subaru/blob/c641360ad764cf8db63ecf1d14af6c93660160ff/config/hsc/convertRepo.py#L35-L36]. The bug described above *does not* occur.
* DECam makes no changes to {{curatedCalibrations}}, and its [config override file|https://github.com/lsst/obs_decam/blob/8567ae41add358da7b9af65fc81c39e39078b975/config/convertRepo.py] makes no mention of defects. The bug *does* occur.

Possible solutions:
* Do away with {{ConvertRepoConfig.curatedCalibrations}}, as proposed by [~tjenness]. However, this does raise the question of how to handle observatory-specific curated calibrations.
* Have DECam explicitly register defects in {{curatedCalibrations}}, as HSC does.
* Duplicate the dataset types hardcoded into {{Instrument}} in the default {{ConvertRepoConfig}}. May break configs that overwrite this field instead of appending to it.
 Description If I convert a DECam Gen 2 repository containing defects, the conversion succeeds, but running {{ProcessCcd}} on the result gives an error: {noformat} lsst.pipe.base.connections.ScalarError: Found multiple datasets {instrument: DECam, calibration_label: defects/2015-01-05T01:15:00/60, detector: 60}, {instrument: DECam, calibration_label: gen2/defects_2015-01-05T01:15:00_060, detector: 60} for scalar connection defects (defects). {noformat} The immediate cause is that one copy of the defects is from the Gen 2 repository, the other copy was ingested during the conversion. Discussion on [#dm-middleware|https://lsstc.slack.com/archives/C2JPT1KB7/p1590719387081900] suggests that this is due to conflicting responsibilities: * {{obs.base.Instrument}} has some [hard-coded definitions|https://github.com/lsst/obs_base/blob/a51665517dd0ecacd8d72fbcead186475fcf66d3/python/lsst/obs/base/instrument.py#L38-L45] of curated files, which are processed any time {{ConvertRepoConfig.doWriteCuratedCalibrations}} is set. Defects are included. * {{ConvertRepoConfig}} also has a [{{curatedCalibrations}} field|https://github.com/lsst/obs_base/blob/7c071951e367c4bef83840b75921f3a2f3af29cb/python/lsst/obs/base/gen2to3/convertRepo.py#L252-L263], which presumably allows other datasets to be handled using {{writeCuratedCalibrations}}. This list does not include defects by default. Both HSC and DECam override the defaults for {{ConvertRepoConfig}}, as follows: * HSC [adds defects to {{ConvertRepoConfig.curatedCalibrations}}|https://github.com/lsst/obs_subaru/blob/c641360ad764cf8db63ecf1d14af6c93660160ff/config/hsc/convertRepo.py#L35-L36]. The bug described above *does not* occur. * DECam makes no changes to {{curatedCalibrations}}, and its [config override file|https://github.com/lsst/obs_decam/blob/8567ae41add358da7b9af65fc81c39e39078b975/config/convertRepo.py] makes no mention of defects. The bug *does* occur. Possible solutions: * Do away with {{ConvertRepoConfig.curatedCalibrations}}, as proposed by [~tjenness]. However, this does raise the question of how to handle observatory-specific curated calibrations. * Have DECam explicitly register defects in {{curatedCalibrations}}, as HSC does. * Duplicate the dataset types hardcoded into {{Instrument}} in the default {{ConvertRepoConfig}}. May break configs that overwrite this field instead of appending to it. If I convert a DECam Gen 2 repository containing defects, the conversion succeeds, but running {{ProcessCcd}} on the result gives an error: {noformat} lsst.pipe.base.connections.ScalarError: Found multiple datasets {instrument: DECam, calibration_label: defects/2015-01-05T01:15:00/60, detector: 60}, {instrument: DECam, calibration_label: gen2/defects_2015-01-05T01:15:00_060, detector: 60} for scalar connection defects (defects). {noformat} The immediate cause is that one copy of the defects is from the Gen 2 repository, the other copy was ingested during the conversion. Discussion on [#dm-middleware|https://lsstc.slack.com/archives/C2JPT1KB7/p1590719387081900] suggests that this is due to conflicting responsibilities: * {{obs.base.Instrument}} has some [hard-coded definitions|https://github.com/lsst/obs_base/blob/a51665517dd0ecacd8d72fbcead186475fcf66d3/python/lsst/obs/base/instrument.py#L38-L45] of curated files, which are processed any time {{ConvertRepoConfig.doWriteCuratedCalibrations}} is set. Defects are included. * {{ConvertRepoConfig}} also has a [{{curatedCalibrations}} field|https://github.com/lsst/obs_base/blob/7c071951e367c4bef83840b75921f3a2f3af29cb/python/lsst/obs/base/gen2to3/convertRepo.py#L252-L263], which presumably allows other datasets to be handled using {{writeCuratedCalibrations}}. This list does not include defects by default. Both HSC and DECam override the defaults for {{ConvertRepoConfig}}, as follows: * HSC [adds defects to {{ConvertRepoConfig.curatedCalibrations}}|https://github.com/lsst/obs_subaru/blob/c641360ad764cf8db63ecf1d14af6c93660160ff/config/hsc/convertRepo.py#L35-L36]. The bug described above *does not* occur. * DECam makes no changes to {{curatedCalibrations}}, and its [config override file|https://github.com/lsst/obs_decam/blob/8567ae41add358da7b9af65fc81c39e39078b975/config/convertRepo.py] makes no mention of defects. The bug *does* occur. Possible solutions: * Do away with {{ConvertRepoConfig.curatedCalibrations}}, as proposed by [~tjenness]. However, this does raise the question of how to handle observatory-specific curated calibrations. * Have DECam explicitly register defects in {{curatedCalibrations}}, as HSC does. * Duplicate the dataset types hardcoded into {{Instrument}} in the default {{ConvertRepoConfig.curatedCalibrations}}. May break configs that overwrite this field instead of appending to it.
Hide
Tim Jenness added a comment -

If conversion calls writeCuratedCalibrations at all then we know exactly what datasets will be written for a given instrument. At the moment you can't tell writeCuratedCalibrations to skip certain items or restrict itself to specific calibrations. If you could do that then the config option would make more sense.

Show
Tim Jenness added a comment - If conversion calls writeCuratedCalibrations at all then we know exactly what datasets will be written for a given instrument. At the moment you can't tell writeCuratedCalibrations to skip certain items or restrict itself to specific calibrations. If you could do that then the config option would make more sense.
 Labels gen2-deprecation-blocker gen3-middleware
Hide
Tim Jenness added a comment -

Does anyone want to object to my plan of removing the config item completely and having the Instrument class report the list of curated calibrations that it is going to write itself?

Show
Tim Jenness added a comment - Does anyone want to object to my plan of removing the config item completely and having the Instrument class report the list of curated calibrations that it is going to write itself?
Hide
Krzysztof Findeisen added a comment -

By "the config item", do you mean ConvertRepoConfig.curatedCalibrations, ConvertRepoConfig.datasetIgnorePatterns, or both? Removing both would cause trouble for converting some Gen 2 repositories.

Show
Krzysztof Findeisen added a comment - By "the config item", do you mean ConvertRepoConfig.curatedCalibrations , ConvertRepoConfig.datasetIgnorePatterns , or both? Removing both would cause trouble for converting some Gen 2 repositories.
Hide
Tim Jenness added a comment -

I mean that the list of curatedCalibrations in a config would no longer be needed.

I think it makes sense for the code that uses datasetIgnorePatterns to ignore the curatedCalibrations known to the instrument automatically.

Show
Tim Jenness added a comment - I mean that the list of curatedCalibrations in a config would no longer be needed. I think it makes sense for the code that uses datasetIgnorePatterns to ignore the curatedCalibrations known to the instrument automatically.
 Story Points 2
 Status To Do [ 10001 ] In Progress [ 3 ]
 Assignee Tim Jenness [ tjenness ]
 Team Data Release Production [ 10301 ] Architecture [ 10304 ]
 Component/s obs_lsst [ 16504 ]
Hide
Tim Jenness added a comment -

Krzysztof Findeisen this should be a small review. No rush for it. I've removed the config item completely and now the converter asks Instrument for the calibrations that should be skipped.

Show
Tim Jenness added a comment - Krzysztof Findeisen this should be a small review. No rush for it. I've removed the config item completely and now the converter asks Instrument for the calibrations that should be skipped.
 Reviewers Krzysztof Findeisen [ krzys ] Status In Progress [ 3 ] In Review [ 10004 ]
 Watchers John Parejko, Krzysztof Findeisen, Tim Jenness [ John Parejko, Krzysztof Findeisen, Tim Jenness ] Jim Bosch, John Parejko, Krzysztof Findeisen, Tim Jenness [ Jim Bosch, John Parejko, Krzysztof Findeisen, Tim Jenness ]
 Status In Review [ 10004 ] Reviewed [ 10101 ]
 Resolution Done [ 10000 ] Status Reviewed [ 10101 ] Done [ 10002 ]

#### People

Assignee:
Tim Jenness
Reporter:
Krzysztof Findeisen
Reviewers:
Krzysztof Findeisen
Watchers:
Jim Bosch, John Parejko, Krzysztof Findeisen, Tim Jenness