Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-697

Change CalibrateTask refcat defaults to Gaia DR2 for astrometry and PS1 for photometry

    Details

    • Type: RFC
    • Status: Proposed
    • Resolution: Unresolved
    • Component/s: DM

      Description

      Currently, CalibrateTask uses the default LoadIndexedReferenceObjectTask dataset name, "cal_ref_cat", for both the astrometry and photometry reference catalogs. This means that each obs package must configure its reference catalogs individually: there is no reference catalog named "cal_ref_cat". This results in a lot of duplicated config blocks across the obs packages that slowly drift in and out of sync as our reference catalogs are updated, and results in user confusion because they don't know what a "cal_ref_cat" is.

      We now have two good quality reference catalogs that should be acceptable defaults for most people wanting to process data with the stack: gaia dr2 and ps1_pv3.

      I propose that we use CalibrateConfig.setDefaults() to set these defaults, and remove such settings from the obs packages, except for any specific filterMaps they may need for PS1 (e.g. `i2`->`i`):

      astromRefObjLoader.ref_dataset_name = "gaia_dr2_20200414"
      astromRefObjLoader.filterMap = {"u": "phot_g_mean",
                                      "g": "phot_g_mean",
                                      "r": "phot_g_mean",
                                      "i": "phot_g_mean",
                                      "z": "phot_g_mean",
                                      "y": "phot_g_mean"}
      photoRefObjLoader.ref_dataset_name = "ps1_pv3_3pi_20170110"
      

      I am aware that PS1 is not ideal for LSST operations because of depth and sky coverage, but nothing better currently exists. When a better photometry reference catalog becomes available (e.g. when Gaia releases their coarse spectra--assuming we have a way to use it), we can update the photometry default to better reflect LSSTCam processing.

      Any user that doesn't have copies of the above two catalogs can override with their own values: the only change for such a user is that the error message they receive with the default values will refer to an actual existing refcat (potentially prompting them to search for where to get a copy), instead of one that does not exist anywhere.

      HSC can keep their config overrides (PS1 for both) after this is implemented while they explore how Gaia DR2 changes their astrometric calibration.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            I'd argue for a bit more than that:

            • "i2" shouldn't appear anywhere (it should be "HSC-I2" or just "i")
            • no dict that might ever have "HSC-I2" in its keys should ever have "i" in its keys (ditto for values - though a mapping from one to the other is of course fine).

            In other words, those are different kind of filters and unless we're conforming very high level inputs from science users, we should not mix them.

            Show
            jbosch Jim Bosch added a comment - I'd argue for a bit more than that: "i2" shouldn't appear anywhere (it should be "HSC-I2" or just "i") no dict that might ever have "HSC-I2" in its keys should ever have "i" in its keys (ditto for values - though a mapping from one to the other is of course fine). In other words, those are different kind of filters and unless we're conforming very high level inputs from science users, we should not mix them.
            Hide
            Parejkoj John Parejko added a comment -

            Even though for these specific purposes (finding the refcat filter to use with the given survey filter, but not for doing any photometric conversions), "i" and "i2" are to be treated identically?

            Show
            Parejkoj John Parejko added a comment - Even though for these specific purposes (finding the refcat filter to use with the given survey filter, but not for doing any photometric conversions), "i" and "i2" are to be treated identically?
            Hide
            jbosch Jim Bosch added a comment -

            If you're saying that the keys of this dictionary are logically abstract filters, and this is basically a workaround for the fact that the afw Filter stuff gives us strings like "i2" instead of "i" sometimes, fine - but we should acknowledge that as a workaround, and otherwise be consistent about these always being camera-agnostic filter names.

            Well, mostly fine - I'm a little worried that we're trading away flexibility to get convenience by saying that the keys here are camera agnostic. Should we always assume there is one abstract, instrument-agnostic filter associated with a single-epoch image, which does have a well-defined physical, instrument-specific filter? Gen3 does assume exactly that in its current data model, but I've long planned to eventually make that a many-to-many relationship with some kind of "filter system" that defines a particular mapping from physical->abstract. Put another way: we're starting with a well-defined physical filter when we want to use this mapping. If we need to obtain an abstract filter to use as a key for this mapping, we might need to also specify what filter system we're using.

            A lot of these concerns are not really things we can do anything about yet - I'm just trying to make the point that all of this RFC is really building on sand because filters are such a mess overall, so we should aim to minimize disruption and effort in anything we do right now, rather than aiming to really "get it right".

            Show
            jbosch Jim Bosch added a comment - If you're saying that the keys of this dictionary are logically abstract filters, and this is basically a workaround for the fact that the afw Filter stuff gives us strings like "i2" instead of "i" sometimes, fine - but we should acknowledge that as a workaround, and otherwise be consistent about these always being camera-agnostic filter names. Well, mostly fine - I'm a little worried that we're trading away flexibility to get convenience by saying that the keys here are camera agnostic. Should we always assume there is one abstract, instrument-agnostic filter associated with a single-epoch image, which does have a well-defined physical, instrument-specific filter? Gen3 does assume exactly that in its current data model, but I've long planned to eventually make that a many-to-many relationship with some kind of "filter system" that defines a particular mapping from physical->abstract. Put another way: we're starting with a well-defined physical filter when we want to use this mapping. If we need to obtain an abstract filter to use as a key for this mapping, we might need to also specify what filter system we're using. A lot of these concerns are not really things we can do anything about yet - I'm just trying to make the point that all of this RFC is really building on sand because filters are such a mess overall, so we should aim to minimize disruption and effort in anything we do right now, rather than aiming to really "get it right".
            Hide
            nlust Nate Lust added a comment -

            I just wanted to chime in, and say this leads to the larger questions of how configs are organized in general. Up until now we have always associated a config with a python task in a 1:1 way. The built in assumption was that a task had one role to to, so there would be one configuration for that task, with overrides for instruments, and possibly a specific processing.

            In keeping with the theme "gen 3 will fix everything", I want to highlight the role that pipelines will play in the near future and how they will fit in with configs. Pipelines are a way to define an action or outcome, and all the steps needed to achieve that, such as single frame processing or difference imaging.

            Each of these actions are comprised of one or more tasks and their configurations, and you can still override pipeline configurations. This solves a problem that is hinted at with this rfc, and has long been debated without realizing the underlying problem is the 1:1 config overrides in obs packages to tasks. While there is certainly camera specific config options that must be set for a given instrument because of things like filters, or electronics or the nature of the data, we have fallen into the trap of putting things into camera configs because a) its a centralized location and b) they are related to the fact that data from those cameras are often used in to do specific actions, so configs for those actions were put in the obs package.

            This model breaks down when tasks are not used in a standard way, or are used more than once, and causes the need for config override files that users maintain and pass around. One such example is the monthly rc processing that maintains its own flow and configs.

            As mentioned pipeline files are tied to specific actions a user wants to do, and provide a way for configs needed for that action to be stored with the definition of that action. A good example of that can bee seen here https://github.com/lsst/cp_pipe/tree/master/pipelines. ISRTask is used in each of these pipelines, but it has a different role, and thus a different config, though there is still a notion of shared defaults at a camera level.

            I said all that to say I hope we can start a discussion where we decide what is the bare minimum config to task configuration needed for a camera, and then start associating configs with pipelines. If there is a need to share configs between pipelines we would then do that in a separate config override file that each pipeline can read and override, rather than what we do now where most overrides are stored at the camera level or each user needing to maintain their own override files.

            This may be long and rambling and unclear, so I am happy to clarify anything if people ask, but I wanted to get this comment out there.

            Show
            nlust Nate Lust added a comment - I just wanted to chime in, and say this leads to the larger questions of how configs are organized in general. Up until now we have always associated a config with a python task in a 1:1 way. The built in assumption was that a task had one role to to, so there would be one configuration for that task, with overrides for instruments, and possibly a specific processing. In keeping with the theme "gen 3 will fix everything", I want to highlight the role that pipelines will play in the near future and how they will fit in with configs. Pipelines are a way to define an action or outcome, and all the steps needed to achieve that, such as single frame processing or difference imaging. Each of these actions are comprised of one or more tasks and their configurations, and you can still override pipeline configurations. This solves a problem that is hinted at with this rfc, and has long been debated without realizing the underlying problem is the 1:1 config overrides in obs packages to tasks. While there is certainly camera specific config options that must be set for a given instrument because of things like filters, or electronics or the nature of the data, we have fallen into the trap of putting things into camera configs because a) its a centralized location and b) they are related to the fact that data from those cameras are often used in to do specific actions, so configs for those actions were put in the obs package. This model breaks down when tasks are not used in a standard way, or are used more than once, and causes the need for config override files that users maintain and pass around. One such example is the monthly rc processing that maintains its own flow and configs. As mentioned pipeline files are tied to specific actions a user wants to do, and provide a way for configs needed for that action to be stored with the definition of that action. A good example of that can bee seen here https://github.com/lsst/cp_pipe/tree/master/pipelines . ISRTask is used in each of these pipelines, but it has a different role, and thus a different config, though there is still a notion of shared defaults at a camera level. I said all that to say I hope we can start a discussion where we decide what is the bare minimum config to task configuration needed for a camera, and then start associating configs with pipelines. If there is a need to share configs between pipelines we would then do that in a separate config override file that each pipeline can read and override, rather than what we do now where most overrides are stored at the camera level or each user needing to maintain their own override files. This may be long and rambling and unclear, so I am happy to clarify anything if people ask, but I wanted to get this comment out there.
            Hide
            Parejkoj John Parejko added a comment -

            After offline discussion with Jim Bosch, we decided that the best approach would be to use the defaultFilter config in the refcats for Gaia DR2, since there really isn't a "filter map", but rather "always use this filter, no matter what". This also should make it a bit more clear that Gaia is to be used for astrometry only, and we can make that more clear in the task documentation (e.g., add a note to the defaultFilter docs to the effect of "this option is useful for astrometric refcats, but should not be used for refcats that are used for photometry"). I think this takes care of Yusra AlSayyad and Jim Bosch's objections about defining default filterMaps, and it even further simplifies refcat configuration: if you're using this RFC's Gaia defaults, you won't have to specify anything at all in your obs package.

            However, some experimenting with this in jointcal suggests that our defaultFilter implementation in the loader is broken: if defaultFilter is supplied without a filterMap, and the user specifies filter=something in loadSkyCircle() (as one normally does, because when you load the refcat you don't know how its filter mappings are configured), getRefFluxField() will fail. I think the fix is fairly trivial (try camFlux if x_camflux doesn't exist), but this is yet another point on the "our refcat filter system poorly specified and too complicated" list. Whether finally implementing RFC-624 will help with that is a separate question.

            Show
            Parejkoj John Parejko added a comment - After offline discussion with Jim Bosch , we decided that the best approach would be to use the defaultFilter config in the refcats for Gaia DR2, since there really isn't a "filter map", but rather "always use this filter, no matter what". This also should make it a bit more clear that Gaia is to be used for astrometry only, and we can make that more clear in the task documentation (e.g., add a note to the defaultFilter docs to the effect of "this option is useful for astrometric refcats, but should not be used for refcats that are used for photometry"). I think this takes care of Yusra AlSayyad and Jim Bosch 's objections about defining default filterMaps , and it even further simplifies refcat configuration: if you're using this RFC's Gaia defaults, you won't have to specify anything at all in your obs package. However, some experimenting with this in jointcal suggests that our defaultFilter implementation in the loader is broken: if defaultFilter is supplied without a filterMap , and the user specifies filter=something in loadSkyCircle() (as one normally does, because when you load the refcat you don't know how its filter mappings are configured), getRefFluxField() will fail. I think the fix is fairly trivial (try camFlux if x_camflux doesn't exist), but this is yet another point on the "our refcat filter system poorly specified and too complicated" list. Whether finally implementing RFC-624 will help with that is a separate question.

              People

              • Assignee:
                Parejkoj John Parejko
                Reporter:
                Parejkoj John Parejko
                Watchers:
                Eli Rykoff, Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Nate Lust, Paul Price, Yusra AlSayyad
              • Votes:
                0 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Planned End:

                  Summary Panel