DM-3445 (Data Management)

Overhaul the way the right colorterm dictionary is selected

    Details

    • Team:
      Data Access and Database

      Description

      There are some configuration issues associated with reference catalogs, including:

      1) Determining the source of the reference catalog. This information is used in two ways:

      • mapping the filter used to take an exposure to a filter in a reference catalog, for matching sources to reference objects (e.g. for astrometry and photometry)
      • picking the correct set of colorterms based on the filter and type of CCD used for the science image and filters in the reference catalog

      At present this information is encoded in the eups version name of the astrometry_net_data package used. This requires the use of eups and assumes astrometry.net index files. This is the most serious immediate issue.

      2) Colorterm correction data is saved in obs_* packages as config files. This makes it harder to include a temporal component.

      This ticket is a request for a cleaner solution.

      The plan has been to wait for "the new butler", e.g. DM-2404 (taking this use case into account in its design) before designing the solution. However, if we need something sooner, one possibility for fixing the first problem is to put the source of the reference catalog into metadata attached to the reference catalog.
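
A minimal sketch of the metadata idea, assuming the reference catalog carries a small metadata record naming its source. The `RefCatMetadata` class, the `FILTER_MAPS` table, and the catalog and filter names are hypothetical and only illustrate how the source could drive the filter mapping without parsing an eups version name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefCatMetadata:
    """Hypothetical metadata record attached to a reference catalog."""
    source: str  # identifier of the catalog's origin, e.g. "sdss-dr9" (illustrative)


# Hypothetical per-camera table: catalog source -> {exposure filter: reference filter}.
FILTER_MAPS = {
    "sdss-dr9": {"HSC-G": "g", "HSC-R": "r", "HSC-I": "i"},
    "ps1-pv2": {"HSC-G": "g", "HSC-R": "r", "HSC-I": "i", "HSC-Y": "y"},
}


def reference_filter(metadata, exposure_filter, filter_maps=FILTER_MAPS):
    """Return the reference catalog filter that matches an exposure filter."""
    try:
        return filter_maps[metadata.source][exposure_filter]
    except KeyError as exc:
        # Fail loudly rather than silently matching against the wrong filter.
        raise LookupError(
            f"no mapping for filter {exposure_filter!r} "
            f"in reference catalog {metadata.source!r}"
        ) from exc
```

The same `source` value could also key the choice of colorterm set, so both uses listed under item 1 would depend on one explicit piece of metadata.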

            Activity

            rowen Russell Owen created issue -
            rowen Russell Owen made changes -
            Link: This issue relates to DM-3442 [ DM-3442 ]
            rhl Robert Lupton added a comment -

            "At present this is done using the version name of the astrometry.net data file being used. This requires the use of eups and assumes astrometry.net index files."

            I don't understand this comment. Parsing a filename neither requires eups nor assumes astrometry.net. It is a bad solution, and I agree that we need a better one, but let's not confuse our various problems.

            rowen Russell Owen added a comment - edited

            I'll add the word "eups" to "version name" in hopes that will clarify. If that is not sufficient then please restate your question more fully. You said something about file names, but I don't see how file names are relevant.

            rowen Russell Owen made changes -
            Description (original value):
            There are some configuration issues associated with reference catalogs, including:
            - mapping the right filter used to take an exposure to the right filter in a reference catalog for matching sources to reference objects (e.g. for astrometry and photometry)
            - picking the correct set of colorterms based on the filter and type of CCD used for the science image and similar data for the reference catalog

            At present this is done using the version name of the astrometry.net data file being used. This requires the use of eups and assumes astrometry.net index files. Also, much of the data ends up in a frozen config, which may not be a viable long-term solution; in particular, we hope to get the necessary information from the butler, at which point it will likely be too late to modify a config.

            This ticket is a request for a cleaner solution.

            We hope we can wait for butler 2 (taking this use case into account in its design) before designing the solution. However, if we need something sooner, one possibility is that we associate metadata with the reference catalog (something the reference object loader can do) and use that metadata to determine the mappings above.
            Description (new value):
            There are some configuration issues associated with reference catalogs, including:
            - mapping the right filter used to take an exposure to the right filter in a reference catalog for matching sources to reference objects (e.g. for astrometry and photometry)
            - picking the correct set of colorterms based on the filter and type of CCD used for the science image and similar data for the reference catalog

            At present this is done using the eups version name of the astrometry.net data file being used. This requires the use of eups and assumes astrometry.net index files. Also, much of the data ends up in a frozen config, which may not be a viable long-term solution; in particular, we hope to get the necessary information from the butler, at which point it will likely be too late to modify a config.

            This ticket is a request for a cleaner solution.

            We hope we can wait for butler 2 (taking this use case into account in its design) before designing the solution. However, if we need something sooner, one possibility is that we associate metadata with the reference catalog (something the reference object loader can do) and use that metadata to determine the mappings above.
            rhl Robert Lupton added a comment -

            "At present this is done using the eups version name of the astrometry.net data file being used"

            Ah, I see the confusion. You didn't mean the eups version name of the "data file", but rather the version of the astrometry.net.data product being used.

            You can easily add a tiny config file to the astrometry.net.data directory giving this information – maybe that's what you're going to propose. Trivial and not adding any difficulties with upgrading or backwards compatibility.
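
A sketch of what reading such a file might look like, assuming a small JSON file in the data directory. The file name `refcat_info.json` and its `source` key are hypothetical; the comment above does not specify a format.

```python
import json
from pathlib import Path

INFO_FILENAME = "refcat_info.json"  # hypothetical file name, not an existing convention


def read_refcat_source(data_dir):
    """Return the reference catalog source recorded in the data directory."""
    info_path = Path(data_dir) / INFO_FILENAME
    with info_path.open() as f:
        info = json.load(f)
    try:
        return info["source"]
    except KeyError:
        raise ValueError(f"{info_path} has no 'source' entry") from None
```

Because the file lives next to the index files, the lookup would no longer depend on how the package was set up or named.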

            rowen Russell Owen made changes -
            Description (the original value is the preceding version of the description; new value):
            There are some configuration issues associated with reference catalogs, including:

            1) Determining the source of the reference catalog. This information is used in two ways:
            - mapping the filter used to take an exposure to a filter in a reference catalog, for matching sources to reference objects (e.g. for astrometry and photometry)
            - picking the correct set of colorterms based on the filter and type of CCD used for the science image and filters in the reference catalog
            At present this information is encoded in the eups version name of the astrometry_net_data package used. This requires the use of eups and assumes astrometry.net index files.

            2) Colorterm correction data is saved in obs_* packages as config files. This makes it harder to include a temporal component.

            This ticket is a request for a cleaner solution.

            The plan has been to wait for butler 2 (taking this use case into account in its design) before designing the solution. However, if we need something sooner, one possibility is that we associate metadata with the reference catalog (something the reference object loader can do) and use that metadata to determine the mappings above.
            rowen Russell Owen made changes -
            Watchers: added Nate Pease. New value: Jim Bosch, John Swinbank, Kian-Tat Lim, Nate Pease, Paul Price, Robert Lupton, Russell Owen, Simon Krughoff, Tim Jenness
            rowen Russell Owen made changes -
            Description (the original value is the preceding version of the description; new value):
            There are some configuration issues associated with reference catalogs, including:

            1) Determining the source of the reference catalog. This information is used in two ways:
            - mapping the filter used to take an exposure to a filter in a reference catalog, for matching sources to reference objects (e.g. for astrometry and photometry)
            - picking the correct set of colorterms based on the filter and type of CCD used for the science image and filters in the reference catalog
            At present this information is encoded in the eups version name of the astrometry_net_data package used. This requires the use of eups and assumes astrometry.net index files. This is the most serious immediate issue.

            2) Colorterm correction data is saved in obs_* packages as config files. This makes it harder to include a temporal component.

            This ticket is a request for a cleaner solution.

            The plan has been to wait for butler 2 (taking this use case into account in its design) before designing the solution. However, if we need something sooner, one possibility for fixing the first problem is to put the source of the reference catalog into metadata attached to the reference catalog.
            rowen Russell Owen added a comment -

            Robert Lupton: Thank you. Now I understand. I have reworded the ticket accordingly. Please see if you think it is clearer now.

            tjenness Tim Jenness made changes -
            Link: This issue is duplicated by DM-167 [ DM-167 ]
            rowen Russell Owen made changes -
            Description: replaced "butler 2" with '"the new butler", e.g. DM-2404' in the final paragraph; the new value is the current description at the top of this ticket.
            rowen Russell Owen made changes -
            Link: This issue relates to DM-2404 [ DM-2404 ]
            frossie Frossie Economou made changes -
            Assignee: Kian-Tat Lim [ ktl ]
            frossie Frossie Economou added a comment -

            There's a suggestion from John Swinbank that this is a WONTFIX because the butler should take care of it. Assigning to Kian-Tat Lim for comment.

            swinbank John Swinbank added a comment -

            To be a bit more explicit: the issue text itself says that the Butler ought to take care of it, and this is a request for a workaround until it does. I see no slack in the schedule for us to spend effort on the workaround.

            rowen Russell Owen added a comment -

            John Swinbank, your reading of the ticket is exactly the opposite of what I intended when I wrote it. I'm sorry I was not clearer.

            What I want is a clean solution for this problem. That implies not a short-term workaround, but a good, solid fix. I think it should be fixed in the new butler, and that this represents an unusual and important use case for that butler.

            However, if somebody does have a better short-term solution, I'm willing to accept it.

            swinbank John Swinbank added a comment -

            Sounds like we're all agreed that this is a Butler issue, anyway. I think it's up to Kian-Tat Lim & Nate Pease [X] whether they want to just adopt this issue or incorporate it into their other plans.

            ktl Kian-Tat Lim added a comment -

            Nate Pease [X] and I talked about versioning of repositories and datasets a bit yesterday with regard to DM-4168. Versioning of reference catalog repositories and selection of the appropriate version would rely on that ticket, but that does not seem to be what is requested here.

            For this use case, it sounds like retrieving reference catalogs and colorterm dictionaries as Butler datasets would allow the filter and other information in the dataId to be mapped appropriately by camera-specific mapper code to the closest equivalents in the reference catalog, and would allow validity dates to be specified for colorterms. It sounds like it might be easier to write the camera-specific code if a consistent metadata definition for reference catalog repositories were defined by the Science Pipelines teams. All of this can, I think, be done using the current Butler; some of the code would be in the daf.butlerUtils.CameraMapper class (or perhaps a subclass of the proposed new Repository class) but much would go into pipe_base and the obs_* packages. While implementing this will likely require some detailed understanding of the use case, it may be possible for Nate Pease [X] to write the code with some assistance, but that would mean postponing more foundational and generic Butler work. Alternatively, I could sketch out how I think this would work for someone like Russell Owen.

            Please advise as to the urgency/priority of this work and who you think is best able to tackle it.
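
To illustrate the validity-date part of this, here is a minimal sketch that picks the colorterm entry in force on an observation date. The `ColortermEntry` fields and the "latest entry starting on or before the date" rule are assumptions made for illustration; they are not an existing Butler or obs_* interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ColortermEntry:
    """Hypothetical colorterm record with a start-of-validity date."""
    valid_from: date
    c0: float
    c1: float
    c2: float = 0.0


def select_colorterm(entries, obs_date):
    """Return the entry with the latest valid_from that is not after obs_date."""
    candidates = [e for e in entries if e.valid_from <= obs_date]
    if not candidates:
        raise LookupError(f"no colorterm entry is valid on {obs_date}")
    return max(candidates, key=lambda e: e.valid_from)
```

A lookup of this kind is what storing colorterms in static config files makes awkward, which is the second problem in the description.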

            rhl Robert Lupton added a comment -

            I agree that we need to think a bit harder about what is being requested.

            My guess is that what the pipelines want to say is

            "Give me the colorterms for using PanSTARRS photometry version 2.3 with the LSST camera on 2028-02-05"

            Something on the pipeline side could resolve "colorterms for using PanSTARRS photometry version 2.3 with the LSST camera" to a filename, but then we'd need to add mapper entries for N catalogues to each obs_camera. That would be one possibility. Another would be for the pipelines to define the template for the N*M possibilities, then pass that to the butler with appropriate dataId entries to resolve to a particular file (modulo date – I think we all agree that this is a butler issue).
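
The template option could be sketched as below, assuming a path template whose fields are filled from dataId entries. The template layout, the key names, and `resolve_path` are hypothetical, and the date dimension is left out, as in the comment above.

```python
from string import Formatter

# Hypothetical template covering every (catalogue, camera) combination.
COLORTERM_TEMPLATE = "colorterms/{refcat}/{refcat_version}/{camera}/{filter}.py"


def resolve_path(template, data_id):
    """Fill a path template from a dataId, rejecting incomplete dataIds."""
    # Collect the field names the template actually uses.
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - set(data_id)
    if missing:
        raise KeyError(f"dataId is missing keys: {sorted(missing)}")
    return template.format(**data_id)
```

With one template, adding a catalogue or a camera means adding files rather than adding mapper entries to each obs_camera package.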

            tjenness Tim Jenness made changes -
            Assignee: changed from Kian-Tat Lim [ ktl ] to Nate Pease [ npease ]
            tjenness Tim Jenness made changes -
            Team: Data Access and Database [ 10204 ]
            gcomoretto Gabriele Comoretto [X] (Inactive) made changes -
            Remote Link: This issue links to "Page (Confluence)" [ 21439 ]
            ktl Kian-Tat Lim added a comment -

            While RFC-624 doesn't solve this problem, we should make sure it is compatible with potential future solutions to this problem (which in turn should likely be the subject of Community and/or RFC discussion).

            tjenness Tim Jenness made changes -
            Component/s: obs_lsstSim [ 10764 ]
            tjenness Tim Jenness added a comment -

            Can someone tell me whether this ticket is still relevant, and also whether it needs new butler functionality as implied in the description?

            jbosch Jim Bosch added a comment -

            The issue identifies a still-existing problem - the colorterms system is fragile - but it's a low-priority one to me because I think what we have now is mostly relevant for precursor datasets; for LSST DRP, where FGCM will have the final say, and for LSST AP, where I expect us to use a DRP-generated reference catalog, the role of colorterms in the future is unclear, and certainly diminished.

            I also don't think I see any middleware work to do here anymore - I think the main limitation was the difficulty of adding new dataset types in Gen2. I think in Gen3 it'd be quite natural to put the colorterms data in butler datasets with instrument+physical_filter dimensions and a dataset type name based on the corresponding reference catalog name. There would still be fragility in making sure the reference catalog and colorterms were loaded consistently, but that's something that pipeline contracts would help with.
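
A plain-Python sketch of that naming idea and of the consistency check a contract might express. The `<refcat>_colorterms` naming pattern and both functions are hypothetical and do not use the actual Gen3 butler API; the dictionary stands in for datasets keyed by dataset type, instrument, and physical_filter.

```python
def colorterm_dataset_type(refcat_name):
    """Derive a colorterm dataset type name from a reference catalog name."""
    return f"{refcat_name}_colorterms"  # hypothetical naming pattern


def check_consistent(refcat_name, colorterm_type):
    """Raise if the colorterm dataset type does not belong to the catalog."""
    expected = colorterm_dataset_type(refcat_name)
    if colorterm_type != expected:
        raise ValueError(
            f"colorterms {colorterm_type!r} do not match reference catalog "
            f"{refcat_name!r} (expected {expected!r})"
        )


# Stand-in for stored datasets: (dataset type, instrument, physical_filter) -> data.
# The catalog, instrument, and filter names are illustrative only.
COLORTERM_STORE = {
    ("ps1_pv3_colorterms", "HSC", "HSC-R"): {"c0": 0.0, "c1": 0.01, "c2": 0.0},
}


def load_colorterm(refcat_name, instrument, physical_filter, store=COLORTERM_STORE):
    """Look up colorterm data using the name derived from the catalog."""
    return store[(colorterm_dataset_type(refcat_name), instrument, physical_filter)]
```

Deriving the dataset type name from the catalog name removes one place where the two could drift apart; the explicit check covers the case where both names are configured independently.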

            tjenness Tim Jenness made changes -
            Resolution: Done [ 10000 ]
            Status: changed from To Do [ 10001 ] to Invalid [ 11005 ]
            tjenness Tim Jenness added a comment -

            Closing this since it seems from the discussion that it's all under control.


              People

              Assignee:
              npease Nate Pease [X] (Inactive)
              Reporter:
              rowen Russell Owen
              Watchers:
              Frossie Economou, Jim Bosch, John Swinbank, Kian-Tat Lim, Nate Pease [X] (Inactive), Paul Price, Robert Lupton, Russell Owen, Simon Krughoff, Tim Jenness