Request For Comments / RFC-441

Request to adopt HiPS and MOC as standard DM data products

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:

      Description

      According to DMS-REQ-0329, Data Release Production is responsible for generating “co-adds suitable for use in all-sky visualization tools”.

      HiPS (http://www.ivoa.net/documents/HiPS/index.html) and MOC (http://www.ivoa.net/documents/MOC/index.html) are becoming the standard for representing all-sky images and coverage maps. This RFC requests that DM adopt those two standards and produce the corresponding data products to fulfill the above DMS requirement.

      This does not replace any other data products that DM has already committed to generating.

      This could also allow us to use HEALPix for all-sky binning and spatial indexing.
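
      As a rough illustration of what a MOC encodes, the sketch below treats a coverage map as a set of (order, ipix) HEALPix cells in the NESTED scheme, merges complete sets of four sibling cells into their parent, and computes the covered sky fraction. The function names are hypothetical and this is not the API of any MOC library; it assumes the input cells do not overlap across orders.

```python
from collections import defaultdict


def normalize_moc(cells):
    """Merge complete sets of four NESTED siblings into their parent cell.

    `cells` is an iterable of (order, ipix) pairs, assumed non-overlapping.
    """
    cells = set(cells)
    order = max((o for o, _ in cells), default=0)
    while order > 0:
        siblings = defaultdict(list)
        for o, ipix in cells:
            if o == order:
                siblings[ipix >> 2].append(ipix)  # parent index in NESTED
        for parent, kids in siblings.items():
            if len(kids) == 4:
                cells -= {(order, k) for k in kids}
                cells.add((order - 1, parent))
        order -= 1
    return cells


def sky_fraction(cells):
    """Fraction of the sphere covered; order k has 12 * 4**k equal-area cells."""
    return sum(1.0 / (12 * 4 ** o) for o, _ in cells)


# Four order-2 siblings (4..7) collapse into order-1 cell 1.
moc = normalize_moc([(2, 4), (2, 5), (2, 6), (2, 7), (1, 3)])
print(sorted(moc))
print(sky_fraction(moc))
```

      Storing the coarsest cell that is fully covered is what keeps such a map compact, and the equal-area property makes the covered fraction a simple sum.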

            Activity

            jbosch Jim Bosch added a comment - - edited

            I've recently finally understood the big disadvantage of HEALPix relative to its primary competitors (HTM, Q3C, and Google's S2): the boundaries of HEALPix pixels are not great circles, and that makes it considerably harder to relate them to spherical polygons.

            I don't think that's a reason to remove HEALPix from consideration; HEALPix has a clear advantage in external tooling and a possible advantage in having equal-area pixels (i.e., HEALPix is definitely equal-area and the others definitely aren't, but I don't have a good sense for how important that is).  But it does make me a lot more cautious about adopting it.

            I'd also say that I don't think that great-circle boundaries are particularly important for the multi-level visualization context that HiPS is concerned with, but I think it is quite important for the coverage maps that MOC is used for.  Of course, I also imagine it'd be quite convenient to use the same sky pixelization for both.

            Personally I'd feel most comfortable if we could defer discussion of this until we have a chance to understand the use cases and performance implications better; I think it's possible we'd want to instead propose to the VO a MOC-like standard using some other pixelization (and perhaps a HiPS-like counterpart).  However, I would also be okay with promising to support HiPS/MOC now on the basis of them being important interchange formats, with the understanding that this would not imply that they would be our internal formats, and in particular that conversion from the native format to them could be both lossy and inefficient.
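
            For reference, the equal-area property mentioned above means that every HEALPix pixel at a given order subtends the same solid angle, 4π / (12 · nside²) with nside = 2^order. A small sketch of the pixel count and linear pixel size per order, using hypothetical helper names:

```python
import math


def healpix_npix(order):
    """Number of pixels at a given HEALPix order (nside = 2**order)."""
    nside = 2 ** order
    return 12 * nside * nside


def healpix_pixel_area_arcsec2(order):
    """Area of every pixel at this order, in square arcseconds."""
    steradian_to_arcsec2 = (180.0 * 3600.0 / math.pi) ** 2
    return 4.0 * math.pi / healpix_npix(order) * steradian_to_arcsec2


# Print the pixel count and the side of an equal-area square, per order.
for order in (0, 9, 18):
    side = math.sqrt(healpix_pixel_area_arcsec2(order))
    print(order, healpix_npix(order), f"{side:.3g} arcsec")
```

            The orders printed here are arbitrary; which order matches a given survey pixel scale is a separate choice that this sketch does not make.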

             

            jbosch Jim Bosch added a comment -

            I had a brief follow-up conversation on Slack with Kian-Tat Lim where I clarified some of the above; I should probably post what I said there here as well:

            I'm not completely sure that we [need to relate our masks to spherical polygons]. I can think of some things we might want to do with spherical polygons, but nothing that I'm certain about.  The big one is that the outlines of tracts and patches are great circles if we use a TAN projection (I think; I should check that). But it's the sensor boundaries that are more relevant for the masks, I think, and the fact that those are mapped to the sky with an additional distortion means that they aren't just straightforward spherical polygons.

            I'm not at all certain that MOC is bad for this.  I just want to be careful; MOC is big with astro data centers but to my knowledge it hasn't really been used by the big Stage III surveys or the scientists behind them, so I don't think it's actually been as well-vetted as it might seem.
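
            The open question above, whether straight tract and patch edges in a TAN projection are great circles, can be checked numerically. A gnomonic (TAN) projection maps a point on the unit sphere to the tangent plane along the ray from the sphere's centre, so a straight line in the plane and the centre together span a plane through the centre. The sketch below, with hypothetical helper names, deprojects points along a straight segment and confirms they are coplanar with the centre. It says nothing about sensor boundaries, which carry additional optical distortion.

```python
import math


def tan_deproject(x, y):
    """Unit vector for tangent-plane point (x, y), tangent point at (1, 0, 0)."""
    norm = math.sqrt(1.0 + x * x + y * y)
    return (1.0 / norm, x / norm, y / norm)


def triple_product(a, b, c):
    """a . (b x c); zero when the three vectors are coplanar with the origin."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


# End points of a straight edge in the tangent plane (radians from centre).
p0, p1 = (-0.02, 0.015), (0.02, 0.018)
a, b = tan_deproject(*p0), tan_deproject(*p1)

max_dev = 0.0
for i in range(101):
    t = i / 100.0
    x = p0[0] + t * (p1[0] - p0[0])
    y = p0[1] + t * (p1[1] - p0[1])
    max_dev = max(max_dev, abs(triple_product(a, b, tan_deproject(x, y))))

print(max_dev)  # zero up to floating-point rounding
```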

            ctslater Colin Slater added a comment -

            It would be helpful to know more about the specific goal of this RFC in order to understand what level of analysis is required. I can imagine one of two objectives: 1) define a pixelization format now so that science users can develop the tools they need to access published data products, or 2) define an interchange format between two parts of DM, in this case presumably between DRP, which produces these maps, and SUIT, which presents them to the user.

            If this RFC is meant to only satisfy #2, then it seems reasonable to resolve this largely on a technical basis between the two groups. If this is also meant to serve as a format that users directly interact with (case #1), then I would think tying ourselves (and our users) to one of these formats requires an analysis of the relevant use cases and the ability of these formats to support those use cases. Is this, for example, the format for providing all-sky mask and depth map information, and how suitable is it for those purposes? Are we saying we will provide data in both formats? (Can we store that?) I would at least hope to see a demo of these formats before formally saying we will provide them to users.

             

             

            tjenness Tim Jenness added a comment -

            At minimum, this RFC is a request that we explicitly add HPX projection images to DRP. We had discussed this being done as an afterburner from the TAN images but we fully expect to get better quality images if we make the HPX coadds natively. It is clear that providing images in HiPS format will make it much easier for us to visualize the data in standard tools and compare it to other surveys. EPO also have an interest in all-sky visualization formats and will probably go with the scheme that is adopted by the LSP so that they don't have to regenerate images themselves. I have added Ben Emmons [X] to the RFC.

            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            Thank you for the comments, Jim Bosch, Colin Slater, Tim Jenness.  The minimal goal is to get HiPS images generated according to the VO standard so that LSP can use them for large-area image visualization, to display the coverage of raft-level or focal-plane-level single-visit images, as stated by DMS-PRTL-REQ-0063 in LDM-554, "Data Management LSST Science Platform Requirements".  There are also five requirements on all-sky visualization: DMS-PRTL-REQ-0078, 79, 80, 81, 82. They will require the generation of HEALPix images and a variety of all-sky metrics, diagnostics, and other artifacts in HEALPix format.

             

            gpdf Gregory Dubois-Felsmann added a comment -

            As Xiuqin Wu [X] says, the primary goal here is to say that LSST will produce all-sky data products in this format, so that both SUIT and the community can use this established format, as well as community tools, for overall navigation within the dataset.

            We are not here proposing that we revisit the existing decision that the science-grade coadds, with optimally defined PSFs, variances, etc., will be generated as overlapping tangent planes.  It was generally recognized at the time of that decision that tangent planes simplify the mathematics and are appropriate for supporting downstream image processing by users.

            The all-sky visualization tools we are developing are specifically designed to facilitate users making the transition from navigating in the all-sky map to discovering, visualizing, and downloading / accessing in the Notebook the underlying science-grade coadd (and single-epoch) images.

            gpdf Gregory Dubois-Felsmann added a comment -

            Tim Jenness wrote:

            "We had discussed this being done as an afterburner from the TAN images but we fully expect to get better quality images if we make the HPX coadds natively."

            Is this true at a non-trivial level?  If we define "quality" as "accurate point estimates of the flux, sampled on the HEALPix grid", at a spatial resolution comparable to the CCD pixel scale, I would have thought that if the tangent-plane images are adequately sampled and modeled, the HEALPix flux re-sampling could be computed reliably.  

            (I understood the problems with HEALPix to be more of the nature that it is then difficult to compute PSFs and other quantities, perhaps including variances, that are directly usable for science-grade image analysis in the HEALPix images themselves.)

            I have heard it said that the problem is that at the edges of the tangent-plane images, where the tracts overlap and the distortion is greatest, it will be difficult to match the HEALPix fluxes across the boundary.  If that is true, does that not mean that - even ignoring the HEALPix issue altogether - the atlas of tract images is insufficient for us to provide an authoritative flux estimate for our users in the overlap regions?

            jbosch Jim Bosch added a comment -

            "I have heard it said that the problem is that at the edges of the tangent-plane images, where the tracts overlap and the distortion is greatest, it will be difficult to match the HEALPix fluxes across the boundary.  If that is true, does that not mean that - even ignoring the HEALPix issue altogether - the atlas of tract images is insufficient for us to provide an authoritative flux estimate for our users in the overlap regions?"

            The question of being able to construct accurate HEALPix fluxes at the tract boundary from our TAN coadds should indeed be the same as the question of being able to produce flux estimates for any other purpose.  Making sure the distortion at the edges doesn't cause a problem is a reason to define our tract size down.  Making sure our estimates from different-tract contributors to the overlap regions agree at the boundary where we change which is canonical is a matter of general pipeline quality, though it may involve making the overlap regions larger.

            It's possible that the problem of how to efficiently construct HEALPix images from our coadds in a way that preserves accuracy could be harder, however.  For example, we might have to remap those parts of our images into an intermediate local tangent plane to be able to use certain existing HEALPix map-making tools.
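
            To put rough numbers on the edge distortion discussed here: in a gnomonic (TAN) projection a point at angular distance θ from the tangent point lands at radius tan θ, so the local scale is stretched by 1/cos²θ radially and by 1/cos θ tangentially. The sketch below evaluates these factors for a few tract half-widths; the half-widths are illustrative values, not LSST's tract definitions.

```python
import math


def tan_scale_factors(theta_deg):
    """Radial and tangential stretch of a TAN projection at angle theta."""
    c = math.cos(math.radians(theta_deg))
    return 1.0 / (c * c), 1.0 / c


for half_width_deg in (0.5, 1.0, 2.0, 5.0):
    radial, tangential = tan_scale_factors(half_width_deg)
    # Plane area per unit solid angle grows by this factor, so a fixed-size
    # tangent-plane pixel covers correspondingly less sky at the edge.
    area = radial * tangential
    print(f"{half_width_deg:4.1f} deg: radial {radial - 1:.2e}, "
          f"tangential {tangential - 1:.2e}, area {area - 1:.2e}")
```

            The stretch grows roughly as θ², which is why shrinking the tract size is an effective way to bound it.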

            gpdf Gregory Dubois-Felsmann added a comment -

            It's worth noting for the record that we have been assuming in SUIT that the underlying DAX services on which we depend will include a cutout service for the LSST Data Release coadd data products, one which produces a science-grade, authoritative answer at any point within the survey coverage, whether or not it is in a tract overlap region, and does not require the user to decide which of the overlapping tract images from the archive should be used as input.  (We might also provide a version that does allow the user to decide on the input image, which may be useful in certain special circumstances.)  In general we (DM) are in a better position to know how to give the best answer than the user might be.

            I'm not sure this point has been made completely clear in baselined documents, though, so perhaps it's not as settled as I imagine.

            (We certainly do not imagine that this cutout service would use the HEALPix image hierarchy as a base!)

            tjenness Tim Jenness added a comment -

            My comment was referring to a discussion I had with CDS where they felt that the quality of the images is better if you create them in HPX natively rather than reconstructing them from highly distorted TAN images. Edge effects were a worry. Also, if you give the HiPS tool a bunch of TAN images it doesn't parallelize the conversion to HiPS, but if you start with HPX images the conversion can be easily parallelized. I think we can assume that going from TAN to HiPS is a good place to start, but we should record a risk that we would need to generate the coadds again in HPX.

            gpdf Gregory Dubois-Felsmann added a comment - - edited

            You seem to be mixing two things here:

            • Whether we can use hipsgen out of the box to generate HiPS maps from tangent-plane coadds, or whether we need to write our own code in the afw framework to compute HEALPix-gridded fluxes from our tangent-plane coadds.
            • If neither of the above works, whether we need to re-coadd from scratch in the HEALPix pixel space in order to get accurate results.

            I am in no way claiming confidence that hipsgen will do the proper resampling, to the level of quality that is our general goal in our image processing algorithms.  I'm making a different point, which is that if it can't be done at all without re-coadding, then there seems to be something deeper wrong in our ability to understand the coadded flux and to do things like provide a no-fuss cutout service.  If we can't produce reliable flux estimates because our tangent-plane images are "highly distorted" at the edges, that applies to all uses of them, not just conversion to HEALPix, and we should be fixing that, as it would imply, for example, that the generic coadd cutout service would also need to redo coadds from scratch just to make properly tangential cutouts centered in the overlap regions. 

            My mental model of all this has long been that, assuming that our tangent-plane coadds and associated algorithms are good enough, with the edge distortions understood well enough, we would in fact write our own code to produce the base layer of the HiPS hierarchy, i.e., an all-sky image, in HEALPix coordinates, computed using afw from our tangent coadds.  We could then still use hipsgen to roll it up into a full HiPS tree.

            Maybe this is what you meant all along, but I interpreted you to be saying that the coaddition itself would need to be re-done in HEALPix space.
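
            The roll-up step described above relies on the NESTED HEALPix numbering, in which pixel p at one order has children 4p to 4p+3 at the next. A minimal sketch of building coarser levels from a base-level map by averaging each group of four children follows. It illustrates the hierarchy only: the function names are hypothetical, real HiPS generation also cuts each level into tiles and handles missing data, and nothing here is assumed about how hipsgen actually combines pixels.

```python
def degrade_nested(values):
    """Average each group of four NESTED children into its parent pixel."""
    if len(values) % 4:
        raise ValueError("map length must be a multiple of 4")
    return [sum(values[i:i + 4]) / 4.0 for i in range(0, len(values), 4)]


def build_pyramid(base, base_order, min_order=0):
    """Return {order: map} from base_order down to min_order."""
    levels = {base_order: list(base)}
    for order in range(base_order - 1, min_order - 1, -1):
        levels[order] = degrade_nested(levels[order + 1])
    return levels


# Toy all-sky map at order 1 (12 * 4**1 = 48 pixels), value = pixel index.
base = [float(i) for i in range(48)]
pyramid = build_pyramid(base, base_order=1)
print(pyramid[0][:3])  # [1.5, 5.5, 9.5]
```

            Because the pixels are equal-area, a plain mean preserves surface brightness from level to level, so only the base layer needs to be computed from the tangent-plane coadds.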

            gpdf Gregory Dubois-Felsmann added a comment -

            Just to close this off: how we would generate HiPS (or, for that matter, any other format of) all-sky maps in detail is not the subject of this RFC; the RFC only states that we will do so somehow.  There is clearly still work to do to resolve the questions Tim Jenness and Jim Bosch and I were discussing above, and that would of course include assessing whether hipsgen is usable as-is or not.

            tjenness Tim Jenness added a comment -

            Gregory Dubois-Felsmann, what do you want to do with this RFC?

            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            I want to declare it adopted. There should be other RFCs to plan HiPS and MOC data generation software development and operations, and the effect on storage and computing resources.

             

             

            tjenness Tim Jenness added a comment -

            Gregory Dubois-Felsmann, which requirements document do you want to update? Are you also intending to update the DPDD?

            tjenness Tim Jenness added a comment -

            I don't see any objection to this RFC being adopted. Please ensure that the triggered tickets are clear as to which documents need to be modified to record the decision.

            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            DM-13967 has been created for the work to be done.


              People

              Assignee:
              xiuqin Xiuqin Wu [X] (Inactive)
              Reporter:
              xiuqin Xiuqin Wu [X] (Inactive)
              Watchers:
              Ben Emmons [X] (Inactive), Colin Slater, Eli Rykoff, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, John Swinbank, Leanne Guy, Paul Price, Tim Jenness, Xiuqin Wu [X] (Inactive)
              Votes:
              0

                Dates

                Created:
                Updated:
                Planned End: