  1. Data Management
  2. DM-23683

Revise DIAObject <--> Object Associations

    Details

    • Story Points:
      10
    • Team:
      DM Science
    • Urgent?:
      No

      Description

      The current plan is to provide Object ids for the three nearest galaxies, but the "nearest" galaxy isn't always the most likely host. Propose a more scientifically useful way to do association between DIAObjects and Objects during Alert Generation.

            Activity

            wmwood-vasey Michael Wood-Vasey added a comment - - edited

            I like this addition. This is a good write-up and a good idea.

            I agree that Simon Krughoff's name suggestions are an improvement. I might go for simpler potentialHostSeparation – "separation" implies "distance" and there's not a strict definition of "distance" that is obviously correct that we're trying to evoke here. One could consider bike-shedding further about the names.

            Implementation Suggestions with some Conceptual Impact:
            1. The question you really want to answer is: which galaxies might the transient lie within several second-moment effective radii of? This is the inverse search and formally means asking that question of every galaxy. However, I think you can pre-compute this with HEALPix pretty straightforwardly: for each region on the sky, pre-compute which galaxies contribute. Then you only need to compute the exact distance for the subset of galaxies that contribute to the HEALPix pixel the transient is in. A brief study of densities and HEALPix levels will help guide the choices of what to store and when to compute more.

            2. [a bit of Scope Expansion]
            To more fully implement the conceptual goals here, I think it would be very good (if a bit awkward) to include a fuller galaxy catalog that includes galaxies that won't be in the LSST catalog, e.g., Messier galaxies and even brighter NGC galaxies. The detection of transients with fainter absolute magnitude in nearby galaxies will be an important part of the new science reach of Rubin Observatory and the LSST. We're importing Gaia. Importing a galaxy catalog to cover nearby galaxies seems useful and reasonable as well.

            A note like "hey, you might be interested in knowing that this transient is within 10 kpc of M81" seems like a really relevant thing that a user might want to know. It's certainly true that downstream brokers can also do such associations with purely public data. (And now that I've written suggestion number 1 above, I know how I might choose to do this in a broker.)

            3. There are some technical implementation details of how to store and use I_{xx,xy,yy} in the Object Table that are explored in https://jira.lsstcorp.org/browse/DM-19519
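
A minimal sketch of the inverse search in suggestion 1, under stated assumptions. It swaps HEALPix for a plain RA/Dec grid so that it needs only the standard library; the cell size, the `n_radii` multiplier, and all function names are hypothetical, and RA wrap-around and the poles are ignored. Each galaxy is registered once in every cell its reach overlaps, so a transient needs a single cell lookup to get its candidate list.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # hypothetical cell size; a real value needs the density study


def cell_of(ra_deg, dec_deg, cell_deg=CELL_DEG):
    """Map a sky position to an integer (i, j) grid cell."""
    return (math.floor(ra_deg / cell_deg), math.floor(dec_deg / cell_deg))


def build_index(galaxies, n_radii=5.0, cell_deg=CELL_DEG):
    """Pre-compute, for every cell, the galaxies whose reach overlaps it.

    galaxies: iterable of (gal_id, ra_deg, dec_deg, radius_deg).
    The reach is n_radii * radius_deg (an assumed stand-in for
    "several second-moment effective radii").
    """
    index = defaultdict(list)
    for gal_id, ra, dec, radius in galaxies:
        reach = n_radii * radius
        # Widen the RA extent by 1/cos(dec) so the box covers the reach on the sky.
        cos_dec = max(math.cos(math.radians(dec)), 1e-6)
        i_lo, j_lo = cell_of(ra - reach / cos_dec, dec - reach, cell_deg)
        i_hi, j_hi = cell_of(ra + reach / cos_dec, dec + reach, cell_deg)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                index[(i, j)].append(gal_id)
    return index


def candidates(index, ra_deg, dec_deg, cell_deg=CELL_DEG):
    """Galaxies that contribute to the cell containing the transient."""
    return index.get(cell_of(ra_deg, dec_deg, cell_deg), [])
```

The exact separation then only has to be computed for the ids returned by `candidates`, not for every galaxy in the catalog.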

            mgraham Melissa Graham added a comment -

            Excellent, thank you both Simon Krughoff and Michael Wood-Vasey. I agree with all these suggestions and have incorporated them all into the DMTN (paraphrased in the list below).

            The draft is now an official DMTN at https://dmtn-151.lsst.io/, and I'm about to initiate the RFC where we can discuss further. Definitely open to more feedback.

             

            KSK: Section 3.5: galaxy's redshifts --> galaxy's redshift
            MLG: fixed!

            KSK: First para Section 4.1: missing parens at the end of the sentence.
            MLG: closing parenthesis fixed

            KSK: "potentialHost" and "potentialHostSepDist" might be better names than "nearbyPotHost" and "nearbyPotHostSepDist"
            MWV: further revise suggestion to "potentialHost" and "potentialHostSeparation" to remove the term 'distance' which could evoke a strict definition
            MLG: agree with both, went with the latter

            KSK: calculating separation distance for nearest 10 extended sources is probably too small a number
            MWV: instead of calculating separations for the N nearest extended sources, use healpix
            MLG: agreed. Updated to specify that a separation distance must be calculated for all extended sources within ~200 arcsec before the top 3 are identified (Appendix B motivates this in depth); added a note to say HEALPix could be used to identify these extended sources, but that it is an implementation detail

            MWV: (would require a scope increase) Messier galaxies and brighter NGC galaxies won't be in the LSST catalog, but their coordinates and second moments could be imported and included in the potential host association process
            MLG: added a note about this

            MWV: see the technical implementation details of how to store and use I_{xx,xy,yy} in the Object Table in DM-19519
            MLG: added a statement to the end of Appendix A about how the reference frame of Ixx Iyy Ixy should be verified at the time of implementation

            MLG: Reorganized Section 4.1 "Recommendations".

            MLG: Extended the draft RFC.
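
A minimal sketch of the revised selection described in the list above: score every extended source within the search radius, then keep the top three. The separation here is assumed to be the offset measured in units of the galaxy's second-moment ellipse, d^2 = v^T M^-1 v with M = [[Ixx, Ixy], [Ixy, Iyy]]; the definition in DMTN-151 may differ. The dictionary keys, function names, and the flat tangent-plane offsets in arcsec are hypothetical, and the reference frame of the moments still has to be verified at implementation time.

```python
import math


def moment_separation(dx, dy, ixx, iyy, ixy):
    """Offset (dx, dy) in units of the second-moment ellipse.

    Evaluates sqrt(v^T M^-1 v) for M = [[ixx, ixy], [ixy, iyy]].
    Offsets in arcsec, moments in arcsec^2 (assumed units).
    """
    det = ixx * iyy - ixy * ixy
    if det <= 0.0:
        return math.inf  # degenerate moments: no usable ellipse
    d2 = (iyy * dx * dx - 2.0 * ixy * dx * dy + ixx * dy * dy) / det
    return math.sqrt(d2)


def top_potential_hosts(transient_xy, objects, search_radius=200.0, n_hosts=3):
    """Return up to n_hosts (id, separation) pairs, lowest separation first.

    objects: list of dicts with keys id, x, y, ixx, iyy, ixy (hypothetical).
    Every object within search_radius arcsec is scored before ranking.
    """
    tx, ty = transient_xy
    scored = []
    for obj in objects:
        dx, dy = tx - obj["x"], ty - obj["y"]
        if math.hypot(dx, dy) > search_radius:
            continue
        sep = moment_separation(dx, dy, obj["ixx"], obj["iyy"], obj["ixy"])
        scored.append((sep, obj["id"]))
    scored.sort()
    return [(obj_id, sep) for sep, obj_id in scored[:n_hosts]]
```

With a small galaxy 5 arcsec away and a large one 30 arcsec away, the large galaxy can rank first, which is the case the "nearest galaxy" plan misses.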

            mgraham Melissa Graham added a comment -

            RFC-695 has been created.

            wmwood-vasey Michael Wood-Vasey added a comment -

            The concept of tracking overlaps differently at different scales in a hierarchical manner is what I was trying to get at.

            My idea was that at each HEALPix level one could compute the overlaps for galaxies whose effective radii are on the scale of that level. E.g., for a level with an effective resolution of 1 degree, record the matches for galaxies with effective radii of that scale, e.g. LMC, SMC, and Andromeda (which is not in LSST's footprint). Then at the next level down, say 30 arcminutes, record the nearest galaxy among galaxies with effective radii on scales of 30 arcminutes. Then at the 10" level you will only record the nearest galaxies with effective radii on scales of 10".

            There's some factor between the HEALPix resolution and the range of effective radii one wants to explore, and maybe some detail about the shape of pixels.

            I agree that HEALPix is an implementation detail; any hierarchical mapping would be fine.
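
A small sketch of how a galaxy's effective radius could be mapped to a HEALPix level, assuming the usual resolution estimate (the square root of the pixel area, with 12 * nside^2 equal-area pixels and nside = 2^order). The `factor` parameter stands in for the unspecified factor between pixel resolution and effective radius, and the function names and `max_order` cap are hypothetical.

```python
import math


def healpix_resolution_deg(order):
    """Approximate pixel size in degrees: sqrt(4*pi / (12 * nside^2))."""
    nside = 2 ** order
    return math.degrees(math.sqrt(math.pi / 3.0)) / nside


def order_for_radius(radius_deg, factor=1.0, max_order=29):
    """Deepest order whose pixel size is still >= factor * radius_deg.

    Returns 0 if even the coarsest level is smaller than the target.
    """
    target = factor * radius_deg
    best = 0
    for order in range(max_order + 1):
        if healpix_resolution_deg(order) >= target:
            best = order
        else:
            break
    return best
```

With `factor=1.0`, effective radii of 1 degree, 30 arcminutes, and 10 arcseconds land on orders 5, 6, and 14. A larger factor pushes each galaxy to a coarser level.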

            mgraham Melissa Graham added a comment -

            Thank you Michael Wood-Vasey for the clarification and my apologies for not catching on to the full scope of your suggestion. I've added a subsection to DMTN-151 under "Recommendations" to elaborate on this option (pasted below) and am going to make a comment on it in the RFC to inspire further discussion on which option should be pursued.

             

            \subsection{Option: Hierarchical Association}

            Instead of associating a {\tt DIAObject} with the three extended {\tt Objects} with the lowest separation distance, associate with the three nearest neighbors at three size scales. For example, the nearest $R_e<10"$ neighbor within $d<100"$ (high-$z$ and small galaxies), the nearest $R_e<100"$ neighbor within $d<1000"$ (large low-$z$ galaxies), and the nearest $R_e<1000"$ neighbor within $d<10000"$ (very large nearby galaxies). In this option, the ``nearest neighbor'' would be the extended {\tt Object} with the lowest separation distance, as calculated from, e.g., the second moments (Section \ref{ssec:options_mom}).

            However, this option does not avoid the issue of contamination by background galaxies as discussed in Appendix \ref{sec:appB}. To mitigate background interlopers, the nearest \emph{three} extended sources for each size scale should be included. This might seem unnecessary for the largest size scale but it would assist with identifying transients in galaxy groups and clusters, especially the rare transients with large offsets which might belong to intracluster stellar populations.

            This option would add uint64[9] and float[9] to the {\tt DIAObject} catalog and to each alert, instead of uint64[3] and float[3]. However, it would also vastly reduce the number of extended {\tt Objects} that are considered during the host association process, and would impose a lower computational load. Since 9 potential associations are reported instead of 3, this option would also lower the probability of the failure scenario (in which the true host galaxy is not associated with the {\tt DIAObject}) and increase the amount of contextual information passed in the alert, which would benefit science applications.
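
A minimal sketch of the hierarchical option, assuming the three example size scales quoted above and three neighbors per scale, which yields the nine ids and nine separations. The record keys, the padding values (0 and NaN), and the function name are hypothetical. The cuts are applied literally as written, so a small galaxy can also appear in the larger scales; disjoint radius bins would be one way to avoid that.

```python
import math

# (max effective radius, max offset) in arcsec, from the example in the text.
SCALES = [(10.0, 100.0), (100.0, 1000.0), (1000.0, 10000.0)]


def hierarchical_hosts(objects, per_scale=3, scales=SCALES):
    """Return (ids, separations), each of length per_scale * len(scales).

    objects: list of dicts with keys id, r_e, offset, separation
    (hypothetical names; r_e and offset in arcsec).
    Unfilled slots are padded with id 0 and separation NaN.
    """
    ids, seps = [], []
    for max_re, max_d in scales:
        pool = [o for o in objects if o["r_e"] < max_re and o["offset"] < max_d]
        pool.sort(key=lambda o: o["separation"])
        chosen = pool[:per_scale]
        ids.extend(o["id"] for o in chosen)
        seps.extend(o["separation"] for o in chosen)
        # Pad so every scale always contributes exactly per_scale slots.
        missing = per_scale - len(chosen)
        ids.extend([0] * missing)
        seps.extend([math.nan] * missing)
    return ids, seps
```

Fixed-length output keeps the alert schema constant regardless of how many neighbors are found at each scale.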


              People

              • Assignee:
                mgraham Melissa Graham
                Reporter:
                mgraham Melissa Graham
                Watchers:
                Eric Bellm, Leanne Guy, Melissa Graham, Michael Wood-Vasey, Simon Krughoff