  1. Data Management
  2. DM-23683

Revise DIAObject <--> Object Associations


    Details

    • Story Points:
      10
    • Team:
      DM Science
    • Urgent?:
      No

      Description

      The current plan is to provide Object ids for the three nearest galaxies, but the "nearest" galaxy isn't always the most likely host. Propose a more scientifically useful way to do association between DIAObjects and Objects during Alert Generation.

            Activity

            mgraham Melissa Graham created issue -
            mgraham Melissa Graham made changes -
            Field Original Value New Value
            Rank Ranked higher
            mgraham Melissa Graham made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            mgraham Melissa Graham made changes -
            Team DM Science [ 12218 ]
            mgraham Melissa Graham added a comment -

            A draft DMTN that describes a method to improve the association between DIAObjects and Objects (for non-stellar Objects), and proposes adding two new DIAObject elements to hold the Object ids and "separation distances" for these associated Objects, currently lives here:
            https://github.com/MelissaGraham/dmtn-alertoptions/blob/master/draft-hostassoc.pdf

            This document is ready for review and, if found to be adequate, the associated RFC draft (Section 4.1) could be shared first with the DM-SST and then posted for broader input.

            Leanne Guy I'd like to get this on a DM-SST agenda, probably after the Algorithms Workshop?

            mgraham Melissa Graham made changes -
            Watchers Melissa Graham [ Melissa Graham ] Eric Bellm, Leanne Guy, Melissa Graham [ Eric Bellm, Leanne Guy, Melissa Graham ]
            mgraham Melissa Graham made changes -
            Due Date 31/Mar/20
            krughoff Simon Krughoff added a comment -

            This looks great, Melissa Graham. I just have a couple of comments.

            I worry a little about the assumption that the nearest 10 extended sources will include the most likely host all (or a large fraction of) the time. I'm probably overestimating the importance of very nearby galaxies.

            I am not going to push back at all if you don't take this suggestion, but potentialHost and potentialHostSepDist are about the same number of characters and seem more understandable than nearbyPotHost and nearbyPotHostSepDist. I'd argue if it is a potential host, it should be assumed to be nearby in some space.

            Editorial comments:
            Section 3.5: galaxy's redshifts --> galaxy's redshift
            First para Section 4.1: missing parens at the end of the sentence.

            mgraham Melissa Graham added a comment -

            Thank you Simon Krughoff! I think you’re absolutely right that calculating separation distance for only the 10 nearest extended objects is inadequate. I should work out, e.g., the typical number of LSST extended objects that would fall within ~10 effective radii of a nearby low-z galaxy. It’s gotta be way way more than 10. I’m glad you caught that!

            And then yes I very much like your suggestion for the new element names, much better. 

            mgraham Melissa Graham added a comment -

            For a rough estimate, say a typical galaxy effective radius Re~10 kpc, so 10Re~100kpc.

            At a distance of 50 Mpc the scale is 0.247 kpc/", and 10Re ~ 405", which is an area of 0.04 deg2.

            With 4e9 galaxies in the final LSST catalog, spread over 18000 deg2, there could be 8829 galaxies between a transient in the outskirts of a nearby host and the core of that host. Ish. So yeah, nearest 10 is a bit off eh!!!  
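
The rough estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic, using the numbers quoted in the comment (Re ~ 10 kpc, 0.247 kpc/" at 50 Mpc, 4e9 galaxies over 18000 deg2); the function name and the assumption of a uniform galaxy surface density are illustrative, not part of the DMTN.

```python
import math

# Values quoted in the comment above.
R_E_KPC = 10.0                # typical galaxy effective radius
SCALE_KPC_PER_ARCSEC = 0.247  # angular scale at a distance of ~50 Mpc
N_GALAXIES = 4e9              # galaxies in the final LSST catalog
SURVEY_AREA_DEG2 = 18000.0    # survey footprint


def interlopers_within(n_re=10.0):
    """Estimate how many catalog galaxies fall within n_re effective radii.

    Assumes galaxies are spread uniformly over the footprint (hypothetical
    simplification; real densities vary across the sky).
    """
    radius_arcsec = n_re * R_E_KPC / SCALE_KPC_PER_ARCSEC
    radius_deg = radius_arcsec / 3600.0
    area_deg2 = math.pi * radius_deg ** 2
    density_per_deg2 = N_GALAXIES / SURVEY_AREA_DEG2
    return {
        "radius_arcsec": radius_arcsec,
        "area_deg2": area_deg2,
        "n_galaxies": density_per_deg2 * area_deg2,
    }


estimate = interlopers_within()
print(f"10 Re ~ {estimate['radius_arcsec']:.0f} arcsec")
print(f"area  ~ {estimate['area_deg2']:.3f} deg^2")
print(f"count ~ {estimate['n_galaxies']:.0f} galaxies")
```

The count scales with the square of the search radius, so even a few effective radii around a nearby host already enclose far more than 10 catalog galaxies.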

            wmwood-vasey Michael Wood-Vasey added a comment - - edited

            I like this addition. This is a good write-up and a good idea.

            I agree that Simon Krughoff's name suggestions are an improvement. I might go for simpler potentialHostSeparation – "separation" implies "distance" and there's not a strict definition of "distance" that is obviously correct that we're trying to evoke here. One could consider bike-shedding further about the names.

            Implementation Suggestions with some Conceptual Impact:
            1. The question you really want to answer is: which are all the galaxies for which the transient may be within several second-moment-effective-radii? This is the inverse search and formally means asking that question of every galaxy. However, I think you can pre-compute this with healpix pretty straightforwardly. You could pre-compute, for regions on the sky, which galaxies contribute. Then you only need to compute the exact distance for the subset of galaxies that contribute to the healpix the transient is in. A brief study of densities and healpix levels will help guide the choices of what to store and when to compute more.

            2. [a bit of Scope Expansion]
            To more fully implement the conceptual goals here, I think it would be very good (if a bit awkward) to include a fuller galaxy catalog that includes galaxies that won't be in the LSST catalog. E.g., Messier galaxies and even brighter NGC galaxies. The detection of transients with fainter absolute magnitude in nearby galaxies will be an important part of the new science reach of Rubin Observatory and the LSST. We're importing Gaia. Importing a galaxy catalog to cover nearby galaxies seems useful and reasonable as well.

            The "hey, you might be interested in knowing that this transient is within 10 kpc of M81" seems likely a really relevant thing that a user might want to know. It's certainly true that downstream brokers can also do such associations with purely public data. (And now that I've written suggestion number 1 above I know how I might choose to do this in a broker)

            3. There are some technical implementation details of how to store and use I_{xx,xy,yy} in the Object Table that are explored in https://jira.lsstcorp.org/browse/DM-19519
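
The inverse search in suggestion 1 could be sketched as below. This is a minimal illustration, not the DMTN's method: it uses a flat-sky square grid as a stand-in for HEALPix (ignoring the cos(dec) factor and RA wrap-around), and `CELL_DEG`, `build_index`, `candidates`, and the 10 Re reach are all hypothetical choices.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # hypothetical cell size; a real system would pick a HEALPix level


def cell_of(ra_deg, dec_deg):
    """Return the grid cell containing a position (flat-sky approximation)."""
    return (math.floor(ra_deg / CELL_DEG), math.floor(dec_deg / CELL_DEG))


def build_index(galaxies, n_radii=10.0):
    """Pre-compute, for each cell, the galaxies that could contribute to it.

    galaxies: iterable of (gal_id, ra_deg, dec_deg, re_deg).
    A galaxy is registered in every cell touched by the bounding box of its
    n_radii * re_deg circle, which is conservative (never misses a cell).
    """
    index = defaultdict(list)
    for gal_id, ra, dec, re_deg in galaxies:
        reach = n_radii * re_deg
        i_lo, j_lo = cell_of(ra - reach, dec - reach)
        i_hi, j_hi = cell_of(ra + reach, dec + reach)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                index[(i, j)].append(gal_id)
    return index


def candidates(index, ra_deg, dec_deg):
    """Galaxies whose reach overlaps the cell the transient falls in."""
    return index.get(cell_of(ra_deg, dec_deg), [])


# One large nearby galaxy and one small one at the same position.
index = build_index([("big", 10.0, 0.0, 0.05), ("small", 10.0, 0.0, 0.001)])
print(candidates(index, 10.33, 0.05))    # only the large galaxy reaches this far
print(candidates(index, 10.005, 0.005))  # both galaxies are candidates here
```

Exact separations then only need to be computed for the short candidate list returned for the transient's cell, rather than for every galaxy.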

            mgraham Melissa Graham added a comment -

            Excellent, thank you both Simon Krughoff and Michael Wood-Vasey. I agree with all these suggestions and have incorporated them all into the DMTN (paraphrased in the list below).

            The draft is now an official DMTN at https://dmtn-151.lsst.io/, and I'm about to initiate the RFC where we can discuss further. Definitely open to more feedback.

             

            KSK: Section 3.5: galaxy's redshifts --> galaxy's redshift
            MLG: fixed!

            KSK: First para Section 4.1: missing parens at the end of the sentence.
            MLG: closing parenthesis fixed

            KSK: "potentialHost" and "potentialHostSepDist" might be better names than "nearbyPotHost" and "nearbyPotHostSepDist"
            MWV: further revise suggestion to "potentialHost" and "potentialHostSeparation" to remove the term 'distance' which could evoke a strict definition
            MLG: agree with both, went with the latter

            KSK: calculating separation distance for nearest 10 extended sources is probably too small a number
            MWV: instead of calculating separations for the N nearest extended sources, use healpix
            MLG: agreed. updated to specify a separation distance must be calculated for all extended sources within ~200 arcsec before the top 3 are identified (Appendix B motivates this in depth); added a note to say HEALpix could be used to identify these extended sources, but that it is an implementation detail

            MWV: (would require a scope increase) Messier galaxies and brighter NGC galaxies won't be in the LSST catalog, but their coordinates and second moments could be imported and included in the potential host association process
            MLG: added a note about this

            MWV: see the technical implementation details of how to store and use I_{xx,xy,yy} in the Object Table in DM-19519
            MLG: added a statement to the end of Appendix A about how the reference frame of Ixx Iyy Ixy should be verified at the time of implementation

            MLG: Reorganized Section 4.1 "Recommendations".

            MLG: Extended the draft RFC.
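
As an illustration of the updated procedure (a separation for every extended source within ~200 arcsec, then keep the top 3), here is a minimal sketch. The exact separation definition lives in DMTN-151; the form below, an offset scaled by the inverse of the second-moment matrix, is an assumption made for this example. Positions are taken as tangent-plane offsets in arcsec, and the function names and tuple layout are hypothetical.

```python
import math


def moment_separation(dx, dy, ixx, iyy, ixy):
    """Offset (dx, dy) in arcsec scaled by second moments in arcsec^2.

    Assumed form: sqrt(r^T M^-1 r) with M = [[ixx, ixy], [ixy, iyy]].
    """
    det = ixx * iyy - ixy * ixy
    if det <= 0:
        raise ValueError("second-moment matrix must be positive definite")
    q = (iyy * dx * dx - 2.0 * ixy * dx * dy + ixx * dy * dy) / det
    return math.sqrt(q)


def potential_hosts(tx, ty, objects, search_radius=200.0, n_keep=3):
    """Rank extended Objects near a transient at (tx, ty).

    objects: iterable of (object_id, x, y, ixx, iyy, ixy).
    Returns the ids and separations of the n_keep lowest-separation Objects
    found within search_radius arcsec of the transient.
    """
    ranked = []
    for object_id, x, y, ixx, iyy, ixy in objects:
        dx, dy = tx - x, ty - y
        if math.hypot(dx, dy) > search_radius:
            continue  # outside the search region
        ranked.append((moment_separation(dx, dy, ixx, iyy, ixy), object_id))
    ranked.sort()
    top = ranked[:n_keep]
    return {
        "potentialHost": [object_id for _, object_id in top],
        "potentialHostSeparation": [sep for sep, _ in top],
    }


objects = [
    ("A", 5.0, 0.0, 1.0, 1.0, 0.0),        # small galaxy, 5 arcsec away
    ("B", 50.0, 0.0, 400.0, 400.0, 0.0),   # large galaxy, 50 arcsec away
    ("C", 300.0, 0.0, 400.0, 400.0, 0.0),  # outside the 200 arcsec search
    ("D", 0.0, 30.0, 100.0, 100.0, 0.0),   # mid-sized galaxy, 30 arcsec away
    ("E", 20.0, 0.0, 1.0, 1.0, 0.0),       # small galaxy, 20 arcsec away
]
result = potential_hosts(0.0, 0.0, objects)
print(result)
```

In this toy case the large galaxy 50 arcsec away ranks ahead of the small galaxy 5 arcsec away, which is the behavior the ticket description asks for: the nearest galaxy on the sky is not always the most likely host.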

            mgraham Melissa Graham made changes -
            Link This issue is parent task of RFC-695 [ RFC-695 ]
            mgraham Melissa Graham added a comment -

            RFC-695 has been created.

            wmwood-vasey Michael Wood-Vasey added a comment -

            The concept of tracking overlaps differently at different scales in a hierarchical manner is what I was trying to get at.

            My idea was that at each level one could record overlaps for galaxies with effective radii on the scale of that HEALpix level. E.g., at a level with an effective resolution of 1 degree, record the matches for galaxies with effective radii of that scale, e.g. LMC, SMC, and Andromeda (which is not in LSST's footprint). Then at the next level down, say 30 arcminutes, record the nearest galaxies with effective radii on scales of 30 arcminutes. Then at the 10" level you will only record the nearest galaxies with effective radii on the scales of 10".

            There's some factor between the HEALpix resolution and the range of effective radii one wants to explore and maybe some detail about the shape of pixels.

            I agree that HEALpix is an implementation detail, any hierarchical mapping would be fine.

            mgraham Melissa Graham added a comment -

            Thank you Michael Wood-Vasey for the clarification and my apologies for not catching on to the full scope of your suggestion. I've added a subsection to DMTN-151 under "Recommendations" to elaborate on this option (pasted below) and am going to make a comment on it in the RFC to inspire further discussion on which option should be pursued.

             

            \subsection{Option: Hierarchical Association}

            Instead of associating a {\tt DIAObject} with the three extended {\tt Objects} with the lowest separation distance, associate with the three nearest neighbors at three size scales. For example, the nearest $R_e<10"$ neighbor within $d<100"$ (high-$z$ and small galaxies), the nearest $R_e<100"$ neighbor within $d<1000"$ (large low-$z$ galaxies), and the nearest $R_e<1000"$ neighbor within $d<10000"$ (very large nearby galaxies). In this option, the ``nearest neighbor'' would be the extended {\tt Object} with the lowest separation distance, as calculated from, e.g., the second moments (Section \ref{ssec:options_mom}).

            However, this option does not avoid the issue of contamination by background galaxies as discussed in Appendix \ref{sec:appB}. To mitigate background interlopers, the nearest \emph{three} extended sources for each size scale should be included. This might seem unnecessary for the largest size scale but it would assist with identifying transients in galaxy groups and clusters, especially the rare transients with large offsets which might belong to intracluster stellar populations.

            This option would add uint64[9] and float[9] to the {\tt DIAObject} catalog and to each alert, instead of uint64[3] and float[3]. However, it would also vastly reduce the number of extended {\tt Objects} that are considered during the host association process, and would impose a lower computational load. Since 9 potential associations are reported instead of 3, this option would also lower the probability of the failure scenario (in which the true host galaxy is not associated with the {\tt DIAObject}) and increase the amount of contextual information passed in the alert, which would benefit science applications.
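
A minimal sketch of the hierarchical option described above, assuming each candidate Object already carries an effective radius, an angular offset from the DIAObject, and a pre-computed separation. The size limits are the example values from the subsection; the dict keys, the `None` padding for empty slots, and the function name are hypothetical. The size bins are read literally as upper limits, so a small galaxy can also appear in the larger-scale lists.

```python
# (max effective radius, max angular offset) in arcsec, from the example above.
SCALES = [(10.0, 100.0), (100.0, 1000.0), (1000.0, 10000.0)]


def hierarchical_hosts(objects, n_per_scale=3):
    """Pick the n_per_scale lowest-separation Objects at each size scale.

    objects: list of dicts with keys "id", "re" (arcsec), "offset" (arcsec),
    and "sep" (separation, however it is defined).
    Returns two flat lists of length len(SCALES) * n_per_scale, padded with
    None where fewer candidates exist at a scale.
    """
    ids, seps = [], []
    for max_re, max_offset in SCALES:
        pool = [o for o in objects
                if o["re"] < max_re and o["offset"] < max_offset]
        pool.sort(key=lambda o: o["sep"])
        top = pool[:n_per_scale]
        pad = n_per_scale - len(top)
        ids += [o["id"] for o in top] + [None] * pad
        seps += [o["sep"] for o in top] + [None] * pad
    return ids, seps


objects = [
    {"id": 1, "re": 2.0, "offset": 20.0, "sep": 10.0},     # small, close on sky
    {"id": 2, "re": 50.0, "offset": 500.0, "sep": 4.0},    # large low-z galaxy
    {"id": 3, "re": 600.0, "offset": 5000.0, "sep": 8.3},  # very large, nearby
]
host_ids, host_seps = hierarchical_hosts(objects)
print(host_ids)
print(host_seps)
```

The two returned lists correspond to the nine-element id and separation arrays that this option would add to each DIAObject and alert.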

            mgraham Melissa Graham added a comment -

            Updated RFC-695 and DMTN-151 with a simplified proposal.

            Posted to Community.lsst.org to seek feedback.

            mgraham Melissa Graham added a comment -

            One response from the community about the proposal to update the host association parameters was received, and it is favorable: https://community.lsst.org/t/host-galaxy-association-for-lsst-transients-request-for-comments/4812

            mgraham Melissa Graham made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]
            mgraham Melissa Graham added a comment -

            Adopting RFC-695, implementation in https://jira.lsstcorp.org/browse/DM-29731

            This ticket is Done.


              People

              Assignee:
              mgraham Melissa Graham
              Reporter:
              mgraham Melissa Graham
              Watchers:
              Eric Bellm, Leanne Guy, Melissa Graham, Michael Wood-Vasey, Simon Krughoff