Request For Comments: RFC-824

Add spherematch to Rubin conda environment


    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      spherematch is a small library of spherical matching routines, in two parts. The first part is based on HEALPix and has some memory-saving features that can be handy.

      The second part, which I worked on with Alex Drlica-Wagner and Matthew Becker, is a convenient wrapper around SciPy's cKDTree. The key additions are:

      • Transparent conversion of lon/lat to x/y/z unit vectors.
      • Default cKDTree settings tuned for best performance when matching lists of objects within a few arcseconds.
      • Self-matching of objects into clusters, with sorting designed to give consistent output regardless of input order.
      • Usable as a context manager for hassle-free memory cleanup.

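      A minimal sketch of the first and last bullets, for readers unfamiliar with the approach. It assumes matching is done on unit vectors, so an angular radius θ becomes the chord length 2·sin(θ/2) that a Euclidean tree can search. The class SphereMatcher and its methods are hypothetical and are not spherematch's API; a brute-force loop stands in for scipy.spatial.cKDTree.query_ball_point so the example runs with the standard library alone.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lonlat_to_xyz(lon_deg: float, lat_deg: float) -> Vec3:
    """Convert lon/lat in degrees to a unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lon), cos_lat * math.sin(lon), math.sin(lat))


def arcsec_to_chord(radius_arcsec: float) -> float:
    """Angular radius -> straight-line distance between two unit vectors."""
    theta = math.radians(radius_arcsec / 3600.0)
    return 2.0 * math.sin(theta / 2.0)


class SphereMatcher:
    """Hypothetical lon/lat matcher; brute force stands in for a cKDTree."""

    def __init__(self, lon: Sequence[float], lat: Sequence[float]) -> None:
        # Callers pass lon/lat; the x/y/z conversion happens here, out of sight.
        self._xyz: Optional[List[Vec3]] = [
            lonlat_to_xyz(a, b) for a, b in zip(lon, lat)
        ]

    def __enter__(self) -> "SphereMatcher":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Drop the reference to the (possibly large) point set on exit.
        self._xyz = None
        return False  # do not suppress exceptions

    def query_radius(
        self, lon: Sequence[float], lat: Sequence[float], radius_arcsec: float
    ) -> List[List[int]]:
        """For each query position, return indices of points within the radius."""
        if self._xyz is None:
            raise RuntimeError("matcher has already been closed")
        chord2 = arcsec_to_chord(radius_arcsec) ** 2
        matches = []
        for a, b in zip(lon, lat):
            q = lonlat_to_xyz(a, b)
            matches.append([
                i for i, p in enumerate(self._xyz)
                if sum((pc - qc) ** 2 for pc, qc in zip(p, q)) <= chord2
            ])
        return matches


# Point 1 is about 1.8 arcsec from point 0; point 2 is far away.
with SphereMatcher([10.0, 10.0005, 200.0], [-5.0, -5.0, 30.0]) as m:
    print(m.query_radius([10.0], [-5.0], 2.0))  # [[0, 1]]
```

      Because __exit__ drops the point array, the memory can be reclaimed as soon as the with block ends, without the caller having to remember to delete anything.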
      Using this matcher code for DM-33279, I am able to match all the isolated stars from HSC RC2 tract 9813 (117 visits) in 6 bands in 27 seconds on lsst-devl03, with peak memory usage of 9.7 GB. For DC2 deep tract 4431 (1186 visits), I can match 6 bands in 55 seconds, with peak memory usage of just under 15 GB.

      Note that the relevant code is fewer than 450 lines of Python, plus tests. It would therefore be possible to simply copy this code into the Science Pipelines ("vendor" it), which I had considered. However, there are considerable advantages to using the conda-forge version, which is maintained by a team that is active in DESC.

            Activity

            erykoff Eli Rykoff added a comment -

            This is an n-way matcher, but interpreting its output can be complicated, of course. For my task, the solution to interpretation is to reject all stars that have close neighbors.

            As for the relative case, I know that faro currently explodes when doing matching for these tracts, but that's not quite the same use case, so it's not a like-for-like comparison.

            My primary point is that the speed and memory footprint of this routine is sufficient on the tract scale to handle our toughest test datasets without a problem. Dealing with the Galactic plane is TBD.
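            To illustrate what self-matching into clusters and rejecting close neighbors can look like, here is a brute-force O(N²) sketch. It assumes friends-of-friends linking within a fixed radius, which may not be exactly how spherematch builds its clusters. The functions isolated and group are hypothetical names, and a real implementation would find neighbors with a KD-tree rather than comparing every pair.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]  # (lon, lat) in degrees


def _xyz(c: Coord) -> Tuple[float, float, float]:
    lon, lat = map(math.radians, c)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def _chord2(radius_arcsec: float) -> float:
    """Squared chord length corresponding to an angular radius."""
    r = 2.0 * math.sin(math.radians(radius_arcsec / 3600.0) / 2.0)
    return r * r


def _close(a, b, chord2: float) -> bool:
    return sum((x - y) ** 2 for x, y in zip(a, b)) <= chord2


def isolated(coords: Sequence[Coord], radius_arcsec: float) -> List[int]:
    """Indices of objects with no other object within radius_arcsec."""
    xyz = [_xyz(c) for c in coords]
    c2 = _chord2(radius_arcsec)
    return [i for i, p in enumerate(xyz)
            if not any(j != i and _close(p, q, c2) for j, q in enumerate(xyz))]


def group(coords: Sequence[Coord], radius_arcsec: float) -> List[List[Coord]]:
    """Friends-of-friends clusters with canonical, order-independent output."""
    xyz = [_xyz(c) for c in coords]
    c2 = _chord2(radius_arcsec)
    parent = list(range(len(coords)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(xyz)):
        for j in range(i + 1, len(xyz)):
            if _close(xyz[i], xyz[j], c2):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Coord]] = {}
    for i, c in enumerate(coords):
        clusters.setdefault(find(i), []).append(c)
    # Sort members within each cluster, then sort the clusters themselves,
    # so the result does not depend on the order of the input.
    return sorted(sorted(members) for members in clusters.values())


coords = [(10.0, -5.0), (10.0002, -5.0), (50.0, 20.0), (50.0001, 20.0), (120.0, 0.0)]
print(isolated(coords, 0.5))
print(group(coords, 1.0) == group(coords[::-1], 1.0))
```

            Connected components do not depend on the order in which pairs are linked, and the final double sort removes any remaining dependence on input order in how the clusters are listed.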

            Parejkoj John Parejko added a comment -

            Ah! We don't have a generally usable n-way matcher in the stack, even for isolated sources, so this sounds like a great addition.

            ktl Kian-Tat Lim added a comment -

            This seems like something that will be generally useful in all environments and is low-overhead, so no particular objection to its addition.

            15 Gbytes for a full tract is not bad, although it would be nice if we could have a patch-wise algorithm that would use even less. I worry that LSST DDFs may still need an order of magnitude more, and that's if it scales linearly in visits.

            erykoff Eli Rykoff added a comment -

            In terms of memory and scaling, the 15 GB is for the equivalent of 10 years of WFD data at high Galactic latitude (1100 visits in 6 bands). If a DDF has 10x as much per tract, then I will need to add the ability for the task to chunk things up by patch (or a smaller HEALPix pixel) instead. I'm also concerned about scaling in the Galactic plane. However, this is not a limitation of spherematch, which is what this RFC is about, but more of a question for DM-33279 and follow-on tickets.

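            One way to chunk the work, sketched under stated assumptions: split the catalog into declination bands (a simple stand-in for patches or HEALPix pixels), assign each object to exactly one band's core, and pad each band by the match radius so that pairs straddling a boundary are still seen together. The function dec_band_chunks is hypothetical and is not part of spherematch or of the DM-33279 task. Because band edges are lines of constant Dec, a margin in Dec equals the angular distance to the edge.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def dec_band_chunks(
    dec: Sequence[float], band_deg: float, margin_arcsec: float
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (band, core, padded) index lists for non-empty Dec bands.

    core:   objects assigned to this band; every object is in exactly one core.
    padded: objects within margin_arcsec of the band (core included),
            so matches across a band edge are not lost.
    """
    margin = margin_arcsec / 3600.0
    n_bands = math.ceil(180.0 / band_deg)
    # Assign each object to one band; Dec = +90 is clamped into the last band.
    owner = [min(int((d + 90.0) // band_deg), n_bands - 1) for d in dec]
    for b in range(n_bands):
        core = [i for i, o in enumerate(owner) if o == b]
        if not core:
            continue  # skip empty bands entirely
        lo = -90.0 + b * band_deg
        hi = lo + band_deg
        padded = [i for i, d in enumerate(dec) if lo - margin <= d < hi + margin]
        yield b, core, padded


dec = [-10.0, -0.0001, 0.0001, 45.0, 90.0]
for band, core, padded in dec_band_chunks(dec, band_deg=10.0, margin_arcsec=1.0):
    print(band, core, padded)
```

            Each band could then be matched independently, with peak memory set by the largest padded chunk rather than the whole tract. One common way to avoid double-counting is to keep a cluster only from the band whose core contains its first (canonically sorted) member.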
            ktl Kian-Tat Lim added a comment (edited) -

            Just checking here: you only intend to use spherematch via Python and not direct C/C++ linkage, correct?


              People

              Assignee:
              erykoff Eli Rykoff
              Reporter:
              erykoff Eli Rykoff
              Watchers:
              Colin Slater, Eli Rykoff, Jim Bosch, John Parejko, Kian-Tat Lim, Leanne Guy, Michelle Butler [X] (Inactive), Tim Jenness, Wil O'Mullane, Yusra AlSayyad
