Data Management / DM-20837

Develop a plan for cross-matching and/or cross-identification between LSST and external catalogs

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: DM Subsystem Science
    • Labels:
      None
    • Team:
      DM Science

      Description

      In response to LSP Review recommendation 1f (LIT-552)

      Provide baseline "neighbors tables" cross-identifying LSST sources/objects against other major contemporary catalogs, such as the Gaia final data release. These are likely to be required by the commissioning teams, and this will avoid the unnecessary overhead of many users attempting to perform the cross-matching independently. Engage the SCs in defining a suitable baseline.

      DM has committed to study this issue and develop a plan.

      The DM-SST is asked to assign someone to lead this study. The study should consider:

      • purely spatial matching (e.g., identifying the nearest N objects within a maximum angular separation D) vs. "cross-identification" in the sense of making an astrophysical judgement about identity/relatedness;
      • matching to data sources for which this is already required for the purposes of the pipelines themselves (e.g., to Gaia data releases), to ones for which this is required for the purposes of validation / QA / characterization of the LSST data products, and/or to ones which are of purely scientific interest;
      • how the matching might be sourced, e.g., from within project-supported efforts vs. as externally contributed tables; and
      • how the matching might best be represented in database-oriented data products (e.g., in Qserv) and/or in non-database products such as Parquet files.
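      As an illustration of the first option above (purely spatial matching), here is a minimal sketch that finds the nearest N external sources within a maximum angular separation D of each LSST object and emits rows suitable for a "neighbors table". The function names, the tuple layout, the output columns (`sep_arcsec`, `rank`), and the sample coordinates are all assumptions made for this example, not a project-defined schema. The brute-force loop is for clarity only; a production match would need a spatial index.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def neighbors(lsst, external, max_sep_arcsec, n_max):
    """Return (lsst_id, ext_id, sep_arcsec, rank) rows.

    lsst, external: lists of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    Brute force, O(len(lsst) * len(external)): illustration only.
    """
    rows = []
    for lid, lra, ldec in lsst:
        candidates = []
        for eid, era, edec in external:
            sep = angular_sep_deg(lra, ldec, era, edec) * 3600.0
            if sep <= max_sep_arcsec:
                candidates.append((sep, eid))
        candidates.sort()  # nearest first
        for rank, (sep, eid) in enumerate(candidates[:n_max], start=1):
            rows.append((lid, eid, sep, rank))
    return rows

# Made-up coordinates, purely for demonstration.
lsst_objects = [(1, 10.0, -30.0), (2, 10.001, -30.0)]
gaia_sources = [(101, 10.0001, -30.0), (102, 10.0009, -30.0), (103, 11.0, -30.0)]

match_rows = neighbors(lsst_objects, gaia_sources, max_sep_arcsec=3.0, n_max=2)
for row in match_rows:
    print(row)
```

      Note that each LSST object can have several neighbors and each external source can be a neighbor of several objects, so the result is an N:M relation. "Cross-identification" in the astrophysical sense would add a further selection step on top of rows like these.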

      The goal of this effort is to develop a plan which can be costed for its impact on DM/Project resources (whether that be just to host the tables, or also to create them).

            Activity

            fritzm Fritz Mueller added a comment -

            Thinking about this over the past few days: one way to generate match tables would be to do so within Qserv itself, since it is designed to carry out efficient parallel spatial joins. To wit: ingest director A, ingest director B, engage the machinery for a parallel spatial join, but instead of aggregating and returning the results, perform a "turn around ingest" in place on the per-chunk (partitioned) results.

            Such a turn around ingest capability could also be useful in the broader context of user-generated data products. There are some complications with generating overlap tables correctly, but it seems doable. I spent a little time white-boarding this, and the idea has been loaded into Igor Gaponenko's input hopper as an additional ingredient for ingest sausage...
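            To make the per-chunk idea and the overlap complication concrete, here is a toy sketch. It is not Qserv code and does not use any Qserv API: the declination-stripe chunking, the constants `STRIPE_DEG` and `OVERLAP_DEG`, and all function names are assumptions invented for this example. Each chunk of catalog A is matched only against the same chunk of catalog B plus B rows lying within an overlap margin of that chunk, and results are kept per chunk rather than aggregated.

```python
import math
from collections import defaultdict

STRIPE_DEG = 1.0    # hypothetical chunk height in declination
OVERLAP_DEG = 0.01  # hypothetical overlap margin; must be >= the match radius

def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def chunk_id(dec):
    return int(math.floor((dec + 90.0) / STRIPE_DEG))

def partition(rows):
    """Group (id, ra, dec) rows by their home chunk."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[chunk_id(row[2])].append(row)
    return chunks

def overlap_partition(rows):
    """For each chunk, collect rows from other chunks within OVERLAP_DEG of it."""
    overlaps = defaultdict(list)
    for row in rows:
        dec = row[2]
        nearby = {chunk_id(dec - OVERLAP_DEG), chunk_id(dec + OVERLAP_DEG)}
        for c in nearby - {chunk_id(dec)}:
            overlaps[c].append(row)
    return overlaps

def match_chunk(a_rows, b_rows, radius_arcsec):
    out = []
    for aid, ara, adec in a_rows:
        for bid, bra, bdec in b_rows:
            s = sep_arcsec(ara, adec, bra, bdec)
            if s <= radius_arcsec:
                out.append((aid, bid, s))
    return out

def chunked_match(a, b, radius_arcsec):
    """Match per chunk; the result stays partitioned, ready to be stored in place."""
    # Separation >= |delta dec|, so a margin >= radius cannot miss a pair.
    assert radius_arcsec / 3600.0 <= OVERLAP_DEG
    a_chunks, b_chunks, b_overlap = partition(a), partition(b), overlap_partition(b)
    return {c: match_chunk(a_rows, b_chunks.get(c, []) + b_overlap.get(c, []),
                           radius_arcsec)
            for c, a_rows in a_chunks.items()}

# Made-up data: objects 1 and 101 sit on opposite sides of a chunk boundary.
catalog_a = [(1, 10.0, -29.9999), (2, 10.0, -29.5)]
catalog_b = [(101, 10.0, -30.0001), (102, 10.0, -29.5001)]

per_chunk = chunked_match(catalog_a, catalog_b, radius_arcsec=1.0)
for chunk, rows in sorted(per_chunk.items()):
    print(chunk, rows)
```

            In this toy data the pair (1, 101) straddles a chunk boundary and is found only because of the overlap rows, which is the kind of case the overlap tables exist to handle.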

            gpdf Gregory Dubois-Felsmann added a comment - edited

            At a technical level, if implementing this capability involves creating tables that represent N:M relationships between the LSST catalog and external catalogs (sometimes known as "neighbor tables" or "match tables", and a special case of what are sometimes called "bridge tables" or "join tables" in RDBMS-speak), the Qserv implications of that are addressed in DM-17772, which is an already-pending ticket.

            DM-17772 concerns the successful use of such tables in Qserv.  Their creation (from both algorithmic and production workflow perspectives) is a separate matter.
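            For readers less familiar with the pattern, the following is a small sketch of an N:M match table used as a bridge between two catalogs. It uses SQLite via Python purely for illustration and says nothing about how Qserv handles such joins. The table names (`Object`, `GaiaSource`, `ObjectToGaia`), the columns, and the data values are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Object (objectId INTEGER PRIMARY KEY, ra REAL, decl REAL);
CREATE TABLE GaiaSource (source_id INTEGER PRIMARY KEY, ra REAL, decl REAL,
                         phot_g_mean_mag REAL);
-- The match ("neighbor") table: one row per (object, source) pair.
CREATE TABLE ObjectToGaia (
    objectId   INTEGER REFERENCES Object(objectId),
    source_id  INTEGER REFERENCES GaiaSource(source_id),
    sep_arcsec REAL,
    PRIMARY KEY (objectId, source_id)
);
""")

# Made-up rows, purely for demonstration.
conn.executemany("INSERT INTO Object VALUES (?, ?, ?)",
                 [(1, 10.0, -30.0), (2, 10.001, -30.0)])
conn.executemany("INSERT INTO GaiaSource VALUES (?, ?, ?, ?)",
                 [(101, 10.0001, -30.0, 15.2), (102, 10.0009, -30.0, 17.9)])
conn.executemany("INSERT INTO ObjectToGaia VALUES (?, ?, ?)",
                 [(1, 101, 0.31), (1, 102, 2.81), (2, 102, 0.31)])

# Join through the match table to attach external-catalog columns to each object.
rows = conn.execute("""
    SELECT o.objectId, g.source_id, m.sep_arcsec, g.phot_g_mean_mag
    FROM Object AS o
    JOIN ObjectToGaia AS m ON m.objectId = o.objectId
    JOIN GaiaSource AS g ON g.source_id = m.source_id
    ORDER BY o.objectId, m.sep_arcsec
""").fetchall()

for row in rows:
    print(row)
```

            Object 1 has two neighbors and source 102 is a neighbor of both objects, which is what makes the relation N:M rather than a simple foreign key on either catalog.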

            It may be possible to perform some tests of this capability between the Gaia and AllWISE catalogs in the 2020 time frame.


              People

              • Assignee:
                Unassigned
                Reporter:
                gpdf Gregory Dubois-Felsmann
                Watchers:
                Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, Leanne Guy