  1. Data Management
  2. DM-12823

Clarify how the Source table will be sharded in Qserv

    Details

    • Team:
      DM Science

      Description

      A discussion with Fritz Mueller recently uncovered a (rather small) issue in our data model.

      The definition of the Source table in the DPDD (LSE-163) does not require every Source to have an associated Object:

      Name      Type    Unit  Description
      objectId  uint64        ID of the Object this Source was associated with, if any.

      In formal terms, this violates the Qserv sharding model for the DRP tables, in which the spatial sharding is "directed" by the ra/dec from the Object table, so that Sources and ForcedSources always end up in the same shard as their corresponding Object.

      There are several possible solutions to this. Two that occurred to us immediately are:

      1. Allow a fallback strategy in which orphaned Sources are sharded by their own ra/dec.
      2. Create a "fake Object" for each orphaned Source, with an appropriate value set in its flags attribute and with an ra/dec taken from the Source, and thereby enforce a requirement that, formally, every Source has an Object.

      Note that both of these have the same effect in terms of which shard an orphaned Source ends up in.
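
      The equivalence of the two options can be illustrated with a toy model. The sketch below is not Qserv's partitioning code: the chunk_id function, the stripe and chunk sizes, the Obj and Source classes, and FAKE_OBJECT_FLAG are all hypothetical names invented for illustration. It only shows that sharding an orphaned Source by its own ra/dec (option 1) and sharding it via a fake Object that copies the Source's ra/dec (option 2) land in the same chunk.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical toy partitioning; NOT Qserv's actual chunking scheme.
STRIPE_HEIGHT_DEG = 10.0
NUM_STRIPES = 18
CHUNKS_PER_STRIPE = 36
FAKE_OBJECT_FLAG = 0x1  # hypothetical flag bit marking a placeholder Object


def chunk_id(ra: float, dec: float) -> int:
    """Map a position in degrees to a toy chunk number."""
    stripe = min(int((dec + 90.0) // STRIPE_HEIGHT_DEG), NUM_STRIPES - 1)
    column = int((ra % 360.0) // (360.0 / CHUNKS_PER_STRIPE))
    return stripe * CHUNKS_PER_STRIPE + column


@dataclass
class Obj:
    object_id: int
    ra: float
    dec: float
    flags: int = 0


@dataclass
class Source:
    source_id: int
    ra: float
    dec: float
    object_id: Optional[int] = None  # None means "orphaned"


def shard_with_fallback(src: Source, objects: Dict[int, Obj]) -> int:
    """Option 1: use the director Object if there is one, else the Source's own ra/dec."""
    director = objects.get(src.object_id) if src.object_id is not None else None
    if director is None:
        return chunk_id(src.ra, src.dec)
    return chunk_id(director.ra, director.dec)


def add_fake_object(src: Source, objects: Dict[int, Obj], new_id: int) -> None:
    """Option 2, step 1: create a flagged placeholder Object at the Source position."""
    objects[new_id] = Obj(new_id, src.ra, src.dec, flags=FAKE_OBJECT_FLAG)
    src.object_id = new_id


def shard_by_director(src: Source, objects: Dict[int, Obj]) -> int:
    """Option 2, step 2: every Source now has a director; a missing one raises KeyError."""
    director = objects[src.object_id]
    return chunk_id(director.ra, director.dec)


objects = {1: Obj(1, 150.0, 2.0)}
orphan = Source(11, 215.3, -12.7)

chunk_option_1 = shard_with_fallback(orphan, objects)
add_fake_object(orphan, objects, new_id=2)
chunk_option_2 = shard_by_director(orphan, objects)
print(chunk_option_1 == chunk_option_2)
```

      In this toy model the difference between the options is where the special case lives: option 1 keeps a branch in the sharding logic, while option 2 moves it into the data as a flagged Object row, so that the director lookup never fails.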

      From a database design point of view the second approach may be simpler, as it allows the statement "Object is the director table for Source" to have a more uniform meaning.

      Intuitively I'd expect orphaned Sources to be rare and usually of limited scientific interest (e.g., compared to the corresponding DIASource which would likely exist if it were a real astrophysical single-epoch detection), but the formal problem needs to have a solution documented.

      (There are similar but somewhat more complex questions associated with the DRP DIA* tables. I'll post separately about that.)

      I'm temporarily assigning this to myself as LSP Scientist, but ultimately this needs to be written up in an update to the database design.

          Activity

          gpdf Gregory Dubois-Felsmann added a comment -

          A discussion today on Slack:#dm-hsc-reprocessing revealed that this may be a somewhat larger issue than I'd thought: there may be a substantial number of unassociated Sources.

          Jim Bosch and I will bring this to SST attention so that we can clarify what the long-term baseline plan actually is.

          In the meantime, in the context of the S18 goal of loading outputs of the HSC reprocessing into Qserv: no Source-Object associations will be performed at all on this dataset, so the Source and Object tables will have to be treated as independently sharded tables.

          swinbank John Swinbank added a comment -

          I'm confused about which team is supposed to be responsible for this. Since Gregory Dubois-Felsmann has assigned himself to shepherd the work (at least for now), I'm going with SUIT.

          gpdf Gregory Dubois-Felsmann added a comment -

          The first action here, in reviving this ticket, may be to determine whether this issue applies to the F19 loading of HSC processing outputs.


            People

            • Assignee:
              ctslater Colin Slater
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Watchers:
              Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, John Swinbank, Kian-Tat Lim, Robert Lupton, Zeljko Ivezic
