  1. Data Management
  2. DM-20230

Add random index column to DPDD/DB Schema

    Details

    • Team:
      DM Science

      Description

      Recommendation 9A from the LSP design review report states:

      The Project should consider including a priority-2 requirement for a mechanism to allow the random sampling of database tables in a reproducible way.

      This would indeed be helpful, and is commonly implemented by adding to each table a column containing a random number in some format. For example, the Gaia data model includes random_index, which "contains a random permutation of the numbers from 0 to N-1, where N is the number of sources in the table." I think one could also use random floats drawn uniformly between 0 and 1 to get the same effect: selecting the rows whose value falls below a threshold f returns a reproducible random sample containing roughly a fraction f of the table. I suspect floats would be easier to implement, since individual tracts could generate random numbers in parallel and we wouldn't have to institute a table-scale process for assigning each new random ID.
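
      To make the comparison concrete, here is a minimal sketch of both options, not a proposed implementation. The function names, the seed constant, the seeding formula, the tract IDs, and the random_value column name are all hypothetical. It assumes each tract can derive its own seed from its tract ID, which is what lets the float version run independently per tract, while the Gaia-style permutation needs the total row count up front.

```python
import random

BASE_SEED = 20230  # hypothetical fixed seed so the column is reproducible


def tract_random_values(tract_id, n_objects, base_seed=BASE_SEED):
    """Return one float in [0, 1) per object in a single tract.

    The generator is seeded from the tract ID alone (illustrative formula),
    so tracts can be processed in parallel without any shared state.
    """
    rng = random.Random(base_seed * 1_000_003 + tract_id)
    return [rng.random() for _ in range(n_objects)]


def permutation_index(n_total, seed=BASE_SEED):
    """Gaia-style alternative: a random permutation of 0..N-1.

    This needs the size of the whole table, so it cannot be built
    tract by tract without a table-scale step.
    """
    index = list(range(n_total))
    random.Random(seed).shuffle(index)
    return index


def sample(rows, fraction):
    """Select roughly `fraction` of the rows, reproducibly."""
    return [row for row in rows if row["random_value"] < fraction]


# Two hypothetical tracts with made-up object counts.
tract_sizes = {1: 1000, 2: 1500}
rows = [
    {"tract": tract_id, "random_value": value}
    for tract_id, n_objects in tract_sizes.items()
    for value in tract_random_values(tract_id, n_objects)
]
small = sample(rows, 0.01)
large = sample(rows, 0.10)
```

      One property worth noting: because the samples are threshold cuts on the same column, the 1% sample is always a subset of the 10% sample, so a user can grow a sample without losing the rows they already looked at.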

      We should add a similar column to the DPDD and the Object table schema. I don't think there's as much reason to add this to ForcedSource. Source is a borderline case, but I'd lean towards not needing it there either.

            People

            • Assignee:
              ctslater Colin Slater
              Reporter:
              ctslater Colin Slater
              Watchers:
              Andy Salnikov, Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, Kian-Tat Lim, Leanne Guy
