  1. Data Management
  2. DM-20230

Add random index column to DPDD/DB Schema


    • Templates:
    • Team:
      DM Science


      Recommendation 9A from the LSP design review report states:

      The Project should consider including a priority-2 requirement for a mechanism to allow the random sampling of database tables in a reproducible way.

      This would indeed be helpful, and is commonly implemented by adding to each table a column containing a random number in some format. For example, the Gaia data model includes random_index, which "contains a random permutation of the numbers from 0 to N-1, where N is the number of sources in the table." I think random floats between 0 and 1 would achieve the same effect. I suspect floats would be easier to implement, since individual tracts could generate their random numbers in parallel and we wouldn't have to institute a table-scale process for assigning each new random ID.
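
      To make the comparison concrete, here is a rough sketch of the two options in Python. Nothing in it is an existing DM or Gaia API: the function names, the base_seed value, and the per-tract seeding scheme are all assumptions chosen for illustration. It only shows that a float column can be filled tract by tract, while a permutation needs the full table size up front, and that either column supports a reproducible sample with a simple cut.

```python
import random


def assign_random_floats(tract_id, n_objects, base_seed=12345):
    """Random floats in [0, 1) for one tract (hypothetical helper).

    The generator is seeded from (base_seed, tract_id) only, so tracts
    can be processed independently and in parallel, and re-running a
    tract reproduces the same values.
    """
    rng = random.Random(f"{base_seed}:{tract_id}")
    return [rng.random() for _ in range(n_objects)]


def assign_random_index(n_total, seed=12345):
    """Gaia-style column: a random permutation of 0..N-1.

    This needs N, the size of the whole table, before any value can
    be assigned, which is the table-scale step floats would avoid.
    """
    rng = random.Random(seed)
    index = list(range(n_total))
    rng.shuffle(index)
    return index


def sample_by_float(values, fraction):
    """Row positions whose random float falls below `fraction`."""
    return [i for i, v in enumerate(values) if v < fraction]


def sample_by_index(index, fraction):
    """Row positions whose permutation value is below fraction * N."""
    cutoff = int(fraction * len(index))
    return [i for i, r in enumerate(index) if r < cutoff]
```

      With the float column a 1% sample is simply the rows with a value below 0.01, and the sample size is only approximately 1% of the table. With the permutation column the sample size is exact, at the cost of the global assignment step.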

      We should add a similar column to the DPDD and the Object table schema. I don't think there's as much reason to add it to ForcedSource. Source is a borderline case, but I'd lean towards not needing it there either.




            • Assignee:
              ctslater Colin Slater
              ctslater Colin Slater
              Andy Salnikov, Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, Kian-Tat Lim, Leanne Guy


