Details
- Type: Story
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: Design Documents
- Team: DM Science
Description
Recommendation 9A from the LSP design review report states:
"The Project should consider including a priority-2 requirement for a mechanism to allow the random sampling of database tables in a reproducible way."
This would indeed be helpful. It is commonly implemented by adding to each table a column containing a random number in some format. For example, the Gaia data model includes random_index, which "contains a random permutation of the numbers from 0 to N-1, where N is the number of sources in the table." I think a column of random floats drawn uniformly between 0 and 1 would give the same effect: selecting rows whose value falls below some threshold f returns roughly a fraction f of the table, and the same rows every time. I suspect floats would be easier to implement, since individual tracts could generate their random numbers in parallel and we wouldn't have to institute a table-scale process for assigning each new row a unique random index.
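To make the float option concrete, here is a minimal sketch of how it might work. Everything named here is an assumption for illustration only: the helper functions, the per-tract seeding scheme (a base seed combined with the tract ID), and the toy object IDs are hypothetical and not part of any existing schema or pipeline.

```python
import random


def assign_random_floats(object_ids, tract_id, base_seed=12345):
    """Give each object in one tract a random float in [0, 1).

    Hypothetical scheme: the generator is seeded from the tract ID alone,
    so each tract can be processed independently and in parallel.
    """
    rng = random.Random(f"{base_seed}:{tract_id}")
    return {obj_id: rng.random() for obj_id in object_ids}


def sample(table, fraction):
    """Return the IDs of rows whose random value is below `fraction`."""
    return sorted(obj_id for obj_id, r in table.items() if r < fraction)


# Toy "Object table": two tracts of 1000 objects each (made-up IDs).
tracts = {0: range(0, 1000), 1: range(1000, 2000)}

table = {}
for tract_id, ids in tracts.items():
    table.update(assign_random_floats(ids, tract_id))

# Thresholding the stored column gives the same rows on every query,
# and a smaller sample is always a subset of a larger one.
sample_1pct = sample(table, 0.01)
sample_10pct = sample(table, 0.10)
```

The sample size is only approximately the requested fraction, which is one trade-off relative to a Gaia-style permutation index, where a cut at k returns exactly k rows.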
We should add a similar column to the DPDD and the Object table schema. I don't think there's as much reason to add this to ForcedSource. Source is a borderline case, but I'd lean towards not needing it there either.
Pinging this ticket