Details
-
Type: Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: Requirements Documents
-
Team: Architecture
Description
In DM-4597 we agreed on a new DM requirement: each data release should include a cross-match table mapping Objects in the current data release to those in each of the previous data releases. This has now been included in LCR-962, and a request has been made to estimate the impact on staff effort in construction and on the sizing model.
I'd prefer that we baseline as simple a matching algorithm as possible. In that case, the additional storage would be at most a subset of the columns of the Object tables from previous DRs (if they're not already available), plus the match tables themselves. The match tables should be on the order of 10 TB at most: the worst case I can think of is 10 DRs, 3 potential matches per Object, 8-byte IDs and 4-byte match probabilities, and 40G Objects, which gives 10 × 3 × (8 + 4) bytes × 40 × 10^9 ≈ 14 TB.
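For anyone who wants to vary the inputs, the worst-case estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameter names are made up for illustration, and it assumes each match row costs exactly one 8-byte ID plus one 4-byte probability, with no index or storage-engine overhead.

```python
def match_table_bytes(
    n_releases=10,               # number of data releases matched against
    matches_per_object=3,        # potential matches kept per Object
    id_bytes=8,                  # size of one Object ID
    prob_bytes=4,                # size of one match probability
    n_objects=40_000_000_000,    # 40G Objects
):
    """Worst-case match-table size in bytes (hypothetical helper, not DM code)."""
    row_bytes = id_bytes + prob_bytes
    return n_releases * matches_per_object * row_bytes * n_objects


total = match_table_bytes()
print(f"{total / 1e12:.1f} TB")  # decimal terabytes: 14.4 TB
```

Note that the figure is in decimal TB (10^12 bytes). Storing the current-release Object ID in every row as well would raise the per-row cost above the 12 bytes assumed here.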