Data Management / DM-21231

Refactor Registry handling of dataset and associated tables


Details

    • Story Points: 0
    • Data Release Production

    Description

      This is the second half of the big registry overhaul (the larger first half being DM-17023). The plans are described in detail at https://docs.google.com/presentation/d/1KxVmRN_8S4GskyGxEkeoX7tn5c8b9xViy73JU4dkXOA/edit?usp=sharing.


      Goals include:

      • Normalizing the dataset table into different tables for each dataset type.  This should improve query performance and give us flexibility in how we store metadata associated with datasets (possibly including regions and timestamps that are currently restricted to dimensions).
      • Restructuring the registry codebase towards supporting eventual chained-schema registries.
      • Enabling bulk inserts of datasets during ingest.  This will require changes to Datastore as well (particularly its relationship with Registry).
      • Addressing performance and concurrency problems in our usage of transactions.
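
A minimal sketch of the first and third goals, assuming a SQLite backend and hypothetical names (one dataset_<type> table per dataset type, with collection and data_id columns). It only illustrates the idea and is not the actual registry schema or API: each dataset type gets its own table, and a batch of datasets goes in through a single executemany call inside one transaction, so a failure anywhere in the batch rolls back the whole batch.

```python
import sqlite3
from typing import Iterable, Tuple


def create_dataset_type_table(conn: sqlite3.Connection, dataset_type: str) -> str:
    """Create (if needed) the table for one dataset type and return its name.

    Hypothetical schema: each dataset type gets its own table instead of
    sharing a single "dataset" table.
    """
    # Table names cannot be bound as SQL parameters, so validate them instead.
    if not dataset_type.isidentifier():
        raise ValueError(f"unsafe dataset type name: {dataset_type!r}")
    table = f"dataset_{dataset_type}"
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "dataset_id INTEGER PRIMARY KEY, "
        "collection TEXT NOT NULL, "
        "data_id TEXT NOT NULL, "
        "UNIQUE (collection, data_id))"
    )
    return table


def bulk_insert(conn: sqlite3.Connection, dataset_type: str,
                rows: Iterable[Tuple[str, str]]) -> None:
    """Insert many (collection, data_id) rows in a single transaction."""
    table = create_dataset_type_table(conn, dataset_type)
    with conn:  # commits on success, rolls back the whole batch on error
        conn.executemany(
            f"INSERT INTO {table} (collection, data_id) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
bulk_insert(conn, "calexp", [("run1", "visit=1"), ("run1", "visit=2")])
count = conn.execute("SELECT COUNT(*) FROM dataset_calexp").fetchone()[0]
print(count)  # 2
```

Keeping per-type tables narrow means queries for one dataset type never scan rows of other types, and extra per-type metadata columns could be added without affecting the rest.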

      Whenever possible, I'll try to split this up into smaller tickets. The sheer size of DM-17023 has become a problem of its own, though I'm not sure how much I could have split it up. Happily, this ticket should have a much smaller effect on public interfaces, though it will still involve some broad breaking changes.


          Activity

            Jim Bosch added a comment (edited):

            Note to my future self: some datasets, like raw and (post-DM-17023) reference catalogs, should have their "one DatasetType+DataId" constraint hold across all collections, not just within any one collection. That means we don't actually need to use collections to retrieve them when we are explicitly asked for that DatasetType. This is a potentially important optimization that we should find a way to take advantage of.
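
A minimal sketch of that optimization, again assuming SQLite and hypothetical table and column names (not the real registry schema). Dataset types in a hypothetical GLOBALLY_UNIQUE set get a UNIQUE (data_id) constraint instead of UNIQUE (collection, data_id), and lookups for them skip the ordered collection search because at most one row can match.

```python
import sqlite3
from typing import Optional, Sequence

# Hypothetical set of dataset types whose (DatasetType, DataId) pair is
# unique across all collections, not just within one.
GLOBALLY_UNIQUE = {"raw"}


def make_table(conn: sqlite3.Connection, dataset_type: str) -> None:
    """Create a per-dataset-type table with the appropriate uniqueness constraint."""
    if dataset_type in GLOBALLY_UNIQUE:
        unique = "UNIQUE (data_id)"  # enforced across every collection
    else:
        unique = "UNIQUE (collection, data_id)"  # enforced per collection only
    conn.execute(
        f"CREATE TABLE dataset_{dataset_type} ("
        "dataset_id INTEGER PRIMARY KEY, "
        "collection TEXT NOT NULL, "
        f"data_id TEXT NOT NULL, {unique})"
    )


def find_dataset(conn: sqlite3.Connection, dataset_type: str, data_id: str,
                 collections: Sequence[str]) -> Optional[int]:
    table = f"dataset_{dataset_type}"
    if dataset_type in GLOBALLY_UNIQUE:
        # At most one row can match, so the collection search is skipped.
        row = conn.execute(
            f"SELECT dataset_id FROM {table} WHERE data_id = ?", (data_id,)
        ).fetchone()
        return row[0] if row else None
    # Otherwise search the collections in order and return the first match.
    for collection in collections:
        row = conn.execute(
            f"SELECT dataset_id FROM {table} WHERE collection = ? AND data_id = ?",
            (collection, data_id),
        ).fetchone()
        if row:
            return row[0]
    return None


conn = sqlite3.connect(":memory:")
make_table(conn, "raw")
make_table(conn, "calexp")
conn.execute("INSERT INTO dataset_raw (collection, data_id) VALUES ('raw/all', 'exposure=1')")
conn.executemany(
    "INSERT INTO dataset_calexp (collection, data_id) VALUES (?, ?)",
    [("run1", "visit=1"), ("run2", "visit=1")],
)
print(find_dataset(conn, "raw", "exposure=1", collections=[]))  # 1
print(find_dataset(conn, "calexp", "visit=1", ["run2", "run1"]))  # 2
```

The trade-off is that the global constraint must really hold for those dataset types: inserting the same raw data ID into a second collection fails with sqlite3.IntegrityError instead of silently creating a second dataset.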

            Jim Bosch added a comment:

            I'm zeroing out the story points here to reflect the fact that I expect to do all the work on linked tickets.

            Jim Bosch added a comment:

            I've removed the tickets here related to StorageClass metadata (which is not a near-term goal) and added the one for CALIBRATION-type collections; that makes this ticket an umbrella for everything described on the prototyping Confluence page, and something we can expect to close in the not-too-distant future.

            Jim Bosch added a comment:

            I've removed DM-24613 (Quantum tables rework), as we decided not to make that a Gen2 deprecation project, and am now closing this umbrella ticket along with its last remaining child ticket, DM-24432.


            People

              Jim Bosch