RFC-777

Change dataset ID type in butler registries to a UUID

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      Currently dataset IDs in butler registries are stored as auto-incrementing integers. This works fine for a standalone registry that will never receive datasets from other registries.

      The middleware team would like to change the dataset ID to use a UUID instead. This is required to simplify the change we are making to batch processing, in which batch jobs are given a prepopulated SQLite registry and, at the end of processing, the new datasets are merged into the main registry. The merge is significantly simpler if the UUIDs generated by the batch job are retained when the datasets are merged into the main registry.
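
      The merge described above can be illustrated with a minimal sketch. It assumes a toy single-table schema and hypothetical helper names (`make_registry`, `insert_dataset`, `merge_new_datasets`); it is not the butler schema or the actual merge code. Because the IDs are UUIDs, the rows written by the batch job can be copied into the main registry unchanged, and the prepopulated inputs are skipped because their IDs already exist there.

```python
import sqlite3
import uuid


def make_registry() -> sqlite3.Connection:
    """Create a toy in-memory registry (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def insert_dataset(conn: sqlite3.Connection, dataset_id: uuid.UUID, name: str) -> None:
    """Store one dataset row, keeping the UUID supplied by the caller."""
    conn.execute(
        "INSERT INTO dataset (id, name) VALUES (?, ?)", (str(dataset_id), name)
    )


def merge_new_datasets(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy rows from source to target, retaining IDs; return the number added."""
    count_sql = "SELECT COUNT(*) FROM dataset"
    before = target.execute(count_sql).fetchone()[0]
    rows = source.execute("SELECT id, name FROM dataset").fetchall()
    # Rows whose UUID is already present (the prepopulated inputs) are ignored.
    target.executemany("INSERT OR IGNORE INTO dataset (id, name) VALUES (?, ?)", rows)
    return target.execute(count_sql).fetchone()[0] - before


# The main registry already holds an input dataset.
main_registry = make_registry()
input_id = uuid.uuid4()
insert_dataset(main_registry, input_id, "input_image")

# The batch job receives a registry prepopulated with that input and adds an output.
batch_registry = make_registry()
insert_dataset(batch_registry, input_id, "input_image")
output_id = uuid.uuid4()
insert_dataset(batch_registry, output_id, "output_image")

merged = merge_new_datasets(batch_registry, main_registry)
```

      With auto-incrementing integer IDs, the same merge would have to assign fresh IDs in the main registry and rewrite every reference to them.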

      This UUID system will also allow us to ingest raw files predictably such that a UUID in a registry in the OODS (or any other registry) matches that at the data facility even though the file has been ingested independently.
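
      One way to obtain predictable IDs of this kind is a name-based (version 5) UUID computed from the dataset type and data ID. The sketch below is an assumption for illustration only: the namespace, the function name `raw_dataset_id`, and the data ID keys and values are hypothetical, and the RFC does not specify how the butler derives its UUIDs.

```python
import uuid

# Hypothetical namespace; every registry must use the same one to agree on IDs.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def raw_dataset_id(dataset_type: str, data_id: dict) -> uuid.UUID:
    """Derive a deterministic UUID from a dataset type and its data ID."""
    # Sort the keys so that dictionary ordering cannot change the result.
    items = ",".join(f"{key}={data_id[key]}" for key in sorted(data_id))
    return uuid.uuid5(EXAMPLE_NAMESPACE, f"{dataset_type},{items}")


# Two independent ingests of the same raw file yield the same ID.
oods_id = raw_dataset_id(
    "raw", {"instrument": "ExampleCam", "exposure": 12345, "detector": 42}
)
facility_id = raw_dataset_id(
    "raw", {"detector": 42, "exposure": 12345, "instrument": "ExampleCam"}
)
other_id = raw_dataset_id(
    "raw", {"instrument": "ExampleCam", "exposure": 12345, "detector": 43}
)
```

      Here `oods_id` and `facility_id` are equal because they are computed from the same inputs, while `other_id` differs because the detector differs.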

      Since this requires a schema change, the RFC will be flagged to DMCCB. The UUID code is implemented and we are currently working on migration scripts. We want to switch the main NCSA and IDF repositories over so that they can make use of the UUID features.

            Activity

            tjenness Tim Jenness added a comment -

            This is the first schema change request made to DMCCB. We may want to consider how we approach these in the future. For example, should a schema change require two RFCs? One to approve that a change is sufficiently important that it should be worked on at all, and the second to give a timeline for deployment?

            ktl Kian-Tat Lim added a comment -

            The DM-CCB agreed that, in the future, changes that are expected to be user-visible and user-breaking should have an RFC written early, before code development advances very far. Changes that are primarily implementation-oriented and that have good forward- and backward-compatibility stories (like this one) can wait until closer to merge/deploy.


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Watchers:
              Andy Salnikov, Colin Slater, Jim Bosch, John Parejko, Kian-Tat Lim, Leanne Guy, Michelle Butler [X] (Inactive), Michelle Gower, Tim Jenness, Wil O'Mullane, Yusra AlSayyad