Details
- Type: RFC
- Status: Implemented
- Resolution: Done
- Component/s: DM
- Labels: None
Description
Currently, dataset IDs in butler registries are stored as auto-incrementing integers. This works fine for a standalone registry that will never receive datasets from other registries, but integers assigned independently by two registries can collide.
The middleware team would like to change the dataset ID to use a UUID instead. This is needed to simplify the change we are making to batch processing, in which batch jobs are given a prepopulated SQLite registry and, at the end of processing, the new datasets are merged into the main registry. That merge is significantly simpler if the UUIDs generated by the batch job are retained when the datasets are merged into the main registry.
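As a rough illustration of why retained UUIDs simplify the merge, here is a minimal sketch using toy SQLite tables. The `dataset` table, its columns, and the helper functions are hypothetical and do not reflect the actual butler registry schema; random `uuid4` values are assumed for the batch outputs.

```python
import sqlite3
import uuid


def make_registry() -> sqlite3.Connection:
    """Create a toy in-memory registry with a UUID-keyed dataset table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the ID is stored as the text form of a UUID.
    conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def add_dataset(conn: sqlite3.Connection, name: str) -> str:
    """Insert a dataset under a freshly generated random UUID and return the ID."""
    dataset_id = str(uuid.uuid4())
    conn.execute("INSERT INTO dataset (id, name) VALUES (?, ?)", (dataset_id, name))
    return dataset_id


def count(conn: sqlite3.Connection) -> int:
    """Return the number of datasets in the registry."""
    return conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]


def merge(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy all rows from source into target with their IDs unchanged.

    Rows whose UUID already exists in the target are skipped.
    Returns the number of rows actually added.
    """
    rows = source.execute("SELECT id, name FROM dataset").fetchall()
    before = count(target)
    target.executemany(
        "INSERT OR IGNORE INTO dataset (id, name) VALUES (?, ?)", rows
    )
    return count(target) - before


# The main registry already holds a dataset.
main = make_registry()
add_dataset(main, "existing_dataset")

# A batch job writes its outputs to its own registry, generating IDs locally.
batch = make_registry()
batch_ids = [add_dataset(batch, f"batch_output_{i}") for i in range(3)]

# The merge needs no ID remapping: the batch-generated UUIDs are kept as-is.
merged = merge(batch, main)
main_ids = {row[0] for row in main.execute("SELECT id FROM dataset")}
```

With auto-incrementing integer keys, both registries would start counting from 1, so the same merge would have to renumber the batch rows and rewrite every reference to them.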
The UUID system will also let us ingest raw files deterministically, so that the UUID of a raw dataset in the OODS registry (or any other registry) matches the one at the data facility even though the file was ingested independently.
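One way to obtain matching IDs from independent ingests is a name-based UUID (version 5), which is a pure function of a namespace and a key. The sketch below is only illustrative: the namespace, the key format, and the `raw_dataset_id` helper are assumptions, and the actual scheme is the subject of RFC-778.

```python
import uuid

# Hypothetical namespace for raw datasets; not the one chosen in RFC-778.
RAW_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "raw-ingest.example.org")


def raw_dataset_id(instrument: str, exposure: int, detector: int) -> uuid.UUID:
    """Derive a deterministic UUID5 from values that identify a raw file.

    The choice of identifying fields and the key layout are assumptions
    made for this sketch.
    """
    key = f"{instrument}/{exposure}/{detector}"
    return uuid.uuid5(RAW_NAMESPACE, key)


# Two sites ingesting the same raw file independently compute the same ID.
oods_id = raw_dataset_id("ExampleCam", 2021050100001, 42)
facility_id = raw_dataset_id("ExampleCam", 2021050100001, 42)

# A different detector yields a different ID.
other_id = raw_dataset_id("ExampleCam", 2021050100001, 43)
```

Because the ID depends only on the namespace and the key, no coordination between the registries is needed at ingest time.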
Since this requires a schema change, the RFC will be flagged to DMCCB. The UUID code is implemented and we are currently working on migration scripts. We would like to migrate the main NCSA and IDF repositories so that they can make use of the UUID features.
Issue Links
- is blocked by
  - RFC-779 Add alembic to conda rubin-env (Implemented)
- is triggering
  - DM-29593 Design migration system for data repositories (Done)
  - DM-30186 Implement UUID schema migration (Done)
  - DM-30316 Write UUID migration script for sqlite (Done)
- relates to
  - RFC-778 Determine UUID5 scheme to use for raw data ingest (Implemented)
This is the first schema change request made to DMCCB. We may want to consider how we approach these in the future. For example, should a schema change require two RFCs? One to approve that a change is sufficiently important that it should be worked on at all, and the second to give a timeline for deployment?