Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 12
- Epic Link:
- Sprint: DB_S21_12
- Team: Data Access and Database
- Urgent?: No
Description
Syncing registries from different sources (such as the lightweight registry from workflows) will be simplified significantly if we switch from autoincrementing integers to UUIDs for our datasets.
Some things to consider:
- Where are UUIDs calculated? By Registry?
- Should we allow UUIDs to be calculated from a datasetRef (dataset type, dataId, and run name)? This could be an alternative implementation of DM-21794, allowing raw data to have predictable IDs.
- Should we allow an external source for UUIDs?
- Is there a case for datastore to allocate IDs itself which can then be used by registry?
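As a rough illustration of the "predictable IDs" option above, here is a minimal sketch that derives a name-based (version 5) UUID from the dataset type, dataId, and run name, next to a random (version 4) UUID for the default case. The namespace value, the function names, and the example dataId are all hypothetical and are not taken from daf_butler; this only shows that the standard library is sufficient for both modes.

```python
import json
import uuid

# Hypothetical namespace: any fixed UUID works, as long as every registry
# that should agree on IDs uses the same one.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def deterministic_dataset_id(dataset_type: str, data_id: dict, run: str) -> uuid.UUID:
    """Derive a reproducible UUID5 from dataset type, dataId, and run name.

    Hypothetical helper, not part of daf_butler.
    """
    # Serialize with sorted keys so the dataId key order does not matter.
    payload = json.dumps(
        {"dataset_type": dataset_type, "data_id": data_id, "run": run},
        sort_keys=True,
        separators=(",", ":"),
    )
    return uuid.uuid5(EXAMPLE_NAMESPACE, payload)


def random_dataset_id() -> uuid.UUID:
    """Allocate a random UUID4, which needs no coordination between registries."""
    return uuid.uuid4()


# Illustrative values only.
raw_id = deterministic_dataset_id(
    "raw", {"instrument": "HSC", "exposure": 903334, "detector": 10}, "HSC/raw/all"
)
```

Because the ID depends only on its inputs, two independent registries ingesting the same raw data would arrive at the same UUID, whereas `random_dataset_id()` values would have to be copied from one registry to the other.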
While preparing a rebase, I'm trying to see how to make pydantic handle UUIDs. Something is very odd; I'm not sure how it can behave this way:
>>> from pydantic import BaseModel
>>> from uuid import UUID, uuid4
>>> from typing import Optional, Union
>>> class SerializedDatasetRef(BaseModel):
...     id: Optional[Union[int, UUID]] = None
...     run: str = ""
...
# this looks good
>>> SerializedDatasetRef(id=1, run="RUN")
SerializedDatasetRef(id=1, run='RUN')
# this is odd
>>> SerializedDatasetRef(id=uuid4(), run="RUN")
SerializedDatasetRef(id=167854762555046396793552098261208053632, run='RUN')
>>> SerializedDatasetRef(id=uuid4(), run="RUN").json()
'{"id": 180405824369238067234879699192746726284, "run": "RUN"}'
# this works but WTH
>>> SerializedDatasetRef(id=str(uuid4()), run="RUN")
SerializedDatasetRef(id=UUID('ffa1f977-0d31-44a2-b281-17d0a30151d0'), run='RUN')
It looks like it decides, for some reason, to convert the UUID to an int.
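One plausible explanation, assuming pydantic tries the members of a `Union` from left to right and keeps the first one that coerces (the behaviour of pydantic v1): `uuid.UUID` defines `__int__`, so `int(some_uuid)` succeeds and the `int` branch wins before `UUID` is ever tried. A UUID string cannot be parsed as an int, so it falls through to the `UUID` branch. The sketch below mimics that rule with the standard library only; `coerce_left_to_right` and `to_uuid` are hypothetical stand-ins, not pydantic APIs.

```python
from uuid import UUID, uuid4


def to_uuid(value):
    """Hypothetical UUID converter: accepts UUID instances and UUID strings only."""
    if isinstance(value, UUID):
        return value
    if isinstance(value, str):
        return UUID(value)  # raises ValueError on a malformed string
    raise TypeError(f"cannot convert {type(value).__name__} to UUID")


def coerce_left_to_right(value, converters):
    """Hypothetical stand-in for left-to-right Union coercion: first success wins."""
    for convert in converters:
        try:
            return convert(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"no converter accepted {value!r}")


u = uuid4()

# Union[int, UUID] order: int(u) succeeds via UUID.__int__, so the UUID is lost.
int_first = coerce_left_to_right(u, [int, to_uuid])

# Passing the string form: int() fails on the hex-and-dashes text, UUID succeeds.
from_string = coerce_left_to_right(str(u), [int, to_uuid])

# Union[UUID, int] order: the UUID is kept, and a plain int still works.
uuid_first = coerce_left_to_right(u, [to_uuid, int])
plain_int = coerce_left_to_right(1, [to_uuid, int])
```

If that assumption holds, declaring the field as `Optional[Union[UUID, int]]` may be enough to keep UUIDs intact while still accepting integer IDs, though this should be checked against the pydantic version actually in use.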