  1. Data Management
  2. DM-29196

Use UUIDs as dataset_ids in registry


    Details

    • Story Points:
      12
    • Sprint:
      DB_S21_12
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Syncing registries from different sources (such as the lightweight registry from workflows) will be simplified significantly if we switch from autoincrementing integers to UUIDs for our datasets.

      Some things to consider:

      • Where are UUIDs calculated? By Registry?
      • Should we allow UUIDs to be calculated from a datasetRef (dataset type, dataId, and run name)? This could be an alternative implementation of DM-21794, allowing raw data to have predictable IDs.
      • Should we allow an external source for UUID?
      • Is there a case for datastore to allocate IDs itself which can then be used by registry?
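
      The second bullet (IDs derived from a datasetRef) could be sketched with name-based UUIDs. This is only an illustration: the helper name, the namespace value, and the way the dataset type, dataId, and run name are joined into a string are all assumptions, not anything decided in this ticket.

```python
import uuid

# Hypothetical namespace for illustration; any fixed UUID shared by all
# producers would serve the same purpose.
DATASET_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "registry.example.org")


def deterministic_dataset_id(dataset_type: str, data_id: dict, run: str) -> uuid.UUID:
    """Derive a reproducible version-5 UUID from the parts of a dataset reference."""
    # Sort the dataId keys so the result does not depend on dict ordering.
    items = ",".join(f"{key}={data_id[key]}" for key in sorted(data_id))
    name = f"{dataset_type}|{run}|{items}"
    return uuid.uuid5(DATASET_NAMESPACE, name)


id_a = deterministic_dataset_id(
    "raw", {"instrument": "HSC", "exposure": 903334, "detector": 10}, "HSC/raw/all"
)
id_b = deterministic_dataset_id(
    "raw", {"detector": 10, "exposure": 903334, "instrument": "HSC"}, "HSC/raw/all"
)
id_c = deterministic_dataset_id(
    "raw", {"instrument": "HSC", "exposure": 903334, "detector": 10}, "HSC/raw/other"
)
```

      With a scheme like this, two registries that ingest the same raw file independently would arrive at the same ID, while the same dataId in a different run would get a different one.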

            Activity

            salnikov Andy Salnikov added a comment -

            While preparing a rebase, I'm trying to see how to make pydantic handle UUIDs. Something is very odd; I'm not sure how it can behave this way:

            >>> from pydantic import BaseModel
            >>> from uuid import UUID, uuid4
            >>> from typing import Optional, Union
            >>> class SerializedDatasetRef(BaseModel):
            ...     id: Optional[Union[int, UUID]] = None
            ...     run: str = ""
            ...
             
            # this looks good
            >>> SerializedDatasetRef(id=1, run="RUN")
            SerializedDatasetRef(id=1, run='RUN')
             
            # this is odd
            >>> SerializedDatasetRef(id=uuid4(), run="RUN")
            SerializedDatasetRef(id=167854762555046396793552098261208053632, run='RUN')
            >>> SerializedDatasetRef(id=uuid4(), run="RUN").json()
            '{"id": 180405824369238067234879699192746726284, "run": "RUN"}'
             
            # this works but WTH
            >>> SerializedDatasetRef(id=str(uuid4()), run="RUN")
            SerializedDatasetRef(id=UUID('ffa1f977-0d31-44a2-b281-17d0a30151d0'), run='RUN')
            

            It looks like pydantic decides, for some reason, to convert the UUID to an int.
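
            A likely explanation for the large integers above (offered as an assumption, not something confirmed in this thread) is that `UUID` defines `__int__`, so any validation step that falls back to calling `int()` will accept a UUID and return its 128-bit value. The standard-library sketch below shows that conversion and that the original UUID can be recovered from the integer:

```python
from uuid import UUID, uuid4

original = uuid4()

# UUID implements __int__, so int() succeeds and yields the 128-bit value.
as_int = int(original)

# The same value is exposed as the .int attribute.
same_as_attribute = as_int == original.int

# The integer form is lossless: the UUID can be rebuilt from it.
restored = UUID(int=as_int)
```

            So the integer in the serialized JSON is the UUID's numeric form rather than corrupted data, although a reader would no longer know it was meant to be a UUID.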

            salnikov Andy Salnikov added a comment - - edited

            Looks like there are about 10 bug reports filed for this "feature", e.g. https://github.com/samuelcolvin/pydantic/issues/2135.

            tjenness Tim Jenness added a comment -

            Does StrictUnion help?

            salnikov Andy Salnikov added a comment -

            I don't see StrictUnion in our pydantic version (1.8.1); it may appear in 1.9. It would probably help, but the current workaround is to re-order the types in the Union: Union[uuid.UUID, int]. I'm still testing, but it looks like this fixes this particular issue.
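
            Why re-ordering helps can be illustrated with a toy model of "try each Union member in order and keep the first that succeeds". This is a simplified stand-in written with the standard library only; the function names are hypothetical and it is not pydantic's actual implementation.

```python
from uuid import UUID, uuid4


def to_int(value):
    # Accepts anything int() accepts, which includes UUID objects.
    return int(value)


def to_uuid(value):
    # Accepts UUID instances and UUID-formatted strings only.
    if isinstance(value, UUID):
        return value
    if isinstance(value, str):
        return UUID(value)
    raise TypeError(f"cannot interpret {value!r} as a UUID")


def coerce_in_order(value, coercers):
    """Return the result of the first coercer that does not raise."""
    for coercer in coercers:
        try:
            return coercer(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"no coercer accepted {value!r}")


u = uuid4()
int_first = coerce_in_order(u, [to_int, to_uuid])    # UUID collapses to an int
uuid_first = coerce_in_order(u, [to_uuid, to_int])   # UUID is preserved
plain_int = coerce_in_order(1, [to_uuid, to_int])    # ints still work
from_string = coerce_in_order(str(u), [to_int, to_uuid])  # int() fails, UUID() succeeds
```

            In this model, putting the UUID handler first keeps UUIDs intact while plain integers still fall through to the int handler, which mirrors the observed effect of Union[uuid.UUID, int].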

            salnikov Andy Salnikov added a comment -

            Thanks for all the comments, answers, and reviews! I have finally merged both daf_butler and ci_hsc_gen3. Jenkins is happy, and I ran many tests of ci_hsc_gen3 with sqlite and postgres, using both UUIDs and integers.


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Brian Yanny, Jim Bosch, Michelle Gower, Tim Jenness