  1. Data Management
  2. DM-23024

Support multi-dataset single file ingest in daf_butler

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Story Points: 5
    • Sprint: Arch 2019-12-16
    • Team: Architecture

      Description

      DECam stores multiple detectors in a single file, so we need to be able to ingest a single file and associate it with multiple DatasetRefs. This ticket will implement that support.

      The baseline design is:

      • Change FileDataset to allow a list of DatasetRef per dataset.
      • In Butler.ingest add a dataset for each DatasetRef in each FileDataset.
      • In Datastore.ingest register a dataset for each DatasetRef but associate with the same file.
      • In Datastore.get pass the dataId to formatter.read.
      • In Datastore.remove only delete the file if there are no other datasets associated with that file.
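The baseline design above can be sketched with a small toy model. This is only an illustration under stated assumptions: `ToyDatasetRef`, `ToyFileDataset`, `ToyDatastore`, `toy_read` and the file name are hypothetical stand-ins, not the real daf_butler classes or signatures, and an in-memory dict takes the place of files on disk.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


# Hypothetical stand-ins for illustration; not the real daf_butler classes.
@dataclass(frozen=True)
class ToyDatasetRef:
    dataset_id: int
    data_id: Tuple[Tuple[str, Any], ...]  # e.g. (("detector", 1),)


@dataclass
class ToyFileDataset:
    path: str
    refs: List[ToyDatasetRef]  # a list of refs per file, as in the first bullet


def toy_read(content: Dict[int, str], data_id: Dict[str, Any]) -> str:
    # The reader receives the dataId and decides which part of the file it needs.
    return content[data_id["detector"]]


class ToyDatastore:
    def __init__(self) -> None:
        self.records: Dict[int, str] = {}  # dataset_id -> file path
        self.files: Dict[str, Dict[int, str]] = {}  # path -> content (stand-in for disk)

    def ingest(self, dataset: ToyFileDataset, content: Dict[int, str]) -> None:
        self.files[dataset.path] = content
        # Register one record per ref, all pointing at the same file.
        for ref in dataset.refs:
            self.records[ref.dataset_id] = dataset.path

    def get(self, ref: ToyDatasetRef) -> str:
        path = self.records[ref.dataset_id]
        return toy_read(self.files[path], dict(ref.data_id))

    def remove(self, ref: ToyDatasetRef) -> None:
        path = self.records.pop(ref.dataset_id)
        # Delete the file only when no other dataset still refers to it.
        if path not in self.records.values():
            del self.files[path]


refs = [ToyDatasetRef(i, (("detector", i),)) for i in (1, 2)]
store = ToyDatastore()
store.ingest(ToyFileDataset("decam_exposure.fits", refs), {1: "pixels-1", 2: "pixels-2"})
first = store.get(refs[0])
store.remove(refs[0])
file_kept = "decam_exposure.fits" in store.files
store.remove(refs[1])
file_gone = "decam_exposure.fits" not in store.files
```

Removing the first ref leaves the file in place because the second ref still points at it; the file goes away only with the last ref.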

      If we pass in the full dataId and let the formatter work out what it needs to know, I don't think we need a special StorageClass listing the particular dimensions, and so we don't need a special datasetType either. I will find out when I start to implement.

            Activity

            jbosch Jim Bosch added a comment -

            This does seem even easier than I'd expected, which is good.  I think the big question is whether

            "In Datastore.ingest register a dataset for each DatasetRef but associate with the same file."

            requires a new StorageClass so the parent StorageClass can differ from the child one - I thought that was how we represented this case in the Datastore's internal table - but it's entirely possible my recollection of that is just fuzzy.

            tjenness Tim Jenness added a comment -

            I think the realization I had is that whilst normally you associate a storage class with a specific formatter in the datastore configuration, in the ingest case the tool doing the ingesting overrides the formatter. If you pass in the entire dataId to the formatter.read method then I don't think I need a new StorageClass. I'll try it.

            tjenness Tim Jenness added a comment -

            The proposed approach seems to work. I've added some tests that do ingest and I've written a test formatter that can read subsets from YAML files based on dataId and everything seems to work.
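As a rough illustration of a formatter that reads a subset of a file based on the dataId, here is a minimal sketch. It is not the test formatter from this ticket: `SubsetFormatter`, its `read` signature, the `detector` key and the file name are all hypothetical, and JSON is swapped in for YAML so the example needs only the standard library.

```python
import json
import os
import tempfile
from typing import Any, Dict


class SubsetFormatter:
    """Hypothetical formatter returning the part of a file that matches a dataId."""

    def read(self, path: str, data_id: Dict[str, Any]) -> Any:
        with open(path) as fh:
            content = json.load(fh)
        # JSON object keys are always strings, so convert the detector number.
        return content[str(data_id["detector"])]


def demo():
    # One file holding per-detector entries, keyed by detector number.
    content = {"1": {"gain": 1.5}, "2": {"gain": 2.0}}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "multi.json")
        with open(path, "w") as fh:
            json.dump(content, fh)
        fmt = SubsetFormatter()
        return fmt.read(path, {"detector": 1}), fmt.read(path, {"detector": 2})


result = demo()
```

The same file is read twice and each call returns only the entry selected by its dataId, which is the behaviour the approach relies on.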

            jbosch Jim Bosch added a comment -

            Only minor comments, but it looks like we've got an extra butler.py (as well as _butler.py) now.

            tjenness Tim Jenness added a comment -

            John Parejko, hopefully I've unblocked you. When you are creating the FileDataset objects for ingest, you will need to associate multiple DatasetRefs with the single FileDataset.
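One way an ingest tool might prepare for this is to group its per-detector refs by file path first, so that each path ends up with a single list of refs. The sketch below assumes a hypothetical helper `group_refs_by_file` and uses placeholder strings in place of real DatasetRef objects; how the resulting list is handed to FileDataset is not shown, since the ticket does not give the constructor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_refs_by_file(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect (path, ref) pairs into one list of refs per path (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path, ref in entries:
        grouped[path].append(ref)
    return dict(grouped)


# Placeholder strings stand in for DatasetRef objects.
entries = [
    ("a.fits", "ref-det1"),
    ("a.fits", "ref-det2"),
    ("b.fits", "ref-det1b"),
]
grouped = group_refs_by_file(entries)
```

Each key of `grouped` would then correspond to one FileDataset carrying all the refs for that file.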


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Jim Bosch
              Watchers:
              Jim Bosch, John Parejko, Tim Jenness
