DM-4519 (project: Data Management)

Create and implement a system in which datasets can refer to other datasets, with access and caching provided via Butler

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: butler
    • Labels:
      None
    • Templates:
    • Team:
      Data Access and Database

      Description

      Some of our datasets, particularly exposures, have many subcomponents, such as WCSs, PSFs, and even bounding boxes. So far, we've implemented this by considering these to be "pieces" of an exposure to be extracted without loading the full exposure. The current pipeline flow is starting to demonstrate some of the shortcomings of this approach, however:

      • The WCS of an exposure can also be logically considered part of other datasets, such as the source table derived from that exposure. (The same is true of the PSF, photometric calibration, background, etc., but I'll use the WCS as my example here.)
      • Ubercal creates a new WCS to be associated with each exposure, and while these may be stored separately on disk, we'd like to be able to create a new dataset that combines the calexp pixel data with the ubercal WCS, etc. (See also Dataset Flavors and Versioning, below.)
      • Some complex serialized objects, such as CoaddPsf, themselves contain many other objects, in this case the Psfs and Wcss of the exposures that went into a coadd. Since the CoaddPsf of a neighboring patch will contain many of the same Psfs and Wcss, it's highly desirable to store each of these only once within a data repo and, moreover, to be able to reuse an existing in-memory object instead of re-reading it from disk and creating a duplicate.
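
      To make the last point concrete, here is a minimal sketch of sharing in-memory components through a weak-value cache. It is not Butler code: `Component`, `ComponentCache`, `fake_read`, and the dataset ids are all hypothetical names invented for illustration, and the "disk read" is simulated.

```python
import weakref


class Component:
    """Hypothetical stand-in for a deserialized object such as a Psf or Wcs."""

    def __init__(self, dataset_id, payload):
        self.dataset_id = dataset_id
        self.payload = payload


class ComponentCache:
    """Hand out one shared in-memory object per dataset id (assumed design)."""

    def __init__(self, read_from_disk):
        self._read = read_from_disk  # hypothetical loader callable
        # Entries disappear automatically once nothing else holds the object.
        self._live = weakref.WeakValueDictionary()
        self.reads = 0  # number of simulated disk reads

    def get(self, dataset_id):
        obj = self._live.get(dataset_id)
        if obj is None:
            obj = self._read(dataset_id)
            self.reads += 1
            self._live[dataset_id] = obj
        return obj


def fake_read(dataset_id):
    # Simulated disk read; a real loader would deserialize a file here.
    return Component(dataset_id, payload=f"contents of {dataset_id}")


cache = ComponentCache(fake_read)

# Two neighboring patches whose CoaddPsfs share one input exposure.
patch_a = [cache.get(i) for i in ("psf/visit1", "psf/visit2")]
patch_b = [cache.get(i) for i in ("psf/visit2", "psf/visit3")]
```

      In this sketch both patches end up holding the same `psf/visit2` object, and only three reads happen for four requests. Using weak references means the cache itself does not keep large objects alive after their last user drops them; whether that is the right eviction policy here is an open design question.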

      Possible implementations include:

      1. links embedded in fits files
      2. associations recorded in the registry
      3. some other symlink-style system
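
      As a rough illustration of option 2 only, the sketch below records associations in a small SQLite table. The table name `dataset_link`, its columns, the `resolve` helper, and the dataset ids are assumptions made up for this example; they do not describe the existing registry schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_link ("
    " parent TEXT NOT NULL,"  # dataset that refers to another dataset
    " role TEXT NOT NULL,"    # what the referenced dataset is, e.g. 'wcs'
    " child TEXT NOT NULL,"   # dataset being referred to
    " PRIMARY KEY (parent, role))"
)

links = [
    # The calexp and its source table point at the same stored WCS.
    ("calexp/visit1", "wcs", "wcs/visit1"),
    ("src/visit1", "wcs", "wcs/visit1"),
    # A combined dataset reuses the calexp pixels but the ubercal WCS.
    ("calexp_ubercal/visit1", "wcs", "ubercal_wcs/visit1"),
]
conn.executemany("INSERT INTO dataset_link VALUES (?, ?, ?)", links)


def resolve(parent, role):
    """Return the id of the dataset that `parent` refers to for `role`, or None."""
    row = conn.execute(
        "SELECT child FROM dataset_link WHERE parent = ? AND role = ?",
        (parent, role),
    ).fetchone()
    return row[0] if row else None
```

      With a table like this, the WCS is stored once and referred to from several datasets, and swapping in the ubercal WCS is a matter of adding a row rather than rewriting files. Options 1 and 3 would carry the same information in FITS headers or in the filesystem instead.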

                People

                • Assignee:
                  Unassigned
                • Reporter:
                  Nate Pease (npease)
                • Watchers:
                  Jacek Becla, Nate Pease