Data Management / DM-35396

Investigate writing butler metadata to output files

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: obs_base
    • Labels:
    • Story Points: 4
    • Team: Architecture
    • Urgent?: No

      Description

      Gregory Dubois-Felsmann has requested that we look into writing butler metadata to datasets. The specific request was to include the dataset type: if a file is renamed, someone who downloads a FITS file from the portal has no way to tell what it came from.

      Without provenance, the metadata available to Datastore is a DatasetRef (uuid, run, dataId, datasetType). Formatters currently have access only to the dataId, so the formatter API would have to change to expose the other information.
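
      As a rough illustration of what a formatter could do once it sees the whole ref, here is a minimal sketch that flattens the four DatasetRef fields into header-style key/value pairs. RefSummary, butler_header_cards and the "BUTLER ..." key names are hypothetical stand-ins chosen for this example; they are not the daf_butler API and not an agreed keyword scheme.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RefSummary:
    """Hypothetical stand-in for the DatasetRef fields named in the ticket."""

    id: uuid.UUID
    run: str
    dataset_type: str
    data_id: Dict[str, Any] = field(default_factory=dict)


def butler_header_cards(ref: RefSummary) -> Dict[str, Any]:
    """Flatten a ref into header-style cards (key names are assumptions)."""
    cards: Dict[str, Any] = {
        "BUTLER ID": str(ref.id),
        "BUTLER RUN": ref.run,
        "BUTLER DATASETTYPE": ref.dataset_type,
    }
    # One card per data ID dimension, sorted so the output order is stable.
    for dimension, value in sorted(ref.data_id.items()):
        cards[f"BUTLER DATAID {dimension.upper()}"] = value
    return cards
```

      Keys this long exceed the eight-character FITS keyword limit, which is why the ticket talks about HIERARCH headers.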

      The next issue is to work out how to get the Butler items into the file. We should start with the specific problem of FITS files and Exposure.writeFits. The simplest approach is to add some HIERARCH BUTLER headers directly into the existing metadata and call writeFits. Jim Bosch is a bit worried that this changes the actual Exposure and if someone does butler.put from a notebook they might not expect their object to have changed. The alternative is to allow Exposure.writeFits to take extra metadata that will be added on write only.
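
      The difference between the two options can be sketched without afw. FakeExposure below is a hypothetical stand-in that holds only a header mapping, and its write(extra=...) argument is an assumed shape for the proposed change, not the current Exposure.writeFits signature.

```python
import copy
from typing import Any, Dict, Optional


class FakeExposure:
    """Hypothetical stand-in for an Exposure; holds only a header mapping."""

    def __init__(self, metadata: Dict[str, Any]):
        self.metadata = metadata

    def write(self, extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Return the header that would be written; self.metadata is not touched."""
        header = copy.deepcopy(self.metadata)
        if extra:
            header.update(extra)
        return header


def put_mutating(exposure: FakeExposure, cards: Dict[str, Any]) -> Dict[str, Any]:
    """Option 1: inject the cards into the object's own metadata, then write."""
    exposure.metadata.update(cards)
    return exposure.write()


def put_write_only(exposure: FakeExposure, cards: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2: hand the cards to the writer so they are added on write only."""
    return exposure.write(extra=cards)
```

      Both functions produce the same written header. Only put_mutating leaves the extra cards on the in-memory object afterwards, which is the notebook surprise Jim Bosch is worried about.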

          Activity

          Gregory Dubois-Felsmann (gpdf) added a comment:

          With regard to the dataset type in particular, there are both short-term and long-term concerns here - one is just to make sure that files from the production system have some hint of what they were intended to be when created, just for casual-user-awareness purposes.

          The other is to make sure that, in the very long run, the file artifacts - which, I hope, will outlive us all - can be connected to documentation of the detailed data format long after the pipelines stack may no longer be maintained / maintainable. Someday (I realize that's not now, and probably not realistically a construction deliverable) there will have to be human-readable documentation of what the non-image extensions represent, mathematically, enough to allow them to be used by a sufficiently motivated scientist interested in (at least) long-term variability, even if all other value of the dataset has been superseded by later projects' work.

          That's not how we want users to think about these things now - we really, really, want them to use afw to do anything interesting with that additional data, but the operations organization will have a duty to make sure that generational curation of our dataset is possible.

            People

            Assignee: Tim Jenness (tjenness)
            Reporter: Tim Jenness (tjenness)
            Watchers: Gregory Dubois-Felsmann, Tim Jenness